Open Access Research article

Large-scale analysis of structural, sequence and thermodynamic characteristics of A-to-I RNA editing sites in human Alu repeats

Yoav Kleinberger and Eli Eisenberg*

Author Affiliations

Raymond and Beverly Sackler School of Physics and Astronomy, Tel Aviv University, Tel Aviv, Israel

BMC Genomics 2010, 11:453  doi:10.1186/1471-2164-11-453


The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2164/11/453


Received: 8 January 2010
Accepted: 28 July 2010
Published: 28 July 2010

© 2010 Kleinberger and Eisenberg; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

Alu repeats in the human transcriptome undergo massive adenosine to inosine RNA editing. This process is selective, as editing efficiency varies greatly among different adenosines. Several studies have identified weak sequence motifs characterizing the editing sites, but these alone do not account for the large diversity observed.

Results

Here we build a dataset of 29,971 editing sites and use it to characterize editing preferences. We focus on structural aspects, studying the double-stranded RNA structure of the Alu repeats, and show the editing frequency of a given site to depend strongly on the micro-structure it resides in. Surprisingly, we find that interior loops, and especially the nucleotides at their edges, are more likely to be edited than helices. In addition, the sequence motifs characterizing editing sites vary with the micro-structure. Finally, we show that thermodynamic stability of the site is important for its editing.

Conclusions

Analysis of a large dataset of editing events reveals more information on sequence and structural motifs characterizing the A-to-I editing process.

Background

RNA Editing is a post-transcriptional modification of mRNA [1-4], which may result in the synthesis of proteins that are not directly encoded in the genome. There are two major types of RNA Editing in mammals, both of which occur via deamination of a base, either cytidine (which is turned into uridine) or adenosine (which turns into inosine). Inosine is read by the ribosome (and sequencers) as guanosine, and thus A → I modifications at the mRNA level translate into A → G changes at the genetic code level. In this work we focus exclusively on A-to-I RNA Editing, which is catalyzed by enzymes from the ADAR (Adenosine Deaminases that Act on RNA) family. ADARs are double-stranded RNA (dsRNA) binding proteins, and thus dsRNA is a prerequisite for A-to-I editing [1,2].

RNA Editing is a fine-tuning mechanism, capable of changing only a few nucleotides. Both edited and unedited variants of the same transcript may be present in the cell. A-to-I editing is known to be vital in vertebrates, and important for normal life in invertebrates. In Drosophila, knocking out ADAR activity causes the flies to exhibit defects in locomotion and mating and to suffer tremors [5]. ADAR knockout C. elegans worms exhibit chemotaxis defects [6]. In mice, knocking out ADAR1 causes embryonic death and defects in erythropoiesis [7,8]. ADAR2 -/- mice die shortly after birth and are increasingly seizure prone after postnatal day 12 [9]. The lethal phenotype is accounted for by a single editing site resulting in a single amino acid substitution in the gluR-B gene.

In addition, alteration of A → I editing has been ascribed to several pathological conditions [10], mainly to neuro-psychiatric conditions such as amyotrophic lateral sclerosis (ALS) [11], epilepsy [9,12], major depression disorder [13-15], and glioblastoma multiforme [16]. Reduced A-to-I editing levels have been linked to cancer in various tissues, most strongly to brain tumors. A correlation between the reduction of ADAR3 and the tumor aggressiveness was observed, and overexpression of ADAR1 and ADAR2 resulted in decreased proliferation rate of the glioblastoma multiforme cell-lines [17].

By isolating inosine-containing transcripts from C. elegans and human brain, it was found that most A-to-I editing occurs in non-coding regions [18]. Genome-wide bioinformatic searches for A-to-I editing sites have enabled the identification of abundant A-to-I editing in the transcriptome of several vertebrates [19-24]. It was found that editing occurs mainly within repetitive elements. These repetitive elements are likely to base-pair with a neighboring similar element and form the dsRNA structure which is the target of the ADAR enzymes. In particular, virtually all A-to-I editing events in human occur specifically within Alu repeats.

The Alus are a particular set of primate-specific retrotransposons, approximately 280 nucleotides in length. The Alus are the most abundant of all transposable elements in primates, making up more than 10% of the human genome, with some 1.1 million copies. Recent studies [21,23] have demonstrated that the frequency of A-to-I editing in human is much higher than in mouse, rat, chicken and fly. This has to do with the abundance and low diversity of the Alu elements as compared to similar elements in other genomes [24]: since Alu is so common in the human genome, there is a high probability that an Alu and a counterpart, oppositely oriented Alu, exist nearby and are transcribed together. When the RNA transcript folds, these two Alus form a helix, thus becoming a target for the dsRNA binding ADARs.

The physiological significance of A-to-I editing within non-coding repetitive elements is still elusive. Several possible mechanisms have been suggested through which editing of a non-coding repetitive element might affect the fate of a transcript: editing may result in insertion or elimination of a splice site, and may theoretically lead to the alteration of transcriptional start and stop codons [25]. Hyperedited inosine-containing RNAs might be cleaved at specific sites [26-29]. In addition, inosine-containing mRNAs were also shown to be retained in the nucleus, suggesting an additional regulatory role for A-to-I editing [30,31]. However, the validity and scope of this last mechanism have been debated recently [32,33]. Finally, while the molecular significance is as yet unclear, editing within Alu repeats was shown to be altered in cancerous tissues [17].

A-to-I editing is characterized by a puzzling specificity and selectivity in the adenosines which are edited. In some substrates, e.g. the AMPA receptor gluR-B subunit in mice [34] and the E1 sites within an Alu repeat in the NARF gene [25], RNA Editing is extremely efficient, editing 100% of transcripts at a specific adenosine. In others, such as most of the sites in Alu repeats, a seemingly random editing pattern is observed, where many adenosines are targeted, with varying editing efficiency. However, careful analysis reveals that editing in Alu repeats is also highly reproducible: the variability among healthy individuals in editing level at a given site within a specific Alu repeat is much lower than the site-to-site differences. Sequence preferences for ADARs have been previously documented. C and T are overrepresented at the nucleotide 5' to the editing site, while G is underrepresented. At the nucleotide 3' to the site, G is significantly overrepresented [19,35-39]. These motifs are too weak, however, to fully characterize A-to-I editing. Therefore, the question still stands: what controls the editing level at each given site? ADARs bind to the RNA via double-stranded RNA binding motifs. Thus, dsRNA is a necessity for A-to-I editing. Indeed, it has been shown for the highly selective R/G editing site within the hairpin of the glutamate receptor subunits mRNAs, that the identities of bases in the helical region are evolutionarily conserved, while the bases in the nonhelical part of the hairpin covary so as to maintain their non-helical structure [40]. This distinctive feature demonstrates the importance of the secondary structure to the phenomenon of RNA Editing.

The internal structure of the dsRNA is expected to control the editing efficiency [41]. For example, it has been shown experimentally that internal loops may effectively be equivalent to helix termini in terms of editing efficiency [42]. Thus, internal loops along dsRNA, if large enough, may act as delimiters separating a large dsRNA into many small helices. Since ADARs deaminate fewer A's in shorter helices, their existence (along with the sequence preferences of the ADARs) might be a means to increase the specificity of editing. It is thus plausible that more features of the secondary structure of an RNA molecule play an important role in determining the specificity of adenosine deamination of an ADAR substrate.

In this paper we characterize the properties of A-to-I editing sites in terms of their secondary structure properties, their sequence properties, and their thermodynamic properties. We describe the building of a database of MFOLD [43] foldings used to query these properties, and then display and discuss the results of those queries.

Results and Discussion

Structural Analysis

We first look at the editing frequency for each substructure type (see Table 1 and fig. 1). We compare a "test set" of A-to-I Editing sites, which we denote by E1, and a control set of sites not known to be edited, denoted by E0. The E1 and E0 sets are defined with precision in the Methods section. Interestingly, while the existence of a helix is well known to be a prerequisite for editing, the overall frequency of E1 is actually more than two fold lower in helices (0.044) than in interior loops (0.091). As the overwhelming majority of E1 sites reside in helices and interior loops, we focus henceforth on these two substructures only. For clarity, we emphasize here that by "interior loop" we mean only the unpaired nucleotides that form the loop's constituent strands.

Table 1. Editing frequency and average structure size for the various substructures.

Figure 1. Secondary structure substructures. Bold lines indicate those nucleotides formally included in a given substructure.

Table 1 also suggests length dependence. The editing prevalence as a function of length is given in figs. 2 and 3 (henceforth, error bars represent 95% confidence intervals; in addition, some graphs of integer-valued variables have non-integer entries due to data binning). Clearly, longer helices are more likely to be edited, while longer strands of interior loops are less likely to be edited. In addition, the length of the opposite strand (the one the editing site does not reside in) also affects the editing frequency in an interior loop: as shown in fig. 4, symmetric loops are more likely to be edited.

Figure 2. E1 frequency vs. helix length.

Figure 3. E1 frequency vs. interior-loop strand length.

Figure 4. E1 frequency decreases with asymmetry. Editing frequency is presented for sites within interior-loop strands of lengths 1 (circles), 2 (squares), 3 (diamonds), 4 (triangles), as a function of the asymmetry of the loop. Asymmetry is defined as the difference between the length of the strand opposing the editing site and the edited strand length. Frequencies are normalized by the averaged editing frequency for sites having same strand length, regardless of opposite strand length.

Furthermore, we study the effect of the location of the specific nucleotide within its respective substructure. We define cePos as the distance of the site (in nucleotides) from the closest edge of the substructure it is in (cePos = 0 means the very edge of a substructure). Figs. 5 and 6 present the frequency of E1 sites as a function of cePos. For helices, one observes a general trend of enhancement of editing as a site lies deeper in the helix. For interior loops, however, there is dramatic depletion of E1 for cePos > 0. In fact, it should be noted that 91% of edited sites in interior loops lie at the very edge of the loop, i.e. cePos = 0. Most of these are in fact a single mismatch within an almost perfect helix (i.e., opposite strand length is also one nucleotide). Such mismatches were already implicated as preferred targets of ADARs, as previous in-vitro data as well as bioinformatic work indicate that AC mismatches are more favorable substrates than A-T pairs [19,44]. However, it is worthwhile noticing that our analysis shows this trend to persist even for longer interior loops: interior loop strands of length up to five nucleotides are more likely to be edited than the average site in a helix (see fig. 3).
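The cePos definition can be illustrated with a minimal sketch. It assumes a substructure strand is stored as an inclusive range of nucleotide positions; the function name and this representation are illustrative and are not taken from the database described in the Methods.

```python
def ce_pos(site, start, end):
    """Distance (in nucleotides) from `site` to the closest edge of the
    substructure strand spanning positions start..end (inclusive).
    Returns 0 when the site sits at the very edge."""
    if not start <= site <= end:
        raise ValueError("site lies outside the substructure")
    return min(site - start, end - site)

# A 7-nucleotide helix strand occupying positions 100..106 (toy coordinates)
print([ce_pos(p, 100, 106) for p in range(100, 107)])  # [0, 1, 2, 3, 2, 1, 0]
```

With this convention a single-nucleotide loop strand always has cePos = 0, which matches the single-mismatch case discussed above.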

Figure 5. E1 frequency vs. cePos for helices.

Figure 6. E1 frequency vs. cePos for interior loops.

For these cePos = 0 sites, there is a significant (p < 2.2e-16) effect of the direction of the nearest neighboring helix: A-to-I editing frequency is 0.068 for sites with a helix only on the upstream side, 0.094 for sites with a helix only on the downstream side, and 0.13 for sites with helices on both sides.

The above results hold when controlling for the total length of the substructure: we compared E1 and E0 sites for helices of a given length, and for loops of a given size. The resulting trend was the same: for E1 sites in helices cePos is larger than for E0 sites, whereas in interior loops the relation is reversed. Other location variables tested, such as the position relative to the middle of the substructure, or relative to the 5' end, did not show a noticeable effect.

Sequence Analysis

We start with the nucleotide opposite of the editing site. For helices, it is clear what this means: the "opposite" nucleotide of a site is the nucleotide that pairs with that site (and is therefore always T). We expand this idea, however, to sites at the edges of interior loops (i.e., having cePos = 0): for these sites on the most 5' (3') nucleotide of the loop-strand, the opposite nucleotide is the most 3' (5') site of the other strand in the loop. If the site is the only one on its strand, and the opposite strand has more than one nucleotide, the opposite nucleotide is undefined. We shall refer to the opposite nucleotide as opNuc for short.
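The opNuc rule for interior-loop edges can be sketched as follows. This is an illustration only: it assumes each loop strand is available as a string of its unpaired nucleotides written 5' to 3', and the function name and arguments are hypothetical.

```python
def op_nuc(strand, opposite, index):
    """Opposite nucleotide for position `index` of an interior-loop strand.

    `strand` and `opposite` are the unpaired nucleotides of the two loop
    strands, each written 5'->3'. Returns None where opNuc is undefined,
    including sites that are not at the loop edge (cePos > 0).
    """
    if not strand or not opposite:
        return None
    if len(strand) == 1:
        # Only site on its strand: defined only when the other strand
        # also has exactly one nucleotide (a single mismatch)
        return opposite[0] if len(opposite) == 1 else None
    if index == 0:                 # most 5' nucleotide of this strand
        return opposite[-1]        # faces the most 3' of the other strand
    if index == len(strand) - 1:   # most 3' nucleotide of this strand
        return opposite[0]         # faces the most 5' of the other strand
    return None

print(op_nuc("A", "C", 0))      # C
print(op_nuc("AGA", "CTT", 0))  # T
print(op_nuc("A", "CT", 0))     # None
```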

There is a very strong enrichment for sites with C on the opposite site: we looked at the frequency of E1 for sites with a given opNuc, and obtained a frequency of 18.5% for C, whereas for A the frequency was 5.1% and for G, 3.7%. This is consistent with (but more pronounced than) the data presented in [19-22,37,44].

Next we look at the statistics of the nucleotides upstream and downstream of the A-to-I editing sites. In order to avoid biases due to the underlying nucleotide statistics in Alu repeats we do not look at the raw distribution of nucleotides but rather at the enrichment factor, i.e. how much the editing frequency is increased (compared to the average within the respective substructure) when the neighboring site is any specific nucleotide. The enrichment factors are presented in figs. 7, 8 and 9 for the two immediate neighbor nucleotides separately, as well as for the joint variable composed of both upstream and downstream neighbors. Overall, the profiles found are similar to those seen in previous large-scale studies of editing [19-22,24,45]: T is most preferred upstream and is not preferred downstream, while G is most preferred downstream and least preferred upstream (in both helices and loops). However, we do find a significant (p < 1.1e-16 for all comparisons) difference between the profiles for helices and loops. For example, the preference for an upstream T is stronger in helices, whereas the preference for a downstream G is stronger in interior loops.
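A minimal sketch of one way to compute such an enrichment factor is given here. It assumes the factor is the ratio between the editing frequency for a given neighbor and the average frequency in the same substructure type; the text does not state whether a ratio or a difference was used, so the ratio is an assumption, and the input format and toy data are invented for illustration.

```python
from collections import Counter

def enrichment_factors(sites):
    """`sites` is a list of (neighbor_nucleotide, is_edited) pairs, all taken
    from the same substructure type. Returns {nucleotide: factor}, where
    factor = editing frequency given that neighbor / overall frequency.
    Assumes at least one edited site, otherwise the ratio is undefined."""
    total = Counter()
    edited = Counter()
    for nuc, is_edited in sites:
        total[nuc] += 1
        edited[nuc] += bool(is_edited)
    overall = sum(edited.values()) / sum(total.values())
    return {nuc: (edited[nuc] / total[nuc]) / overall for nuc in total}

# Toy data (invented): 2 of 4 sites with upstream T are edited, 0 of 4 with G
toy = [("T", 1), ("T", 1), ("T", 0), ("T", 0),
       ("G", 0), ("G", 0), ("G", 0), ("G", 0)]
print(enrichment_factors(toy))  # {'T': 2.0, 'G': 0.0}
```

A factor above 1 means the neighbor is associated with more editing than the substructure average, and a factor below 1 means less.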

Figure 7. Enrichment factors for upstream nucleotide in helices and interior loops.

Figure 8. Enrichment factors for downstream nucleotide in helices and interior loops.

Figure 9. Enrichment factors for joint upstream, downstream nucleotides in helices and interior loops.

We also calculated the enrichment factors for the joint variable composed of the site's upstream neighbor, downstream neighbor, and opNuc. The results are displayed in Table 2.

Table 2. Frequency of E1 and enrichment factors for the joint distribution of upstream neighbor, downstream neighbor and opNuc.

In addition, we searched for enrichment in the extended neighborhood of the editing sites, looking at 30 neighboring nucleotides at both sides of the site (upN refers to the nucleotide N sites upstream to the editing site, and dnN refers to the nucleotide N sites downstream to the editing site). Almost all neighbors show a significantly different nucleotide distribution around edited sites, see Tables 3 (helices) and 4 (interior loops). The most significant differences (largest χ2 scores) are observed for neighbors up1, up2, up7 and dn18 in helices and up1, dn1, up2 and up3 in interior loops. We note that while almost all 60 neighbors tested show statistically significant difference, it is hard to tell whether these differences are due to ADAR preferences or rather stem from editing hot spots within the Alu. We also present the enrichment factors for seven positions surrounding the editing sites which were reported to show preferences to specific nucleotides when surrounding ADAR2 editing sites [41]. As seen in Table 5, the patterns observed here for Alu editing are somewhat different: for example, locations dn10 and dn13 seem to favor G in contrast to the opposite trend reported in [41] for ADAR2 sites. The differences might be due to the much larger sample we study here. Additionally, it is also possible that editing sites in the coding region, mostly having a functional role, have different characteristics than the ones in Alu repeats. However, these differences could also result from differences between the profiles of ADAR1 and ADAR2. While the sample of editing sites studied in [41] is biased towards ADAR2 targets, the sample studied here, coming from a wide range of tissues, represents a different mix of the two enzymes, with larger weight of ADAR1. Moreover, the different splice-variants of the ADARs possibly have varying editing efficiencies and site preferences. The mix of these variants occurring in our in-vivo sample could also lead to slight variations in the preferences observed as compared to results of in-vitro studies.

Table 3. Comparison of nucleotide distribution for sites in the vicinity of E1 and E0 sites in helices.

Table 4. Comparison of nucleotide distribution for sites in the vicinity of E1 and E0 sites in interior loops.

Table 5. Nucleotide enrichment for several locations neighboring an editing site.

Thermodynamic Calculations

Finally, we study the effect of thermodynamic stability on editing efficiency. For each genomic neighborhood, we look at the thermodynamic average over all the low free-energy structures. The laws of statistical mechanics give us the probability that the RNA is in a specific secondary structure n,

$$P_n = \frac{e^{-G_n / k_B T}}{Z} \qquad (1)$$

where T is the temperature in degrees Kelvin, kB is Boltzmann's constant and Z is defined by the sum

$$Z = \sum_n e^{-G_n / k_B T} \qquad (2)$$

where the label n runs over all available foldings, and Gn are the respective free energies. In practice, we only use those folds generated by MFOLD which are expected to be all folds relevant at human-body temperature. We may now, for example, calculate the probability of some particular site i to be in a helix,

$$P_{\mathrm{helix}}(i) = \sum_n P_n \, \chi_n^{\mathrm{helix}}(i) \qquad (3)$$

where $\chi_n^{\mathrm{helix}}(i)$ is the indicator function, defined by

$$\chi_n^{\mathrm{helix}}(i) = \begin{cases} 1 & \text{if site } i \text{ is in a helix in folding } n \\ 0 & \text{otherwise} \end{cases} \qquad (4)$$

Similarly, one may calculate the probabilities for all other substructures.
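The calculation in equations (1)-(3) can be sketched in a few lines. This is an illustrative sketch, not the authors' code: it assumes free energies are given in kcal/mol, uses the molar form of Boltzmann's constant, takes 310.15 K as body temperature, and the three energies and substructure flags are invented.

```python
import math

K_B = 0.0019872  # Boltzmann constant per mole, in kcal/(mol*K)

def fold_probabilities(free_energies, temperature=310.15):
    """Boltzmann probability of each folding from its free energy (kcal/mol)."""
    g_min = min(free_energies)  # shifting all energies leaves the ratios unchanged
    weights = [math.exp(-(g - g_min) / (K_B * temperature))
               for g in free_energies]
    z = sum(weights)            # partition sum over the supplied foldings
    return [w / z for w in weights]

def substructure_probability(probs, in_substructure):
    """Sum of fold probabilities over the folds in which the site is in the
    substructure; `in_substructure` holds one boolean per folding."""
    return sum(p for p, flag in zip(probs, in_substructure) if flag)

# Hypothetical site: in a helix in the two lowest-energy folds, in a loop in the third
energies = [-120.0, -119.5, -118.0]
probs = fold_probabilities(energies)
p_helix = substructure_probability(probs, [True, True, False])
print(round(p_helix, 3))
```

Subtracting the minimum free energy before exponentiating avoids overflow for the large negative free energies typical of long duplexes, and it cancels out when the weights are normalized.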

Let S denote the set of possible substructure types (see fig. 1). We define a site's structural entropy to be

$$H(i) = -\sum_{s \in S} P_s(i) \ln P_s(i)$$

where $P_s(i)$ is the frequency of site i being in a substructure of type s. This entropy is a measure of the thermodynamic volatility of the site's substructure: if a site is always in the same substructure (e.g. the site is always in a helix), it will have zero structural entropy. If, however, the site's substructure fluctuates, for example between a helical structure and a loop structure, it will have higher structural entropy. The structural entropy of a site with equal probability to be in two different substructures is ln(2) ≈ 0.69. The highest possible structural entropy is that of a site which spends equal time in each of the possible substructures. Figs. 10 and 11 show the frequency of E1 as a function of the structural entropy, for sites whose lowest free-energy structure is helix or interior-loop separately. Interestingly, A-to-I editing is enriched for sites with low structural entropy, i.e. having a well-defined low energy micro-structure. A wobbling state, fluctuating between two or more possible structures, is less well edited. This holds regardless of the ground-state structure, but the effect is stronger for interior loops: sites with a well-defined interior-loop structure are twice as likely to be edited compared with sites whose ground state structure is also an interior loop but having even 1% probability to be in other structures.
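As a small worked example of the structural entropy, the sketch below evaluates it for a few probability vectors. It assumes the usual convention that a substructure with zero probability contributes nothing to the sum; the function name is illustrative.

```python
import math

def structural_entropy(probabilities):
    """Entropy (natural log) of a site's substructure probabilities.
    Zero-probability substructures are skipped."""
    return sum(-p * math.log(p) for p in probabilities if p > 0)

print(structural_entropy([1.0]))                   # site always in one substructure
print(round(structural_entropy([0.5, 0.5]), 3))    # 0.693
print(round(structural_entropy([0.99, 0.01]), 3))  # 0.056
```

The last line corresponds to the case mentioned above of a site with a 1% probability of being outside its ground-state substructure.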

Figure 10. E1 frequency vs. structural entropy for ground state helix.

Figure 11. E1 frequency vs. structural entropy for ground state interior-loop.

Analysis of a large dataset of secondary structures of putatively edited Alu repeats reveals that structural motifs are indeed important in determining the A-to-I editing efficiency of a given site. Most notably, we highlight the strong preference for editing of adenosines within short symmetrical internal loops. Moreover, the microstructure also has a modest but noticeable effect on the cis-preferences of the ADARs. Long perfect dsRNA duplexes are often considered to be the best target for editing by ADARs. Here we find that sites adjacent to the edge of helices (cePos = 0) are even more efficiently edited. Averaged over our whole database, adenosines deep within (cePos > 10) long (> 30 bp) perfect helices are indeed edited more efficiently than the average adenosine in a helix: we find 1625 such sites, with editing frequency 8.2%, compared to only 4.4% for the average helix site. However, this is still lower than the average frequency for interior loops, 9.1%. Moreover, focusing on single A-C mismatches within a helix (i.e. cePos = 0 sites having neighboring helices on both sides and C as the opposite nucleotide) raises the frequency to 19%. Finally, choosing also the optimal neighbors, i.e. T upstream and G downstream, raises the editing frequency as observed in our database to 37%! We stress again that these frequencies should not be regarded as the true editing efficiency, but rather as a relative measure. Yet, one is able to conclude that the best way to engineer an efficient editing site is not to put it in a long perfect duplex, but rather to place it in a single mismatch within a duplex.

Interestingly, the 100% edited E1 site in the NARF gene [25] fits nicely with these engineering rules: it is a cePos = 0 site in a symmetric loop, with C opposite to it and T and G in the upstream and downstream sites, respectively. However, the strand length there is 3 and not the optimal 1. An editing site that fully adheres to the above "rules" is the amber/W one of the hepatitis delta virus antigenome (genotype I) [46]. This site is critical for the virus to assemble viral particles and to be infectious [47]. Given the high adaptivity of viruses, it is not surprising to find that this site fits with all of the above: it is located in a single A-C mismatch within a helix (cePos = 0 and loop length = 2), and has T and G as its immediate neighbors.

However, the Q/R site in GluRB does not fit our observations. It lies within a rather long (17 bp) helix, with cePos = 5, with C (rather than the optimal T) upstream and G downstream [48]. Yet, this site is also 100% edited. Apparently, there is still much more to learn about the characteristics of editing by ADARs, beyond the information presented in the present study. One possible explanation is that this site is known to be edited only by ADAR2 [49]. The two editing enzymes ADAR1 and ADAR2 are known to have overlapping, but distinct, preferences [36-38,50,51]. However, our approach does not allow us to distinguish between them. It was recently shown that editing of mouse B1 and B2 SINEs is mediated by both enzymes [39]. Some sites within these repeats are ADAR1 specific, some are ADAR2 specific and some are edited by both. It is not yet clear which enzyme is the main one in terms of Alu editing in human. Since our database is based on mRNA sequences from a wide range of tissues, it is possible that it characterizes mainly the profile of the widely-expressed ADAR1 rather than that of ADAR2, which is expressed mainly in the brain. It is thus likely that some of the preferences identified in this work characterize ADAR1 and are therefore not present in the GluRB ADAR2-specific site. The discrepancies between nucleotide distributions around the editing sites reported above and those reported by [41] for ADAR2 sites might also attest to differences between the ADAR2 profile and the one characterizing our dataset, probably a mix of the two enzymes, with larger weight of ADAR1.

In an attempt to estimate the differences between the two enzymes, we compared 4657 editing sites supported by 13805 brain mRNAs, where both ADAR1 and ADAR2 are present, and 1684 sites residing in 10186 non-brain mRNAs, presumably edited mostly by ADAR1 (tissue-origin was determined by UCSC annotation [52]). The patterns observed were similar but not identical. For example, 1376 of the 2966 brain sites residing in a helix (46.4%) had a G in the dn1 position, compared to 452 out of 1076 (42.0%) in non-brain sites in a helix (p-value 0.013). However, differences were not statistically significant upon Bonferroni-correcting for multiple testing. Thus, a larger and better dataset (fully characterized in terms of the tissues studied) is required in order to study the small tissue differences between the preferences of the two ADAR enzymes.
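The brain versus non-brain comparison of the dn1 position can be reproduced approximately with a short sketch. The text does not say which statistical test produced the quoted p-value; a pooled two-proportion z-test is assumed here purely for illustration, and with the quoted counts it should land close to the quoted value.

```python
import math

def two_proportion_p_value(k1, n1, k2, n2):
    """Two-sided p-value of a pooled two-proportion z-test."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of a standard normal variable
    return math.erfc(abs(z) / math.sqrt(2))

# G at dn1 among helix sites: brain 1376/2966, non-brain 452/1076
p = two_proportion_p_value(1376, 2966, 452, 1076)
print(f"{1376 / 2966:.1%} vs {452 / 1076:.1%}, p = {p:.3f}")
```

A Bonferroni correction would compare this p-value with the significance level divided by the number of comparisons made; the text does not give that number.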

Conclusions

Using a dataset of 29,971 editing sites within Alu repeats, we analyzed the editing preferences. We found that the micro-structure a site resides in affects its editing frequency. In addition, the sequence motifs characterizing editing sites vary with the micro-structure. We have also shown that structural entropy and thermodynamic stability play a role in determining editing efficiency. We find that the probability of a nucleotide fluctuating between a number of possible structures to be edited is lower than the weighted average of the probabilities for each possible structure alone. This provides a hint as to the rate of the A-to-I editing process compared to the relaxation time scales controlling the transition between the possible folds.

Taken together, the results presented here could be of help in revealing the complex nature of the ADARs editing profile.

Methods

We construct a list of putative editing sites within Alu repeats following the method presented in [23,24]. That is, we use mismatches in the relatively clean RNA sequences, rather than the much larger but noisier EST data. We use UCSC alignments of human RNA sequences to the genome (http://genome.ucsc.edu) [52] and record all mismatches in these alignments. Then, known SNPs are removed, and the list is intersected with Alu locations, to obtain a set of mismatches within Alu repeats. A-to-I editing sites in Alu repeats tend to occur in clusters; we thus take only clusters of three or more consecutive identical mismatches. While this process is not inherently biased towards any specific type of mismatch, 98% of the mismatches found are A-to-G, suggesting that although these sites are typically supported by a single mismatch, they do represent true A-to-I editing sites with a low level of false-positives.
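The cluster filter can be sketched as follows. This is one possible reading of "three or more consecutive identical mismatches", namely runs of mismatches of the same type that are adjacent in positional order within one RNA alignment; the exact criterion of [23,24] may differ, and the input format and toy data are invented.

```python
from itertools import groupby

def mismatch_clusters(mismatches, min_size=3):
    """`mismatches` is a list of (position, kind) tuples for one RNA
    alignment, e.g. (1042, "A>G"). Returns the runs of at least `min_size`
    mismatches that are consecutive in positional order and share a kind."""
    ordered = sorted(mismatches)
    clusters = []
    for _kind, run in groupby(ordered, key=lambda m: m[1]):
        run = list(run)
        if len(run) >= min_size:
            clusters.append(run)
    return clusters

toy = [(10, "A>G"), (25, "A>G"), (31, "A>G"), (60, "C>T"), (72, "A>G")]
print(mismatch_clusters(toy))
# [[(10, 'A>G'), (25, 'A>G'), (31, 'A>G')]]
```

In the toy input the last A>G mismatch is dropped because the intervening C>T mismatch breaks the run.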

We then construct the predicted secondary structures using the following procedure: (a) for each site in our list, its Alu was located in the genome. Then, the nearest antisense Alu was located, and the genomic neighborhood that includes all nucleotides in and between the two Alus was taken, along with 200 extra nucleotides on each side. 61% of the inter-Alu distances are less than 1000 nucleotides, a further 22% lie between 1000 and 2000, 9% are between 2000 and 3000 and the remaining 8% stretch from 3000 to 6800 nucleotides.

(b) Neighborhoods having > 400 bp overlap on the same strand were merged into a single neighborhood encompassing both. This step resulted in 3,276 neighborhoods, containing 29,971 putative editing sites. RNA segments corresponding to these neighborhoods were folded using MFOLD, resulting in predicted secondary structures.
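The merging in step (b) can be sketched as an interval sweep. The sketch assumes all input neighborhoods lie on the same chromosome and strand and are given as half-open (start, end) coordinates; it also assumes that a newly merged neighborhood is compared with the next one in sorted order, which the text does not specify.

```python
def merge_neighborhoods(intervals, min_overlap=400):
    """Merge (start, end) intervals whose overlap exceeds `min_overlap` bp.
    Intervals are half-open and assumed to share chromosome and strand."""
    merged = []
    for start, end in sorted(intervals):
        if merged and min(merged[-1][1], end) - start > min_overlap:
            # Overlap is large enough: extend the previous neighborhood
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]

print(merge_neighborhoods([(0, 1000), (500, 1800), (1700, 2500)]))
# [(0, 1800), (1700, 2500)]
```

In the example the first two neighborhoods overlap by 500 bp and are merged, while the third overlaps the merged one by only 100 bp and is kept separate.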

The accuracy of RNA secondary structure prediction by current dynamic programming algorithms (such as the MFOLD software) is moderate, up to 73% [53]. Yet, while false structure predictions would inevitably introduce noise to our analyzed data, the large sample size should allow for detecting a signal. Moreover, one should bear in mind that the RNA structures we consider - long and almost perfect dsRNAs - are relatively easy to analyze, and thus the accuracy of the folding algorithms is expected to be much higher than the above quoted rate.

(c) We parsed the output of MFOLD into a relational database containing all the information about the secondary structures in which the various sites reside (the basic substructures are given in fig. 1).

(d) All adenosines in the genomic neighborhoods' sites were classified: We find 29,971 putative editing sites within Alu repeats, denoted by E1 and 590,206 adenosines within Alu repeats that were not detected as editing sites, denoted by E0. The adenosines which are not in Alu repeats do not enter our analyses. In the following, we use the E0 sites as a control set. It should be stressed that the sensitivity of the bioinformatic algorithms for detecting editing sites is rather low, mainly due to the low coverage, low editing efficiency of most sites and tissue origin of the available sequences. For example, the observed editing efficiency averaged over all Alus, which is 0.048, is probably lower than the actual value. Therefore, the set of E0 sites should not be thought of as sites that are never edited, but rather as a background, maybe slightly depleted in editing sites. On the other hand, the set of E1 contains, with high precision, only edited sites [23].
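The overall editing frequency quoted above follows directly from the sizes of the E1 and E0 sets. The sketch below also attaches a 95% confidence interval; the normal approximation used for the interval is an assumption, since the text does not say how the error bars in the figures were computed.

```python
import math

def editing_frequency(n_edited, n_unedited, z=1.96):
    """Fraction of adenosines that are in E1, with a normal-approximation
    95% confidence interval (lower, upper)."""
    n = n_edited + n_unedited
    f = n_edited / n
    half_width = z * math.sqrt(f * (1 - f) / n)
    return f, (f - half_width, f + half_width)

# E1 = 29,971 putative editing sites, E0 = 590,206 control adenosines
freq, ci = editing_frequency(29971, 590206)
print(round(freq, 3))  # 0.048
```

The same function can be applied to the E1 and E0 counts within any bin (for example a given helix length or cePos value) to obtain the per-bin frequencies plotted in the figures.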

Authors' contributions

YK carried out the RNA folding computations, prepared the foldings database and performed the statistical analysis. EE designed the study and prepared the editing sites database. Both authors read and approved the final manuscript.

Acknowledgements

We thank E.Y. Levanon for helpful comments and a critical reading of the manuscript. This work was supported by The Israel Science Foundation [grant number 365/06] and the Israel Ministry for Science and Technology (Scientific Infrastructure Program).

References

  1. Bass B: RNA editing by adenosine deaminases that act on RNA.

    Annu Rev Biochem 2002, 71:817-846. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  2. Keegan L, Gallo A, O'Connell M: The many roles of an RNA editor.

    Nat Rev Genet 2001, 2:869-878. PubMed Abstract | Publisher Full Text OpenURL

  3. Wedekind J: Messenger RNA editing in mammals: new members of the APOBEC family seeking roles in the family business.

    Trends in Genetics 2003, 19(4):207-21. PubMed Abstract | Publisher Full Text OpenURL

  4. Conticello SG: The AID/APOBEC family of nucleic acid mutators.

    Genome biology 2008, 9(6):229. PubMed Abstract | BioMed Central Full Text | PubMed Central Full Text OpenURL

  5. Palladino MJ, Keegan LP, O'Connell MA, Reenan RA: A-to-I pre-mRNA editing in Drosophila is primarily involved in adult nervous system function and integrity.

    Cell 2000, 102:437-449.

  6. Tonkin L, Saccomanno L, Morse DP, Brodigan T, Krause M, Bass BL: RNA editing by ADARs is important for normal behavior in Caenorhabditis elegans.

    EMBO J 2002, 21:6025-6035.

  7. Wang Q, Khillan J, Gadue P, Nishikura K: Requirement of the RNA editing deaminase ADAR1 gene for embryonic erythropoiesis.

    Science 2000, 290:1765-1768.

  8. Hartner JC, Schmittwolf C, Kispert A, Muller AM, Higuchi M, Seeburg PH: Liver Disintegration in the Mouse Embryo Caused by Deficiency in the RNA-editing Enzyme ADAR1.

    J Biol Chem 2004, 279(6):4894-4902.

  9. Higuchi M, Maas S, Single FN, Hartner J, Rozov A, Burnashev N, Feldmeyer D, Sprengel R, Seeburg PH: Point mutation in an AMPA receptor gene rescues lethality in mice deficient in the RNA-editing enzyme ADAR2.

    Nature 2000, 406:78-81.

  10. Maas S, Kawahara Y, Tamburro KM, Nishikura K: A-to-I RNA editing and human disease.

    RNA Biol 2006, 3:1-9.

  11. Kawahara Y, Ito K, Sun H, Aizawa H, Kanazawa I, Kwak S: Glutamate receptors: RNA editing and death of motor neurons.

    Nature 2004, 427(6977):801.

  12. Brusa R, Zimmermann F, Koh DS, Feldmeyer D, Gass P, Seeburg PH, Sprengel R: Early-Onset Epilepsy and Postnatal Lethality Associated with an Editing-Deficient GluR-B Allele in Mice.

    Science 1995, 270(5242):1677-1680.

  13. Gurevich I, Tamir H, Arango V, Dwork AJ, Schmauss C: Altered Editing of Serotonin 2C Receptor Pre-mRNA in the Prefrontal Cortex of Depressed Suicide Victims.

    Neuron 2002, 34(3):349-356.

  14. Niswender CM, Herrick-Davis K, Dilley GE, Meltzer HY, Overholser JC, Stockmeier CA, Emeson RB, Sanders-Bush E: RNA editing of the human serotonin 5-HT2C receptor: Alterations in suicide and implications for serotonergic pharmacotherapy.

    Neuropsychopharmacology 2001, 24:478-491.

  15. Iwamoto K: RNA editing of serotonin 2C receptor in human postmortem brains of major mental disorders.

    Neurosci Lett 2003, 346(3):169-172.

  16. Maas S, Patt S, Schrey M, Rich A: Underediting of glutamate receptor GluR-B mRNA in malignant gliomas.

    Proc Natl Acad Sci USA 2001, 98(25):14687-14692.

  17. Paz N, Levanon EY, Amariglio N, Heimberger AB, Ram Z, Constantini S, Barbash ZS, Adamsky K, Safran M, Hirschberg A, Krupsky M, Ben-Dov I, Cazacu S, Mikkelsen T, Brodie C, Eisenberg E, Rechavi G: Altered adenosine-to-inosine RNA editing in human cancer.

    Genome Res 2007, 17(11):1586-1595.

  18. Morse DP, Bass BL: Long RNA hairpins that contain inosine are present in Caenorhabditis elegans poly(A)+ RNA.

    Proc Natl Acad Sci USA 1999, 96:6048-6053.

  19. Levanon EY, Eisenberg E, Yelin R, Nemzer S, Hallegger M, Shemesh R, Fligelman ZY, Shoshan A, Pollock SR, Sztybel D, et al.: Systematic identification of abundant A-to-I editing sites in the human transcriptome.

    Nat Biotechnol 2004, 22:1001-1005.

  20. Athanasiadis A, Rich A, Maas S: Widespread A-to-I RNA Editing of Alu-Containing mRNAs in the Human Transcriptome.

    PLoS Biol 2004, 2:2144-2158.

  21. Kim DD, Kim TT, Walsh T, Kobayashi Y, Matise TC, Buyske S, Gabriel A: Widespread RNA editing of embedded alu elements in the human transcriptome.

    Genome Res 2004, 14:1719-1725.

  22. Blow M, Futreal PA, Wooster R, Stratton MR: A survey of RNA editing in human brain.

    Genome Res 2004, 14(12):2379-2387.

  23. Eisenberg E, Nemzer S, Kinar Y, Sorek R, Rechavi G, Levanon EY: Is abundant A-to-I RNA editing primate-specific?

    Trends Genet 2005, 21(2):77-81.

  24. Neeman Y, Levanon EY, Jantsch MF, Eisenberg E: RNA editing level in the mouse is determined by the genomic repeat repertoire.

    RNA 2006, 12(10):1802-1809.

  25. Lev-Maor G, Sorek R, Levanon EY, Paz N, Eisenberg E, Ast G: RNA-editing-mediated exon evolution.

    Genome Biol 2007, 8(2):R29.

  26. Scadden AD, Smith CW: Specific cleavage of hyper-edited dsRNAs.

    EMBO J 2001, 20:4243-4252.

  27. Scadden AD: The RISC subunit Tudor-SN binds to hyper-edited double-stranded RNA and promotes its cleavage.

    Nat Struct Mol Biol 2005, 12(6):489-496.

  28. Scadden AD, O'Connell MA: Cleavage of dsRNAs hyper-edited by ADARs occurs at preferred editing sites.

    Nucleic Acids Res 2005, 33(18):5954-5964.

  29. Osenberg S, Dominissini D, Rechavi G, Eisenberg E: Widespread cleavage of A-to-I hyperediting substrates.

    RNA 2009, 15(9):1632-1639.

  30. Zhang Z, Carmichael GG: The fate of dsRNA in the nucleus: a p54(nrb) containing complex mediates the nuclear retention of promiscuously A-to-I edited RNAs.

    Cell 2001, 106:465-475.

  31. Prasanth KV, Prasanth SG, Xuan Z, Hearn S, Freier SM, Bennett CF, Zhang MQ, Spector DL: Regulating Gene Expression through RNA Nuclear Retention.

    Cell 2005, 123(2):249-263.

  32. Hundley HA, Krauchuk AA, Bass BL: C. elegans and H. sapiens mRNAs with edited 3' UTRs are present on polysomes.

    RNA 2008, 14(10):2050-2060.

  33. Chen LL, DeCerbo JN, Carmichael GG: Alu element-mediated gene silencing.

    EMBO J 2008, 27(12):1694-1705.

  34. Seeburg P, Higuchi M, Sprengel R: RNA editing of brain glutamate receptor channels: mechanism and physiology.

    Brain Res Brain Res Rev 1998, 26(2-3):217-229.

  35. Gott JM, Emeson RB: Functions and mechanisms of RNA editing.

    Annu Rev Genet 2000, 34:499-531.

  36. Melcher T, Maas S, Herb A, Sprengel R, Seeburg PH, Higuchi M: A mammalian RNA editing enzyme.

    Nature 1996, 379:460-464.

  37. Polson AG, Bass BL: Preferential selection of adenosines for modification by double-stranded RNA adenosine deaminase.

    EMBO J 1994, 13:5701-5711.

  38. Lehmann KA, Bass BL: Double-stranded RNA adenosine deaminases ADAR1 and ADAR2 have overlapping specificities.

    Biochemistry 2000, 39:12875-12884.

  39. Riedmann EM, Schopoff S, Hartner JC, Jantsch MF: Specificity of ADAR-mediated RNA editing in newly identified targets.

    RNA 2008, 14(6):1110-1118.

  40. Aruscavage PJ, Bass BL: A phylogenetic analysis reveals an unusual sequence conservation within introns involved in RNA editing.

    RNA 2000, 6(2):257-269.

  41. Dawson TR, Sansam CL, Emeson RB: Structure and Sequence Determinants Required for the RNA Editing of ADAR2 Substrates.

    J Biol Chem 2004, 279(6):4941-4951.

  42. Lehmann KA, Bass BL: The importance of internal loops within RNA substrates of ADAR1.

    J Mol Biol 1999, 291:1-13.

  43. Zuker M, Mathews D, Turner D: Algorithms and Thermodynamics for RNA Secondary Structure Prediction: A Practical Guide. In RNA Biochemistry and Biotechnology, NATO ASI Series. Edited by Barciszewski J, Clark B. Kluwer Academic Publishers; 1999.

  44. Wong SK, Sato S, Lazinski DW: Substrate recognition by ADAR1 and ADAR2.

    RNA 2001, 7:846-858.

  45. Li JB, Levanon EY, Yoon JK, Aach J, Xie B, Leproust E, Zhang K, Gao Y, Church GM: Genome-Wide Identification of Human RNA Editing Sites by Parallel DNA Capturing and Sequencing.

    Science 2009, 324(5931):1210-1213.

  46. Casey JL, Gerin JL: Hepatitis D virus RNA editing: specific modification of adenosine in the antigenomic RNA.

    J Virol 1995, 69(12):7593-7600.

  47. Taylor JM: Replication of human hepatitis delta virus: influence of studies on subviral plant pathogens.

    Adv Virus Res 1999, 54:45-60.

  48. Maas S, Melcher T, Herb A, Seeburg PH, Keller W, Krause S, Higuchi M, O'Connell MA: Structural Requirements for RNA Editing in Glutamate Receptor Pre-mRNAs by Recombinant Double-stranded RNA Adenosine Deaminase.

    J Biol Chem 1996, 271(21):12221-12226.

  49. Melcher T, Maas S, Herb A, Sprengel R, Higuchi M, Seeburg PH: RED2, a Brain-specific Member of the RNA-specific Adenosine Deaminase Family.

    J Biol Chem 1996, 271(50):31795-31798.

  50. Yang JH, Sklar P, Axel R, Maniatis T: Purification and characterization of a human RNA adenosine deaminase for glutamate receptor B pre-mRNA editing.

    Proc Natl Acad Sci USA 1997, 94(9):4354-4359.

  51. Burns CM, Chu H, Rueter SM, Hutchinson LK, Canton H, Sanders-Bush E, Emeson RB: Regulation of serotonin-2C receptor G-protein coupling by RNA editing.

    Nature 1997, 387(6630):303-308.

  52. Karolchik D, Baertsch R, Diekhans M, Furey TS, Hinrichs A, Lu YT, Roskin KM, Schwartz M, Sugnet CW, Thomas DJ, Weber RJ, Haussler D, Kent WJ, University of California Santa Cruz: The UCSC Genome Browser Database.

    Nucleic Acids Res 2003, 31:51-54.

  53. Mathews DH, Disney MD, Childs JL, Schroeder SJ, Zuker M, Turner DH: Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure.

    Proc Natl Acad Sci USA 2004, 101(19):7287-7292.