MicroRNAs (miRNAs) play a critical role in post-transcriptional gene regulation and have been shown to control many genes involved in various biological and metabolic processes. There have been extensive studies to discover miRNAs and analyze their functions in model plant species, such as Arabidopsis and rice. Deep sequencing technologies have facilitated identification of species-specific or lowly expressed as well as conserved or highly expressed miRNAs in plants.
In this research, we used Solexa sequencing to discover new microRNAs in trifoliate orange (Citrus trifoliata) which is an important rootstock of citrus. A total of 13,106,753 reads representing 4,876,395 distinct sequences were obtained from a short RNA library generated from small RNA extracted from C. trifoliata flower and fruit tissues. Based on sequence similarity and hairpin structure prediction, we found that 156,639 reads representing 63 sequences from 42 highly conserved miRNA families, have perfect matches to known miRNAs. We also identified 10 novel miRNA candidates whose precursors were all potentially generated from citrus ESTs. In addition, five miRNA* sequences were also sequenced. These sequences had not been earlier described in other plant species and accumulation of the 10 novel miRNAs were confirmed by qRT-PCR analysis. Potential target genes were predicted for most conserved and novel miRNAs. Moreover, four target genes including one encoding IRX12 copper ion binding/oxidoreductase and three genes encoding NB-LRR disease resistance protein have been experimentally verified by detection of the miRNA-mediated mRNA cleavage in C. trifoliata.
Deep sequencing of short RNAs from C. trifoliata flowers and fruits identified 10 new potential miRNAs and 42 highly conserved miRNA families, indicating that specific miRNAs exist in C. trifoliata. These results show that regulatory miRNAs exist in agronomically important trifoliate orange and may play an important role in citrus growth, development, and response to disease.
MicroRNAs (miRNAs) are 19-23 nucleotide long non-coding RNAs that regulate gene expression at the post-transcriptional level, either by endonucleolytic cleavage or by translational inhibition [1-3]. Increasing evidence indicates that miRNAs play major roles in key aspects of plant development and their response to environmental stresses [4-7]. The fact that a large number of the known miRNAs in the plant kingdom from mosses and ferns to higher flowering plants are evolutionarily conserved has been used as a practical indicator for identification or prediction of miRNAs by homology searches in other species [8,9]. miRNAs predicted by bioinformatics should be validated for their expression by experimental methods. Northern blotting and PCR based amplification of adaptor-ligated cDNA have been used for validation of predicted miRNAs . Northern blotting might not be sensitive enough to detect less-abundant miRNAs and it does not reveal the actual miRNA sequences while PCR-based amplification can be difficult in practice when the actual mature miRNA region is unknown. Recently developed next-generation high throughput sequencing technologies provide a powerful strategy to identify as well as quantify miRNAs. These technologies open up possibilities of exploring sRNA populations in economically important species that lack adequate genome information, such as trifoliate orange (Citrus trifoliata).
Increasing evidence shows that the miRNA repertoire of any plant or animal species comprises of a set of conserved ancient miRNAs as well as many recently evolved species-specific miRNAs [11-13]. Since the species-specific miRNAs often accumulate at lower levels than conserved ancient miRNAs, it is sometimes difficult to assess them when derived from traditional sequencing approaches such as the Sanger sequencing method which has been widely used in model plant species with known genome sequences e.g Arabidopsis, poplar, and rice . The availability of next generation sequencing technologies provide high throughput tools to make new discoveries of additional species-specific or lowly expressed miRNAs, e.g. in Arabidopsis [12,15], rice [16,17], poplar [18,19], Triticum aestivum [20,21], Zea mays [22,23], Medicago tuncatula , Lycopersicon esculentum , Gossypium hirsutum , and Taxus chinensis . With the availability of high throughput sequencing technology, like the 454 Technology, Solexa Platform, and Massively Parallel Sequencing (MPSS), it may be possible to make new discoveries of species specific or lowly expressed miRNAs. However, there still exist some differences between these novel sequencing technologies. It is reported that the longest reads can be obtained using 454 Technology, while the Solexa platform can yield a higher number of reads , and is fit for sequencing of shorter reads (up to 35 bp) . Although MPSS can also give a huge number of reads, it gives even shorter reads i.e only 17 bp  that are shorter than that of miRNAs. Since the miRNAs sequences are only about 21 nt in length, the Solexa platform seems to be the preferable choice for miRNA discovery.
Citrus is one of the most economically important evergreen fruit crop in the world. Citrus trifoliata (representing Poncirus trifoliata in the NCBI taxonomy) has consistently been one of the most important rootstock species used in the citrus industry, and has even been used as a model species for citrus molecular biology and genomic studies. The availability of a large number of expressed sequence tags (ESTs) from C. trifoliata is also an excellent source of experimental material for elucidation of gene expression and regulation. Although miRNAs have been extensively studied in the past five years, limited systematic study of miRNAs has been performed on the citrus genus and especially C. trifoliata. Sequencing of all expressed sRNAs is required for complete identification of conserved miRNAs in C. trifoliata.
Identification of several miRNAs from citrus by computational approaches [9,29,30] and the verification of conserved citrus miRNA by homology to Arabidopsis have been reported . However, the number of predicted C. trifoliata miRNAs still remains quite low. Recent miRNA analysis in Arabidopsis and rice using the deep sequencing approach discovered that the encoding loci of non-conserved miRNAs were more than that of conserved miRNAs [12,16]. Therefore, it was necessary to carry out further research on the miRNAs in C. trifoliata, and deep sequencing as a method was given preference.
To investigate the role of miRNAs during reproductive growth, high throughput sequencing technology (Illumina) was employed to survey sRNA populations from C. trifoliata flower buds, flowers and fruits at different development stages. In this study, a total of 13,106,753 sRNAs representing 4,876,395 unique sRNAs were sequenced. Based on sequence similarity and hairpin structure prediction, we found that 156,639 reads representing 63 sequences from 42 highly conserved miRNA families, have perfect matches to known miRNAs. Our results indicate that a complex and diverse sRNA population exists in C. trifoliata. Based on the identified miRNA and some miRNA* sequences, 29 conversed miRNA (miR156, miR160, miR162, miR164, miR165, miR166, miR167, miR168, miR169, miR170, miR171, miR172, miR319, miR390, miR394, miR396, miR398, miR399, miR403, miR408, miR419, miR530, miR835, miR844, miR950, miR1027, miR1044, miR1426, and miR1446) precursors have been identified from the citrus EST library [9,29-31]. Through deep sequencing we also identified ten species-specific miRNAs whose precursors were all potentially generated from 560,271 citrus ESTs, of which five miRNA* sequences were also sequenced. Furthermore, we studied expression patterns of the 10 novel miRNA candidates by qRT-PCR in different tissues of C. trifoliata. Potential target genes were predicted for most conserved and novel miRNAs, of which four target genes including one IRX12 encoding copper ion binding/oxidoreductase and three genes encoding NB-LRR disease resistance protein have been experimentally verified by detection of the miRNA-mediated mRNA cleavage in C. trifoliata.
C. trifoliata has a complex sRNA population
To identify miRNAs involved in development of the citrus rootstock, C. trifoliata, a separate sRNA library was generated from the earlier mentioned reproductive tissues. The library was sequenced by Solexa (Illumina), yielding a total of 13,106,573 sRNA raw reads with lengths of 18 to 30nt and consisting of 4,876,395 unique sequences (Table 1). After further removal of tRNAs (265,796), rRNAs (1,643,869), snRNAs (67,225), snoRNAs (30, 909), exon RNA (7,136), intron sense (3,947), and repeat region (67,225), a total of 11,091,865 sRNA sequences were obtained. Although some sRNAs were highly abundant and present thousands of times in our dataset, majority of sRNAs were sequenced only a few times. For example, 4,876,395 out of 13, 113, 220 sRNAs were sequenced only once. The results show that (1) the expression of different sRNAs in trifoliate orange varies drastically and (2) survey of sRNA is far from being exhausted in the trifoliate orange. This also suggests that trifoliate orange contains a large and diverse sRNA population. Compared to equivalent studies from other plants (Table 2), the quantity of sRNAs acquired from C. trifoliata is at least 14 times more, which might be due to the difference of depth of the sequencing effort, the genome size of the plants, or the development stages of the samples collected in the different sequencing projects.
Table 1. SRNAs annotation and length distribution
Table 2. High-throughput sequencing of sRNAs in some plants
The size distribution of all sRNAs is as summarized in Figure 1A. The length of the trifoliate orange sRNAs varied from 18 nt to 30 nt, and the majority of them (approximately 92%) were in the range from 19 to 24 nt in length with 21 (13.2%) and 24 (44.1%) nt ones as the two major size classes (Figure 1A). This result was consistent with those of Arabidopsis , Medicago truncatula , and Oryza sativa . In Arabidopsis, the 24 nt sRNAs even accounted to about 60% of its sRNA transcriptome . However, the trifoliate orange sRNA size distribution differed from those of wheat and conifer obtained through 454 high throughput sequencing [20,33] and from Taxus chinensis ones obtained through Solexa sequencing . To further compare the average abundance of sRNAs with different lengths, we measured the ratio of raw and unique sequences. SRNAs varied widely in length, and there is variation in redundancies of them, among which the sRNAs class with 21 and 22 nt showed the highest redundancies (Figure 1B). The average ratio of redundant and unique sequences of sRNAs with different sizes showed no obvious changes. Although the sRNAs annotated as miRNAs in the sizes of 18 nt, 19 nt, 20 nt, and 21 nt all had about 96-102 unique reads, their redundant reads surprisingly varied widely especially for the 21 nt group (Figure 2) which had 140,004 redundant reads and occupied almost 73% of the sRNAs assigned to miRNA therefore indicating that 21 nt long sRNAs are the most outstanding miRNA.
Figure 1. Sequence length distribution of C. trifoliata sRNAs.
Figure 2. Mapping of C. trifoliata sRNAs to other plant miRNAs.
Identifying conserved miRNAs in C. trifoliata
To identify the conserved miRNAs from the C. trifoliata, sRNA sequences identified from C. trifoliata by deep sequencing were compared with the currently known mature plant miRNAs in miRBase . After Blastn searches and further sequence analysis, a total of 63 conserved miRNAs, belonging to 42 miRNA families, were identified in C. trifoliate. 23 miRNA* of these miRNAs were also sequenced (Additional file 1, Figure 3). The most of the identified miRNA families have been shown to be conserved in a variety of plant species using a comparative genomics-based strategy. For example, miR319, miR156/157, miR169, miR165/166, and miR394 have been found in 51, 45, 41, 40, and 40 plant species, respectively . There is only one member identified in the majority of miRNA families, whereas some miRNA families contained many potential members (Figure 3) that need some further validation based on genomic or EST sequences.
Format: DOC Size: 155KB Download file
This file can be viewed with: Microsoft Word Viewer
Figure 3. Numbers of identical miRNA member in each family in C. trifoliata.
It has been demonstrated that high throughput sequencing can provide an alternative way to estimate the expression profiles of miRNA genes [27,35] and allow us to determine the abundance of various miRNA families and even distinguish between different members of a given family of one organism. The highly expressed miRNA will likely have a large number of sequenced clones. Among the 42 miRNA families, the miR172 family had the most reads, accounting for 22.5% of the conserved miRNA reads. Additionally, fifteen miRNA families namely miR156, miR159, miR160, miR162, miR164, miR166, miR167, miR168, miR169, miR171, miR172, miR390, miR394, miR403, and miR1446, were found to have some thousands to tens of thousands of redundancies while four families (miR395, miR396, miR397, miR414, and miR827), had more than one hundred redundancies. The remaining families were infrequently sequenced (less than 100). In this study, we have tried to identify the precursor sequences for the 63 conserved C. trifoliata miRNAs, and only 29 pre-miRNAs and their secondary structures have been identified from the available citrus EST databases [9,29-31].
Identifying novel potential miRNAs in C. trifoliata
Since the whole genomic sequence of C. trifoliata is unavailable, unique sRNA sequences were mapped to EST sequences restored in NCBI using C. trifoliata, as well as the other citrus species. By searching all C. trifoliata sRNAs against ESTs and predicting the secondary structures of a series of sequences surrounding them (Additional file 2), we identified 10 sequences that satisfied the secondary structure criteria as established by Zhang et al.  and shown in Table 3 with all the sequences meeting the new criteria of miRNA annotation . Five of these putative miRNAs i.e. 50% were supported by miRNA*. Based on their near perfect secondary structure and following recent miRNA annotation criteria, these miRNAs were considered novel miRNAs . Although not all were supported by miRNA*s, the novel C. trifoliata-specific miRNAs generally have a significant number of reads in the small library. Similar to conserved miRNAs, 8 of the 10 novel miRNAs begin with a 5' uridine, which is a characteristic feature of miRNAs (Table 3). The expression of the 10 miRNAs was assayed using qRT-PCR analysis and signals were detected for all of them (Figure 4). miRNAs specific to C. trifoliata exhibited different tissue-specific expression patterns. For example, Ctr-miRn1 was found to be specifically expressed in fruit tissue, whereas ctr-miRn2, which targeted NB-LRR gene (see below), was expressed in all the tissues assayed (Figure 4).
Format: DOC Size: 434KB Download file
This file can be viewed with: Microsoft Word Viewer
Table 3. Novel miRNA candidates in C. trifoliata
Figure 4. Expression patterns of novel miRNA predicted from C. trifoliata. qRT-PCR of Low Molecular Weight RNA isolated from tissues at different development stages. Lanes: r, root; s1, young stem; s2, old stem; l1, leaf1 (0.2 cm in diameter); l2, leaf2 (0.5 cm in diameter); l3, leaf3 (1 cm in diameter); fl1, flower bud; fl2, partly open flower; fl3, full open flower; fr1, fruit 1 (about 0.5 cm in diameter); fr2, fruit 2 (about 1.5 cm in diameter); and fr3, fruit 3 (about 2 cm in diameter). Each reaction was repeated three times and the template amount was corrected by 5.8 s rRNAs.
Detection and expression patterns of novel potential miRNAs in C. trifoliata
Preferential expression of novel miRNAs in specific tissues might provide clues about the physiological function of these miRNAs. qRT-PCR is a reliable method for detecting and measuring the expression levels of miRNAs. In this study, we adopted this technique to validate and measure the expression of 10 novel potential miRNAs (ctr-miRn1, ctr-miRn2, ctr-miRn3, ctr-miRn4, ctr-miRn5, ctr-miRn6, ctr-miRn7, ctr-miRn8, ctr-miRn9 and ctr-miRn10). All of these potential miRNAs were identified in C. trifoliata by Solexa sequencing. To aid in determination of C. trifoliata novel miRNA functions, we examined their expression in different organs (Figure 4) by qRT-PCR analysis of LMW-RNA samples from various tissues of trifoliate orange trees. The data obtained can form powerful evidence to support the existence of the novel miRNAs in C. trifoliata. The expression patterns of most novel miRNAs in C. trifoliata appear to be tissue or development stage specific, with all the expression patterns of these miRNAs in citrus being grouped into several situations. The expression patterns of ctr-miRn1 and ctr-miRn8 were similar and revealed to be specifically expressed in C. trifoliata fruit tissue (fruit diameter = 2 cm) as shown in Figure 4A and 4H. However, ctr-miRn2 displayed different expression patterns in various organs and development stages where there was high expression in leaves of 1 cm diameter, weaker expression in leaves of 0.5 cm diameter and no expression in leaves of 0.2 cm diameter. Furthermore, ctr-miRn2 had different expression patterns in stem, flower, and fruits. It seems that ctr-miRn2 becomes strongly expressed as different tissues age. Some of the miRNAs might display species-specific and/or developmental stage-specific expression patterns, as exemplified by ctr-miRn3, ctr-miRn6 and ctr-miRn7 (Figure 4C, F and 4G). Ctr-miRn3 had preferential expression only in open flowers, while ctr-miRn6 had strong expression in leaves of 0.2 cm in diameter and weak expression in developing flower bud. Ctr-miRn7 was expressed only in flowers at different development stages (Figure 4H), while ctr-miRn4 seemed to be expressed strongly in old stems, moderately in leaves of 0.2 cm in diameter and weakly in roots of trifoliate orange (Figure 4D). Ctr-miRn5 was expressed in all tissues tested with the highest expression level in fruits of 2 cm in diameter and relatively strong expression in flower buds (Figure 4E). Ctr-miRn9 was expressed in all tissues tested except in root and young stem, with at highest expression level in flower and relatively strong expression in leaves of 1 cm in diameter (Figure 4I). Ctr-miRn10 was expressed in most of tissues except in leaves of 0.2 cm diameter. Specifically, ctr-miRn10 had strong expression in flowers at different stages and moderate expression in fruits of different sizes (Figure 4J).
In summary, 10 novel potential miRNAs, all identified by Solexa sequencing, were validated by qRT-PCR with some being expressed ubiquitously in all tissues with tissue, species, and/or growth stage specific characteristics reflected at different expression levels.
Prediction of the C. trifoliata miRNA target genes
To better understand the functions of the newly identified species-specific as well as conserved C. trifoliata miRNAs, putative targets of these miRNAs were predicted using the described criteria and methods. We predicted target genes, and putative targets were identified for 24 out of 42 conserved families (Additional file 1). We also found homologs of known miRNA target genes for several conserved C. trifoliata miRNAs, such as SBP for miR156, ATP synthase for miR159, ARF for miR160, NAC for miR164, HD-Zip for miR165 and miR166, Anthocyanidin synthase for miR169, GRAS for miR171, AP2 for miR172, TCP for miR319, TIR for miR393, F-box for miR394, Sulfate transporter 2.1 for miR395, IRX12 copper ion binding/oxidoreductase for miR397, ARGONAUTE 2 for miR403, Basic blue copper protein for miR408 and Zinc finger protein-related for miR414. Additionally, we predicted a few genes with unknown function and hypothetical genes for miRNA targeting (Additional file 1). Careful analysis of these potential targets will contribute to our understanding of the role of miRNAs in fruit trees. No targets were found in C. trifoliata for miR398, miR399, miR419, miR530, miR535, miR827, miR828, miR35, miR844, miR861, miR902, miR950, miR1027, miR1044, miR1310, miR1426, miR1436, and miR1440.
Using the criteria used by Song et al. , targets were predicted for 3 of these novel miRNAs. It has been confirmed that miR172 targets the mRNA coding for APETALA2-like transcription factors, an important gene known for controlling flower development [36-38]. The important targets included NB-LRR disease resistance gene analogs, such as UC46-24238 and UC46-17900 that were found to be similar to Populus trichocarpa XM_002328188 cc-nbs-lrr resistance protein and UC46-20080 that was similar to P. trichocarpa XM_002301537 cc-nbs-lrr resistance protein.
Identification of miRNA-guided cleavage of target mRNAs in C. trifoliata
Most Arabidopsis miRNAs have been shown to guide cleavage of their target genes [1,39]. To verify the nature of the potential miRNA targets and to study how the miRNAs in C. trifoliata regulate their target genes, a modified RLM-RACE experiment was set up, as described in the materials and methods section. In this study, the RLM-RACE procedure was successfully used to map the cleavage sites in four predicted target genes of C. trifoliata. Given the clear tissue-specific pattern of expression of ctr-miR397, ctr-miRn1, ctr-miRn2 and ctr-miRn3 in all types of vegetative organs of trifoliate orange (Figure 4), these analyses were performed on a few of their putative targets, using RNA extracted from leaves, stems, roots, flowers, and fruits of trifoliate orange, where ctr-miR397, ctr-miRn2, ctr-miRn3 and ctr-miRn4 were all abundantly expressed. UC46-13453, UC46-20080, UC46-17900, and UC46-24238 were confirmed as the real targets of ctr-miR397, ctr-miRn2, ctr-miRn3 and ctr-miRn4 respectively, since all the 5'ends of the mRNA fragments were mapped to the nucleotide that pairs to the tenth nucleotide of each miRNA with higher frequencies than depicted for each pairing oligo (Figure 5). All four predicted targets were found to have specific cleavage sites corresponding to the miRNA complementary sequences (Figure 5) and might be regulated by the four miRNAs in the style of small interfering RNAs (siRNAs) directing the cleavage of mRNA targets [40,41]. UC46-13453 is similar to Arabidopsis proteins coded by IRX12 copper ion binding/oxidoreductase (IRX12CBO) (Table 1), while UC46-20080, UC46-17900 and UC46-24238 all coded for a protein highly homologous to NB-LRR disease resistance protein (Table 4).
Figure 5. Mapping of mRNA cleavage sites by RNA ligase-mediated 5' RACE. Each top strand (black) depicts a miRNA-complementary site in the target mRNA, and each bottom strand (red) depicts the miRNA. Watson-Crick pairing (vertical dashes) and G:U wobble pairing (circles) are indicated. Arrows indicate the 5' termini of mRNA fragments isolated from Citrus, as identified by cloned 5'RACE products, with the frequency of clones shown. Only cloned sequences that matched the correct gene and had 5' ends within a 100 nt window centered on the miRNA complementary site are counted. Partial mRNA sequences from target genes were aligned with miRNAs. Numbers indicate the fraction of cloned PCR products terminating at different positions. UC46-13453 was similar to AT2G38080 (NM_129364) IRX12 (IRREGULAR XYLEM 12); laccase (IRX12); UC46-24238 and UC46-17900 were similar to Populus trichocarpa (XM_002328188) cc-nbs-lrr resistance protein; UC46-20080 was similar to Populus trichocarpa (XM_002301537) cc-nbs-lrr resistance protein.
Table 4. Predicted targets for identified novel miRNAs in C. trifoliata.