Intron-derived long noncoding RNAs with snoRNA ends (sno-lncRNAs) are highly expressed from the imprinted Prader-Willi syndrome (PWS) region on human chromosome 15. However, sno-lncRNAs from other regions of the human genome or from other genomes have not yet been documented.
By exploring non-polyadenylated transcriptomes from human, rhesus and mouse, we have systematically annotated sno-lncRNAs expressed in all three species. In total, using available data from a limited set of cell lines, 19 sno-lncRNAs have been identified with tissue- and species-specific expression patterns. Although primary sequence analysis revealed that snoRNAs themselves are conserved from human to mouse, sno-lncRNAs are not. PWS region sno-lncRNAs are highly expressed in human and rhesus monkey, but are undetectable in mouse. Importantly, the absence of PWS region sno-lncRNAs in mouse suggested a possible reason why current mouse models fail to fully recapitulate pathological features of human PWS. In addition, a RPL13A region sno-lncRNA was specifically revealed in mouse embryonic stem cells, and its snoRNA ends were reported to influence lipid metabolism. Interestingly, the RPL13A region sno-lncRNA is barely detectable in human. We further demonstrated that the formation of sno-lncRNAs is often associated with alternative splicing of exons within their parent genes, and species-specific alternative splicing leads to unique expression pattern of sno-lncRNAs in different animals.
Comparative transcriptomes of non-polyadenylated RNAs among human, rhesus and mouse revealed that the expression of sno-lncRNAs is species-specific and that their processing is closely linked to alternative splicing of their parent genes. This study thus further demonstrates a complex regulatory network of coding and noncoding parts of the mammalian genome.
Keywords: lncRNA; sno-lncRNA; Alternative splicing; Species-specific; PWS
Although only about 2% of the human genome encodes protein sequences [1,2], recent advances in genomewide analyses have revealed that the majority of the human genome is transcribed [3,4], largely from noncoding segments that used to be considered “junk sequences” or “dark matter” [5,6]. Besides well-characterized housekeeping noncoding RNAs (such as tRNA, rRNA, snRNA and snoRNA) and small regulatory ncRNAs [7,8], the transcriptome has become even more complex with pervasively transcribed long noncoding RNAs (lncRNAs, at least 200 nt long) [4,9,10]. Using systematic and integrative strategies and by considering multiple biological features, thousands of lncRNAs were identified from intergenic regions (long intergenic noncoding RNAs, lincRNAs) in the mouse, zebrafish and human genomes. Importantly, the strategy of lincRNA discovery has served as a road map for the systematic annotation of other lncRNAs.
In addition to intergenic regions, introns account for over 20% of noncoding sequences in the human genome and provide yet another source of lncRNAs. By removal of redundant rRNAs and poly(A)+ RNAs, a relatively pure population of non-polyadenylated and non-ribosomal (poly(A)-/ribo-) RNAs was obtained and subjected to high-throughput deep sequencing. This type of poly(A)- RNA-seq of human cell transcriptomes surprisingly revealed previously ignored RNA signals in exons and introns [14-16]. Interestingly, nuclear fractionation also indicated the presence of stable transcripts from intronic sequences in X. tropicalis. What mechanism(s) can protect these excised introns from rapid degradation after splicing? Further analyses revealed a class of intron-derived lncRNAs that depend on the snoRNA machinery at both ends for their processing (sno-lncRNAs). This finding shed new light on lncRNA characterization from “junk” intronic sequences.
Strikingly, five sno-lncRNAs derived from introns of the Prader-Willi syndrome (PWS) region (15q11-q13) were highly expressed in human embryonic stem cells and strongly associated with Fox family splicing regulators to alter patterns of splicing. The PWS 15q11-q13 region is imprinted, leading to the expression of the SNURF-SNRPN gene and downstream noncoding region from the paternal chromosome. All paternal transcripts downstream of the SNRPN gene are noncoding and have been considered primarily as precursors for small RNAs, including the SNORD116 cluster of 29 similar snoRNAs [18,19]. Importantly, SNORD116 deficiency has been recognized as the primary cause of PWS in recent disease models [20-22]. Although the function of SNORD116s remained elusive, the recent identification of PWS region sno-lncRNAs and their association with Fox family splicing regulators offers a functional connection of sno-lncRNAs to the molecular pathogenesis of PWS. However, it was not clear how many other sno-lncRNAs may exist in the genome. Given that the vast majority of snoRNAs are encoded in introns of protein-coding genes, it was of interest to annotate sno-lncRNAs in a genomewide manner. Moreover, signals of poly(A)- transcripts from intronic regions have been widely detected in a variety of cultured cells, which has provided a rich data source to explore sno-lncRNAs from different cell lines.
Here, we applied computational pipelines to identify sno-lncRNAs genomewide from poly(A)-/ribo- transcriptomes of human, rhesus and mouse. In total, 19 sno-lncRNAs have been identified with tissue- and species-specific expression patterns from available species/cell lines. PWS region sno-lncRNAs are highly expressed in human, detected in rhesus, and undetectable in mouse. In contrast, an RPL13A region sno-lncRNA is highly expressed in mouse, but almost absent in human. We further demonstrated that the formation of sno-lncRNAs often requires alternative splicing, indicating a complex regulatory network of coding and noncoding parts of the genome.
Results and discussion
Genomewide identification of sno-lncRNAs across species
The processing of intron-derived sno-lncRNAs depends on the snoRNA machinery at both ends. Five such sno-lncRNAs are highly expressed from introns of the PWS imprinted region of chr15 and are strongly associated with Fox family splicing regulators to alter patterns of splicing. However, sno-lncRNAs from other regions of the human genome or from other species remained uncharacterized. Since the vast majority of snoRNA genes are located in introns of protein-coding genes (Figure 1A), and one intron containing two snoRNA genes is a prerequisite for the generation of a sno-lncRNA, we first surveyed genomic locations of annotated snoRNAs in different genomes to locate snoRNA pairs in one intron (≥ two snoRNAs/intron). In total, 400 annotated snoRNAs were downloaded from snoRNABase for human, and 132 snoRNAs for mouse from RefSeq (http://www.ncbi.nlm.nih.gov/refseq/, downloaded on 2013/3/4). Since snoRNAs in rhesus are not well annotated, we transposed human and mouse snoRNA annotations over to the rhesus genome to generate 375 putative rhesus snoRNAs (Methods). We then analyzed the genomic locations of these snoRNAs in different species.
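The survey for snoRNA pairs within a single intron can be pictured as an interval-containment check. The following is a minimal sketch, not the pipeline used in this study: the function name, the tuple layout and the toy coordinates are all assumptions made for illustration.

```python
from collections import defaultdict


def snornas_paired_in_introns(introns, snornas):
    """Return {intron: [snoRNA names]} for introns holding >= 2 snoRNAs.

    introns: list of (chrom, start, end, strand) tuples
    snornas: list of (name, chrom, start, end, strand) tuples
    Coordinates are assumed to be 0-based and half-open.
    """
    hits = defaultdict(list)
    for intron in introns:
        i_chrom, i_start, i_end, i_strand = intron
        for name, chrom, start, end, strand in snornas:
            same_locus = (chrom, strand) == (i_chrom, i_strand)
            # A snoRNA counts only if it lies entirely inside the intron.
            if same_locus and i_start <= start and end <= i_end:
                hits[intron].append(name)
    # Keep only introns that satisfy the ">= two snoRNAs/intron" prerequisite.
    return {intron: names for intron, names in hits.items() if len(names) >= 2}


# Toy coordinates, invented for illustration only.
introns = [("chr1", 100, 5000, "+"), ("chr1", 6000, 9000, "+")]
snornas = [
    ("snoA", "chr1", 200, 290, "+"),
    ("snoB", "chr1", 3000, 3100, "+"),
    ("snoC", "chr1", 6500, 6600, "+"),
]
paired = snornas_paired_in_introns(introns, snornas)
```

In this toy input only the first intron qualifies, because the second contains a single snoRNA. The nested loop is adequate for a few hundred snoRNAs; an interval index would be preferable for larger annotations.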
Figure 1. Genomewide prediction of sno-lncRNAs. (A) Genomewide prediction of sno-lncRNAs with annotated snoRNAs. The numbers of annotated snoRNAs, snoRNAs in introns or snoRNAs forming pairs in any single intron were compared across human (blue), rhesus (green) and mouse (yellow). Rhesus snoRNAs were transposed from human and mouse annotations. Note that only a few snoRNAs could form pairs in any single intron, and even fewer sno-lncRNAs (highlighted by an asterisk) could be detected with poly(A)-/ribo- RNA-seq signals. (B) Genomewide prediction of sno-lncRNAs with an intron-annotation-independent algorithm. For each two adjacent snoRNAs (blue bars) within 10 kb, RPKM of snoRNAs (blue regions) and internal region (pink regions) in poly(A)-/ribo- RNA-seq datasets were evaluated with a set of criteria to identify novel sno-lncRNAs (Methods). In total, 19 sno-lncRNAs were identified in examined species and cell lines (Table 1).
Only a small portion of snoRNA pairs could be found within any single RefSeq intron. Importantly, even fewer such introns are expressed in the examined cell lines, as judged by interrogating poly(A)-/ribo- RNA-seq datasets. For example, in human pluripotent H9 cells, only seven sno-lncRNAs could be detected, including six reported sno-lncRNAs and a new one derived from the C17orf76-AS1 region (Table 1), although we identified 48 snoRNAs that form pairs within single introns (Figure 1A). In addition, no putative sno-lncRNAs could be detected in rhesus or mouse by analyzing snoRNA pairs in annotated introns.
Table 1. Summary of sno-lncRNAs identified from human, rhesus and mouse ESCs
Additional file 1. Identification and validation of two novel human sno-lncRNAs. (A) Expression patterns of predicted sno-lncRNA in human cell lines. Normalized read densities of poly(A)-/ribo- RNA-seq (red) and poly(A)+ RNA-seq (black) were indicated in H9 and HeLa, respectively. Red bar, NB probe for (B). (B) Northern blot validation of this novel sno-lncRNA (~598 nt) in H9, PA-1 and HeLa-J cell lines. (C) Expression patterns of predicted sno-lncRNA in human cell lines. Left, normalized read densities of poly(A)-/ribo- RNA-seq (red) and poly(A)+ RNA-seq (black) were indicated in H9 and HeLa, respectively. Red bar, NB probe for (D). (D) NB validation of this novel sno-lncRNA (~585 nt) in H9, PA-1 and HeLa-J cell lines.
Additional file 2. Northern blots of sno-lncRNAs with native agarose gel. (A) and (B) Northern blots show that sno-lncRNAs can be recapitulated after replacing the snoRNA end from C/D box snoRNA to H/ACA box snoRNA (A) or vice versa (B). Top, a schematic drawing of wild-type sno-lncRNAs (sno-lncRNA2 and sno-lnc5AC) or modified sno-lncRNAs (sno-lncRNA2-5A and sno-lnc5C-14) in the expression vector. Black/grey boxes, exons; Black bars, NB probes; Blue circles, C/D snoRNAs; Yellow circles, H/ACA snoRNAs; Bottom, Northern blot validation. NT, no transfection; EV, empty vector. RNA marker III was used to indicate RNA sizes. Denatured RNAs were separated on 1% agarose gel. Note that similar RNA separations were obtained by both denatured PAGE gels (Figure 2) and native (shown here) agarose gels.
To further explore sno-lncRNA candidates, we developed a custom computational pipeline to predict sno-lncRNAs by integrating snoRNA annotations with poly(A)-/ribo- RNA-seq datasets from human, rhesus and mouse (GEO:GSE53942) (Figure 1B, Methods). By applying this pipeline to multiple poly(A)-/ribo- RNA-seq datasets, 19 sno-lncRNAs were identified from different species and/or different cell lines (Table 1). Two additional sno-lncRNAs were predicted in H9 cells and, importantly, both could be validated by Northern blots in human H9, HeLa-J and PA-1 cells (Additional file 1). Three more sno-lncRNAs were further predicted from ENCODE cell lines (Table 1), suggesting that more sno-lncRNAs could be identified when this prediction pipeline is applied to other transcriptomes. Furthermore, it is expected that more sno-lncRNAs will be identified after improvements in snoRNA annotation. Interestingly, only one sno-lncRNA could be predicted from the entire mouse ESC transcriptomes used in this study (Table 1). This mouse sno-lncRNA could be validated by Northern blots from different murine cell lines, as indicated below, but expression of its homolog was much lower in rhesus and undetectable in human.
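The intron-annotation-independent prediction (Figure 1B) evaluates adjacent snoRNAs within 10 kb by the RPKM of the two snoRNAs and of the region between them. A minimal sketch of that idea follows. The exact criteria are given in Methods and are not reproduced here, so the `min_rpkm` cutoff, the rule that all three regions must pass it, and every function name are hypothetical placeholders.

```python
def rpkm(read_count, region_length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    return read_count * 1e9 / (region_length_bp * total_mapped_reads)


def adjacent_pairs(snornas, max_gap=10_000):
    """Yield neighbouring snoRNAs on the same chromosome and strand.

    snornas: list of (name, chrom, start, end, strand) tuples.
    max_gap: largest allowed distance between the two snoRNAs (10 kb, as in Figure 1B).
    """
    ordered = sorted(snornas, key=lambda s: (s[1], s[4], s[2]))
    for left, right in zip(ordered, ordered[1:]):
        same_locus = left[1] == right[1] and left[4] == right[4]
        gap = right[2] - left[3]  # start of right minus end of left
        if same_locus and 0 < gap <= max_gap:
            yield left, right


def is_candidate(left_rpkm, internal_rpkm, right_rpkm, min_rpkm=1.0):
    """Hypothetical filter: both snoRNA ends and the internal region are expressed."""
    return min(left_rpkm, internal_rpkm, right_rpkm) >= min_rpkm


# Toy annotation, invented for illustration only.
toy = [
    ("snoA", "chr1", 200, 290, "+"),
    ("snoB", "chr1", 3000, 3100, "+"),
    ("snoC", "chr1", 20000, 20100, "+"),
]
pairs = [(l[0], r[0]) for l, r in adjacent_pairs(toy)]
```

Here only snoA and snoB are paired, since snoC lies more than 10 kb downstream of snoB. Because this approach does not consult intron annotations, it can pick up pairs that share an intron only in unannotated splice isoforms.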
Although the majority of predicted sno-lncRNAs in human contain either box C/D snoRNAs or box H/ACA snoRNAs on both ends, we found one sno-lncRNA from ENCODE datasets that contains a box C/D snoRNA at one end and a box H/ACA snoRNA at the other end (highlighted with an asterisk in Table 1). To demonstrate that hybrid snoRNPs at the ends are also capable of generating sno-lncRNAs, we constructed sno-lncRNA expression vectors that contain a box C/D snoRNA at one end and a box H/ACA snoRNA at the other end to make artificial sno-lncRNAs with hybrid snoRNA ends (top panels of Figure 2A and B, Additional file 2A and B, and data not shown). Northern blots clearly demonstrated the successful expression of such sno-lncRNAs with hybrid snoRNA ends (bottom panels of Figure 2A and B, Additional file 2A and B, and data not shown). Thus, these data strongly indicate that either a box C/D or a box H/ACA snoRNP complex at each end is sufficient to protect the internal sequences of sno-lncRNAs from nuclease trimming after splicing and that multiple formats of sno-lncRNAs may exist in human transcriptomes.
Figure 2. Either H/ACA or C/D box snoRNP complexes are sufficient for sno-lncRNA formation. (A) and (B) Northern blots with denaturing PAGE gels show that sno-lncRNAs can be recapitulated after replacing the snoRNA end from C/D box snoRNA to H/ACA box snoRNA (A) or vice versa (B). Top, a schematic drawing of wild-type sno-lncRNAs (sno-lncRNA2 and sno-lnc5AC) or modified sno-lncRNAs (sno-lncRNA2-5A and sno-lnc5C-14) in the expression vector. Black/grey boxes, exons; Black bars, NB probes; Blue circles, C/D snoRNAs; Yellow circles, H/ACA snoRNAs; Bottom, Northern blot validation. NT, no transfection; EV, empty vector. RNA marker III was used to indicate RNA sizes. Total RNAs were separated on 3.5% denaturing PAGE gels with urea.
Low conservation of sno-lncRNAs with highly conserved snoRNA ends
Compared to coding genes, lncRNAs are generally expressed at low levels and are not well conserved, which has impeded their discovery and functional analysis. We examined the sequence conservation of sno-lncRNAs by calculating PhastCons scores from multiple alignments of primate genomes. This analysis revealed that sno-lncRNAs are less conserved than other well-characterized lncRNAs and predicted lincRNAs (Figure 3A). However, the conservation of snoRNAs themselves is much higher than that of the internal sequences of sno-lncRNAs or nearby exons (Figure 3B). For example, PWS region snoRNAs (SNORD116 cluster in human) exhibit remarkably higher conservation across species than either the SNURF-SNRPN exons or introns (Figure 3C and Additional file 3). However, high conservation of genomic sequences does not necessarily imply the expression of homologous RNAs at the transcriptome level, as indicated below (Figure 4, Additional file 4 and Additional file 5). The expression of sno-lncRNAs is highly restricted to specific species, and none of the predicted sno-lncRNAs had a homolog detected in all three species (Table 1).
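The comparison of snoRNA ends with internal sequences can be illustrated with a small conservation summary. This is a sketch under stated assumptions rather than the procedure used for Figure 3: it assumes per-base phastCons scores are already loaded into a list indexed by position, summarizes each region by its mean score, and reports the median across regions of a class. The function names and the toy scores are invented.

```python
from statistics import mean, median


def region_score(scores, start, end):
    """Mean per-base conservation score over the half-open interval [start, end)."""
    return mean(scores[start:end])


def class_median(scores, regions):
    """Median of the per-region mean scores for one class of regions."""
    return median(region_score(scores, start, end) for start, end in regions)


# Toy per-base scores, invented for illustration:
# a conserved snoRNA end, a poorly conserved internal stretch, then another end.
scores = [0.9] * 10 + [0.1] * 10 + [0.8] * 10
snorna_ends = [(0, 10), (20, 30)]
internal = [(10, 20)]

ends_median = class_median(scores, snorna_ends)
internal_median = class_median(scores, internal)
```

With these toy values the snoRNA ends score well above the internal sequence, mirroring the qualitative pattern described for Figure 3B. Summarizing each region by its mean before taking the median is one possible choice; pooling per-base scores directly would be another.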
Figure 3. Highly conserved snoRNA ends from sno-lncRNAs. (A) Conservation analysis of sno-lncRNAs and other lncRNAs. sno-lncRNAs (11 of the human sno-lncRNAs summarized in Table 1) are less conserved than other reported lncRNAs or lincRNAs. The median phastCons score was labeled to indicate the average conservation level. (B) Conservation analysis of nearby exons, snoRNA ends and internal sequences of sno-lncRNAs. snoRNA ends exhibit much higher conservation than nearby exons and internal sequences. The median phastCons score was labeled to indicate the average conservation level. (C) Sequence conservation analysis of PWS region in different species with VISTA browser. PWS region snoRNAs (SNORD116 cluster, light blue) exhibit a remarkably higher conservation across species than SNURF-SNRPN exons and introns. Y-axis, species selected for comparison (left panel) and conservation levels (right panel); Red bars, human PWS region sno-lncRNAs; Blue circles, human PWS region snoRNAs (SNORD116 cluster); Black bars, exons of human PWS region sno-lncRNA host gene (SNURF-SNRPN). Colors of conserved regions were labeled by VISTA according to UCSC annotations (exons in blue and introns in red). Note that sno-lncRNA4 was previously identified as PWCR1 mRNA or small nucleolar RNA (snord116-20) according to UCSC annotation (blue). An analysis of all five PWS region sno-lncRNAs and their flanking sequences is shown in Additional file 3.
Figure 4. Unique expression of PWS region sno-lncRNAs across species. Normalized read densities of poly(A)-/ribo- RNA-seq (red) and poly(A)+ RNA-seq (black) from human (A), rhesus (B) and mouse (C) are shown in the UCSC genome browser with custom bigWig inputs. Confident sno-lncRNA signals were detected from human and rhesus ESCs, but not from mouse ESCs. Note that there is no rhesus RefSeq annotation in this region; transcripts from de novo assembly (black lines) and transposed annotations from other species (blue lines) are shown.
Additional file 3. Sequence conservation analysis of PWS region across species. PWS region snoRNAs (SNORD116 cluster snoRNAs, light blue) exhibit a remarkably higher conservation across species than SNURF-SNRPN exons and introns. Y-axis, species selected for comparing (left panel) and conservation levels (right panel); Red bars, human PWS region sno-lncRNAs; Blue circles, human PWS region snoRNAs (SNORD116 cluster); Black bars, exons of human PWS region sno-lncRNA host gene (SNURF-SNRPN).
Additional file 4. Expression of PWS region in rhesus. Normalized read densities of poly(A)-/ribo- RNA-seq (red) and poly(A)+ RNA-seq (black) in rhesus ESCs showed two highly expressed sno-lncRNAs in PWS region. Note that there is no RefGene annotation in rhesus; instead, homologous genes from other species are shown.
Additional file 5. Expression of PWS region in mouse. Normalized read densities of poly(A)-/ribo- RNA-seq (red) and poly(A)+ RNA-seq (black) of PWS region in mouse ESC R1 and mouse hippocampus showed undetected expression of PWS region sno-lncRNAs. Note that mouse SNORD116 snoRNAs are over 50 kb away from mouse SNURF-SNRPN. These SNORD116 snoRNAs and their adjacent spliced ESTs are not expressed in mESCs, but are expressed in mouse hippocampus.
PWS region sno-lncRNAs are highly expressed in human, but undetectable in mouse
The genomic context of the PWS region is complex and the characterization of this region across species is still lacking comprehensive analysis. We first examined the genomic context of PWS region sno-lncRNAs by comparing their genomic sequences from different species. Given that sno-lncRNA formation depends on snoRNA sequences at both ends within a single intron, the highly conserved PWS region SNORD116 snoRNAs suggested the likelihood of the formation of PWS region sno-lncRNAs in other species. Compared to five in human (Figure 4A and Additional file 3), interrogation of poly(A)-/ribo- RNA-seq datasets revealed only two PWS region sno-lncRNAs from alternative spliced introns in rhesus ESC cells (Figure 4B and Additional file 4), but none in mouse cells (Figure 4C and Additional file 5). In addition, no clear evidence for PWS region sno-lncRNAs could be found in mouse brain/hippocampus in which the SNURF-SNRPN transcript and its downstream noncoding region are highly transcribed (Additional file 5), further indicating the absence of PWS region sno-lncRNAs in mouse.
SNORD116 snoRNAs are highly conserved from human and rhesus to mouse (Figure 3C and Additional file 3); however, obvious differences in their genomic locations were observed in the PWS region. In human/rhesus genomes, SNORD116s are located in introns of the parent SNURF-SNRPN transcript and some of them form snoRNA pairs in one alternative spliced intron, which results in the formation of sno-lncRNAs (Figures 4A and B). In the mouse genome, SNORD116s are located in introns of a series of spliced ESTs, which are located at least 50 kb away from the SNURF-SNRPN locus (Figure 4C and Additional file 5). Although there are expressed signals of spliced ESTs in the mouse hippocampus transcriptome (Additional file 5), no SNORD116 snoRNA pairs were found between these spliced ESTs, thus no sno-lncRNAs could be generated from this region in mouse. Taken together, although snoRNA ends are essential for the formation of sno-lncRNAs, the existence of highly-conserved snoRNAs alone is not sufficient for their formation.
The genomic region encoding 15q11-13 sno-lncRNAs is specifically deleted in human PWS. PWS is a multiple system disorder with a minimal paternal deletion in chr15. The deficiency of SNORD116 snoRNAs within the minimal deletion has been thought to play an important role in the pathogenesis of PWS [20-22]. However, mouse models with SNORD116 deletions can only partially mimic PWS phenotypes, including metabolism and growth deficiency, but not obesity [28,29]. Although the mechanism of PWS pathogenesis still remains mysterious, the recent finding of sno-lncRNAs in the PWS region in human and their regulatory function in splicing has offered an additional functional layer of gene regulation underlying PWS pathogenesis. The finding of no expression of PWS region sno-lncRNAs in mouse indicates a possible limitation of the use of mouse models to study human PWS.
Characterization of PWS region sno-lncRNAs in rhesus revealed that they sequester Fox proteins like human sno-lncRNAs
We next inspected PWS region sno-lncRNAs in rhesus in greater detail. Since the genomic context of the PWS region is complex and the transcriptome annotation in rhesus is limited, we used SNURF-SNRPN exons to locate SNORD116 snoRNAs and PWS region sno-lncRNAs in rhesus. The rhesus SNURF-SNRPN exons were determined by transposing the human/mouse homologous exons (Figure 5A), and putative SNURF-SNRPN transcripts could be identified in rhesus by de novo assembly from a poly(A)+ RNA-seq dataset (GSE53942). Pair-wise sequence alignments between human (black bars in Figure 5A) and rhesus (grey bars in Figure 5A) suggested that SNURF-SNRPN exons exhibit high sequence similarity (Additional file 6). Interestingly, one human SNURF-SNRPN exon (the seventh) is repeated at least eight times in the rhesus PWS region; however it is unclear whether this repetitive region occurs only in rhesus or has been lost in humans during evolution.
Figure 5. Distinct landscapes of PWS regions across species. (A) Comparison of PWS region sno-lncRNAs and their parent SNURF-SNRPN exons in different species. Conserved SNURF-SNRPN exons were manually linked between human (black boxes of top panel) and rhesus (grey boxes of middle panel), and SNORD116 cluster snoRNAs (dark and light blue bars for human and rhesus, respectively) were located in introns. Note that several non-SNURF-SNRPN exons were annotated in mouse (empty boxes). The homologous SNURF-SNRPN exons in rhesus are based on sequence homology and/or expression signals (Additional file 6). (B) Enrichment of Fox binding sites in PWS region sno-lncRNAs. Potential Fox protein binding sites were predicted in five human and two rhesus sno-lncRNAs.
Sequence alignment of SNORD116 snoRNAs and their parent SNURF-SNRPN exons revealed that two PWS region sno-lncRNAs in rhesus are similar to human PWS region sno-lncRNA3 and sno-lncRNA4, respectively. Although predicted rhesus SNORD116 snoRNAs are scattered among individual introns (Figure 5A), de novo assembly with rhesus poly(A)+ RNA-seq revealed a variety of alternatively spliced SNURF-SNRPN transcripts in rhesus, thus leading to the formation of sno-lncRNAs in rhesus SNURF-SNRPN region (Figure 4B, transcripts from de novo assembly shown in thick black lines).
Human PWS region sno-lncRNAs could function as molecular sponges by associating with Fox family splicing regulators and altering patterns of splicing. Due to the high similarity of rhesus PWS region sno-lncRNAs with human in the genomic context, we reasoned that they might function similarly as well. We thus scanned the rhesus sno-lncRNA sequences for Fox binding motifs, and identified an enrichment of Fox binding sites (Figure 5B), further indicating that rhesus PWS region sno-lncRNAs might also interact with Fox family splicing regulators and play a similar role in splicing regulation. On the other hand, the absence of PWS region sno-lncRNAs in mouse indicated that a similar regulation mechanism is absent in mouse.
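A motif scan of this kind can be sketched as a simple string search. The motif itself is not stated in this section; the sketch below assumes the commonly reported Fox (Rbfox) hexamer UGCAUG, written in DNA form as TGCATG. The function names and test sequences are invented, and a density alone is not an enrichment test, which would also need a background expectation.

```python
import re

# Assumed motif: DNA form of UGCAUG (not specified in the text above).
FOX_MOTIF = "TGCATG"


def count_motif(seq, motif=FOX_MOTIF):
    """Count motif occurrences, including overlapping ones, in an RNA or DNA sequence."""
    seq = seq.upper().replace("U", "T")
    # A zero-width lookahead lets overlapping matches be counted.
    return len(re.findall(f"(?={re.escape(motif)})", seq))


def sites_per_kb(seq, motif=FOX_MOTIF):
    """Motif density, expressed as sites per kilobase of sequence."""
    return count_motif(seq, motif) * 1000 / len(seq)


# Toy sequences, invented for illustration only.
n_sites = count_motif("aaUGCAUGccTGCATGtt")
density = sites_per_kb("TGCATG" + "A" * 994)
```

Comparing the density in a sno-lncRNA against the density in length-matched control sequences would be one way to turn these counts into an enrichment estimate.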
In sum, PWS region sno-lncRNAs are highly expressed in human and rhesus, but are absent in mouse. The absence of PWS region sno-lncRNAs in mouse also suggests one possible reason to explain the failure of current mouse deletion models to fully recapitulate pathological features of human PWS [27-29]. However, we cannot exclude other regulatory pathways or mechanisms during PWS pathogenesis.
A non-human sno-lncRNA and its possible association with the regulation of lipid toxicity
Although no homologue of the human sno-lncRNAs identified in this study could be detected in mouse transcriptomes, there is one highly expressed sno-lncRNA predicted in mouse ESCs. This sno-lncRNA is flanked by SNORD33 and SNORD34 at its ends and is located in the ribosomal protein L13a (RPL13A) gene (Figure 6A). Northern blots with different probes demonstrated the existence of this mouse sno-lncRNA of the expected size in several mouse cell lines with both native agarose gels (Figure 6B) and denatured PAGE gels (Additional file 7). Furthermore, this RPL13A region sno-lncRNA is highly expressed in mouse ESCs, but less so in other mouse cell lines (Figure 6B). Moreover, it can be recapitulated in expression vectors, as indicated in Figure 6C. Strikingly, even though both the sequences and structures of snoRNA ends and RPL13A exons are highly conserved from human to mouse (Additional file 8), the sno-lncRNA was not expressed in examined human cell lines, and was expressed at very low levels in rhesus ESCs (Table 1).
Figure 6. Expression of a non-human sno-lncRNA in RPL13A region in mouse. (A) A highly expressed sno-lncRNA was indicated in mouse ESC transcriptomes. Top, normalized read densities of poly(A)-/ribo- (red) and poly(A)+ RNA-seq (black) in mouse ESCs. Note that the two ends of this sno-lncRNA (indicated below) map precisely to the intron-embedded SNORD33 and SNORD34. (B) Northern blot validation of this sno-lncRNA in mouse cell lines. This sno-lncRNA was highly expressed in mouse R1 cells, but less expressed in other mouse cell lines (NIH 3T3, VDβ and MEF) with probes for parent RPL13A exon (black bar, left panel) or SNORD33 (blue bar, right panel) by Northern blots. Top, a schematic drawing of RPL13A gene (black bars) and intron-embedded snoRNAs (SNORD35, SNORD34 and SNORD33, from left to right, blue bars). Note that the NB probe recognizing SNORD33 can also visualize other mouse snoRNAs due to the sequence similarity of snoRNAs. (C) Recapitulation of RPL13A region sno-lncRNA in NIH 3T3 cells. Top, a schematic drawing of mouse sno-lncRNA flanked by its full length intron and exons in expression vector. Bottom, Northern blots of recapitulated mouse sno-lncRNA with probes recognizing different regions. NT, no transfection; EV, empty vector. Note that endogenous RPL13A region sno-lncRNA is lowly expressed in NIH 3T3 cells.
Additional file 7. Northern blot of mouse-specific sno-lncRNA in RPL13A region. Northern blot validation of mouse specific sno-lncRNA from multiple mouse cell lines. Total RNAs from ESC R1, NIH 3T3, VDβ and MEF were denatured and separated on 8% denatured PAGE gel. After separation, the gel was stained with ethidium bromide for rRNA/tRNA visualization (A), and transferred to membrane for Northern blot with probe for SNORD33 (blue bar of Figure 6B) after destaining (B). Positions for 5.8S, 5S rRNA, and tRNAs were indicated with ethidium bromide staining.
Additional file 8. Sequence conservation analysis of RPL13A region sno-lncRNA. Y-axis, species selected for comparing (left panel) and conservation levels (right panel); Red bars, a non-human sno-lncRNA; Blue circles, mouse snoRNAs (SNORD35, SNORD34, SNORD33 and SNORD32a, from left to right); Black bars, exons of the non-human sno-lncRNA host gene (RPL13A).
SnoRNAs within RPL13A introns are critical mediators of lipotoxic cell death in both hamster and mouse. Lipotoxic stress strongly induces expression of these snoRNAs, but has no effect on the steady state levels of the parent RPL13A gene. While it is unclear whether RPL13A region sno-lncRNA is also involved in lipotoxicity like its snoRNA ends, our finding offers another possible regulation for gene expression in this region and it will be of interest to study the function of this RPL13A region sno-lncRNA.
Species-specific alternative splicing leads to the formation of the RPL13A region sno-lncRNA in the mouse
The analyses of PWS region sno-lncRNAs and the mouse RPL13A region sno-lncRNA strongly indicated that sno-lncRNAs are expressed in a species-specific manner. To determine whether there are differences in the biogenesis process of sno-lncRNAs in different species, we individually transfected expression vectors for mouse sno-lncRNA into human cells or expression vectors for human sno-lncRNA into mouse cells. Interestingly, both species-specific sno-lncRNAs could be recapitulated in cultured cells from other species (Additional file 9), suggesting that species-specificity of sno-lncRNAs is mainly derived from their genomic context instead of from the underlying biogenesis machinery.
Additional file 9. Transfection of species-specific sno-lncRNA into cell lines derived from different species. (A) Transfection of human sno-lncRNA into mouse NIH 3T3 cell line generates the human sno-lncRNA as revealed by NB. NT, no transfection; EV, empty vector. (B) Transfection of mouse sno-lncRNA into human HeLa-J cell line generates the mouse sno-lncRNA as revealed by NB. NT, no transfection; EV, empty vector.
As mentioned above, snoRNAs generally are located within introns, but most of them do not exist as pairs in single introns (Figure 1A). With known intron annotations, only a few sno-lncRNAs could be identified (Figure 1A). However, using an intron-annotation-independent sno-lncRNA prediction strategy (Figure 1B), several additional novel sno-lncRNAs were identified in examined transcriptomes (Table 1), suggesting that previously uncharacterized alternative splicing events can generate snoRNA pairs within one intron. For example, the PWS region sno-lncRNAs in rhesus could be generated from alternatively spliced SNURF-SNRPN (Figure 4B). Taking the RPL13A region sno-lncRNA into consideration, the gene organization of RPL13A is highly conserved across species and SNORD33 and SNORD34 are usually confined to distinct single introns. Given that a sno-lncRNA can form only when two snoRNA genes are located within one intron, this suggests the existence of a previously uncharacterized alternative splicing event involving the introns adjacent to SNORD33 and SNORD34. To test this possibility, we performed de novo transcript assembly with a poly(A)+ RNA-seq dataset, and successfully identified the alternative splicing event that places SNORD33 and SNORD34 within one intron (Figure 7A). Interestingly, a similar alternative splicing event could be assembled in rhesus (Additional file 10), but not in human (Figure 7B), which is consistent with the lack of an expression signal for an RPL13A region sno-lncRNA in human. In addition, RT-PCR results further confirmed the alternative splicing in mouse, but not in human (Figure 7C). Thus, the alternative splicing of the RPL13A gene in mouse leads to the formation of a species-specific sno-lncRNA. On the other hand, the lack of RPL13A alternative splicing in human likely prevents the expression of this sno-lncRNA.
It should be noted that there is low expression of RPL13A region sno-lncRNA in rhesus, together with a low level of the necessary alternative splicing (Additional file 10).
Figure 7. Species-specific RPL13A region sno-lncRNA is derived from species-specific alternatively spliced rpl13a transcripts. (A)De novo transcript assembly revealed previously uncharacterized alternative spliced rpl13a transcripts (mouseR1_pAplus_cufflinks). At least two new rpl13a isoforms (back arrows on right) were identified to splice out an intron containing SNORD33 and SNORD34. Y-axis, normalized read densities of poly(A)-/ribo- RNA-seq (red) and poly(A)+ RNA-seq (black) of mouse ESC and hippocampus transcriptomes. (B) No such alternative spliced rpl13a transcripts in human ESC H9 and HeLa cells from both de novo assembly (h9_pAplus_cufflinks) and known annotation (blue lines). Y-axis, normalized read densities of poly(A)-/ribo- RNA-seq (red) and poly(A)+ RNA-seq (black) of human H9 and HeLa transcriptomes. (C) Validation of alternative spliced rpl13a transcripts by RT-PCR. Left, the previously uncharacterized alternative splicing event of RPL13A gene could only be detected in mouse, but not in human, with RT-PCR. Right, a schematic drawing of different splicing events and their validation primers in RPL13A. Black bars, RPL13A exons; Blue circles, snoRNAs; Red arrows, PCR primer sets. (D) The alternative splicing of RPL13A causes amino acid changes at the C-terminal of RPL13A protein. Top, schematic diagram of canonical splicing of RPL13A and its amino acid sequence; Bottom, schematic diagram of uncharacterized splicing of RPL13A and its amino acid sequence.
Additional file 10. Species-specific RPL13A region sno-lncRNA is derived from species-specific alternatively spliced rpl13a transcripts in rhesus. De novo transcript assembly revealed previously uncharacterized alternatively spliced rpl13a transcripts (rhesus_pAplus_cufflinks). One new rhesus rpl13a isoform (indicated by arrow) was identified that splices out a large intron containing SNORD33 and SNORD34. Y-axis, normalized read densities of poly(A)-/ribo- RNA-seq (red) and poly(A)+ RNA-seq (black) of rhesus ESC transcriptomes.
Interestingly, further analyses revealed that the specific RPL13A alternative splicing event that leads to the production of the sno-lncRNA could also generate a protein with an altered amino acid sequence in both mouse (Figure 7D) and rhesus (data not shown), although the change in protein sequence and its consequences remain to be confirmed experimentally. As a 60S ribosomal subunit protein, RPL13A is highly conserved across species and plays an essential role in protein synthesis. Therefore, the alternative splicing of RPL13A implies a possible role in the regulation of 60S ribosome assembly or function in a species-specific manner. Since splicing diverges widely between tissues and species [31,32] and sno-lncRNAs are expressed with tissue- and species-specific patterns, it is quite likely that more such RNAs will be uncovered when additional tissue and species samples are examined. Taken together, alternative splicing not only increases the diversity of coding mRNAs/proteins, but also expands transcriptome complexity by promoting the formation of noncoding RNAs from untranslated intron sequences.
We explored non-polyadenylated transcriptomes (poly(A)-/ribo-) from human, rhesus and mouse, and systematically annotated sno-lncRNAs across species. Although primary sequence analysis revealed that the snoRNA ends of such molecules are highly conserved, PWS region sno-lncRNAs are highly expressed in human and rhesus, but absent in mouse. The absence of PWS region sno-lncRNAs in mouse suggests a possible reason for the failure of the current mouse model to fully recapitulate pathological features of human PWS. Only one mouse sno-lncRNA, in the RPL13A region, was identified from the limited available mouse datasets, and the snoRNAs themselves in this region have been suggested to be involved in lipotoxicity in mouse. Our results also demonstrated that the formation of sno-lncRNAs often requires alternative splicing within their parent genes, indicating a complex regulatory network of coding and noncoding parts of the genome.
Annotation of snoRNAs across species
Annotated human snoRNAs derived from snoRNABase (https://www-snorna.biotoul.fr/) were downloaded from the UCSC Genome Bioinformatics database (http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/wgRna.txt.gz, updated on 2010/10/3). A total of 132 mouse snoRNA annotations were downloaded from the RefSeq database (http://www.ncbi.nlm.nih.gov/refseq/, downloaded on 2013/3/4). A total of 375 putative rhesus snoRNAs were transposed from human and mouse snoRNA annotations using liftOver (http://genome.ucsc.edu/cgi-bin/hgLiftOver) with minMatch = 0.95 and combined to serve as rhesus snoRNA annotations. The snoRNA annotations of each species were overlapped with the corresponding gene annotations (Human: UCSC Genes, updated on 2012/2/5; Rhesus: RefSeq Genes, updated on 2013/3/24; Mouse: UCSC Genes, updated on 2011/5/30) to find snoRNA pairs (at least two snoRNAs in one intron), as indicated in Figure 1A. SnoRNA pairs in the same intron were further examined in poly(A)-/ribo- RNA-seq datasets to identify putative sno-lncRNAs.
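The annotation-based search for snoRNA pairs described above can be illustrated with a minimal Python sketch. It is not the code used in this study: the function name, the tuple layout, the BED-style 0-based half-open coordinates and the same-strand requirement are all assumptions made for illustration.

```python
from collections import defaultdict


def snorna_pairs_in_introns(introns, snornas):
    """Return {intron: [snoRNA names]} for introns containing >= 2 snoRNAs.

    introns: iterable of (chrom, start, end, strand) tuples
    snornas: iterable of (chrom, start, end, strand, name) tuples
    Coordinates are assumed to be 0-based, half-open (BED style).
    Assumption: a snoRNA counts only if it lies entirely inside the
    intron and on the same strand.
    """
    by_locus = defaultdict(list)
    for chrom, start, end, strand, name in snornas:
        by_locus[(chrom, strand)].append((start, end, name))

    pairs = {}
    for chrom, i_start, i_end, strand in introns:
        inside = sorted(
            (s, e, n)
            for s, e, n in by_locus[(chrom, strand)]
            if i_start <= s and e <= i_end
        )
        if len(inside) >= 2:
            pairs[(chrom, i_start, i_end, strand)] = [n for _, _, n in inside]
    return pairs
```

Introns returned by such a function would only be candidates; as stated above, expression in the poly(A)-/ribo- RNA-seq data still has to be examined.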
Sequencing read alignment and transcript de novo assembly
The poly(A)+ or poly(A)-/ribo- RNA-seq reads were uniquely aligned to the relevant genomes (Human: hg19, GRCh37; Rhesus: rheMac3, BGI CR_1.0; Mouse: mm9, NCBI37) using TopHat 2.0.8 (parameters: -g 1 -a 6 -i 50 --microexon-search --coverage-search -m 2) with existing annotations (Human: UCSC Genes, updated on 2012/2/5; Rhesus: RefSeq Genes, updated on 2013/3/24; Mouse: UCSC Genes, updated on 2011/5/30), respectively. To facilitate the identification of potential sno-lncRNAs, Bowtie 0.12.9 (parameters: -v 3 -k 1 -m 1) was also employed to map poly(A)-/ribo- RNA-seq reads to annotated genome references. Expression levels (RPKM) of annotated genes (including snoRNAs) were obtained with a customized pipeline (Zhu et al., in preparation). Cufflinks v2.0.2 (parameters: -F 0) was employed to assemble poly(A)+ RNA-seq mapping results into de novo RNA transcripts. All mapping results were normalized and uploaded to the UCSC Genome Browser (http://genome.ucsc.edu/) for visualization.
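For reference, the RPKM measure mentioned above follows the standard definition (reads per kilobase of region per million mapped reads). The short Python sketch below shows only that definition; the customized pipeline itself is not described here, and the function and argument names are hypothetical.

```python
def rpkm(read_count, region_length_bp, total_mapped_reads):
    """Standard RPKM: reads per kilobase of region per million mapped reads.

    This is the textbook formula only, not the customized pipeline
    referred to in the text.
    """
    if region_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("region length and library size must be positive")
    return read_count * 1e9 / (region_length_bp * total_mapped_reads)
```

For example, 100 reads on a 1 kb region in a library of one million mapped reads corresponds to an RPKM of 100.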
Computational pipeline to identify sno-lncRNAs from poly(A)-/ribo- RNA-seq datasets
To systematically identify sno-lncRNAs independently of known gene annotation, we developed a custom sno-lncRNA identification pipeline (termed SNOLNCfinder), as indicated in Figure 1B. Briefly, for every two adjacent snoRNAs (distance < 10 kb), RPKMs of the snoRNA regions and of their internal region were calculated with a customized pipeline (Zhu et al., in preparation) from Bowtie-mapped poly(A)-/ribo- RNA-seq reads. A putative sno-lncRNA was selected when 1) both snoRNAs of the pair are expressed with RPKM ≥ 1; 2) at least 80% of the internal region between the paired snoRNAs has poly(A)-/ribo- RNA-seq signal by sliding window examination (Figure 1B); and 3) the internal region is relatively highly expressed (at least 40% of the expression of the snoRNA pair). All candidates were manually inspected by comparing the poly(A)+ and poly(A)-/ribo- RNA-seq datasets. This pipeline is independent of known gene annotation and was successfully employed on human, rhesus and mouse datasets to identify new sno-lncRNAs. RNA-seq datasets used here were from human ESC H9 cells and HeLa cells (GEO:GSE24399) and from ENCODE cell lines (GEO:GSE26284). RNA-seq files for rhesus ESCs, mouse ESCs and mouse hippocampus can be accessed from the NCBI Sequence Read Archive through the Gene Expression Omnibus accession number (GEO:GSE53942).
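The three selection criteria can be summarized in a minimal Python sketch. This is an illustration, not SNOLNCfinder itself: the function names, the 50-nt non-overlapping windows, and the use of the mean RPKM of the two snoRNAs as the "expression of the snoRNA pair" are assumptions, since those details are not specified in the text.

```python
def covered_fraction(depth, window=50):
    """Fraction of windows over the internal region that contain any signal.

    depth: per-base read depth of the internal region (list of numbers).
    Assumption: non-overlapping windows of `window` nt; the window size
    and step of the original pipeline are not given in the text.
    """
    if not depth:
        return 0.0
    windows = [depth[i:i + window] for i in range(0, len(depth), window)]
    covered = sum(1 for w in windows if any(d > 0 for d in w))
    return covered / len(windows)


def is_putative_sno_lncrna(rpkm_5p, rpkm_3p, rpkm_internal, internal_depth,
                           min_sno_rpkm=1.0, min_covered=0.8, min_ratio=0.4):
    """Apply the three criteria to one pair of adjacent snoRNAs."""
    # 1) both snoRNAs of the pair are expressed
    if rpkm_5p < min_sno_rpkm or rpkm_3p < min_sno_rpkm:
        return False
    # 2) most of the internal region shows poly(A)-/ribo- signal
    if covered_fraction(internal_depth) < min_covered:
        return False
    # 3) internal region is relatively highly expressed
    #    (assumption: compared with the mean RPKM of the two snoRNAs)
    sno_level = (rpkm_5p + rpkm_3p) / 2
    return rpkm_internal >= min_ratio * sno_level
```

Candidates passing such a filter would still require the manual inspection against poly(A)+ data described above.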
Genomic sequence comparison with VISTA
VISTA Browser (http://genome.lbl.gov/vista/) was employed to inspect the conservation landscape for a given region from different genomes, including Human (Feb. 2009), Cow (Oct. 2011), Mouse (Dec. 2011 or Jul. 2007), Callithrix jacchus v.2.0.2 (Jun. 2007), Rhesus (Jan. 2006), Pongo pygmaeus abelii v.2.0.2 (Jul. 2007), Gorilla (Dec. 2009), Chimp (Mar. 2006), Rat (Nov. 2004), Dog (May 2005) and Horse (Jan. 2007).
SNURF-SNRPN sequence comparative analysis
Putative rhesus snoRNA116s (Figure 4B) were marked with human and mouse annotations (Figures 4A and C) in UCSC. Locations of putative rhesus SNURF-SNRPN exons were defined according to rhesus poly(A)+ RNA-seq mapping signals. Sequences of human (Figure 4A) and rhesus (Figure 4B) SNURF-SNRPN exons were extracted from the UCSC Genome Bioinformatics database (http://genome.ucsc.edu/). Pair-wise sequence alignments were carried out using T-Coffee.
Fox protein binding site prediction on PWS sno-lncRNAs
Sequences of human PWS sno-lncRNAs and putative rhesus PWS sno-lncRNAs were extracted from the UCSC Genome Bioinformatics database (http://genome.ucsc.edu/). All these sequences were scanned for Fox hexanucleotide motifs including UGCAUG, GCAUGU, GUGAUG, UGGUGA and GGUGGU.
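A motif scan of this kind can be sketched in a few lines of Python. The five hexanucleotides are those listed above; the function name, the conversion of DNA-style input (T) to RNA (U), and the 0-based reporting of overlapping hits are assumptions made for illustration, not a description of the tool actually used.

```python
FOX_MOTIFS = ("UGCAUG", "GCAUGU", "GUGAUG", "UGGUGA", "GGUGGU")


def find_fox_motifs(sequence):
    """Return sorted (0-based start, motif) hits, including overlapping ones.

    Assumption: genomic sequences extracted as DNA are converted to RNA
    (T -> U) before scanning.
    """
    rna = sequence.upper().replace("T", "U")
    hits = []
    for motif in FOX_MOTIFS:
        start = rna.find(motif)
        while start != -1:
            hits.append((start, motif))
            # move one base forward so overlapping occurrences are kept
            start = rna.find(motif, start + 1)
    return sorted(hits)
```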
Conservation analysis with PhastCons
PhastCons scores for multiple alignments of primate genomes (http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/phastCons46wayPrimates.txt.gz, updated on 2009/12/6) were downloaded from UCSC, and the corresponding PhastCons scores for lncRNAs, lincRNAs and sno-lncRNAs (11 predicted in human, Table 1) were counted separately to inspect the conservation difference among these three datasets. PhastCons scores for nearby exons, snoRNAs at both ends of sno-lncRNAs and internal regions of sno-lncRNAs were also calculated separately to investigate region-specific conservation differences within sno-lncRNAs.
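The region-specific comparison can be pictured with the following Python sketch, which averages per-base scores over the two snoRNA ends and the internal region of one sno-lncRNA. The data layout (a dictionary from genomic position to score), the use of the mean as the summary statistic and all names are assumptions; the text does not state how scores were summarized.

```python
def mean_score(scores, start, end):
    """Mean per-base score over [start, end); positions without a score are skipped."""
    values = [scores[pos] for pos in range(start, end) if pos in scores]
    return sum(values) / len(values) if values else None


def region_conservation(scores, sno_5p, sno_3p):
    """Summarize conservation of one sno-lncRNA by region.

    scores: dict mapping genomic position -> PhastCons score (hypothetical layout)
    sno_5p, sno_3p: (start, end) of the upstream and downstream snoRNA,
    0-based half-open; the internal region is what lies between them.
    """
    return {
        "snoRNA_5p": mean_score(scores, *sno_5p),
        "internal": mean_score(scores, sno_5p[1], sno_3p[0]),
        "snoRNA_3p": mean_score(scores, *sno_3p),
    }
```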
Cell culture, cell transfection and antisense oligonucleotide treatment
All cell lines were cultured using standard protocols. Plasmid transfection was carried out with X-tremeGENE 9 (Roche) or with nucleofection (Lonza) according to the manufacturer’s instructions. Rhesus RNAs were extracted from ESC line IVF3.2. Mouse RNAs were extracted from the ESC R1 line or from the hippocampus of sacrificed mice, respectively. Mice were maintained and used in accordance with the guidelines of the Institutional Animal Care and Use Committee of Shanghai Institutes for Biological Sciences.
SNORD116-14 in pcDNA3-sno-lncRNA2 was replaced with SNORA5A to generate construct pcDNA3-sno-lncRNA2-5A (Figure 2A), and SNORA5A in pcDNA3-sno-lnc5AC was substituted with SNORD116-14 to generate construct pcDNA3-sno-lnc5C-14 (Figure 2B) with primers listed in Table S1. Mouse sno-lnc33/34 and human sno-lnc5AC, each flanked by its full-length intron, splice sites and exons, were cloned into pcDNA3 (Figure 6C).
RNA Isolation, poly(A)-/ribo- fractionation, RNA-seq and Northern Blot
Cultured cell lines or cells with different treatments were harvested in Trizol (Invitrogen) and RNAs were extracted according to the manufacturer’s instructions, followed by DNase I treatment at 37°C for 30 min (Ambion, DNA-free™ Kit). Poly(A)+ and poly(A)-/ribo- RNA transcripts were fractionated and sequenced as previously described. Raw sequencing datasets and bigWig track files of poly(A)-/ribo- RNAs from mouse ESCs, mouse hippocampus and rhesus ESCs are available for download from the NCBI Gene Expression Omnibus under accession number GSE53942. Northern Blot was carried out according to the manufacturer’s protocol (DIG Northern Starter Kit, Roche). Denatured RNAs were loaded on either a native agarose gel or a denaturing PAGE gel with urea for Northern Blots as in previous studies [15,16]. Digoxigenin (Dig)-labeled antisense and sense probes were made using either SP6 or T7 RNA polymerase by in vitro transcription with the AmpliScribe™ SP6 and T7 High Yield Transcription Kits (Epicentre). DIG-labeled RNA Molecular Weight Marker III is from Roche.
The authors declare that they have no competing interests.
LY and LLC designed the project. XOZ and LY performed bioinformatics analyses. QFY, HBW, YZ, TC, PZ and XL performed experiments. LY, LLC and XOZ analyzed the data. LY and LLC wrote the paper with inputs from other authors. All authors read and approved the final manuscript.
We are grateful to Gordon Carmichael for critical reading of the manuscript and all lab members for helpful discussion and technical support. H9 cells were obtained from the WiCell Research Institute. RNA-seq was performed at CAS-MPG Partner Institute for Computational Biology Omics Core, Shanghai, China. This work was supported by grants 2014CB964800 and 2014CB910600 from MOST, XDA01010206 and 2012OHTP08 from CAS, 31322018, 31271376 and 31271390 from NSFC.
Clark MB, Amaral PP, Schlesinger FJ, Dinger ME, Taft RJ, Rinn JL, Ponting CP, Stadler PF, Morris KV, Morillon A, Rozowsky JS, Gerstein MB, Wahlestedt C, Hayashizaki Y, Carninci P, Gingeras TR, Mattick JS: The reality of pervasive transcription. PLoS Biol 2011, 9:e1000625; discussion e1001102.
Yamada K, Lim J, Dale JM, Chen H, Shinn P, Palm CJ, Southwick AM, Wu HC, Kim C, Nguyen M, Pham P, Cheuk S, Karlin-Newmann G, Liu SX, Lam B, Sakano H, Wu T, Yu G, Miranda M, Quach HL, Tripp M, Chang CH, Lee JM, Toriumi M, Chan MMH, Tang CC, Onodera CS, Deng JM, Akiyama K, Ansari Y, et al.: Empirical analysis of transcriptional activity in the Arabidopsis genome.
Landgraf P, Rusu M, Sheridan R, Sewer A, Iovino N, Aravin A, Pfeffer S, Rice A, Kamphorst AO, Landthaler M, Lin M, Socci ND, Hermida L, Fulci V, Chiaretti S, Foa R, Schliwka J, Fuchs U, Novosel A, Muller RU, Schermer B, Bissels U, Inman J, Phan Q, Chien M, Weir DB, Choksi R, Vita GD, Frezzetti D, Trompeter HI, et al.: A mammalian microRNA expression atlas based on small RNA library sequencing.
Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi A, Tanzer A, Lagarde J, Lin W, Schlesinger F, Xue C, Marinov GK, Khatun J, Williams BA, Zaleski C, Rozowsky J, Roder M, Kokocinski F, Abdelhamid RF, Alioto T, Antoshechkin I, Baer MT, Bar NS, Batut P, Bell K, Bell I, Chakrabortty S, Chen X, Chrast J, Curado J, et al.: Landscape of transcription in human cells.
Derrien T, Johnson R, Bussotti G, Tanzer A, Djebali S, Tilgner H, Guernec G, Martin D, Merkel A, Knowles DG, Lagarde J, Veeravalli L, Ruan X, Ruan Y, Lassmann T, Carninci P, Brown JB, Lipovich L, Gonzalez JM, Thomas M, Davis CA, Shiekhattar R, Gingeras TR, Hubbard TJ, Notredame C, Harrow J, Guigo R: The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression.
Guttman M, Amit I, Garber M, French C, Lin MF, Feldser D, Huarte M, Zuk O, Carey BW, Cassady JP, Cabili MN, Jaenisch R, Mikkelsen TS, Jacks T, Hacohen N, Bernstein BE, Kellis M, Regev A, Rinn JL, Lander ES: Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals.
Cavaille J, Buiting K, Kiefmann M, Lalande M, Brannan CI, Horsthemke B, Bachellerie JP, Brosius J, Huttenhofer A: Identification of brain-specific and imprinted small nucleolar RNA genes exhibiting an unusual genomic organization.
Sahoo T, del Gaudio D, German JR, Shinawi M, Peters SU, Person RE, Garnica A, Cheung SW, Beaudet AL: Prader-Willi phenotype caused by paternal deficiency for the HBII-85 C/D box small nucleolar RNA cluster.
de Smith AJ, Purmann C, Walters RG, Ellis RJ, Holder SE, Van Haelst MM, Brady AF, Fairbrother UL, Dattani M, Keogh JM, Henning E, Yeo GSH, O’Rahilly S, Froguel P, Farooqi S, Blakemore AIF: A deletion of the HBII-85 class of small nucleolar RNAs (snoRNAs) is associated with hyperphagia, obesity and hypogonadism.
Duker AL, Ballif BC, Bawle EV, Person RE, Mahadevan S, Alliman S, Thompson R, Traylor R, Bejjani BA, Shaffer LG, Rosenfeld JA, Lamb AN, Sahoo T: Paternally inherited microdeletion at 15q11.2 confirms a significant role for the SNORD116 C/D box snoRNA cluster in Prader-Willi syndrome.
Barbosa-Morais NL, Irimia M, Pan Q, Xiong HY, Gueroussov S, Lee LJ, Slobodeniuc V, Kutter C, Watt S, Colak R, Kim T, Misquitta-Ali CM, Wilson MD, Kim PM, Odom DT, Frey BJ, Blencowe BJ: The evolutionary landscape of alternative splicing in vertebrate species.