<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2164-8-244</ui>
   <ji>1471-2164</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Use of tiling array data and RNA secondary structure predictions to identify noncoding RNA genes</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Weile</snm>
               <fnm>Christian</fnm>
               <insr iid="I1"/>
               <email>cweile@yahoo.dk</email>
            </au>
            <au id="A2" ca="yes">
               <snm>Gardner</snm>
               <mi>P</mi>
               <fnm>Paul</fnm>
               <insr iid="I1"/>
               <email>pgardner@binf.ku.dk</email>
            </au>
            <au id="A3">
               <snm>Hedegaard</snm>
               <mi>M</mi>
               <fnm>Mads</fnm>
               <insr iid="I1"/>
               <email>mmhedegaard@bi.ku.dk</email>
            </au>
            <au id="A4">
               <snm>Vinther</snm>
               <fnm>Jeppe</fnm>
               <insr iid="I1"/>
               <email>jvinther@bi.ku.dk</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Molecular Evolution Group, Department of Molecular Biology, University of Copenhagen, Ole Maal&#248;es Vej 5, Building 4.1.27, DK-2200 Copenhagen N, Denmark</p>
            </ins>
         </insg>
         <source>BMC Genomics</source>
         <issn>1471-2164</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>1</issue>
         <fpage>244</fpage>
         <url>http://www.biomedcentral.com/1471-2164/8/244</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17645787</pubid>
               <pubid idtype="doi">10.1186/1471-2164-8-244</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>13</day>
               <month>6</month>
               <year>2007</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>23</day>
               <month>7</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>23</day>
               <month>7</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Weile et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Within the last decade a large number of noncoding RNA genes have been identified, but this may only be the tip of the iceberg. Using comparative genomics, a large number of sequences that have signals concordant with conserved RNA secondary structures have been discovered in the human genome. Moreover, genome wide transcription profiling with tiling arrays indicates that the majority of the genome is transcribed.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We have combined tiling array data with genome wide structural RNA predictions to search for novel noncoding and structural RNA genes that are expressed in the human neuroblastoma cell line SK-N-AS. Using this strategy, we identify thousands of human candidate RNA genes. To further verify the expression of these genes, we focused on candidate genes that had stable hairpin structures or a high level of covariance. Using northern blotting, we verify the expression of 2 out of 3 of the hairpin structures and 3 out of 9 high covariance structures in SK-N-AS cells.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>Our results demonstrate that many human noncoding, structured and conserved RNA genes remain to be discovered and that tissue specific tiling array data can be used in combination with computational predictions of sequences encoding structural RNAs to improve the search for such genes.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>The sequencing of the human genome marked the starting point of a very difficult task: to make sense of the enormous amount of information stored in the genome by annotating the functionally important regions. Emphasis was initially put on the protein coding DNA sequences, which are generally well conserved and can easily be converted into the corresponding protein sequence. However, in recent years it has become clear that large parts of the noncoding DNA present in the human genome are functional and that noncoding genes may be as abundant as protein coding genes <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>.</p>
         <p>Central to this realization has been the sequencing of additional mammalian genomes. Comparative genomics has demonstrated that the fraction of the human genome that is under purifying selection is much larger than the part that makes up the protein coding sequence, suggesting that many non protein coding regions of the genome have important functions <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. Conserved sequence elements in promoter, intron and untranslated regions (UTRs) control transcription and processing of mRNAs <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. Moreover, distant enhancer elements also influence transcription over long distances.</p>
         <p>In fact, such noncoding enhancer elements are the most highly conserved regions of the human genome <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. Another class of conserved noncoding sequence is the RNA genes that are transcribed, but do not encode any protein. Instead the functions of these genes depend on the RNA itself, which can be unstructured or adopt functional secondary structures through internal base pairing or pairing to other RNA molecules.</p>
         <p>In this way RNA can act as enzymes, structural scaffolds and cofactors for proteins. Structural RNA gene sequences are often less well conserved than protein coding and regulatory sequences, since it is the RNA secondary structure that is conserved rather than the primary sequence. Recently, computational methods that can detect the signatures of conserved RNA structure in aligned DNA sequences have been developed and have revealed that the human genome contains many thousands of potential structural RNA genes <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>. Some of these can be assigned to known RNA gene families such as tRNAs, rRNAs, snoRNAs and miRNAs, while others have no assigned functions. A common theme seems to be that many noncoding RNA genes have a very restricted expression. Often, they have low or no EST coverage, but this does not necessarily mean that they are not expressed or are nonfunctional <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. An interesting example of this is the noncoding RNA (ncRNA) HAR1F, which has undergone strong positive selection in the human lineage and is expressed only in Cajal-Retzius neurons in the developing human neocortex from 7 to 19 gestational weeks <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. Such spatially and temporally restricted expression makes it a daunting task to verify expression of computationally predicted structural RNAs <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. This may be especially true for RNA genes expressed in the brain, which is a very complex organ estimated to have thousands of different cell types.</p>
         <p>Advances in array technology have allowed unbiased genome wide analysis of RNA transcription using tiling arrays of overlapping probes spanning the entire euchromatic part of the human genome <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>. These RNA expression studies demonstrate that a large proportion of the human genome is transcribed and that the transcription is more complex than previously anticipated, with extensive use of alternative promoters, splicing and polyadenylation. So far tiling array analysis has been performed on RNA from a limited number of cell lines, but these experiments nevertheless indicate that large parts of the human genome are transcribed. These results are supported by findings from large scale cDNA cloning efforts that also reveal high transcriptional diversity and many ncRNAs <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>.</p>
         <p>We have combined data from structural RNA gene prediction <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> with tiling array data from the neuroblastoma cell line SK-N-AS <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B13">13</abbr></abbrgrp> to identify novel structural RNA genes expressed in this cell line. Using this strategy, we identify thousands of human candidate RNA genes that are most likely expressed in SK-N-AS cells. The list of candidates can be found at the CRUFTS homepage <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. For verification of expression we focused on candidates having energetically favorable hairpin structures or a high level of covariance. Using northern blotting, we verify the expression of 2 out of 3 of the hairpin structures. Moreover, 3 out of 9 of the structures with high covariance could be detected by northern blotting in SK-N-AS cells.</p>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <p>The identification of ncRNAs has been facilitated by comparative genomics and development of methods to detect RNA expression on a genome wide scale. In this work we combine genome tiling array expression data <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B13">13</abbr></abbrgrp> with genome sequence conservation <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> and secondary structure information <abbrgrp><abbr bid="B15">15</abbr></abbrgrp> in an effort to identify novel ncRNAs in the human genome.</p>
         <p>The genome tiling array data is derived from phase 2 of Affymetrix tiling array studies <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. Here, 10 chromosomes (6, 7, 13, 14, 19, 20, 21, 22, X and Y) of the human genome, corresponding to ~30% of the non-repetitive portion of the genome, are tiled upon microarrays at 5 base-pair intervals. Only non-repetitive regions are tiled due to the risk of cross hybridisation and the difficulty of determining which genomic region a multi-copy transcript is derived from. For this study we have used data from the neuroblastoma cell line (SK-N-AS) that was analyzed using a hidden Markov model trained to discriminate between transcribed and untranscribed regions <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. The combined conservation and secondary structure track is derived from a study using structural information on the conserved fraction of the human genome <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B16">16</abbr></abbrgrp>. The method is based upon a secondary structure prediction algorithm for folding sequence alignments <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> combined with an algorithm (called RNAz) that has been trained to discriminate between sequence alignments of ncRNA sequences and their randomized counterparts <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>.</p>
         <p>We intersected 88,319 genomic regions predicted to be expressed in SK-N-AS cells by tiling array analysis <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B13">13</abbr></abbrgrp> with 91,677 genomic regions predicted to contain conserved secondary structure (Figure <figr fid="F1">1</figr>) <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. To improve sensitivity, we used the least conservative prediction of secondary structure for the intersection. To further improve the predictions, we obtained multi-species alignments from the UCSC table browser <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> of human (hg17), chimpanzee (panTro1), dog (canFam1), mouse (mm5), rat (rn3), chicken (galGal2), zebrafish (danRer1) and Fugu (fr1) for the regions that showed evidence of both expression and structure. These alignments were re-scored with RNAz using more stringent settings. This produced 32,439 CRUFTS (Conserved RNAs of Unidentified Function that are Transcribed and Structured), which, when overlapping predictions are collapsed, map to 6,534 unique genomic regions.</p>
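         <p>The coordinate logic of this step (keep structured regions that overlap an expressed region, then collapse overlapping predictions into unique regions) can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function names, the half-open (chromosome, start, end) tuples and the toy coordinates are all assumptions made for illustration, and the RNAz re-scoring step is not modelled.</p>

```python
from collections import defaultdict


def intersect(expressed, structured):
    """Return the structured regions that overlap one or more expressed regions.

    Regions are hypothetical half-open (chrom, start, end) tuples.
    """
    expr_by_chrom = defaultdict(list)
    for chrom, start, end in expressed:
        expr_by_chrom[chrom].append((start, end))
    hits = []
    for chrom, start, end in structured:
        # Two half-open intervals overlap when each one starts before the other ends.
        if any(e_end > start and end > e_start
               for e_start, e_end in expr_by_chrom[chrom]):
            hits.append((chrom, start, end))
    return hits


def merge_overlaps(regions):
    """Collapse overlapping regions on each chromosome into unique regions."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    merged = []
    for chrom in sorted(by_chrom):
        cur_start, cur_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if cur_start is None:
                cur_start, cur_end = start, end
            elif cur_end > start:
                # Overlaps the region being built: extend it.
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        if cur_start is not None:
            merged.append((chrom, cur_start, cur_end))
    return merged


# Toy coordinates, invented for illustration only.
expressed = [("chr6", 100, 200), ("chr7", 50, 80)]
structured = [("chr6", 150, 260), ("chr6", 180, 300),
              ("chr6", 400, 450), ("chr7", 10, 40)]

candidates = intersect(expressed, structured)
unique_regions = merge_overlaps(candidates)
print(len(candidates), "candidates in", len(unique_regions), "unique regions")
```

         <p>In this toy example two overlapping candidates collapse into a single region, which mirrors how the number of unique genomic regions can be much smaller than the number of individual predictions.</p>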
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>Strategy used to identify structural non coding RNA genes. Schematic representation of the work-flow used to identify and verify CRUFTS.</p>
            </caption>
            <text>
               <p>Strategy used to identify structural non coding RNA genes. Schematic representation of the work-flow used to identify and verify CRUFTS. Multispecies conservation data [2], structured alignments [43] and the tiling array data [10,13] have all been published. For details and references see main text.</p>
            </text>
            <graphic file="1471-2164-8-244-1"/>
         </fig>
         <p>To investigate whether the CRUFTS contained already known ncRNAs, we used available annotations of human ncRNAs <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>. The ncRNAs used were: Xist, Telomerase RNA, HVG-1,2&amp;3, H19, RNase MRP, RNase P, tRNAs, Pseudo-tRNAs, rRNAs, small cytosolic RNAs (SRP, hY1, hY3, hY4, hY5), miRNAs and snoRNAs. The classical ncRNAs such as rRNA, tRNA, SRP etc. are classified as repeats by RepeatMasker <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> and are therefore not present in the CRUFTS dataset. Also, some rRNAs, tRNAs and SRPs were absent in the final set due to difficulties in producing correct genome alignments for these regions, which is critical for secondary structure prediction with RNAz. In subsequent versions of the genome alignments (17-way and beyond) these difficulties appear to have been overcome <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr></abbrgrp>. Of the 32,439 CRUFTS, 240 overlap the remaining known ncRNAs in our control data set (see Table <tblr tid="T1">1</tblr>), consistent with not all of these being expressed in the SK-N-AS cell line and not all ncRNAs being detected by the RNAz algorithm. Moreover, it is noteworthy that the SK-N-AS tiling array data used for our analysis is based on hybridization of cDNA originating from polyA selected RNA to the array, which probably excludes some ncRNAs from the CRUFTS dataset. All in all, after removing the known ncRNAs and CRUFTS overlapping 3' UTRs, we have 5,629 potential novel non-overlapping ncRNAs in the CRUFTS dataset. To further refine the dataset and reduce the number of false positives among the CRUFTS, we compared a number of parameters for the CRUFTS with those from the known ncRNAs (Figure <figr fid="F2">2</figr>). 
We find that the CRUFTS have a mean pairwise identity (PID) distribution that is similar to that of the control ncRNA set, except that many more CRUFTS have structures that have PIDs above 95% (Figure <figr fid="F2">2A</figr>). Previously, it has been shown that secondary structure signals are largely lost below 65% identity and that above 95% identity there is little supporting information from mutational analysis <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. Moreover, the RNAz algorithm detects many structures having PID above 95% and it is currently not known whether these represent new structural RNAs that are more highly conserved than known ncRNAs or false positives <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. We also noted that CRUFTS generally have sequence coverage in fewer species than the known ncRNAs (Figure <figr fid="F2">2B</figr>), which reflects that the ncRNAs in the known ncRNA set are well conserved. The covariance and RNAz SVM probability distributions of the CRUFTS are similar to the corresponding distributions of the ncRNAs (Figure <figr fid="F2">2C</figr> and <figr fid="F2">2D</figr>), but the known ncRNAs cluster in the RNAz high probability fraction. After considering the distributions of these different statistics, we applied the filters shown in Table <tblr tid="T2">2</tblr> to enrich for CRUFTS resembling the known ncRNAs in the dataset. These filters resulted in a 10-fold reduction of the amount of data (from 32,439 to 3,243 CRUFTS or 6,534 to 1,593 non-overlapping regions) and increased the enrichment of known ncRNAs 2.17 fold, which is highly significant (p = 6.6e-8) (see Table <tblr tid="T1">1</tblr>). Of the 1,593 non-overlapping regions present in the filtered CRUFTS dataset, 1,314 are potential novel ncRNAs (i.e. not a known ncRNA and not located in a 3' UTR).</p>
         <tbl id="T1">
            <title>
               <p>Table 1</p>
            </title>
            <caption>
               <p>Enrichment of known ncRNAs in subsets of CRUFTS</p>
            </caption>
            <tblbdy cols="3">
               <r>
                  <c ca="left">
                     <p>Scheme/Overlap</p>
                  </c>
                  <c ca="left">
                     <p>ncRNA Enrichment</p>
                  </c>
                  <c ca="left">
                     <p>ncRNA Families</p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>All</p>
                  </c>
                  <c ca="left">
                     <p>1.00 (1.00)</p>
                  </c>
                  <c ca="left">
                     <p>135 miRNA, 21 rRNA, 58 snoRNA, 9 snRNA, 17 Mt-tRNA</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Filtered (Table 2 parameters)</p>
                  </c>
                  <c ca="left">
                     <p>2.17 (6.630e-08)</p>
                  </c>
                  <c ca="left">
                     <p>miRNA, rRNA, snoRNA, snRNA, Mt-tRNA</p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>mRNA/EST</p>
                  </c>
                  <c ca="left">
                     <p>0.64 (1.000)</p>
                  </c>
                  <c ca="left">
                     <p>miRNA, snRNA, snoRNA</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>mRNA/EST (no UTR or exon)</p>
                  </c>
                  <c ca="left">
                     <p>2.00 (7.000e-05)</p>
                  </c>
                  <c ca="left">
                     <p>miRNA, snRNA, snoRNA</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>5' UTR</p>
                  </c>
                  <c ca="left">
                     <p>0.97 (0.5806)</p>
                  </c>
                  <c ca="left">
                     <p>miRNA</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Intron</p>
                  </c>
                  <c ca="left">
                     <p>0.97 (0.1148)</p>
                  </c>
                  <c ca="left">
                     <p>miRNA, rRNA, snoRNA, Mt-tRNA</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>3' UTR</p>
                  </c>
                  <c ca="left">
                     <p>0.00 (1.000)</p>
                  </c>
                  <c ca="left">
                     <p>-</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Intergenic</p>
                  </c>
                  <c ca="left">
                     <p>1.21 (4.448e-03)</p>
                  </c>
                  <c ca="left">
                     <p>miRNA, rRNA, snRNA, snoRNA</p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>EvoFold</p>
                  </c>
                  <c ca="left">
                     <p>4.67 (5.271e-06)</p>
                  </c>
                  <c ca="left">
                     <p>miRNA</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>InDel selection</p>
                  </c>
                  <c ca="left">
                     <p>1.74 (&lt;2.2e-16)</p>
                  </c>
                  <c ca="left">
                     <p>miRNA, rRNA, snoRNA</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>InDel selection (no miRNA)</p>
                  </c>
                  <c ca="left">
                     <p>0.46 (0.4115)</p>
                  </c>
                  <c ca="left">
                     <p>rRNA, snoRNA</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Transposonfree (10 k)</p>
                  </c>
                  <c ca="left">
                     <p>0.51 (0.9786)</p>
                  </c>
                  <c ca="left">
                     <p>miRNA</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Transposonfree (5 k)</p>
                  </c>
                  <c ca="left">
                     <p>1.17 (0.09059)</p>
                  </c>
                  <c ca="left">
                     <p>miRNA, rRNA, snoRNA</p>
                  </c>
               </r>
               <r>
                  <c cspan="3">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Covariance top 300</p>
                  </c>
                  <c ca="left">
                     <p>4.05 (4.277e-04)</p>
                  </c>
                  <c ca="left">
                     <p>miRNA, snoRNA, snRNA</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>RNAz probability top 300</p>
                  </c>
                  <c ca="left">
                     <p>6.76 (8.367e-09)</p>
                  </c>
                  <c ca="left">
                     <p>miRNA</p>
                  </c>
               </r>
            </tblbdy>
            <tblfn>
               <p>Enrichment values of known ncRNAs in each filtering method or genome annotation. Column 2 contains the degree of enrichment of the known ncRNAs for each dataset compared to the "All" CRUFTS dataset, with the P-value in parentheses. P-values for the enrichment were calculated using Fisher's exact test. In the final column, the ncRNA families contained within the annotation are indicated. See text for details and references.</p>
            </tblfn>
         </tbl>
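         <p>As an illustration of how an enrichment value and its P-value of the kind reported in Table 1 can be computed, the following minimal sketch uses only the Python standard library. The function names and the counts are hypothetical and are not taken from this study. The sketch also assumes a one-sided (enrichment) form of Fisher's exact test; whether the published P-values are one-sided or two-sided is not stated in the text.</p>

```python
from math import comb


def enrichment(known_in_subset, subset_size, known_in_all, all_size):
    """Fraction of known ncRNAs in the subset relative to the full set."""
    return (known_in_subset / subset_size) / (known_in_all / all_size)


def fisher_one_sided(known_in_subset, subset_size, known_in_all, all_size):
    """One-sided Fisher's exact test (assumption: enrichment direction).

    Probability of seeing known_in_subset or more known ncRNAs when
    subset_size items are drawn without replacement from all_size items,
    of which known_in_all are known ncRNAs (hypergeometric upper tail).
    """
    denominator = comb(all_size, subset_size)
    upper = min(subset_size, known_in_all)
    tail = 0
    for k in range(known_in_subset, upper + 1):
        tail += comb(known_in_all, k) * comb(all_size - known_in_all,
                                             subset_size - k)
    return tail / denominator


# Hypothetical counts, for illustration only.
all_size, known_in_all = 100, 10
subset_size, known_in_subset = 20, 5

fold = enrichment(known_in_subset, subset_size, known_in_all, all_size)
p_value = fisher_one_sided(known_in_subset, subset_size, known_in_all, all_size)
print(f"enrichment = {fold:.2f}, one-sided p = {p_value:.4f}")
```

         <p>With these invented counts the subset contains known ncRNAs at 2.5 times the background frequency. A subset depleted of known ncRNAs would give an enrichment below 1 and a one-sided P-value close to 1.</p>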
         <tbl id="T2">
            <title>
               <p>Table 2</p>
            </title>
            <caption>
               <p>Parameters used for filtering the CRUFTS</p>
            </caption>
            <tblbdy cols="2">
               <r>
                  <c ca="left">
                     <p>Feature</p>
                  </c>
                  <c ca="center">
                     <p>Threshold</p>
                  </c>
               </r>
               <r>
                  <c cspan="2">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>RNAalifold covariation measure</p>
                  </c>
                  <c ca="center">
                     <p>&lt; 0</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Number of species</p>
                  </c>
                  <c ca="center">
                     <p>> 4</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>Mean pairwise sequence identity</p>
                  </c>
                  <c ca="center">
                     <p>> 65% and &lt; 95%</p>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>RNAz SVM probability</p>
                  </c>
                  <c ca="center">
                     <p>> 0.90</p>
                  </c>
               </r>
            </tblbdy>
         </tbl>
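         <p>The thresholds in Table 2 amount to a conjunction of four simple tests per candidate. The sketch below shows one way to express them; the record type, its field names and the example values are assumptions made for illustration and do not correspond to actual CRUFTS entries.</p>

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """Hypothetical record holding the four features listed in Table 2."""
    name: str
    covariation: float   # RNAalifold covariation measure
    n_species: int       # number of species in the alignment
    mean_pid: float      # mean pairwise sequence identity, in percent
    svm_prob: float      # RNAz SVM probability


def passes_filters(c):
    """Apply the Table 2 thresholds; all four must hold."""
    return (0 > c.covariation
            and c.n_species > 4
            and 95 > c.mean_pid > 65
            and c.svm_prob > 0.90)


# Invented example values, for illustration only.
candidates = [
    Candidate("example_1", -1.2, 6, 80.0, 0.97),   # passes every threshold
    Candidate("example_2", -0.5, 4, 85.0, 0.99),   # too few species
    Candidate("example_3", -2.0, 7, 97.5, 0.95),   # identity above 95%
    Candidate("example_4", 0.3, 8, 70.0, 0.99),    # covariation not below 0
]

kept = [c.name for c in candidates if passes_filters(c)]
print(kept)
```

         <p>All comparisons are strict, following the thresholds as printed in Table 2, so a candidate with exactly 4 species or exactly 95% identity would be rejected.</p>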
         <fig id="F2">
            <title>
               <p>Figure 2</p>
            </title>
            <caption>
               <p>Characteristics of CRUFTS and known ncRNAs</p>
            </caption>
            <text>
               <p>Characteristics of CRUFTS and known ncRNAs. Histograms showing the distributions of mean pairwise sequence identity A), species coverage B), covariance C) and RNAz probability D) for the CRUFTS and known ncRNAs.</p>
            </text>
            <graphic file="1471-2164-8-244-2"/>
         </fig>
         <p>To further characterize our CRUFTS data set we mapped a number of other genome annotations to the CRUFTS. Using annotations from the Refseq database <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> and the human EST database, we find that CRUFTS overlapping with known ncRNAs are enriched in intergenic regions and in regions that have mRNA/EST evidence, but no overlapping exon or UTR sequence (see Table <tblr tid="T1">1</tblr>). This corresponds to what one would expect given the types of ncRNAs in the control ncRNA set and suggests that CRUFTS located in intergenic regions, or having mRNA/EST evidence but no overlapping exon or UTR sequence, are more likely to represent true ncRNA genes.</p>
         <p>Of particular interest is a study by Pedersen et al. that implemented a probabilistic approach (called EvoFold) based on phylogenetic stochastic context-free grammars to predict conserved secondary structures in the human genome <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. In contrast to the RNAz algorithm, EvoFold does not use folding energy to predict RNA structures, but rather calculates the probability of an RNA structure, while taking the phylogeny into consideration. We find that the CRUFTS that overlap EvoFold predictions are enriched for known miRNAs (Table <tblr tid="T1">1</tblr>, p = 5.3e-6), showing that these two structural RNA gene finders complement each other and that the CRUFTS overlapping EvoFold predictions are more likely to be miRNAs than the CRUFTS in general. Many of the CRUFTS are located in intergenic regions that have no known function. Two approaches that have the potential to detect genomic regions that are under purifying selection have recently been published <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr></abbrgrp>. Lunter et al. searched the genome for insertion and deletion (indel) free regions <abbrgrp><abbr bid="B31">31</abbr></abbrgrp> and found clear evidence of purifying selection against indels in many regions of the genome. Interestingly, the majority of indel free regions are located outside protein coding genes and most known miRNA genes are located within indel free regions <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. We find that CRUFTS that overlap an indel free region of the genome are significantly enriched in known ncRNAs (Table <tblr tid="T1">1</tblr>, p &lt; 2.2e-16). These observations suggest that the CRUFTS that overlap indel free regions of the human genome are more likely to be ncRNAs (and miRNAs in particular) that have important functions sensitive to insertions and deletions in the sequence. Simons et al. 
have made a similar analysis of transposon-free regions of the human genome <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. As shown in Table <tblr tid="T1">1</tblr>, the CRUFTS overlapping transposon-free regions were only slightly enriched for ncRNAs (p = 0.09 for the 5 kb regions), indicating that the known ncRNAs are rather insensitive to insertion of transposons in a 5 kb window containing the ncRNA. All the CRUFTS datasets and their annotations can be accessed at the CRUFTS homepage <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>.</p>
         <p>Next, we wanted to experimentally verify the expression of some of the CRUFTS in the SK-N-AS cell line. When CRUFTS were ranked on the RNAalifold measure of covariance <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, known ncRNAs including miRNAs, snoRNAs and snRNAs were enriched in the top 300 rankings (p = 4.3e-4) (see Table <tblr tid="T1">1</tblr> and Figure <figr fid="F3">3A</figr>). We chose 9 structures from the top 25 CRUFTS ranked on covariance and designed complementary probes for northern blotting. Using RNA enriched for small RNAs and isolated from SK-N-AS cells, three out of the nine selected CRUFTS could be repeatedly detected by northern blotting using LNA modified DNA probes (Figure <figr fid="F3">3B</figr>). As a positive control we used the U68 H/ACA snoRNA, which ranked high on the covariance sorted list. A list of these investigated CRUFTS along with their predicted structures and the probe sequences can be found in Additional file <supplr sid="S1">1</supplr> and is exemplified for C3462 in Figure <figr fid="F3">3C</figr>. The CRUFTS that were not detected by our northern blots may represent sequences that are not RNA genes or RNA genes that are expressed in SK-N-AS cells at levels below the detection level of our northern blots.</p>
         <suppl id="S1">
            <title>
               <p>Additional file 1</p>
            </title>
            <text>
               <p>Alignments and structures of the experimentally investigated CRUFTS and the sequences of the probes used for northern blotting.</p>
            </text>
            <file name="1471-2164-8-244-S1.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <fig id="F3">
            <title>
               <p>Figure 3</p>
            </title>
            <caption>
               <p>Experimental verification of CRUFTS showing high RNAz covariance</p>
            </caption>
            <text>
               <p>Experimental verification of CRUFTS showing high RNAz covariance. A) Histogram showing enrichment of known ncRNAs in the top 300 CRUFTS sorted on covariance. B) Northern blots with specific LNA modified DNA probes for three high covariance CRUFTS. The U68 snoRNA was used as positive control. C) Alignment and conserved secondary structure of the CRUFTS C3462. The location of the probe used for detection is indicated. The positions in the alignments and the secondary structure are color-coded according to the conservation of the basepair interaction following the RNAz conventions [9]. Green indicates that 3 different types of pairs (e.g. G-C in human, G-U in dog and A-U in zebrafish) support the interaction. Yellow indicates that the base pair is supported by 2 types of pairs and red that only a single pair-type supports the interaction. The intensity of the color coding fades with the number of sequences in conflict with the predicted interaction.</p>
            </text>
            <graphic file="1471-2164-8-244-3"/>
         </fig>
         <p>Alternatively, they may be expressed as part of long RNA transcripts that would not be detected in our northern blots, or be processed into smaller RNAs not targeted by our probes. The three CRUFTS that are detected by our probes do not match any of the profiles in the RFAM database and do not resemble any previously described ncRNA gene. The probes hybridize to RNAs in the range between 70 and 95 nt. This size range is typical of C/D snoRNAs <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, but none of the candidates have canonical C/D boxes, indicating that these CRUFTS expressed in the SK-N-AS cell line are not snoRNAs, but belong to currently uncharacterized ncRNA gene families. The C4796 CRUFTS is intergenic, whereas C6194 and C3462 are located in introns of latent transforming growth factor beta binding protein 2 (LTBP2) and transmembrane protease, serine 6 (TMPRSS6), respectively. All three detected covariance CRUFTS are located in indel-free regions <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. UCSC screenshots of the genomic neighborhoods of the detected covariance CRUFTS can be found in Additional file <supplr sid="S2">2</supplr>.</p>
         <suppl id="S2">
            <title>
               <p>Additional file 2</p>
            </title>
            <text>
               <p>UCSC screenshots of the genomic neighborhoods of the verified covariance CRUFTS.</p>
            </text>
            <file name="1471-2164-8-244-S2.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <p>The RNAz algorithm is dependent on folding energy, and since miRNA genes generally form stable secondary structures consisting of a hairpin, RNAz shows high sensitivity for miRNA genes <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp>. When the CRUFTS were sorted according to their RNAz SVM probability, known and predicted miRNA genes were enriched in the top 300 rankings (p = 8.4e-9, see Figure <figr fid="F4">4A</figr>, Table <tblr tid="T1">1</tblr>) and many structures with miRNA-like hairpins can be observed. We found that three of the CRUFTS within the top 300 RNAz SVM rankings overlapped with miRNA candidates previously predicted by phylogenetic shadowing by Plasterk and coworkers <abbrgrp><abbr bid="B35">35</abbr></abbrgrp> and also with the indel-free regions described by Lunter et al. <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. Using LNA modified DNA probes complementary to each side of these hairpin structures (2 probes for each candidate structure, see Figure <figr fid="F4">4C</figr> and Additional file <supplr sid="S1">1</supplr>), two of the three probe pairs hybridized specifically to SK-N-AS RNA enriched for small RNAs (Figure <figr fid="F4">4B</figr>).</p>
         <fig id="F4">
            <title>
               <p>Figure 4</p>
            </title>
            <caption>
               <p>Experimental verification of CRUFTS showing high RNAz SVM probability</p>
            </caption>
            <text>
               <p>Experimental verification of CRUFTS showing high RNAz SVM probability. A) Histogram showing enrichment of known ncRNAs in the top 300 CRUFTS sorted on RNAz SVM probability. B) Northern blots with specific LNA modified DNA probes for three high covariance CRUFTS. The U68 snoRNA is the positive control. C) Alignment and conserved secondary structure of the CRUFTS C3462. The location of the probe used for detection is indicated. The positions in the alignments and the secondary structure are color-coded according to the conservation of the base pair interaction following the RNAz conventions [9]. Green indicates that 3 different types of pairs (e.g. G-C in human, G-U in dog and A-U in zebrafish) support the interaction. Yellow indicates that the base pair is supported by 2 types of pairs and red that only a single pair type supports the interaction. The intensity of the color coding fades with the number of sequences in conflict with the predicted interaction.</p>
            </text>
            <graphic file="1471-2164-8-244-4"/>
         </fig>
         <p>However, the signals observed with these probes were all in the 75&#8211;90 nt range, and we saw no signal in the size range of mature miRNAs. This was not due to loss of small RNAs in our RNA preparation, since a known miRNA (miR-20) was detected with a miR-20 specific probe (Figure <figr fid="F4">4A</figr>). The fact that we observe a signal of similar size with probes targeted to both sides of the putative miRNA hairpins indicates that the probes do detect a pre-miRNA-like RNA hairpin expressed in the SK-N-AS cell line.</p>
         <p>During the course of this study, expression of the mature form of CRUFTS C4801 (candidate 225 from Berezikov et al., <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>) has been verified by cloning from mouse brain and by a modified microarray-based detection system (RAKE) <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. Previously, it has been observed that miRNA-138 accumulates in the pre-miRNA form in the cytoplasm in some tissues and is only processed to the mature form in restricted tissues <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. We have tested a panel of cell lines originating from different tissues with probes for C2780 and C4801 and find that 75&#8211;90 nt RNAs are detected in most cell lines and tissues, but no RNAs corresponding to mature forms (~21 nt) (Additional file <supplr sid="S4">4</supplr>). It is therefore possible that miRNA processing of C4801 and possibly C2780 is regulated and occurs only in restricted tissues. However, we cannot completely rule out that we fail to detect the mature miRNA forms of these CRUFTS miRNA candidates because our northern probes do not have sufficient overlap with the mature form of the miRNA. UCSC screenshots of the genomic neighborhoods of the detected hairpin CRUFTS can be found in Additional file <supplr sid="S3">3</supplr> online. Interestingly, C4801 is located close to miR-99b, miR-125a and miR-let-7e on chromosome 19, suggesting that C4801 is a new member of this miRNA cluster.</p>
         <suppl id="S3">
            <title>
               <p>Additional file 3</p>
            </title>
            <text>
               <p>UCSC screenshots of the genomic neighborhoods of the verified hairpin CRUFTS.</p>
            </text>
            <file name="1471-2164-8-244-S3.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S4">
            <title>
               <p>Additional file 4</p>
            </title>
            <text>
               <p>Northern blots for the C2780 and C4801 CRUFTS on RNA isolated from the SK-N-AS, U87, U373, HeLa, C2C12, HUH-7 and MCF-7 cell lines.</p>
            </text>
            <file name="1471-2164-8-244-S4.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <p>Other studies have used strategies that are similar to ours in order to identify novel ncRNAs. Babak et al. <abbrgrp><abbr bid="B38">38</abbr></abbrgrp> used the QRNA algorithm <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> to search for ncRNAs in human-mouse pairwise alignments from intergenic and intronic regions conserved between human, mouse and rat. A custom mouse DNA array with 6 probes for each of 3,478 predicted ncRNAs was hybridized with RNA from 16 mouse tissues.</p>
         <p>The 55 candidates that showed the highest signal on the array were chosen for northern blotting, which confirmed the expression of 8 candidates. Surprisingly, none of these candidates could be detected in human tissues, leading the authors to speculate that conserved and transcribed intergenic and intronic regions are not independent functional elements, but may have species or lineage specific functions <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. Babak et al. also investigated the overlap between their candidates and tiling array data <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> and found that they do not overlap more than would be expected by chance. Our study is not directly comparable with the study of Babak et al. We have used multiple alignments and RNAz <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> rather than pairwise alignments and QRNA <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> to predict conserved secondary structure. Moreover, we use the properties of the predicted secondary structures and the tiling data for filtering our predictions before verifying expression by northern blotting. These differences may explain our higher success rate in the northern verifications. In another study, Washietl et al. used RNAz <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> and EvoFold <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> secondary structure predictions to identify potential ncRNAs in the ENCODE regions <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. From a selection of 175 high-scoring predictions that was aided by visual inspection, 43 were detected by RT-PCR on RNA isolated from 6 different tissues. Interestingly, the predictions that were supported by tiling array expression were more likely to yield positive RT-PCR results (29% compared to 19% without support from tiling) <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. These results support our finding that it is possible to enrich for structural RNA genes by combining RNA structure predictions with tiling array data.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>We have integrated tiling array expression data with different annotations derived from comparative genomics to search for structural RNA genes that are expressed in the human neuroblastoma cell line SK-N-AS. In this way, we identified several thousand genomic regions (CRUFTS) that are strong candidates for being structural RNA genes. Using northern blotting, we verified the expression of 5 out of 12 investigated CRUFTS in the SK-N-AS cell line. Three of the verified CRUFTS cannot be assigned to existing ncRNA families and could belong to novel ncRNA classes. The remaining two CRUFTS detected by northern blotting probably belong to the miRNA family. Our results indicate that many human noncoding, structured and conserved RNA genes remain to be discovered and that tiling array data can be used in combination with computational predictions of structural RNAs to detect novel ncRNA genes. Our strategy could easily be applied to other tiling array datasets and new annotations from comparative sequence analysis and should facilitate the identification of novel ncRNAs. The CRUFTS data can be accessed at the CRUFTS homepage <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Bioinformatic analysis</p>
            </st>
            <p>To produce a set of predictions enriched for both novel and known ncRNAs, we located regions where an annotation of conserved, structured RNA-like elements overlaps an unbiased transcription annotation. The essential features of our pipeline are outlined in Figure <figr fid="F1">1</figr>. Beginning with the 88,319 genomic regions from the least conservative mammalian RNAz annotation <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B40">40</abbr></abbrgrp> and 93,917 genomic regions from the ExpressHMM analysis of Affymetrix phase 2 human genome tiling arrays <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B41">41</abbr></abbrgrp>, we used the UCSC table browser <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> to produce a dataset of 4,160 genomic regions that overlapped both the RNAz and ExpressHMM predictions.</p>
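            <p>The overlap step can be illustrated with a minimal sketch. It is not the UCSC table browser implementation; the function name, the (chrom, start, end) tuple layout with half-open coordinates, and the toy coordinates are all assumptions made for illustration only.</p>

```python
from collections import defaultdict


def regions_overlapping(query, reference):
    """Return the query regions that overlap at least one reference region.

    Regions are (chrom, start, end) tuples with half-open coordinates
    (an assumption of this sketch, not a statement about the original data).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in reference:
        by_chrom[chrom].append((start, end))

    hits = []
    for chrom, start, end in query:
        # Two half-open intervals overlap when each one ends after the
        # other one starts.
        if any(ref_end > start and end > ref_start
               for ref_start, ref_end in by_chrom[chrom]):
            hits.append((chrom, start, end))
    return hits


# Hypothetical toy coordinates, not taken from the study.
rnaz = [("chr1", 100, 220), ("chr1", 500, 620), ("chr2", 50, 170)]
expressed = [("chr1", 200, 300), ("chr2", 400, 450)]
print(regions_overlapping(rnaz, expressed))
```

            <p>The linear scan per chromosome keeps the sketch short; for annotations with tens of thousands of regions, sorted intervals or an interval tree would be the more usual choice.</p>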
            <p>For these regions, 7,703 alignments of genomic sequences from human (hg17), chimpanzee (panTro1), dog (canFam1), mouse (mm5), rat (rn3), chicken (galGal2), zebrafish (danRer1) and Fugu (fr1) were obtained using the UCSC table browser. These alignments were fed into the RNAz algorithm and rescored using the following parameters. The alignments were sliced into blocks of length 120 with a step size of 20, and only alignments with more than 65 columns were reported. All slices with an SVM-derived probability greater than 0.5 were reported. Both strands of the genome were tested for structure potential, as the tiling array data are not strand specific. This resulted in 32,439 genomic regions, or 6,534 regions if overlapping predictions are combined.</p>
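            <p>The slicing, filtering and merging described above can be sketched as follows. This is a hedged illustration rather than the RNAz tooling itself: the function names are hypothetical, the probabilities are invented, and the handling of the final, shorter slice (kept only if it has more than 65 columns) is one possible reading of the parameters.</p>

```python
def slice_alignment(n_columns, window=120, step=20, min_columns=65):
    """Yield (start, end) column ranges covering an alignment of n_columns."""
    for start in range(0, n_columns, step):
        end = min(start + window, n_columns)
        # Assumption of this sketch: slices with 65 columns or fewer are dropped.
        if end - start > min_columns:
            yield (start, end)
        if end == n_columns:
            break


def merge_overlapping(intervals):
    """Combine overlapping (start, end) intervals into non-overlapping ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and merged[-1][1] > start:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(m) for m in merged]


# Hypothetical SVM probabilities for the slices of a 200-column alignment.
slices = list(slice_alignment(200))
probabilities = [0.9, 0.7, 0.2, 0.1, 0.8]
kept = [s for s, p in zip(slices, probabilities) if p > 0.5]
print(slices)
print(merge_overlapping(kept))
```

            <p>Merging the retained slices is what reduces a large number of overlapping windows to a smaller number of combined regions, as in the step from 32,439 to 6,534 regions.</p>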
            <p>The accuracy of the predictions was evaluated using a number of different annotations of human ncRNAs. Most of the ncRNAs used (214 miRNAs, 17 miscellaneous RNAs (Xist, Telomerase RNA, HVG-1,2 and 3, H19, RNase MRP, RNase P), 636 tRNAs, 705 rRNAs, 1805 small cytoplasmic RNAs (SRP, hY1, hY3, hY4, hY5) and 1103 snoRNAs) were mapped onto the human genome by Jones &amp; Eddy <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. In addition, we used the following ncRNA annotations: the ENSEMBL v37 ncRNA track, which annotates 4156 human ncRNAs <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>, a set of 332 miRNAs obtained from miRBase (ver 8.0) <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>, 1435 snoRNAs from snoRNA-LBME-db <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> and 441 tRNA and 170 Pseudo-tRNAs obtained from the genomic tRNA database <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>.</p>
            <p>Some predicted ncRNAs were also noted, but these were not used for evaluating the accuracy of the predictions. These were 674, 133 and 975 miRNA candidates from miRMAP <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>, the colorectal miRNAome <abbrgrp><abbr bid="B42">42</abbr></abbrgrp> and miRNA shadowing <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>, respectively. Overlaps with protein coding features were determined using the Refseq database <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Cell culture</p>
            </st>
            <p>SK-N-AS neuroblastoma cells (ATCC # CRL-2137) were cultured as monolayers in Dulbecco's modified Eagle medium (Invitrogen) supplemented with 2 mM L-glutamate (Invitrogen), 10% fetal bovine serum (Invitrogen) and antibiotics (penicillin 50 units/ml and streptomycin 50 &#956;g/ml, Invitrogen) at 37&#176;C and 5% CO2. Cells for RNA extraction were harvested at passages 8&#8211;20 at 90&#8211;95% confluence.</p>
         </sec>
         <sec>
            <st>
               <p>Northern Blotting for small RNAs</p>
            </st>
            <p>RNA samples enriched for small RNAs were extracted using the mirVana extraction kit according to the recommendations of the manufacturer (Ambion). The integrity and concentration of the RNA samples were evaluated by spectrophotometry (Nano-drop ND-1000) and agarose gel electrophoresis.</p>
            <p>2 &#956;g of the RNA samples enriched for small RNAs were run on 12% denaturing polyacrylamide gels together with the Decade marker (Ambion) for about 3 hours at 250 V. The gels were stained with ethidium bromide in 0.5 &#215; TBE for 45 min. The RNA was blotted onto Hybond+ N membranes (Amersham Biosciences) in a semidry blotter (BIO-RAD trans-blot SD) at 20 V for 1 hour and crosslinked twice with auto crosslinking settings in a UV Stratalinker 1800 from Stratagene. Crosslinked membranes were stored at 4&#176;C.</p>
            <p>20 pmol of LNA modified DNA oligos (Sigma-Proligo) were end-labeled with &#945;-<sup>32</sup>P UTP (3000 Ci/mmol, 10 mCi/ml, Amersham) using T4 PNK (Roche) and purified through NucAway spin columns according to the recommendations of the manufacturer (Ambion). 2&#8211;5 &#956;l (of 20 &#956;l total) of the eluates from the NucAway columns was added to 10 ml of Ultrahyb-Oligo hybridization buffer (Ambion) in hybridization tubes and used for hybridization of the blotted membranes overnight at 42&#176;C in an Apollo HP9300 hybridization oven. The blotted membranes were washed twice at 68&#176;C for 30 min in wash buffer (2&#215; SSC and 0.5% SDS). Films (Kodak) were exposed to the blotted membranes for 2&#8211;6 days at -80&#176;C using intensifying screens (Amersham). All northern blots were replicated at least twice with independent RNA preparations.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>CRUFTS: Conserved RNAs of Unidentified Function that are Transcribed and Structured, ncRNA: noncoding RNA.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>CW designed and performed the northern blot experiments with help from JV and MMH. PPG did the bioinformatic analysis. JV wrote the manuscript with help from CW, PPG and MMH. PPG and JV conceived and designed the study.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>This work is funded by a Carlsberg Foundation Grant (21-00-0680) to the Molecular Evolution Group.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>RNA regulation: a new genetics?</p>
            </title>
            <aug>
               <au>
                  <snm>Mattick</snm>
                  <fnm>JS</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>316</fpage>
            <lpage>323</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrg1321</pubid>
                  <pubid idtype="pmpid" link="fulltext">15131654</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Siepel</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bejerano</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Pedersen</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Hinrichs</snm>
                  <fnm>AS</fnm>
               </au>
               <au>
                  <snm>Hou</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rosenbloom</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Clawson</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Spieth</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hillier</snm>
                  <fnm>LW</fnm>
               </au>
               <au>
                  <snm>Richards</snm>
                  <fnm>S</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <fpage>1034</fpage>
            <lpage>1050</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1182216</pubid>
                  <pubid idtype="pmpid" link="fulltext">16024819</pubid>
                  <pubid idtype="doi">10.1101/gr.3715005</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals</p>
            </title>
            <aug>
               <au>
                  <snm>Xie</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kulbokas</snm>
                  <fnm>EJ</fnm>
               </au>
               <au>
                  <snm>Golub</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Mootha</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Lindblad-Toh</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Kellis</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2005</pubdate>
            <volume>434</volume>
            <fpage>338</fpage>
            <lpage>345</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature03441</pubid>
                  <pubid idtype="pmpid" link="fulltext">15735639</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Ultraconserved elements in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Bejerano</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Pheasant</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Makunin</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Stephen</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Mattick</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2004</pubdate>
            <volume>304</volume>
            <fpage>1321</fpage>
            <lpage>1325</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1098119</pubid>
                  <pubid idtype="pmpid" link="fulltext">15131266</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Washietl</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hofacker</snm>
                  <fnm>IL</fnm>
               </au>
               <au>
                  <snm>Lukasser</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Huttenhofer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Stadler</snm>
                  <fnm>PF</fnm>
               </au>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>2005</pubdate>
            <volume>23</volume>
            <fpage>1383</fpage>
            <lpage>1390</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nbt1144</pubid>
                  <pubid idtype="pmpid">16273071</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Identification and classification of conserved RNA secondary structures in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Pedersen</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Bejerano</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Siepel</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Rosenbloom</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Lindblad-Toh</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>PLoS Comput Biol</source>
            <pubdate>2006</pubdate>
            <volume>2</volume>
            <fpage>e33</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1440920</pubid>
                  <pubid idtype="pmpid" link="fulltext">16628248</pubid>
                  <pubid idtype="doi">10.1371/journal.pcbi.0020033</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Experimental validation of the regulated expression of large numbers of non-coding RNAs from the mouse genome</p>
            </title>
            <aug>
               <au>
                  <snm>Ravasi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Suzuki</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Pang</snm>
                  <fnm>KC</fnm>
               </au>
               <au>
                  <snm>Katayama</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Furuno</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Okunishi</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Fukuda</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ru</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Frith</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Gongora</snm>
                  <fnm>MM</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Res</source>
            <pubdate>2006</pubdate>
            <volume>16</volume>
            <fpage>11</fpage>
            <lpage>19</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1356124</pubid>
                  <pubid idtype="pmpid" link="fulltext">16344565</pubid>
                  <pubid idtype="doi">10.1101/gr.4200206</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>An RNA gene expressed during cortical development evolved rapidly in humans</p>
            </title>
            <aug>
               <au>
                  <snm>Pollard</snm>
                  <fnm>KS</fnm>
               </au>
               <au>
                  <snm>Salama</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Lambert</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Lambot</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Coppens</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Pedersen</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Katzman</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>King</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Onodera</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Siepel</snm>
                  <fnm>A</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2006</pubdate>
            <volume>443</volume>
            <fpage>167</fpage>
            <lpage>172</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature05113</pubid>
                  <pubid idtype="pmpid" link="fulltext">16915236</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Fast and reliable prediction of noncoding RNAs</p>
            </title>
            <aug>
               <au>
                  <snm>Washietl</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hofacker</snm>
                  <fnm>IL</fnm>
               </au>
               <au>
                  <snm>Stadler</snm>
                  <fnm>PF</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2005</pubdate>
            <volume>102</volume>
            <fpage>2454</fpage>
            <lpage>2459</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">548974</pubid>
                  <pubid idtype="pmpid" link="fulltext">15665081</pubid>
                  <pubid idtype="doi">10.1073/pnas.0409169102</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution</p>
            </title>
            <aug>
               <au>
                  <snm>Cheng</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kapranov</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Drenkow</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Dike</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Brubaker</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Patel</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Long</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Stern</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Tammana</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Helt</snm>
                  <fnm>G</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2005</pubdate>
            <volume>308</volume>
            <fpage>1149</fpage>
            <lpage>1154</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1108625</pubid>
                  <pubid idtype="pmpid" link="fulltext">15790807</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Examples of the complex architecture of the human transcriptome revealed by RACE and high-density tiling arrays</p>
            </title>
            <aug>
               <au>
                  <snm>Kapranov</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Drenkow</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Cheng</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Long</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Helt</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Dike</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gingeras</snm>
                  <fnm>TR</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <fpage>987</fpage>
            <lpage>997</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1172043</pubid>
                  <pubid idtype="pmpid" link="fulltext">15998911</pubid>
                  <pubid idtype="doi">10.1101/gr.3455305</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>The transcriptional landscape of the mammalian genome</p>
            </title>
            <aug>
               <au>
                  <snm>Carninci</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Kasukawa</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Katayama</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gough</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Frith</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Maeda</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Oyama</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Ravasi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Lenhard</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Wells</snm>
                  <fnm>C</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2005</pubdate>
            <volume>309</volume>
            <fpage>1559</fpage>
            <lpage>1563</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1112014</pubid>
                  <pubid idtype="pmpid" link="fulltext">16141072</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>A hidden Markov model approach for determining expression from genomic tiling micro arrays</p>
            </title>
            <aug>
               <au>
                  <snm>Munch</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Gardner</snm>
                  <fnm>PP</fnm>
               </au>
               <au>
                  <snm>Arctander</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Krogh</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>239</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1481622</pubid>
                  <pubid idtype="pmpid" link="fulltext">16672042</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-7-239</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>CRUFTS homepage</p>
            </title>
            <url>http://projects.binf.ku.dk/pgardner/CRUFTS/</url>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Washietl</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hofacker</snm>
                  <fnm>IL</fnm>
               </au>
               <au>
                  <snm>Lukasser</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Huttenhofer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Stadler</snm>
                  <fnm>PF</fnm>
               </au>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>2005</pubdate>
            <volume>23</volume>
            <fpage>1383</fpage>
            <lpage>1390</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nbt1144</pubid>
                  <pubid idtype="pmpid" link="fulltext">16273071</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Washietl</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hofacker</snm>
                  <fnm>IL</fnm>
               </au>
               <au>
                  <snm>Lukasser</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Huttenhofer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Stadler</snm>
                  <fnm>PF</fnm>
               </au>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>2005</pubdate>
            <volume>23</volume>
            <fpage>1383</fpage>
            <lpage>1390</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nbt1144</pubid>
                  <pubid idtype="pmpid" link="fulltext">16273071</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Secondary structure prediction for aligned RNA sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Hofacker</snm>
                  <fnm>IL</fnm>
               </au>
               <au>
                  <snm>Fekete</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Stadler</snm>
                  <fnm>PF</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2002</pubdate>
            <volume>319</volume>
            <fpage>1059</fpage>
            <lpage>1066</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0022-2836(02)00308-X</pubid>
                  <pubid idtype="pmpid" link="fulltext">12079347</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Washietl</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hofacker</snm>
                  <fnm>IL</fnm>
               </au>
               <au>
                  <snm>Lukasser</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Huttenhofer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Stadler</snm>
                  <fnm>PF</fnm>
               </au>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>2005</pubdate>
            <volume>23</volume>
            <fpage>1383</fpage>
            <lpage>1390</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nbt1144</pubid>
                  <pubid idtype="pmpid" link="fulltext">16273071</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>The UCSC Table Browser data retrieval tool</p>
            </title>
            <aug>
               <au>
                  <snm>Karolchik</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Hinrichs</snm>
                  <fnm>AS</fnm>
               </au>
               <au>
                  <snm>Furey</snm>
                  <fnm>TS</fnm>
               </au>
               <au>
                  <snm>Roskin</snm>
                  <fnm>KM</fnm>
               </au>
               <au>
                  <snm>Sugnet</snm>
                  <fnm>CW</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>D493</fpage>
            <lpage>D496</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308837</pubid>
                  <pubid idtype="pmpid" link="fulltext">14681465</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh103</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>ncRNA annotations by Jones and Eddy</p>
            </title>
            <url>ftp://ftp.genetics.wustl.edu/pub/eddy/annotation/human-hg16</url>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Ensembl 2006</p>
            </title>
            <aug>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Andrews</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Caccamo</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Clarke</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Coates</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Cox</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Cunningham</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Curwen</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Cutts</snm>
                  <fnm>T</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>D556</fpage>
            <lpage>D561</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1347495</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381931</pubid>
                  <pubid idtype="doi">10.1093/nar/gkj133</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>miRBase: microRNA sequences, targets and gene nomenclature</p>
            </title>
            <aug>
               <au>
                  <snm>Griffiths-Jones</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Grocock</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>van Dongen</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bateman</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Enright</snm>
                  <fnm>AJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>D140</fpage>
            <lpage>D144</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1347474</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381832</pubid>
                  <pubid idtype="doi">10.1093/nar/gkj112</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>snoRNA-LBME-db, a comprehensive database of human H/ACA and C/D box snoRNAs</p>
            </title>
            <aug>
               <au>
                  <snm>Lestrade</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Weber</snm>
                  <fnm>MJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>D158</fpage>
            <lpage>D162</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1347365</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381836</pubid>
                  <pubid idtype="doi">10.1093/nar/gkj002</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>The Genomic tRNA Database</p>
            </title>
            <aug>
               <au>
                  <cnm>GtRDB</cnm>
               </au>
            </aug>
            <url>http://rna.wustl.edu/GtRDB/</url>
         </bibl>
         <bibl id="B25">
            <title>
               <p>RepeatMasker Open-3.0 1996&#8211;2004</p>
            </title>
            <url>http://www.repeatmasker.org</url>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Aligning multiple genomic sequences with the threaded blockset aligner</p>
            </title>
            <aug>
               <au>
                  <snm>Blanchette</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Riemer</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Elnitski</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Smit</snm>
                  <fnm>AF</fnm>
               </au>
               <au>
                  <snm>Roskin</snm>
                  <fnm>KM</fnm>
               </au>
               <au>
                  <snm>Baertsch</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Rosenbloom</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Clawson</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>ED</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Res</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <fpage>708</fpage>
            <lpage>715</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">383317</pubid>
                  <pubid idtype="pmpid" link="fulltext">15060014</pubid>
                  <pubid idtype="doi">10.1101/gr.1933104</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Baertsch</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hinrichs</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2003</pubdate>
            <volume>100</volume>
            <fpage>11484</fpage>
            <lpage>11489</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">208784</pubid>
                  <pubid idtype="pmpid" link="fulltext">14500911</pubid>
                  <pubid idtype="doi">10.1073/pnas.1932072100</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>A benchmark of multiple sequence alignment programs upon structural RNAs</p>
            </title>
            <aug>
               <au>
                  <snm>Gardner</snm>
                  <fnm>PP</fnm>
               </au>
               <au>
                  <snm>Wilm</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Washietl</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>2433</fpage>
            <lpage>2439</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1087786</pubid>
                  <pubid idtype="pmpid" link="fulltext">15860779</pubid>
                  <pubid idtype="doi">10.1093/nar/gki541</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Structured RNAs in the ENCODE selected regions of the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Washietl</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Pedersen</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Korbel</snm>
                  <fnm>JO</fnm>
               </au>
               <au>
                  <snm>Stocsits</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Gruber</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Hackermuller</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hertel</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lindemeyer</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Reiche</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Tanzer</snm>
                  <fnm>A</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Res</source>
            <pubdate>2007</pubdate>
            <volume>17</volume>
            <fpage>852</fpage>
            <lpage>864</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1891344</pubid>
                  <pubid idtype="pmpid" link="fulltext">17568003</pubid>
                  <pubid idtype="doi">10.1101/gr.5650707</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Pruitt</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Tatusova</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Maglott</snm>
                  <fnm>DR</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>D501</fpage>
            <lpage>D504</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">539979</pubid>
                  <pubid idtype="pmpid" link="fulltext">15608248</pubid>
                  <pubid idtype="doi">10.1093/nar/gki025</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Genome-wide identification of human functional DNA using a neutral indel model</p>
            </title>
            <aug>
               <au>
                  <snm>Lunter</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Ponting</snm>
                  <fnm>CP</fnm>
               </au>
               <au>
                  <snm>Hein</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>PLoS Comput Biol</source>
            <pubdate>2006</pubdate>
            <volume>2</volume>
            <fpage>e5</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1326222</pubid>
                  <pubid idtype="pmpid" link="fulltext">16410828</pubid>
                  <pubid idtype="doi">10.1371/journal.pcbi.0020005</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Transposon-free regions in mammalian genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Simons</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Pheasant</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Makunin</snm>
                  <fnm>IV</fnm>
               </au>
               <au>
                  <snm>Mattick</snm>
                  <fnm>JS</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2006</pubdate>
            <volume>16</volume>
            <fpage>164</fpage>
            <lpage>172</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1361711</pubid>
                  <pubid idtype="pmpid" link="fulltext">16365385</pubid>
                  <pubid idtype="doi">10.1101/gr.4624306</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>miRNAMap: genomic maps of microRNA genes and their target genes in mammalian genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Hsu</snm>
                  <fnm>PW</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>HD</fnm>
               </au>
               <au>
                  <snm>Hsu</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>LZ</fnm>
               </au>
               <au>
                  <snm>Tsou</snm>
                  <fnm>AP</fnm>
               </au>
               <au>
                  <snm>Tseng</snm>
                  <fnm>CP</fnm>
               </au>
               <au>
                  <snm>Stadler</snm>
                  <fnm>PF</fnm>
               </au>
               <au>
                  <snm>Washietl</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hofacker</snm>
                  <fnm>IL</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>D135</fpage>
            <lpage>D139</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1347497</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381831</pubid>
                  <pubid idtype="doi">10.1093/nar/gkj135</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>A comparison of RNA folding measures</p>
            </title>
            <aug>
               <au>
                  <snm>Freyhult</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Gardner</snm>
                  <fnm>PP</fnm>
               </au>
               <au>
                  <snm>Moulton</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>241</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1274297</pubid>
                  <pubid idtype="pmpid" link="fulltext">16202126</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-6-241</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Phylogenetic shadowing and computational identification of human microRNA genes</p>
            </title>
            <aug>
               <au>
                  <snm>Berezikov</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Guryev</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>van de Belt</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wienholds</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Plasterk</snm>
                  <fnm>RH</fnm>
               </au>
               <au>
                  <snm>Cuppen</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2005</pubdate>
            <volume>120</volume>
            <fpage>21</fpage>
            <lpage>24</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.cell.2004.12.031</pubid>
                  <pubid idtype="pmpid" link="fulltext">15652478</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Many novel mammalian microRNA candidates identified by extensive cloning and RAKE analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Berezikov</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>van Tetering</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Verheul</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>van de Belt</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>van Laake</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Vos</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Verloop</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>van de Wetering</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Guryev</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Takada</snm>
                  <fnm>S</fnm>
               </au>
               <etal/>
            </aug>
            <source>Genome Res</source>
            <pubdate>2006</pubdate>
            <volume>16</volume>
            <fpage>1289</fpage>
            <lpage>1298</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1581438</pubid>
                  <pubid idtype="pmpid" link="fulltext">16954537</pubid>
                  <pubid idtype="doi">10.1101/gr.5159906</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Post-transcriptional regulation of microRNA expression</p>
            </title>
            <aug>
               <au>
                  <snm>Obernosterer</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Leuschner</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Alenius</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Martinez</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>RNA</source>
            <pubdate>2006</pubdate>
            <volume>12</volume>
            <fpage>1161</fpage>
            <lpage>1167</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1484437</pubid>
                  <pubid idtype="pmpid" link="fulltext">16738409</pubid>
                  <pubid idtype="doi">10.1261/rna.2322506</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>A systematic search for new mammalian noncoding RNAs indicates little conserved intergenic transcription</p>
            </title>
            <aug>
               <au>
                  <snm>Babak</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Blencowe</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Hughes</snm>
                  <fnm>TR</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>104</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1199595</pubid>
                  <pubid idtype="pmpid" link="fulltext">16083503</pubid>
                  <pubid idtype="doi">10.1186/1471-2164-6-104</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Noncoding RNA gene detection using comparative sequence analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Rivas</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>2</volume>
            <fpage>8</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">64605</pubid>
                  <pubid idtype="pmpid" link="fulltext">11801179</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-2-8</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>RNAz Dataset</p>
            </title>
            <url>http://www1.bioinf.uni-leipzig.de/stefan/ncRNA/bed/set1_50.bed</url>
         </bibl>
         <bibl id="B41">
            <title>
               <p>ExpressHMM Dataset</p>
            </title>
            <url>http://www.binf.ku.dk/~kasper/wiki/Expresshmm.html</url>
         </bibl>
         <bibl id="B42">
            <title>
               <p>The colorectal microRNAome</p>
            </title>
            <aug>
               <au>
                  <snm>Cummins</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>He</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Leary</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Pagliarini</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Diaz</snm>
                  <fnm>LA</fnm>
                  <suf>Jr</suf>
               </au>
               <au>
                  <snm>Sj&#246;blom</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Barad</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Bentwich</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Szafranska</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Labourier</snm>
                  <fnm>E</fnm>
               </au>
               <etal/>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2006</pubdate>
            <volume>103</volume>
            <fpage>3687</fpage>
            <lpage>3692</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1450142</pubid>
                  <pubid idtype="pmpid" link="fulltext">16505370</pubid>
                  <pubid idtype="doi">10.1073/pnas.0511155103</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Washietl</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hofacker</snm>
                  <fnm>IL</fnm>
               </au>
               <au>
                  <snm>Lukasser</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>H&#252;ttenhofer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Stadler</snm>
                  <fnm>PF</fnm>
               </au>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>2005</pubdate>
            <volume>23</volume>
            <fpage>1383</fpage>
            <lpage>1390</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid">16273071</pubid>
                  <pubid idtype="doi">10.1038/nbt1144</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
