<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-7-173</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Detection of non-coding RNAs on the basis of predicted secondary structure formation free energy change</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Uzilov</snm>
               <mi>V</mi>
               <fnm>Andrew</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <insr iid="I3"/>
               <email>andrew.uzilov@gmail.com</email>
            </au>
            <au id="A2">
               <snm>Keegan</snm>
               <mi>M</mi>
               <fnm>Joshua</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <insr iid="I3"/>
               <email>josh.keegan@gmail.com</email>
            </au>
            <au id="A3" ca="yes">
               <snm>Mathews</snm>
               <mi>H</mi>
               <fnm>David</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <insr iid="I3"/>
               <email>david_mathews@urmc.rochester.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Biochemistry &amp; Biophysics, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, New York 14642, USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Biostatistics &amp; Computational Biology, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, New York 14642, USA</p>
            </ins>
            <ins id="I3">
               <p>Center for Pediatric Biomedical Research, University of Rochester Medical Center, 601 Elmwood Avenue, Box 712, Rochester, New York 14642, USA</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2006</pubdate>
         <volume>7</volume>
         <issue>1</issue>
         <fpage>173</fpage>
         <url>http://www.biomedcentral.com/1471-2105/7/173</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">16566836</pubid>
               <pubid idtype="doi">10.1186/1471-2105-7-173</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>30</day>
               <month>11</month>
               <year>2005</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>27</day>
               <month>3</month>
               <year>2006</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>27</day>
               <month>3</month>
               <year>2006</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2006</year>
         <collab>Uzilov et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Non-coding RNAs (ncRNAs) have a multitude of roles in the cell, many of which remain to be discovered. However, it is difficult to detect novel ncRNAs in biochemical screens. To advance biological knowledge, computational methods that can accurately detect ncRNAs in sequenced genomes are therefore desirable. The increasing number of genomic sequences provides a rich dataset for computational comparative sequence analysis and detection of novel ncRNAs.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Here, Dynalign, a program for predicting secondary structures common to two RNA sequences on the basis of minimizing folding free energy change, is used as a computational ncRNA detection tool. The Dynalign-computed optimal total free energy change, which scores the structural alignment and the free energy change of folding into a common structure for two RNA sequences, is shown to be an effective measure for distinguishing ncRNA from randomized sequences. To classify an input sequence pair as ncRNA, its total free energy change can either be compared with the total free energy changes of a set of control sequence pairs, or be used in combination with sequence length and nucleotide frequencies as input to a classification support vector machine. The latter method is much faster, but slightly less sensitive at a given specificity. Additionally, the classification support vector machine method is shown to be sensitive and specific on genomic ncRNA screens of two different <it>Escherichia coli </it>and <it>Salmonella typhi </it>genome alignments, in which many ncRNAs are known. The Dynalign computational experiments are also compared with two other ncRNA detection programs, RNAz and QRNA.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>The Dynalign-based support vector machine method is more sensitive for known ncRNAs in the test genomic screens than RNAz and QRNA. Additionally, both Dynalign-based methods are more sensitive than RNAz and QRNA at low sequence pair identities. Dynalign can be used as a comparable or more accurate tool than RNAz or QRNA in genomic screens, especially for low-identity regions. Dynalign provides a method for discovering ncRNAs in sequenced genomes that other methods may not identify. Significant improvements in Dynalign runtime have also been achieved.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>RNA plays many important biological roles other than as a transient carrier of amino acid sequence information. It catalyzes peptide bond formation <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>, participates in protein localization <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, serves in immunity <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>, catalyzes intron splicing and RNA degradation <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, serves in dosage compensation <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, is an essential subunit in telomerase <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, guides RNA modification <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>, controls development <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>, and has an abundance of other regulatory functions <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>.</p>
         <p>Non-coding RNAs (ncRNAs) are transcripts that have function without being translated to protein. The number of known ncRNAs is growing quickly <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>, and their significance had been severely underestimated in classic models of cellular processes <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. It is desirable to develop high-throughput methods for discovery of novel ncRNAs for greater biological understanding and for discovering candidate drug targets.</p>
         <p>However, novel ncRNAs are difficult to detect in conventional biochemical screens <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>: they are frequently short <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B20">20</abbr></abbrgrp>, often not polyadenylated <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, and might only be expressed under specific cellular conditions <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp>. Experimental screens have found many ncRNAs <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>, but have demonstrated that no single screen is capable of discovering all known ncRNAs for an organism. A more effective approach, demonstrated in previous studies <abbrgrp><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr></abbrgrp>, may be to first detect ncRNA candidates computationally, then verify them biochemically. Considering the number of available whole genome sequences <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr></abbrgrp>, this approach can be applied to a large and diverse dataset, and has massive potential for novel ncRNA discovery.</p>
         <p>The effectiveness of a computational ncRNA detection/classification method is determined by measuring its sensitivity and specificity on a test set of known ncRNAs and negative sequences. Sensitivity and specificity are defined as:</p>
         <p>
            <m:math name="1471-2105-7-173-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
               <m:semantics>
                  <m:mrow>
                     <m:mtext>sensitivity</m:mtext>
                     <m:mo>=</m:mo>
                     <m:mfrac>
                        <m:mrow>
                           <m:mtext>true&#160;positives</m:mtext>
                        </m:mrow>
                        <m:mrow>
                           <m:mtext>true&#160;positives</m:mtext>
                           <m:mo>+</m:mo>
                           <m:mtext>false&#160;negatives</m:mtext>
                        </m:mrow>
                     </m:mfrac>
                     <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                     <m:mo stretchy="false">[</m:mo>
                     <m:mtext>eq</m:mtext>
                     <m:mo>.</m:mo>
                     <m:mtext>&#160;</m:mtext>
                     <m:mn>1</m:mn>
                     <m:mo stretchy="false">]</m:mo>
                  </m:mrow>
               </m:semantics>
            </m:math>
         </p>
         <p>
            <m:math name="1471-2105-7-173-i2" xmlns:m="http://www.w3.org/1998/Math/MathML">
               <m:semantics>
                  <m:mrow>
                     <m:mtext>specificity</m:mtext>
                     <m:mo>=</m:mo>
                     <m:mfrac>
                        <m:mrow>
                           <m:mtext>true&#160;negatives</m:mtext>
                        </m:mrow>
                        <m:mrow>
                           <m:mtext>true&#160;negatives</m:mtext>
                           <m:mo>+</m:mo>
                           <m:mtext>false&#160;positives</m:mtext>
                        </m:mrow>
                     </m:mfrac>
                     <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                     <m:mo stretchy="false">[</m:mo>
                     <m:mtext>eq</m:mtext>
                     <m:mo>.</m:mo>
                     <m:mtext>&#160;</m:mtext>
                     <m:mn>2</m:mn>
                     <m:mo stretchy="false">]</m:mo>
                  </m:mrow>
               </m:semantics>
            </m:math>
         </p>
         <p>where true positives are ncRNAs that are detected by the method, true negatives are sequences that are not ncRNA and are not classified as ncRNA by the method, false positives are sequences that are not ncRNA, but are classified as ncRNA by the method, and false negatives are ncRNAs that are missed by the method.</p>
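         <p>As a minimal sketch of equations 1 and 2, the Python functions below compute both measures from the four outcome counts. The function names and the counts are hypothetical and serve only to illustrate the arithmetic; they are not results from this study.</p>

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real ncRNAs that the method detects (eq. 1)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of non-ncRNA sequences that the method rejects (eq. 2)."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, chosen only to illustrate the arithmetic.
tp, fn, tn, fp = 90, 10, 950, 50
print(sensitivity(tp, fn))  # 0.9
print(specificity(tn, fp))  # 0.95
```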
         <p>Generally, there is a tradeoff between sensitivity and specificity &#8211; tailoring a computational method to increase one measurement may decrease the other. Throughout this paper, receiver operating characteristic (ROC) curves are used to visually express the quality of a ncRNA classification method by plotting sensitivity as a function of the false positive rate (1 &#8211; specificity), providing a complete description of all possible sensitivity/specificity tradeoffs. It should be noted that in a whole genome screen, high specificity is more essential than high sensitivity due to the large ratio of non-ncRNA sequence to ncRNA sequence. Low specificity results in an overwhelming number of false positives, swamping the number of true positives, and increasing the difficulty, time, and cost of a biochemical verification screen.</p>
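         <p>The construction of a ROC curve can be sketched as follows, assuming that each sequence pair receives a single score and that lower scores indicate ncRNA. The function name and the scores are hypothetical; the sketch only shows how each cutoff yields one (false positive rate, sensitivity) point.</p>

```python
def roc_points(pos_scores, neg_scores):
    """Return (false positive rate, sensitivity) pairs, one per cutoff.

    A sequence pair is called ncRNA when its score is at or below the
    cutoff, i.e. lower (more negative) scores indicate ncRNA.
    """
    cutoffs = sorted(set(pos_scores) | set(neg_scores))
    points = [(0.0, 0.0)]  # cutoff below every score: nothing is called ncRNA
    for cutoff in cutoffs:
        tp = sum(1 for s in pos_scores if cutoff >= s)  # detected ncRNAs
        fp = sum(1 for s in neg_scores if cutoff >= s)  # misclassified negatives
        points.append((fp / len(neg_scores), tp / len(pos_scores)))
    return points


# Hypothetical scores, for illustration only.
known_ncrna = [-6.0, -4.5, -3.0, -1.0]
negatives = [-2.0, -0.5, 0.3, 1.2]
for fpr, sens in roc_points(known_ncrna, negatives):
    print(f"FPR={fpr:.2f}  sensitivity={sens:.2f}")
```

         <p>In this toy example, three of the four positives are recovered before the first false positive appears, which corresponds to the high-specificity region of the curve that matters most in a whole genome screen.</p>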
         <p>It has been proposed that ncRNAs may form secondary structures that are more stable than would be expected from non-ncRNA sequences of the same nucleotide or dinucleotide composition <abbrgrp><abbr bid="B38">38</abbr><abbr bid="B39">39</abbr><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr></abbrgrp>. This hypothesis has been controversial; it has been suggested that it is not true, or at least that the stability difference is not statistically significant enough to be a sensitive and specific criterion for classifying sequences as ncRNA <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> (also claimed on the basis of a small set of tRNA in <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>). However, the program RNAz was recently reported <abbrgrp><abbr bid="B43">43</abbr></abbrgrp> to use folding free energy changes of single sequences, combined with a structure conservation index (SCI) determined from a fixed, multiple sequence alignment, to effectively detect ncRNA. The SCI is the ratio of the consensus secondary structure free energy change (which also includes terms rewarding mutations evidencing structure conservation) determined by RNAalifold <abbrgrp><abbr bid="B44">44</abbr></abbrgrp> to the average folding free energy change for each sequence determined alone. This indicates that incorporating secondary structure conservation into a model based on folding free energy change improves the quality of prediction.</p>
         <p>Here, the effectiveness of the program Dynalign <abbrgrp><abbr bid="B45">45</abbr><abbr bid="B46">46</abbr></abbrgrp> as a tool for detection of ncRNA on the basis of predicted folding free energy change is investigated. Dynalign is a dynamic programming algorithm for simultaneously computing the lowest free energy common secondary structure and the structural alignment for two sequences. In brief, Dynalign minimizes &#916;G&#176;<sub>total</sub>:</p>
         <p>&#916;G&#176;<sub>total </sub>= &#916;G&#176;<sub>1 </sub>+ &#916;G&#176;<sub>2 </sub>+ (number of gaps in alignment) &#215; &#916;G&#176;<sub>gap penalty </sub>&#160;&#160;&#160;[eq. 3]</p>
         <p>where &#916;G&#176;<sub>1 </sub>and &#916;G&#176;<sub>2 </sub>are the predicted folding free energy changes of secondary structures of sequence 1 and sequence 2, respectively, and &#916;G&#176;<sub>gap penalty </sub>is a penalty applied for each gap in the alignment. Only conserved helices, i.e. those that appear in both sequences, are predicted. The conformational free energy changes are predicted using an empirical nearest neighbor model <abbrgrp><abbr bid="B47">47</abbr><abbr bid="B48">48</abbr><abbr bid="B49">49</abbr></abbrgrp> and &#916;G&#176;<sub>gap penalty </sub>was empirically determined by maximizing structure prediction accuracy <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>. Dynalign predicts secondary structure with significantly greater accuracy than single sequence structure prediction methods because of the additional information contained in the structural alignment <abbrgrp><abbr bid="B45">45</abbr><abbr bid="B46">46</abbr></abbrgrp>. It requires no sequence identity between the two sequences to perform well because there are no energy terms (equation 3) that address sequence identity. Therefore, Dynalign is robust for cases in which extensive covariation of base-paired nucleotides exists as a result of sequence evolution.</p>
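         <p>Equation 3 can be illustrated with a short worked example. The function name and all numerical values below, including the per-gap penalty, are hypothetical placeholders rather than Dynalign's actual parameters; the sketch only evaluates the sum for a given alignment and does not perform the minimization.</p>

```python
def delta_g_total(dg_seq1: float, dg_seq2: float,
                  n_gaps: int, gap_penalty: float) -> float:
    """Evaluate eq. 3: the two folding free energy changes plus the
    per-gap penalty multiplied by the number of gaps in the alignment."""
    return dg_seq1 + dg_seq2 + n_gaps * gap_penalty


# Hypothetical values in kcal/mol; not taken from the paper.
total = delta_g_total(dg_seq1=-25.3, dg_seq2=-27.1, n_gaps=4, gap_penalty=0.4)
print(f"{total:.1f}")  # -50.8
```

         <p>Because the penalty is positive, every additional gap raises &#916;G&#176;<sub>total</sub>, so an alignment with more gaps must be compensated by more stable conserved structure to remain optimal.</p>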
         <p>Dynalign is initially implemented in this paper as a computational ncRNA classifier by using it to compute the &#916;G&#176;<sub>total </sub>of an input sequence pair, then comparing that value to the mean of &#916;G&#176;<sub>total</sub>s of control sequence pairs generated specifically for that input pair. If the &#916;G&#176;<sub>total </sub>of the input sequence pair is sufficiently lower than the mean &#916;G&#176;<sub>total </sub>of the set of controls, the input sequences are classified as ncRNA. The <it>z </it>score is used to quantify this difference, defined as:</p>
         <p><it>z </it>= (<it>x </it>- &#956;)/&#963; &#160;&#160;&#160; [eq. 4]</p>
         <p>where <it>x </it>is the &#916;G&#176;<sub>total </sub>of the input sequence pair, and &#956; and &#963; are the mean and standard deviation of the &#916;G&#176;<sub>total</sub>s of sequence pairs in the control set, respectively. Therefore, the <it>z </it>score is just the number of standard deviations that the &#916;G&#176;<sub>total </sub>of the input sequence pair is above or below the mean of its set of controls.</p>
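         <p>A minimal sketch of equation 4 follows. It assumes the sample standard deviation is used for &#963;, which the text does not specify; the function name and the control values are hypothetical.</p>

```python
from statistics import mean, stdev


def z_score(x: float, controls: list) -> float:
    """Eq. 4: number of standard deviations that x lies above (positive)
    or below (negative) the mean of its control set."""
    mu = mean(controls)
    sigma = stdev(controls)  # assumption: sample standard deviation
    return (x - mu) / sigma


# Hypothetical total free energy changes (kcal/mol) for one input pair
# and five control pairs; real control sets would be larger.
input_pair = -50.8
control_pairs = [-40.0, -42.0, -38.0, -41.0, -39.0]
print(f"{z_score(input_pair, control_pairs):.2f}")  # -6.83
```

         <p>A strongly negative value, as in this toy example, indicates that the input pair folds into a common structure that is far more stable than its controls.</p>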
         <p>The use of the <it>z </it>score implicitly assumes that the control set values follow a normal distribution, but the distribution of &#916;G&#176;s for single sequences has been reported to be an extreme value distribution skewed towards lower folding free energies <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. Tests (data not shown) suggest that the distributions of &#916;G&#176;<sub>total</sub>s of sequence pairs in control sets are also skewed towards lower free energies. Nevertheless, the <it>z </it>score is an effective measure for classification and has been used in this manner elsewhere <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B41">41</abbr><abbr bid="B43">43</abbr><abbr bid="B44">44</abbr></abbrgrp>.</p>
         <p>This approach is tested on a large database of known 5S rRNA and tRNA sequences and artificially generated negatives, demonstrating that the <it>z </it>score based on the &#916;G&#176;<sub>total </sub>can be used as a sensitive and specific classification measure. These results are also compared to RNAstructure <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>, a dynamic programming algorithm for single sequence secondary structure prediction by free energy minimization. Also, a support vector machine (SVM) is implemented to speed the classification process by training an SVM classifier that does not require a control set for an input sequence pair.</p>
         <p>Additionally, the capability to use Dynalign as an effective genomic ncRNA screening tool is illustrated with a whole genome screen on a crude alignment of the <it>Escherichia coli </it>and <it>Salmonella typhi </it>genomes <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr></abbrgrp>, which contain a significant number of known ncRNAs. Many methods have been employed for genomic screens for ncRNAs of specific families <abbrgrp><abbr bid="B50">50</abbr><abbr bid="B51">51</abbr><abbr bid="B52">52</abbr><abbr bid="B53">53</abbr><abbr bid="B54">54</abbr><abbr bid="B55">55</abbr><abbr bid="B56">56</abbr><abbr bid="B57">57</abbr><abbr bid="B58">58</abbr></abbrgrp>; benchmarks and discussion in this paper are focused on the premise of using Dynalign as a general genomic screening tool for diverse, novel ncRNAs.</p>
         <p>The above tests are benchmarked against two leading ncRNA prediction programs, QRNA <abbrgrp><abbr bid="B59">59</abbr></abbrgrp> (version 2.0.2c) and RNAz <abbrgrp><abbr bid="B43">43</abbr></abbrgrp> (version 0.1.1). RNAz uses a regression SVM to compute a <it>z </it>score for each sequence in a multiple sequence alignment, then uses the mean of those <it>z </it>scores and the SCI as input to a classification SVM. While structure predictions by Dynalign and RNAz are based on calculating the most stable secondary structure using experimentally determined thermodynamic parameters, QRNA uses a fully probabilistic covariance analysis approach that compares scores of three models &#8211; ncRNA, open reading frame, or other (null hypothesis) &#8211; for a pair of sequences.</p>
         <p>Unlike Dynalign, which optimizes its own structural alignment, both QRNA and RNAz require a fixed sequence alignment as input. It is shown here that at low pairwise sequence identity, the Dynalign approach outperforms the fixed alignment approach. Additionally, Dynalign is shown to be a more sensitive ncRNA finder on whole genome screen tests.</p>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <sec>
            <st>
               <p>Improving time and memory performance of Dynalign</p>
            </st>
            <p>Dynalign's complexity is O(<it>N</it><sup>3</sup><it>M</it><sup>3</sup>) in time and O(<it>N</it><sup>2</sup><it>M</it><sup>2</sup>) in storage, where <it>N </it>is the length of the shorter sequence and <it>M </it>is the maximum separation parameter that limits the set of sequence alignments that are considered <abbrgrp><abbr bid="B45">45</abbr><abbr bid="B46">46</abbr></abbrgrp>. For nucleotide <it>i </it>in the first sequence to align to nucleotide <it>k </it>in the second sequence:</p>
            <p>| <it>i </it>- <it>k </it>| &#8804; <it>M </it>&#160;&#160;&#160; [eq. 5]</p>
            <p>must be satisfied. The <it>M </it>parameter therefore reduces the set of alignments that are considered by Dynalign and hence the computational cost. Similar constraints have been used by others <abbrgrp><abbr bid="B60">60</abbr><abbr bid="B61">61</abbr><abbr bid="B62">62</abbr></abbrgrp> to provide computational tractability.</p>
            <p>To improve the efficiency of Dynalign, two strategies have been employed. The first was to recast the implementation of the <it>M </it>parameter to a form that scales with the difference in sequence length of the two sequences, so that for <it>i </it>and <it>k </it>to align:</p>
            <p>| <it>i </it>&#215; (<it>N</it><sub>2</sub>/<it>N</it><sub>1</sub>) - <it>k </it>| &#8804; <it>M </it>&#160;&#160;&#160; [eq. 6]</p>
            <p>must be satisfied, where <it>N</it><sub>1 </sub>is the total length of the first sequence and <it>N</it><sub>2 </sub>is the total length of the second sequence. This constraint automatically allows the 3' ends of the sequences (<it>i </it>= <it>N</it><sub>1 </sub>and <it>k </it>= <it>N</it><sub>2</sub>) to align for any <it>M </it>and any difference in sequence length, because the left-hand side of equation 6 is then zero. With equation 5, <it>M </it>had to be at least as large as the difference in lengths of the sequences in order for the 3' ends of the sequences to align. Now, with equation 6, significantly smaller values of <it>M </it>can be used with Dynalign. For example, tRNA sequences can now be folded with <it>M </it>= 6, whereas previously <it>M </it>= 15 was used, resulting in a significant runtime improvement without affecting accuracy.</p>
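            <p>The difference between the two constraints can be sketched in a few lines, treating <it>M </it>as the maximum allowed separation as described in the text. The function names and the sequence lengths are hypothetical, and positions are 1-indexed.</p>

```python
def can_align_eq5(i: int, k: int, m: int) -> bool:
    """Eq. 5: nucleotides i and k may align when their separation is at most M."""
    return m >= abs(i - k)


def can_align_eq6(i: int, k: int, n1: int, n2: int, m: int) -> bool:
    """Eq. 6: as eq. 5, but position i is first rescaled by N2/N1."""
    return m >= abs(i * n2 / n1 - k)


# Hypothetical sequence lengths that differ by 15 nucleotides.
n1, n2 = 77, 92

# Can the 3' ends (i = N1, k = N2) align?
print(can_align_eq5(n1, n2, 6))           # False: separation is 15
print(can_align_eq5(n1, n2, 15))          # True: M must cover the length difference
print(can_align_eq6(n1, n2, n1, n2, 6))   # True: rescaled separation is 0
```

            <p>Under equation 6 the 3' ends align for any non-negative <it>M</it>, so <it>M </it>only needs to cover local insertions and deletions rather than the overall length difference.</p>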
            <p>The second approach employed to accelerate Dynalign was to determine base pairs that are unlikely to form on the basis of single sequence folding and then not consider those pairs in the Dynalign calculation. Pairs that appear only in secondary structures whose free energy is more than 30% above that of the lowest free energy structure, as determined by energy dot plots <abbrgrp><abbr bid="B63">63</abbr></abbrgrp>, are excluded from consideration in the Dynalign calculation <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>. Table <tblr tid="T1">1</tblr> shows that nearly 99% of known base pairs are found within this energy increment, hence this heuristic has little effect on the accuracy of Dynalign calculations. This pre-computation of structural information by single sequence secondary structure prediction is similar to approaches used by Hofacker <it>et al. </it><abbrgrp><abbr bid="B61">61</abbr></abbrgrp> and Holmes <abbrgrp><abbr bid="B60">60</abbr></abbrgrp> to speed the alignment of RNA sequences using secondary structure information.</p>
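            <p>The pre-filtering step can be sketched as below, assuming that an energy dot plot is available as a mapping from each candidate base pair to the lowest free energy of any structure containing that pair. The function name, the data layout, and the energies are hypothetical and do not reflect Dynalign's internal implementation.</p>

```python
def allowed_pairs(pair_energies: dict, percent: float = 30.0) -> set:
    """Keep base pairs whose best structure lies within `percent` of the
    lowest free energy; all other pairs are excluded from consideration.

    pair_energies maps (i, j) to the lowest free energy (kcal/mol) of any
    secondary structure that contains the pair i-j.
    """
    dg_min = min(pair_energies.values())             # lowest free energy overall
    cutoff = dg_min + abs(dg_min) * percent / 100.0  # e.g. -50.0 gives -35.0
    return {pair for pair, dg in pair_energies.items() if cutoff >= dg}


# Hypothetical energy dot plot entries, for illustration only.
dot_plot = {
    (1, 72): -50.0,   # pair in the lowest free energy structure
    (2, 71): -49.0,
    (10, 25): -36.0,  # within 30% of -50.0, so kept
    (5, 40): -30.0,   # more than 30% above -50.0, so excluded
}
print(sorted(allowed_pairs(dot_plot)))  # [(1, 72), (2, 71), (10, 25)]
```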
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Percent of known base pairs in predicted suboptimal structures for single sequences.</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="6" ca="center">
                        <p>
                           <b>Maximum percent change in free energy from lowest free energy structure</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>RNA Type<sup>1</sup></b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>1%</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>5%</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>10%</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>20%</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>30%</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>50%</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>SSU (16S) rRNA</p>
                     </c>
                     <c ca="center">
                        <p>74.5 &#177; 21.9 (80.5 &#177; 16.0)<sup>2</sup></p>
                     </c>
                     <c ca="center">
                        <p>88.1 &#177; 14.9 (96.8 &#177; 2.7)<sup>2</sup></p>
                     </c>
                     <c ca="center">
                        <p>97.1 &#177; 13.6 (97.2 &#177; 1.4)<sup>2</sup></p>
                     </c>
                     <c ca="center">
                        <p>99.2 &#177; 8.2 (97.2 &#177; 1.4)<sup>2</sup></p>
                     </c>
                     <c ca="center">
                        <p>99.3 &#177; 7.2 (97.2 &#177; 1.4)<sup>2</sup></p>
                     </c>
                     <c ca="center">
                        <p>99.3 &#177; 3.4 (97.2 &#177; 1.4)<sup>2</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>LSU (23S) rRNA</p>
                     </c>
                     <c ca="center">
                        <p>84.4 &#177; 8.9 (91.9 &#177; 13.4)<sup>2</sup></p>
                     </c>
                     <c ca="center">
                        <p>96.8 &#177; 3.8 (97.9 &#177; 1.2)<sup>2</sup></p>
                     </c>
                     <c ca="center">
                        <p>98.1 &#177; 1.2 (98.0 &#177; 0.7)<sup>2</sup></p>
                     </c>
                     <c ca="center">
                        <p>98.1 &#177; 1.2 (98.0 &#177; 0.7)<sup>2</sup></p>
                     </c>
                     <c ca="center">
                        <p>98.1 &#177; 1.2 (98.0 &#177; 0.7)<sup>2</sup></p>
                     </c>
                     <c ca="center">
                        <p>98.1 &#177; 1.2 (98.0 &#177; 0.7)<sup>2</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>5S rRNA</p>
                     </c>
                     <c ca="center">
                        <p>74.5 &#177; 25.6</p>
                     </c>
                     <c ca="center">
                        <p>88.1 &#177; 20.1</p>
                     </c>
                     <c ca="center">
                        <p>97.1 &#177; 7.4</p>
                     </c>
                     <c ca="center">
                        <p>99.2 &#177; 1.6</p>
                     </c>
                     <c ca="center">
                        <p>99.3 &#177; 1.4</p>
                     </c>
                     <c ca="center">
                        <p>99.3 &#177; 1.4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Group I Intron</p>
                     </c>
                     <c ca="center">
                        <p>79.0 &#177; 12.6</p>
                     </c>
                     <c ca="center">
                        <p>93.5 &#177; 7.6</p>
                     </c>
                     <c ca="center">
                        <p>98.3 &#177; 1.4</p>
                     </c>
                     <c ca="center">
                        <p>98.4 &#177; 1.4</p>
                     </c>
                     <c ca="center">
                        <p>98.4 &#177; 1.4</p>
                     </c>
                     <c ca="center">
                        <p>98.4 &#177; 1.4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Group I Intron &#8211; 2</p>
                     </c>
                     <c ca="center">
                        <p>74.4 &#177; 13.6</p>
                     </c>
                     <c ca="center">
                        <p>92.8 &#177; 8.0</p>
                     </c>
                     <c ca="center">
                        <p>97.1 &#177; 1.7</p>
                     </c>
                     <c ca="center">
                        <p>97.1 &#177; 1.7</p>
                     </c>
                     <c ca="center">
                        <p>97.1 &#177; 1.7</p>
                     </c>
                     <c ca="center">
                        <p>97.1 &#177; 1.7</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Group II Intron</p>
                     </c>
                     <c ca="center">
                        <p>91.9 &#177; 5.7</p>
                     </c>
                     <c ca="center">
                        <p>100.0 &#177; 0.0</p>
                     </c>
                     <c ca="center">
                        <p>100.0 &#177; 0.0</p>
                     </c>
                     <c ca="center">
                        <p>100.0 &#177; 0.0</p>
                     </c>
                     <c ca="center">
                        <p>100.0 &#177; 0.0</p>
                     </c>
                     <c ca="center">
                        <p>100.0 &#177; 0.0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RNase P</p>
                     </c>
                     <c ca="center">
                        <p>79.8 &#177; 11.0</p>
                     </c>
                     <c ca="center">
                        <p>95.5 &#177; 2.6</p>
                     </c>
                     <c ca="center">
                        <p>98.4 &#177; 1.2</p>
                     </c>
                     <c ca="center">
                        <p>98.4 &#177; 1.2</p>
                     </c>
                     <c ca="center">
                        <p>98.4 &#177; 1.2</p>
                     </c>
                     <c ca="center">
                        <p>98.4 &#177; 1.2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>RNase P &#8211; 2</p>
                     </c>
                     <c ca="center">
                        <p>75.0 &#177; 7.9</p>
                     </c>
                     <c ca="center">
                        <p>95.6 &#177; 4.9</p>
                     </c>
                     <c ca="center">
                        <p>98.3 &#177; 1.5</p>
                     </c>
                     <c ca="center">
                        <p>98.3 &#177; 1.5</p>
                     </c>
                     <c ca="center">
                        <p>98.3 &#177; 1.5</p>
                     </c>
                     <c ca="center">
                        <p>98.3 &#177; 1.5</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>SRP RNA</p>
                     </c>
                     <c ca="center">
                        <p>73.1 &#177; 25.2</p>
                     </c>
                     <c ca="center">
                        <p>90.7 &#177; 14.3</p>
                     </c>
                     <c ca="center">
                        <p>95.5 &#177; 8.2</p>
                     </c>
                     <c ca="center">
                        <p>97.1 &#177; 2.7</p>
                     </c>
                     <c ca="center">
                        <p>97.2 &#177; 2.6</p>
                     </c>
                     <c ca="center">
                        <p>97.2 &#177; 2.6</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>tRNA</p>
                     </c>
                     <c ca="center">
                        <p>87.0 &#177; 18.2</p>
                     </c>
                     <c ca="center">
                        <p>94.5 &#177; 13.3</p>
                     </c>
                     <c ca="center">
                        <p>97.9 &#177; 7.8</p>
                     </c>
                     <c ca="center">
                        <p>99.3 &#177; 4.6</p>
                     </c>
                     <c ca="center">
                        <p>99.6 &#177; 3.2</p>
                     </c>
                     <c ca="center">
                        <p>99.8 &#177; 1.0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Average<sup>3</sup></b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>80.5 &#177; 6.7</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>93.4 &#177; 4.2</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>97.8 &#177; 1.3</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>98.7 &#177; 0.9</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>98.8 &#177; 0.9</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>98.8 &#177; 1.0</b>
                        </p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p><sup>1</sup>The database of structures was assembled for previous studies of secondary structure prediction [48] and is derived from a diverse set of databases [76, 77, 81-84].</p>
                  <p><sup>2</sup>The large and small subunit rRNA sequences are divided into domains of less than 700 nucleotides for structure prediction. In parentheses are the accuracies when the whole sequence is folded at once.</p>
                  <p><sup>3</sup>The average is calculated excluding the second database of Group I introns and RNase P sequences as was done in [48].</p>
                  <p>Values are the percent of known base pairs contained in at least one predicted suboptimal structure within a specified percent difference in free energy from the minimum free energy. For example, on average 99.2% of known base pairs in 5S rRNA secondary structures are found in predicted structures within 20% of the lowest free energy. These numbers were calculated from an energy dot plot using the thermodynamic parameters of [49]. Averaged over all RNA types, 98.7% of known base pairs occur in at least one suboptimal secondary structure within 20% of the minimum free energy. This accuracy remains similar as the percent energy difference is increased to 30% or 50%.</p>
               </tblfn>
            </tbl>
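            <p>As a rough illustration of the quantity tabulated in Table 1, the following Python sketch computes the percent of known base pairs found within a given percent free energy difference of the minimum free energy. It assumes, hypothetically, that the energy dot plot has already been reduced to a dictionary (here called pair_energies) that maps each base pair to the lowest free energy of any structure containing it. The function name, the data layout, and the inclusive treatment of the window boundary are assumptions made for this example and are not taken from RNAstructure.</p>

```python
def percent_known_pairs_in_window(known_pairs, pair_energies, mfe, percent):
    """Percent of known base pairs found within an energy window of the MFE.

    known_pairs   -- iterable of (i, j) tuples from the known structure
    pair_energies -- dict mapping (i, j) to the lowest free energy (kcal/mol)
                     of any predicted structure containing that pair
                     (hypothetical stand-in for an energy dot plot)
    mfe           -- minimum free energy (kcal/mol), assumed negative
    percent       -- window size as a percent of the MFE (e.g. 20)
    """
    known = list(known_pairs)
    if not known:
        return 0.0
    # Free energies are negative, so a structure "within 20%" of an MFE of
    # -50 kcal/mol has a free energy of -40 kcal/mol or lower.
    threshold = mfe * (1.0 - percent / 100.0)
    found = sum(
        1
        for pair in known
        if pair in pair_energies and threshold >= pair_energies[pair]
    )
    return 100.0 * found / len(known)
```

            <p>Evaluating such a function at increasing window sizes for one RNA would give one row of a table like Table 1. A known pair that never appears in the dot plot simply counts as not found.</p>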
            <p>For the benchmarks performed previously <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>, using these two methods does not lower the accuracy of Dynalign secondary structure predictions (Table <tblr tid="T2">2</tblr>). Table <tblr tid="T3">3</tblr> shows the calculation time and memory requirements for three pairs of RNA sequences with <it>N </it>from 77 to 217 before and after both of the above improvements. Calculations that use the improved Dynalign are completed in less than a twentieth of the time required for calculations using the previous Dynalign. The calculation time is now reduced to a level similar to FOLDALIGN <abbrgrp><abbr bid="B62">62</abbr></abbrgrp>, another dynamic programming algorithm that determines the secondary structure common to two unaligned sequences. The revised Dynalign is available for download from the Mathews lab website <abbrgrp><abbr bid="B64">64</abbr></abbrgrp> both as source code for local compilation and as part of the RNAstructure package for Microsoft Windows.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Secondary structure prediction accuracy with and without speed improvements for tRNA and 5S rRNA sequences.</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3" ca="center">
                        <p>
                           <b>Without speed improvements</b>
                        </p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>
                           <b>With speed improvements</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>RNA type</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Sensitivity</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>PPV</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Best sensitivity</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Sensitivity</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>PPV</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Best sensitivity</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>tRNA</p>
                     </c>
                     <c ca="center">
                        <p>92.9 &#177; 12.6</p>
                     </c>
                     <c ca="center">
                        <p>92.7 &#177; 14.3</p>
                     </c>
                     <c ca="center">
                        <p>98.8 &#177; 4.3</p>
                     </c>
                     <c ca="center">
                        <p>93.0 &#177; 12.3</p>
                     </c>
                     <c ca="center">
                        <p>92.7 &#177; 14.1</p>
                     </c>
                     <c ca="center">
                        <p>99.1 &#177; 2.8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>5S rRNA</p>
                     </c>
                     <c ca="center">
                        <p>91.7 &#177; 7.0</p>
                     </c>
                     <c ca="center">
                        <p>82.0 &#177; 7.1</p>
                     </c>
                     <c ca="center">
                        <p>97.9 &#177; 3.2</p>
                     </c>
                     <c ca="center">
                        <p>91.7 &#177; 6.9</p>
                     </c>
                     <c ca="center">
                        <p>81.9 &#177; 7.0</p>
                     </c>
                     <c ca="center">
                        <p>98.0 &#177; 3.0</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Sensitivity is the percent of known base pairs correctly predicted. Positive predictive value (PPV) is the percent of predicted base pairs that are in the known structure. Best sensitivity is the sensitivity of the most sensitive structure in a set of 750 suboptimal structures, i.e. structures with folding free energy change similar to the lowest free energy structure. Sensitivity and positive predictive value are calculated as described previously [46]. The tRNA and 5S rRNA datasets are sets of randomly chosen sequences, set sizes 40 and 14, respectively, used for benchmarks previously [46]. Accuracies are reported as averages for all pairwise combinations of sequences, and single standard deviations are reported as errors. The average accuracy of secondary structure prediction by Dynalign is essentially unchanged by pre-filtering base pairs and changing the implementation of the <it>M </it>parameter. Without speed improvements, <it>M </it>was set to 15 for both tRNA and 5S rRNA. With speed improvements, <it>M </it>was 6 for tRNA and 7 for 5S rRNA for these benchmarks.</p>
               </tblfn>
            </tbl>
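            <p>The accuracy measures defined under Table 2 can be sketched in a few lines of Python. This is only an illustration: base pairs are represented as index tuples, a predicted pair is counted as correct only if it matches a known pair exactly, and the function names are hypothetical. The scoring procedure of [46] may differ in detail from this simplification.</p>

```python
def sensitivity_and_ppv(known_pairs, predicted_pairs):
    """Return (sensitivity, PPV) in percent for two collections of base pairs.

    Each base pair is a tuple (i, j) of nucleotide indices. Only exact
    matches are counted as correct, which is an assumption of this sketch.
    """
    known = {tuple(sorted(pair)) for pair in known_pairs}
    predicted = {tuple(sorted(pair)) for pair in predicted_pairs}
    correct = len(known.intersection(predicted))
    # Sensitivity: percent of known pairs that were predicted.
    sensitivity = 100.0 * correct / len(known) if known else 0.0
    # PPV: percent of predicted pairs that are in the known structure.
    ppv = 100.0 * correct / len(predicted) if predicted else 0.0
    return sensitivity, ppv


def best_sensitivity(known_pairs, suboptimal_structures):
    """Sensitivity of the most sensitive structure in a set of predictions."""
    return max(
        sensitivity_and_ppv(known_pairs, structure)[0]
        for structure in suboptimal_structures
    )
```

            <p>For a benchmark like the one in Table 2, these values would be computed for every pairwise combination of sequences and then averaged.</p>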
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Dynalign calculation time and memory requirements.</p>
               </caption>
               <tblbdy cols="10">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3" ca="center">
                        <p>
                           <b>Dynalign before acceleration</b>
                        </p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>
                           <b>Dynalign after acceleration</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="10">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>System</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Sequence 1</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Sequence 2</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Length (nt)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>
                              <it>M</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Time (hr:min)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Memory (MB)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>
                              <it>M</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Time (hr:min)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Memory (MB)</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="10">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>tRNA</p>
                     </c>
                     <c ca="left">
                        <p>RD0260</p>
                     </c>
                     <c ca="left">
                        <p>RE6781</p>
                     </c>
                     <c ca="center">
                        <p>77</p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                     <c ca="center">
                        <p>0:22 (0:24)</p>
                     </c>
                     <c ca="center">
                        <p>33 (57)</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>0:01 (0:01)</p>
                     </c>
                     <c ca="center">
                        <p>12 (24)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>5S rRNA</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>H. volcanii</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>A. globiformis</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>122</p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                     <c ca="center">
                        <p>1:11 (1:09)</p>
                     </c>
                     <c ca="center">
                        <p>76 (85)</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>0:03 (0:03)</p>
                     </c>
                     <c ca="center">
                        <p>21 (30)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>R2 3' UTR RNA</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>D. takahashii</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>D. melanogaster</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>217</p>
                     </c>
                     <c ca="center">
                        <p>24</p>
                     </c>
                     <c ca="center">
                        <p>26:05</p>
                     </c>
                     <c ca="center">
                        <p>491</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>0:39 (0:35)</p>
                     </c>
                     <c ca="center">
                        <p>81 (104)</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Calculation times and memory use are reported for a 3.2 GHz Intel Pentium 4 with 1 GB RAM running Red Hat Enterprise Linux using the gcc 3.2.3-42 compiler. In parentheses are time and memory requirements on a laptop with a 3.06 GHz Pentium 4 processor and 1 GB of RAM running Microsoft Windows XP Professional using the Microsoft C++ .NET 2002 compiler. For Linux, CPU time is reported; for Windows, wall time is reported. "Length" is the length of the first sequence. Sequences were obtained from [76, 77, 85, 86]. The reported times include suboptimal secondary structure prediction. Slightly less than half the computer time is required to find only the lowest free energy common structure.</p>
               </tblfn>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Tests by <it>z </it>score classification of single sequences</p>
            </st>
            <p>To test the effectiveness of classifying single sequences as ncRNA on the basis of a folding free energy change, RNAstructure <abbrgrp><abbr bid="B49">49</abbr></abbrgrp> was used to compute the minimum folding free energy change for a test set of 1,582 known 5S rRNA, tRNA, and negative sequences. A negative sequence was generated from each real ncRNA by the Altschul-Erikson sequence shuffle that exactly preserves the nucleotide and dinucleotide (i.e. AA, AU, AC, etc.) frequencies of the real ncRNA <abbrgrp><abbr bid="B41">41</abbr><abbr bid="B65">65</abbr></abbrgrp>. Because the stabilities of base pairs are predicted using a nearest neighbor model that considers the sequence identity of two stacked pairs, negative and control sequences must preserve the dinucleotide frequencies of the original sequence, while also breaking the nested base pair structure <abbrgrp><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr></abbrgrp>. The negatives are needed to test the rate of false positive classification to determine specificity.</p>
            <p>To compute the <it>z </it>score, each sequence in the test set had a control set of 100 sequences generated specifically for it using the Altschul-Erikson shuffle, and their minimum folding free energy changes were determined by RNAstructure. The <it>z </it>score histograms for 5S rRNA, tRNA, and negative sequences generated from them are shown in Figure <figr fid="F1">1</figr>.</p>
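            <p>The <it>z </it>score computation described above can be sketched as follows, assuming the folding free energy changes of the test sequence and of its shuffled controls have already been computed (for example by RNAstructure). The function name is hypothetical, and the use of the sample standard deviation rather than the population standard deviation is an assumption of this example.</p>

```python
import statistics


def z_score(native_energy, control_energies):
    """Number of standard deviations by which the native folding free energy
    change differs from the mean of its shuffled controls.

    A negative value means the native sequence is predicted to fold more
    stably than its controls. At least two control energies are required.
    """
    mean = statistics.mean(control_energies)
    # Sample standard deviation (n - 1 denominator); an assumption here.
    spread = statistics.stdev(control_energies)
    if spread == 0:
        raise ValueError("controls have zero variance; z score is undefined")
    return (native_energy - mean) / spread
```

            <p>With 100 controls per sequence, as used here, the difference between the sample and population standard deviation is small.</p>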
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Distribution of single sequence <it>z </it>scores for 5S rRNA, tRNA, and negative sequences</p>
               </caption>
               <text>
                  <p><b>Distribution of single sequence <it>z </it>scores for 5S rRNA, tRNA, and negative sequences</b>. Distributions of RNAstructure-predicted <it>z </it>scores computed on the basis of folding single sequences for 5S rRNA and negatives generated from them (left figure) and tRNA and negatives generated from them (right figure). Real ncRNA are white, negatives are black. Controls were generated by the Altschul-Erikson dinucleotide shuffle of original sequence, with 100 controls for each test set sequence. 309 5S rRNA sequences and 482 tRNA sequences, plus one negative sequence generated from each real sequence by the Altschul-Erikson shuffle, were used for the test set.</p>
               </text>
               <graphic file="1471-2105-7-173-1"/>
            </fig>
            <p>Sequences below a cutoff <it>z </it>score are classified as ncRNA. However, rather than pick a single <it>z </it>score cutoff for classification and report those results, iterations were performed over a wide range of <it>z </it>score cutoffs in order to construct an ROC curve (Figure <figr fid="F2">2</figr>) expressing sensitivity as a function of the false positive rate, thus showing the overall quality of the ncRNA classification method for all sensitivity/specificity tradeoffs. Figure <figr fid="F2">2</figr> shows that tRNA sequences are classified with better sensitivity (for all specificities) than 5S rRNA sequences using either method, which suggests that tRNA sequences have a lower predicted folding free energy relative to their matched controls than 5S rRNA sequences do.</p>
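            <p>The cutoff sweep used to build an ROC curve can be illustrated with a short Python sketch. It takes the <it>z </it>scores of the real ncRNAs and of the negatives and, for each candidate cutoff, classifies everything strictly below the cutoff as ncRNA. The function name and the choice of candidate cutoffs (every observed <it>z </it>score plus infinity) are assumptions of this example.</p>

```python
import math


def roc_points(positive_z, negative_z):
    """Return (false positive rate, sensitivity) pairs for a sweep of cutoffs.

    positive_z -- z scores of real ncRNA sequences
    negative_z -- z scores of negative (shuffled) sequences
    A sequence is classified as ncRNA when its z score is below the cutoff.
    """
    # Every observed z score is a candidate cutoff; infinity is appended so
    # that the final point classifies all sequences as ncRNA.
    cutoffs = sorted(set(positive_z).union(negative_z)) + [math.inf]
    points = []
    for cutoff in cutoffs:
        true_positives = sum(1 for z in positive_z if cutoff > z)
        false_positives = sum(1 for z in negative_z if cutoff > z)
        sensitivity = true_positives / len(positive_z)
        false_positive_rate = false_positives / len(negative_z)
        points.append((false_positive_rate, sensitivity))
    return points
```

            <p>Plotting the returned points with the false positive rate on the horizontal axis gives a curve of the kind shown in Figure 2. A curve that rises steeply toward the upper left corner indicates good separation of real ncRNAs from negatives.</p>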
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Quality of classification using the <it>z </it>score method for single sequences</p>
               </caption>
               <text>
                  <p><b>Quality of classification using the <it>z </it>score method for single sequences</b>. ROC curves showing quality of classification based on single sequences, using RNAstructure-predicted <it>z </it>scores for folding free energy change. The ncRNA sequences and controls are the same as in Figure <figr fid="F1">1</figr>. Red and green show results for 5S rRNA and tRNA, respectively, when tested separately; blue shows results when both are combined into a single test set.</p>
               </text>
               <graphic file="1471-2105-7-173-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Tests by <it>z </it>score classification of sequence pairs</p>
            </st>
            <p>To test the effectiveness of classifying pairs of sequences as ncRNA using Dynalign on the basis of the &#916;G&#176;<sub>total</sub>-based <it>z </it>score, <it>z </it>scores were determined for a test set of 3,302 known 5S rRNA, tRNA, and negative sequence pairs. Negative sequence pairs were generated from real sequence pairs by shuffling the columns in the real sequence pair gapped global alignment, then removing gaps. Three control generation methods were used for each sequence pair to randomize the nucleotide order and remove the nested base pair structure while preserving other sequence properties. Because Dynalign's computation time is greater than that of RNAstructure, the number of controls per sequence pair was limited to 20 to make the calculation time feasible.</p>
            <p>The first two control generation methods focus on preserving dinucleotide frequencies (i.e. frequencies of AA, AU, AC, etc.) and are applied to each sequence in the original pair separately, without regard for alignment. As with prediction from single sequences, because stability contributions of each base pair are dependent on the base pairs on which it is stacked, it may be necessary to control for the dinucleotide frequencies <abbrgrp><abbr bid="B41">41</abbr><abbr bid="B42">42</abbr></abbrgrp>. The first-order Markov chain sampling method for control sequence generation <it>approximately </it>preserves the original dinucleotide frequencies <abbrgrp><abbr bid="B41">41</abbr></abbrgrp> (resulting in more variation in the control set), while the Altschul-Erikson shuffle method for control generation <it>exactly </it>preserves both nucleotide and dinucleotide frequencies, with the restriction that the first and last nucleotide of the shuffled sequence are exactly the same as the original <abbrgrp><abbr bid="B65">65</abbr></abbrgrp>.</p>
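            <p>A minimal sketch of first-order Markov chain sampling is given below to show why this method only approximately preserves dinucleotide frequencies: each nucleotide is drawn at random from the nucleotides that followed the previous one in the original sequence. The function name, the choice of the first nucleotide from the overall composition, and the fallback used at a dead end are assumptions of this example and are not taken from [41]. The Altschul-Erikson shuffle is more involved and is not reproduced here.</p>

```python
import random
from collections import defaultdict


def markov_control(sequence, rng=None):
    """Sample a control of the same length from a first-order Markov chain
    whose transition frequencies are estimated from `sequence`."""
    if not sequence:
        return ""
    rng = rng or random.Random()
    # For each nucleotide, list every nucleotide that follows it in the
    # original; sampling from this list reproduces the observed transition
    # frequencies in expectation.
    successors = defaultdict(list)
    for current, following in zip(sequence, sequence[1:]):
        successors[current].append(following)
    # Assumption: the first nucleotide is drawn from the overall composition.
    control = [rng.choice(sequence)]
    while len(sequence) > len(control):
        options = successors.get(control[-1])
        if not options:
            # Dead end (this nucleotide only occurs at the 3' end of the
            # original): fall back to the overall composition.
            options = sequence
        control.append(rng.choice(options))
    return "".join(control)
```

            <p>Because every step is an independent random draw, the dinucleotide counts of each control fluctuate around those of the original, which is consistent with the greater variation in the control set noted above.</p>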
            <p>The third control generation method is a columnwise shuffle of a global alignment that approximately preserves the percent identity of the original sequence alignment. Although removing gaps and re-aligning the columnwise shuffled sequences results in a different alignment, the change in percent identity from the original sequence pair is not as drastic as with the other two control methods. For example, columnwise shuffling a sequence pair alignment, followed by re-alignment, results in a mean percent identity change of 2.57, with a standard deviation of 2.74; however, the mean and standard deviation of percent identity change if Altschul-Erikson shuffles are used are -11.30 and 9.28, respectively. It is reasonable that randomizing sequences separately and re-aligning them results in a greater change (in most cases, a decrease) in percent identity of the sequence pair, compared to shuffling the alignment in columns.</p>
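            <p>The columnwise shuffle can be sketched as follows. The two rows of a gapped global alignment are permuted column by column, so each aligned pair of characters stays together, and gaps are then removed to obtain the unaligned control sequences. The function names and the definition of percent identity used here (identical non-gap columns divided by alignment length) are assumptions of this example.</p>

```python
import random


def columnwise_shuffle(aligned_1, aligned_2, rng=None):
    """Shuffle the columns of a pairwise alignment, keeping columns intact."""
    if len(aligned_1) != len(aligned_2):
        raise ValueError("aligned sequences must have the same length")
    rng = rng or random.Random()
    columns = list(zip(aligned_1, aligned_2))
    rng.shuffle(columns)
    shuffled_1 = "".join(top for top, _ in columns)
    shuffled_2 = "".join(bottom for _, bottom in columns)
    return shuffled_1, shuffled_2


def remove_gaps(aligned, gap="-"):
    """Strip gap characters to recover an unaligned sequence."""
    return aligned.replace(gap, "")


def percent_identity(aligned_1, aligned_2, gap="-"):
    """Percent of alignment columns holding two identical nucleotides."""
    if not aligned_1:
        return 0.0
    matches = sum(
        1 for top, bottom in zip(aligned_1, aligned_2)
        if top == bottom and top != gap
    )
    return 100.0 * matches / len(aligned_1)
```

            <p>Under this definition the shuffled alignment has exactly the same percent identity as the original, because the set of columns is unchanged. The small changes reported above arise only after gaps are removed and the control sequences are re-aligned.</p>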
            <p>ROC curves comparing effectiveness of the three control generation methods are plotted in Figure <figr fid="F3">3</figr>. The columnwise shuffle control method produces the highest sensitivities for all specificities, and is therefore the best approach of the three. The distributions of <it>z </it>scores for 5S rRNA, tRNA, and negative sequence pairs for trials using this control method are shown in Figure <figr fid="F4">4</figr>, and the separation of real ncRNA and negative sequences is markedly more distinct than in Figure <figr fid="F1">1</figr> (single sequence <it>z </it>scores). Additionally, the Altschul-Erikson shuffle control generation method is more effective than the first-order Markov chain sampling method.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Comparison of three methods for generating 20 controls from each input sequence pair</p>
               </caption>
               <text>
                  <p><b>Comparison of three methods for generating 20 controls from each input sequence pair</b>. ROC curves comparing three methods for generating a set of 20 controls from an input sequence pair to determine the <it>z </it>score for ncRNA classification using the Dynalign-computed &#916;G&#176;<sub>total</sub>. The test set contains 755 5S rRNA and 896 tRNA sequence pairs, plus one negative sequence pair generated from each real sequence pair, yielding 3,302 trial pairs total. All tests are run with the parameter <it>M </it>= 8. "dinuc controls" (green): controls are generated by sampling from a first-order Markov chain, approximately preserving dinucleotide frequencies of each original sequence. "AE controls" (orange): controls are generated by the Altschul-Erikson dinucleotide shuffle, exactly preserving dinucleotide frequencies of each original sequence. "column controls" (blue): controls are generated by a columnwise shuffle of a global sequence alignment, without regard for gap placement or local conservation.</p>
               </text>
               <graphic file="1471-2105-7-173-3"/>
            </fig>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Distribution of sequence pair <it>z </it>scores for 5S rRNA, tRNA, and negative sequences</p>
               </caption>
               <text>
                  <p><b>Distribution of sequence pair <it>z </it>scores for 5S rRNA, tRNA, and negative sequences</b>. Distribution of <it>z </it>scores computed using the Dynalign &#916;G&#176;<sub>total </sub>and the columnwise shuffle control method (<it>M </it>= 8) for 5S rRNA sequence pairs and negatives generated from them (left figure) and tRNA sequence pairs and negatives generated from them (right figure). Real ncRNA are white, negatives are black. Test set is the same as for Figure <figr fid="F3">3</figr>.</p>
               </text>
               <graphic file="1471-2105-7-173-4"/>
            </fig>
            <p>Each control generation method was also tested using two different values of the <it>M </it>parameter, <it>M </it>= 6 and <it>M </it>= 8, for computation of the &#916;G&#176;<sub>total </sub>values of the input sequence pair and of the control set (results for the columnwise shuffle control generation method are shown in Figure <figr fid="F5">5</figr>; complete results for all methods are in <supplr sid="S1">Additional File 1</supplr> in "Additional Files"). A higher <it>M </it>value improves the quality of classification for all control generation methods, at the expense of longer runtime.</p>
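            <p>For illustration, a <it>z </it>score can be computed from the input &#916;G&#176;<sub>total </sub>and the control set &#916;G&#176;<sub>total </sub>values as sketched below. The sketch assumes the conventional definition (input value minus the control mean, divided by the control standard deviation); the use of the sample standard deviation and the example energies are assumptions, not values from this study.</p>

```python
from statistics import mean, stdev


def z_score(dg_input, dg_controls):
    """Number of standard deviations by which the input free energy change
    differs from the mean of its controls (negative means more stable)."""
    if len(dg_controls) == 1 or not dg_controls:
        raise ValueError("at least two controls are required")
    sigma = stdev(dg_controls)  # assumption: sample standard deviation
    if sigma == 0:
        raise ValueError("controls have zero variance")
    return (dg_input - mean(dg_controls)) / sigma


# Hypothetical energies in kcal/mol for one input pair and four controls.
example_z = z_score(-45.0, [-30.0, -32.0, -28.0, -30.0])
```

            <p>A sequence pair would then be classified as ncRNA when its <it>z </it>score falls below a chosen cutoff, and varying that cutoff traces out an ROC curve.</p>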
            <suppl id="S1">
               <title>
                  <p>Additional File 1</p>
               </title>
               <text>
                  <p>Complete ROC curves for classification of sequence pairs by the Dynalign <it>z </it>score method. Adobe Acrobat PDF (version 4.0 or above) file showing complete ROC curves comparing effectiveness of Dynalign <it>z </it>score classification of sequence pairs using three control generation methods and two <it>M </it>parameter values (<it>M </it>= 6 and <it>M </it>= 8). This is the same sequence test set that Figures <figr fid="F3">3</figr>, <figr fid="F4">4</figr> and <figr fid="F5">5</figr> are based on. In all cases, increasing the value of the <it>M </it>parameter improves prediction quality. Dark and light green: controls generated by first-order Markov chain sampling, tests run using <it>M </it>= 6 and <it>M </it>= 8, respectively. Brown and orange: controls generated by Altschul-Erickson dinucleotide shuffle, tests run using <it>M </it>= 6 and <it>M </it>= 8, respectively. Dark and light blue: controls generated by the columnwise shuffle, tests run using <it>M </it>= 6 and <it>M </it>= 8, respectively.</p>
               </text>
               <file name="1471-2105-7-173-S1.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Higher <it>M </it>parameter improves quality of classification when using the <it>z </it>score method</p>
               </caption>
               <text>
                  <p><b>Higher <it>M </it>parameter improves quality of classification when using the <it>z </it>score method</b>. ROC curves comparing effectiveness of the best control generation method for sequence pairs (i.e. columnwise shuffle of a global sequence alignment) at parameters <it>M </it>= 6 (dark blue) and <it>M </it>= 8 (light blue). The test set is the same as for Figure <figr fid="F3">3</figr>. For all other control generation methods, increasing the <it>M </it>parameter value likewise increases the quality of classification (see <supplr sid="S1">Additional File 1</supplr> in "Additional Files" for supporting figure).</p>
               </text>
               <graphic file="1471-2105-7-173-5"/>
            </fig>
            <p>The quality of the best control generation method was also examined with 5S rRNA and tRNA separated into different sets and tested independently (Figure <figr fid="F6">6</figr>). At a given specificity, this method is generally more sensitive for detecting 5S rRNA than tRNA, which is the opposite of the trend observed in classification of single sequences in Figure <figr fid="F2">2</figr>. Finally, ncRNA classification using single sequences is compared with the sequence pair approach in Figure <figr fid="F7">7</figr>. The two-sequence approach with Dynalign performs significantly better than classification of single sequences.</p>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Quality of classification using the <it>z </it>score method, broken down by ncRNA family</p>
               </caption>
               <text>
                  <p><b>Quality of classification using the <it>z </it>score method, broken down by ncRNA family</b>. ROC curves showing effectiveness of the best control generation method for sequence pairs (i.e. columnwise shuffle of global alignment at parameter <it>M </it>= 8) for 5S rRNA by itself (red), tRNA by itself (green), and both combined into one test set (blue). The 5S rRNA or tRNA sequences in the test set are the same as those used for the test set in Figures <figr fid="F3">3</figr>, <figr fid="F4">4</figr> and <figr fid="F5">5</figr>.</p>
               </text>
               <graphic file="1471-2105-7-173-6"/>
            </fig>
            <fig id="F7">
               <title>
                  <p>Figure 7</p>
               </title>
               <caption>
                  <p>Comparison of the <it>z </it>score classification method using single sequences versus using sequence pairs</p>
               </caption>
               <text>
                  <p><b>Comparison of the <it>z </it>score classification method using single sequences versus using sequence pairs</b>. ROC curves comparing quality of classification based on single sequences versus based on sequence pairs, using the same free energy parameters for both. The single sequence curve (blue) is the same as in Figure <figr fid="F2">2</figr>. Black shows the best results for the sequence pair approach from Figure <figr fid="F3">3</figr> (i.e. control generation by columnwise shuffle of global sequence alignment at parameter <it>M </it>= 8), to illustrate the difference in prediction quality.</p>
               </text>
               <graphic file="1471-2105-7-173-7"/>
            </fig>
            <p>Because RNAz outputs the probability (<it>P </it>value) that an input sequence alignment is ncRNA, an RNAz ROC curve can be constructed for the same test set as Dynalign by iterating over <it>P </it>value cutoffs for classification, which varies the sensitivity/specificity tradeoff. A sequence alignment input to RNAz is classified as ncRNA if its <it>P </it>value is greater than the cutoff. The quality of classification for RNAz as compared to the Dynalign <it>z </it>score method is shown in Figure <figr fid="F8">8</figr>. While RNAz is more sensitive at specificities above approximately 98.5%, the Dynalign <it>z </it>score method is more sensitive at lower specificities.</p>
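            <p>The cutoff iteration described above can be sketched as follows. The function name and the toy <it>P </it>values are hypothetical; the only rule taken from the text is that an input is classified as ncRNA when its <it>P </it>value is greater than the cutoff.</p>

```python
def roc_points(p_values, is_real, cutoffs):
    """Return (cutoff, sensitivity, specificity) for each P value cutoff."""
    n_real = sum(1 for real in is_real if real)
    n_neg = len(is_real) - n_real
    if n_real == 0 or n_neg == 0:
        raise ValueError("need at least one real and one negative example")
    points = []
    for cutoff in cutoffs:
        called = [p > cutoff for p in p_values]
        true_pos = sum(1 for c, real in zip(called, is_real) if c and real)
        true_neg = sum(
            1 for c, real in zip(called, is_real) if not c and not real
        )
        points.append((cutoff, true_pos / n_real, true_neg / n_neg))
    return points


# Hypothetical P values: three real ncRNA inputs followed by three negatives.
toy_p = [0.95, 0.80, 0.40, 0.70, 0.10, 0.05]
toy_real = [True, True, True, False, False, False]
toy_curve = roc_points(toy_p, toy_real, [0.0, 0.5, 0.9])
```

            <p>Raising the cutoff trades sensitivity for specificity, so sweeping it from 0 to 1 yields the full curve.</p>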
            <fig id="F8">
               <title>
                  <p>Figure 8</p>
               </title>
               <caption>
                  <p>Comparison of the Dynalign <it>z </it>score method with RNAz for sequence pairs of all identities</p>
               </caption>
               <text>
                  <p><b>Comparison of the Dynalign <it>z </it>score method with RNAz for sequence pairs of all identities</b>. ROC curves for the Dynalign <it>z </it>score classification method (running 20 controls for each input sequence pair to determine <it>z </it>score, <it>M </it>= 8; blue) and RNAz (red), both tested on the same test set of sequence pairs as in Figure <figr fid="F3">3</figr>.</p>
               </text>
               <graphic file="1471-2105-7-173-8"/>
            </fig>
            <p>RNAz requires pre-aligned sequences as input, which is a disadvantage at lower sequence identities: for highly divergent sequences, an optimal <it>sequence </it>alignment prepared by an algorithm that optimizes a sequence alignment score may not be the optimal <it>structural </it>alignment, which takes into account the common secondary structures of the RNA sequences <abbrgrp><abbr bid="B66">66</abbr></abbrgrp>. Dynalign does not suffer from this limitation because it simultaneously optimizes the common secondary structure and the structural alignment, and thus does not need pre-alignment of the input sequence pair. To illustrate this advantage of Dynalign over RNAz at low sequence pair identities, Figure <figr fid="F9">9</figr> compares the ROC curves of the Dynalign <it>z </it>score method and RNAz only for sequence pairs in the test set that are below 50% identity. At this low level of sequence identity, the Dynalign <it>z </it>score method is more sensitive than RNAz at all specificities.</p>
            <fig id="F9">
               <title>
                  <p>Figure 9</p>
               </title>
               <caption>
                  <p>Comparison of the Dynalign <it>z </it>score method with RNAz for sequence pairs below 50% identity</p>
               </caption>
               <text>
                  <p><b>Comparison of the Dynalign <it>z </it>score method with RNAz for sequence pairs below 50% identity</b>. ROC curves for the Dynalign <it>z </it>score classification method (running 20 controls for each input sequence pair, <it>M </it>= 8; blue) and RNAz (red), both tested only on those sequence pairs from the Figure <figr fid="F3">3</figr> test set that have less than 50% sequence pair identity. Dynalign becomes more sensitive than RNAz at low sequence pair identities for all specificities.</p>
               </text>
               <graphic file="1471-2105-7-173-9"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Tests by support vector machine (SVM) classification</p>
            </st>
            <p>Generating a large number of controls for each input sequence pair is an accurate but time-consuming method for classifying sequences, which makes a whole genome screen costly. To speed the calculation, a support vector machine (SVM) can be used. SVMs are a set of machine learning methods capable of performing non-linear regression and classification of numerical data <abbrgrp><abbr bid="B67">67</abbr><abbr bid="B68">68</abbr></abbrgrp>. For example, RNAz uses a regression SVM to compute single sequence <it>z </it>scores and a classification SVM to determine whether a multiple sequence alignment is ncRNA or not on the basis of a set of input parameters.</p>
            <p>To classify sequence pairs without performing explicit control calculations to generate a <it>z </it>score, a binary SVM classifier was employed (using the LIBSVM <abbrgrp><abbr bid="B69">69</abbr></abbrgrp> implementation). The classifier takes as input the Dynalign-computed &#916;G&#176;<sub>total </sub>of the input sequence pair, the length of the shorter sequence, and A, U, and C nucleotide frequencies of sequence 1 and sequence 2. This Dynalign/LIBSVM classifier was trained on a set of 59,535 real and negative sequence pairs in a 1:2 ratio; the real sequence pairs were composed of two 5S rRNA or two tRNA, and two negative sequence pairs were generated from each real sequence pair using two different sequence shuffling methods. The classifier was trained to output a classification probability (<it>P </it>value) of the input sequence pair being ncRNA, thus allowing for the construction of ROC curves because the ncRNA classification cutoff could be set at any desired <it>P </it>value.</p>
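            <p>As an illustration of the classifier input described above, the eight features can be assembled as in the following sketch. The function names, the feature order, and the absence of any feature scaling are assumptions; the sketch covers only the construction of the input vector and does not call LIBSVM.</p>

```python
def nucleotide_frequency(seq, base):
    """Fraction of positions in seq occupied by the given nucleotide."""
    return seq.upper().count(base) / len(seq)


def svm_features(dg_total, seq1, seq2):
    """Build the feature vector for one sequence pair.

    Features: Dynalign total free energy change, length of the shorter
    sequence, then A, U and C frequencies of sequence 1 and of sequence 2.
    The ordering is an assumption made for this example.
    """
    features = [float(dg_total), float(min(len(seq1), len(seq2)))]
    for seq in (seq1, seq2):
        for base in "AUC":
            features.append(nucleotide_frequency(seq, base))
    return features


# Hypothetical input: free energy change in kcal/mol and two short sequences.
example_features = svm_features(-40.0, "AAUC", "GGCCAU")
```

            <p>For sequences containing only A, C, G and U, the G frequency is determined by the other three, so listing A, U and C is sufficient to describe the composition.</p>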
            <p>To benchmark the performance of the Dynalign/LIBSVM classifier versus RNAz and QRNA, the three methods were applied to a test set of 90,539 5S rRNA and tRNA sequence pairs and 181,078 negative sequence pairs (generated in the same fashion as the set used to train the model, with two negatives for each real sequence pair). For comparison of the Dynalign/LIBSVM classifier and RNAz, ROC curves are plotted for all sequence pairs in Figure <figr fid="F10">10</figr>, and for sequence pairs below 50% identity in Figure <figr fid="F11">11</figr>. Because QRNA compares scores for three different models (ncRNA, open reading frame, or other) to make the classification, an ROC curve cannot be constructed for it as for RNAz and the Dynalign/LIBSVM method, so QRNA classification benchmark results are listed in Table <tblr tid="T4">4</tblr>.</p>
            <fig id="F10">
               <title>
                  <p>Figure 10</p>
               </title>
               <caption>
                  <p>Comparison of the Dynalign/LIBSVM classifier with RNAz for sequence pairs of all identities</p>
               </caption>
               <text>
                  <p><b>Comparison of the Dynalign/LIBSVM classifier with RNAz for sequence pairs of all identities</b>. ROC curves for the Dynalign/LIBSVM classifier (blue) and RNAz (red), both based on a test set of 38,069 5S rRNA sequence pairs and 52,470 tRNA sequence pairs, plus two negative sequence pairs generated from each real sequence pair: one by a columnwise shuffle of a global alignment and one by an Altschul-Erickson dinucleotide shuffle of each sequence in the pair separately. This yields 90,539 real trial sequence pairs and 181,078 negative trial sequence pairs.</p>
               </text>
               <graphic file="1471-2105-7-173-10"/>
            </fig>
            <fig id="F11">
               <title>
                  <p>Figure 11</p>
               </title>
               <caption>
                  <p>Comparison of the Dynalign/LIBSVM classifier with RNAz for sequence pairs below 50% identity</p>
               </caption>
               <text>
                  <p><b>Comparison of the Dynalign/LIBSVM classifier with RNAz for sequence pairs below 50% identity</b>. ROC curves for the Dynalign/LIBSVM classifier method (blue) and RNAz (red), both tested only on those sequence pairs from the Figure <figr fid="F10">10</figr> test set that have less than 50% sequence pair identity.</p>
               </text>
               <graphic file="1471-2105-7-173-11"/>
            </fig>
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>Sensitivities and specificities of the Dynalign/LIBSVM classifier, RNAz, and QRNA broken down by percent identity.</p>
               </caption>
               <tblbdy cols="12">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="5" ca="center">
                        <p>
                           <b>% sensitivity</b>
                        </p>
                     </c>
                     <c cspan="5" ca="center">
                        <p>
                           <b>% specificity</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="12">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c cspan="2" ca="center">
                        <p>
                           <b>% identity range</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>N</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>% of real set</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Dynalign (<it>P </it>> 0.819)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>RNAz (<it>P </it>> 0.789)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>QRNA</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>N</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>% of negative set</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Dynalign (<it>P </it>&lt;= 0.819)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>RNAz (<it>P </it>&lt;= 0.789)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>QRNA</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="12">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>[0</p>
                     </c>
                     <c ca="center">
                        <p>10)</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>[10</p>
                     </c>
                     <c ca="center">
                        <p>20)</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>[20</p>
                     </c>
                     <c ca="center">
                        <p>30)</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>0.0033</p>
                     </c>
                     <c ca="center">
                        <p>100.0</p>
                     </c>
                     <c ca="center">
                        <p>0.0</p>
                     </c>
                     <c ca="center">
                        <p>66.6667</p>
                     </c>
                     <c ca="center">
                        <p>49</p>
                     </c>
                     <c ca="center">
                        <p>0.0271</p>
                     </c>
                     <c ca="center">
                        <p>97.9592</p>
                     </c>
                     <c ca="center">
                        <p>100.0</p>
                     </c>
                     <c ca="center">
                        <p>95.9184</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>[30</p>
                     </c>
                     <c ca="center">
                        <p>40)</p>
                     </c>
                     <c ca="center">
                        <p>1337</p>
                     </c>
                     <c ca="center">
                        <p>1.4767</p>
                     </c>
                     <c ca="center">
                        <p>71.4286</p>
                     </c>
                     <c ca="center">
                        <p>34.6298</p>
                     </c>
                     <c ca="center">
                        <p>48.6163</p>
                     </c>
                     <c ca="center">
                        <p>9037</p>
                     </c>
                     <c ca="center">
                        <p>4.9907</p>
                     </c>
                     <c ca="center">
                        <p>99.1922</p>
                     </c>
                     <c ca="center">
                        <p>97.0233</p>
                     </c>
                     <c ca="center">
                        <p>97.8975</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>[40</p>
                     </c>
                     <c ca="center">
                        <p>50)</p>
                     </c>
                     <c ca="center">
                        <p>22328</p>
                     </c>
                     <c ca="center">
                        <p>24.6612</p>
                     </c>
                     <c ca="center">
                        <p>63.6465</p>
                     </c>
                     <c ca="center">
                        <p>63.1539</p>
                     </c>
                     <c ca="center">
                        <p>46.5559</p>
                     </c>
                     <c ca="center">
                        <p>85008</p>
                     </c>
                     <c ca="center">
                        <p>46.9455</p>
                     </c>
                     <c ca="center">
                        <p>99.3612</p>
                     </c>
                     <c ca="center">
                        <p>98.8542</p>
                     </c>
                     <c ca="center">
                        <p>99.2789</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>[50</p>
                     </c>
                     <c ca="center">
                        <p>60)</p>
                     </c>
                     <c ca="center">
                        <p>42733</p>
                     </c>
                     <c ca="center">
                        <p>47.1984</p>
                     </c>
                     <c ca="center">
                        <p>73.0606</p>
                     </c>
                     <c ca="center">
                        <p>75.6488</p>
                     </c>
                     <c ca="center">
                        <p>56.2469</p>
                     </c>
                     <c ca="center">
                        <p>55602</p>
                     </c>
                     <c ca="center">
                        <p>30.7061</p>
                     </c>
                     <c ca="center">
                        <p>99.2320</p>
                     </c>
                     <c ca="center">
                        <p>99.5018</p>
                     </c>
                     <c ca="center">
                        <p>99.0306</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>[60</p>
                     </c>
                     <c ca="center">
                        <p>70)</p>
                     </c>
                     <c ca="center">
                        <p>17346</p>
                     </c>
                     <c ca="center">
                        <p>19.1586</p>
                     </c>
                     <c ca="center">
                        <p>88.4930</p>
                     </c>
                     <c ca="center">
                        <p>92.2461</p>
                     </c>
                     <c ca="center">
                        <p>86.0775</p>
                     </c>
                     <c ca="center">
                        <p>23952</p>
                     </c>
                     <c ca="center">
                        <p>13.2274</p>
                     </c>
                     <c ca="center">
                        <p>99.1566</p>
                     </c>
                     <c ca="center">
                        <p>99.3028</p>
                     </c>
                     <c ca="center">
                        <p>98.8143</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>[70</p>
                     </c>
                     <c ca="center">
                        <p>80)</p>
                     </c>
                     <c ca="center">
                        <p>4061</p>
                     </c>
                     <c ca="center">
                        <p>4.4854</p>
                     </c>
                     <c ca="center">
                        <p>94.5826</p>
                     </c>
                     <c ca="center">
                        <p>94.5087</p>
                     </c>
                     <c ca="center">
                        <p>94.1394</p>
                     </c>
                     <c ca="center">
                        <p>4672</p>
                     </c>
                     <c ca="center">
                        <p>2.5801</p>
                     </c>
                     <c ca="center">
                        <p>97.5813</p>
                     </c>
                     <c ca="center">
                        <p>98.4375</p>
                     </c>
                     <c ca="center">
                        <p>97.9238</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>[80</p>
                     </c>
                     <c ca="center">
                        <p>90)</p>
                     </c>
                     <c ca="center">
                        <p>2035</p>
                     </c>
                     <c ca="center">
                        <p>2.2477</p>
                     </c>
                     <c ca="center">
                        <p>95.5774</p>
                     </c>
                     <c ca="center">
                        <p>91.9410</p>
                     </c>
                     <c ca="center">
                        <p>93.9066</p>
                     </c>
                     <c ca="center">
                        <p>2062</p>
                     </c>
                     <c ca="center">
                        <p>1.1387</p>
                     </c>
                     <c ca="center">
                        <p>90.8341</p>
                     </c>
                     <c ca="center">
                        <p>98.1086</p>
                     </c>
                     <c ca="center">
                        <p>96.6052</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>[90</p>
                     </c>
                     <c ca="center">
                        <p>100)</p>
                     </c>
                     <c ca="center">
                        <p>654</p>
                     </c>
                     <c ca="center">
                        <p>0.7223</p>
                     </c>
                     <c ca="center">
                        <p>98.4709</p>
                     </c>
                     <c ca="center">
                        <p>72.6300</p>
                     </c>
                     <c ca="center">
                        <p>59.7859</p>
                     </c>
                     <c ca="center">
                        <p>654</p>
                     </c>
                     <c ca="center">
                        <p>0.3612</p>
                     </c>
                     <c ca="center">
                        <p>60.5505</p>
                     </c>
                     <c ca="center">
                        <p>96.7890</p>
                     </c>
                     <c ca="center">
                        <p>95.2599</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="2" ca="center">
                        <p>[100]</p>
                     </c>
                     <c ca="center">
                        <p>42</p>
                     </c>
                     <c ca="center">
                        <p>0.0464</p>
                     </c>
                     <c ca="center">
                        <p>92.8571</p>
                     </c>
                     <c ca="center">
                        <p>61.9048</p>
                     </c>
                     <c ca="center">
                        <p>11.9048</p>
                     </c>
                     <c ca="center">
                        <p>42</p>
                     </c>
                     <c ca="center">
                        <p>0.0232</p>
                     </c>
                     <c ca="center">
                        <p>61.9048</p>
                     </c>
                     <c ca="center">
                        <p>90.4762</p>
                     </c>
                     <c ca="center">
                        <p>95.2381</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="2" ca="center">
                        <p>
                           <b>totals</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>90539</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>100.0</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>75.3366</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>81.2854</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>62.0108</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>181078</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>100.0</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>98.9938</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>98.9927</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>98.9905</b>
                        </p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>A comparison of sensitivities of the three ncRNA classification/detection programs within each percent identity range. <it>N </it>is the number of sequences within each range. Probability cutoffs for RNAz and the Dynalign/LIBSVM classifier were selected such that overall specificities for the entire test set match the specificity of QRNA as closely as possible.</p>
               </tblfn>
            </tbl>
            <p>The benchmarks on 5S rRNA and tRNA indicate that the Dynalign/LIBSVM classifier is more sensitive than RNAz if the desired specificity is below approximately 98.3%. However, for higher specificities, RNAz becomes more sensitive. When only sequence pairs below 50% identity are considered, the difference between the two methods in prediction quality at high specificities narrows; RNAz is more sensitive than Dynalign at specificities above approximately 99.2%, but less sensitive at all specificities below that.</p>
            <p>Table <tblr tid="T4">4</tblr> illustrates the effectiveness of the three programs broken down by percent identity of the sequence pairs. Because the Dynalign/LIBSVM classifier and RNAz allow selection of a <it>P </it>value cutoff, the cutoffs were chosen so that the specificities of the programs on the test set match that of QRNA, allowing sensitivities to be compared. It should be noted that in Table <tblr tid="T4">4</tblr>, the QRNA-based specificity maps to a point on the Dynalign/LIBSVM classifier and RNAz ROC curves (see Figure <figr fid="F10">10</figr>) where RNAz is more sensitive than Dynalign, which is not true for <it>all </it>specificities. Table <tblr tid="T4">4</tblr> illustrates that the sensitivity of the Dynalign/LIBSVM classifier remains more consistent than that of RNAz or QRNA at low sequence identity. This is primarily because Dynalign optimizes the structural alignment based on secondary structure rather than requiring a fixed alignment as input; at lower identities, the optimal sequence alignment is not necessarily the optimal structural alignment.</p>
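            <p>The cutoff matching described above can be sketched as follows. This is a minimal illustration only, not the procedure used to produce Table <tblr tid="T4">4</tblr>: the function names, the use of negative-example scores as candidate cutoffs, and the example scores are all assumptions introduced here.</p>
            <p>
```python
def specificity(neg_scores, cutoff):
    """Fraction of negative examples NOT classified as ncRNA (P not above cutoff)."""
    true_neg = sum(1 for p in neg_scores if not p > cutoff)
    return true_neg / len(neg_scores)


def sensitivity(pos_scores, cutoff):
    """Fraction of positive examples classified as ncRNA (P above cutoff)."""
    return sum(1 for p in pos_scores if p > cutoff) / len(pos_scores)


def match_specificity(neg_scores, target):
    """Return the candidate cutoff whose specificity is closest to the target.

    Candidate cutoffs are the observed negative scores plus 0 and 1
    (an assumption made for this sketch).
    """
    candidates = sorted(set(neg_scores) | {0.0, 1.0})
    return min(candidates, key=lambda c: abs(specificity(neg_scores, c) - target))


# Hypothetical classifier outputs, for illustration only.
negatives = [0.1, 0.2, 0.3, 0.4, 0.95]
positives = [0.5, 0.3, 0.99, 0.7]
cutoff = match_specificity(negatives, target=0.8)
matched_sensitivity = sensitivity(positives, cutoff)
```
            </p>
            <p>Once each program's cutoff has been fixed in this way, the sensitivities can be compared at a common specificity, which is the comparison reported in Table <tblr tid="T4">4</tblr>.</p>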
            <p>While Figure <figr fid="F9">9</figr> clearly illustrates that Dynalign is the better, albeit slower, tool for classifying low-identity sequence pairs when the <it>z </it>score method is used, this advantage apparently does not carry over as effectively into the Dynalign/LIBSVM classifier. Figure <figr fid="F12">12</figr> shows the ROC curve for the Dynalign/LIBSVM classifier plotted with the ROC curves for the <it>z </it>score method from Figure <figr fid="F3">3</figr>. The quality of prediction with the Dynalign/LIBSVM classifier is worse than that of the best <it>z </it>score control generation method, although the classifier is approximately 20 times faster because no explicit controls have to be run.</p>
            <fig id="F12">
               <title>
                  <p>Figure 12</p>
               </title>
               <caption>
                  <p>Comparison of the Dynalign <it>z </it>score method with the Dynalign/LIBSVM classifier</p>
               </caption>
               <text>
                  <p><b>Comparison of the Dynalign <it>z </it>score method with the Dynalign/LIBSVM classifier</b>. ROC curves for the Dynalign <it>z </it>score method, <it>M </it>= 8 (blue, column shuffle controls of the global alignments; orange, Altschul-Erickson dinucleotide shuffle controls; green, first-order Markov chain sampling controls) versus the Dynalign/LIBSVM classifier (pink). The <it>z </it>score ROC curves are from Figure <figr fid="F3">3</figr>; the Dynalign/LIBSVM ROC curve is from Figure <figr fid="F10">10</figr>.</p>
               </text>
               <graphic file="1471-2105-7-173-12"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Detection of long ncRNAs</p>
            </st>
            <p>Because the runtime complexity of Dynalign prohibits an efficient whole genome screen using long scanning windows, the hypothesis that the Dynalign/LIBSVM classifier could pick up long ncRNAs by scanning through them using short windows was tested. Three 16S rRNA and three 23S rRNA sequence pairs were chosen randomly from a database of sequences <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>, and, for each pair, a global alignment was constructed. The alignments were scanned with windows of size 150 nucleotides, stepping 75 nucleotides at a time. The alignment information was removed from each window prior to input to Dynalign because Dynalign takes two unaligned sequences as input. The Dynalign/LIBSVM classifier was used to compute the probability (<it>P </it>value) of each window being ncRNA.</p>
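            <p>The windowing step described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function name, the gap character, and the handling of a trailing partial window (it is skipped here unless the alignment is shorter than one window) are assumptions.</p>
            <p>
```python
def scan_windows(aligned_a, aligned_b, size=150, step=75, gap="-"):
    """Yield (start, seq_a, seq_b) for each window of alignment columns.

    Gap characters are stripped, so each window is returned as a pair of
    unaligned sequences, which is the form of input Dynalign takes.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    n_cols = len(aligned_a)
    # Last start position at which a full window still fits (assumption:
    # an alignment shorter than one window yields a single partial window).
    last_start = max(n_cols - size, 0)
    for start in range(0, last_start + 1, step):
        cols_a = aligned_a[start:start + size]
        cols_b = aligned_b[start:start + size]
        yield start, cols_a.replace(gap, ""), cols_b.replace(gap, "")


# Toy alignment of 300 columns, for illustration only.
toy_a = "AC-GU" * 60
toy_b = "ACGGU" * 60
windows = list(scan_windows(toy_a, toy_b))
```
            </p>
            <p>Each yielded pair would then be passed to the classifier to obtain a <it>P </it>value for that window.</p>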
            <p>Table <tblr tid="T5">5</tblr> shows the <it>P </it>values for all the windows, demonstrating that each of the long ncRNAs has at least one high-probability (<it>P </it>> 0.9) window that would detect it in a whole genome screen. In four of the six pairs, at least 12 windows exceed this cutoff. This indicates that it should be possible to discover long ncRNAs in a whole genome screen by scanning through them in short windows. In fact, because most long ncRNAs span multiple short windows, the overall sensitivity of long ncRNA discovery should be higher than for short sequences found in only one window. Examples of the distributions of <it>P </it>values by window for representative 16S and 23S rRNAs are shown in Figures <figr fid="F13">13</figr> and <figr fid="F14">14</figr>.</p>
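            <p>The expectation that multiple windows raise the overall sensitivity can be illustrated with a simple calculation. The sketch below assumes that windows are classified independently, which is a simplification introduced here (overlapping windows of one ncRNA are certainly correlated), so it should be read as a rough illustration rather than a quantitative model; the function name is hypothetical.</p>
            <p>
```python
def detection_probability(per_window_sensitivity, n_windows):
    """Probability that at least one of n windows is classified as ncRNA,
    assuming (for illustration only) that windows are independent."""
    return 1.0 - (1.0 - per_window_sensitivity) ** n_windows


single = detection_probability(0.5, 1)  # one window
triple = detection_probability(0.5, 3)  # three windows
```
            </p>
            <p>With a per-window sensitivity of 0.5, one window gives a detection probability of 0.5, whereas three independent windows give 0.875.</p>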
            <tbl id="T5">
               <title>
                  <p>Table 5</p>
               </title>
               <caption>
                  <p>Detection of long ncRNAs using scanning windows.</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <b>percent identity of entire alignment</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>total number of scanning windows</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>number of scanning windows with <it>P </it>> 0.5</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>number of scanning windows with <it>P </it>> 0.9</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>number of scanning windows with <it>P </it>> 0.99</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c cspan="6" ca="left">
                        <p>
                           <b>16S rRNA</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Borrelia burgdorferi </it>and <it>Bacillus subtilis</it></p>
                     </c>
                     <c ca="center">
                        <p>74.5%</p>
                     </c>
                     <c ca="center">
                        <p>30</p>
                     </c>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Homo sapiens </it>(mitochondrial) and <it>Thermotoga maritima</it></p>
                     </c>
                     <c ca="center">
                        <p>39.3%</p>
                     </c>
                     <c ca="center">
                        <p>30</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Archaeoglobus fulgidus </it>and <it>Borrelia burgdorferi</it></p>
                     </c>
                     <c ca="center">
                        <p>61.8%</p>
                     </c>
                     <c ca="center">
                        <p>30</p>
                     </c>
                     <c ca="center">
                        <p>22</p>
                     </c>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c ca="center">
                        <p>14</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6" ca="left">
                        <p>
                           <b>23S rRNA</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Escherichia coli </it>and <it>Thermoplasma acidophilum</it></p>
                     </c>
                     <c ca="center">
                        <p>59.4%</p>
                     </c>
                     <c ca="center">
                        <p>57</p>
                     </c>
                     <c ca="center">
                        <p>49</p>
                     </c>
                     <c ca="center">
                        <p>41</p>
                     </c>
                     <c ca="center">
                        <p>37</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Bacillus subtilis </it>and <it>Bos taurus</it></p>
                     </c>
                     <c ca="center">
                        <p>37.1%</p>
                     </c>
                     <c ca="center">
                        <p>57</p>
                     </c>
                     <c ca="center">
                        <p>35</p>
                     </c>
                     <c ca="center">
                        <p>26</p>
                     </c>
                     <c ca="center">
                        <p>21</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Bacillus subtilis </it>and <it>Thermoproteus tenax</it></p>
                     </c>
                     <c ca="center">
                        <p>60.1%</p>
                     </c>
                     <c ca="center">
                        <p>61</p>
                     </c>
                     <c ca="center">
                        <p>3</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>1</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The Dynalign/LIBSVM classifier was used to compute <it>P </it>values for sets of 150-nucleotide scanning windows iterating (in steps of 75 nucleotides) through global alignments of three 16S and three 23S rRNA pairs randomly selected from a database [48]. The number of windows exceeding each of three <it>P </it>value cutoffs is listed, indicating that long ncRNAs can be detected with short scanning windows.</p>
               </tblfn>
            </tbl>
            <fig id="F13">
               <title>
                  <p>Figure 13</p>
               </title>
               <caption>
                  <p>ncRNA probabilities (<it>P </it>values) of scanning windows iterating through a 16S rRNA</p>
               </caption>
               <text>
                  <p><b>ncRNA probabilities (<it>P </it>values) of scanning windows iterating through a 16S rRNA</b>. Probabilities of ncRNA computed by the Dynalign/LIBSVM classifier for 30 scanning windows, each 150 nucleotides long, iterating through a global alignment of <it>Borrelia burgdorferi </it>and <it>Bacillus subtilis </it>16S rRNA in steps of 75 nucleotides.</p>
               </text>
               <graphic file="1471-2105-7-173-13"/>
            </fig>
            <fig id="F14">
               <title>
                  <p>Figure 14</p>
               </title>
               <caption>
                  <p>ncRNA probabilities (<it>P </it>values) of scanning windows iterating through a 23S rRNA</p>
               </caption>
               <text>
                  <p><b>ncRNA probabilities (<it>P </it>values) of scanning windows iterating through a 23S rRNA</b>. Probabilities of ncRNA computed by the Dynalign/LIBSVM classifier for 57 scanning windows, each 150 nucleotides long, iterating through a global alignment of <it>Bacillus subtilis </it>and <it>Bos taurus </it>23S rRNA in steps of 75 nucleotides.</p>
               </text>
               <graphic file="1471-2105-7-173-14"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Whole genome screen using the Dynalign/LIBSVM classifier</p>
            </st>
            <p>The capability of the Dynalign/LIBSVM classifier as an ncRNA detection tool was tested on whole genome alignments of <it>Escherichia coli </it>K-12 MG1655 <abbrgrp><abbr bid="B31">31</abbr></abbrgrp> and the main chromosome of <it>Salmonella enterica serovar Typhi </it>(<it>Salmonella typhi</it>) CT18 <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. Two different methods of preparing a whole genome alignment were used. In each case, nucleotides known to be in open reading frames (ORFs) were removed to speed the calculation. With the WuBLASTn <abbrgrp><abbr bid="B70">70</abbr></abbrgrp> method, nucleotides in known ORFs of <it>E. coli </it>(but not <it>S. typhi</it>) were dropped before the alignment and the screen; in the MUMmer <abbrgrp><abbr bid="B71">71</abbr></abbrgrp> method, nucleotides in known ORFs in both genomes were retained for the alignment, but dropped before the screen. Removing ORFs has the disadvantage that ncRNAs overlapping with or complementary to ORFs are truncated or dropped before the screen, but few such ncRNAs are known, so this was not a significant problem. For example, in <it>E. coli</it>, only eight known ncRNAs partially overlap coding regions and no known ncRNA lies completely within a coding region. Of the 156 known <it>E. coli </it>ncRNAs, the MUMmer whole genome alignment contained 129 completely (the ncRNA was entirely within an alignment block) and 3 partially (the ncRNA was truncated in the alignment block), while 24 did not appear in the alignment at all. The WuBLASTn alignment contained 148 completely and 7 partially, while 1 did not appear at all. Therefore, the maximum number of detectable <it>E. coli </it>ncRNAs was 132 for the MUMmer alignment and 155 for the WuBLASTn alignment.</p>
            <p>For the first method of preparing whole genome screen windows, a MUMmer <abbrgrp><abbr bid="B71">71</abbr></abbrgrp> whole genome alignment of the entire <it>E. coli </it>genome with the entire <it>S. typhi </it>main chromosome was performed. Alignment columns containing known ORF nucleotides in either genome were removed after the alignment; ORF regions were retained for the alignment step only to serve as "anchors" to produce greater coverage and better align intergenic regions. The resulting alignment blocks were scanned with windows of size 150 alignment columns, stepping 75 at a time. The alignment information was removed from each window prior to input to Dynalign, but retained for input to QRNA and RNAz because they require pre-aligned sequences. This produced 15,214 total windows (counting reverse complements) containing 2,216,188 alignment columns. The distribution of percent identities for these windows is reported in Figure <figr fid="F15">15</figr>. The large number of alignment columns relative to intergenic region size is explained by the same sequences producing multiple alignment blocks, due to the quantity of repetitive elements in both genomes. After screening, overlapping and contiguous windows that were classified as ncRNA were merged and considered a single ncRNA.</p>
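            <p>The merging of overlapping and contiguous windows described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the representation of a window as a half-open (start, end) pair, and the coordinates used in the example are assumptions.</p>
            <p>
```python
def merge_hits(hits):
    """Merge overlapping or contiguous (start, end) intervals, end exclusive.

    Each interval stands for one window classified as ncRNA; the result is
    one interval per merged region.
    """
    merged = []
    for start, end in sorted(hits):
        if merged and not start > merged[-1][1]:
            # Window overlaps or touches the previous region: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]


# Hypothetical window coordinates (size 150, step 75), for illustration only.
regions = merge_hits([(0, 150), (75, 225), (150, 300), (450, 600)])
```
            </p>
            <p>In this example the first three windows collapse into a single region spanning positions 0 to 300, and the isolated fourth window remains a separate region, so two candidate ncRNAs would be counted.</p>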
            <fig id="F15">
               <title>
                  <p>Figure 15</p>
               </title>
               <caption>
                  <p>Distribution of scanning window percent identities in the MUMmer whole genome ncRNA screen</p>
               </caption>
               <text>
                  <p><b>Distribution of scanning window percent identities in the MUMmer whole genome ncRNA screen</b>. Histogram showing the distribution of percent identities of 15,214 genomic windows (size 150 alignment columns, scanning step size 75 alignment columns), generated from the MUMmer whole genome alignment of <it>E. coli </it>and <it>S. typhi</it>.</p>
               </text>
               <graphic file="1471-2105-7-173-15"/>
            </fig>
            <p>Table <tblr tid="T6">6</tblr> shows the results of the MUMmer whole genome screen at various <it>P </it>value cutoffs, compared against RNAz at the same cutoffs and against QRNA. Given our current knowledge of ncRNAs in these genomes, the Dynalign/LIBSVM classifier is the most sensitive method for genomic screening, detecting more known ncRNAs. It also appears to generate either fewer or a roughly equivalent number of "other" hits &#8211; high-probability contiguous regions that are not annotated in the sources of known ncRNA that were used for this screen. It is currently unknown whether this indicates that the Dynalign/LIBSVM classifier is more specific: the difference could arise because fewer false positives are generated, because fewer previously unknown ncRNAs are discovered, or from a combination of both. Considering the large number of these "other" hits, it is likely that this indicates a lower genomic false positive rate, but this cannot be conclusively determined. The total number of nucleotides in these "other" regions is given in Table <tblr tid="T6">6</tblr> as a crude estimate of the number of probes that would be required for a biochemical verification screen.</p>
            <tbl id="T6">
               <title>
                  <p>Table 6</p>
               </title>
               <caption>
                  <p>Comparison of three ncRNA detection programs on a whole genome screen using the MUMmer alignment.</p>
               </caption>
               <tblbdy cols="8">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="6" ca="center">
                        <p>
                           <b>probability cutoff for ncRNA classification</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="6">
                        <hr/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2" ca="center">
                        <p>
                           <b><it>P </it>> 0.5</b>
                        </p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>
                           <b><it>P </it>> 0.9</b>
                        </p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>
                           <b><it>P </it>> 0.99</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Dynalign</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>RNAz</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Dynalign</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>RNAz</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Dynalign</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>RNAz</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>QRNA</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c cspan="8" ca="left">
                        <p>
                           <b>Known ncRNAs found (percent of total known ncRNAs in parentheses)</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>E. coli </it>(156 ncRNAs known)</p>
                     </c>
                     <c ca="center">
                        <p>128 (82.05)</p>
                     </c>
                     <c ca="center">
                        <p>125 (80.13)</p>
                     </c>
                     <c ca="center">
                        <p>123 (78.85)</p>
                     </c>
                     <c ca="center">
                        <p>104 (66.67)</p>
                     </c>
                     <c ca="center">
                        <p>107 (68.59)</p>
                     </c>
                     <c ca="center">
                        <p>91 (58.33)</p>
                     </c>
                     <c ca="center">
                        <p>67 (42.95)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>S. typhi </it>(110 ncRNAs known)</p>
                     </c>
                     <c ca="center">
                        <p>103 (93.64)</p>
                     </c>
                     <c ca="center">
                        <p>98 (89.09)</p>
                     </c>
                     <c ca="center">
                        <p>102 (92.73)</p>
                     </c>
                     <c ca="center">
                        <p>84 (76.36)</p>
                     </c>
                     <c ca="center">
                        <p>93 (84.55)</p>
                     </c>
                     <c ca="center">
                        <p>70 (63.64)</p>
                     </c>
                     <c ca="center">
                        <p>64 (58.18)</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="8" ca="left">
                        <p>
                           <b>Number of contiguous, non-overlapping hits that are not known ncRNAs (i.e. novel ncRNA candidates)</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>E. coli</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1,183</p>
                     </c>
                     <c ca="center">
                        <p>1,255</p>
                     </c>
                     <c ca="center">
                        <p>872</p>
                     </c>
                     <c ca="center">
                        <p>996</p>
                     </c>
                     <c ca="center">
                        <p>578</p>
                     </c>
                     <c ca="center">
                        <p>678</p>
                     </c>
                     <c ca="center">
                        <p>661</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. typhi</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>1,178</p>
                     </c>
                     <c ca="center">
                        <p>1,255</p>
                     </c>
                     <c ca="center">
                        <p>857</p>
                     </c>
                     <c ca="center">
                        <p>977</p>
                     </c>
                     <c ca="center">
                        <p>568</p>
                     </c>
                     <c ca="center">
                        <p>662</p>
                     </c>
                     <c ca="center">
                        <p>634</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="8" ca="left">
                        <p>
                           <b>Number of nucleotides classified as ncRNA that are not in known ncRNAs (i.e. nucleotides in novel ncRNA candidates)</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>E. coli </it>(each strand = 4,639,675 nt)</p>
                     </c>
                     <c ca="center">
                        <p>169,580</p>
                     </c>
                     <c ca="center">
                        <p>174,790</p>
                     </c>
                     <c ca="center">
                        <p>123,563</p>
                     </c>
                     <c ca="center">
                        <p>128,343</p>
                     </c>
                     <c ca="center">
                        <p>81,936</p>
                     </c>
                     <c ca="center">
                        <p>80,054</p>
                     </c>
                     <c ca="center">
                        <p>87,577</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>S. typhi </it>(each strand = 4,809,037 nt)</p>
                     </c>
                     <c ca="center">
                        <p>163,037</p>
                     </c>
                     <c ca="center">
                        <p>174,126</p>
                     </c>
                     <c ca="center">
                        <p>117,277</p>
                     </c>
                     <c ca="center">
                        <p>126,393</p>
                     </c>
                     <c ca="center">
                        <p>76,289</p>
                     </c>
                     <c ca="center">
                        <p>79,713</p>
                     </c>
                     <c ca="center">
                        <p>88,099</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="8" ca="left">
                        <p>
                           <b>Total number of nucleotides classified as ncRNA (i.e. nucleotides in both known and unknown ncRNAs)</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>E. coli </it>(each strand = 4,639,675 nt)</p>
                     </c>
                     <c ca="center">
                        <p>224,051</p>
                     </c>
                     <c ca="center">
                        <p>222,817</p>
                     </c>
                     <c ca="center">
                        <p>175,174</p>
                     </c>
                     <c ca="center">
                        <p>166,676</p>
                     </c>
                     <c ca="center">
                        <p>129,086</p>
                     </c>
                     <c ca="center">
                        <p>104,428</p>
                     </c>
                     <c ca="center">
                        <p>113,090</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>S. typhi </it>(each strand = 4,809,037 nt)</p>
                     </c>
                     <c ca="center">
                        <p>213,549</p>
                     </c>
                     <c ca="center">
                        <p>218,867</p>
                     </c>
                     <c ca="center">
                        <p>166,187</p>
                     </c>
                     <c ca="center">
                        <p>162,077</p>
                     </c>
                     <c ca="center">
                        <p>122,269</p>
                     </c>
                     <c ca="center">
                        <p>102,464</p>
                     </c>
                     <c ca="center">
                        <p>114,434</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>QRNA, RNAz, and the Dynalign/LIBSVM classifier are compared in their ability to detect known ncRNA in the <it>E. coli </it>and <it>S. typhi </it>genomes, based on a MUMmer whole genome alignment. For RNAz and the Dynalign/LIBSVM classifier, results are listed for three <it>P </it>value classification cutoffs. "Number of nucleotides" = number of nucleotides on the plus strand + number of nucleotides on the minus strand, not accounting for overlap of complementary strands.</p>
               </tblfn>
            </tbl>
            <p>For the second method of preparing whole genome screen windows, intergenic regions of <it>E. coli </it>(defined as the entire genome minus known ORFs, resulting in 587,347 intergenic nucleotides) were used as WuBLASTn <abbrgrp><abbr bid="B70">70</abbr></abbrgrp> queries against the entire <it>S. typhi </it>main chromosome, resulting in 90,404 total windows (counting the reverse complements) containing 10,265,161 alignment columns. The distribution of percent identities for these windows is reported in Figure <figr fid="F16">16</figr>. As in the MUMmer alignment, the large number of alignment columns is due to the same sequences appearing in multiple alignment blocks. The windows were created by scanning through the resulting WuBLASTn alignment blocks in the same manner as with the MUMmer screen, using windows of 150 alignment columns and a step size of 75. The alignment information was removed from each window prior to input to Dynalign, but retained for input to QRNA and RNAz because they require pre-aligned sequences. Once again, after screening, contiguous or overlapping windows classified as ncRNA were merged into a single ncRNA.</p>
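            <p>The windowing and merging steps described above can be sketched as follows. This is a minimal illustration, not the code used in this work: the function names <it>scan_windows </it>and <it>merge_hits </it>are hypothetical, coordinates are half-open column ranges, and truncating the final window at the end of an alignment block is an assumption, since the handling of trailing partial windows is not specified here.</p>

```python
def scan_windows(n_columns, size=150, step=75):
    """Return half-open (start, end) column ranges that tile one alignment block.

    Assumption: the last window is truncated at the end of the block.
    """
    if n_columns == 0:
        return []
    windows = []
    start = 0
    while True:
        end = min(start + size, n_columns)
        windows.append((start, end))
        if end == n_columns:
            break
        start += step
    return windows


def merge_hits(hits):
    """Merge contiguous or overlapping half-open (start, end) ranges."""
    merged = []
    for start, end in sorted(hits):
        if merged and merged[-1][1] >= start:
            # Touches or overlaps the previous hit: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    print(scan_windows(400))
    print(merge_hits([(75, 225), (0, 150), (300, 400)]))
```

            <p>In this sketch, two windows classified as ncRNA that share columns or abut each other collapse into one hit, so the hit counts reported for a screen refer to merged regions rather than to individual windows.</p>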
            <fig id="F16">
               <title>
                  <p>Figure 16</p>
               </title>
               <caption>
                  <p>Distribution of scanning window percent identities in the WuBLASTn whole genome ncRNA screen</p>
               </caption>
               <text>
                  <p><b>Distribution of scanning window percent identities in the WuBLASTn whole genome ncRNA screen</b>. Histogram showing the distribution of percent identities of 90,404 genomic windows (size 150 alignment columns, scanning step size 75 alignment columns), generated from the WuBLASTn whole genome alignment of <it>E. coli </it>and <it>S. typhi</it>.</p>
               </text>
               <graphic file="1471-2105-7-173-16"/>
            </fig>
            <p>The results of the WuBLASTn genomic screen listed in Table <tblr tid="T7">7</tblr> differ from the results of the MUMmer genomic screen (Table <tblr tid="T6">6</tblr>). The number and coverage of "other" hits in <it>S. typhi </it>are much greater than in <it>E. coli </it>(whereas in the MUMmer screen they were comparable), presumably because <it>E. coli </it>intergenic regions are used as queries against the entire <it>S. typhi </it>chromosome, which here, unlike in the MUMmer screen, did not have any ORFs removed prior to generating scanning windows, thus resulting in more <it>S. typhi </it>sequence being present. The performance of the Dynalign/LIBSVM classifier and RNAz at the <it>P </it>> 0.99 cutoff in the WuBLASTn screen is comparable to their performance at the <it>P </it>> 0.5 cutoff in the MUMmer screen; QRNA also seems to be more sensitive and less specific in the WuBLASTn screen than in the MUMmer screen. This indicates that a MUMmer whole genome alignment would be more desirable for high-specificity whole genome screens.</p>
            <tbl id="T7">
               <title>
                  <p>Table 7</p>
               </title>
               <caption>
                  <p>Comparison of three ncRNA detection programs on a whole genome screen using WuBLASTn genomic windows. </p>
               </caption>
               <tblbdy cols="8">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="6" ca="center">
                        <p>
                           <b>probability cutoff for ncRNA classification</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="6">
                        <hr/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2" ca="center">
                        <p>
                           <b><it>P </it>> 0.5</b>
                        </p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>
                           <b><it>P </it>> 0.9</b>
                        </p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>
                           <b><it>P </it>> 0.99</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Dynalign</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>RNAz</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Dynalign</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>RNAz</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Dynalign</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>RNAz</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>QRNA</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c cspan="8" ca="left">
                        <p>
                           <b>Known ncRNAs found (percent of total known ncRNAs in parentheses)</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>E. coli </it>(156 ncRNAs known)</p>
                     </c>
                     <c ca="center">
                        <p>147 (94.23)</p>
                     </c>
                     <c ca="center">
                        <p>148 (94.87)</p>
                     </c>
                     <c ca="center">
                        <p>141 (90.38)</p>
                     </c>
                     <c ca="center">
                        <p>140 (89.74)</p>
                     </c>
                     <c ca="center">
                        <p>128 (82.05)</p>
                     </c>
                     <c ca="center">
                        <p>124 (79.49)</p>
                     </c>
                     <c ca="center">
                        <p>121 (77.56)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>S. typhi </it>(110 ncRNAs known)</p>
                     </c>
                     <c ca="center">
                        <p>109 (99.09)</p>
                     </c>
                     <c ca="center">
                        <p>108 (98.18)</p>
                     </c>
                     <c ca="center">
                        <p>107 (97.27)</p>
                     </c>
                     <c ca="center">
                        <p>106 (96.36)</p>
                     </c>
                     <c ca="center">
                        <p>103 (93.64)</p>
                     </c>
                     <c ca="center">
                        <p>95 (86.36)</p>
                     </c>
                     <c ca="center">
                        <p>100 (90.91)</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="8" ca="left">
                        <p>
                           <b>Number of contiguous, non-overlapping hits that are not known ncRNAs (i.e. novel ncRNA candidates)</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>E. coli</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>2,569</p>
                     </c>
                     <c ca="center">
                        <p>1,898</p>
                     </c>
                     <c ca="center">
                        <p>1,828</p>
                     </c>
                     <c ca="center">
                        <p>1,568</p>
                     </c>
                     <c ca="center">
                        <p>1,211</p>
                     </c>
                     <c ca="center">
                        <p>1,257</p>
                     </c>
                     <c ca="center">
                        <p>1,403</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>S. typhi</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>3,936</p>
                     </c>
                     <c ca="center">
                        <p>2,503</p>
                     </c>
                     <c ca="center">
                        <p>2,440</p>
                     </c>
                     <c ca="center">
                        <p>1,986</p>
                     </c>
                     <c ca="center">
                        <p>1,520</p>
                     </c>
                     <c ca="center">
                        <p>1,520</p>
                     </c>
                     <c ca="center">
                        <p>1,611</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="8" ca="left">
                        <p>
                           <b>Number of nucleotides classified as ncRNA that are not in known ncRNAs (i.e. nucleotides in novel ncRNA candidates)</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>E. coli </it>(each strand = 4,639,675 nt)</p>
                     </c>
                     <c ca="center">
                        <p>324,033</p>
                     </c>
                     <c ca="center">
                        <p>275,704</p>
                     </c>
                     <c ca="center">
                        <p>235,227</p>
                     </c>
                     <c ca="center">
                        <p>221,802</p>
                     </c>
                     <c ca="center">
                        <p>167,157</p>
                     </c>
                     <c ca="center">
                        <p>174,682</p>
                     </c>
                     <c ca="center">
                        <p>206,840</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>S. typhi </it>(each strand = 4,809,037 nt)</p>
                     </c>
                     <c ca="center">
                        <p>514,650</p>
                     </c>
                     <c ca="center">
                        <p>387,099</p>
                     </c>
                     <c ca="center">
                        <p>352,039</p>
                     </c>
                     <c ca="center">
                        <p>296,608</p>
                     </c>
                     <c ca="center">
                        <p>235,018</p>
                     </c>
                     <c ca="center">
                        <p>220,485</p>
                     </c>
                     <c ca="center">
                        <p>248,724</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="8" ca="left">
                        <p>
                           <b>Total number of nucleotides classified as ncRNA (i.e. nucleotides in both known and unknown ncRNAs)</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>E. coli </it>(each strand = 4,639,675 nt)</p>
                     </c>
                     <c ca="center">
                        <p>385,611</p>
                     </c>
                     <c ca="center">
                        <p>332,000</p>
                     </c>
                     <c ca="center">
                        <p>292,940</p>
                     </c>
                     <c ca="center">
                        <p>270,267</p>
                     </c>
                     <c ca="center">
                        <p>220,352</p>
                     </c>
                     <c ca="center">
                        <p>209,807</p>
                     </c>
                     <c ca="center">
                        <p>242,534</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>S. typhi </it>(each strand = 4,809,037 nt)</p>
                     </c>
                     <c ca="center">
                        <p>570,994</p>
                     </c>
                     <c ca="center">
                        <p>436,137</p>
                     </c>
                     <c ca="center">
                        <p>405,022</p>
                     </c>
                     <c ca="center">
                        <p>338,254</p>
                     </c>
                     <c ca="center">
                        <p>283,999</p>
                     </c>
                     <c ca="center">
                        <p>250,610</p>
                     </c>
                     <c ca="center">
                        <p>283,508</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>QRNA, RNAz, and the Dynalign/LIBSVM classifier are compared in their ability to detect known ncRNA in the <it>E. coli </it>and <it>S. typhi </it>genomes, based on genomic scanning windows prepared using WuBLASTn. For RNAz and the Dynalign/LIBSVM classifier, results are listed for three <it>P </it>value classification cutoffs. "Number of nucleotides" = number of nucleotides on the plus strand + number of nucleotides on the minus strand, not accounting for overlap of complementary strands.</p>
               </tblfn>
            </tbl>
            <p>The complete datasets for both genomic screens are presented in "Additional Files." <supplr sid="S2">Additional Files 2</supplr> and <supplr sid="S3">3</supplr> give the classification of each window in the MUMmer and WuBLASTn genome screens, respectively. <supplr sid="S4">Additional Files 4</supplr> and <supplr sid="S5">5</supplr> likewise provide the input data to the SVM classifier for the MUMmer and WuBLASTn genome screens.</p>
            <suppl id="S2">
               <title>
                  <p>Additional File 2</p>
               </title>
               <text>
                  <p>Side-by-side comparison of Dynalign, RNAz, and QRNA classifications for each window in the MUMmer whole genome screen. Plain text, whitespace-delimited tabular data file. Each row is a window in the MUMmer whole genome alignment (15,214 windows total) of <it>E. coli </it>and <it>S. typhi</it>. Columns 1, 2, and 3: <it>E. coli </it>start and end nucleotide indices and strand (plus or minus) for that window. Columns 4, 5, and 6: <it>S. typhi </it>start and end nucleotide indices and strand (plus or minus) for that window. Column 7: Dynalign/LIBSVM probability that the window is ncRNA. Column 8: RNAz probability that the window is ncRNA. Column 9: QRNA classification of the window (ncRNA, ORF, or other).</p>
               </text>
               <file name="1471-2105-7-173-S2.txt">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S3">
               <title>
                  <p>Additional File 3</p>
               </title>
               <text>
                  <p>Side-by-side comparison of Dynalign, RNAz, and QRNA classifications for each window in the WuBLASTn whole genome screen. Plain text, whitespace-delimited tabular data file. Each row is a window in the WuBLASTn whole genome alignment (90,404 windows total) of <it>E. coli </it>and <it>S. typhi</it>. Columns 1, 2, and 3: <it>E. coli </it>start and end nucleotide indices and strand (plus or minus) for that window. Columns 4, 5, and 6: <it>S. typhi </it>start and end nucleotide indices and strand (plus or minus) for that window. Column 7: Dynalign/LIBSVM probability that the window is ncRNA. Column 8: RNAz probability that the window is ncRNA. Column 9: QRNA classification of the window (ncRNA, ORF, or other).</p>
               </text>
               <file name="1471-2105-7-173-S3.txt">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S4">
               <title>
                  <p>Additional File 4</p>
               </title>
               <text>
                  <p>MUMmer whole genome screen input data to the Dynalign/LIBSVM classifier. Plain text data file formatted for input to LIBSVM, containing the MUMmer whole genome screen dataset before scaling. There is a one-to-one correspondence between rows of this file and rows of <supplr sid="S2">Additional File 2</supplr> &#8211; that is, row <it>N </it>in this file corresponds to the window described in row <it>N </it>in <supplr sid="S2">Additional File 2</supplr>. Column 1 is the data label (all windows are initially assumed to be negatives and labelled "-1," but this is irrelevant here, as the column is essentially just a placeholder for LIBSVM). Column 2 is the Dynalign-computed &#916;G&#176;<sub>total</sub>; column 3 is the length of the shorter sequence; columns 4, 5, and 6 are the A, U, and C frequencies of sequence 1 (<it>E. coli</it>); columns 7, 8, and 9 are the A, U, and C frequencies of sequence 2 (<it>S. typhi</it>).</p>
               </text>
               <file name="1471-2105-7-173-S4.txt">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S5">
               <title>
                  <p>Additional File 5</p>
               </title>
               <text>
                  <p>WuBLASTn whole genome screen input data to the Dynalign/LIBSVM classifier. Plain text data file formatted for input to LIBSVM, containing the WuBLASTn whole genome screen dataset before scaling. There is a one-to-one correspondence between rows of this file and rows of <supplr sid="S3">Additional File 3</supplr> &#8211; that is, row <it>N </it>in this file corresponds to the window described in row <it>N </it>in <supplr sid="S3">Additional File 3</supplr>. Column 1 is the data label (all windows are initially assumed to be negatives and labelled "-1," but this is irrelevant here, as the column is essentially just a placeholder for LIBSVM). Column 2 is the Dynalign-computed &#916;G&#176;<sub>total</sub>; column 3 is the length of the shorter sequence; columns 4, 5, and 6 are the A, U, and C frequencies of sequence 1 (<it>E. coli</it>); columns 7, 8, and 9 are the A, U, and C frequencies of sequence 2 (<it>S. typhi</it>).</p>
               </text>
               <file name="1471-2105-7-173-S5.txt">
                  <p>Click here for file</p>
               </file>
            </suppl>
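            <p>As a reading aid for the Additional Files described above, the following minimal sketch parses one row of the window classification tables and one row of the LIBSVM input files. It assumes that the LIBSVM files use the standard sparse "label index:value" text layout, so that feature index 1 corresponds to "column 2" above, and so on; the function names, field names, and example rows are invented for illustration and are not taken from the actual files.</p>

```python
from collections import namedtuple

# Field names are hypothetical; the order follows the nine columns described
# for Additional Files 2 and 3.
Window = namedtuple(
    "Window",
    "ec_start ec_end ec_strand st_start st_end st_strand p_dynalign p_rnaz qrna",
)


def parse_window_row(line):
    """Parse one whitespace-delimited row of a window classification table."""
    f = line.split()
    return Window(
        int(f[0]), int(f[1]), f[2],    # E. coli start, end, strand
        int(f[3]), int(f[4]), f[5],    # S. typhi start, end, strand
        float(f[6]), float(f[7]),      # Dynalign/LIBSVM and RNAz probabilities
        f[8],                          # QRNA class: ncRNA, ORF, or other
    )


def parse_libsvm_row(line):
    """Parse one row in sparse LIBSVM text format: label index:value ..."""
    label, *pairs = line.split()
    features = {}
    for pair in pairs:
        index, value = pair.split(":")
        features[int(index)] = float(value)
    return int(label), features


if __name__ == "__main__":
    # Invented example rows, not real data from the Additional Files.
    row = parse_window_row("100 249 plus 2100 2249 plus 0.97 0.88 ncRNA")
    label, feats = parse_libsvm_row(
        "-1 1:-85.3 2:150 3:0.25 4:0.24 5:0.26 6:0.25 7:0.23 8:0.27"
    )
    print(row.p_dynalign, row.qrna, label, feats[1])
```

            <p>Because the rows of the LIBSVM input files correspond one-to-one with the rows of the window tables, reading both files in parallel pairs each feature vector with the coordinates of its window.</p>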
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>It has been shown that the &#916;G&#176;<sub>total </sub>calculated by Dynalign can be used as an effective parameter for detecting ncRNAs. Also, because Dynalign predicts a secondary structure common to two sequences, it is possible to incorporate additional structure-based parameters into the classification model. A recent benchmark of various structural alignment programs <abbrgrp><abbr bid="B66">66</abbr></abbrgrp> reports that Dynalign structural alignments are among the best at reflecting conserved secondary structure, becoming the best at sequence identities below 50%. The potential of Dynalign as a ncRNA detection tool can be explored further. For example, it would be interesting to see whether performance could be improved if strictly probabilistic or evolution-based scores <abbrgrp><abbr bid="B72">72</abbr></abbrgrp> were added as input to the Dynalign/LIBSVM classifier. Additionally, considering that all testing here was based on only two ncRNA families, it would also be interesting to test how well the Dynalign/LIBSVM classifier would perform if the training set were made more diverse, or if the SVM were optimized further. The LIBSVM input data set for all possible pairwise alignments of 5S rRNA, tRNA, and the negative sequences generated from them is presented in <supplr sid="S6">Additional File 6</supplr> in "Additional Files" for such purposes.</p>
         <suppl id="S6">
            <title>
               <p>Additional File 6</p>
            </title>
            <text>
               <p>LIBSVM datasets for every possible sequence pair of 5S rRNA, tRNA, and negative sequences. Nine plain text data files formatted for input to LIBSVM (not scaled) and three plain text files containing sequence codes for the LIBSVM files, all archived with GNU 'tar' and compressed with GNU 'gzip'. Our training and testing sets for the Dynalign/LIBSVM classifier were prepared from this dataset as described in "Methods." The file 'LIBSVM-set.5s-real' is every possible pairing of the 309 known 5S rRNA sequences in our database, not counting sequences paired with themselves. The file 'LIBSVM-set.trna-real' is every possible pairing of the 479 known tRNA sequences in our database, not counting sequences paired with themselves. The file 'LIBSVM-set.100ident-real' is the 309 5S rRNA and 479 tRNA sequences paired with themselves (i.e. real sequence pairs of 100% identity). The files denoted 'neg-column' are columnwise-shuffled negatives generated from the corresponding real sequences; the files denoted 'neg-AE' are negatives generated from the corresponding real sequences by the Altschul-Erickson shuffle (see "Methods" for a description of both shuffles). The files denoted 'seqlist' contain the codes for sequence pairs (or for sequences aligned with themselves), with lines in a one-to-one correspondence with the appropriate LIBSVM files &#8211; for example, line 42 of file 'seqlist.5s-pairs' contains the codes of the two 5S rRNA sequences that were used to generate the data on line 42 in files 'LIBSVM-set.5s-real', 'LIBSVM-set.5s-neg-column', and 'LIBSVM-set.5s-neg-AE'. For LIBSVM files, column 1 is the data label (1 for real, -1 for negative); column 2 is the Dynalign-computed &#916;G&#176;<sub>total</sub>; column 3 is the length of the shorter sequence; columns 4, 5, and 6 are the A, U, and C frequencies of sequence 1; columns 7, 8, and 9 are the A, U, and C frequencies of sequence 2.</p>
            </text>
            <file name="1471-2105-7-173-S6.gz">
               <p>Click here for file</p>
            </file>
         </suppl>
         <p>The advantages of using Dynalign over existing ncRNA detection methods are that it is more sensitive at most specificities and that it produces higher quality predictions at low sequence identities. The latter is important, since the number of conserved low-identity regions in some genomes of interest may be high. For example, Figure <figr fid="F17">17</figr> shows the distribution of percent identities of a human-mouse BLASTZ genome alignment <abbrgrp><abbr bid="B73">73</abbr></abbrgrp> broken down into 50-nucleotide non-overlapping windows. Twenty-five percent of the alignment lies in the region below 50% identity, where the Dynalign <it>z </it>score method outperforms RNAz. Additionally, Table <tblr tid="T4">4</tblr> seems to indicate that the Dynalign/LIBSVM classification method is more consistent across varying percent identities than the other two programs.</p>
         <fig id="F17">
            <title>
               <p>Figure 17</p>
            </title>
            <caption>
               <p>Distribution of percent identities of 50-nucleotide windows in the human-mouse genome alignment</p>
            </caption>
            <text>
               <p><b>Distribution of percent identities of 50-nucleotide windows in the human-mouse genome alignment</b>. The BLASTZ pairwise alignment of the human and mouse genomes <abbrgrp><abbr bid="B73">73</abbr></abbrgrp> is broken down into 50-nucleotide-long non-overlapping windows and the percent identity for each is calculated, then plotted in this histogram. There are 22,456,315 windows total.</p>
            </text>
            <graphic file="1471-2105-7-173-17"/>
         </fig>
         <p>The disadvantages of Dynalign as a ncRNA detection method are that the number of input sequences is currently limited to two, that the algorithm does not allow pseudoknots (a common limitation for secondary structure prediction algorithms), and that the runtime is longer than that of many other ncRNA classification programs, especially when controls are run explicitly for each input sequence pair; however, optimizations resulting in significant decreases in Dynalign runtime have been achieved, as shown in Table <tblr tid="T3">3</tblr>. While control generation can be circumvented by using a classification SVM, the quality of prediction of such a method (as implemented and benchmarked here) appears to drop slightly. However, this simple classification SVM approach, which does not directly incorporate a <it>z </it>score into the classification model, is still more sensitive for known ncRNAs in a whole genome screen than RNAz or QRNA. It may be possible to improve the quality of classification by using a regression model to determine the <it>z </it>score separately from the classification SVM step, which is a strategy successfully employed by RNAz, except that in this case the <it>z </it>score would be based on the &#916;G&#176;<sub>total</sub>s of sequence pairs instead of single sequences, increasing the complexity of the regression model.</p>
         <p>The FOLDALIGN program <abbrgrp><abbr bid="B62">62</abbr><abbr bid="B74">74</abbr></abbrgrp> is closely related to Dynalign and can also be used for ncRNA detection. FOLDALIGN also uses a dynamic programming algorithm to find the secondary structure common to two unaligned sequences and the sequence alignment that facilitates the structure. FOLDALIGN should therefore share the advantages and disadvantages that Dynalign has for ncRNA detection at low sequence identity. FOLDALIGN maximizes a score that includes a subset of the free energy change nearest neighbor parameters <abbrgrp><abbr bid="B47">47</abbr><abbr bid="B48">48</abbr><abbr bid="B49">49</abbr></abbrgrp> and terms that score sequence similarity <abbrgrp><abbr bid="B58">58</abbr></abbrgrp>. A scanning version of FOLDALIGN has been reported <abbrgrp><abbr bid="B62">62</abbr></abbrgrp> that takes long sequences as input, but limits the length of structural motifs to a parameter, &#955;, and so does not require that the sequence be broken into windows.</p>
         <p>Because it is fast, prediction from single sequences (such as using RNAstructure <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>) could be used as a rapid pre-filtering step to eliminate a large amount of genomic sequence when doing a whole genome screen using these methods. For example, Figures <figr fid="F2">2</figr> and <figr fid="F7">7</figr> indicate that at 36% specificity, the sensitivity of prediction is approximately 99% for 5S rRNA and tRNA tests using RNAstructure. Assuming these numbers are indicative of performance on all ncRNA families, we could use single sequence prediction to quickly eliminate 36% of the negatives in a whole genome screen while retaining the overwhelming majority of the real ncRNAs. Then, the reduced amount of sequence could be screened with the more time-consuming approach of prediction from sequence pairs, thus speeding up the overall screen.</p>
         <p>The Dynalign ncRNA detection method that we have outlined is computationally costly, but feasible for analysis of large genomes. For example, we estimate that a Dynalign/LIBSVM screen (using 150-nucleotide-long scanning windows, step size 75) of the human-mouse whole genome alignment regions below 50% sequence identity in Figure <figr fid="F17">17</figr>, which contain approximately 563 million alignment columns (this includes the reverse complements of each window), would require approximately 1.4 CPU years after single sequence pre-filtering, or approximately 10 days of wall time on a reasonably sized 50-CPU computation cluster. Additionally, other pre-filtering methods could be employed to eliminate repetitive and other sequences prior to the Dynalign computation.</p>
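         <p>The arithmetic behind this kind of estimate can be sketched as follows. The window count, window length, and low-identity fraction are the figures quoted in this section; the function names are hypothetical, the column count is only a rough cross-check against the quoted value of approximately 563 million, and the wall-time conversion assumes that the work divides evenly across CPUs with no scheduling overhead.</p>

```python
ALIGNMENT_WINDOWS = 22_456_315   # 50-nucleotide windows in the human-mouse alignment
WINDOW_LENGTH = 50               # nucleotides per window
LOW_IDENTITY_FRACTION = 0.25     # share of the alignment below 50% identity
STRANDS = 2                      # reverse complements are screened as well


def low_identity_columns():
    """Approximate alignment columns below 50% identity, both strands."""
    return ALIGNMENT_WINDOWS * WINDOW_LENGTH * LOW_IDENTITY_FRACTION * STRANDS


def wall_days(cpu_years, n_cpus, days_per_year=365.0):
    """Ideal wall-clock days, assuming perfectly even division of the work."""
    return cpu_years * days_per_year / n_cpus


if __name__ == "__main__":
    print(round(low_identity_columns() / 1e6), "million columns")
    print(round(wall_days(1.4, 50), 1), "days on 50 CPUs")
```

         <p>A real cluster run would take somewhat longer than this ideal figure, since job startup, uneven window runtimes, and queueing are ignored in the sketch.</p>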
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Local and global Dynalign implementations</p>
            </st>
            <p>The original Dynalign algorithm performs global alignments of the two sequences, i.e. gaps are penalized at the ends of the alignments by applying the &#916;G&#176;<sub>gap penalty </sub>term for each gap when calculating the value of &#916;G&#176;<sub>total</sub>. To facilitate calculations for ncRNA discovery, a local alignment option was programmed. In the local alignment Dynalign, the per nucleotide gap penalty (&#916;G&#176;<sub>gap penalty</sub>) is not applied to gaps at either end of either sequence in the alignment. Because the energy function (equation 3) contains no terms for sequence matching, this allows the local Dynalign to find optimal structural alignments with any portion of each sequence.</p>
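            <p>The difference between the two modes can be illustrated with a small sketch that counts the gaps contributing to the gap penalty term for a finished pairwise alignment. This is not the Dynalign implementation, which applies the penalty inside its dynamic programming recursions; the function names are hypothetical, the alignment is assumed to be given as two equal-length rows with "-" marking gaps, and the penalty value of 0.4 used in the example is a placeholder rather than a value taken from this section.</p>

```python
def count_penalized_gaps(row1, row2, local=False):
    """Count the gap characters that would incur the per-nucleotide penalty.

    In local mode, gap columns at either end of the alignment are free.
    """
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have equal length")
    columns = list(zip(row1, row2))
    if local:
        # Drop leading and trailing columns that contain a gap.
        while columns and "-" in columns[0]:
            columns.pop(0)
        while columns and "-" in columns[-1]:
            columns.pop()
    return sum(pair.count("-") for pair in columns)


def gap_term(row1, row2, penalty_per_gap, local=False):
    """Gap contribution to the total: penalty times the number of counted gaps."""
    return penalty_per_gap * count_penalized_gaps(row1, row2, local=local)


if __name__ == "__main__":
    a = "--GGCAUACC"
    b = "AAGGC-UAC-"
    print(count_penalized_gaps(a, b))              # global: every gap counts
    print(count_penalized_gaps(a, b, local=True))  # local: end gaps are free
    print(gap_term(a, b, 0.4, local=True))         # 0.4 is a placeholder value
```

            <p>In the example alignment, the global mode counts all four gaps, whereas the local mode counts only the single internal gap, so unmatched sequence at the ends of a window no longer raises the total.</p>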
         </sec>
         <sec>
            <st>
               <p>Generation of global alignments and calculation of percent identities</p>
            </st>
            <p>To generate global alignments of two sequences, the EMBOSS (version 2.9.0) <abbrgrp><abbr bid="B75">75</abbr></abbrgrp> Stretcher global alignment tool (with default parameters) was used. All percent identities are calculated as follows:</p>
            <p>
               <m:math name="1471-2105-7-173-i3" xmlns:m="http://www.w3.org/1998/Math/MathML">
                  <m:semantics>
                     <m:mrow>
                        <m:mtext>%&#160;identity</m:mtext>
                        <m:mo>=</m:mo>
                        <m:mfrac>
                           <m:mtable>
                              <m:mtr>
                                 <m:mtd>
                                    <m:mtext>#&#160;of&#160;alignment&#160;columns</m:mtext>
                                 </m:mtd>
                              </m:mtr>
                              <m:mtr>
                                 <m:mtd>
                                    <m:mtext>with&#160;matching&#160;nucleotides</m:mtext>
                                 </m:mtd>
                              </m:mtr>
                           </m:mtable>
                           <m:mtable>
                              <m:mtr>
                                 <m:mtd>
                                    <m:mtext>total&#160;#&#160;of&#160;alignment&#160;columns</m:mtext>
                                 </m:mtd>
                              </m:mtr>
                              <m:mtr>
                                 <m:mtd>
                                    <m:mtext>(including&#160;columns&#160;with&#160;gaps)</m:mtext>
                                 </m:mtd>
                              </m:mtr>
                           </m:mtable>
                        </m:mfrac>
                        <m:mtext>&#160;&#160;&#160;&#160;&#160;</m:mtext>
                        <m:mo stretchy="false">[</m:mo>
                        <m:mtext>eq</m:mtext>
                        <m:mo>.</m:mo>
                        <m:mtext>&#160;7]</m:mtext>
                     </m:mrow>
                     <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqqGLaqjcqqGGaaicqqGPbqAcqqGKbazcqqGLbqzcqqGUbGBcqqG0baDcqqGPbqAcqqG0baDcqqG5bqEcqGH9aqpdaWcaaabaiqabaGaee4iamIaeeiiaaIaee4Ba8MaeeOzayMaeeiiaaIaeeyyaeMaeeiBaWMaeeyAaKMaee4zaCMaeeOBa4MaeeyBa0MaeeyzauMaeeOBa4MaeeiDaqNaeeiiaaIaee4yamMaee4Ba8MaeeiBaWMaeeyDauNaeeyBa0MaeeOBa4Maee4CamhabaGaee4DaCNaeeyAaKMaeeiDaqNaeeiAaGMaeeiiaaIaeeyBa0MaeeyyaeMaeeiDaqNaee4yamMaeeiAaGMaeeyAaKMaeeOBa4Maee4zaCMaeeiiaaIaeeOBa4MaeeyDauNaee4yamMaeeiBaWMaeeyzauMaee4Ba8MaeeiDaqNaeeyAaKMaeeizaqMaeeyzauMaee4Camhaaqaaceqaaiabbsha0jabb+gaVjabbsha0jabbggaHjabbYgaSjabbccaGiabbocaJiabbccaGiabb+gaVjabbAgaMjabbccaGiabbggaHjabbYgaSjabbMgaPjabbEgaNjabb6gaUjabb2gaTjabbwgaLjabb6gaUjabbsha0jabbccaGiabbogaJjabb+gaVjabbYgaSjabbwha1jabb2gaTjabb6gaUjabbohaZbqaaiabbIcaOiabbMgaPjabb6gaUjabbogaJjabbYgaSjabbwha1jabbsgaKjabbMgaPjabb6gaUjabbEgaNjabbccaGiabbogaJjabb+gaVjabbYgaSjabbwha1jabb2gaTjabb6gaUjabbohaZjabbccaGiabbEha3jabbMgaPjabbsha0jabbIgaOjabbccaGiabbEgaNjabbggaHjabbchaWjabbohaZjabbMcaPaaaaiaaxMaacaWLjaGaei4waSLaeeyzauMaeeyCaeNaeiOla4IaeeiiaaIaee4naCJaeeyxa0faaa@C8CB@</m:annotation>
                  </m:semantics>
               </m:math>
            </p>
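            <p>Equation 7 can be illustrated with a short Python sketch. It assumes the two rows of a pairwise alignment are given as equal-length strings with "-" as the gap character; the function name and the example alignment are hypothetical.</p>
            <p><![CDATA[
```python
def percent_identity(aligned_1, aligned_2, gap="-"):
    """Percent identity as in eq. 7: matching columns over all alignment columns."""
    if len(aligned_1) != len(aligned_2):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_1:
        raise ValueError("alignment is empty")
    # A column matches only if both rows hold the same nucleotide (never a gap).
    matches = sum(
        1
        for a, b in zip(aligned_1.upper(), aligned_2.upper())
        if a == b and a != gap
    )
    # The denominator counts every column, including columns with gaps.
    return 100.0 * matches / len(aligned_1)


# 6 matching columns out of 8 total columns
print(percent_identity("GGCA-UCC", "GGCAAUGC"))  # 75.0
```
]]></p>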
         </sec>
         <sec>
            <st>
               <p>Construction of test set for benchmark of Dynalign z score classification method on known ncRNA sequence pairs</p>
            </st>
            <p>The sequence pair test set was constructed by randomly drawing and pairing real sequences from a pool of 309 known 5S rRNAs from the 5S ribosomal RNA database <abbrgrp><abbr bid="B48">48</abbr><abbr bid="B76">76</abbr></abbrgrp> and 482 known tRNAs from the Sprinzl database <abbrgrp><abbr bid="B48">48</abbr><abbr bid="B77">77</abbr></abbrgrp> (two tRNAs were not allowed because they contained an "X" (unknown) nucleotide that did not permit a dinucleotide shuffle to be done). This resulted in 755 real 5S rRNA and 896 real tRNA sequence pairs, whose distribution of percent identities was consistent with the distribution of percent identities of every possible pairwise alignment of the pools of all 5S rRNA and tRNA.</p>
            <p>To test specificity, for each real sequence pair, a negative sequence pair was created by globally aligning the real pair, randomly shuffling the alignment columns (without regard for gap placement or local conservation), then removing the gaps. Prior to input to RNAz and QRNA, the shuffled sequences were globally re-aligned. The resulting test set contained 3,302 sequence pairs total.</p>
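            <p>The columnwise shuffle used to generate negatives can be sketched in Python as follows. This is a minimal illustration rather than the code used for the benchmarks; the function name is hypothetical, "-" is assumed to be the gap character, and the global re-alignment required for RNAz and QRNA is not shown.</p>
            <p><![CDATA[
```python
import random


def columnwise_shuffle(aligned_1, aligned_2, gap="-", rng=None):
    """Shuffle alignment columns at random, then remove gaps from each sequence."""
    if len(aligned_1) != len(aligned_2):
        raise ValueError("aligned sequences must have equal length")
    rng = rng or random.Random()
    columns = list(zip(aligned_1, aligned_2))
    # All columns are shuffled together, ignoring gap placement and conservation.
    rng.shuffle(columns)
    shuffled_1 = "".join(a for a, _ in columns if a != gap)
    shuffled_2 = "".join(b for _, b in columns if b != gap)
    return shuffled_1, shuffled_2


neg_1, neg_2 = columnwise_shuffle("GGCA-UCC", "GGCAAUGC", rng=random.Random(0))
```
]]></p>
            <p>Because whole columns are permuted, each negative sequence keeps the nucleotide composition of its parent, and the pair keeps the same set of aligned columns before gap removal.</p>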
            <p>It should be noted that two other methods for generating negative sequence pairs from real sequence pairs were additionally tried. The first was the "sre_shuffle" command line option in QRNA, which shuffles columns in an alignment while preserving gap position. Columns in the alignment are separated into three categories: nucleotide aligned to nucleotide, gap in sequence 1 aligned to nucleotide in sequence 2, and gap in sequence 2 aligned to a nucleotide in sequence 1; each column is shuffled only with other columns in its category. The second was the "SHUFFLEALN.PL" program <abbrgrp><abbr bid="B44">44</abbr></abbrgrp> by Washietl <it>et al</it>, which, in the case of a pairwise sequence alignment as input, preserves gap position in the same fashion, but also preserves local conservation. Just as in QRNA's "sre_shuffle," alignment columns are divided into categories and each column is only shuffled with other columns in its category, but the nucleotide-aligned-to-nucleotide category is further subdivided into two categories &#8211; columns where the nucleotides are the same, i.e. conserved, and columns where nucleotides are different. However, benchmarks of the Dynalign <it>z </it>score classification method, RNAz, and QRNA showed that specificity for each program was sufficiently similar regardless of the method for generating negatives (data not shown), so for all tests a columnwise shuffle of a global alignment (without regard for gap placement or local conservation) was used to generate negative sequence pairs from real sequence pairs.</p>
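            <p>The category-restricted shuffles described above can be sketched in Python as follows. This is an illustration of the described behaviour only, not the code of QRNA or of SHUFFLEALN.PL; the function name, the "-" gap character, and the <it>split_by_conservation </it>flag are hypothetical.</p>
            <p><![CDATA[
```python
import random


def category_shuffle(aligned_1, aligned_2, gap="-",
                     split_by_conservation=False, rng=None):
    """Shuffle alignment columns only among columns of the same category."""
    if len(aligned_1) != len(aligned_2):
        raise ValueError("aligned sequences must have equal length")
    rng = rng or random.Random()
    columns = list(zip(aligned_1, aligned_2))

    def category(column):
        a, b = column
        if a == gap:
            return "gap_in_1"
        if b == gap:
            return "gap_in_2"
        if split_by_conservation:
            # Further split nucleotide/nucleotide columns by conservation.
            return "conserved" if a == b else "different"
        return "nucleotides"

    positions = {}
    for i, column in enumerate(columns):
        positions.setdefault(category(column), []).append(i)

    shuffled = list(columns)
    for indices in positions.values():
        permuted = indices[:]
        rng.shuffle(permuted)
        for dst, src in zip(indices, permuted):
            shuffled[dst] = columns[src]
    # Gaps are kept, so the result is still an alignment of the two rows.
    return ("".join(a for a, _ in shuffled),
            "".join(b for _, b in shuffled))
```
]]></p>
            <p>With <it>split_by_conservation </it>left at its default, gap positions are preserved; when it is set, the positions of conserved columns are preserved as well.</p>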
         </sec>
         <sec>
            <st>
               <p>Generation of controls for <it>z </it>score determination</p>
            </st>
            <p>Three methods were used for generating control sets for sequence pairs (only the Altschul-Erikson shuffle is used for generating control sets for single sequences):</p>
            <p>(1) A columnwise shuffle (without regard for gap placement or local conservation) of a global alignment of the original sequence pair.</p>
            <p>(2) Separately generating each sequence in the control pair by sampling from a first-order Markov chain as described in <abbrgrp><abbr bid="B41">41</abbr></abbrgrp> without regard for alignment. For each sequence in the original pair, the nucleotide and dinucleotide frequencies are calculated; the first nucleotide in the control sequence is selected by sampling from the nucleotide frequencies of the original; then, for the remainder of the sequence, each following nucleotide is sampled from the dinucleotide frequencies of the original, conditioned on the nucleotide that precedes it. The dinucleotide frequencies of a sequence generated by this method would approach the dinucleotide frequencies of the original sequence in the limit of infinite length; however, since the lengths are finite, the dinucleotide frequencies of the control only approximate those of the original sequence.</p>
            <p>(3) The Altschul-Erikson dinucleotide shuffle <abbrgrp><abbr bid="B65">65</abbr></abbrgrp> (implemented in Python by P. Clote, <abbrgrp><abbr bid="B78">78</abbr></abbrgrp>) of each sequence in the original pair separately, which exactly preserves their nucleotide and dinucleotide frequencies; as a consequence, the shuffled sequence has the same first and last nucleotide as the original sequence.</p>
            <p>All &#916;G&#176;<sub>total</sub>s in these trials were computed using the "global alignment" mode of Dynalign for both the input sequence pairs and the controls.</p>
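            <p>Control generation by sampling from a first-order Markov chain (method 2 above) can be sketched in Python as follows. This is a minimal illustration, not the implementation used in this work. The function name is hypothetical, and the fallback to nucleotide frequencies when a nucleotide never begins a dinucleotide in the original is an assumption introduced here.</p>
            <p><![CDATA[
```python
import random
from collections import Counter, defaultdict


def markov_control(seq, rng=None):
    """Sample a control of len(seq) from a first-order Markov chain fit to seq."""
    if not seq:
        return ""
    rng = rng or random.Random()
    mono = Counter(seq)            # nucleotide counts
    trans = defaultdict(Counter)   # dinucleotide counts keyed by first nucleotide
    for a, b in zip(seq, seq[1:]):
        trans[a][b] += 1

    def draw(counts):
        symbols = sorted(counts)
        return rng.choices(symbols, weights=[counts[s] for s in symbols])[0]

    out = [draw(mono)]             # first nucleotide from nucleotide frequencies
    while len(out) != len(seq):
        prev = out[-1]
        # Assumption: if prev occurs only as the last nucleotide of seq, it has
        # no observed successor, so fall back to the nucleotide frequencies.
        out.append(draw(trans[prev] if trans[prev] else mono))
    return "".join(out)


control = markov_control("GCGGAUUUAGCUCAGUUGGGAGAGC", rng=random.Random(1))
```
]]></p>
            <p>Unlike the Altschul-Erikson shuffle, a control sampled this way matches the dinucleotide frequencies of the original only approximately.</p>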
         </sec>
         <sec>
            <st>
               <p>Construction of ROC curves</p>
            </st>
            <p>To construct ROC curves for the Dynalign <it>z </it>score classification method, the <it>z </it>score cutoff was incremented from -11 to 3 in steps of 0.01 to generate test set sensitivity/specificity pairs ranging from 100% specificity to 100% sensitivity, then sensitivity was plotted as a function of the false positive rate (1 &#8211; specificity). Where the SVM probability (<it>P </it>value) was used as the classification cutoff, whether for RNAz or the Dynalign/LIBSVM classifier, the same was done, except <it>P </it>was incremented from 0 to 1 in steps of 0.001.</p>
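            <p>The cutoff sweep for the <it>z </it>score classifier can be sketched in Python as follows. This is an illustration under stated assumptions: a sequence pair is called ncRNA when its <it>z </it>score is at or below the cutoff, and the function name and example scores are hypothetical.</p>
            <p><![CDATA[
```python
def roc_points(pos_scores, neg_scores, start=-11.0, stop=3.0, step=0.01):
    """Return (false positive rate, sensitivity) pairs for a sweep of cutoffs.

    Assumption: a pair is classified as ncRNA when its z score is at or
    below the cutoff (more negative means more stable than the controls).
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    n_steps = int(round((stop - start) / step))
    points = []
    for i in range(n_steps + 1):
        cutoff = start + i * step  # avoids accumulating floating-point error
        true_pos = sum(1 for z in pos_scores if z <= cutoff)
        false_pos = sum(1 for z in neg_scores if z <= cutoff)
        sensitivity = true_pos / len(pos_scores)
        false_pos_rate = false_pos / len(neg_scores)  # equals 1 - specificity
        points.append((false_pos_rate, sensitivity))
    return points


curve = roc_points([-6.0, -4.0, -1.0], [-2.0, 0.5, 1.0])
```
]]></p>
            <p>For a probability cutoff the same sweep applies with the comparison reversed (a pair is called ncRNA when <it>P </it>is at or above the cutoff) and with the range 0 to 1 in steps of 0.001.</p>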
         </sec>
         <sec>
            <st>
               <p>Testing of QRNA and RNAz</p>
            </st>
            <p>QRNA (version 2.0.2c) <abbrgrp><abbr bid="B59">59</abbr></abbrgrp> and RNAz (version 0.1.1) <abbrgrp><abbr bid="B43">43</abbr></abbrgrp> require a pre-aligned sequence pair as input. RNAz can take a multiple sequence alignment as input, but here it was tested only on sequence pairs. For benchmarks on test sets of known ncRNAs and negatives, this input was prepared by performing a global alignment of the two sequences. Whenever negative sequence pairs are produced from real sequence pairs by a columnwise shuffle of a global alignment, gaps are removed after the shuffle, and the sequences are globally re-aligned to mimic the alignment that would be expected if these sequences were to appear in an actual ncRNA screen.</p>
            <p>For whole genome screen tests, the genomic alignment windows used for input to the two programs were taken directly from the MUMmer or WuBLASTn alignment.</p>
         </sec>
         <sec>
            <st>
               <p>Training and testing of the Dynalign/LIBSVM classifier</p>
            </st>
            <p>The LIBSVM <abbrgrp><abbr bid="B69">69</abbr></abbrgrp> implementation of a support vector machine was employed for the Dynalign/LIBSVM classifier method. The binary classifier SVM with a radial basis function (RBF) kernel was used. All LIBSVM classifier models were trained with command line parameters <it>-b 1 -c 32 -w-1 5 -g 6.10352e-05 </it>(the values were empirically determined), where <it>-b 1 </it>indicates that the model is trained to calculate probabilities of binary classification, <it>-c </it>specifies the value (C = 2<sup>5</sup>) of the penalty parameter of the error term, <it>-w-1 5 </it>specifies that the penalty of misclassifying negative sequence pairs as real sequence pairs (i.e. misclassifying those labelled "-1" as those labelled "1") is 5 times the penalty specified by <it>-c </it>(this has the effect of reducing false positives), and <it>-g </it>specifies the value of &#947; in the RBF (&#947; = 2<sup>-14</sup>). Classification was done with the <it>-b 1 </it>parameter to output probabilities (<it>P </it>values), allowing for variation of the cutoff <it>P </it>value for classification and for construction of ROC curves. Input to LIBSVM was the Dynalign-computed &#916;G&#176;<sub>total</sub>, the length of the shorter sequence of the sequence pair, the A, U, and C frequencies of sequence 1, and the A, U, and C frequencies of sequence 2. Prior to input to LIBSVM, values for each parameter were scaled to the range [-1, 1]. The ranges for each parameter across the datasets are as follows: &#916;G&#176;<sub>total </sub>was from -1868 to 0 (units of 10*kcal/mol); length of shorter sequence was from 50 to 150 nucleotides; frequencies of A, U, and C in sequence 1 were from 0.0701754 to 0.518519, from 0.0701754 to 0.518519, and from 0.0638298 to 0.42953, respectively; frequencies of A, U, and C in sequence 2 were from 0.0588235 to 0.623377, from 0.0588235 to 0.623377, and from 0.0402685 to 0.436364, respectively.</p>
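            <p>The scaling step and the quoted parameter values can be illustrated in Python. This sketch assumes a linear (min-max) mapping of each feature onto [-1, 1] using the ranges quoted above; the function and constant names are hypothetical, and only the first two features are shown.</p>
            <p><![CDATA[
```python
# Ranges quoted in the text for the first two SVM input features.
DG_TOTAL_RANGE = (-1868.0, 0.0)        # units of 10*kcal/mol
SHORTER_LENGTH_RANGE = (50.0, 150.0)   # nucleotides

# Training parameters quoted in the text.
C_PENALTY = 2 ** 5        # value passed to -c
GAMMA = 2.0 ** -14        # value passed to -g (RBF kernel)


def scale_feature(value, lo, hi):
    """Linearly map value from [lo, hi] onto [-1, 1] (assumed min-max scaling)."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return -1.0 + 2.0 * (value - lo) / (hi - lo)


# Example: a window whose shorter sequence is 100 nucleotides long.
scaled_length = scale_feature(100.0, *SHORTER_LENGTH_RANGE)
```
]]></p>
            <p>New data must be scaled with the same ranges as the training data; a value outside a quoted range would map outside [-1, 1].</p>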
            <p>To train the SVM classifier, a training set containing every possible sequence pairing (not including pairing of sequences to themselves) was prepared from a pool of 309 known 5S rRNAs <abbrgrp><abbr bid="B48">48</abbr><abbr bid="B76">76</abbr></abbrgrp> and 479 known tRNAs <abbrgrp><abbr bid="B48">48</abbr><abbr bid="B77">77</abbr></abbrgrp> (two tRNAs that contained an "X" nucleotide and three tRNAs with three-way multibranch loops instead of the canonical four-way multibranch loops were removed from the Sprinzl database pool prior to this). This resulted in 47,586 5S rRNA and 114,481 tRNA sequence pairs. Two negative sequence pairs for each real sequence pair were generated: one by columnwise shuffle of a global sequence alignment, one by Altschul-Erikson shuffle of each sequence separately. This was intended to reduce the false positive rate by training the SVM classifier on a more diverse set of negative sequence pairs. The Dynalign &#916;G&#176;<sub>total </sub>(using the "local alignment" Dynalign mode and parameter <it>M </it>= 8) and the other SVM input data were computed for every sequence pair in the resulting training set of 486,201 data points. The free energies and sequence characteristics for each pair in the entire set are provided as <supplr sid="S6">Additional File 6</supplr> in the "Additional Files" section for reference, formatted for input to LIBSVM.</p>
            <p>However, this set was unnecessarily large and biased towards tRNA, so for final SVM model training, only every 5th 5S rRNA and every 12th tRNA sequence pair were kept, producing a training set of a more realistic size of 9,517 5S rRNA and 9,540 tRNA sequence pairs, with 19,034 and 19,080 negative sequence pairs generated from them, for a training set size of 57,171 data points. All of the remaining 5S rRNA and one-half (every 2nd sequence pair, taken so that tRNA would not be over-represented) of the remaining tRNA sequence pairs were used to construct a test set for benchmarks of the Dynalign/LIBSVM classifier, RNAz, and QRNA.</p>
            <p>Moreover, an additional 2,364 data points were added to the training set, which were calculated from alignments of sequences in the pool of 309 5S rRNA and 479 tRNA to themselves, with two negative sequence pairs generated from each real sequence pair as before (i.e. all of these 2,364 data points had 100% identity). This addition to the training set was done in order to train the Dynalign/LIBSVM classifier to more accurately classify high-identity genomic windows, of which there was a very large number in the whole genome screen (e.g. 21% of the genomic windows in the MUMmer whole genome screen method have identity above 98%, which does not reflect the distribution of percent identities in the original training set). Thus, the final training set size was 59,535 data points.</p>
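            <p>The training set sizes quoted above follow from simple counting, which can be checked with a short Python calculation. The variable names are illustrative, and keeping "every 5th" or "every 12th" pair is modelled here as integer division, which is an assumption about how the subsets were drawn.</p>
            <p><![CDATA[
```python
n_5s, n_trna = 309, 479

# Every unordered pairing of distinct sequences within each family.
pairs_5s = n_5s * (n_5s - 1) // 2
pairs_trna = n_trna * (n_trna - 1) // 2

# Each real pair contributes itself plus two negatives (3 data points).
full_set = 3 * (pairs_5s + pairs_trna)

# Subsampling: every 5th 5S rRNA pair and every 12th tRNA pair.
kept_5s = pairs_5s // 5
kept_trna = pairs_trna // 12
reduced_set = 3 * (kept_5s + kept_trna)

# Self-alignments (100% identity), again with two negatives each.
self_set = 3 * (n_5s + n_trna)
final_set = reduced_set + self_set

print(pairs_5s, pairs_trna, full_set)
print(kept_5s, kept_trna, reduced_set, self_set, final_set)
```
]]></p>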
            <p>The result of the training is a LIBSVM model file that is used by LIBSVM for classification and probability estimation. The model file is supplied as <supplr sid="S7">Additional File 7</supplr>, although it should be noted that it will only work correctly on data scaled as described above.</p>
            <suppl id="S7">
               <title>
                  <p>Additional File 7</p>
               </title>
               <text>
                  <p>LIBSVM model file for the Dynalign/LIBSVM classifier. The model file for a LIBSVM classifier, trained as described in "Methods." LIBSVM classifications with this model file also output a probability of prediction (<it>P </it>value), in addition to the prediction itself. Use this with LIBSVM on datasets that have been scaled as described in "Methods" and note that datasets scaled differently will be incorrectly classified. The input dataset should be a plain text, whitespace-delimited tabular file formatted as described in <supplr sid="S6">Additional File 6</supplr> and in the LIBSVM documentation <abbrgrp><abbr bid="B69">69</abbr></abbrgrp>.</p>
               </text>
               <file name="1471-2105-7-173-S7.model">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>Sources of genomic data</p>
            </st>
            <p>All whole genome screens were conducted using the complete 4,639,675-nucleotide genome of <it>Escherichia coli </it>K-12 MG1655 [RefSeq NC_000913] and the complete 4,809,037-nucleotide main chromosome of <it>Salmonella enterica serovar Typhi </it>(<it>Salmonella typhi</it>) strain CT18 [RefSeq NC_003198].</p>
            <p>For <it>E. coli</it>, the lists and genomic coordinates of 4,237 known open reading frames (ORFs) and 156 known ncRNAs were obtained from the NCBI Entrez Genome Project database <abbrgrp><abbr bid="B79">79</abbr></abbrgrp>. The intergenic region size is 587,347 nucleotides.</p>
            <p>For <it>S. typhi</it>, the lists and genomic coordinates of 4,594 known ORFs and 110 known ncRNAs were obtained from The Wellcome Trust Sanger Institute <it>S. typhi </it>database <abbrgrp><abbr bid="B80">80</abbr></abbrgrp>. The intergenic region size is 604,213 nucleotides.</p>
         </sec>
         <sec>
            <st>
               <p>Intergenic region alignment with WuBLASTn</p>
            </st>
            <p>To prepare genomic windows for a ncRNA screen of <it>E. coli </it>and <it>S. typhi </it>using WuBLASTn, intergenic regions of <it>E. coli </it>were constructed by taking the entire genome and removing all nucleotides in the 4,237 known ORFs, resulting in 587,347 nucleotides total. Each resulting segment was used as a WuBLASTn (version 2.0 <abbrgrp><abbr bid="B70">70</abbr></abbrgrp>, using default parameters) query against the entire <it>S. typhi </it>main chromosome. To maximize coverage, none of the resulting alignment blocks were filtered, except that all blocks shorter than 50 alignment columns were discarded. Alignment blocks of length 50 to 150 (inclusive) were used as genomic windows directly; blocks with length greater than 150 were scanned with windows of size 150 alignment columns, step size 75. This resulted in 45,202 total genomic windows for the WuBLASTn ncRNA genomic screen. Reverse complements of sequences in each window were also scanned, resulting in 90,404 windows total as input to the Dynalign/LIBSVM classifier, RNAz, and QRNA, containing 10,265,161 alignment columns.</p>
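            <p>The windowing rule for alignment blocks can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, coordinates are 0-based with exclusive ends, and trailing columns that do not fill a complete window are assumed to be dropped, which the description above does not specify.</p>
            <p><![CDATA[
```python
def scan_windows(block_length, window=150, step=75, min_length=50):
    """Return (start, end) column ranges for one alignment block."""
    if block_length < min_length:
        return []                       # block is discarded
    if block_length <= window:
        return [(0, block_length)]      # block is used directly as one window
    # Assumption: only complete windows are kept for longer blocks.
    starts = range(0, block_length - window + 1, step)
    return [(start, start + window) for start in starts]


print(scan_windows(300))  # [(0, 150), (75, 225), (150, 300)]
```
]]></p>
            <p>Each window returned this way would then be scanned on both strands, which doubles the number of windows passed to the classifiers.</p>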
         </sec>
         <sec>
            <st>
               <p>Whole genome alignment with MUMmer</p>
            </st>
            <p>To prepare genomic windows for a ncRNA screen of <it>E. coli </it>and <it>S. typhi </it>using MUMmer, a whole genome alignment was generated using MUMmer 3.15 <abbrgrp><abbr bid="B71">71</abbr></abbrgrp> with parameters <it>-b 1600 -c 10 </it>to increase genomic coverage (all other parameters were left at default values). All alignment columns containing nucleotides in known ORFs of either genome were removed from the resulting alignment blocks. The resulting alignment blocks were scanned with windows of size 150 alignment columns, step size 75, to generate windows for the genomic screen. Unlike the WuBLASTn alignment, the MUMmer whole genome alignment contains long stretches of gaps in some regions, so some windows had to be dropped because one sequence in the window was aligned only to gaps in the other sequence. Windows containing a sequence less than 50 nucleotides in length were also dropped. After taking the reverse complement of each window, a total of 15,214 windows were input to the Dynalign/LIBSVM classifier, RNAz, and QRNA, containing 2,216,188 alignment columns.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Availability and requirements</p>
         </st>
         <p>&#8226; <b>Project name: </b>Dynalign</p>
         <p>&#8226; <b>Project home page: </b><url>http://rna.urmc.rochester.edu/dynalign.html</url></p>
         <p>&#8226; <b>Operating system(s): </b>Platform independent</p>
         <p>&#8226; <b>Programming language: </b>C++</p>
         <p>&#8226; <b>Other requirements: </b>none</p>
         <p>&#8226; <b>License: </b>GNU GPL</p>
         <p>&#8226; <b>Any restrictions to use by non-academics: </b>none</p>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>5S rRNA &#8211; 5S subunit ribosomal RNA</p>
         <p>AE &#8211; the Altschul-Erikson dinucleotide sequence shuffle method</p>
         <p>columnwise &#8211; the columnwise sequence pair shuffle method</p>
         <p>dinuc &#8211; the sampling from first-order Markov chain sequence shuffle method</p>
         <p>ncRNA &#8211; non-coding RNA</p>
         <p>ORF &#8211; open reading frame</p>
         <p>PPV &#8211; positive predictive value</p>
         <p>RBF &#8211; radial basis function</p>
         <p>ROC &#8211; receiver operating characteristic</p>
         <p>SCI &#8211; structure conservation index</p>
         <p>SVM &#8211; support vector machine</p>
         <p>tRNA &#8211; transfer RNA</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>AVU performed the computational experiments, analyzed the data, and drafted the manuscript. JMK optimized the Dynalign code to decrease runtime. DHM programmed the changes to Dynalign, conceived of the study, and contributed to the manuscript. All authors have read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The authors thank Michael Zuker for suggesting the change in <it>M </it>parameter implementation and Douglas H. Turner and Peter Clote for helpful discussions. Computer time was made available by the IBM SUR (Shared University Research) program, located in the Computational Biology and Bioinformatics Laboratory in CASCI (the Center for Advancing the Study of CyberInfrastructure) at Rochester Institute of Technology.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>The structural basis of ribosomal activity in peptide bond synthesis</p>
            </title>
            <aug>
               <au>
                  <snm>Nissen</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Hansen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ban</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Moore</snm>
                  <fnm>PB</fnm>
               </au>
               <au>
                  <snm>Steitz</snm>
                  <fnm>TA</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2000</pubdate>
            <volume>289</volume>
            <fpage>920</fpage>
            <lpage>930</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.289.5481.920</pubid>
                  <pubid idtype="pmpid" link="fulltext">10937990</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Structural insights into peptide bond formation</p>
            </title>
            <aug>
               <au>
                  <snm>Hansen</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Schmeing</snm>
                  <fnm>TM</fnm>
               </au>
               <au>
                  <snm>Moore</snm>
                  <fnm>PB</fnm>
               </au>
               <au>
                  <snm>Steitz</snm>
                  <fnm>TA</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>2002</pubdate>
            <volume>99</volume>
            <fpage>11670</fpage>
            <lpage>11675</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">129327</pubid>
                  <pubid idtype="pmpid" link="fulltext">12185246</pubid>
                  <pubid idtype="doi">10.1073/pnas.172404099</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Signal recognition particle contains a 7S RNA essential for protein translocation across the endoplasmic reticulum</p>
            </title>
            <aug>
               <au>
                  <snm>Walter</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Blobel</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1982</pubdate>
            <volume>299</volume>
            <fpage>691</fpage>
            <lpage>698</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/299691a0</pubid>
                  <pubid idtype="pmpid">6181418</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>RNA interference: antiviral defense and genetic tool</p>
            </title>
            <aug>
               <au>
                  <snm>Cullen</snm>
                  <fnm>BR</fnm>
               </au>
            </aug>
            <source>Nat Immunol</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>597</fpage>
            <lpage>599</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">12087412</pubid>
                  <pubid idtype="doi">10.1038/ni0702-597</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>The chemical repertoire of natural ribozymes</p>
            </title>
            <aug>
               <au>
                  <snm>Doudna</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Cech</snm>
                  <fnm>TR</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>418</volume>
            <fpage>222</fpage>
            <lpage>228</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/418222a</pubid>
                  <pubid idtype="pmpid" link="fulltext">12110898</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>RNA and the epigenetic regulation of X chromosome inactivation</p>
            </title>
            <aug>
               <au>
                  <snm>Panning</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Jaenisch</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1998</pubdate>
            <volume>93</volume>
            <fpage>305</fpage>
            <lpage>308</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0092-8674(00)81155-1</pubid>
                  <pubid idtype="pmpid" link="fulltext">9590161</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>The end of the (DNA) line</p>
            </title>
            <aug>
               <au>
                  <snm>Blackburn</snm>
                  <fnm>EH</fnm>
               </au>
            </aug>
            <source>Nat Struct Biol</source>
            <pubdate>2000</pubdate>
            <volume>7</volume>
            <fpage>847</fpage>
            <lpage>850</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/79594</pubid>
                  <pubid idtype="pmpid" link="fulltext">11017190</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>A guided tour: small RNA function in Archaea</p>
            </title>
            <aug>
               <au>
                  <snm>Dennis</snm>
                  <fnm>PP</fnm>
               </au>
               <au>
                  <snm>Omer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lowe</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Mol Microbiol</source>
            <pubdate>2001</pubdate>
            <volume>40</volume>
            <fpage>509</fpage>
            <lpage>519</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1046/j.1365-2958.2001.02381.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">11359559</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>The expanding snoRNA world</p>
            </title>
            <aug>
               <au>
                  <snm>Bachellerie</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Cavaille</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Huttenhofer</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Biochimie</source>
            <pubdate>2002</pubdate>
            <volume>84</volume>
            <fpage>775</fpage>
            <lpage>790</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0300-9084(02)01402-5</pubid>
                  <pubid idtype="pmpid" link="fulltext">12457565</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans</p>
            </title>
            <aug>
               <au>
                  <snm>Lau</snm>
                  <fnm>NC</fnm>
               </au>
               <au>
                  <snm>Lim</snm>
                  <fnm>LP</fnm>
               </au>
               <au>
                  <snm>Weinstein</snm>
                  <fnm>EG</fnm>
               </au>
               <au>
                  <snm>Bartel</snm>
                  <fnm>DP</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2001</pubdate>
            <volume>294</volume>
            <fpage>858</fpage>
            <lpage>862</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1065062</pubid>
                  <pubid idtype="pmpid" link="fulltext">11679671</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Identification of novel genes coding for small expressed RNAs</p>
            </title>
            <aug>
               <au>
                  <snm>Lagos-Quintana</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rauhut</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Lendeckel</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Tuschl</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2001</pubdate>
            <volume>294</volume>
            <fpage>853</fpage>
            <lpage>858</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1064921</pubid>
                  <pubid idtype="pmpid" link="fulltext">11679670</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>A conserved RNA structure (thi box) is involved in regulation of thiamin biosynthetic gene expression in bacteria</p>
            </title>
            <aug>
               <au>
                  <snm>Miranda-Rios</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Navarro</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sober&#243;n</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <fpage>9736</fpage>
            <lpage>9741</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">55522</pubid>
                  <pubid idtype="pmpid" link="fulltext">11470904</pubid>
                  <pubid idtype="doi">10.1073/pnas.161168098</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Noncoding regulatory RNAs database</p>
            </title>
            <aug>
               <au>
                  <snm>Szymanski</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Erdmann</snm>
                  <fnm>VA</fnm>
               </au>
               <au>
                  <snm>Barciszewski</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>429</fpage>
            <lpage>431</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">165571</pubid>
                  <pubid idtype="pmpid" link="fulltext">12520042</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg124</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>A novel sRNA component of the carbon storage regulatory system of Escherichia coli</p>
            </title>
            <aug>
               <au>
                  <snm>Weilbacher</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Suzuki</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Dubey</snm>
                  <fnm>AK</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Gudapaty</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Morozov</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Baker</snm>
                  <fnm>CS</fnm>
               </au>
               <au>
                  <snm>Georgellis</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Babitzke</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Romeo</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Mol Microbiol</source>
            <pubdate>2003</pubdate>
            <volume>48</volume>
            <fpage>657</fpage>
            <lpage>670</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1046/j.1365-2958.2003.03459.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">12694612</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Rfam: an RNA family database</p>
            </title>
            <aug>
               <au>
                  <snm>Griffiths-Jones</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bateman</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Marshall</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Khanna</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>439</fpage>
            <lpage>441</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">165453</pubid>
                  <pubid idtype="pmpid" link="fulltext">12520045</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg006</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Rfam: annotating non-coding RNAs in complete genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Griffiths-Jones</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Moxon</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Marshall</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Khanna</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Bateman</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>D121</fpage>
            <lpage>D124</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">540035</pubid>
                  <pubid idtype="pmpid" link="fulltext">15608160</pubid>
                  <pubid idtype="doi">10.1093/nar/gki081</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>RNAdb &#8212; a comprehensive mammalian noncoding RNA database</p>
            </title>
            <aug>
               <au>
                  <snm>Pang</snm>
                  <fnm>KC</fnm>
               </au>
               <au>
                  <snm>Stephen</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Engstr&#246;m</snm>
                  <fnm>PG</fnm>
               </au>
               <au>
                  <snm>Tajul-Arifin</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Wahlestedt</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Lenhard</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Hayashizaki</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Mattick</snm>
                  <fnm>JS</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>D125</fpage>
            <lpage>D130</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">540043</pubid>
                  <pubid idtype="pmpid" link="fulltext">15608161</pubid>
                  <pubid idtype="doi">10.1093/nar/gki089</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Non-coding RNA genes and the modern RNA world</p>
            </title>
            <aug>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2001</pubdate>
            <volume>2</volume>
            <fpage>919</fpage>
            <lpage>929</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35103511</pubid>
                  <pubid idtype="pmpid" link="fulltext">11733745</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Secondary structure alone is not statistically significant for the detection of noncoding RNAs</p>
            </title>
            <aug>
               <au>
                  <snm>Rivas</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>583</fpage>
            <lpage>605</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/16.7.583</pubid>
                  <pubid idtype="pmpid" link="fulltext">11038329</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Small RNAs in bacteria: diverse regulators of gene expression in response to environmental changes</p>
            </title>
            <aug>
               <au>
                  <snm>Wassarman</snm>
                  <fnm>KM</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2002</pubdate>
            <volume>109</volume>
            <fpage>141</fpage>
            <lpage>144</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0092-8674(02)00717-1</pubid>
                  <pubid idtype="pmpid" link="fulltext">12007399</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>The SINE-encoded mouse B2 RNA represses mRNA transcription in response to heat shock</p>
            </title>
            <aug>
               <au>
                  <snm>Allen</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Von Kaenel</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Goodrich</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Kugel</snm>
                  <fnm>JF</fnm>
               </au>
            </aug>
            <source>Nat Struct Mol Biol</source>
            <pubdate>2004</pubdate>
            <volume>11</volume>
            <fpage>816</fpage>
            <lpage>821</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nsmb813</pubid>
                  <pubid idtype="pmpid" link="fulltext">15300240</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>The Sm-like Hfq protein increases OxyS RNA interaction with target mRNAs</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Wassarman</snm>
                  <fnm>KM</fnm>
               </au>
               <au>
                  <snm>Ortega</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Steven</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Storz</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Mol Cell</source>
            <pubdate>2002</pubdate>
            <volume>9</volume>
            <fpage>11</fpage>
            <lpage>22</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1097-2765(01)00437-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">11804582</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>RNomics: an experimental approach that identifies 201 candidates for novel, small, non-messenger RNAs in mouse</p>
            </title>
            <aug>
               <au>
                  <snm>H&#252;ttenhofer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kiefmann</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Meier-Ewert</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>O'Brien</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lehrach</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Bachellerie</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Brosius</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>EMBO J</source>
            <pubdate>2001</pubdate>
            <volume>20</volume>
            <fpage>2943</fpage>
            <lpage>2953</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">125495</pubid>
                  <pubid idtype="pmpid" link="fulltext">11387227</pubid>
                  <pubid idtype="doi">10.1093/emboj/20.11.2943</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs</p>
            </title>
            <aug>
               <au>
                  <snm>Okazaki</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Furuno</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kasukawa</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Adachi</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Bono</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Kondo</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Nikaido</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Osato</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Saito</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Suzuki</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Yamanaka</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Kiyosawa</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Yagi</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Tomaru</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Hasegawa</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Nogami</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Schonbach</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Gojobori</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Baldarelli</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hill</snm>
                  <fnm>DP</fnm>
               </au>
               <au>
                  <snm>Bult</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Hume</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Quackenbush</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Schriml</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Kanapin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Matsuda</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Batalov</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Beisel</snm>
                  <fnm>KW</fnm>
               </au>
               <au>
                  <snm>Blake</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Bradt</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Brusic</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Chothia</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Corbani</snm>
                  <fnm>LE</fnm>
               </au>
               <au>
                  <snm>Cousins</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Dalla</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Dragani</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Fletcher</snm>
                  <fnm>CF</fnm>
               </au>
               <au>
                  <snm>Forrest</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Frazer</snm>
                  <fnm>KS</fnm>
               </au>
               <au>
                  <snm>Gaasterland</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Gariboldi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Gissi</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Godzik</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gough</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Grimmond</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gustincich</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hirokawa</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Jackson</snm>
                  <fnm>IJ</fnm>
               </au>
               <au>
                  <snm>Jarvis</snm>
                  <fnm>ED</fnm>
               </au>
               <au>
                  <snm>Kanai</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kawaji</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Kawasawa</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Kedzierski</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>King</snm>
                  <fnm>BL</fnm>
               </au>
               <au>
                  <snm>Konagaya</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kurochkin</snm>
                  <fnm>IV</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Lenhard</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Lyons</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Maglott</snm>
                  <fnm>DR</fnm>
               </au>
               <au>
                  <snm>Maltais</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Marchionni</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>McKenzie</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Miki</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Nagashima</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Numata</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Okido</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Pavan</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Pertea</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Pesole</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Petrovsky</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Pillai</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Pontius</snm>
                  <fnm>JU</fnm>
               </au>
               <au>
                  <snm>Qi</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ramachandran</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ravasi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Reed</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Reed</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Reid</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ring</snm>
                  <fnm>BZ</fnm>
               </au>
               <au>
                  <snm>Ringwald</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sandelin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Schneider</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Semple</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Setou</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Shimada</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Sultana</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Takenaka</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Teasdale</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Tomita</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Verardo</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Wagner</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Wahlestedt</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Watanabe</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Wells</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Wilming</snm>
                  <fnm>LG</fnm>
               </au>
               <au>
                  <snm>Wynshaw-Boris</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Yanagisawa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Yuan</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Zavolan</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Zhu</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Zimmer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Carninci</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Hayatsu</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Hirozane-Kishikawa</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Konno</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Nakamura</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sakazume</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Sato</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Shiraki</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Waki</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Kawai</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Aizawa</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Arakawa</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Fukuda</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hara</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hashizume</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Imotani</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ishii</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Itoh</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kagawa</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Miyazaki</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sakai</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Sasaki</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Shibata</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Shinagawa</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Yasunishi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Yoshino</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Waterston</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Lander</snm>
                  <fnm>ES</fnm>
               </au>
               <au>
                  <snm>Rogers</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Birney</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Hayashizaki</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <cnm>FANTOM Consortium</cnm>
               </au>
               <au>
                  <cnm>RIKEN Genome Exploration Research Group Phase I &amp; II Team</cnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>420</volume>
            <fpage>563</fpage>
            <lpage>573</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature01266</pubid>
                  <pubid idtype="pmpid" link="fulltext">12466851</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Computational identification of noncoding RNAs in E. coli by comparative genomics</p>
            </title>
            <aug>
               <au>
                  <snm>Rivas</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Klein</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <source>Curr Biol</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <fpage>1369</fpage>
            <lpage>1373</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0960-9822(01)00401-8</pubid>
                  <pubid idtype="pmpid" link="fulltext">11553332</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Computational identification of non-coding RNAs in Saccharomyces cerevisiae by comparative genomics</p>
            </title>
            <aug>
               <au>
                  <snm>McCutcheon</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <fpage>4119</fpage>
            <lpage>4128</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">165953</pubid>
                  <pubid idtype="pmpid" link="fulltext">12853629</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg438</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Identification of cyanobacterial non-coding RNAs by comparative genome analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Axmann</snm>
                  <fnm>IM</fnm>
               </au>
               <au>
                  <snm>Kensche</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Vogel</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kohl</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Herzel</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Hess</snm>
                  <fnm>WR</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>R73</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1242208</pubid>
                  <pubid idtype="pmpid" link="fulltext">16168080</pubid>
                  <pubid idtype="doi">10.1186/gb-2005-6-9-r73</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Identification of novel small RNAs using comparative genomics and microarrays</p>
            </title>
            <aug>
               <au>
                  <snm>Wassarman</snm>
                  <fnm>KM</fnm>
               </au>
               <au>
                  <snm>Repoila</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Rosenow</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Storz</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Gottesman</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Genes Dev</source>
            <pubdate>2001</pubdate>
            <volume>15</volume>
            <fpage>1637</fpage>
            <lpage>1651</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">312727</pubid>
                  <pubid idtype="pmpid" link="fulltext">11445539</pubid>
                  <pubid idtype="doi">10.1101/gad.901001</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Novel small RNA-encoding genes in the intergenic regions of Escherichia coli</p>
            </title>
            <aug>
               <au>
                  <snm>Argaman</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Hershberg</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Vogel</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Bejerano</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Wagner</snm>
                  <fnm>EG</fnm>
               </au>
               <au>
                  <snm>Margalit</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Altuvia</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Curr Biol</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <fpage>941</fpage>
            <lpage>950</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0960-9822(01)00270-6</pubid>
                  <pubid idtype="pmpid" link="fulltext">11448770</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Washietl</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hofacker</snm>
                  <fnm>IL</fnm>
               </au>
               <au>
                  <snm>Lukasser</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Huttenhofer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Stadler</snm>
                  <fnm>PF</fnm>
               </au>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>2005</pubdate>
            <volume>23</volume>
            <fpage>1383</fpage>
            <lpage>1390</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nbt1144</pubid>
                  <pubid idtype="pmpid" link="fulltext">16273071</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>The complete genome sequence of Escherichia coli K-12</p>
            </title>
            <aug>
               <au>
                  <snm>Blattner</snm>
                  <fnm>FR</fnm>
               </au>
               <au>
                  <snm>Plunkett</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Bloch</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Perna</snm>
                  <fnm>NT</fnm>
               </au>
               <au>
                  <snm>Burland</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Riley</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Collado-Vides</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Glasner</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Rode</snm>
                  <fnm>CK</fnm>
               </au>
               <au>
                  <snm>Mayhew</snm>
                  <fnm>GF</fnm>
               </au>
               <au>
                  <snm>Gregor</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>NW</fnm>
               </au>
               <au>
                  <snm>Kirkpatrick</snm>
                  <fnm>HA</fnm>
               </au>
               <au>
                  <snm>Goeden</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Rose</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Mau</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Shao</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1997</pubdate>
            <volume>277</volume>
            <fpage>1453</fpage>
            <lpage>1474</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.277.5331.1453</pubid>
                  <pubid idtype="pmpid" link="fulltext">9278503</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Complete genome sequence of a multiple drug resistant Salmonella enterica serovar Typhi CT18</p>
            </title>
            <aug>
               <au>
                  <snm>Parkhill</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Dougan</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>James</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Thomson</snm>
                  <fnm>NR</fnm>
               </au>
               <au>
                  <snm>Pickard</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Wain</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Churcher</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Mungall</snm>
                  <fnm>KL</fnm>
               </au>
               <au>
                  <snm>Bentley</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Holden</snm>
                  <fnm>MT</fnm>
               </au>
               <au>
                  <snm>Sebaihia</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Baker</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Basham</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Brooks</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Chillingworth</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Connerton</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Cronin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Davies</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Dowd</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Farrar</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Feltwell</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Hamlin</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Haque</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hien</snm>
                  <fnm>TT</fnm>
               </au>
               <au>
                  <snm>Holroyd</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Jagels</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Krogh</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Larsen</snm>
                  <fnm>TS</fnm>
               </au>
               <au>
                  <snm>Leather</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Moule</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>O'Gaora</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Parry</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Quail</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rutherford</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Simmonds</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Skelton</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Stevens</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Whitehead</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Barrell</snm>
                  <fnm>BG</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2001</pubdate>
            <volume>413</volume>
            <fpage>848</fpage>
            <lpage>852</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35101607</pubid>
                  <pubid idtype="pmpid" link="fulltext">11677608</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Genome sequence of the nematode C. elegans: a platform for investigating biology</p>
            </title>
            <aug>
               <au>
                  <cnm>C. elegans Sequencing Consortium</cnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1998</pubdate>
            <volume>282</volume>
            <fpage>2012</fpage>
            <lpage>2018</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.282.5396.2012</pubid>
                  <pubid idtype="pmpid" link="fulltext">9851916</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Saccharomyces Genome Database (SGD) provides tools to identify and analyze sequences from Saccharomyces cerevisiae and related sequences from other organisms</p>
            </title>
            <aug>
               <au>
                  <snm>Christie</snm>
                  <fnm>KR</fnm>
               </au>
               <au>
                  <snm>Weng</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Balakrishnan</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Costanzo</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Dolinski</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Dwight</snm>
                  <fnm>SS</fnm>
               </au>
               <au>
                  <snm>Engel</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Feierbach</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Fisk</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Hirschman</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Hong</snm>
                  <fnm>EL</fnm>
               </au>
               <au>
                  <snm>Issel-Tarver</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Nash</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Sethuraman</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Starr</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Theesfeld</snm>
                  <fnm>CL</fnm>
               </au>
               <au>
                  <snm>Andrada</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Binkley</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Dong</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Lane</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Schroeder</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Botstein</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Cherry</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>D311</fpage>
            <lpage>D314</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308767</pubid>
                  <pubid idtype="pmpid" link="fulltext">14681421</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh033</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>The Drosophila melanogaster genome</p>
            </title>
            <aug>
               <au>
                  <snm>Celniker</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>GM</fnm>
               </au>
            </aug>
            <source>Annu Rev Genomics Hum Genet</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <fpage>89</fpage>
            <lpage>117</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.genom.4.070802.110323</pubid>
                  <pubid idtype="pmpid" link="fulltext">14527298</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Initial sequencing and comparative analysis of the mouse genome</p>
            </title>
            <aug>
               <au>
                  <cnm>Mouse Genome Sequencing Consortium</cnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>420</volume>
            <fpage>520</fpage>
            <lpage>562</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature01262</pubid>
                  <pubid idtype="pmpid" link="fulltext">12466850</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Finishing the euchromatic sequence of the human genome</p>
            </title>
            <aug>
               <au>
                  <cnm>International Human Genome Sequencing Consortium</cnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2004</pubdate>
            <volume>431</volume>
            <fpage>927</fpage>
            <lpage>930</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature03062</pubid>
                  <pubid idtype="pmpid" link="fulltext">15496912</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>A program for predicting significant RNA secondary structures</p>
            </title>
            <aug>
               <au>
                  <snm>Le</snm>
                  <fnm>SY</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>JH</fnm>
               </au>
               <au>
                  <snm>Currey</snm>
                  <fnm>KM</fnm>
               </au>
               <au>
                  <snm>Maizel</snm>
                  <fnm>JVJ</fnm>
               </au>
            </aug>
            <source>Comput Appl Biosci</source>
            <pubdate>1988</pubdate>
            <volume>4</volume>
            <fpage>153</fpage>
            <lpage>159</lpage>
            <xrefbib>
               <pubid idtype="pmpid">2454711</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Thermodynamic stability and statistical significance of potential stem-loop structures situated at the frameshift sites of retroviruses</p>
            </title>
            <aug>
               <au>
                  <snm>Le</snm>
                  <fnm>SY</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>JH</fnm>
               </au>
               <au>
                  <snm>Maizel</snm>
                  <fnm>JV</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1989</pubdate>
            <volume>17</volume>
            <fpage>6143</fpage>
            <lpage>6152</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">318267</pubid>
                  <pubid idtype="pmpid">2549508</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>A computational procedure for assessing the significance of RNA secondary structure</p>
            </title>
            <aug>
               <au>
                  <snm>Chen</snm>
                  <fnm>JH</fnm>
               </au>
               <au>
                  <snm>Le</snm>
                  <fnm>SY</fnm>
               </au>
               <au>
                  <snm>Shapiro</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Currey</snm>
                  <fnm>KM</fnm>
               </au>
               <au>
                  <snm>Maizel</snm>
                  <fnm>JV</fnm>
               </au>
            </aug>
            <source>Comput Appl Biosci</source>
            <pubdate>1990</pubdate>
            <volume>6</volume>
            <fpage>7</fpage>
            <lpage>18</lpage>
            <xrefbib>
               <pubid idtype="pmpid">1690072</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Structural RNA has lower folding energy than random RNA of the same dinucleotide frequency</p>
            </title>
            <aug>
               <au>
                  <snm>Clote</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Ferre</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Kranakis</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Krizanc</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>RNA</source>
            <pubdate>2005</pubdate>
            <volume>11</volume>
            <fpage>578</fpage>
            <lpage>591</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1370746</pubid>
                  <pubid idtype="pmpid" link="fulltext">15840812</pubid>
                  <pubid idtype="doi">10.1261/rna.7220505</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>No evidence that mRNAs have lower folding free energies than random sequences with the same dinucleotide distribution</p>
            </title>
            <aug>
               <au>
                  <snm>Workman</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Krogh</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1999</pubdate>
            <volume>27</volume>
            <fpage>4816</fpage>
            <lpage>4822</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">148783</pubid>
                  <pubid idtype="pmpid" link="fulltext">10572183</pubid>
                  <pubid idtype="doi">10.1093/nar/27.24.4816</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Fast and reliable prediction of noncoding RNAs</p>
            </title>
            <aug>
               <au>
                  <snm>Washietl</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hofacker</snm>
                  <fnm>IL</fnm>
               </au>
               <au>
                  <snm>Stadler</snm>
                  <fnm>PF</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>2005</pubdate>
            <volume>102</volume>
            <fpage>2454</fpage>
            <lpage>2459</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">548974</pubid>
                  <pubid idtype="pmpid" link="fulltext">15665081</pubid>
                  <pubid idtype="doi">10.1073/pnas.0409169102</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Consensus folding of aligned sequences as a new measure for the detection of functional RNAs by comparative genomics</p>
            </title>
            <aug>
               <au>
                  <snm>Washietl</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hofacker</snm>
                  <fnm>IL</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2004</pubdate>
            <volume>342</volume>
            <fpage>19</fpage>
            <lpage>30</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jmb.2004.07.018</pubid>
                  <pubid idtype="pmpid" link="fulltext">15313604</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Dynalign: an algorithm for finding the secondary structure common to two RNA sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Mathews</snm>
                  <fnm>DH</fnm>
               </au>
               <au>
                  <snm>Turner</snm>
                  <fnm>DH</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2002</pubdate>
            <volume>317</volume>
            <fpage>191</fpage>
            <lpage>203</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2001.5351</pubid>
                  <pubid idtype="pmpid" link="fulltext">11902836</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Predicting a set of minimal free energy RNA secondary structures common to two sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Mathews</snm>
                  <fnm>DH</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>2246</fpage>
            <lpage>2253</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti349</pubid>
                  <pubid idtype="pmpid" link="fulltext">15731207</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson-Crick pairs</p>
            </title>
            <aug>
               <au>
                  <snm>Xia</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>SantaLucia</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Burkard</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Kierzek</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Schroeder</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Jiao</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Cox</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Turner</snm>
                  <fnm>DH</fnm>
               </au>
            </aug>
            <source>Biochemistry</source>
            <pubdate>1998</pubdate>
            <volume>37</volume>
            <fpage>14719</fpage>
            <lpage>14735</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/bi9809425</pubid>
                  <pubid idtype="pmpid" link="fulltext">9778347</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Expanded sequence dependence of thermodynamic parameters provides improved prediction of RNA secondary structure</p>
            </title>
            <aug>
               <au>
                  <snm>Mathews</snm>
                  <fnm>DH</fnm>
               </au>
               <au>
                  <snm>Sabina</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zuker</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Turner</snm>
                  <fnm>DH</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1999</pubdate>
            <volume>288</volume>
            <fpage>911</fpage>
            <lpage>940</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1999.2700</pubid>
                  <pubid idtype="pmpid" link="fulltext">10329189</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure</p>
            </title>
            <aug>
               <au>
                  <snm>Mathews</snm>
                  <fnm>DH</fnm>
               </au>
               <au>
                  <snm>Disney</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Childs</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Schroeder</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Zuker</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Turner</snm>
                  <fnm>DH</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2004</pubdate>
            <volume>101</volume>
            <fpage>7287</fpage>
            <lpage>7292</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">409911</pubid>
                  <pubid idtype="pmpid" link="fulltext">15123812</pubid>
                  <pubid idtype="doi">10.1073/pnas.0401799101</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>Probing RNA structure, function, and history by comparative analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Woese</snm>
                  <fnm>CR</fnm>
               </au>
               <au>
                  <snm>Pace</snm>
                  <fnm>NR</fnm>
               </au>
            </aug>
            <source>The RNA World</source>
            <publisher>New York, Cold Spring Harbor Press</publisher>
            <editor>Gesteland RF, Atkins JF</editor>
            <pubdate>1993</pubdate>
            <fpage>91</fpage>
            <lpage>117</lpage>
         </bibl>
         <bibl id="B51">
            <title>
               <p>Finding the hairpin in the haystack: searching for RNA motifs</p>
            </title>
            <aug>
               <au>
                  <snm>Dandekar</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Hentze</snm>
                  <fnm>MW</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>1995</pubdate>
            <volume>11</volume>
            <fpage>45</fpage>
            <lpage>50</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(00)88996-9</pubid>
                  <pubid idtype="pmpid" link="fulltext">7536364</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence</p>
            </title>
            <aug>
               <au>
                  <snm>Lowe</snm>
                  <fnm>TM</fnm>
               </au>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>955</fpage>
            <lpage>964</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146525</pubid>
                  <pubid idtype="pmpid" link="fulltext">9023104</pubid>
                  <pubid idtype="doi">10.1093/nar/25.5.955</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>A computational screen for methylation guide snoRNAs in yeast</p>
            </title>
            <aug>
               <au>
                  <snm>Lowe</snm>
                  <fnm>TM</fnm>
               </au>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1999</pubdate>
            <volume>283</volume>
            <fpage>1168</fpage>
            <lpage>1171</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.283.5405.1168</pubid>
                  <pubid idtype="pmpid" link="fulltext">10024243</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B54">
            <title>
               <p>Prediction of signal recognition particle RNA genes</p>
            </title>
            <aug>
               <au>
                  <snm>Regalia</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rosenblad</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Samuelsson</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>3368</fpage>
            <lpage>3377</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">137091</pubid>
                  <pubid idtype="pmpid" link="fulltext">12140321</pubid>
                  <pubid idtype="doi">10.1093/nar/gkf468</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>The microRNAs of Caenorhabditis elegans</p>
            </title>
            <aug>
               <au>
                  <snm>Lim</snm>
                  <fnm>LP</fnm>
               </au>
               <au>
                  <snm>Lau</snm>
                  <fnm>NC</fnm>
               </au>
               <au>
                  <snm>Weinstein</snm>
                  <fnm>EG</fnm>
               </au>
               <au>
                  <snm>Abdelhakim</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Yekta</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Rhoades</snm>
                  <fnm>MW</fnm>
               </au>
               <au>
                  <snm>Burge</snm>
                  <fnm>CB</fnm>
               </au>
               <au>
                  <snm>Bartel</snm>
                  <fnm>DP</fnm>
               </au>
            </aug>
            <source>Genes Dev</source>
            <pubdate>2003</pubdate>
            <volume>17</volume>
            <fpage>991</fpage>
            <lpage>1008</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">196042</pubid>
                  <pubid idtype="pmpid" link="fulltext">12672692</pubid>
                  <pubid idtype="doi">10.1101/gad.1074403</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B56">
            <title>
               <p>A search for H/ACA snoRNAs in yeast using MFE secondary structure prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Edvardsson</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gardner</snm>
                  <fnm>PP</fnm>
               </au>
               <au>
                  <snm>Poole</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Hendy</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Penny</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Moulton</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>865</fpage>
            <lpage>873</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg080</pubid>
                  <pubid idtype="pmpid" link="fulltext">12724297</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B57">
            <title>
               <p>Genome-wide searching for pseudouridylation guide snoRNAs: analysis of the Saccharomyces cerevisiae genome</p>
            </title>
            <aug>
               <au>
                  <snm>Schattner</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Decatur</snm>
                  <fnm>WA</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Ares</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Fournier</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Lowe</snm>
                  <fnm>TM</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>4281</fpage>
            <lpage>4296</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">514388</pubid>
                  <pubid idtype="pmpid" link="fulltext">15306656</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh768</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B58">
            <title>
               <p>RSEARCH: finding homologs of single structured RNA sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Klein</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <fpage>44</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">239859</pubid>
                  <pubid idtype="pmpid" link="fulltext">14499004</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-4-44</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B59">
            <title>
               <p>Noncoding RNA gene detection using comparative sequence analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Rivas</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Eddy</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>2</volume>
            <fpage>8</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">64605</pubid>
                  <pubid idtype="pmpid" link="fulltext">11801179</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-2-8</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B60">
            <title>
               <p>Accelerated probabilistic inference of RNA structure evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Holmes</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>73</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1090553</pubid>
                  <pubid idtype="pmpid" link="fulltext">15790387</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-6-73</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B61">
            <title>
               <p>Alignment of RNA base pairing probability matrices</p>
            </title>
            <aug>
               <au>
                  <snm>Hofacker</snm>
                  <fnm>IL</fnm>
               </au>
               <au>
                  <snm>Bernhart</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Stadler</snm>
                  <fnm>PF</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <fpage>2222</fpage>
            <lpage>2227</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bth229</pubid>
                  <pubid idtype="pmpid" link="fulltext">15073017</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B62">
            <title>
               <p>Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%</p>
            </title>
            <aug>
               <au>
                  <snm>Havgaard</snm>
                  <fnm>JK</fnm>
               </au>
               <au>
                  <snm>Lyngso</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Stormo</snm>
                  <fnm>GD</fnm>
               </au>
               <au>
                  <snm>Gorodkin</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>1815</fpage>
            <lpage>1824</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti279</pubid>
                  <pubid idtype="pmpid" link="fulltext">15657094</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B63">
            <title>
               <p>On finding all suboptimal foldings of an RNA molecule</p>
            </title>
            <aug>
               <au>
                  <snm>Zuker</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1989</pubdate>
            <volume>244</volume>
            <fpage>48</fpage>
            <lpage>52</lpage>
            <xrefbib>
               <pubid idtype="pmpid">2468181</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B64">
            <title>
               <p>Mathews lab homepage</p>
            </title>
            <url>http://rna.urmc.rochester.edu</url>
         </bibl>
         <bibl id="B65">
            <title>
               <p>Significance of nucleotide sequence alignments: a method for random sequence permutation that preserves dinucleotide and codon usage</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Erickson</snm>
                  <fnm>BW</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1985</pubdate>
            <volume>2</volume>
            <fpage>526</fpage>
            <lpage>538</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">3870875</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B66">
            <title>
               <p>A benchmark of multiple sequence alignment programs upon structural RNAs</p>
            </title>
            <aug>
               <au>
                  <snm>Gardner</snm>
                  <fnm>PP</fnm>
               </au>
               <au>
                  <snm>Wilm</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Washietl</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>2433</fpage>
            <lpage>2439</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1087786</pubid>
                  <pubid idtype="pmpid" link="fulltext">15860779</pubid>
                  <pubid idtype="doi">10.1093/nar/gki541</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B67">
            <title>
               <p>A training algorithm for optimal margin classifiers</p>
            </title>
            <aug>
               <au>
                  <snm>Boser</snm>
                  <fnm>BE</fnm>
               </au>
               <au>
                  <snm>Guyon</snm>
                  <fnm>IM</fnm>
               </au>
               <au>
                  <snm>Vapnik</snm>
                  <fnm>VN</fnm>
               </au>
            </aug>
            <source>Proceedings of the 5th Annual Workshop on Computational Learning Theory</source>
            <publisher>ACM Press</publisher>
            <pubdate>1992</pubdate>
            <fpage>144</fpage>
            <lpage>152</lpage>
         </bibl>
         <bibl id="B68">
            <title>
               <p>Support-vector networks</p>
            </title>
            <aug>
               <au>
                  <snm>Cortes</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Vapnik</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Machine Learning</source>
            <pubdate>1995</pubdate>
            <volume>20</volume>
            <fpage>273</fpage>
            <lpage>297</lpage>
         </bibl>
         <bibl id="B69">
            <title>
               <p>LIBSVM: a library for support vector machines</p>
            </title>
            <aug>
               <au>
                  <snm>Chang</snm>
                  <fnm>CC</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>CJ</fnm>
               </au>
            </aug>
            <url>http://www.csie.ntu.edu.tw/~cjlin/libsvm</url>
         </bibl>
         <bibl id="B70">
            <title>
               <p>WU BLAST 2.0</p>
            </title>
            <aug>
               <au>
                  <snm>Gish</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <url>http://blast.wustl.edu</url>
         </bibl>
         <bibl id="B71">
            <title>
               <p>Versatile and open software for comparing large genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Kurtz</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Phillippy</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Delcher</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Smoot</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Shumway</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Antonescu</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>R12</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">395750</pubid>
                  <pubid idtype="pmpid" link="fulltext">14759262</pubid>
                  <pubid idtype="doi">10.1186/gb-2004-5-2-r12</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B72">
            <title>
               <p>Using evolutionary Expectation Maximization to estimate indel rates</p>
            </title>
            <aug>
               <au>
                  <snm>Holmes</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>2294</fpage>
            <lpage>2300</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti177</pubid>
                  <pubid idtype="pmpid" link="fulltext">15731213</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B73">
            <title>
               <p>Human-mouse alignments with BLASTZ</p>
            </title>
            <aug>
               <au>
                  <snm>Schwartz</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Smit</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Baertsch</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hardison</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>103</fpage>
            <lpage>107</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">430961</pubid>
                  <pubid idtype="pmpid" link="fulltext">12529312</pubid>
                  <pubid idtype="doi">10.1101/gr.809403</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B74">
            <title>
               <p>Finding the most significant common sequence and structure in a set of RNA sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Gorodkin</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Heyer</snm>
                  <fnm>LJ</fnm>
               </au>
               <au>
                  <snm>Stormo</snm>
                  <fnm>GD</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>3724</fpage>
            <lpage>3732</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146942</pubid>
                  <pubid idtype="pmpid" link="fulltext">9278497</pubid>
                  <pubid idtype="doi">10.1093/nar/25.18.3724</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B75">
            <title>
               <p>EMBOSS: the European Molecular Biology Open Software Suite</p>
            </title>
            <aug>
               <au>
                  <snm>Rice</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Longden</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Bleasby</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>276</fpage>
            <lpage>277</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(00)02024-2</pubid>
                  <pubid idtype="pmpid" link="fulltext">10827456</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B76">
            <title>
               <p>5S ribosomal RNA database Y2K</p>
            </title>
            <aug>
               <au>
                  <snm>Szymanski</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Barciszewska</snm>
                  <fnm>MZ</fnm>
               </au>
               <au>
                  <snm>Barciszewski</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Erdmann</snm>
                  <fnm>VA</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2000</pubdate>
            <volume>28</volume>
            <fpage>166</fpage>
            <lpage>167</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">102473</pubid>
                  <pubid idtype="pmpid" link="fulltext">10592212</pubid>
                  <pubid idtype="doi">10.1093/nar/28.1.166</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B77">
            <title>
               <p>Compilation of tRNA sequences and sequences of tRNA genes</p>
            </title>
            <aug>
               <au>
                  <snm>Sprinzl</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Horn</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ioudovitch</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Steinberg</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1998</pubdate>
            <volume>26</volume>
            <fpage>148</fpage>
            <lpage>153</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">147216</pubid>
                  <pubid idtype="pmpid" link="fulltext">9399820</pubid>
                  <pubid idtype="doi">10.1093/nar/26.1.148</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B78">
            <title>
               <p>Clote computational biology lab</p>
            </title>
            <aug>
               <au>
                  <snm>Clote</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <url>http://clavius.bc.edu/~clotelab/</url>
         </bibl>
         <bibl id="B79">
            <title>
               <p>NCBI Entrez Genome Project database</p>
            </title>
            <aug>
               <au>
                  <cnm>NCBI Entrez Genome Project database</cnm>
               </au>
            </aug>
            <url>http://www.ncbi.nlm.nih.gov/genomes/rnatab.cgi?gi=115&amp;db=Genome</url>
         </bibl>
         <bibl id="B80">
            <title>
               <p>The Wellcome Trust Sanger Institute S. typhi database</p>
            </title>
            <aug>
               <au>
                  <cnm>The Wellcome Trust Sanger Institute S. typhi database</cnm>
               </au>
            </aug>
            <url>http://www.sanger.ac.uk/Projects/S_typhi/</url>
         </bibl>
         <bibl id="B81">
            <title>
               <p>The comparative RNA web (CRW) site: An online database of comparative sequence and structure information for ribosomal, intron, and other RNAs</p>
            </title>
            <aug>
               <au>
                  <snm>Cannone</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Subramanian</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Schnare</snm>
                  <fnm>MN</fnm>
               </au>
               <au>
                  <snm>Collett</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>D'Souza</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Du</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Feng</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Madabusi</snm>
                  <fnm>LV</fnm>
               </au>
               <au>
                  <snm>Muller</snm>
                  <fnm>KM</fnm>
               </au>
               <au>
                  <snm>Pande</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Shang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Gutell</snm>
                  <fnm>RR</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>2</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">65690</pubid>
                  <pubid idtype="pmpid" link="fulltext">11869452</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-3-2</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B82">
            <title>
               <p>Comparative and functional anatomy of group II catalytic introns - a review</p>
            </title>
            <aug>
               <au>
                  <snm>Michel</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Umesono</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ozeki</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>1989</pubdate>
            <volume>82</volume>
            <fpage>5</fpage>
            <lpage>30</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid">2684776</pubid>
                  <pubid idtype="doi">10.1016/0378-1119(89)90026-7</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B83">
            <title>
               <p>The ribonuclease P database</p>
            </title>
            <aug>
               <au>
                  <snm>Brown</snm>
                  <fnm>JW</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1999</pubdate>
            <volume>27</volume>
            <fpage>314</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">148169</pubid>
                  <pubid idtype="pmpid" link="fulltext">9847214</pubid>
                  <pubid idtype="doi">10.1093/nar/27.1.314</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B84">
            <title>
               <p>The signal recognition particle database (SRPDB)</p>
            </title>
            <aug>
               <au>
                  <snm>Larsen</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Samuelsson</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Zwieb</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1998</pubdate>
            <volume>26</volume>
            <fpage>177</fpage>
            <lpage>178</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">147211</pubid>
                  <pubid idtype="pmpid" link="fulltext">9399828</pubid>
                  <pubid idtype="doi">10.1093/nar/26.1.177</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B85">
            <title>
               <p>Secondary structure model of the RNA recognized by the reverse transcriptase from the R2 retrotransposable element</p>
            </title>
            <aug>
               <au>
                  <snm>Mathews</snm>
                  <fnm>DH</fnm>
               </au>
               <au>
                  <snm>Banerjee</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Luan</snm>
                  <fnm>DD</fnm>
               </au>
               <au>
                  <snm>Eickbush</snm>
                  <fnm>TH</fnm>
               </au>
               <au>
                  <snm>Turner</snm>
                  <fnm>DH</fnm>
               </au>
            </aug>
            <source>RNA</source>
            <pubdate>1997</pubdate>
            <volume>3</volume>
            <fpage>1</fpage>
            <lpage>16</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1369457</pubid>
                  <pubid idtype="pmpid" link="fulltext">8990394</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B86">
            <title>
               <p>Secondary structure models of the 3' untranslated regions of diverse R2 RNAs</p>
            </title>
            <aug>
               <au>
                  <snm>Ruschak</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Mathews</snm>
                  <fnm>DH</fnm>
               </au>
               <au>
                  <snm>Bibillo</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Spinelli</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Childs</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Eickbush</snm>
                  <fnm>TH</fnm>
               </au>
               <au>
                  <snm>Turner</snm>
                  <fnm>DH</fnm>
               </au>
            </aug>
            <source>RNA</source>
            <pubdate>2004</pubdate>
            <volume>10</volume>
            <fpage>978</fpage>
            <lpage>987</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1370589</pubid>
                  <pubid idtype="pmpid" link="fulltext">15146081</pubid>
                  <pubid idtype="doi">10.1261/rna.5216204</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
