<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-10-S1-S35</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>HHMMiR: efficient <it>de novo </it>prediction of microRNAs using hierarchical hidden Markov models</p>
         </title>
         <aug>
            <au ca="yes" id="A1">
               <snm>Kadri</snm>
               <fnm>Sabah</fnm>
               <insr iid="I1"/>
               <email>sskadri@andrew.cmu.edu</email>
            </au>
            <au id="A2">
               <snm>Hinman</snm>
               <fnm>Veronica</fnm>
               <insr iid="I2"/>
               <email>vhinman@cmu.edu</email>
            </au>
            <au ca="yes" id="A3">
               <snm>Benos</snm>
               <mi>V</mi>
               <fnm>Panayiotis</fnm>
               <insr iid="I3"/>
               <email>benos@pitt.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Lane Center for Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA</p>
            </ins>
            <ins id="I3">
               <p>Department of Computational Biology, University of Pittsburgh, Pittsburgh, PA 15260, USA</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <supplement>
            <title>
               <p>Selected papers from the Seventh Asia-Pacific Bioinformatics Conference (APBC 2009)</p>
            </title>
            <editor>Michael Q Zhang, Michael S Waterman and Xuegong Zhang</editor>
            <note>Research</note>
         </supplement>
         <conference>
            <title>
               <p>The Seventh Asia Pacific Bioinformatics Conference (APBC 2009)</p>
            </title>
            <location>Beijing, China</location>
            <date-range>13&#8211;16 January 2009</date-range>
            <url>http://bioinfo.au.tsinghua.edu.cn/apbc2009/</url>
         </conference>
         <issn>1471-2105</issn>
         <pubdate>2009</pubdate>
         <volume>10</volume>
         <issue>Suppl 1</issue>
         <fpage>S35</fpage>
         <url>http://www.biomedcentral.com/1471-2105/10/S1/S35</url>
         <xrefbib>
            
         <pubidlist><pubid idtype="pmpid">19208136</pubid><pubid idtype="doi">10.1186/1471-2105-10-S1-S35</pubid></pubidlist></xrefbib>
      </bibl>
      <history>
         <pub>
            <date>
               <day>30</day>
               <month>1</month>
               <year>2009</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2009</year>
         <collab>Kadri et al; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p><it>MicroRNA</it>s (miRNAs) are small non-coding single-stranded RNAs (20&#8211;23 nts) that are known to act as post-transcriptional and translational regulators of gene expression. Although they were initially overlooked, their role in many important biological processes, such as development, cell differentiation, and cancer, has recently been established. In spite of their biological significance, the identification of miRNA genes in newly sequenced organisms is still based, to a large degree, on extensive use of evolutionary conservation, which is not always available.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We have developed HHMMiR, a novel approach for <it>de novo </it>miRNA hairpin prediction in the absence of evolutionary conservation. Our method implements a <it>Hierarchical Hidden Markov Model </it>(HHMM) that utilizes region-based structural as well as sequence information of miRNA precursors. We first established a template for the structure of a typical miRNA hairpin by summarizing data from publicly available databases. We then used this template to develop the HHMM topology.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>Our algorithm achieved an average sensitivity of 84% and specificity of 88% on 10-fold cross-validation of human miRNA precursor data. We also show that this model, trained on human sequences, works well on hairpins from other vertebrate as well as invertebrate species. Furthermore, the human-trained model was able to correctly classify ~97% of plant miRNA precursors. The success of this approach in such a diverse set of species indicates that sequence conservation is not necessary for miRNA prediction. This may lead to efficient prediction of miRNA genes in virtually any organism.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <sec>
            <st>
               <p>MicroRNAs</p>
            </st>
            <p><it>MicroRNA</it>s (miRNAs) are small (~22 nucleotide long) non-coding RNAs that are part of a eukaryote-specific system of gene regulation at the RNA level. MiRNAs act as post-transcriptional regulators of gene expression by base pairing with their target mRNAs. MiRNAs are primarily transcribed by <it>RNA Pol II </it><abbrgrp><abbr bid="B1">1</abbr></abbrgrp> as regions of longer RNA molecules (pri-miRNA) <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. Individual pre-miRNA loops (~70 nts) are cleaved from the pri-miRNA by the RNAse III enzyme <it>Drosha </it>and transported into the cytoplasm by <it>RAN-GTP </it>and <it>Exportin 5 </it><abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, where they are processed further by <it>Dicer </it>into a ~22 nt long duplex with 3' overhangs <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp>. This duplex is commonly referred to as the miRNA:miRNA* duplex, where miRNA* is complementary to the miRNA. The miRNA:miRNA* duplex is subsequently unwound and the mature miRNA is loaded into the multi-protein RISC (RNA-induced silencing complex) <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, while miRNA* is usually degraded. In some cases, both miRNA and miRNA* are functional <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. The miRNA biogenesis is illustrated in Figure <figr fid="F1">1</figr>. Mature miRNAs can cause translation inhibition or mRNA cleavage, depending on the degree of complementarity between the miRNA and its target sequence. Each miRNA can have multiple targets and each gene can be targeted by multiple miRNAs. It has been predicted that more than one third of human genes are regulated by miRNAs <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Biogenesis of microRNAs</p>
               </caption>
               <text>
                  <p><b>Biogenesis of microRNAs</b>. miRNA genes are transcribed in the nucleus, where they undergo processing by DGCR8/Pasha and the RNAse III family enzyme Drosha. The pre-miRNA is then transported into the cytoplasm, where it is processed by Dicer and the cofactor TRBP to generate a ~22 nt miRNA:miRNA* duplex. After unwinding, the miRNA forms part of the RISC assembly and causes mRNA degradation or translational repression.</p>
               </text>
               <graphic file="1471-2105-10-S1-S35-1"/>
            </fig>
            <p>Plant and animal miRNAs differ not only in their biogenesis, but also in target-miRNA interactions. Plant miRNAs base pair with their targets with perfect or near-perfect complementarity and they regulate their targets mostly through mRNA cleavage at single sites in coding regions. Animal miRNAs usually base pair with 3' UTRs of the mRNAs at multiple target sites through imperfect complementarity. Due to these and other differences, it has been suggested that this regulation mechanism may have evolved independently in plants and animals <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. Some viruses have also been shown to encode miRNAs that play a role in expression regulation of host genes <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>MiRNA identification</p>
            </st>
            <p>The first animal miRNA genes, <it>let-7 </it>and <it>lin-4</it>, were discovered in <it>Caenorhabditis elegans </it>by forward genetics <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp>. Currently, miRNA genes are biochemically identified by cloning and sequencing size-fractionated cDNA libraries. The main limitation of this method is that lowly expressed miRNAs may be missed <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. In addition, some miRNAs may be difficult to clone due to their sequence composition and possible post-transcriptional modifications <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>. Deep sequencing can help overcome the problem of low expression and is being used on a large scale to identify small non-coding RNAs, but it is currently an expensive method and can only identify miRNAs expressed in a single cell type or in a given condition.</p>
            <p>Computational predictive methods are fast and inexpensive, and a number of approaches have been developed to predict miRNA genes genome-wide. However, most of these approaches depend heavily on conservation of hairpins in closely related species <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp>. Some methods have used clustering or profiling to identify miRNAs <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp>. The approach of Bentwich <it>et al. </it><abbrgrp><abbr bid="B23">23</abbr></abbrgrp> is interesting in that the whole genome is folded and scores are assigned to hairpins based on various features, including hairpin structural features and folding stability.</p>
            <p>Machine learning approaches in the past have used support vector machines with high dimensional basis functions for classification of genomic hairpins <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr></abbrgrp>. Some of these methods depend on cross-species conservation for classification, while others perform motif finding using multiple alignments. More recently, HMMs have been used in modelling miRNAs using both evolutionary information and features related to the secondary structure <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Hierarchical Hidden Markov Models</p>
            </st>
            <p><it>Hierarchical Hidden Markov Models </it>(HHMMs) constitute a generalization of Hidden Markov Models (HMMs). They have been successfully used for modelling stochastic processes with structure at multiple levels and length scales <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. In biology, HHMMs have been used in the past to model vertebrate splice sites <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> and more recently in modelling <it>cis</it>-regulatory modules <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. An HHMM has two types of states: <it>internal states </it>and <it>production states</it>. Each internal state has its own HHMM but cannot emit symbols by itself. It can activate a sub-state by a vertical transition. Sub-states can also make vertical transitions, until the lowest level in the hierarchy (production state) is reached. Production states are the only states that can emit symbols from the alphabet <it>via </it>their own probability distributions. Sub-states at the same level of the hierarchy are activated through horizontal transitions until an "end state" is reached. Every level has only one "end state" for each parent state, and this state shifts control back to the parent. Thus, each internal state can emit sequences instead of single symbols. The node at the highest level of the hierarchy is called the "root" node, while the leaf nodes are the production states. Please refer to <it>Methods </it>for information about HHMM parameters and their estimation.</p>
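            <p>The generative process described above can be illustrated with a minimal Python sketch. It is a toy two-level hierarchy, not the HHMMiR model: the state names (root, A, B, a1, b1), the emitted symbols and all probabilities are assumptions chosen only to show vertical transitions, horizontal transitions and the return of control through an end state.</p>

```python
import random

# Toy HHMM: internal states own a sub-model; production states emit symbols.
# All names and probabilities here are illustrative assumptions.
MODEL = {
    "root": {"start": {"A": 1.0},
             "trans": {"A": {"B": 1.0}, "B": {"end": 1.0}}},
    "A": {"start": {"a1": 1.0},
          "trans": {"a1": {"a1": 0.5, "end": 0.5}}},
    "B": {"start": {"b1": 1.0},
          "trans": {"b1": {"end": 1.0}}},
}
# Emission distributions exist only for production (leaf) states.
EMIT = {"a1": {"x": 0.5, "y": 0.5}, "b1": {"z": 1.0}}


def draw(dist, rng):
    """Sample one key from a {key: probability} dictionary."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys])[0]


def generate(state, rng):
    """Return the symbol sequence emitted by `state` and its sub-states."""
    if state in EMIT:                      # production state: emit one symbol
        return [draw(EMIT[state], rng)]
    sub = MODEL[state]
    out = []
    child = draw(sub["start"], rng)        # vertical transition into a sub-state
    while child != "end":
        out.extend(generate(child, rng))   # the child may itself be internal
        child = draw(sub["trans"][child], rng)  # horizontal transition
    return out                             # end state: control returns to parent


if __name__ == "__main__":
    print(generate("root", random.Random(0)))
```

            <p>In this sketch the internal state A emits a whole sub-sequence of x and y symbols before control passes horizontally to B, which is the sense in which an internal state emits sequences rather than single symbols.</p>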
            <p>In this article, we report the performance of an HHMM we developed for modelling miRNA hairpins. Although the model was trained on human sequences only, it was able to accurately classify hairpins from species as distant as worms, flies and plants, indicating that the degree of sequence and structural conservation for these genes may be high.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Data summarization</p>
            </st>
            <p>We consider the hairpin stem-loop for predictions since it is structurally the most prominent feature during biogenesis (Figure <figr fid="F1">1</figr>). MiRNA genes can be divided into the four regions depicted in Figure <figr fid="F2">2a</figr>. After transcription, the RNA strand folds to form the hairpin precursor (Figure <figr fid="F1">1</figr> and Figure <figr fid="F2">2a</figr>). The "loop" is the bulged end of the hairpin. The "miRNA" region defines the miRNA-miRNA* duplex (sans the 3' overhangs) that is processed by Dicer and further unwound. The region of the precursor extending from the end of the loop to the "miRNA" region is called the "extension". This region can be of variable length. The part of the hairpin sequence beyond the "miRNA" region may be part of the pri-miRNA in the nucleus and processed by Drosha. Thus, it has been named "pri-extension", as suggested in Saetrom <it>et al. </it><abbrgrp><abbr bid="B30">30</abbr></abbrgrp>.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>The miRNA hairpin</p>
               </caption>
               <text>
                  <p><b>The miRNA hairpin</b>. <b>(a) </b><it>Template</it>: In our model, the miRNA precursor has four regions: "Loop" is the bulge and the <it>loop </it>state outputs <it>indels </it>only; "Extension" is a variable length region between the miRNA duplex and the loop; "microRNA" represents the duplex, without 3' overhangs; "Pri-extension" is the rest of the hairpin. The latter three states can output <it>matches</it>, <it>mismatches </it>and <it>indels</it>. (The nucleotide distribution and lengths are not to scale.) <b>(b) </b><it>Labelled precursor</it>: The precursor shown in (a) is labelled according to the regions it represents. This is the input format of training data for HHMMiR. L: Loop; E: Extension; R: MiRNA; P: Pri-miRNA.</p>
               </text>
               <graphic file="1471-2105-10-S1-S35-2"/>
            </fig>
            <p>The results presented in Table <tblr tid="T1">1</tblr> show that the differences between vertebrate and invertebrate miRNA genes are rather small. A probabilistic method trained on data from one organism is therefore likely to perform well in another organism. As evident from the results in Table <tblr tid="T1">1</tblr>, the differences between the length distributions of plant and animal precursors are considerably larger, with the former having longer extension regions. The lengths of miRNAs and loops, however, are conserved across the two kingdoms. More information about species-specific differences is provided in Additional File <supplr sid="S1">1</supplr>. These genomes constitute an excellent test set for our algorithm in that they span various taxonomic groups with different miRNA characteristics. Thus, it will be very useful to see how well an HHMM trained on (say) human sequences is able to predict miRNA stem-loops in other vertebrate or invertebrate species and in plants.</p>
            <suppl id="S1">
               <title>
                  <p>Additional File 1</p>
               </title>
               <text>
                  <p>This file contains the results of summarization of the <it>microRNA registry </it>(version 10.1, December 2007) <abbrgrp><abbr bid="B34">34</abbr></abbrgrp> hairpin characteristics for each species.</p>
               </text>
               <file name="1471-2105-10-S1-S35-S1.xls">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Characteristics of miRNA hairpins in various taxonomic groups.</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <b>HP</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>LP</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>MIR</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>EXT</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>PRI</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Mean</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Vertebrates</p>
                     </c>
                     <c ca="left">
                        <p>86.7</p>
                     </c>
                     <c ca="left">
                        <p>7.3</p>
                     </c>
                     <c ca="left">
                        <p>22.0</p>
                     </c>
                     <c ca="left">
                        <p>5.0</p>
                     </c>
                     <c ca="left">
                        <p>12.6</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Invertebrates</p>
                     </c>
                     <c ca="left">
                        <p>91.8</p>
                     </c>
                     <c ca="left">
                        <p>7.9</p>
                     </c>
                     <c ca="left">
                        <p>22.2</p>
                     </c>
                     <c ca="left">
                        <p>5.8</p>
                     </c>
                     <c ca="left">
                        <p>13.8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Plants</p>
                     </c>
                     <c ca="left">
                        <p>119.5</p>
                     </c>
                     <c ca="left">
                        <p>6.8</p>
                     </c>
                     <c ca="left">
                        <p>21.3</p>
                     </c>
                     <c ca="left">
                        <p>22.8</p>
                     </c>
                     <c ca="left">
                        <p>12.5</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><b>Std. Dev</b>.</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Vertebrates</p>
                     </c>
                     <c ca="left">
                        <p>13.8</p>
                     </c>
                     <c ca="left">
                        <p>3.5</p>
                     </c>
                     <c ca="left">
                        <p>0.9</p>
                     </c>
                     <c ca="left">
                        <p>3.4</p>
                     </c>
                     <c ca="left">
                        <p>7.0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Invertebrates</p>
                     </c>
                     <c ca="left">
                        <p>13.1</p>
                     </c>
                     <c ca="left">
                        <p>3.9</p>
                     </c>
                     <c ca="left">
                        <p>1.3</p>
                     </c>
                     <c ca="left">
                        <p>4.5</p>
                     </c>
                     <c ca="left">
                        <p>5.9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Plants</p>
                     </c>
                     <c ca="left">
                        <p>43.2</p>
                     </c>
                     <c ca="left">
                        <p>3.7</p>
                     </c>
                     <c ca="left">
                        <p>1.0</p>
                     </c>
                     <c ca="left">
                        <p>18.5</p>
                     </c>
                     <c ca="left">
                        <p>9.9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Minimum</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Vertebrates</p>
                     </c>
                     <c ca="left">
                        <p>55</p>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>16</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Invertebrates</p>
                     </c>
                     <c ca="left">
                        <p>54</p>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>18</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Plants</p>
                     </c>
                     <c ca="left">
                        <p>57</p>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>16</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                     <c ca="left">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Maximum</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Vertebrates</p>
                     </c>
                     <c ca="left">
                        <p>153</p>
                     </c>
                     <c ca="left">
                        <p>22</p>
                     </c>
                     <c ca="left">
                        <p>26</p>
                     </c>
                     <c ca="left">
                        <p>34</p>
                     </c>
                     <c ca="left">
                        <p>50</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Invertebrates</p>
                     </c>
                     <c ca="left">
                        <p>215</p>
                     </c>
                     <c ca="left">
                        <p>30</p>
                     </c>
                     <c ca="left">
                        <p>28</p>
                     </c>
                     <c ca="left">
                        <p>55</p>
                     </c>
                     <c ca="left">
                        <p>32</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Plants</p>
                     </c>
                     <c ca="left">
                        <p>337</p>
                     </c>
                     <c ca="left">
                        <p>35</p>
                     </c>
                     <c ca="left">
                        <p>24</p>
                     </c>
                     <c ca="left">
                        <p>102</p>
                     </c>
                     <c ca="left">
                        <p>78</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p><it>HP</it>: Hairpin length; <it>LP</it>: Loop length; <it>MIR</it>: MiRNA length; <it>EXT</it>: Distance of miRNA duplex from end of loop; <it>PRI</it>: Length of extension from end of miRNA to end of precursor. The list of organisms used for this Table is provided as <it>Supplementary Data</it>.</p>
               </tblfn>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>HHMM model</p>
            </st>
            <p>HHMMiR is built around the miRNA precursor template illustrated in Figure <figr fid="F2">2a</figr>. The figure presents the four characteristic regions of the stem-loop of a typical miRNA gene, as described above. The length distributions of each of these regions are derived from Table <tblr tid="T1">1</tblr>. Each region, except the loop itself, has three states: <it>match</it>, <it>mismatch</it>, and <it>insertion/deletion </it>(<it>indel</it>). <it>Match </it>means a base pairing at that position in the stem-loop, while <it>mismatch </it>means bulges on both arms at that position in the folded hairpin. <it>Indel </it>means that a base in one strand has no counterpart in the opposite strand. The loop will only have the <it>indel </it>state. Examples of these states are presented in Figure <figr fid="F2">2a</figr>.</p>
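            <p>As a rough illustration of the three state types, the following Python sketch classifies a single aligned position of the two stem arms. It is a simplification under stated assumptions: the function name, the use of '-' to mark a missing counterpart, and the decision to call a position a match purely from the two nucleotides (AU, GC or GU in either orientation) are ours, whereas in practice the pairing status would come from the predicted secondary structure.</p>

```python
# Pairs treated as matches, stored without regard to which arm holds which base.
PAIRS = {frozenset("AU"), frozenset("GC"), frozenset("GU")}


def classify(five_prime, three_prime):
    """Classify one aligned stem position as 'match', 'mismatch' or 'indel'.

    '-' marks an arm with no base at this position (assumed convention).
    """
    if five_prime == "-" and three_prime == "-":
        raise ValueError("at least one arm must carry a base")
    if five_prime == "-" or three_prime == "-":
        return "indel"        # a base with no counterpart on the other arm
    if frozenset((five_prime, three_prime)) in PAIRS:
        return "match"        # base pair, in either orientation
    return "mismatch"         # two unpaired bases facing each other


if __name__ == "__main__":
    for left, right in [("A", "U"), ("U", "G"), ("A", "C"), ("G", "-")]:
        print(left, right, classify(left, right))
```

            <p>Storing the pairs as unordered sets means that AU and UA are handled identically, which keeps the sketch symmetric with respect to the two arms.</p>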
            <p>The HHMM resulting from this scheme has three levels (Figure <figr fid="F3">3</figr>). <it>Hairpin </it>is the root node and can vertically transition to its <it>Loop </it>substate only. In our model, every hairpin begins with a loop. The four internal states at the second level correspond to the four main regions of the hairpin from Figure <figr fid="F2">2a</figr>. This level also has an <it>End </it>(L<sub>end</sub>) state to transfer control back to the <it>Hairpin</it>. Each internal state has a probabilistic model at the next lower level. A <it>Loop </it>cannot have base pairs and thus has only one substate: <it>I </it>(<it>Indel</it>). The <it>Extension </it>state can only activate its <it>M </it>(<it>match</it>) substate when entered, since a mismatch or indel would become part of the loop. The <it>miRNA </it>and <it>pri-Ext </it>states can begin with a match, mismatch or indel. Each of these states has an <it>End </it>state (L<sub>end</sub>, R<sub>end</sub>, P<sub>end </sub>respectively) (see Figure <figr fid="F3">3</figr>).</p>
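            <p>The hierarchy described above can be written down as plain data, which makes the entry constraints easy to check. The Python sketch below is only one possible encoding: the dictionary layout and the helper names are assumptions, the silent end states are omitted for brevity, and no transition or emission probabilities are included.</p>

```python
# Children of each internal state (end states omitted; layout is an assumption).
TOPOLOGY = {
    "Hairpin": ["Loop", "Extension", "miRNA", "pri-Ext"],
    "Loop": ["I"],
    "Extension": ["M", "N", "I"],
    "miRNA": ["M", "N", "I"],
    "pri-Ext": ["M", "N", "I"],
}

# Substates that may be activated first when an internal state is entered.
ENTRY = {
    "Hairpin": ["Loop"],          # every hairpin begins with a loop
    "Loop": ["I"],                # a loop has no base pairs
    "Extension": ["M"],           # must open with a match
    "miRNA": ["M", "N", "I"],
    "pri-Ext": ["M", "N", "I"],
}


def check_entries(topology, entry):
    """Verify that every entry substate is a child of its parent state."""
    for parent, first in entry.items():
        missing = [s for s in first if s not in topology[parent]]
        if missing:
            raise ValueError(f"{parent} cannot enter {missing}")
    return True


def depth(state, topology):
    """Number of levels from `state` down to the production states."""
    children = topology.get(state, [])
    if not children:
        return 1
    return 1 + max(depth(child, topology) for child in children)


if __name__ == "__main__":
    print(check_entries(TOPOLOGY, ENTRY), depth("Hairpin", TOPOLOGY))
```

            <p>Here M, N and I follow the labels used for match, mismatch and indel states in Figure 3, and the computed depth of three corresponds to the three levels of the model.</p>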
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>The HHMM state model (based on the microRNA hairpin template)</p>
               </caption>
               <text>
                  <p><b>The HHMM state model (based on the microRNA hairpin template)</b>. The oval shaped nodes represent the <it>internal states</it>. The colours correspond to the biological region presented in Figure 2a. The circular solid lined nodes correspond to the production states. The dotted lined states correspond to the silent end states. M: <it>Match </it>states, N: <it>Mismatch </it>states, I: <it>Indel </it>states, L<sub>end</sub>: Loop end state, R<sub>end</sub>: miRNA end state, P<sub>end</sub>: pri-extension end state.</p>
               </text>
               <graphic file="1471-2105-10-S1-S35-3"/>
            </fig>
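            <p>The hierarchy described above can be summarised as a small data structure. The following Python sketch is illustrative only and is not the HHMMiR implementation: the names <it>ROOT</it>, <it>REGIONS</it> and <it>can_start_with</it> are hypothetical, transition and emission probabilities are omitted, and only the entry constraints and end-state names stated in the text are encoded.</p>

```python
# Illustrative sketch only; names are hypothetical and probabilities are omitted.
# Level 1 is the Hairpin root, level 2 holds the four region states,
# level 3 holds the production states M (match), N (mismatch) and I (indel).
ROOT = {"name": "Hairpin", "entry": ("Loop",)}  # the root descends into Loop only

REGIONS = {
    # region: production states allowed on entry, and the silent end state named in the text
    "Loop": {"entry": ("I",), "end": "L_end"},
    "Extension": {"entry": ("M",), "end": None},  # end state not named in the text
    "miRNA": {"entry": ("M", "N", "I"), "end": "R_end"},
    "pri-Ext": {"entry": ("M", "N", "I"), "end": "P_end"},
}


def can_start_with(region: str, state: str) -> bool:
    """Return True if `region` may begin with production state `state`."""
    return state in REGIONS[region]["entry"]
```

            <p>For example, <it>can_start_with("Extension", "N")</it> is false under this encoding, reflecting the rule that an extension must open with a match.</p>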
         </sec>
         <sec>
            <st>
               <p>Datasets and alphabet selection</p>
            </st>
            <p>The training dataset contained a total of 527 human miRNA precursors (positive dataset) and ~500 random hairpins (negative dataset), based on criteria derived from summarization (see <it>Methods</it>). The <it>RNAfold </it>program from the Vienna Package <abbrgrp><abbr bid="B31">31</abbr></abbrgrp> was used to obtain the secondary structure of these hairpins with the <it>minimum fold energy </it>(<it>mfe</it>). The parameters of the model were estimated using a modified Baum-Welch algorithm (see <it>Methods </it>for details on data sets and algorithms). All tests were conducted with 10-fold cross validation with random sampling.</p>
            <p>We tested our model on two alphabets: &#931;<sub>1 </sub>with <it>matches M </it>= {AU, GC, GU}, <it>indels I </it>= {A-, G-, C-, U-} and <it>mismatches N</it><sub>1 </sub>= {AA, GG, CC, UU, AC, AG, CU}; and &#931;<sub>2</sub>, which is similar to &#931;<sub>1 </sub>except that the mismatch set is more concise: <it>N</it><sub>2 </sub>= {XX, XY}, where XX stands for one of {AA, GG, CC, UU} and XY stands for one of {AC, AG, CU}. In our alphabet, a <it>match </it>such as AU has the same probability as UA; that is, an 'A' on either stem base-paired with a 'U' on the other stem. Cross-validation tests using <it>Maximum Likelihood Estimate </it>(MLE) showed that the model with alphabet &#931;<sub>1 </sub>performed substantially better, both in terms of sensitivity and specificity (Table <tblr tid="T2">2</tblr>) (see <it>Methods </it>for more details on these calculations).</p>
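            <p>The two alphabets can be made concrete with a short encoding routine. The Python sketch below is a minimal illustration under stated assumptions, not the code used in this study: the function name <it>encode</it>, the alphabet labels "sigma1" and "sigma2", and the use of '-' for a gap are hypothetical choices. Each column of the folded hairpin is mapped to a state (M, N or I) and to an order-independent symbol, so that AU and UA share one symbol.</p>

```python
# Minimal sketch; helper names are hypothetical. Symbols are stored with the two
# bases in alphabetical order, so the match written GC in the text appears as "CG".
MATCHES = {"AU", "CG", "GU"}
SAME_BASE = {"AA", "CC", "GG", "UU"}  # the XX class of the reduced alphabet


def encode(five_prime: str, three_prime: str, alphabet: str = "sigma1"):
    """Map one hairpin column to (state, symbol); '-' marks a missing partner."""
    if five_prime == "-" and three_prime == "-":
        raise ValueError("a column needs at least one base")
    if "-" in (five_prime, three_prime):
        base = three_prime if five_prime == "-" else five_prime
        return "I", base + "-"  # indel: a base with no counterpart
    pair = "".join(sorted((five_prime, three_prime)))  # strand order is ignored
    if pair in MATCHES:
        return "M", pair
    if alphabet == "sigma2":
        # reduced alphabet: only distinguish identical from non-identical bases
        return "N", "XX" if pair in SAME_BASE else "XY"
    return "N", pair
```

            <p>Under this encoding &#931;<sub>1 </sub>has 3 + 4 + 7 = 14 emission symbols and &#931;<sub>2 </sub>has 3 + 4 + 2 = 9.</p>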
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Results for different alphabet sizes: &#931;<sub>1 </sub>(larger alphabet) shows better accuracy than &#931;<sub>2 </sub>(smaller alphabet)</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Alphabet</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Sn</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Sp</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>FDR</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>&#931;<sub>1</sub></p>
                     </c>
                     <c ca="left">
                        <p>74.5</p>
                     </c>
                     <c ca="left">
                        <p>94.1</p>
                     </c>
                     <c ca="left">
                        <p>15.8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>&#931;<sub>2</sub></p>
                     </c>
                     <c ca="left">
                        <p>55.0</p>
                     </c>
                     <c ca="left">
                        <p>48.5</p>
                     </c>
                     <c ca="left">
                        <p>51.0</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Sn: Sensitivity; Sp: Specificity; FDR: False Discovery rate. All numbers are in percentages.</p>
               </tblfn>
            </tbl>
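            <p>For reference, the three measures reported in Table 2 can be computed from confusion-matrix counts as sketched below. The formulas are the standard definitions and are assumed here to match those given in <it>Methods</it>; the function name is hypothetical and the counts in the usage note are invented purely for illustration.</p>

```python
def classification_rates(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return sensitivity, specificity and false discovery rate as percentages."""
    return {
        "Sn": 100.0 * tp / (tp + fn),   # true hairpins that were recovered
        "Sp": 100.0 * tn / (tn + fp),   # negatives that were correctly rejected
        "FDR": 100.0 * fp / (fp + tp),  # predictions that were wrong
    }
```

            <p>With invented counts of 80 true positives, 10 false positives, 90 true negatives and 20 false negatives, this gives Sn = 80%, Sp = 90% and FDR of about 11.1%.</p>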
            <p>It is surprising that &#931;<sub>1 </sub>performs better than &#931;<sub>2</sub>, because one would expect that mismatches in the stem-loop would not be characteristic of the miRNA sequence, since they do not contribute to the base pairing of the stem and thus the overall folding energy, on which other algorithms are based <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. Furthermore, the &#931;<sub>1 </sub>alphabet has more parameters. In order to rule out that the better performance is due to parameter overfitting, we repeated training with multiple datasets of different sizes and the results remained the same (<it>data not shown</it>). In the remainder of this paper we use the &#931;<sub>1 </sub>alphabet.</p>
         </sec>
         <sec>
            <st>
               <p>Training algorithms: performance evaluation</p>
            </st>
            <p>We implemented and compared variations of two existing algorithms for parameter estimation: Baum-Welch and MLE. The positive model was trained using MLE, since the positive training data (stem-loop hairpins) can be labelled as <it>loop</it>, <it>extension</it>, <it>miRNA </it>and <it>pri-extension </it>(Figure <figr fid="F2">2b</figr>) using existing annotations. Negative data, on the other hand, are unlabelled, so both algorithms were compared for training with this dataset. For this evaluation, we call the MLE-trained negative model MLE-HHMMiR and the Baum-Welch-trained model BW-HHMMiR. For MLE-HHMMiR, we used length distributions from database summarization (Table <tblr tid="T1">1</tblr>) to perform <it>random labelling </it>of the four regions on the negative datasets. Overall, we found both methods to perform practically the same. The area under the ROC curve (Figure <figr fid="F4">4</figr>) is 0.912 for MLE-HHMMiR and 0.920 for BW-HHMMiR. The ratio of the log-likelihoods output by the two models decides the fate of the test hairpin. In order to decide a threshold for this ratio, the trade-off between sensitivity and specificity was considered by calculating the <it>Matthews correlation coefficient </it>(Table <tblr tid="T3">3</tblr>). The highest Matthews correlation coefficient value was 0.73 for BW-HHMMiR and 0.71 for MLE-HHMMiR, corresponding to likelihood ratio thresholds of 0.71 and 0.99, respectively. BW-HHMMiR achieved an average 84% sensitivity and 88% specificity using the 0.71 ratio as the threshold. Even though the difference between the performances of the two algorithms is not great, we chose BW-HHMMiR for further tests. This is because MLE-HHMMiR depends on <it>random labelling </it>of hairpins and thus its performance will vary according to the labelling. The drawback of the Baum-Welch method is that it might be trapped in local optima, depending on the initialization. This problem is sometimes addressed by running the algorithm multiple times with different starting points. We use a uniform distribution for this initialization, but background frequencies could be used instead; these can be obtained by folding the entire genome in question and extracting hairpins from it. In order to account for the absence of certain base pairs or <it>indels </it>in a given sequence while using Baum-Welch, we introduce pseudo-counts.</p>
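            <p>The effect of pseudo-counts on an emission distribution can be sketched as follows. This is a generic illustration rather than the procedure used in HHMMiR: the function name is hypothetical, and the pseudo-count value of 1 (add-one smoothing) is an assumption, since the value actually used is not stated here.</p>

```python
from collections import Counter


def emission_probabilities(observed, alphabet, pseudocount=1.0):
    """Estimate emission probabilities from observed symbols with additive smoothing.

    The default pseudocount of 1.0 is an assumption made for this sketch.
    """
    counts = Counter(observed)
    # Only symbols of the alphabet contribute to the normalising total.
    total = sum(counts[s] for s in alphabet) + pseudocount * len(alphabet)
    return {s: (counts[s] + pseudocount) / total for s in alphabet}
```

            <p>A symbol that never occurs in the training data, such as GU in a toy sample containing only AU and CG, then receives a small non-zero probability instead of zero.</p>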
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>ROC curves for Baum-Welch and MLE training on the negative model</p>
               </caption>
               <text>
                  <p><b>ROC curves for Baum-Welch and MLE training on the negative model</b>. 10-fold cross-validation used with Baum-Welch (<it>black curve</it>) and MLE (<it>red curve</it>) for training the negative model. Positive model was trained using MLE in both cases.</p>
               </text>
               <graphic file="1471-2105-10-S1-S35-4"/>
            </fig>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Results for cross-validation using different algorithms.</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Method</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Sn (SD)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Sp (SD)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>MCC</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>FDR (SD)</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Baum-Welch</p>
                     </c>
                     <c ca="center">
                        <p>84.0 (18.6)</p>
                     </c>
                     <c ca="center">
                        <p>88.0 (6.6)</p>
                     </c>
                     <c ca="center">
                        <p>0.73</p>
                     </c>
                     <c ca="center">
                        <p>11.8 (5.6)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>MLE</p>
                     </c>
                     <c ca="center">
                        <p>74.5 (13.7)</p>
                     </c>
                     <c ca="center">
                        <p>94.1 (2.7)</p>
                     </c>
                     <c ca="center">
                        <p>0.71</p>
                     </c>
                     <c ca="center">
                        <p>15.9 (8.0)</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Sn: sensitivity; Sp: specificity; MCC: Matthews correlation coefficient; FDR: False Discovery Rate. Sn, Sp and FDR report the average percent values; standard deviations are reported in parentheses.</p>
               </tblfn>
            </tbl>
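            <p>Choosing a decision threshold by maximising the Matthews correlation coefficient can be sketched as below. The sketch assumes that a hairpin is called a miRNA when its score is at or above the threshold; this direction, the function names and the toy scores in the usage note are assumptions for illustration and are not taken from the HHMMiR code.</p>

```python
import math


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


def best_threshold(scores, labels):
    """Try every observed score as a cut-off and return (threshold, mcc).

    scores: one likelihood-ratio score per hairpin
    labels: True for known miRNA hairpins, False for negatives
    Assumption: score >= threshold means "predicted miRNA".
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    best = (None, -1.0)
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        value = mcc(tp, fp, negatives - fp, positives - tp)
        if value > best[1]:
            best = (t, value)
    return best
```

            <p>On a toy set with scores 0.2, 0.4, 0.6, 0.8 and 0.9, of which the last three are true hairpins, the sweep selects 0.6 as the threshold because it separates the two classes perfectly.</p>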
         </sec>
         <sec>
            <st>
               <p>Testing prediction efficiency in other organisms</p>
            </st>
            <p>Next, we examined how well our model trained on human sequences could predict known miRNAs in other species. In particular, HHMMiR was tested on the following species: <it>M. musculus </it>(mammals), <it>G. gallus </it>(birds), <it>D. rerio </it>(fish), <it>C. elegans </it>(worms), <it>D. melanogaster </it>(flies), <it>A. thaliana </it>and <it>O. sativa </it>(plants). These species were chosen as representatives of their respective taxonomic groups, and because they are well studied and annotated. The results are shown in Table <tblr tid="T4">4</tblr>. HHMMiR correctly predicts more than 85% of the precursors in every animal species tested except mouse, and its overall sensitivity is also about 85%. What is more surprising, however, is the high performance we observe in the prediction of plant precursors, given the differences in length distributions of the miRNA stem-loops between plants and animals (Table <tblr tid="T1">1</tblr>). The fact that mouse miRNAs are predicted at a lower rate probably reflects the larger number of hairpins registered for this species, many of which are not biochemically verified. Such discrepancies have been observed in other studies as well, although to a lesser extent (<it>e.g</it>., <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>). The specificity over the mouse data is very high (84%) and remains surprisingly high in the two invertebrate species (~75%) (<it>data not shown</it>).</p>
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>Results of tests on other species.</p>
               </caption>
               <tblbdy cols="3">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Organism</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Total hairpins</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>% correctly predicted</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>M. musculus</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>422</p>
                     </c>
                     <c ca="center">
                        <p>74.7</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>G. gallus</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>147</p>
                     </c>
                     <c ca="center">
                        <p>89.1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>D. rerio</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>334</p>
                     </c>
                     <c ca="center">
                        <p>88.3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>C. elegans</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>131</p>
                     </c>
                     <c ca="center">
                        <p>85.5</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>D. melanogaster</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>143</p>
                     </c>
                     <c ca="center">
                        <p>93.0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>A. thaliana</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>114</p>
                     </c>
                     <c ca="center">
                        <p>97.4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>O. sativa</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>188</p>
                     </c>
                     <c ca="center">
                        <p>85.7</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Total</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>1479</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>85.1</b>
                        </p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
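            <p>The total row of Table 4 is consistent with an average weighted by the number of hairpins per species rather than a simple mean of the percentages. The short Python check below reproduces it from the table values; the variable names are arbitrary.</p>

```python
# (organism, total hairpins, % correctly predicted), copied from Table 4
TABLE4 = [
    ("M. musculus", 422, 74.7),
    ("G. gallus", 147, 89.1),
    ("D. rerio", 334, 88.3),
    ("C. elegans", 131, 85.5),
    ("D. melanogaster", 143, 93.0),
    ("A. thaliana", 114, 97.4),
    ("O. sativa", 188, 85.7),
]

total_hairpins = sum(n for _, n, _ in TABLE4)
# Weight each percentage by the number of hairpins it was measured on.
weighted_pct = sum(n * pct for _, n, pct in TABLE4) / total_hairpins
# For comparison: the unweighted mean treats every species equally.
simple_mean = sum(pct for _, _, pct in TABLE4) / len(TABLE4)

print(total_hairpins, round(weighted_pct, 1), round(simple_mean, 1))
```

            <p>The weighted value rounds to 85.1% over 1479 hairpins, matching the table, whereas the unweighted mean is about 87.7% because the large mouse set, which has the lowest rate, then counts no more than the smaller sets.</p>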
         </sec>
         <sec>
            <st>
               <p>Comparison with other approaches</p>
            </st>
            <p>As described earlier, there are very few machine learning methods that do not require evolutionary information to predict miRNAs. To our knowledge, the only other probabilistic model is a motif finding method for mature miRNA region prediction <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. An SVM-based approach has been proposed <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> that parses the <it>mfe </it>structure in "triplets": structural information about the pairing states of every three nucleotides, represented using dot-bracket notation. This method showed an accuracy of ~90% using the data available in the registry at the time. We used the same training and test sets used by the "triplet SVM" to train and test our model, HHMMiR, and we found it to perform better in almost all datasets (Table <tblr tid="T5">5</tblr>). The only exceptions are the mouse (but not rat) and rice (<it>O. sativa</it>, but not <it>A. thaliana</it>) datasets; the two methods tie on <it>C. briggsae</it>. Also, their model was able to predict all of the then five known miRNAs from Epstein-Barr virus, whereas HHMMiR predicted four. Overall, HHMMiR exhibits sensitivity of 93.2% and specificity of 89% in these datasets. If we limit the comparison of the two methods to one representative species from each taxon (<it>M. musculus</it>, <it>G. gallus</it>, <it>D. rerio</it>, <it>C. elegans</it>, <it>D. melanogaster</it>, <it>A. thaliana</it>, <it>Epstein Barr virus</it>) in order to minimize the statistical dependence of the data, the difference in the performance becomes statistically significant at the 5% level (<it>p</it>-value = 0.03, Wilcoxon paired test on the predicted number of genes).</p>
            <tbl id="T5">
               <title>
                  <p>Table 5</p>
               </title>
               <caption>
                  <p>Results for comparison between two precursor prediction methods.</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="left">
                        <p>Test Set</p>
                     </c>
                     <c ca="center">
                        <p>Total hairpins</p>
                     </c>
                     <c ca="center">
                        <p>Triplet SVM (%)</p>
                     </c>
                     <c ca="center">
                        <p>HHMMiR (%)</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Positive Sets</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>New human hairpins in registry at the time.</p>
                     </c>
                     <c ca="center">
                        <p>39</p>
                     </c>
                     <c ca="center">
                        <p>92.3</p>
                     </c>
                     <c ca="center">
                        <p>97.4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>M. musculus</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>36</p>
                     </c>
                     <c ca="center">
                        <p>94.4</p>
                     </c>
                     <c ca="center">
                        <p>88.9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>R. norvegicus</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>25</p>
                     </c>
                     <c ca="center">
                        <p>80.0</p>
                     </c>
                     <c ca="center">
                        <p>84.0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>G. gallus</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>13</p>
                     </c>
                     <c ca="center">
                        <p>84.6</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>D. rerio</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>6</p>
                     </c>
                     <c ca="center">
                        <p>66.7</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>C. elegans</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>110</p>
                     </c>
                     <c ca="center">
                        <p>86.4</p>
                     </c>
                     <c ca="center">
                        <p>90.9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>C. briggsae</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>73</p>
                     </c>
                     <c ca="center">
                        <p>95.9</p>
                     </c>
                     <c ca="center">
                        <p>95.9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>D. melanogaster</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>71</p>
                     </c>
                     <c ca="center">
                        <p>91.6</p>
                     </c>
                     <c ca="center">
                        <p>95.8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>D. pseudoobscura</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>71</p>
                     </c>
                     <c ca="center">
                        <p>90.1</p>
                     </c>
                     <c ca="center">
                        <p>98.6</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>A. thaliana</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>75</p>
                     </c>
                     <c ca="center">
                        <p>92.0</p>
                     </c>
                     <c ca="center">
                        <p>97.3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>O. sativa</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>94.8</p>
                     </c>
                     <c ca="center">
                        <p>86.5</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Epstein Barr virus</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>80.0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>TOTAL</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>620</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>91</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>93.2</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Negative Sets</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Folded genome hairpins from Chromosome 19</p>
                     </c>
                     <c ca="center">
                        <p>2444</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>88.6</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Negative hairpin Set</p>
                     </c>
                     <c ca="center">
                        <p>1000</p>
                     </c>
                     <c ca="center">
                        <p>88.1</p>
                     </c>
                     <c ca="center">
                        <p>89.4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>TOTAL</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>3444</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>88.7</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>88.8</b>
                        </p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The percentages represent the ratio of hairpins correctly predicted.</p>
               </tblfn>
            </tbl>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>MiRNA genes constitute one of the most conserved mechanisms for gene regulation across all animal and plant species. The characteristics of the precursor miRNA stem-loops are well conserved in both vertebrate and invertebrate animals and fairly conserved between animals and plants. As seen in Table <tblr tid="T1">1</tblr>, plant hairpins tend to be generally longer than those in animals, while vertebrates have shorter precursors than invertebrates. Although the "extension" and "pri-extension" regions may vary in length between animals and plants (much longer in plants), the lengths of the "miRNA" and "loop" regions are more similar. Thus, even across evolutionary time, the basic characteristics of miRNAs have not changed dramatically.</p>
         <p>We designed a template for a typical precursor miRNA stem-loop and we built an HHMM based on it. HHMMiR was able to attain an average sensitivity of 84% and specificity of 88% on 10-fold cross validation of human data. We trained HHMMiR on human sequences and the resulting model was able to successfully identify a large percentage of not only mouse, but also invertebrate, plant and virus miRNAs (Tables <tblr tid="T4">4</tblr> and <tblr tid="T5">5</tblr>). This is an encouraging result showing that HHMMiR may be very useful in predicting miRNA genes across long evolutionary distances without the requirement for evolutionary conservation of sequences. This would be very beneficial for identification of miRNA hairpins in organisms that do not have closely related species sequenced, such as <it>Strongylocentrotus purpuratus </it>(sea urchin) and <it>Ornithorhynchus anatinus </it>(platypus) <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>.</p>
         <p>This is the first time a hierarchical probabilistic model has been used for classification and identification of miRNA hairpins. Probabilistic learning was previously applied by Nam <it>et al. </it><abbrgrp><abbr bid="B32">32</abbr></abbrgrp> for identifying the miRNA pattern/motif in hairpins. The advantage of the hierarchy used by our HHMMiR is that it parses each hairpin into four distinct regions and processes each of them separately. This better represents the biological role of each region, which is reflected in the distinct length distributions and neighbourhood base-pairing characteristics of that region. Furthermore, the underlying HHMM provides an intuitive modelling of these regions. We compared two modifications of the MLE and Baum-Welch algorithms for modelling the negative datasets, and we found them to perform similarly. Baum-Welch was selected for this study, since it does not require (random) labelling of the negative set.</p>
         <p>A drawback of HHMMiR is that it depends on the <it>mfe </it>structure returned by the <it>RNAfold </it>program <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. In the future, we will test more folding algorithms or use the probability distribution of a number of top-scoring structures, ranked by folding energy, returned by this package.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>The success of our approach shows that the conservation of the miRNA mechanism may be at a much deeper level than expected. Further developments of the HHMMiR algorithm include the extension of the precursor template model (Figure <figr fid="F3">3</figr>) to be able to predict pri-miRNA genes with multiple stem-loops. Another extension would be to train a model to decode all HHMMiR predicted hairpins to identify the miRNA genes in them. Finally, it will be interesting to extend our method to include evolutionary information, which will allow us to assess the significance of conservation in predicting miRNA genes.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Data collection and processing</p>
            </st>
            <sec>
               <st>
                  <p>MiRNA dataset</p>
               </st>
               <p>MiRNA genes were obtained from the <it>microRNA registry</it>, version 10.1 (December 2007) <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>, which contains 3265 miRNAs from animals and 870 from plants. For training HHMMiR, we used the 525 human hairpins that remained after filtering out precursor genes with multiple loops. Each gene was folded with the <it>RNAfold </it>program, which is part of the Vienna package <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>, using the default parameters to obtain the minimum free energy (<it>mfe</it>) secondary structure. The negative set consists of coding regions and random genomic segments from the human genome that were obtained using the UCSC genome browser <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. These regions were folded and processed as described below.</p>
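               <p>As a minimal sketch of how the folding result can be read downstream, the following Python fragment parses the two-line record that <it>RNAfold </it>typically prints for one sequence (the sequence, then the dot-bracket structure followed by the energy in parentheses). The function name <it>parse_rnafold_record </it>and the example record are hypothetical and are not part of HHMMiR; the output layout is an assumption about the default text format of the Vienna package.</p>
               <p><![CDATA[
```python
from typing import Sequence, Tuple


def parse_rnafold_record(lines: Sequence[str]) -> Tuple[str, str, float]:
    """Return (sequence, dot-bracket structure, energy).

    Assumes exactly two lines: the sequence, then 'structure (energy)'.
    """
    sequence = lines[0].strip()
    # The structure contains no whitespace, so split once on whitespace.
    structure, energy_field = lines[1].strip().split(None, 1)
    energy = float(energy_field.strip("() "))
    if len(structure) != len(sequence):
        raise ValueError("structure and sequence lengths differ")
    return sequence, structure, energy


# Hypothetical record, used for illustration only.
record = ["GGGAAAUCC", "(((...))) ( -1.20)"]
sequence, structure, mfe = parse_rnafold_record(record)
```
]]></p>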
            </sec>
            <sec>
               <st>
                  <p>Hairpin processing</p>
               </st>
               <p>Genomic sequences were folded in windows of 1 kb, 500 nts and 300 nts with an overlap of 150 nts between consecutive windows. Nodes from the TeraGrid project <abbrgrp><abbr bid="B36">36</abbr></abbrgrp> were used for this purpose. We tested the various window sizes on the relatively small <it>C. elegans </it>genome and found that 500 nts windows cover most known miRNA hairpins. Windows of 300 nts exhibited a high degree of redundancy without adding more hairpins to those of the 500 nts windows, while 1 kb windows missed a higher percentage of known miRNAs (<it>data not shown</it>). For this study, we therefore used hairpins extracted from windows of 500 nts. We were able to recover ~92% of the known miRNAs from <it>C. elegans </it>in this way. The remaining 8% may be explained by the presence of multiple loops or by the specific parameters used. The hairpins were extracted from these folded windows using the following parameters: each hairpin has at least 10 base pairs, a loop of at most 20 bases, and a total length of at least 50 nucleotides. The data flow of this process is presented in Figure <figr fid="F5">5</figr>.</p>
               <fig id="F5">
                  <title>
                     <p>Figure 5</p>
                  </title>
                  <caption>
                     <p>Data flow for hairpin extraction from the genome</p>
                  </caption>
                  <text>
                     <p><b>Data flow for hairpin extraction from the genome</b>. The genome is first folded using windows of 500 nts with 150 nts overlap between consecutive windows. Hairpins are then extracted from the folded windows using the parameters described in the text. Hairpins are pre-processed into a suitable format for training/testing using the states shown in Figure 3 (L: Loop; E: Extension; R: miRNA; P: pri-miRNA extension). For the purpose of testing, the folded sequence is pre-processed into 2 lines of input representing the 2 stems of the hairpin. An example is given in Figure 2b.</p>
                  </text>
                  <graphic file="1471-2105-10-S1-S35-5"/>
               </fig>
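               <p>The windowing and filtering steps described above can be sketched in Python as follows. This is an illustration under stated assumptions and not the HHMMiR implementation: the function names are hypothetical, hairpins are assumed to be given as dot-bracket strings, and a hairpin is taken to be a single stem-loop in which the loop is the unpaired stretch between the innermost base pair. Only the thresholds (500 nts windows, 150 nts overlap, at least 10 base pairs, loop of at most 20 bases, length of at least 50 nucleotides) come from the text.</p>
               <p><![CDATA[
```python
from typing import Iterator, Tuple


def sliding_windows(sequence: str, size: int = 500,
                    overlap: int = 150) -> Iterator[Tuple[int, str]]:
    """Yield (start, window) pairs; consecutive windows share `overlap` nts."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    start = 0
    while start < len(sequence):
        yield start, sequence[start:start + size]
        if start + size >= len(sequence):
            break  # the last window already reaches the end of the sequence
        start += step


def passes_hairpin_filter(structure: str, min_pairs: int = 10,
                          max_loop: int = 20, min_length: int = 50) -> bool:
    """Check a dot-bracket string against the thresholds given in the text."""
    n_pairs = structure.count("(")
    last_open = structure.rfind("(")
    first_close = structure.find(")")
    # Require balanced bracket counts and a single stem-loop
    # (no opening bracket may follow a closing bracket).
    if (n_pairs == 0 or n_pairs != structure.count(")")
            or last_open > first_close):
        return False
    loop_length = first_close - last_open - 1
    return (n_pairs >= min_pairs and loop_length <= max_loop
            and len(structure) >= min_length)


# Toy inputs, for illustration only.
starts = [start for start, _ in sliding_windows("A" * 1200)]
candidate = "." * 5 + "(" * 20 + "." * 8 + ")" * 20 + "." * 5
keep = passes_hairpin_filter(candidate)
```
]]></p>
               <p>With a window of 500 nts and an overlap of 150 nts, consecutive windows start 350 nts apart, so a hairpin shorter than the overlap cannot be split across every window that covers it.</p>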
               <p>After the hairpins are extracted, we process them to an input format representing the hairpin's secondary structure (Figure <figr fid="F5">5</figr> and Figure <figr fid="F2">2</figr>) to be compatible with the HHMM shown in Figure <figr fid="F3">3</figr>. The labelling is done only for training data. For the purpose of labelling, the miRNA is first mapped to the folded hairpin (on either or both arms), and then the region representing the miRNA is labelled as the duplex miRNA (R) region. Our method does not consider the 3' overhangs generated during Dicer processing. The main bulge is labelled as the loop (L), whereas the remaining region between loop and miRNA is represented as the extension (E). The rest of the hairpin beyond the miRNA is labelled as pri-extension (P). A detailed description of these regions is given in the <it>Results </it>section.</p>
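               <p>The labelling rule can be illustrated with a short Python sketch. It assumes a simplified, hypothetical representation in which the two stems of the hairpin are aligned into columns numbered from the open end of the hairpin (column 0) towards the terminal loop, and in which the positions of the miRNA duplex and of the loop are already known. The function name and its arguments are illustrative and do not reproduce the input format of HHMMiR.</p>
               <p><![CDATA[
```python
def label_stem_columns(n_columns: int, mirna_start: int,
                       mirna_end: int, loop_start: int) -> str:
    """Label each column as P, R, E or L.

    Columns run from the open end of the hairpin (0) to the terminal loop.
    The miRNA duplex occupies [mirna_start, mirna_end) and the loop
    occupies [loop_start, n_columns).
    """
    if not 0 <= mirna_start < mirna_end <= loop_start <= n_columns:
        raise ValueError("regions must be ordered P, R, E, L along the hairpin")
    labels = []
    for column in range(n_columns):
        if column < mirna_start:
            labels.append("P")   # pri-extension: beyond the miRNA
        elif column < mirna_end:
            labels.append("R")   # miRNA duplex region
        elif column < loop_start:
            labels.append("E")   # extension: between miRNA and loop
        else:
            labels.append("L")   # loop
    return "".join(labels)


# Toy example: 12 columns, miRNA in columns 3 to 7, loop from column 10.
toy_labels = label_stem_columns(12, 3, 8, 10)
```
]]></p>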
            </sec>
         </sec>
         <sec>
            <st>
               <p>Parameter estimation and testing</p>
            </st>
            <sec>
               <st>
                  <p>Parameter estimation</p>
               </st>
               <p>Two separate HHMM models are trained, one on the positive data set (miRNAs and their corresponding hairpins) and the other on the negative data set (hairpins randomly chosen from the coding parts of the genome). The hairpins are pre-processed and labelled (if needed) before parameter estimation. Baum-Welch requires no labelling, but for MLE we applied random labelling to the negative set, as described above (Figure <figr fid="F2">2a</figr>).</p>
               <p>The <it>alphabet </it>is denoted by &#931; = {&#963;<sub><it>i</it></sub>} and the observed finite string is denoted by <b><it>O </it></b>= <it>o</it><sub>1</sub><it>o</it><sub>2 </sub>... <it>o</it><sub><it>N </it></sub>such that <it>o</it><sub><it>i </it></sub>&#8712; &#931;. The <it>i</it><sup><it>th </it></sup>state at hierarchical level <it>d </it>is denoted as <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S35-i1"><m:semantics><m:mrow><m:msubsup><m:mi>q</m:mi><m:mi>i</m:mi><m:mi>d</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemyCae3aa0baaSqaaiabdMgaPbqaaiabdsgaKbaaaaa@3019@</m:annotation></m:semantics></m:math></inline-formula> (denoted as <it>q</it><sup><it>d </it></sup>in absence of ambiguity). The highest level of hierarchy (that of the root) is 1 while the lowest (that of the production states) is <it>D </it>(in our case, <it>D </it>= 3). The number of substates of each <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S35-i1"><m:semantics><m:mrow><m:msubsup><m:mi>q</m:mi><m:mi>i</m:mi><m:mi>d</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemyCae3aa0baaSqaaiabdMgaPbqaaiabdsgaKbaaaaa@3019@</m:annotation></m:semantics></m:math></inline-formula> (<it>d </it>&#8712; {1, 2, ..., <it>D-1</it>}) is |<inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S35-i1"><m:semantics><m:mrow><m:msubsup><m:mi>q</m:mi><m:mi>i</m:mi><m:mi>d</m:mi></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemyCae3aa0baaSqaaiabdMgaPbqaaiabdsgaKbaaaaa@3019@</m:annotation></m:semantics></m:math></inline-formula>|. The parameter set of an HHMM is denoted by:</p>
               <p>
                  <display-formula>
                     <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S35-i2">
                        <m:semantics>
                           <m:mrow>
                              <m:mi>&#955;</m:mi>
                              <m:mo>=</m:mo>
                              <m:msub>
                                 <m:mrow>
                                    <m:mrow>
                                       <m:mo>{</m:mo>
                                       <m:mrow>
                                          <m:msup>
                                             <m:mi>&#955;</m:mi>
                                             <m:mrow>
                                                <m:msup>
                                                   <m:mi>q</m:mi>
                                                   <m:mi>d</m:mi>
                                                </m:msup>
                                             </m:mrow>
                                          </m:msup>
                                       </m:mrow>
                                       <m:mo>}</m:mo>
                                    </m:mrow>
                                 </m:mrow>
                                 <m:mrow>
                                    <m:mi>d</m:mi>
                                    <m:mo>&#8712;</m:mo>
                                    <m:mrow>
                                       <m:mo>{</m:mo>
                                       <m:mrow>
                                          <m:mn>1</m:mn>
                                          <m:mo>,</m:mo>
                                          <m:mn>...</m:mn>
                                          <m:mo>,</m:mo>
                                          <m:mi>D</m:mi>
                                       </m:mrow>
                                       <m:mo>}</m:mo>
                                    </m:mrow>
                                 </m:mrow>
                              </m:msub>
                              <m:mo>=</m:mo>
                              <m:mrow>
                                 <m:mo>{</m:mo>
                                 <m:mtable columnalign="left">
                                    <m:mtr>
                                       <m:mtd>
                                          <m:msub>
                                             <m:mrow>
                                                <m:mo>{</m:mo>
                                                <m:mrow>
                                                   <m:mi>A</m:mi>
                                                   <m:mrow>
                                                      <m:mo>(</m:mo>
                                                      <m:mrow>
                                                         <m:msup>
                                                            <m:mi>q</m:mi>
                                                            <m:mi>d</m:mi>
                                                         </m:msup>
                                                      </m:mrow>
                                                      <m:mo>)</m:mo>
                                                   </m:mrow>
                                                </m:mrow>
                                                <m:mo>}</m:mo>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mi>d</m:mi>
                                                <m:mo>&#8712;</m:mo>
                                                <m:mrow>
                                                   <m:mo>{</m:mo>
                                                   <m:mrow>
                                                      <m:mn>1</m:mn>
                                                      <m:mo>,</m:mo>
                                                      <m:mn>...</m:mn>
                                                      <m:mo>,</m:mo>
                                                      <m:mi>D</m:mi>
                                                   </m:mrow>
                                                   <m:mo>}</m:mo>
                                                </m:mrow>
                                             </m:mrow>
                                          </m:msub>
                                          <m:mo>,</m:mo>
                                       </m:mtd>
                                    </m:mtr>
                                    <m:mtr>
                                       <m:mtd>
                                          <m:msub>
                                             <m:mrow>
                                                <m:mo>{</m:mo>
                                                <m:mrow>
                                                   <m:mo>&#8719;</m:mo>
                                                   <m:mrow>
                                                      <m:mo>(</m:mo>
                                                      <m:mrow>
                                                         <m:msup>
                                                            <m:mi>q</m:mi>
                                                            <m:mi>d</m:mi>
                                                         </m:msup>
                                                      </m:mrow>
                                                      <m:mo>)</m:mo>
                                                   </m:mrow>
                                                </m:mrow>
                                                <m:mo>}</m:mo>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mi>d</m:mi>
                                                <m:mo>&#8712;</m:mo>
                                                <m:mrow>
                                                   <m:mo>{</m:mo>
                                                   <m:mrow>
                                                      <m:mn>1</m:mn>
                                                      <m:mo>,</m:mo>
                                                      <m:mn>...</m:mn>
                                                      <m:mo>,</m:mo>
                                                      <m:mi>D</m:mi>
                                                   </m:mrow>
                                                   <m:mo>}</m:mo>
                                                </m:mrow>
                                             </m:mrow>
                                          </m:msub>
                                          <m:mo>,</m:mo>
                                       </m:mtd>
                                    </m:mtr>
                                    <m:mtr>
                                       <m:mtd>
                                          <m:mrow>
                                             <m:mo>{</m:mo>
                                             <m:mrow>
                                                <m:mi>E</m:mi>
                                                <m:mrow>
                                                   <m:mo>(</m:mo>
                                                   <m:mrow>
                                                      <m:msup>
                                                         <m:mi>q</m:mi>
                                                         <m:mi>D</m:mi>
                                                      </m:msup>
                                                   </m:mrow>
                                                   <m:mo>)</m:mo>
                                                </m:mrow>
                                             </m:mrow>
                                             <m:mo>}</m:mo>
                                          </m:mrow>
                                       </m:mtd>
                                    </m:mtr>
                                 </m:mtable>
                                 <m:mo>}</m:mo>
                              </m:mrow>
                           </m:mrow>
                           <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeq4UdWMaeyypa0ZaaiWaaeaacqaH7oaBdaahaaWcbeqaaiabdghaXnaaCaaameqabaGaemizaqgaaaaaaOGaay5Eaiaaw2haamaaBaaaleaacqWGKbazcqGHiiIZdaGadaqaaiabigdaXiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiabdseaebGaay5Eaiaaw2haaaqabaGccqGH9aqpdaGadaabaeqabaWaaiWaaeaacqWGbbqqdaqadaqaaiabdghaXnaaCaaaleqabaGaemizaqgaaaGccaGLOaGaayzkaaaacaGL7bGaayzFaaWaaSbaaSqaaiabdsgaKjabgIGiopaacmaabaGaeGymaeJaeiilaWIaeiOla4IaeiOla4IaeiOla4IaeiilaWIaemiraqeacaGL7bGaayzFaaaabeaakiabcYcaSaqaamaacmaabaGaey4dIu9aaeWaaeaacqWGXbqCdaahaaWcbeqaaiabdsgaKbaaaOGaayjkaiaawMcaaaGaay5Eaiaaw2haamaaBaaaleaacqWGKbazcqGHiiIZdaGadaqaaiabigdaXiabcYcaSiabc6caUiabc6caUiabc6caUiabcYcaSiabdseaebGaay5Eaiaaw2haaaqabaGccqGGSaalaeaadaGadaqaaiabdweafnaabmaabaGaemyCae3aaWbaaSqabeaacqWGebaraaaakiaawIcacaGLPaaaaiaawUhacaGL9baaaaGaay5Eaiaaw2haaaaa@7554@</m:annotation>
                        </m:semantics>
                     </m:math>
                  </display-formula>
               </p>
               <p><inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S35-i3"><m:semantics><m:mrow><m:mi>A</m:mi><m:mrow><m:mo>(</m:mo><m:mrow><m:msup><m:mi>q</m:mi><m:mi>d</m:mi></m:msup></m:mrow><m:mo>)</m:mo></m:mrow><m:mo>=</m:mo><m:mrow><m:mo>(</m:mo><m:mrow><m:msubsup><m:mi>a</m:mi><m:mrow><m:mi>j</m:mi><m:mi>k</m:mi></m:mrow><m:mrow><m:msup><m:mi>q</m:mi><m:mi>d</m:mi></m:msup></m:mrow></m:msubsup></m:mrow><m:mo>)</m:mo></m:mrow><m:mo>=</m:mo><m:mi>P</m:mi><m:mrow><m:mo>(</m:mo><m:mrow><m:msubsup><m:mi>q</m:mi><m:mi>k</m:mi><m:mrow><m:mi>d</m:mi><m:mo>+</m:mo><m:mn>1</m:mn></m:mrow></m:msubsup><m:mrow><m:mo>|</m:mo><m:mrow><m:msubsup><m:mi>q</m:mi><m:mi>j</m:mi><m:mrow><m:mi>d</m:mi><m:mo>+</m:mo><m:mn>1</m:mn></m:mrow></m:msubsup></m:mrow></m:mrow></m:mrow><m:mo>)</m:mo></m:mrow></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemyqae0aaeWaaeaacqWGXbqCdaahaaWcbeqaaiabdsgaKbaaaOGaayjkaiaawMcaaiabg2da9maabmaabaGaemyyae2aa0baaSqaaiabdQgaQjabdUgaRbqaaiabdghaXnaaCaaameqabaGaemizaqgaaaaaaOGaayjkaiaawMcaaiabg2da9iabdcfaqnaabmaabaGaemyCae3aa0baaSqaaiabdUgaRbqaaiabdsgaKjabgUcaRiabigdaXaaakmaaeeaabaGaemyCae3aa0baaSqaaiabdQgaQbqaaiabdsgaKjabgUcaRiabigdaXaaaaOGaay5bSdaacaGLOaGaayzkaaaaaa@4CA5@</m:annotation></m:semantics></m:math></inline-formula> denoted by <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S35-i4"><m:semantics><m:mrow><m:msub><m:mrow><m:mrow><m:mo>{</m:mo><m:mrow><m:mi>A</m:mi><m:mrow><m:mo>(</m:mo><m:mrow><m:msup><m:mi>q</m:mi><m:mi>d</m:mi></m:msup></m:mrow><m:mo>)</m:mo></m:mrow></m:mrow><m:mo>}</m:mo></m:mrow></m:mrow><m:mrow><m:mi>d</m:mi><m:mo>&#8712;</m:mo><m:mrow><m:mo>{</m:mo><m:mrow><m:mn>1</m:mn><m:mo>,</m:mo><m:mn>...</m:mn><m:mo>,</m:mo><m:mi>D</m:mi></m:mrow><m:mo>}</m:mo></m:mrow></m:mrow></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaWaaiWaaeaacqWGbbqqdaqadaqaaiabdghaXnaaCaaaleqabaGaemizaqgaaaGccaGLOaGaayzkaaaacaGL7bGaayzFaaWaaSbaaSqaaiabdsgaKjabgIGiopaacmaabaGaeGymaeJaeiilaWIaeiOla4IaeiOla4IaeiOla4IaeiilaWIaemiraqeacaGL7bGaayzFaaaabeaaaaa@3F2C@</m:annotation></m:semantics></m:math></inline-formula> is the state <it>transition matrix </it>of each internal substate, with <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S35-i5"><m:semantics><m:mrow><m:msubsup><m:mi>a</m:mi><m:mrow><m:mi>j</m:mi><m:mi>k</m:mi></m:mrow><m:mrow><m:msup><m:mi>q</m:mi><m:mi>d</m:mi></m:msup></m:mrow></m:msubsup><m:mo>=</m:mo><m:mi>P</m:mi><m:mrow><m:mo>(</m:mo><m:mrow><m:msubsup><m:mi>q</m:mi><m:mi>j</m:mi><m:mrow><m:mi>d</m:mi><m:mo>+</m:mo><m:mn>1</m:mn></m:mrow></m:msubsup><m:mrow><m:mo>|</m:mo><m:mrow><m:msubsup><m:mi>q</m:mi><m:mi>i</m:mi><m:mrow><m:mi>d</m:mi><m:mo>+</m:mo><m:mn>1</m:mn></m:mrow></m:msubsup></m:mrow></m:mrow></m:mrow><m:mo>)</m:mo></m:mrow></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemyyae2aa0baaSqaaiabdQgaQjabdUgaRbqaaiabdghaXnaaCaaameqabaGaemizaqgaaaaakiabg2da9iabdcfaqnaabmaabaGaemyCae3aa0baaSqaaiabdQgaQbqaaiabdsgaKjabgUcaRiabigdaXaaakmaaeeaabaGaemyCae3aa0baaSqaaiabdMgaPbqaaiabdsgaKjabgUcaRiabigdaXaaaaOGaay5bSdaacaGLOaGaayzkaaaaaa@448B@</m:annotation></m:semantics></m:math></inline-formula> representing the probability that the <it>j</it><sup><it>th </it></sup>substate of <it>q</it><sup><it>d </it></sup>will transition to the <it>k</it><sup><it>th </it></sup>substate of <it>q</it><sup><it>d</it></sup>. Each internal state <it>q</it><sup><it>d </it></sup>has also an <it>initial distribution vector </it><inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S35-i6"><m:semantics><m:mrow><m:mo>&#8719;</m:mo><m:mrow><m:mo>(</m:mo><m:mrow><m:msup><m:mi>q</m:mi><m:mi>d</m:mi></m:msup></m:mrow><m:mo>)</m:mo></m:mrow><m:mo>=</m:mo><m:mrow><m:mo>{</m:mo><m:mrow><m:mi>&#960;</m:mi><m:mrow><m:mo>(</m:mo><m:mrow><m:msubsup><m:mi>q</m:mi><m:mi>j</m:mi><m:mrow><m:mi>d</m:mi><m:mo>+</m:mo><m:mn>1</m:mn></m:mrow></m:msubsup><m:mrow><m:mo>|</m:mo><m:mrow><m:msup><m:mi>q</m:mi><m:mi>d</m:mi></m:msup></m:mrow></m:mrow></m:mrow><m:mo>)</m:mo></m:mrow></m:mrow><m:mo>}</m:mo></m:mrow><m:mo>=</m:mo><m:mrow><m:mo>{</m:mo><m:mrow><m:mi>P</m:mi><m:mrow><m:mo>(</m:mo><m:mrow><m:msubsup><m:mi>q</m:mi><m:mi>j</m:mi><m:mrow><m:mi>d</m:mi><m:mo>+</m:mo><m:mn>1</m:mn></m:mrow></m:msubsup><m:mrow><m:mo>|</m:mo><m:mrow><m:msup><m:mi>q</m:mi><m:mi>d</m:mi></m:msup></m:mrow></m:mrow></m:mrow><m:mo>)</m:mo></m:mrow></m:mrow><m:mo>}</m:mo></m:mrow></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaey4dIu9aaeWaaeaacqWGXbqCdaahaaWcbeqaaiabdsgaKbaaaOGaayjkaiaawMcaaiabg2da9maacmaabaGaeqiWda3aaeWaaeaacqWGXbqCdaqhaaWcbaGaemOAaOgabaGaemizaqMaey4kaSIaeGymaedaaOWaaqqaaeaacqWGXbqCdaahaaWcbeqaaiabdsgaKbaaaOGaay5bSdaacaGLOaGaayzkaaaacaGL7bGaayzFaaGaeyypa0ZaaiWaaeaacqWGqbaudaqadaqaaiabdghaXnaaDaaaleaacqWGQbGAaeaacqWGKbazcqGHRaWkcqaIXaqmaaGcdaabbaqaaiabdghaXnaaCaaaleqabaGaemizaqgaaaGccaGLhWoaaiaawIcacaGLPaaaaiaawUhacaGL9baaaaa@539B@</m:annotation></m:semantics></m:math></inline-formula> denoted by <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S35-i7"><m:semantics><m:mrow><m:msub><m:mrow><m:mrow><m:mo>{</m:mo><m:mrow><m:mo>&#8719;</m:mo><m:mrow><m:mo>(</m:mo><m:mrow><m:msup><m:mi>q</m:mi><m:mi>d</m:mi></m:msup></m:mrow><m:mo>)</m:mo></m:mrow></m:mrow><m:mo>}</m:mo></m:mrow></m:mrow><m:mrow><m:mi>d</m:mi><m:mo>&#8712;</m:mo><m:mrow><m:mo>{</m:mo><m:mrow><m:mn>1</m:mn><m:mo>,</m:mo><m:mn>...</m:mn><m:mo>,</m:mo><m:mi>D</m:mi></m:mrow><m:mo>}</m:mo></m:mrow></m:mrow></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaWaaiWaaeaacqGHpis1daqadaqaaiabdghaXnaaCaaaleqabaGaemizaqgaaaGccaGLOaGaayzkaaaacaGL7bGaayzFaaWaaSbaaSqaaiabdsgaKjabgIGiopaacmaabaGaeGymaeJaeiilaWIaeiOla4IaeiOla4IaeiOla4IaeiilaWIaemiraqeacaGL7bGaayzFaaaabeaaaaa@3FB3@</m:annotation></m:semantics></m:math></inline-formula>, where <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S35-i8"><m:semantics><m:mrow><m:mi>&#960;</m:mi><m:mrow><m:mo>(</m:mo><m:mrow><m:msubsup><m:mi>q</m:mi><m:mi>j</m:mi><m:mrow><m:mi>d</m:mi><m:mo>+</m:mo><m:mn>1</m:mn></m:mrow></m:msubsup><m:mrow><m:mo>|</m:mo><m:mrow><m:msup><m:mi>q</m:mi><m:mi>d</m:mi></m:msup></m:mrow></m:mrow></m:mrow><m:mo>)</m:mo></m:mrow></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqiWda3aaeWaaeaacqWGXbqCdaqhaaWcbaGaemOAaOgabaGaemizaqMaey4kaSIaeGymaedaaOWaaqqaaeaacqWGXbqCdaahaaWcbeqaaiabdsgaKbaaaOGaay5bSdaacaGLOaGaayzkaaaaaa@39C4@</m:annotation></m:semantics></m:math></inline-formula> is the probability that <it>q</it><sup><it>d </it></sup>will make a vertical transition to its <it>j</it><sup><it>th </it></sup>substate at level <it>d+</it>1, thus, activating it. The production states <it>q</it><sup><it>D </it></sup>will have <it>emission probability vector </it>or the <it>output distribution vector </it><inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S35-i9"><m:semantics><m:mrow><m:mi>E</m:mi><m:mrow><m:mo>(</m:mo><m:mrow><m:msup><m:mi>q</m:mi><m:mi>D</m:mi></m:msup><m:mo>,</m:mo><m:msup><m:mi>q</m:mi><m:mrow><m:mi>D</m:mi><m:mo>&#8722;</m:mo><m:mn>1</m:mn></m:mrow></m:msup></m:mrow><m:mo>)</m:mo></m:mrow><m:mo>=</m:mo><m:mrow><m:mo>{</m:mo><m:mrow><m:mi>e</m:mi><m:mrow><m:mo>(</m:mo><m:mrow><m:msub><m:mi>&#963;</m:mi><m:mi>l</m:mi></m:msub><m:mrow><m:mo>|</m:mo><m:mrow><m:msup><m:mi>q</m:mi><m:mi>D</m:mi></m:msup><m:mo>,</m:mo><m:msup><m:mi>q</m:mi><m:mrow><m:mi>D</m:mi><m:mo>&#8722;</m:mo><m:mn>1</m:mn></m:mrow></m:msup></m:mrow></m:mrow></m:mrow><m:mo>)</m:mo></m:mrow></m:mrow><m:mo>}</m:mo></m:mrow><m:mo>=</m:mo><m:mrow><m:mo>{</m:mo><m:mrow><m:mi>P</m:mi><m:mrow><m:mo>(</m:mo><m:mrow><m:msub><m:mi>&#963;</m:mi><m:mi>l</m:mi></m:msub><m:mrow><m:mo>|</m:mo><m:mrow><m:msup><m:mi>q</m:mi><m:mi>D</m:mi></m:msup><m:mo>,</m:mo><m:msup><m:mi>q</m:mi><m:mrow><m:mi>D</m:mi><m:mo>&#8722;</m:mo><m:mn>1</m:mn></m:mrow></m:msup></m:mrow></m:mrow></m:mrow><m:mo>)</m:mo></m:mrow></m:mrow><m:mo>}</m:mo></m:mrow></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemyrau0aaeWaaeaacqWGXbqCdaahaaqabeaacqWGebaraaGaeiilaWIaemyCae3aaWbaaeqabaGaemiraqKaeyOeI0IaeGymaedaaaGaayjkaiaawMcaaiabg2da9maacmaabaGaemyzau2aaeWaaeaacqaHdpWCdaWgaaqaaiabdYgaSbqabaWaaqqaaeaacqWGXbqCdaahaaqabeaacqWGebaraaGaeiilaWIaemyCae3aaWbaaeqabaGaemiraqKaeyOeI0IaeGymaedaaaGaay5bSdaacaGLOaGaayzkaaaacaGL7bGaayzFaaGaeyypa0ZaaiWaaeaacqWGqbaudaqadaqaaiabeo8aZnaaBaaabaGaemiBaWgabeaadaabbaqaaiabdghaXnaaCaaabeqaaiabdseaebaacqGGSaalcqWGXbqCdaahaaqabeaacqWGebarcqGHsislcqaIXaqmaaaacaGLhWoaaiaawIcacaGLPaaaaiaawUhacaGL9baaaaa@5C0A@</m:annotation></m:semantics></m:math></inline-formula> denoted by {<it>E</it>(<it>q</it><sup><it>D</it></sup>)} where <it>e</it>(&#963;<sub><it>l</it></sub>|<it>q</it><sup><it>D</it></sup>, <it>q</it><sup><it>D</it>-1</sup>) is the probability that production state <it>q</it><sup><it>D </it></sup>will emit symbol <it>&#963;</it><sub><it>l </it></sub>&#8712; &#931;.</p>
               <p>We now define the probabilities that must be computed for parameter estimation.</p>
               <p>(<it>i</it>) <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S35-i10"><m:semantics><m:mrow><m:mi>&#945;</m:mi><m:mrow><m:mo>(</m:mo><m:mrow><m:mi>t</m:mi><m:mo>,</m:mo><m:mi>t</m:mi><m:mo>+</m:mo><m:mi>k</m:mi><m:mo>,</m:mo><m:msubsup><m:mi>q</m:mi><m:mi>i</m:mi><m:mrow><m:mi>d</m:mi><m:mo>+</m:mo><m:mn>1</m:mn></m:mrow></m:msubsup><m:mo>,</m:mo><m:msup><m:mi>q</m:mi><m:mi>d</m:mi></m:msup></m:mrow><m:mo>)</m:mo></m:mrow><m:mo>=</m:mo><m:mi>P</m:mi><m:mo stretchy="false">(</m:mo><m:msub><m:mi>o</m:mi><m:mi>t</m:mi></m:msub><m:mo>&#8901;</m:mo><m:mo>&#8901;</m:mo><m:mo>&#8901;</m:mo><m:msub><m:mi>o</m:mi><m:mrow><m:mi>t</m:mi><m:mo>+</m:mo><m:mi>k</m:mi></m:mrow></m:msub><m:mo>,</m:mo><m:msubsup><m:mi>q</m:mi><m:mi>i</m:mi><m:mrow><m:mi>d</m:mi><m:mo>+</m:mo><m:mn>1</m:mn></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqySde2aaeWaaeaacqWG0baDcqGGSaalcqWG0baDcqGHRaWkcqWGRbWAcqGGSaalcqWGXbqCdaqhaaWcbaGaemyAaKgabaGaemizaqMaey4kaSIaeGymaedaaOGaeiilaWIaemyCae3aaWbaaSqabeaacqWGKbazaaaakiaawIcacaGLPaaacqGH9aqpcqWGqbaucqGGOaakcqWGVbWBdaWgaaWcbaGaemiDaqhabeaakiabgwSixlabgwSixlabgwSixlabd+gaVnaaBaaaleaacqWG0baDcqGHRaWkcqWGRbWAaeqaaOGaeiilaWIaemyCae3aa0baaSqaaiabdMgaPbqaaiabdsgaKjabgUcaRiabigdaXaaaaaa@590B@</m:annotation></m:semantics></m:math></inline-formula> finished at <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S35-i11"><m:semantics><m:mrow><m:msub><m:mi>o</m:mi><m:mrow><m:mi>t</m:mi><m:mo>+</m:mo><m:mi>k</m:mi></m:mrow></m:msub><m:mrow><m:mo>|</m:mo><m:mrow><m:msup><m:mi>q</m:mi><m:mi>d</m:mi></m:msup></m:mrow></m:mrow></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaem4Ba82aaSbaaSqaaiabdsha0jabgUcaRiabdUgaRbqabaGcdaabbaqaaiabdghaXnaaCaaaleqabaGaemizaqgaaaGccaGLhWoaaaa@35AB@</m:annotation></m:semantics></m:math></inline-formula> started at <it>o</it><sub><it>t</it></sub>) is the <it>forward probability </it>of emitting the substring <it>o</it><sub><it>t </it></sub>... <it>o</it><sub><it>t</it>+<it>k </it></sub>of the observation sequence by the parent state <it>q</it><sup><it>d </it></sup>such that it was entered at <it>o</it><sub><it>t </it></sub>and the subsequence ended at substate <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S35-i12"><m:semantics><m:mrow><m:msubsup><m:mi>q</m:mi><m:mi>i</m:mi><m:mrow><m:mi>d</m:mi><m:mo>+</m:mo><m:mn>1</m:mn></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemyCae3aa0baaSqaaiabdMgaPbqaaiabdsgaKjabgUcaRiabigdaXaaaaaa@31EB@</m:annotation></m:semantics></m:math></inline-formula>, which was therefore the last state activated.</p>
               <p>(<it>ii</it>) <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S35-i13"><m:semantics><m:mrow><m:mi>&#967;</m:mi><m:mrow><m:mo>(</m:mo><m:mrow><m:mi>t</m:mi><m:mo>,</m:mo><m:msubsup><m:mi>q</m:mi><m:mi>i</m:mi><m:mrow><m:mi>d</m:mi><m:mo>+</m:mo><m:mn>1</m:mn></m:mrow></m:msubsup><m:mo>,</m:mo><m:msup><m:mi>q</m:mi><m:mi>d</m:mi></m:msup></m:mrow><m:mo>)</m:mo></m:mrow></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeq4Xdm2aaeWaaeaacqWG0baDcqGGSaalcqWGXbqCdaqhaaWcbaGaemyAaKgabaGaemizaqMaey4kaSIaeGymaedaaOGaeiilaWIaemyCae3aaWbaaSqabeaacqWGKbazaaaakiaawIcacaGLPaaaaaa@3B59@</m:annotation></m:semantics></m:math></inline-formula> is the probability of making a vertical transition from parent <it>q</it><sup><it>d </it></sup>to <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S35-i12"><m:semantics><m:mrow><m:msubsup><m:mi>q</m:mi><m:mi>i</m:mi><m:mrow><m:mi>d</m:mi><m:mo>+</m:mo><m:mn>1</m:mn></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemyCae3aa0baaSqaaiabdMgaPbqaaiabdsgaKjabgUcaRiabigdaXaaaaaa@31EB@</m:annotation></m:semantics></m:math></inline-formula> just before the emission of <it>o</it><sub><it>t</it></sub>.</p>
               <p>(<it>iii</it>) <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S35-i14"><m:semantics><m:mrow><m:mi>&#958;</m:mi><m:mrow><m:mo>(</m:mo><m:mrow><m:mi>t</m:mi><m:mo>,</m:mo><m:msubsup><m:mi>q</m:mi><m:mi>i</m:mi><m:mrow><m:mi>d</m:mi><m:mo>+</m:mo><m:mn>1</m:mn></m:mrow></m:msubsup><m:mo>,</m:mo><m:msubsup><m:mi>q</m:mi><m:mi>j</m:mi><m:mrow><m:mi>d</m:mi><m:mo>+</m:mo><m:mn>1</m:mn></m:mrow></m:msubsup><m:mo>,</m:mo><m:msup><m:mi>q</m:mi><m:mi>d</m:mi></m:msup></m:mrow><m:mo>)</m:mo></m:mrow><m:mo>=</m:mo><m:mi>P</m:mi><m:mo stretchy="false">(</m:mo><m:msub><m:mi>o</m:mi><m:mn>1</m:mn></m:msub><m:mo>&#8901;</m:mo><m:mo>&#8901;</m:mo><m:mo>&#8901;</m:mo><m:msub><m:mi>o</m:mi><m:mi>t</m:mi></m:msub><m:mo>,</m:mo><m:msubsup><m:mi>q</m:mi><m:mi>i</m:mi><m:mrow><m:mi>d</m:mi><m:mo>+</m:mo><m:mn>1</m:mn></m:mrow></m:msubsup><m:mo>&#8594;</m:mo><m:msubsup><m:mi>q</m:mi><m:mi>j</m:mi><m:mrow><m:mi>d</m:mi><m:mo>+</m:mo><m:mn>1</m:mn></m:mrow></m:msubsup><m:mo>,</m:mo><m:msub><m:mi>o</m:mi><m:mrow><m:mi>t</m:mi><m:mo>+</m:mo><m:mn>1</m:mn></m:mrow></m:msub><m:mo>&#8901;</m:mo><m:mo>&#8901;</m:mo><m:mo>&#8901;</m:mo><m:msub><m:mi>o</m:mi><m:mi>N</m:mi></m:msub><m:mrow><m:mo>|</m:mo><m:mi>&#955;</m:mi></m:mrow><m:mo stretchy="false">)</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeqOVdG3aaeWaaeaacqWG0baDcqGGSaalcqWGXbqCdaqhaaWcbaGaemyAaKgabaGaemizaqMaey4kaSIaeGymaedaaOGaeiilaWIaemyCae3aa0baaSqaaiabdQgaQbqaaiabdsgaKjabgUcaRiabigdaXaaakiabcYcaSiabdghaXnaaCaaaleqabaGaemizaqgaaaGccaGLOaGaayzkaaGaeyypa0JaemiuaaLaeiikaGIaem4Ba82aaSbaaSqaaiabigdaXaqabaGccqGHflY1cqGHflY1cqGHflY1cqWGVbWBdaWgaaWcbaGaemiDaqhabeaakiabcYcaSiabdghaXnaaDaaaleaacqWGPbqAaeaacqWGKbazcqGHRaWkcqaIXaqmaaGccqGHsgIRcqWGXbqCdaqhaaWcbaGaemOAaOgabaGaemizaqMaey4kaSIaeGymaedaaOGaeiilaWIaem4Ba82aaSbaaSqaaiabdsha0jabgUcaRiabigdaXaqabaGccqGHflY1cqGHflY1cqGHflY1cqWGVbWBdaWgaaWcbaGaemOta4eabeaakmaaeeaabaGaeq4UdWgacaGLhWoacqGGPaqkaaa@7478@</m:annotation></m:semantics></m:math></inline-formula> is the probability of making a horizontal transition from <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S35-i12"><m:semantics><m:mrow><m:msubsup><m:mi>q</m:mi><m:mi>i</m:mi><m:mrow><m:mi>d</m:mi><m:mo>+</m:mo><m:mn>1</m:mn></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemyCae3aa0baaSqaaiabdMgaPbqaaiabdsgaKjabgUcaRiabigdaXaaaaaa@31EB@</m:annotation></m:semantics></m:math></inline-formula> to <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S35-i15"><m:semantics><m:mrow><m:msubsup><m:mi>q</m:mi><m:mi>j</m:mi><m:mrow><m:mi>d</m:mi><m:mo>+</m:mo><m:mn>1</m:mn></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemyCae3aa0baaSqaaiabdQgaQbqaaiabdsgaKjabgUcaRiabigdaXaaaaaa@31ED@</m:annotation></m:semantics></m:math></inline-formula>, both substates of <it>q</it><sup><it>d</it></sup>, after the emission of <it>o</it><sub><it>t </it></sub>and before the emission of <it>o</it><sub><it>t</it>+1</sub>.</p>
               <p>(<it>iv</it>) <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S35-i16"><m:semantics><m:mrow><m:msub><m:mi>&#947;</m:mi><m:mrow><m:mi>i</m:mi><m:mi>n</m:mi></m:mrow></m:msub><m:mrow><m:mo>(</m:mo><m:mrow><m:mi>t</m:mi><m:mo>,</m:mo><m:msubsup><m:mi>q</m:mi><m:mi>i</m:mi><m:mrow><m:mi>d</m:mi><m:mo>+</m:mo><m:mn>1</m:mn></m:mrow></m:msubsup><m:mo>,</m:mo><m:msup><m:mi>q</m:mi><m:mi>d</m:mi></m:msup></m:mrow><m:mo>)</m:mo></m:mrow><m:mo>=</m:mo><m:mstyle displaystyle="true"><m:munderover><m:mo>&#8721;</m:mo><m:mrow><m:mi>k</m:mi><m:mo>=</m:mo><m:mn>1</m:mn></m:mrow><m:mrow><m:mrow><m:mo>|</m:mo><m:mrow><m:msup><m:mi>q</m:mi><m:mi>d</m:mi></m:msup></m:mrow><m:mo>|</m:mo></m:mrow></m:mrow></m:munderover><m:mrow><m:mi>&#958;</m:mi><m:mrow><m:mo>(</m:mo><m:mrow><m:mi>t</m:mi><m:mo>&#8722;</m:mo><m:mn>1</m:mn><m:mo>,</m:mo><m:msubsup><m:mi>q</m:mi><m:mi>k</m:mi><m:mrow><m:mi>d</m:mi><m:mo>+</m:mo><m:mn>1</m:mn></m:mrow></m:msubsup><m:mo>,</m:mo><m:msubsup><m:mi>q</m:mi><m:mi>i</m:mi><m:mrow><m:mi>d</m:mi><m:mo>+</m:mo><m:mn>1</m:mn></m:mrow></m:msubsup><m:mo>,</m:mo><m:msup><m:mi>q</m:mi><m:mi>d</m:mi></m:msup></m:mrow><m:mo>)</m:mo></m:mrow></m:mrow></m:mstyle></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeq4SdC2aaSbaaSqaaiabdMgaPjabd6gaUbqabaGcdaqadaqaaiabdsha0jabcYcaSiabdghaXnaaDaaaleaacqWGPbqAaeaacqWGKbazcqGHRaWkcqaIXaqmaaGccqGGSaalcqWGXbqCdaahaaWcbeqaaiabdsgaKbaaaOGaayjkaiaawMcaaiabg2da9maaqahabaGaeqOVdG3aaeWaaeaacqWG0baDcqGHsislcqaIXaqmcqGGSaalcqWGXbqCdaqhaaWcbaGaem4AaSgabaGaemizaqMaey4kaSIaeGymaedaaOGaeiilaWIaemyCae3aa0baaSqaaiabdMgaPbqaaiabdsgaKjabgUcaRiabigdaXaaakiabcYcaSiabdghaXnaaCaaaleqabaGaemizaqgaaaGccaGLOaGaayzkaaaaleaacqWGRbWAcqGH9aqpcqaIXaqmaeaadaabdaqaaiabdghaXnaaCaaameqabaGaemizaqgaaaWccaGLhWUaayjcSdaaniabggHiLdaaaa@6364@</m:annotation></m:semantics></m:math></inline-formula> is the probability of performing a horizontal transition to <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S35-i12"><m:semantics><m:mrow><m:msubsup><m:mi>q</m:mi><m:mi>i</m:mi><m:mrow><m:mi>d</m:mi><m:mo>+</m:mo><m:mn>1</m:mn></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemyCae3aa0baaSqaaiabdMgaPbqaaiabdsgaKjabgUcaRiabigdaXaaaaaa@31EB@</m:annotation></m:semantics></m:math></inline-formula>, which is a substate of <it>q</it><sup><it>d</it></sup>, before <it>o</it><sub><it>t </it></sub>is emitted. Further details on the algorithms are given in <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> and in Additional file <supplr sid="S2">2</supplr>.</p>
               <suppl id="S2">
                  <title>
                     <p>Additional File 2</p>
                  </title>
                  <text>
                     <p>This file contains a more detailed description of the algorithms used for parameter estimation and classification using HHMMs.</p>
                  </text>
                  <file name="1471-2105-10-S1-S35-S2.pdf">
                     <p>Click here for file</p>
                  </file>
               </suppl>
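               <p>As a minimal sketch of definition (<it>iv</it>), the Python snippet below sums a table of &#958; values over the source substate to obtain &#947;<sub><it>in</it></sub> for every destination substate of a single parent <it>q</it><sup><it>d</it></sup>. The nested-list layout xi[s][k][i], the 0-based time index s, the function name gamma_in and the toy numbers are illustrative assumptions and are not taken from the HHMMiR implementation.</p>

```python
def gamma_in(xi, t):
    """Sum xi(t-1, q_k, q_i, q^d) over all source substates k.

    xi is a hypothetical nested list for one parent state q^d, where
    xi[s][k][i] stores xi(s, q_k^{d+1}, q_i^{d+1}, q^d) and s counts
    time steps from 0 (an assumption of this sketch).
    Returns a list indexed by the destination substate i.
    """
    if t not in range(1, len(xi) + 1):
        raise ValueError("t must lie between 1 and len(xi)")
    prev = xi[t - 1]          # xi values at the previous time step
    n = len(prev)             # number of substates of q^d
    return [sum(prev[k][i] for k in range(n)) for i in range(n)]


# Toy numbers (not from the paper): 3 time steps, 2 substates.
xi = [
    [[0.10, 0.20], [0.05, 0.15]],
    [[0.30, 0.10], [0.20, 0.05]],
    [[0.00, 0.25], [0.25, 0.10]],
]
g = gamma_in(xi, 1)  # column sums of xi[0], roughly 0.15 and 0.35
```

               <p>Because the sum runs over the source index <it>k</it>, each entry of the result collects all horizontal transitions that arrive at the same substate, which matches the verbal description of &#947;<sub><it>in</it></sub> given above.</p>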
               <p>The parameters are estimated as follows:</p>
               <p>
                  <display-formula>
                     <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S35-i17">
                        <m:semantics>
                           <m:mrow>
                              <m:mtable columnalign="left">
                                 <m:mtr columnalign="left">
                                    <m:mtd columnalign="left">
                                       <m:mrow>
                                          <m:mover accent="true">
                                             <m:mi>&#960;</m:mi>
                                             <m:mo>^</m:mo>
                                          </m:mover>
                                          <m:mrow>
                                             <m:mo>(</m:mo>
                                             <m:mrow>
                                                <m:msubsup>
                                                   <m:mi>q</m:mi>
                                                   <m:mi>i</m:mi>
                                                   <m:mn>2</m:mn>
                                                </m:msubsup>
                                                <m:mrow>
                                                   <m:mo>|</m:mo>
                                                   <m:mrow>
                                                      <m:msup>
                                                         <m:mi>q</m:mi>
                                                         <m:mn>1</m:mn>
                                                      </m:msup>
                                                   </m:mrow>
                                                </m:mrow>
                                             </m:mrow>
                                             <m:mo>)</m:mo>
                                          </m:mrow>
                                          <m:mo>=</m:mo>
                                          <m:mi>&#967;</m:mi>
                                          <m:mrow>
                                             <m:mo>(</m:mo>
                                             <m:mrow>
                                                <m:mn>1</m:mn>
                                                <m:mo>,</m:mo>
                                                <m:msubsup>
                                                   <m:mi>q</m:mi>
                                                   <m:mi>i</m:mi>
                                                   <m:mn>2</m:mn>
                                                </m:msubsup>
                                                <m:mo>,</m:mo>
                                                <m:msup>
                                                   <m:mi>q</m:mi>
                                                   <m:mn>1</m:mn>
                                                </m:msup>
                                             </m:mrow>
                                             <m:mo>)</m:mo>
                                          </m:mrow>
                                       </m:mrow>
                                    </m:mtd>
                                 </m:mtr>
                                 <m:mtr columnalign="left">
                                    <m:mtd columnalign="left">
                                       <m:mrow>
                                          <m:mtable>
                                             <m:mtr>
                                                <m:mtd>
                                                   <m:mrow>
                                                      <m:mover accent="true">
                                                         <m:mi>&#960;</m:mi>
                                                         <m:mo>^</m:mo>
                                                      </m:mover>
                                                      <m:mrow>
                                                         <m:mo>(</m:mo>
                                                         <m:mrow>
                                                            <m:msubsup>
                                                               <m:mi>q</m:mi>
                                                               <m:mi>i</m:mi>
                                                               <m:mrow>
                                                                  <m:mi>d</m:mi>
                                                                  <m:mo>+</m:mo>
                                                                  <m:mn>1</m:mn>
                                                               </m:mrow>
                                                            </m:msubsup>
                                                            <m:mrow>
                                                               <m:mo>|</m:mo>
                                                               <m:mrow>
                                                                  <m:msup>
                                                                     <m:mi>q</m:mi>
                                                                     <m:mi>d</m:mi>
                                                                  </m:msup>
                                                               </m:mrow>
                                                            </m:mrow>
                                                         </m:mrow>
                                                         <m:mo>)</m:mo>
                                                      </m:mrow>
                                                      <m:mo>=</m:mo>
                                                      <m:mfrac>
                                                         <m:mrow>
                                                            <m:mstyle displaystyle="true">
                                                               <m:munderover>
                                                                  <m:mo>&#8721;</m:mo>
                                                                  <m:mrow>
                                                                     <m:mi>t</m:mi>
                                                                     <m:mo>=</m:mo>
                                                                     <m:mn>1</m:mn>
                                                                  </m:mrow>
                                                                  <m:mi>T</m:mi>
                                                               </m:munderover>
                                                               <m:mrow>
                                                                  <m:mi>&#967;</m:mi>
                                                                  <m:mrow>
                                                                     <m:mo>(</m:mo>
                                                                     <m:mrow>
                                                                        <m:mi>t</m:mi>
                                                                        <m:mo>,</m:mo>
                                                                        <m:msubsup>
                                                                           <m:mi>q</m:mi>
                                                                           <m:mi>i</m:mi>
                                                                           <m:mrow>
                                                                              <m:mi>d</m:mi>
                                                                              <m:mo>+</m:mo>
                                                                              <m:mn>1</m:mn>
                                                                           </m:mrow>
                                                                        </m:msubsup>
                                                                        <m:mo>,</m:mo>
                                                                        <m:msup>
                                                                           <m:mi>q</m:mi>
                                                                           <m:mi>d</m:mi>
                                                                        </m:msup>
                                                                     </m:mrow>
                                                                     <m:mo>)</m:mo>
                                                                  </m:mrow>
                                                               </m:mrow>
                                                            </m:mstyle>
                                                         </m:mrow>
                                                         <m:mrow>
                                                            <m:mstyle displaystyle="true">
                                                               <m:munderover>
                                                                  <m:mo>&#8721;</m:mo>
                                                                  <m:mrow>
                                                                     <m:mi>i</m:mi>
                                                                     <m:mo>=</m:mo>
                                                                     <m:mn>1</m:mn>
                                                                  </m:mrow>
                                                                  <m:mrow>
                                                                     <m:mrow>
                                                                        <m:mo>|</m:mo>
                                                                        <m:mrow>
                                                                           <m:msup>
                                                                              <m:mi>q</m:mi>
                                                                              <m:mi>d</m:mi>
                                                                           </m:msup>
                                                                        </m:mrow>
                                                                        <m:mo>|</m:mo>
                                                                     </m:mrow>
                                                                  </m:mrow>
                                                               </m:munderover>
                                                               <m:mrow>
                                                                  <m:mstyle displaystyle="true">
                                                                     <m:munderover>
                                                                        <m:mo>&#8721;</m:mo>
                                                                        <m:mrow>
                                                                           <m:mi>t</m:mi>
                                                                           <m:mo>=</m:mo>
                                                                           <m:mn>1</m:mn>
                                                                        </m:mrow>
                                                                        <m:mi>T</m:mi>
                                                                     </m:munderover>
                                                                     <m:mrow>
                                                                        <m:mi>&#967;</m:mi>
                                                                        <m:mrow>
                                                                           <m:mo>(</m:mo>
                                                                           <m:mrow>
                                                                              <m:mi>t</m:mi>
                                                                              <m:mo>,</m:mo>
                                                                              <m:msubsup>
                                                                                 <m:mi>q</m:mi>
                                                                                 <m:mi>i</m:mi>
                                                                                 <m:mrow>
                                                                                    <m:mi>d</m:mi>
                                                                                    <m:mo>+</m:mo>
                                                                                    <m:mn>1</m:mn>
                                                                                 </m:mrow>
                                                                              </m:msubsup>
                                                                              <m:mo>,</m:mo>
                                                                              <m:msup>
                                                                                 <m:mi>q</m:mi>
                                                                                 <m:mi>d</m:mi>
                                                                              </m:msup>
                                                                           </m:mrow>
                                                                           <m:mo>)</m:mo>
                                                                        </m:mrow>
                                                                     </m:mrow>
                                                                  </m:mstyle>
                                                               </m:mrow>
                                                            </m:mstyle>
                                                         </m:mrow>
                                                      </m:mfrac>
                                                   </m:mrow>
                                                </m:mtd>
                                                <m:mtd>
                                                   <m:mrow>
                                                      <m:mrow>
                                                         <m:mo>(</m:mo>
                                                         <m:mrow>
                                                            <m:mn>1</m:mn>
                                                            <m:mo>&lt;</m:mo>
                                                            <m:mi>d</m:mi>
                                                            <m:mo>&lt;</m:mo>
                                                            <m:mi>D</m:mi>
                                                            <m:mo>&#8722;</m:mo>
                                                            <m:mn>1</m:mn>
                                                         </m:mrow>
                                                         <m:mo>)</m:mo>
                                                      </m:mrow>
                                                   </m:mrow>
                                                </m:mtd>
                                             </m:mtr>
                                          </m:mtable>
                                       </m:mrow>
                                    </m:mtd>
                                 </m:mtr>
                                 <m:mtr columnalign="left">
                                    <m:mtd columnalign="left">
                                       <m:mrow>
                                          <m:msubsup>
                                             <m:mover accent="true">
                                                <m:mi>a</m:mi>
                                                <m:mo>^</m:mo>
                                             </m:mover>
                                             <m:mrow>
                                                <m:mi>i</m:mi>
                                                <m:mi>j</m:mi>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:msup>
                                                   <m:mi>q</m:mi>
                                                   <m:mi>d</m:mi>
                                                </m:msup>
                                             </m:mrow>
                                          </m:msubsup>
                                          <m:mo>=</m:mo>
                                          <m:mfrac>
                                             <m:mrow>
                                                <m:mstyle displaystyle="true">
                                                   <m:munderover>
                                                      <m:mo>&#8721;</m:mo>
                                                      <m:mrow>
                                                         <m:mi>t</m:mi>
                                                         <m:mo>=</m:mo>
                                                         <m:mn>1</m:mn>
                                                      </m:mrow>
                                                      <m:mi>T</m:mi>
                                                   </m:munderover>
                                                   <m:mrow>
                                                      <m:mi>&#958;</m:mi>
                                                      <m:mrow>
                                                         <m:mo>(</m:mo>
                                                         <m:mrow>
                                                            <m:mi>t</m:mi>
                                                            <m:mo>,</m:mo>
                                                            <m:msubsup>
                                                               <m:mi>q</m:mi>
                                                               <m:mi>i</m:mi>
                                                               <m:mrow>
                                                                  <m:mi>d</m:mi>
                                                                  <m:mo>+</m:mo>
                                                                  <m:mn>1</m:mn>
                                                               </m:mrow>
                                                            </m:msubsup>
                                                            <m:mo>,</m:mo>
                                                            <m:msubsup>
                                                               <m:mi>q</m:mi>
                                                               <m:mi>j</m:mi>
                                                               <m:mrow>
                                                                  <m:mi>d</m:mi>
                                                                  <m:mo>+</m:mo>
                                                                  <m:mn>1</m:mn>
                                                               </m:mrow>
                                                            </m:msubsup>
                                                            <m:mo>,</m:mo>
                                                            <m:msup>
                                                               <m:mi>q</m:mi>
                                                               <m:mi>d</m:mi>
                                                            </m:msup>
                                                         </m:mrow>
                                                         <m:mo>)</m:mo>
                                                      </m:mrow>
                                                   </m:mrow>
                                                </m:mstyle>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mstyle displaystyle="true">
                                                   <m:munderover>
                                                      <m:mo>&#8721;</m:mo>
                                                      <m:mrow>
                                                         <m:mi>k</m:mi>
                                                         <m:mo>=</m:mo>
                                                         <m:mn>1</m:mn>
                                                      </m:mrow>
                                                      <m:mrow>
                                                         <m:mrow>
                                                            <m:mo>|</m:mo>
                                                            <m:mrow>
                                                               <m:msup>
                                                                  <m:mi>q</m:mi>
                                                                  <m:mi>d</m:mi>
                                                               </m:msup>
                                                            </m:mrow>
                                                            <m:mo>|</m:mo>
                                                         </m:mrow>
                                                      </m:mrow>
                                                   </m:munderover>
                                                   <m:mrow>
                                                      <m:mstyle displaystyle="true">
                                                         <m:munderover>
                                                            <m:mo>&#8721;</m:mo>
                                                            <m:mrow>
                                                               <m:mi>t</m:mi>
                                                               <m:mo>=</m:mo>
                                                               <m:mn>1</m:mn>
                                                            </m:mrow>
                                                            <m:mi>T</m:mi>
                                                         </m:munderover>
                                                         <m:mrow>
                                                            <m:mi>&#958;</m:mi>
                                                            <m:mrow>
                                                               <m:mo>(</m:mo>
                                                               <m:mrow>
                                                                  <m:mi>t</m:mi>
                                                                  <m:mo>,</m:mo>
                                                                  <m:msubsup>
                                                                     <m:mi>q</m:mi>
                                                                     <m:mi>i</m:mi>
                                                                     <m:mrow>
                                                                        <m:mi>d</m:mi>
                                                                        <m:mo>+</m:mo>
                                                                        <m:mn>1</m:mn>
                                                                     </m:mrow>
                                                                  </m:msubsup>
                                                                  <m:mo>,</m:mo>
                                                                  <m:msubsup>
                                                                     <m:mi>q</m:mi>
                                                                     <m:mi>k</m:mi>
                                                                     <m:mrow>
                                                                        <m:mi>d</m:mi>
                                                                        <m:mo>+</m:mo>
                                                                        <m:mn>1</m:mn>
                                                                     </m:mrow>
                                                                  </m:msubsup>
                                                                  <m:mo>,</m:mo>
                                                                  <m:msup>
                                                                     <m:mi>q</m:mi>
                                                                     <m:mi>d</m:mi>
                                                                  </m:msup>
                                                               </m:mrow>
                                                               <m:mo>)</m:mo>
                                                            </m:mrow>
                                                         </m:mrow>
                                                      </m:mstyle>
                                                   </m:mrow>
                                                </m:mstyle>
                                             </m:mrow>
                                          </m:mfrac>
                                       </m:mrow>
                                    </m:mtd>
                                 </m:mtr>
                                 <m:mtr columnalign="left">
                                    <m:mtd columnalign="left">
                                       <m:mrow>
                                          <m:mtable>
                                             <m:mtr>
                                                <m:mtd>
                                                   <m:mrow>
                                                      <m:mover accent="true">
                                                         <m:mi>e</m:mi>
                                                         <m:mo>^</m:mo>
                                                      </m:mover>
                                                      <m:mrow>
                                                         <m:mo>(</m:mo>
                                                         <m:mrow>
                                                            <m:msub>
                                                               <m:mi>&#963;</m:mi>
                                                               <m:mi>l</m:mi>
                                                            </m:msub>
                                                            <m:mrow>
                                                               <m:mo>|</m:mo>
                                                               <m:mrow>
                                                                  <m:msubsup>
                                                                     <m:mi>q</m:mi>
                                                                     <m:mi>i</m:mi>
                                                                     <m:mi>D</m:mi>
                                                                  </m:msubsup>
                                                                  <m:mo>,</m:mo>
                                                                  <m:msup>
                                                                     <m:mi>q</m:mi>
                                                                     <m:mrow>
                                                                        <m:mi>D</m:mi>
                                                                        <m:mo>&#8722;</m:mo>
                                                                        <m:mn>1</m:mn>
                                                                     </m:mrow>
                                                                  </m:msup>
                                                               </m:mrow>
                                                            </m:mrow>
                                                         </m:mrow>
                                                         <m:mo>)</m:mo>
                                                      </m:mrow>
                                                      <m:mo>=</m:mo>
                                                      <m:mo stretchy="false">(</m:mo>
                                                      <m:mstyle displaystyle="true">
                                                         <m:munder>
                                                            <m:mo>&#8721;</m:mo>
                                                            <m:mrow>
                                                               <m:msub>
                                                                  <m:mi>o</m:mi>
                                                                  <m:mi>t</m:mi>
                                                               </m:msub>
                                                               <m:mo>=</m:mo>
                                                               <m:msub>
                                                                  <m:mi>&#963;</m:mi>
                                                                  <m:mi>l</m:mi>
                                                               </m:msub>
                                                            </m:mrow>
                                                         </m:munder>
                                                         <m:mrow>
                                                            <m:mi>&#967;</m:mi>
                                                            <m:mrow>
                                                               <m:mo>(</m:mo>
                                                               <m:mrow>
                                                                  <m:mi>t</m:mi>
                                                                  <m:mo>,</m:mo>
                                                                  <m:msubsup>
                                                                     <m:mi>q</m:mi>
                                                                     <m:mi>i</m:mi>
                                                                     <m:mi>D</m:mi>
                                                                  </m:msubsup>
                                                                  <m:mo>,</m:mo>
                                                                  <m:msup>
                                                                     <m:mi>q</m:mi>
                                                                     <m:mrow>
                                                                        <m:mi>D</m:mi>
                                                                        <m:mo>&#8722;</m:mo>
                                                                        <m:mn>1</m:mn>
                                                                     </m:mrow>
                                                                  </m:msup>
                                                               </m:mrow>
                                                               <m:mo>)</m:mo>
                                                            </m:mrow>
                                                         </m:mrow>
                                                      </m:mstyle>
                                                   </m:mrow>
                                                </m:mtd>
                                             </m:mtr>
                                             <m:mtr>
                                                <m:mtd>
                                                   <m:mrow>
                                                      <m:mo>+</m:mo>
                                                      <m:mstyle displaystyle="true">
                                                         <m:munder>
                                                            <m:mo>&#8721;</m:mo>
                                                            <m:mrow>
                                                               <m:mi>t</m:mi>
                                                               <m:mo>&gt;</m:mo>
                                                               <m:mn>1</m:mn>
                                                               <m:mo>,</m:mo>
                                                               <m:msub>
                                                                  <m:mi>o</m:mi>
                                                                  <m:mi>t</m:mi>
                                                               </m:msub>
                                                               <m:mo>=</m:mo>
                                                               <m:msub>
                                                                  <m:mi>&#963;</m:mi>
                                                                  <m:mi>l</m:mi>
                                                               </m:msub>
                                                            </m:mrow>
                                                         </m:munder>
                                                         <m:mrow>
                                                            <m:msub>
                                                               <m:mi>&#947;</m:mi>
                                                               <m:mrow>
                                                                  <m:mi>i</m:mi>
                                                                  <m:mi>n</m:mi>
                                                               </m:mrow>
                                                            </m:msub>
                                                            <m:mrow>
                                                               <m:mo>(</m:mo>
                                                               <m:mrow>
                                                                  <m:mi>t</m:mi>
                                                                  <m:mo>,</m:mo>
                                                                  <m:msubsup>
                                                                     <m:mi>q</m:mi>
                                                                     <m:mi>i</m:mi>
                                                                     <m:mi>D</m:mi>
                                                                  </m:msubsup>
                                                                  <m:mo>,</m:mo>
                                                                  <m:msup>
                                                                     <m:mi>q</m:mi>
                                                                     <m:mrow>
                                                                        <m:mi>D</m:mi>
                                                                        <m:mo>&#8722;</m:mo>
                                                                        <m:mn>1</m:mn>
                                                                     </m:mrow>
                                                                  </m:msup>
                                                               </m:mrow>
                                                               <m:mo>)</m:mo>
                                                            </m:mrow>
                                                         </m:mrow>
                                                      </m:mstyle>
                                                      <m:mo stretchy="false">)</m:mo>
                                                      <m:mo>/</m:mo>
                                                      <m:mo stretchy="false">(</m:mo>
                                                      <m:mstyle displaystyle="true">
                                                         <m:munderover>
                                                            <m:mo>&#8721;</m:mo>
                                                            <m:mrow>
                                                               <m:mi>t</m:mi>
                                                               <m:mo>=</m:mo>
                                                               <m:mn>1</m:mn>
                                                            </m:mrow>
                                                            <m:mi>T</m:mi>
                                                         </m:munderover>
                                                         <m:mrow>
                                                            <m:mi>&#967;</m:mi>
                                                            <m:mrow>
                                                               <m:mo>(</m:mo>
                                                               <m:mrow>
                                                                  <m:mi>t</m:mi>
                                                                  <m:mo>,</m:mo>
                                                                  <m:msubsup>
                                                                     <m:mi>q</m:mi>
                                                                     <m:mi>i</m:mi>
                                                                     <m:mi>D</m:mi>
                                                                  </m:msubsup>
                                                                  <m:mo>,</m:mo>
                                                                  <m:msup>
                                                                     <m:mi>q</m:mi>
                                                                     <m:mrow>
                                                                        <m:mi>D</m:mi>
                                                                        <m:mo>&#8722;</m:mo>
                                                                        <m:mn>1</m:mn>
                                                                     </m:mrow>
                                                                  </m:msup>
                                                               </m:mrow>
                                                               <m:mo>)</m:mo>
                                                            </m:mrow>
                                                         </m:mrow>
                                                      </m:mstyle>
                                                   </m:mrow>
                                                </m:mtd>
                                             </m:mtr>
                                             <m:mtr>
                                                <m:mtd>
                                                   <m:mrow>
                                                      <m:mo>+</m:mo>
                                                      <m:mstyle displaystyle="true">
                                                         <m:munderover>
                                                            <m:mo>&#8721;</m:mo>
                                                            <m:mrow>
                                                               <m:mi>t</m:mi>
                                                               <m:mo>=</m:mo>
                                                               <m:mn>2</m:mn>
                                                            </m:mrow>
                                                            <m:mi>T</m:mi>
                                                         </m:munderover>
                                                         <m:mrow>
                                                            <m:msub>
                                                               <m:mi>&#947;</m:mi>
                                                               <m:mrow>
                                                                  <m:mi>i</m:mi>
                                                                  <m:mi>n</m:mi>
                                                               </m:mrow>
                                                            </m:msub>
                                                            <m:mrow>
                                                               <m:mo>(</m:mo>
                                                               <m:mrow>
                                                                  <m:mi>t</m:mi>
                                                                  <m:mo>,</m:mo>
                                                                  <m:msubsup>
                                                                     <m:mi>q</m:mi>
                                                                     <m:mi>i</m:mi>
                                                                     <m:mi>D</m:mi>
                                                                  </m:msubsup>
                                                                  <m:mo>,</m:mo>
                                                                  <m:msup>
                                                                     <m:mi>q</m:mi>
                                                                     <m:mrow>
                                                                        <m:mi>D</m:mi>
                                                                        <m:mo>&#8722;</m:mo>
                                                                        <m:mn>1</m:mn>
                                                                     </m:mrow>
                                                                  </m:msup>
                                                               </m:mrow>
                                                               <m:mo>)</m:mo>
                                                            </m:mrow>
                                                         </m:mrow>
                                                      </m:mstyle>
                                                      <m:mo stretchy="false">)</m:mo>
                                                   </m:mrow>
                                                </m:mtd>
                                             </m:mtr>
                                          </m:mtable>
                                       </m:mrow>
                                    </m:mtd>
                                 </m:mtr>
                              </m:mtable>
                           </m:mrow>
                        </m:semantics>
                     </m:math>
                  </display-formula>
               </p>
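               <p>As an illustration, the emission update in the last row above can be sketched in Python for a single production state. The sketch assumes that the values of &#967; and &#947;<sub>in</sub> for that state have already been computed by the generalized forward and backward passes and are supplied as lists indexed by time. The function name and argument names are hypothetical and are not taken from the HHMMiR implementation.</p>
               <p><![CDATA[
```python
def reestimate_emission(obs, symbol, chi, gamma_in):
    """Re-estimate the emission probability of `symbol` for one production state.

    Assumed inputs (hypothetical names, precomputed elsewhere):
      obs      -- observed sequence o_1..o_T
      chi      -- chi[t-1] holds chi(t, q_i^D, q^{D-1}) for t = 1..T
      gamma_in -- gamma_in[t-1] holds gamma_in(t, q_i^D, q^{D-1}); the entry
                  for t = 1 is ignored because those sums start at t = 2
    """
    T = len(obs)
    if len(chi) != T or len(gamma_in) != T:
        raise ValueError("chi and gamma_in must have one entry per observation")

    # Numerator: only time steps at which the observed symbol equals `symbol`.
    numerator = sum(chi[t] for t in range(T) if obs[t] == symbol)
    numerator += sum(gamma_in[t] for t in range(1, T) if obs[t] == symbol)

    # Denominator: the same two sums taken over all time steps.
    denominator = sum(chi) + sum(gamma_in[1:])
    if denominator == 0:
        return 0.0  # state never visited; leaving the estimate at 0 is an assumption
    return numerator / denominator


# Toy values, invented for illustration only.
obs = "ACGA"
chi = [0.5, 0.0, 0.0, 0.0]
gamma_in = [9.9, 0.2, 0.3, 0.5]  # first entry is deliberately ignored
e_A = reestimate_emission(obs, "A", chi, gamma_in)
```
]]></p>
               <p>Summing the returned value over all symbols of the alphabet gives 1 whenever the denominator is non-zero, which provides a quick sanity check on the inputs.</p>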
            </sec>
            <sec>
               <st>
                  <p>Testing</p>
               </st>
               <p>As described above, a test hairpin is classified according to the ratio of the log-likelihoods it obtains under the positive and negative models. The threshold for this ratio was chosen using the ROC curves shown in Figure <figr fid="F4">4</figr>. For each hairpin, the probability that a given model &#955; emitted the observed sequence <it>O</it> is given by:</p>
               <p>
                  <display-formula>
                     <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S35-i18">
                        <m:semantics>
                           <m:mrow>
                              <m:mi>P</m:mi>
                              <m:mrow>
                                 <m:mo>(</m:mo>
                                 <m:mrow>
                                    <m:mi>O</m:mi>
                                    <m:mrow>
                                       <m:mo>|</m:mo>
                                       <m:mi>&#955;</m:mi>
                                    </m:mrow>
                                 </m:mrow>
                                 <m:mo>)</m:mo>
                              </m:mrow>
                              <m:mo>=</m:mo>
                              <m:mstyle displaystyle="true">
                                 <m:munderover>
                                    <m:mo>&#8721;</m:mo>
                                    <m:mrow>
                                       <m:mi>i</m:mi>
                                       <m:mo>=</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                    <m:mrow>
                                       <m:mrow>
                                          <m:mo>|</m:mo>
                                          <m:mrow>
                                             <m:msup>
                                                <m:mi>q</m:mi>
                                                <m:mn>1</m:mn>
                                             </m:msup>
                                          </m:mrow>
                                          <m:mo>|</m:mo>
                                       </m:mrow>
                                    </m:mrow>
                                 </m:munderover>
                                 <m:mrow>
                                    <m:mi>&#945;</m:mi>
                                    <m:mrow>
                                       <m:mo>(</m:mo>
                                       <m:mrow>
                                          <m:mn>1</m:mn>
                                          <m:mo>,</m:mo>
                                          <m:mi>T</m:mi>
                                          <m:mo>,</m:mo>
                                          <m:msubsup>
                                             <m:mi>q</m:mi>
                                             <m:mi>i</m:mi>
                                             <m:mn>2</m:mn>
                                          </m:msubsup>
                                          <m:mo>,</m:mo>
                                          <m:msup>
                                             <m:mi>q</m:mi>
                                             <m:mn>1</m:mn>
                                          </m:msup>
                                       </m:mrow>
                                       <m:mo>)</m:mo>
                                    </m:mrow>
                                 </m:mrow>
                              </m:mstyle>
                           </m:mrow>
                        </m:semantics>
                     </m:math>
                  </display-formula>
               </p>
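               <p>The scoring step can be sketched in Python as follows, assuming that the top-level forward variables appearing in the formula above have already been computed for both the positive and the negative model. The function names are hypothetical, the numerical values are invented for illustration, and the threshold chosen from the ROC curves is not reproduced here.</p>
               <p><![CDATA[
```python
import math


def sequence_log_likelihood(alpha_top):
    """Return log P(O | model) from the top-level forward variables.

    alpha_top[i-1] is assumed to hold alpha(1, T, q_i^2, q^1) as used in
    the formula above, for i = 1..|q^1|.
    """
    total = sum(alpha_top)
    if total <= 0.0:
        raise ValueError("model assigns zero probability to the sequence")
    return math.log(total)


def log_likelihood_ratio(alpha_pos, alpha_neg):
    """Ratio of the log-likelihoods under the positive and negative models."""
    log_pos = sequence_log_likelihood(alpha_pos)
    log_neg = sequence_log_likelihood(alpha_neg)
    if log_neg == 0.0:
        raise ValueError("negative-model log-likelihood is zero; ratio undefined")
    return log_pos / log_neg


# Toy forward values, invented for illustration only.
ratio = log_likelihood_ratio([1e-10, 3e-10], [1e-12])
```
]]></p>
               <p>Because both log-likelihoods are negative whenever the sequence probabilities are below 1, a ratio below 1 means that the positive model assigns the higher probability to the hairpin.</p>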
            </sec>
            <sec>
               <st>
                  <p>Measures of accuracy</p>
               </st>
               <p>The terms and measures used to assess the accuracy of HHMMiR are listed in Table <tblr tid="T6">6</tblr>.</p>
               <tbl id="T6">
                  <title>
                     <p>Table 6</p>
                  </title>
                  <caption>
                     <p>Measures for accuracy calculation.</p>
                  </caption>
                  <tblbdy cols="2">
                     <r>
                        <c ca="left">
                           <p>
                              <b>Measure</b>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <b>Calculation</b>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="2">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Sensitivity (Sn)</p>
                        </c>
                        <c ca="center">
                           <p>
                              <inline-formula>
                                 <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S35-i19">
                                    <m:semantics>
                                       <m:mrow>
                                          <m:mi>S</m:mi>
                                          <m:mi>n</m:mi>
                                          <m:mo>=</m:mo>
                                          <m:mfrac bevelled="true">
                                             <m:mrow>
                                                <m:mi>T</m:mi>
                                                <m:mi>P</m:mi>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mi>T</m:mi>
                                                <m:mi>P</m:mi>
                                                <m:mo>+</m:mo>
                                                <m:mi>F</m:mi>
                                                <m:mi>N</m:mi>
                                             </m:mrow>
                                          </m:mfrac>
                                       </m:mrow>
                                       <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaem4uamLaemOBa4Maeyypa0ZaaSGaaeaacqWGubavcqWGqbauaeaacqWGubavcqWGqbaucqGHRaWkcqWGgbGrcqWGobGtaaaaaa@3751@</m:annotation>
                                    </m:semantics>
                                 </m:math>
                              </inline-formula>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Specificity (Sp)</p>
                        </c>
                        <c ca="center">
                           <p>
                              <inline-formula>
                                 <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S35-i20">
                                    <m:semantics>
                                       <m:mrow>
                                          <m:mi>S</m:mi>
                                          <m:mi>p</m:mi>
                                          <m:mo>=</m:mo>
                                          <m:mfrac bevelled="true">
                                             <m:mrow>
                                                <m:mi>T</m:mi>
                                                <m:mi>N</m:mi>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mi>T</m:mi>
                                                <m:mi>N</m:mi>
                                                <m:mo>+</m:mo>
                                                <m:mi>F</m:mi>
                                                <m:mi>P</m:mi>
                                             </m:mrow>
                                          </m:mfrac>
                                       </m:mrow>
                                       <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaem4uamLaemiCaaNaeyypa0ZaaSGaaeaacqWGubavcqWGobGtaeaacqWGubavcqWGobGtcqGHRaWkcqWGgbGrcqWGqbauaaaaaa@3751@</m:annotation>
                                    </m:semantics>
                                 </m:math>
                              </inline-formula>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>False Discovery Rate (FDR)</p>
                        </c>
                        <c ca="center">
                           <p>
                              <inline-formula>
                                 <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S35-i21">
                                    <m:semantics>
                                       <m:mrow>
                                          <m:mi>F</m:mi>
                                          <m:mi>D</m:mi>
                                          <m:mi>R</m:mi>
                                          <m:mo>=</m:mo>
                                          <m:mfrac bevelled="true">
                                             <m:mrow>
                                                <m:mi>F</m:mi>
                                                <m:mi>P</m:mi>
                                             </m:mrow>
                                             <m:mrow>
                                                <m:mi>T</m:mi>
                                                <m:mi>P</m:mi>
                                                <m:mo>+</m:mo>
                                                <m:mi>F</m:mi>
                                                <m:mi>P</m:mi>
                                             </m:mrow>
                                          </m:mfrac>
                                       </m:mrow>
                                       <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemOrayKaemiraqKaemOuaiLaeyypa0ZaaSGaaeaacqWGgbGrcqWGqbauaeaacqWGubavcqWGqbaucqGHRaWkcqWGgbGrcqWGqbauaaaaaa@37F8@</m:annotation>
                                    </m:semantics>
                                 </m:math>
                              </inline-formula>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c ca="left">
                           <p>Matthews Correlation Coefficient (MCC)</p>
                        </c>
                        <c ca="center">
                           <p>
                              <inline-formula>
                                 <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-10-S1-S35-i22">
                                    <m:semantics>
                                       <m:mrow>
                                          <m:mi>M</m:mi>
                                          <m:mi>C</m:mi>
                                          <m:mi>C</m:mi>
                                          <m:mo>=</m:mo>
                                          <m:mrow>
                                             <m:mo>(</m:mo>
                                             <m:mrow>
                                                <m:mi>T</m:mi>
                                                <m:mi>P</m:mi>
                                                <m:mo>&#8901;</m:mo>
                                                <m:mi>T</m:mi>
                                                <m:mi>N</m:mi>
                                                <m:mo>&#8722;</m:mo>
                                                <m:mi>F</m:mi>
                                                <m:mi>P</m:mi>
                                                <m:mo>&#8901;</m:mo>
                                                <m:mi>F</m:mi>
                                                <m:mi>N</m:mi>
                                             </m:mrow>
                                             <m:mo>)</m:mo>
                                          </m:mrow>
                                          <m:mo>/</m:mo>
                                          <m:mrow>
                                             <m:mo>(</m:mo>
                                             <m:mrow>
                                                <m:msqrt>
                                                   <m:mrow>
                                                      <m:mo stretchy="false">(</m:mo>
                                                      <m:mi>T</m:mi>
                                                      <m:mi>P</m:mi>
                                                      <m:mo>+</m:mo>
                                                      <m:mi>F</m:mi>
                                                      <m:mi>P</m:mi>
                                                      <m:mo stretchy="false">)</m:mo>
                                                      <m:mo>&#8901;</m:mo>
                                                      <m:mo stretchy="false">(</m:mo>
                                                      <m:mi>T</m:mi>
                                                      <m:mi>P</m:mi>
                                                      <m:mo>+</m:mo>
                                                      <m:mi>F</m:mi>
                                                      <m:mi>N</m:mi>
                                                      <m:mo stretchy="false">)</m:mo>
                                                      <m:mo>&#8901;</m:mo>
                                                      <m:mo stretchy="false">(</m:mo>
                                                      <m:mi>T</m:mi>
                                                      <m:mi>N</m:mi>
                                                      <m:mo>+</m:mo>
                                                      <m:mi>F</m:mi>
                                                      <m:mi>P</m:mi>
                                                      <m:mo stretchy="false">)</m:mo>
                                                      <m:mo>&#8901;</m:mo>
                                                      <m:mo stretchy="false">(</m:mo>
                                                      <m:mi>T</m:mi>
                                                      <m:mi>N</m:mi>
                                                      <m:mo>+</m:mo>
                                                      <m:mi>F</m:mi>
                                                      <m:mi>N</m:mi>
                                                      <m:mo stretchy="false">)</m:mo>
                                                   </m:mrow>
                                                </m:msqrt>
                                             </m:mrow>
                                             <m:mo>)</m:mo>
                                          </m:mrow>
                                       </m:mrow>
                                       <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemyta0Kaem4qamKaem4qamKaeyypa0ZaaeWaaeaacqWGubavcqWGqbaucqGHflY1cqWGubavcqWGobGtcqGHsislcqWGgbGrcqWGqbaucqGHflY1cqWGgbGrcqWGobGtaiaawIcacaGLPaaacqGGVaWldaqadaqaamaakaaabaGaeiikaGIaemivaqLaemiuaaLaey4kaSIaemOrayKaemiuaaLaeiykaKIaeyyXICTaeiikaGIaemivaqLaemiuaaLaey4kaSIaemOrayKaemOta4KaeiykaKIaeyyXICTaeiikaGIaemivaqLaemOta4Kaey4kaSIaemOrayKaemiuaaLaeiykaKIaeyyXICTaeiikaGIaemivaqLaemOta4Kaey4kaSIaemOrayKaemOta4KaeiykaKcaleqaaaGccaGLOaGaayzkaaaaaa@6660@</m:annotation>
                                    </m:semantics>
                                 </m:math>
                              </inline-formula>
                           </p>
                        </c>
                     </r>
                  </tblbdy>
                  <tblfn>
                     <p>TP: <it>True Positives</it>; TN: <it>True Negatives</it>; FP: <it>False Positives</it>; FN: <it>False Negatives</it>.</p>
                  </tblfn>
               </tbl>
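               <p>As a minimal sketch of how the measures in Table 6 can be computed from the four confusion-matrix counts, the Python function below evaluates Sn, Sp, FDR and MCC. The function name <it>accuracy_measures</it>, the example counts, and the convention of returning None when a denominator is zero are assumptions made for illustration only; they are not part of HHMMiR and the counts are not results from this study.</p>

```python
import math


def accuracy_measures(tp, tn, fp, fn):
    """Return Sn, Sp, FDR and MCC as defined in Table 6.

    A measure whose denominator is zero is returned as None
    (an illustrative convention, not taken from the paper).
    """

    def ratio(numerator, denominator):
        # Guard against division by zero for degenerate count tables.
        return numerator / denominator if denominator != 0 else None

    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Sn": ratio(tp, tp + fn),    # sensitivity
        "Sp": ratio(tn, tn + fp),    # specificity
        "FDR": ratio(fp, tp + fp),   # false discovery rate
        "MCC": ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical counts chosen only for illustration (not results from the paper).
measures = accuracy_measures(tp=80, tn=90, fp=10, fn=20)
for name, value in measures.items():
    print(name, value)
```

               <p>With these hypothetical counts, Sn is 0.8 and Sp is 0.9. Unlike Sn and Sp, the MCC uses all four counts at once, which is why it is often reported alongside them when positive and negative sets differ in size.</p>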
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>List of abbreviations used</p>
         </st>
         <p>HHMM: hierarchical hidden Markov model; mfe: minimum fold energy; miRNA: microRNA; MLE: maximum likelihood estimate.</p>
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>The authors declare that they have no competing interests.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>PVB and SK designed the study, analyzed the results and wrote the paper. SK implemented the HHMM. VH supervised the data analysis and contributed to the writing of the paper.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The authors would like to thank Paul Samollow, Chakra Chennubhotla, Eleanor Feingold and an anonymous reviewer for helpful suggestions. PVB was supported by NIH grant 1R01LM009657-01. This research was supported in part by the National Science Foundation through TeraGrid resources provided by the Pittsburgh Supercomputing Center. Supplementary material can be found at the journal's web site and at our web site <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>.</p>
            <p>This article has been published as part of <it>BMC Bioinformatics </it>Volume 10 Supplement 1, 2009: Proceedings of The Seventh Asia Pacific Bioinformatics Conference (APBC) 2009. The full contents of the supplement are available online at <url>http://www.biomedcentral.com/1471-2105/10?issue=S1</url></p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>MicroRNA genes are transcribed by RNA polymerase II</p>
            </title>
            <aug>
               <au>
                  <snm>Lee</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Han</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Yeom</snm>
                  <fnm>KH</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Baek</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>VN</fnm>
               </au>
            </aug>
            <source>EMBO J</source>
            <pubdate>2004</pubdate>
            <volume>23</volume>
            <issue>20</issue>
            <fpage>4051</fpage>
            <lpage>4060</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">524334</pubid>
                  <pubid idtype="pmpid" link="fulltext">15372072</pubid>
                  <pubid idtype="doi">10.1038/sj.emboj.7600385</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Human microRNAs are processed from capped, polyadenylated transcripts that can also function as mRNAs</p>
            </title>
            <aug>
               <au>
                  <snm>Cai</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Hagedorn</snm>
                  <fnm>CH</fnm>
               </au>
               <au>
                  <snm>Cullen</snm>
                  <fnm>BR</fnm>
               </au>
            </aug>
            <source>RNA</source>
            <pubdate>2004</pubdate>
            <volume>10</volume>
            <issue>12</issue>
            <fpage>1957</fpage>
            <lpage>1966</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1370684</pubid>
                  <pubid idtype="pmpid" link="fulltext">15525708</pubid>
                  <pubid idtype="doi">10.1261/rna.7135204</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Exportin-5 mediates the nuclear export of pre-microRNAs and short hairpin RNAs</p>
            </title>
            <aug>
               <au>
                  <snm>Yi</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Qin</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Macara</snm>
                  <fnm>IG</fnm>
               </au>
               <au>
                  <snm>Cullen</snm>
                  <fnm>BR</fnm>
               </au>
            </aug>
            <source>Genes Dev</source>
            <pubdate>2003</pubdate>
            <volume>17</volume>
            <issue>24</issue>
            <fpage>3011</fpage>
            <lpage>3016</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">305252</pubid>
                  <pubid idtype="pmpid" link="fulltext">14681208</pubid>
                  <pubid idtype="doi">10.1101/gad.1158803</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>TRBP recruits the Dicer complex to Ago2 for microRNA processing and gene silencing</p>
            </title>
            <aug>
               <au>
                  <snm>Chendrimada</snm>
                  <fnm>TP</fnm>
               </au>
               <au>
                  <snm>Gregory</snm>
                  <fnm>RI</fnm>
               </au>
               <au>
                  <snm>Kumaraswamy</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Norman</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Cooch</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Nishikura</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Shiekhattar</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2005</pubdate>
            <volume>436</volume>
            <issue>7051</issue>
            <fpage>740</fpage>
            <lpage>744</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature03868</pubid>
                  <pubid idtype="pmpid" link="fulltext">15973356</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>The nuclear RNase III Drosha initiates microRNA processing</p>
            </title>
            <aug>
               <au>
                  <snm>Lee</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Ahn</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Han</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Choi</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Yim</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Provost</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Radmark</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>S</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nature</source>
            <pubdate>2003</pubdate>
            <volume>425</volume>
            <issue>6956</issue>
            <fpage>415</fpage>
            <lpage>419</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature01957</pubid>
                  <pubid idtype="pmpid" link="fulltext">14508493</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Asymmetry in the assembly of the RNAi enzyme complex</p>
            </title>
            <aug>
               <au>
                  <snm>Schwarz</snm>
                  <fnm>DS</fnm>
               </au>
               <au>
                  <snm>Hutvagner</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Du</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Aronin</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Zamore</snm>
                  <fnm>PD</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2003</pubdate>
            <volume>115</volume>
            <issue>2</issue>
            <fpage>199</fpage>
            <lpage>208</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0092-8674(03)00759-1</pubid>
                  <pubid idtype="pmpid" link="fulltext">14567917</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Identification of tissue-specific microRNAs from mouse</p>
            </title>
            <aug>
               <au>
                  <snm>Lagos-Quintana</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rauhut</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Yalcin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Meyer</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lendeckel</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Tuschl</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Curr Biol</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <issue>9</issue>
            <fpage>735</fpage>
            <lpage>739</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0960-9822(02)00809-6</pubid>
                  <pubid idtype="pmpid" link="fulltext">12007417</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets</p>
            </title>
            <aug>
               <au>
                  <snm>Lewis</snm>
                  <fnm>BP</fnm>
               </au>
               <au>
                  <snm>Burge</snm>
                  <fnm>CB</fnm>
               </au>
               <au>
                  <snm>Bartel</snm>
                  <fnm>DP</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2005</pubdate>
            <volume>120</volume>
            <issue>1</issue>
            <fpage>15</fpage>
            <lpage>20</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.cell.2004.12.035</pubid>
                  <pubid idtype="pmpid" link="fulltext">15652477</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Plant and animal microRNAs: similarities and differences</p>
            </title>
            <aug>
               <au>
                  <snm>Millar</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Waterhouse</snm>
                  <fnm>PM</fnm>
               </au>
            </aug>
            <source>Funct Integr Genomics</source>
            <pubdate>2005</pubdate>
            <volume>5</volume>
            <issue>3</issue>
            <fpage>129</fpage>
            <lpage>135</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s10142-005-0145-2</pubid>
                  <pubid idtype="pmpid" link="fulltext">15875226</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>MicroRNAs: expression, avoidance and subversion by vertebrate viruses</p>
            </title>
            <aug>
               <au>
                  <snm>Sarnow</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Jopling</snm>
                  <fnm>CL</fnm>
               </au>
               <au>
                  <snm>Norman</snm>
                  <fnm>KL</fnm>
               </au>
               <au>
                  <snm>Schutz</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wehner</snm>
                  <fnm>KA</fnm>
               </au>
            </aug>
            <source>Nat Rev Microbiol</source>
            <pubdate>2006</pubdate>
            <volume>4</volume>
            <issue>9</issue>
            <fpage>651</fpage>
            <lpage>659</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrmicro1473</pubid>
                  <pubid idtype="pmpid" link="fulltext">16912711</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14</p>
            </title>
            <aug>
               <au>
                  <snm>Lee</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Feinbaum</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Ambros</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1993</pubdate>
            <volume>75</volume>
            <issue>5</issue>
            <fpage>843</fpage>
            <lpage>854</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0092-8674(93)90529-Y</pubid>
                  <pubid idtype="pmpid" link="fulltext">8252621</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Posttranscriptional regulation of the heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C. elegans</p>
            </title>
            <aug>
               <au>
                  <snm>Wightman</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Ha</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Ruvkun</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1993</pubdate>
            <volume>75</volume>
            <issue>5</issue>
            <fpage>855</fpage>
            <lpage>862</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0092-8674(93)90530-4</pubid>
                  <pubid idtype="pmpid" link="fulltext">8252622</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>The 21-nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans</p>
            </title>
            <aug>
               <au>
                  <snm>Reinhart</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Slack</snm>
                  <fnm>FJ</fnm>
               </au>
               <au>
                  <snm>Basson</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pasquinelli</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Bettinger</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Rougvie</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Horvitz</snm>
                  <fnm>HR</fnm>
               </au>
               <au>
                  <snm>Ruvkun</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2000</pubdate>
            <volume>403</volume>
            <issue>6772</issue>
            <fpage>901</fpage>
            <lpage>906</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35002607</pubid>
                  <pubid idtype="pmpid" link="fulltext">10706289</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Approaches to microRNA discovery</p>
            </title>
            <aug>
               <au>
                  <snm>Berezikov</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Cuppen</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Plasterk</snm>
                  <fnm>RH</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2006</pubdate>
            <volume>38</volume>
            <issue>Suppl</issue>
            <fpage>S2</fpage>
            <lpage>S7</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng1794</pubid>
                  <pubid idtype="pmpid" link="fulltext">16736019</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Modulation of microRNA processing and expression through RNA editing by ADAR deaminases</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Chendrimada</snm>
                  <fnm>TP</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Higuchi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Seeburg</snm>
                  <fnm>PH</fnm>
               </au>
               <au>
                  <snm>Shiekhattar</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Nishikura</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Nat Struct Mol Biol</source>
            <pubdate>2006</pubdate>
            <volume>13</volume>
            <issue>1</issue>
            <fpage>13</fpage>
            <lpage>21</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nsmb1041</pubid>
                  <pubid idtype="pmpid" link="fulltext">16369484</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>HEN1 recognizes 21&#8211;24 nt small RNA duplexes and deposits a methyl group onto the 2' OH of the 3' terminal nucleotide</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Ebright</snm>
                  <fnm>YW</fnm>
               </au>
               <au>
                  <snm>Yu</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>X</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <issue>2</issue>
            <fpage>667</fpage>
            <lpage>675</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1356533</pubid>
                  <pubid idtype="pmpid" link="fulltext">16449203</pubid>
                  <pubid idtype="doi">10.1093/nar/gkj474</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Patterns of flanking sequence conservation and a characteristic upstream motif for microRNA gene identification</p>
            </title>
            <aug>
               <au>
                  <snm>Ohler</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Yekta</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lim</snm>
                  <fnm>LP</fnm>
               </au>
               <au>
                  <snm>Bartel</snm>
                  <fnm>DP</fnm>
               </au>
               <au>
                  <snm>Burge</snm>
                  <fnm>CB</fnm>
               </au>
            </aug>
            <source>RNA</source>
            <pubdate>2004</pubdate>
            <volume>10</volume>
            <issue>9</issue>
            <fpage>1309</fpage>
            <lpage>1322</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1370619</pubid>
                  <pubid idtype="pmpid" link="fulltext">15317971</pubid>
                  <pubid idtype="doi">10.1261/rna.5206304</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Computational and experimental identification of C. elegans microRNAs</p>
            </title>
            <aug>
               <au>
                  <snm>Grad</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Aach</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hayes</snm>
                  <fnm>GD</fnm>
               </au>
               <au>
                  <snm>Reinhart</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Church</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Ruvkun</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Mol Cell</source>
            <pubdate>2003</pubdate>
            <volume>11</volume>
            <issue>5</issue>
            <fpage>1253</fpage>
            <lpage>1263</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1097-2765(03)00153-9</pubid>
                  <pubid idtype="pmpid" link="fulltext">12769849</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Computational identification of Drosophila microRNA genes</p>
            </title>
            <aug>
               <au>
                  <snm>Lai</snm>
                  <fnm>EC</fnm>
               </au>
               <au>
                  <snm>Tomancak</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Williams</snm>
                  <fnm>RW</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>GM</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <issue>7</issue>
            <fpage>R42</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">193629</pubid>
                  <pubid idtype="pmpid" link="fulltext">12844358</pubid>
                  <pubid idtype="doi">10.1186/gb-2003-4-7-r42</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>The microRNAs of Caenorhabditis elegans</p>
            </title>
            <aug>
               <au>
                  <snm>Lim</snm>
                  <fnm>LP</fnm>
               </au>
               <au>
                  <snm>Lau</snm>
                  <fnm>NC</fnm>
               </au>
               <au>
                  <snm>Weinstein</snm>
                  <fnm>EG</fnm>
               </au>
               <au>
                  <snm>Abdelhakim</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Yekta</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Rhoades</snm>
                  <fnm>MW</fnm>
               </au>
               <au>
                  <snm>Burge</snm>
                  <fnm>CB</fnm>
               </au>
               <au>
                  <snm>Bartel</snm>
                  <fnm>DP</fnm>
               </au>
            </aug>
            <source>Genes Dev</source>
            <pubdate>2003</pubdate>
            <volume>17</volume>
            <issue>8</issue>
            <fpage>991</fpage>
            <lpage>1008</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">196042</pubid>
                  <pubid idtype="pmpid" link="fulltext">12672692</pubid>
                  <pubid idtype="doi">10.1101/gad.1074403</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Profile-based detection of microRNA precursors in animal genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Legendre</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lambert</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gautheret</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>7</issue>
            <fpage>841</fpage>
            <lpage>845</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti073</pubid>
                  <pubid idtype="pmpid" link="fulltext">15509608</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Identification of clustered microRNAs using an ab initio prediction method</p>
            </title>
            <aug>
               <au>
                  <snm>Sewer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Paul</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Landgraf</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Aravin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Pfeffer</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Brownstein</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Tuschl</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>van Nimwegen</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Zavolan</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>267</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1315341</pubid>
                  <pubid idtype="pmpid" link="fulltext">16274478</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-6-267</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Identification of hundreds of conserved and nonconserved human microRNAs</p>
            </title>
            <aug>
               <au>
                  <snm>Bentwich</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Avniel</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Karov</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Aharonov</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Gilad</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Barad</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Barzilai</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Einat</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Einav</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Meiri</snm>
                  <fnm>E</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2005</pubdate>
            <volume>37</volume>
            <issue>7</issue>
            <fpage>766</fpage>
            <lpage>770</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng1590</pubid>
                  <pubid idtype="pmpid" link="fulltext">15965474</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Identification of microRNAs of the herpesvirus family</p>
            </title>
            <aug>
               <au>
                  <snm>Pfeffer</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sewer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lagos-Quintana</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sheridan</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Sander</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Grasser</snm>
                  <fnm>FA</fnm>
               </au>
               <au>
                  <snm>van Dyk</snm>
                  <fnm>LF</fnm>
               </au>
               <au>
                  <snm>Ho</snm>
                  <fnm>CK</fnm>
               </au>
               <au>
                  <snm>Shuman</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Chien</snm>
                  <fnm>M</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nat Methods</source>
            <pubdate>2005</pubdate>
            <volume>2</volume>
            <issue>4</issue>
            <fpage>269</fpage>
            <lpage>276</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nmeth746</pubid>
                  <pubid idtype="pmpid" link="fulltext">15782219</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine</p>
            </title>
            <aug>
               <au>
                  <snm>Xue</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>He</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>GP</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>X</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>310</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1360673</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381612</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-6-310</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>miRRim: a novel system to find conserved miRNAs with high sensitivity and specificity</p>
            </title>
            <aug>
               <au>
                  <snm>Terai</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Komori</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Asai</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Kin</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>RNA</source>
            <pubdate>2007</pubdate>
            <volume>13</volume>
            <issue>12</issue>
            <fpage>2081</fpage>
            <lpage>2090</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2080609</pubid>
                  <pubid idtype="pmpid" link="fulltext">17959929</pubid>
                  <pubid idtype="doi">10.1261/rna.655107</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>The Hierarchical Hidden Markov Model: Analysis and Applications</p>
            </title>
            <aug>
               <au>
                  <snm>Fine</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Singer</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Tishby</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Machine Learning</source>
            <pubdate>1998</pubdate>
            <volume>32</volume>
            <fpage>41</fpage>
            <lpage>62</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1023/A:1007469218079</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>A Hierarchical HMM Implementation for Vertebrate Gene Splice Site Prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Hu</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ingram</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Sirski</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pal</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Swamy</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Patten</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Technical Report</source>
            <publisher>Department of Computer Science, University of Waterloo</publisher>
            <pubdate>2000</pubdate>
         </bibl>
         <bibl id="B29">
            <title>
               <p>BayCis: A Bayesian Hierarchical HMM for Cis-Regulatory Module Decoding in Metazoan Genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Lin</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Ray</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Sandve</snm>
                  <fnm>GK</fnm>
               </au>
               <au>
                  <snm>Uguroglu</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Xing</snm>
                  <fnm>EP</fnm>
               </au>
            </aug>
            <source>Research in Computational Molecular Biology (RECOMB), 12th Annual International Conference: 2008</source>
            <publisher>Singapore: Springer</publisher>
            <pubdate>2008</pubdate>
            <fpage>66</fpage>
            <lpage>81</lpage>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Conserved microRNA characteristics in mammals</p>
            </title>
            <aug>
               <au>
                  <snm>Saetrom</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Snove</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Nedland</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Grunfeld</snm>
                  <fnm>TB</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Bass</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Canon</snm>
                  <fnm>JR</fnm>
               </au>
            </aug>
            <source>Oligonucleotides</source>
            <pubdate>2006</pubdate>
            <volume>16</volume>
            <issue>2</issue>
            <fpage>115</fpage>
            <lpage>144</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/oli.2006.16.115</pubid>
                  <pubid idtype="pmpid" link="fulltext">16764537</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Vienna RNA secondary structure server</p>
            </title>
            <aug>
               <au>
                  <snm>Hofacker</snm>
                  <fnm>IL</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <issue>13</issue>
            <fpage>3429</fpage>
            <lpage>3431</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">169005</pubid>
                  <pubid idtype="pmpid" link="fulltext">12824340</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg599</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Human microRNA prediction through a probabilistic co-learning model of sequence and structure</p>
            </title>
            <aug>
               <au>
                  <snm>Nam</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Shin</snm>
                  <fnm>KR</fnm>
               </au>
               <au>
                  <snm>Han</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>VN</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>BT</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <issue>11</issue>
            <fpage>3570</fpage>
            <lpage>3581</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1159118</pubid>
                  <pubid idtype="pmpid" link="fulltext">15987789</pubid>
                  <pubid idtype="doi">10.1093/nar/gki668</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>The opossum genome: insights and opportunities from an alternative mammal</p>
            </title>
            <aug>
               <au>
                  <snm>Samollow</snm>
                  <fnm>PB</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2008</pubdate>
            <volume>18</volume>
            <issue>8</issue>
            <fpage>1199</fpage>
            <lpage>1215</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.065326.107</pubid>
                  <pubid idtype="pmpid" link="fulltext">18676819</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>miRBase: the microRNA sequence database</p>
            </title>
            <aug>
               <au>
                  <snm>Griffiths-Jones</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Methods Mol Biol</source>
            <pubdate>2006</pubdate>
            <volume>342</volume>
            <fpage>129</fpage>
            <lpage>138</lpage>
            <xrefbib>
               <pubid idtype="pmpid">16957372</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>The human genome browser at UCSC</p>
            </title>
            <aug>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Sugnet</snm>
                  <fnm>CW</fnm>
               </au>
               <au>
                  <snm>Furey</snm>
                  <fnm>TS</fnm>
               </au>
               <au>
                  <snm>Roskin</snm>
                  <fnm>KM</fnm>
               </au>
               <au>
                  <snm>Pringle</snm>
                  <fnm>TH</fnm>
               </au>
               <au>
                  <snm>Zahler</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Haussler</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <issue>6</issue>
            <fpage>996</fpage>
            <lpage>1006</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">186604</pubid>
                  <pubid idtype="pmpid" link="fulltext">12045153</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>TeraGrid: Analysis of Organization, System Architecture, and Middleware Enabling New Types of Applications</p>
            </title>
            <aug>
               <au>
                  <snm>Catlett</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Allcock</snm>
                  <fnm>WE</fnm>
               </au>
               <au>
                  <snm>Andrews</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Aydt</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Bair</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Balac</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Banister</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Barker</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Bartelt</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Beckman</snm>
                  <fnm>P</fnm>
               </au>
               <etal/>
            </aug>
            <source>High Performance Computing and Grids in Action</source>
            <publisher>Amsterdam: IOS Press</publisher>
            <editor>Grandinetti L</editor>
            <pubdate>2008</pubdate>
            <volume>16</volume>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Benos laboratory web server</p>
            </title>
            <url>http://www.benoslab.pitt.edu/</url>
         </bibl>
      </refgrp>
   </bm>
</art>