<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-9-S12-S2</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Using a kernel density estimation based classifier to predict species-specific microRNA precursors</p>
         </title>
         <aug>
            <au ca="yes" id="A1">
               <snm>Chang</snm>
               <mnm>Tien-Hao</mnm>
               <fnm>Darby</fnm>
               <insr iid="I1"/>
               <email>darby@ee.ncku.edu.tw</email>
            </au>
            <au id="A2">
               <snm>Wang</snm>
               <fnm>Chih-Ching</fnm>
               <insr iid="I1"/>
               <email>n2695199@mail.ncku.edu.tw</email>
            </au>
            <au id="A3">
               <snm>Chen</snm>
               <fnm>Jian-Wei</fnm>
               <insr iid="I1"/>
               <email>n2696187@mail.ncku.edu.tw</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Electrical Engineering, National Cheng Kung University, Tainan, 70101, Taiwan, R.O.C.</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <supplement>
            <title>
               <p>Seventh International Conference on Bioinformatics (InCoB2008)</p>
            </title>
            <editor>Shoba Ranganathan, Wen-Lian Hsu, Ueng-Cheng Yang and Tin Wee Tan</editor>
            <note>Proceedings</note>
         </supplement>
         <conference>
            <title>
               <p>Asia Pacific Bioinformatics Network (APBioNet) Seventh International Conference on Bioinformatics (InCoB2008)</p>
            </title>
            <location>Taipei, Taiwan</location>
            <date-range>20&#8211;23 October 2008</date-range>
            <url>http://incob.apbionet.org/incob08/</url>
         </conference>
         <issn>1471-2105</issn>
         <pubdate>2008</pubdate>
         <volume>9</volume>
         <issue>Suppl 12</issue>
         <fpage>S2</fpage>
         <url>http://www.biomedcentral.com/1471-2105/9/S12/S2</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">19091019</pubid>
               <pubid idtype="doi">10.1186/1471-2105-9-S12-S2</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <pub>
            <date>
               <day>12</day>
               <month>12</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Chang et al; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>MicroRNAs (miRNAs) are short non-coding RNA molecules that participate in post-transcriptional regulation of gene expression. Many efforts have been made over the years to discover miRNA precursors (pre-miRNAs). Recently, <it>ab initio </it>approaches have attracted more attention because they can discover species-specific pre-miRNAs. Most <it>ab initio </it>approaches have proposed novel features to characterize RNA molecules. However, the associated classification mechanism in a miRNA predictor has received less discussion.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>This study focuses on the classification algorithm for miRNA prediction. We develop a novel <it>ab initio </it>method, miR-KDE, in which most of the features are collected from previous works. The classification mechanism in miR-KDE is the relaxed variable kernel density estimator (RVKDE) that we have recently proposed. Compared with the widely used support vector machine (SVM), RVKDE exploits more local information from the training dataset. MiR-KDE is evaluated using a training set consisting of only human pre-miRNAs to predict a benchmark collected from 40 species. The experimental results show that miR-KDE delivers favorable performance in predicting human pre-miRNAs and has advantages for pre-miRNAs from genera taxonomically distant from humans.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>We use a novel classifier whose ability to exploit local information makes it particularly suitable for predicting species-specific pre-miRNAs. This study also provides a comprehensive analysis from the viewpoint of the classification mechanism. The good performance of miR-KDE encourages more effort on the classification methodology, as well as on feature extraction, in miRNA prediction.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>MicroRNAs are short RNAs (~20&#8211;22 nt) that can regulate target genes by binding to the mRNAs for cleavage or translational repression <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>. The discovery of miRNA shows that RNA is not only a carrier of gene information, but also a mediator of gene expression. The first miRNAs studied were <it>lin-4 </it>and <it>let-7</it>, which were found during studies of genetic defects in early larval <it>Caenorhabditis elegans </it><abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp>. To date, 6396 miRNAs have been identified <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. This rapid growth results from the development of both experimental techniques and computational methods <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>.</p>
         <p>One of the most extensively developed computational methods for miRNA detection is the comparative approach. The most straightforward method is to align unknown RNA sequences to known pre-miRNAs through NCBI BlastN <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. Advanced comparative approaches to discovering pre-miRNAs rely strongly on sequence similarity <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> or on sequence profiles <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. One drawback of homology search is the generation of many false positives (RNAs that contain no mature miRNA but are predicted to be pre-miRNAs). Consequently, cross-species evolutionary conservation has been widely used to eliminate these false positives <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>. Another well-known method for identifying novel pre-miRNAs uses conservation patterns based on a set of homologous sequences <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp>.</p>
         <p>Comparative approaches rely heavily on sequence similarity to known pre-miRNAs, and suffer from lower sensitivity in detecting novel pre-miRNAs that have no known homologous pre-miRNAs <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>. To overcome this problem, many <it>ab initio </it>algorithms, requiring no sequence or structure alignment, have recently been developed to detect completely new pre-miRNAs for which no close homologs are known <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>. Brameier and Wiuf <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> proposed a motif-based <it>ab initio </it>method, miRPred, which yielded 90% sensitivity and 99.1% specificity for human miRNAs. These <it>ab initio </it>methods are suitable for predicting species-specific and non-conserved pre-miRNAs, which account for the majority of undiscovered pre-miRNAs <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. Other methods improved miRNA prediction by first predicting miRNA-related motifs such as the conserved 7-mers in 3'-UTRs <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> and Drosha processing sites <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>.</p>
         <p>Among these <it>ab initio </it>methods, Sewer <it>et al</it>. <abbrgrp><abbr bid="B24">24</abbr></abbrgrp> used base pair frequencies and quantified certain pre-miRNA structural elements as the characteristic features, and detected 71% of pre-miRNAs with a low false positive rate of ~3% for viruses. Triplet-SVM <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> used the frequencies of structure-sequence triplets as the characteristic features and yielded an overall accuracy of 90.9% for 11 species. BayesMiRfind <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> used sequence and structure features with comparative post-filtering and delivered >80% sensitivity and >90% specificity for <it>C. elegans </it>and <it>Mouse</it>. RNAmicro <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> introduced thermodynamic properties with multiple sequence alignment and yielded >90% sensitivity and >99% specificity for <it>C. elegans </it>and <it>C. briggsae</it>. MiPred <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> used dinucleotide frequencies, six folding measures and five normalized folding quantities as the characteristic features and yielded an overall accuracy of 95.6% for 40 species.</p>
         <p>With the development of <it>ab initio </it>approaches, the characteristic features for describing RNA molecules have been extensively studied in recent years. However, the associated classification mechanism has received less discussion. Most <it>ab initio </it>approaches proposed novel characteristic features, but adopted an off-the-shelf machine learning tool. Furthermore, most of them incorporated the same classifier, the support vector machine (SVM), because of its prevailing success in diverse bioinformatics problems <abbrgrp><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp>.</p>
         <p>In this study, we focus on the classification methodology for pre-miRNA prediction. A novel <it>ab initio </it>method, miR-KDE, is developed for distinguishing pre-miRNAs from other hairpin sequences with similar stem-loop features (we call them pseudo hairpins). The feature set comprises several sequence and structure characteristics collected from previous works. We incorporate the relaxed variable kernel density estimator (RVKDE) <abbrgrp><abbr bid="B35">35</abbr></abbrgrp> to classify RNA sequences based on the feature set. RVKDE is an instance-based classifier that exploits more local information from the dataset than SVM. An analysis based on the decision boundary of classifiers is conducted in this study to elaborate on this characteristic of RVKDE. The performance of miR-KDE is evaluated using a training set consisting of only human pre-miRNAs to predict a benchmark collected from 40 species. Experimental results show that miR-KDE delivers favorable performance in predicting human pre-miRNAs and has advantages for pre-miRNAs from genera taxonomically distant from humans.</p>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <sec>
            <st>
               <p>Experimental results on human pre-miRNAs</p>
            </st>
            <p>The performances of triplet-SVM, miPred and the present miR-KDE in predicting human pre-miRNAs are shown in Table <tblr tid="T1">1</tblr>. In five-fold cross-validation on the HU400 dataset, the %SE, %SP, %ACC, %Fm and %MCC of miR-KDE are 90.5%, 97.5%, 94.0%, 93.8% and 88.2%, respectively. miR-KDE is superior to triplet-SVM and miPred on four of the five measures; the exception is %SP, where miPred is higher (98.0%). The comparison based on HU400 must be interpreted with care, of course, because the parameters of the alternative predictors were determined to maximize performance on this dataset. Next, the three predictors are evaluated by using HU400 to predict the HU216 dataset. Here the %SE, %SP, %ACC, %Fm and %MCC of miR-KDE are 88.9%, 92.6%, 90.7%, 90.6% and 81.5%, each the highest among the three predictors. These results demonstrate the good performance of miR-KDE in distinguishing human pre-miRNAs from pseudo hairpins.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Performances of triplet-SVM, miPred and miR-KDE in predicting human pre-miRNAs.</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <b>%SE</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>%SP</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>%ACC</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>%Fm</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>%MCC</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left" cspan="6">
                        <p>Five-fold cross-validation on HU400</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" indent="1">
                        <p>triplet-SVM</p>
                     </c>
                     <c ca="left">
                        <p>86.5%</p>
                     </c>
                     <c ca="left">
                        <p>91.5%</p>
                     </c>
                     <c ca="left">
                        <p>89.0%</p>
                     </c>
                     <c ca="left">
                        <p>88.7%</p>
                     </c>
                     <c ca="left">
                        <p>78.1%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" indent="1">
                        <p>miPred</p>
                     </c>
                     <c ca="left">
                        <p>87.5%</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>98.0%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>92.8%</p>
                     </c>
                     <c ca="left">
                        <p>92.3%</p>
                     </c>
                     <c ca="left">
                        <p>86.0%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" indent="1">
                        <p>miR-KDE</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>90.5%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>97.5%</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>94.0%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>93.8%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>88.2%</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" cspan="6">
                        <p>Using HU400 to predict HU216</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" indent="1">
                        <p>triplet-SVM</p>
                     </c>
                     <c ca="left">
                        <p>83.3%</p>
                     </c>
                     <c ca="left">
                        <p>86.1%</p>
                     </c>
                     <c ca="left">
                        <p>84.7%</p>
                     </c>
                     <c ca="left">
                        <p>84.5%</p>
                     </c>
                     <c ca="left">
                        <p>69.5%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" indent="1">
                        <p>miPred</p>
                     </c>
                     <c ca="left">
                        <p>88.0%</p>
                     </c>
                     <c ca="left">
                        <p>88.0%</p>
                     </c>
                     <c ca="left">
                        <p>88.0%</p>
                     </c>
                     <c ca="left">
                        <p>88.0%</p>
                     </c>
                     <c ca="left">
                        <p>75.9%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" indent="1">
                        <p>miR-KDE</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>88.9%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>92.6%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>90.7%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>90.6%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>81.5%</b>
                        </p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The best performance for each measure within each dataset is highlighted in bold.</p>
               </tblfn>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Experimental results on non-human pre-miRNAs</p>
            </st>
            <p>Table <tblr tid="T2">2</tblr> extends the evaluation to the NH3350 dataset, which includes 1675 non-human pre-miRNAs from 39 species and 1675 human pseudo hairpins. The %SE, %SP, %ACC, %Fm and %MCC of miR-KDE are 95.8%, 93.5%, 94.7%, 94.7% and 89.3%. miR-KDE is superior to triplet-SVM and miPred on four of the five measures; the exception is %SE, where miPred is higher (96.7%). Because sensitivity and specificity trade off against each other, we also report, in the last row of Table <tblr tid="T2">2</tblr>, the sensitivity of miR-KDE when its specificity is set equal to that of miPred.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Performances of triplet-SVM, miPred and miR-KDE in predicting non-human pre-miRNAs.</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <b>%SE</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>%SP</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>%ACC</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>%Fm</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>%MCC</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>triplet-SVM</p>
                     </c>
                     <c ca="left">
                        <p>91.5%</p>
                     </c>
                     <c ca="left">
                        <p>88.7%</p>
                     </c>
                     <c ca="left">
                        <p>90.1%</p>
                     </c>
                     <c ca="left">
                        <p>90.2%</p>
                     </c>
                     <c ca="left">
                        <p>80.2%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>miPred</p>
                     </c>
                     <c ca="left">
                        <p>96.7%</p>
                     </c>
                     <c ca="left">
                        <p>90.4%</p>
                     </c>
                     <c ca="left">
                        <p>93.6%</p>
                     </c>
                     <c ca="left">
                        <p>93.7%</p>
                     </c>
                     <c ca="left">
                        <p>87.3%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>miR-KDE</p>
                     </c>
                     <c ca="left">
                        <p>95.8%</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>93.5%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>94.7%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>94.7%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>89.3%</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" indent="1">
                        <p>with miPred's %SP</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>97.4%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>90.4%</p>
                     </c>
                     <c ca="left">
                        <p>93.9%</p>
                     </c>
                     <c ca="left">
                        <p>94.1%</p>
                     </c>
                     <c ca="left">
                        <p>88.1%</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The best performance for each measure is highlighted in bold.</p>
               </tblfn>
            </tbl>
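            <p>The last row of Table 2 compares sensitivities at a matched specificity. This section does not say how the operating point of miR-KDE was adjusted, so the following is only a generic sketch of the idea, assuming the classifier emits a real-valued score per hairpin and that a hairpin is called a pre-miRNA when its score exceeds a cutoff. The function name sensitivity_at_specificity and the toy scores are hypothetical.</p>

```python
def sensitivity_at_specificity(pos_scores, neg_scores, target_sp):
    """Pick a score cutoff that rejects about target_sp of the negatives,
    then return (sensitivity, achieved specificity) at that cutoff."""
    ranked_neg = sorted(neg_scores)
    # Number of negatives that must fall at or below the cutoff.
    n_reject = round(target_sp * len(ranked_neg))
    if n_reject == 0:
        cutoff = float("-inf")
    else:
        cutoff = ranked_neg[n_reject - 1]
    # A sample is predicted positive when its score is strictly above the cutoff.
    tp = sum(1 for s in pos_scores if s > cutoff)
    fp = sum(1 for s in neg_scores if s > cutoff)
    sensitivity = tp / len(pos_scores)
    # Tied negative scores can push the achieved specificity above the target.
    specificity = (len(neg_scores) - fp) / len(neg_scores)
    return sensitivity, specificity


# Toy scores (assumption, not data from this study).
pos = [0.9, 0.8, 0.7, 0.4, 0.3]
neg = [0.1, 0.2, 0.35, 0.5, 0.6]
print(sensitivity_at_specificity(pos, neg, target_sp=0.8))
print(sensitivity_at_specificity(pos, neg, target_sp=0.6))
```

            <p>Lowering the target specificity moves the cutoff down and can only keep or raise the sensitivity, which is the trade-off that the matched-specificity row makes explicit.</p>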
            <p>A further analysis is conducted to compare miPred and miR-KDE because of their comparable performance in Table <tblr tid="T2">2</tblr>. Table <tblr tid="T3">3</tblr> shows the performance of miPred and miR-KDE for the NH3350 dataset in terms of genus. This experiment divides the NH3350 dataset into five sub-datasets based on genus, where each sub-dataset contains an equal number of pre-miRNAs and pseudo hairpins. The 1675 pseudo hairpins are randomly assigned to the sub-datasets without replacement, so that each pseudo hairpin appears in exactly one sub-dataset. Table <tblr tid="T4">4</tblr> shows the size of these sub-datasets.</p>
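            <p>The random assignment of pseudo hairpins without replacement can be sketched as a shuffle followed by consecutive slicing. This is a minimal illustration, not the procedure actually used in this study: the function name split_without_replacement, the identifiers, the seed and the five sub-dataset sizes are all hypothetical (the real sizes are those listed in Table 4); the sizes below were only chosen to sum to 1675.</p>

```python
import random


def split_without_replacement(items, sizes, seed=0):
    """Randomly partition items into groups of the given sizes.
    Every item lands in exactly one group."""
    if sum(sizes) != len(items):
        raise ValueError("group sizes must sum to the number of items")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    groups, start = [], 0
    for size in sizes:
        groups.append(shuffled[start:start + size])
        start += size
    return groups


# Hypothetical identifiers and group sizes (assumption: not the Table 4 values).
hairpins = [f"pseudo_{i}" for i in range(1675)]
sizes = [700, 400, 300, 200, 75]
groups = split_without_replacement(hairpins, sizes)
print([len(g) for g in groups])
```

            <p>Pairing each group with the pre-miRNAs of one taxon then gives balanced sub-datasets, provided each group size equals the number of pre-miRNAs in that taxon.</p>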
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Performances of miPred and miR-KDE for the NH3350 dataset in terms of genus.</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <b>%SE</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>%SP</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>%ACC</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>%Fm</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>%MCC</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left" cspan="6">
                        <p>Vertebrata</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" indent="1">
                        <p>miPred</p>
                     </c>
                     <c ca="left">
                        <p>95.3%</p>
                     </c>
                     <c ca="left">
                        <p>88.8%</p>
                     </c>
                     <c ca="left">
                        <p>92.1%</p>
                     </c>
                     <c ca="left">
                        <p>92.3%</p>
                     </c>
                     <c ca="left">
                        <p>84.3%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" indent="1">
                        <p>miR-KDE</p>
                     </c>
                     <c ca="left">
                        <p>93.4%</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>92.8%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>93.1%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>93.2%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>86.3%</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" indent="2">
                        <p>with miPred's %SP</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>96.1%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>88.8%</p>
                     </c>
                     <c ca="left">
                        <p>92.5%</p>
                     </c>
                     <c ca="left">
                        <p>92.7%</p>
                     </c>
                     <c ca="left">
                        <p>85.2%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" cspan="6">
                        <p>Arthropoda</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" indent="1">
                        <p>miPred</p>
                     </c>
                     <c ca="left">
                        <p>98.8%</p>
                     </c>
                     <c ca="left">
                        <p>89.0%</p>
                     </c>
                     <c ca="left">
                        <p>93.9%</p>
                     </c>
                     <c ca="left">
                        <p>94.2%</p>
                     </c>
                     <c ca="left">
                        <p>88.2%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" indent="1">
                        <p>miR-KDE</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>100.0%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>92.0%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>96.0%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>96.2%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>92.3%</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" cspan="6">
                        <p>Viridiplantae</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" indent="1">
                        <p>miPred</p>
                     </c>
                     <c ca="left">
                        <p>98.2%</p>
                     </c>
                     <c ca="left">
                        <p>93.6%</p>
                     </c>
                     <c ca="left">
                        <p>95.9%</p>
                     </c>
                     <c ca="left">
                        <p>96.0%</p>
                     </c>
                     <c ca="left">
                        <p>91.9%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" indent="1">
                        <p>miR-KDE</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>98.4%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>95.0%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>96.7%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>96.8%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>93.4%</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" cspan="6">
                        <p>Nematoda</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" indent="1">
                        <p>miPred</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>97.2%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>90.4%</p>
                     </c>
                     <c ca="left">
                        <p>93.8%</p>
                     </c>
                     <c ca="left">
                        <p>94.0%</p>
                     </c>
                     <c ca="left">
                        <p>87.8%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" indent="1">
                        <p>miR-KDE</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>97.2%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>92.7%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>94.9%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>95.0%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>89.9%</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" cspan="6">
                        <p>Viruses</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" indent="1">
                        <p>miPred</p>
                     </c>
                     <c ca="left">
                        <p>97.2%</p>
                     </c>
                     <c ca="left">
                        <p>93.1%</p>
                     </c>
                     <c ca="left">
                        <p>95.1%</p>
                     </c>
                     <c ca="left">
                        <p>95.2%</p>
                     </c>
                     <c ca="left">
                        <p>90.4%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" indent="1">
                        <p>miR-KDE</p>
                     </c>
                     <c ca="left">
                        <p>94.4%</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>97.2%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>95.8%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>95.8%</p>
                     </c>
                     <c ca="left">
                        <p>91.7%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" indent="2">
                        <p>with miPred's %SP</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>98.6%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>93.1%</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>95.8%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>95.9%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>91.8%</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" cspan="6">
                        <p>Overall</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" indent="1">
                        <p>miPred</p>
                     </c>
                     <c ca="left">
                        <p>97.3% &#177; 1.3%</p>
                     </c>
                     <c ca="left">
                        <p>91.0% &#177; 2.3%</p>
                     </c>
                     <c ca="left">
                        <p>94.1% &#177; 1.5%</p>
                     </c>
                     <c ca="left">
                        <p>94.3% &#177; 1.4%</p>
                     </c>
                     <c ca="left">
                        <p>88.5% &#177; 2.9%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" indent="1">
                        <p>miR-KDE</p>
                     </c>
                     <c ca="left">
                        <p>96.7% &#177; 2.7%</p>
                     </c>
                     <c ca="left">
                        <p><b>93.9% </b>&#177; 2.1%</p>
                     </c>
                     <c ca="left">
                        <p><b>95.3% </b>&#177; 1.4%</p>
                     </c>
                     <c ca="left">
                        <p><b>95.4% </b>&#177; 1.4%</p>
                     </c>
                     <c ca="left">
                        <p><b>90.7% </b>&#177; 2.8%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" indent="2">
                        <p>with miPred's %SP</p>
                     </c>
                     <c ca="left">
                        <p><b>98.1% </b>&#177; 1.5%</p>
                     </c>
                     <c ca="left">
                        <p>92.3% &#177; 2.2%</p>
                     </c>
                     <c ca="left">
                        <p>95.2% &#177; 1.6%</p>
                     </c>
                     <c ca="left">
                        <p>95.3% &#177; 1.6%</p>
                     </c>
                     <c ca="left">
                        <p>90.5% &#177; 3.3%</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The best performance for each dataset is highlighted in bold.</p>
               </tblfn>
            </tbl>
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>Summary of sub-datasets derived from the NH3350 dataset.</p>
               </caption>
               <tblbdy cols="3">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Genus</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Number of pre-miRNAs</b>
                           <sup>1</sup>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Number of pseudo hairpins</b>
                           <sup>2</sup>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Vertebrata</p>
                     </c>
                     <c ca="left">
                        <p>824</p>
                     </c>
                     <c ca="left">
                        <p>824</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Arthropoda</p>
                     </c>
                     <c ca="left">
                        <p>163</p>
                     </c>
                     <c ca="left">
                        <p>163</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Viridiplantae</p>
                     </c>
                     <c ca="left">
                        <p>439</p>
                     </c>
                     <c ca="left">
                        <p>439</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Nematoda</p>
                     </c>
                     <c ca="left">
                        <p>177</p>
                     </c>
                     <c ca="left">
                        <p>177</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Viruses</p>
                     </c>
                     <c ca="left">
                        <p>72</p>
                     </c>
                     <c ca="left">
                        <p>72</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Overall</p>
                     </c>
                     <c ca="left">
                        <p>1675</p>
                     </c>
                     <c ca="left">
                        <p>1675</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p><sup>1</sup>Each sub-dataset contains pre-miRNAs from the corresponding genus. <sup>2</sup>All sub-datasets contain pseudo hairpins collected from the human genome.</p>
               </tblfn>
            </tbl>
            <p>In this experiment, miR-KDE outperforms miPred in terms of %SP, %ACC, %Fm and %MCC for all the genera. With respect to %SE, miR-KDE performs better than miPred in <it>Arthropoda </it>and <it>Viridiplantae</it>, equally in <it>Nematoda</it>, and worse in <it>Vertebrata </it>and <it>Viruses</it>. This is of particular interest because, among the five genera, <it>Vertebrata </it>is taxonomically the closest to humans, whereas <it>Viruses </it>is the most distant. One reasonable explanation is that viruses lack miRNA processing proteins such as Drosha, Dicer and RISC <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. Viral miRNAs rely on the processing proteins of their hosts to regulate viral expression after infection <abbrgrp><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr></abbrgrp>. Thus, virus-encoded pre-miRNAs are likely to have characteristics very similar to those of pre-miRNAs from the host (<it>i.e.</it>, human). Consequently, the good performance obtained when using human pre-miRNAs to predict <it>Arthropoda</it>, <it>Viridiplantae </it>and <it>Nematoda </it>pre-miRNAs indicates that miR-KDE is suitable for detecting species-specific pre-miRNAs.</p>
         </sec>
         <sec>
            <st>
               <p>Contribution of the classification mechanism</p>
            </st>
            <p>We next investigate the effect of using RVKDE by separating the two differences between miR-KDE and miPred: 1) introducing the four stem-loop features and 2) using RVKDE instead of SVM. Table <tblr tid="T5">5</tblr> shows the performance of the four possible predictors obtained by individually enabling or disabling the two differences. The best %SE, %SP, %ACC, %Fm and %MCC in Table <tblr tid="T5">5</tblr> are achieved by predictors with the four stem-loop features, regardless of the classification mechanism and the testing set. This observation indicates that the four stem-loop features are helpful in identifying pre-miRNAs. On the other hand, SVM delivers better %SE, while RVKDE delivers better %SP, regardless of the feature set and the testing set. With respect to the three overall measures (%ACC, %Fm and %MCC), RVKDE performs almost identically to SVM for the HU216 dataset, and has some advantages for the NH3350 dataset. This reveals that the advantage of miR-KDE for species-specific miRNA prediction in Table <tblr tid="T3">3</tblr> stems mainly from the classification mechanism.</p>
            <tbl id="T5">
               <title>
                  <p>Table 5</p>
               </title>
               <caption>
                  <p>Comparison of miPred and miR-KDE in terms of the feature set and the classification mechanism.</p>
               </caption>
               <tblbdy cols="11">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left" cspan="5">
                        <p>
                           <b>Without the four stem-loop features</b>
                           <sup>1</sup>
                        </p>
                     </c>
                     <c ca="left" cspan="5">
                        <p>
                           <b>With the four stem-loop features</b>
                           <sup>2</sup>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="5">
                        <hr/>
                     </c>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>%SE</p>
                     </c>
                     <c ca="left">
                        <p>%SP</p>
                     </c>
                     <c ca="left">
                        <p>%ACC</p>
                     </c>
                     <c ca="left">
                        <p>%Fm</p>
                     </c>
                     <c ca="left">
                        <p>%MCC</p>
                     </c>
                     <c ca="left">
                        <p>%SE</p>
                     </c>
                     <c ca="left">
                        <p>%SP</p>
                     </c>
                     <c ca="left">
                        <p>%ACC</p>
                     </c>
                     <c ca="left">
                        <p>%Fm</p>
                     </c>
                     <c ca="left">
                        <p>%MCC</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left" cspan="11">
                        <p>HU216<sup>3</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" indent="1">
                        <p>SVM</p>
                     </c>
                     <c ca="left">
                        <p>88.0%</p>
                     </c>
                     <c ca="left">
                        <p>88.0%</p>
                     </c>
                     <c ca="left">
                        <p>88.0%</p>
                     </c>
                     <c ca="left">
                        <p>88.0%</p>
                     </c>
                     <c ca="left">
                        <p>75.9%</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>90.7%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>90.7%</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>90.7%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>90.7%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>81.5%</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" indent="1">
                        <p>RVKDE</p>
                     </c>
                     <c ca="left">
                        <p>85.2%</p>
                     </c>
                     <c ca="left">
                        <p>90.7%</p>
                     </c>
                     <c ca="left">
                        <p>88.0%</p>
                     </c>
                     <c ca="left">
                        <p>87.6%</p>
                     </c>
                     <c ca="left">
                        <p>76.0%</p>
                     </c>
                     <c ca="left">
                        <p>88.9%</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>92.6%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>90.7%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>90.6%</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>81.5%</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" cspan="11">
                        <p>NH3350<sup>4</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" indent="1">
                        <p>SVM</p>
                     </c>
                     <c ca="left">
                        <p>96.7%</p>
                     </c>
                     <c ca="left">
                        <p>90.4%</p>
                     </c>
                     <c ca="left">
                        <p>93.6%</p>
                     </c>
                     <c ca="left">
                        <p>93.7%</p>
                     </c>
                     <c ca="left">
                        <p>87.3%</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>97.3%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>91.3%</p>
                     </c>
                     <c ca="left">
                        <p>94.3%</p>
                     </c>
                     <c ca="left">
                        <p>94.4%</p>
                     </c>
                     <c ca="left">
                        <p>88.7%</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left" indent="1">
                        <p>RVKDE</p>
                     </c>
                     <c ca="left">
                        <p>94.8%</p>
                     </c>
                     <c ca="left">
                        <p>93.4%</p>
                     </c>
                     <c ca="left">
                        <p>94.1%</p>
                     </c>
                     <c ca="left">
                        <p>94.1%</p>
                     </c>
                     <c ca="left">
                        <p>88.2%</p>
                     </c>
                     <c ca="left">
                        <p>95.8%</p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>93.5%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>94.7%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>94.7%</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>89.3%</b>
                        </p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The best performance for each dataset is highlighted in bold. <sup>1</sup>Using the 29 features in miPred. <sup>2</sup>Using the 33 features in miR-KDE, <it>i.e.</it>, the 29 features derived from miPred and the four stem-loop features. <sup>3</sup>Using the HU400 dataset to predict the HU216 dataset. <sup>4</sup>Using the HU400 dataset to predict the NH3350 dataset.</p>
               </tblfn>
            </tbl>
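            <p>The five measures in Table 5 can be recomputed from confusion-matrix counts. The following is a minimal sketch, assuming the standard definitions of sensitivity, specificity, accuracy, F-measure and Matthews correlation coefficient. The counts are not reported in the text; they are back-calculated here from the HU216 percentages (108 pre-miRNAs and 108 pseudo hairpins) purely for illustration, and the function name is hypothetical.</p>

```python
import math

def classification_metrics(tp, fn, tn, fp):
    """Return %SE, %SP, %ACC, %Fm and %MCC (rounded to one decimal) from counts."""
    se = tp / (tp + fn)                        # sensitivity: recovered pre-miRNAs
    sp = tn / (tn + fp)                        # specificity: rejected pseudo hairpins
    acc = (tp + tn) / (tp + fn + tn + fp)      # overall accuracy
    fm = 2 * tp / (2 * tp + fp + fn)           # F-measure (harmonic mean of precision and recall)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    values = [("SE", se), ("SP", sp), ("ACC", acc), ("Fm", fm), ("MCC", mcc)]
    return {name: round(100 * value, 1) for name, value in values}

# Assumed counts (not from the paper): chosen to match the HU216 rows
# without the four stem-loop features.
svm_hu216 = classification_metrics(tp=95, fn=13, tn=95, fp=13)
rvkde_hu216 = classification_metrics(tp=92, fn=16, tn=98, fp=10)
print(svm_hu216)    # SE 88.0, SP 88.0, ACC 88.0, Fm 88.0, MCC 75.9
print(rvkde_hu216)  # SE 85.2, SP 90.7, ACC 88.0, Fm 87.6, MCC 76.0
```

            <p>With these assumed counts the two classifiers make the same number of errors (26 of 216) but distribute them differently, which is why %ACC is identical while %SE and %SP move in opposite directions.</p>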
         </sec>
         <sec>
            <st>
               <p>Decision boundaries of SVM and RVKDE</p>
            </st>
            <p>To illustrate how RVKDE behaves in miRNA prediction, four cases are selected to show how it differs from SVM in terms of the decision boundary. For the four selected testing samples, miPred and miR-KDE make different predictions. In this analysis, miR-KDE uses only the 29 features derived from miPred, to exclude the effect of the four stem-loop features. Figure <figr fid="F1">1</figr> shows a testing pre-miRNA, <it>Caenorhabditis elegans </it>miR-260, and the training samples from HU400 on the decision boundary plots. The black circle represents the testing sample, red circles represent the training pre-miRNAs and blue circles represent the training pseudo hairpins. The background color indicates the predictor's decision. The details of generating the decision boundary plots can be found in the 'Materials and methods' section.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>
                     <b>The decision boundary plots, where (a) and (c) are generated by SVM and (b) and (d) are generated by RVKDE</b>
                  </p>
               </caption>
               <text>
                  <p><b>The decision boundary plots, where (a) and (c) are generated by SVM and (b) and (d) are generated by RVKDE. </b>The <it>x</it>-axis is the frequency of the dinucleotide "UU", and the <it>y</it>-axis is the base pairing propensity <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>. The black circle is the testing pre-miRNA, <it>Caenorhabditis elegans </it>miR-260. The red and blue circles represent positive and negative training samples, respectively. In (c) and (d), training samples not involved in the decision function of the classifiers are removed.</p>
               </text>
               <graphic file="1471-2105-9-S12-S2-1"/>
            </fig>
            <p>In Figure <figr fid="F1">1(a)</figr> and <figr fid="F1">1(b)</figr>, most of the training samples lie in the top-left part of the plane. In this region, both SVM and RVKDE conclude that samples with larger <it>y</it> values tend to be pre-miRNAs and samples with smaller <it>y</it> values tend to be pseudo hairpins. The main inconsistency between the two classifiers occurs in the regions containing fewer training samples. Figure <figr fid="F1">1(c)</figr> and <figr fid="F1">1(d)</figr> hide the training samples that are not used to construct the decision boundary. Namely, Figure <figr fid="F1">1(c)</figr> shows only the support vectors, and Figure <figr fid="F1">1(d)</figr> shows only the <it>kt </it>nearest training samples to the testing sample (see the 'Materials and methods' section for details). In this example, RVKDE exploits more local information and generates an irregular decision boundary.</p>
            <p>Figure <figr fid="F2">2</figr>, Figure <figr fid="F3">3</figr> and Figure <figr fid="F4">4</figr> show three other testing cases classified differently by miPred and miR-KDE. Figure <figr fid="F2">2</figr> shows a pseudo hairpin classified incorrectly by miPred and correctly by miR-KDE. Figure <figr fid="F3">3</figr> shows a pre-miRNA, <it>Zea mays </it>miR168a, classified correctly by miPred but incorrectly by miR-KDE. Finally, Figure <figr fid="F4">4</figr> shows a pseudo hairpin correctly classified by miPred but incorrectly by miR-KDE. All these figures share a common characteristic: the testing sample lies in a region with few training samples. In other words, the choice between global and local information matters less for samples that are very close to existing training samples. SVM is suitable for datasets with good consistency among samples. For example, SVM performs well when using HU400 to predict HU216 in Table <tblr tid="T5">5</tblr>, because both datasets are extracted from the same species. RVKDE is suitable for datasets in which information is stored in local regions, <it>i.e.</it>, datasets for which a single global model of all the samples is not appropriate. This is consistent with the observation that RVKDE has some advantages when using human pre-miRNAs to predict pre-miRNAs from genera that are taxonomically distant from humans.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>
                     <b>The decision boundary plots, where (a) and (c) are generated by SVM and (b) and (d) are generated by RVKDE</b>
                  </p>
               </caption>
               <text>
                  <p><b>The decision boundary plots, where (a) and (c) are generated by SVM and (b) and (d) are generated by RVKDE. </b>The <it>x</it>-axis is the frequency of the dinucleotide "CC", and the <it>y</it>-axis is the frequency of the dinucleotide "GG". The black circle is a testing pseudo hairpin. The red and blue circles represent positive and negative training samples, respectively. In (c) and (d), training samples not involved in the decision function of the classifiers are removed.</p>
               </text>
               <graphic file="1471-2105-9-S12-S2-2"/>
            </fig>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>
                     <b>The decision boundary plots, where (a) and (c) are generated by SVM and (b) and (d) are generated by RVKDE</b>
                  </p>
               </caption>
               <text>
                  <p><b>The decision boundary plots, where (a) and (c) are generated by SVM and (b) and (d) are generated by RVKDE. </b>The <it>x</it>-axis is the frequency of the dinucleotide "CG", and the <it>y</it>-axis is the ratio of the minimum free energy to the sequence length <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>. The black circle is the testing pre-miRNA, <it>Zea mays </it>miR168a. The red and blue circles represent positive and negative training samples, respectively. In (c) and (d), training samples not involved in the decision function of the classifiers are removed.</p>
               </text>
               <graphic file="1471-2105-9-S12-S2-3"/>
            </fig>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>
                     <b>The decision boundary plots, where (a) and (c) are generated by SVM and (b) and (d) are generated by RVKDE</b>
                  </p>
               </caption>
               <text>
                  <p><b>The decision boundary plots, where (a) and (c) are generated by SVM and (b) and (d) are generated by RVKDE. </b>The <it>x</it>-axis is the frequency of the dinucleotide "GG", and the <it>y</it>-axis is the ratio of the minimum free energy to the sequence length <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>. The black circle is a testing pseudo hairpin. The red and blue circles represent positive and negative training samples, respectively. In (c) and (d), training samples not involved in the decision function of the classifiers are removed.</p>
               </text>
               <graphic file="1471-2105-9-S12-S2-4"/>
            </fig>
            <p>In summary, SVM and RVKDE are two distinct classification mechanisms. SVM uses support vectors to model the global information of the training samples and to avoid being misled by a few noisy samples. RVKDE is instance-based and highly dependent on the local information of the training samples. The variable variance of each kernel function (see the 'Materials and methods' section for details) allows RVKDE to deliver better performance than conventional instance-based classifiers and to achieve the same level of performance as SVM <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>.</p>
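            <p>The idea of an instance-based classifier whose kernels have sample-specific widths can be sketched as follows. This is a generic variable-bandwidth Gaussian kernel density classifier written for illustration only; it is not the RVKDE formulation of the cited reference, and the parameter names <it>k_width </it>and <it>k_used</it>, the bandwidth rule and the two-dimensional toy points are all assumptions.</p>

```python
import math

def knn_distances(x, samples, k):
    """Sorted Euclidean distances from x to its k nearest points in samples."""
    return sorted(math.dist(x, s) for s in samples)[:k]

def local_density(x, samples, k_width=2, k_used=5):
    """Gaussian kernel density at x built from the k_used nearest samples of one class."""
    nearest = sorted(samples, key=lambda s: math.dist(x, s))[:k_used]
    dim = len(x)
    total = 0.0
    for s in nearest:
        # Assumed bandwidth rule: distance from s to its k_width-th nearest
        # same-class neighbour (index 0 is s itself), so kernels widen in sparse regions.
        sigma = knn_distances(s, samples, k_width + 1)[-1] or 1e-9
        d = math.dist(x, s)
        norm = (math.sqrt(2 * math.pi) * sigma) ** dim
        total += math.exp(-d * d / (2 * sigma * sigma)) / norm
    return total / len(samples)

def classify(x, positives, negatives):
    """Label x by the class with the larger prior-weighted local density."""
    n = len(positives) + len(negatives)
    score_pos = len(positives) / n * local_density(x, positives)
    score_neg = len(negatives) / n * local_density(x, negatives)
    return "pre-miRNA" if score_pos >= score_neg else "pseudo hairpin"

# Toy feature vectors (invented): positives cluster at high y, negatives at low y.
positives = [(0.10, 0.90), (0.20, 0.80), (0.15, 0.85), (0.25, 0.95)]
negatives = [(0.10, 0.20), (0.20, 0.30), (0.15, 0.25), (0.80, 0.50)]
print(classify((0.18, 0.82), positives, negatives))
print(classify((0.12, 0.22), positives, negatives))
```

            <p>Because only the nearest training samples contribute to each estimate, the decision for a testing sample depends on its local neighbourhood rather than on a single global separating surface, which is the contrast with SVM drawn above.</p>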
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>Many efforts have been made to discover pre-miRNAs over the years. Recently, several <it>ab initio </it>approaches have attracted particular interest because of their ability to discover species-specific pre-miRNAs, which usually evade comparative approaches. This study develops a novel <it>ab initio </it>miRNA predictor by focusing on the classification mechanism. The adopted RVKDE exploits more local information from the training samples than the widely used SVM. Experimental results show that exploiting more local information makes miR-KDE more suitable for species-specific miRNA prediction. The decision boundary analysis shows that different machine learning algorithms have different advantages. These results encourage further work on classification methodology as well as feature extraction in miRNA prediction.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Datasets</p>
            </st>
            <p>A total of 4039 miRNA precursors spanning 45 species are downloaded from the miRBase registry database <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> (release 8.2). The CD-HIT clustering algorithm <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> with the similarity threshold set to 0.9 is then invoked to exclude homologous sequences <abbrgrp><abbr bid="B25">25</abbr><abbr bid="B28">28</abbr></abbrgrp>. Pre-miRNAs whose secondary structures contain multiple loops are excluded. The resultant positive set contains 1983 non-redundant pre-miRNAs from 40 species, including 308 human pre-miRNAs.</p>
            <p>For the negative set, we analyze 8494 pseudo hairpins from the protein-coding regions (CDSs) according to RefSeq <abbrgrp><abbr bid="B41">41</abbr></abbrgrp> and UCSC refGene <abbrgrp><abbr bid="B42">42</abbr></abbrgrp> annotations. These RNA sequences are extracted from genomic regions where no experimentally validated splicing event has been reported <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. For each of the 8494 RNA sequences, we first predict its secondary structure with RNAfold <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>. Sequences whose predicted secondary structure has &lt;18 base pairs on the stem, a minimum free energy > -25 kcal/mol, or multiple loops are removed. In total, 3988 pseudo hairpins are collected. These pseudo hairpins are sequence segments that are similar to genuine pre-miRNAs in terms of length, stem-loop structure and number of bulges, but have not been reported as pre-miRNAs.</p>
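            <p>The three structural filters above can be sketched as a small predicate over a predicted fold. This is a minimal sketch under stated assumptions: the secondary structure is given in dot-bracket notation together with its minimum free energy, a sequence is discarded if it fails any one of the three criteria, and "multiple loops" is interpreted as more than one hairpin loop. The function names and the example structures are hypothetical.</p>

```python
import re

MIN_STEM_PAIRS = 18   # fewer base pairs on the stem -> removed
MAX_MFE = -25.0       # kcal/mol; a higher (less stable) energy -> removed

def count_base_pairs(structure):
    """Number of base pairs in a dot-bracket string (one per opening bracket)."""
    return structure.count("(")

def count_hairpin_loops(structure):
    """Number of hairpin loops: an opening bracket closed after only unpaired bases."""
    return len(re.findall(r"\(\.*\)", structure))

def keep_as_pseudo_hairpin(structure, mfe):
    """True if the predicted fold passes all three filters."""
    return (count_base_pairs(structure) >= MIN_STEM_PAIRS
            and MAX_MFE >= mfe
            and count_hairpin_loops(structure) == 1)

# Invented example folds, not taken from the datasets.
single = "(" * 20 + "." * 6 + ")" * 20                 # one stem, one loop
short = "(" * 10 + "...." + ")" * 10                   # too few base pairs
arm = "(" * 9 + "...." + ")" * 9
branched = "((((" + arm + ".." + arm + "))))"          # two hairpin loops

print(keep_as_pseudo_hairpin(single, -30.2))    # True
print(keep_as_pseudo_hairpin(single, -12.0))    # False: energy above -25 kcal/mol
print(keep_as_pseudo_hairpin(short, -30.2))     # False: only 10 base pairs
print(keep_as_pseudo_hairpin(branched, -30.2))  # False: multiple loops
```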
            <p>Based on the positive and negative sets, one training set and two test sets are built to evaluate the miRNA predictors. The training set, HU400, comprises 200 human pre-miRNAs and 200 pseudo hairpins randomly selected from the positive and negative sets, respectively. The HU400 dataset is used for parameter estimation and model construction of the miRNA predictors. The first test set, HU216, comprises the remaining 108 human pre-miRNAs and 108 randomly selected pseudo hairpins. The HU216 dataset is used to evaluate the prediction performance for human pre-miRNAs. The second test set, NH3350, comprises the 1675 non-human pre-miRNAs and 1675 randomly selected pseudo hairpins. The NH3350 dataset is used to evaluate the prediction performance for species-specific pre-miRNAs. Table <tblr tid="T6">6</tblr> summarizes these sets. Care has been taken to guarantee that no pseudo hairpin is included in the three datasets more than once.</p>
            <tbl id="T6">
               <title>
                  <p>Table 6</p>
               </title>
               <caption>
                  <p>Summary of the datasets employed in this study.</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Dataset</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Number of pre-miRNAs</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Number of pseudo hairpins</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Source of pre-miRNAs</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>HU400</p>
                     </c>
                     <c ca="left">
                        <p>200</p>
                     </c>
                     <c ca="left">
                        <p>200</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Homo sapiens</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>HU216</p>
                     </c>
                     <c ca="left">
                        <p>108</p>
                     </c>
                     <c ca="left">
                        <p>108</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Homo sapiens</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>NH3350</p>
                     </c>
                     <c ca="left">
                        <p>1675</p>
                     </c>
                     <c ca="left">
                        <p>1675</p>
                     </c>
                     <c ca="left">
                        <p>39 non-human species</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
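            <p>One simple way to guarantee that no pseudo hairpin is used in more than one dataset is to draw all negatives in a single pass without replacement and then partition the draw. The sketch below illustrates this idea with the dataset sizes of Table 6; the function name <it>draw_negative_sets </it>and the fixed random seed are assumptions made for illustration, and the actual sampling procedure used in this study may differ.</p>
            <p>
```python
import random

N_PSEUDO_HAIRPINS = 3988   # size of the negative pool
NEGATIVES_PER_SET = {"HU400": 200, "HU216": 108, "NH3350": 1675}


def draw_negative_sets(n_pool, sizes, seed=0):
    """Draw disjoint index sets from a pool of n_pool pseudo hairpins."""
    rng = random.Random(seed)
    # Sampling without replacement: every index appears at most once.
    drawn = rng.sample(range(n_pool), sum(sizes.values()))
    sets, start = {}, 0
    for name, size in sizes.items():
        sets[name] = drawn[start:start + size]
        start += size
    return sets


negative_sets = draw_negative_sets(N_PSEUDO_HAIRPINS, NEGATIVES_PER_SET)
```
            </p>
            <p>The three sets together use 200 + 108 + 1675 = 1983 of the 3988 pseudo hairpins, matching the 1983 pre-miRNAs of the positive set (308 human and 1675 non-human).</p>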
         </sec>
         <sec>
            <st>
               <p>Feature set</p>
            </st>
            <p>In miR-KDE, each hairpin-like sequence is summarized as a 33-dimensional feature vector. The first 29 features are derived from miPred <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>, including 17 sequence composition variables, 6 folding measures, 1 topological descriptor, and 5 normalized variants. The 17 sequence composition variables comprise 16 dinucleotide frequencies and the proportion of G and C in the RNA molecule. The other features, including base pairing propensity <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>, Minimum Free Energy (MFE) and its variants <abbrgrp><abbr bid="B45">45</abbr><abbr bid="B46">46</abbr><abbr bid="B47">47</abbr></abbrgrp>, base pair distance <abbrgrp><abbr bid="B46">46</abbr><abbr bid="B48">48</abbr></abbrgrp>, Shannon entropy <abbrgrp><abbr bid="B46">46</abbr></abbrgrp> and degree of compactness <abbrgrp><abbr bid="B49">49</abbr><abbr bid="B50">50</abbr></abbrgrp>, have been shown to be useful in miRNA prediction.</p>
            <p>In addition, we introduce four features that focus on the continuously paired nucleotides on the stem and the loop length of hairpin structures. The four "stem-loop" features are based on the RNA secondary structures predicted with the RNAfold program <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>. Figure <figr fid="F5">5</figr> shows an example of the predicted RNA secondary structure in which each nucleotide has two states, "paired" or "unpaired", indicated by brackets and dots, respectively. A left bracket "(" indicates a paired nucleotide located at the 5' strand that forms a pair with another nucleotide at the 3' strand, indicated by a right bracket ")". As shown in Figure <figr fid="F5">5</figr>, the first stem-loop feature is "hairpin length", defined as the number of nucleotides from the first paired nucleotide at the 5' strand to its partner, the last paired nucleotide at the 3' strand. The second stem-loop feature is "loop length", defined as the number of nucleotides between the last paired nucleotide at the 5' strand and its partner, the first paired nucleotide at the 3' strand. The third stem-loop feature is "consecutive base-pairs", defined as the length of the longest run of successive base-pairs. The fourth stem-loop feature is the ratio of loop length to hairpin length.</p>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>The <it>Homo sapiens </it>miR-611 stem-loop structure</p>
               </caption>
               <text>
                  <p><b>The <it>Homo sapiens </it>miR-611 stem-loop structure</b>. The RNA sequence and its corresponding secondary structure sequence predicted by RNAfold <abbrgrp><abbr bid="B43">43</abbr></abbrgrp> are shown. In the secondary structure sequence, each nucleotide has two states, "paired" or "unpaired", indicated by brackets and dots, respectively. A left bracket "(" indicates a paired nucleotide located at the 5' strand that forms a pair with another nucleotide at the 3' strand, indicated by a right bracket ")". The hairpin length of this sample pre-miRNA is 25+8+25 = 58. Its loop length is 8, and its longest run of consecutive base pairs is also 8.</p>
               </text>
               <graphic file="1471-2105-9-S12-S2-5"/>
            </fig>
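            <p>The four stem-loop features can be computed directly from the dot-bracket string. The following sketch is one possible reading of the definitions above, not the implementation used in miR-KDE: it assumes a single-loop hairpin and interprets "consecutive base-pairs" as the longest run of stacked pairs, i.e. pairs whose 5' and 3' partners are both adjacent to those of the previous pair. The function name <it>stem_loop_features </it>is hypothetical.</p>
            <p>
```python
def stem_loop_features(structure):
    """Compute the four stem-loop features from a dot-bracket string."""
    stack, pairs = [], []
    for idx, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(idx)
        elif symbol == ")":
            pairs.append((stack.pop(), idx))
    if not pairs:
        raise ValueError("structure contains no base pairs")

    pairs.sort()                         # outermost pair first, innermost last
    outer_open, outer_close = pairs[0]
    inner_open, inner_close = pairs[-1]
    hairpin_length = outer_close - outer_open + 1
    loop_length = inner_close - inner_open - 1

    # Longest run of stacked base pairs (no bulge on either strand).
    longest = run = 1
    for (prev_open, prev_close), (cur_open, cur_close) in zip(pairs, pairs[1:]):
        stacked = cur_open == prev_open + 1 and cur_close == prev_close - 1
        run = run + 1 if stacked else 1
        longest = max(longest, run)

    return {
        "hairpin_length": hairpin_length,
        "loop_length": loop_length,
        "consecutive_base_pairs": longest,
        "loop_to_hairpin_ratio": loop_length / hairpin_length,
    }


features = stem_loop_features("..((((.(((....))).))))..")
```
            </p>
            <p>For this toy structure, the hairpin spans 20 nucleotides, the loop contains 4 nucleotides, and the longest stacked run has 4 base pairs, giving a loop-to-hairpin ratio of 0.2.</p>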
         </sec>
         <sec>
            <st>
               <p>Relaxed variable kernel density estimator</p>
            </st>
            <p>MiR-KDE transforms samples into feature vectors as described above and then uses them to construct a relaxed variable kernel density estimator (RVKDE) <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. A kernel density estimator is in fact an approximate probability density function. Let {<b>s</b><sub>1</sub>, <b>s</b><sub>2 </sub>...<b>s</b><sub><it>n</it></sub>} be a set of sampling instances randomly and independently taken from the distribution governed by <it>f</it><sub><it>X </it></sub>in the <it>m</it>-dimensional vector space. Then, with the RVKDE algorithm, the value of <it>f</it><sub><it>X </it></sub>at point <b>v </b>is estimated as follows:</p>
            <p><inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-9-S12-S2-i1"><m:semantics><m:mrow><m:mover accent="true"><m:mi>f</m:mi><m:mo>^</m:mo></m:mover><m:mo stretchy="false">(</m:mo><m:mstyle mathsize="normal" mathvariant="bold"><m:mi>v</m:mi></m:mstyle><m:mo stretchy="false">)</m:mo><m:mo>=</m:mo><m:mfrac><m:mn>1</m:mn><m:mrow><m:mo>|</m:mo><m:mi>n</m:mi><m:mo>|</m:mo></m:mrow></m:mfrac><m:mstyle displaystyle="true"><m:munder><m:mo>&#8721;</m:mo><m:mrow><m:msub><m:mstyle mathsize="normal" mathvariant="bold"><m:mi>s</m:mi></m:mstyle><m:mi>i</m:mi></m:msub></m:mrow></m:munder><m:mrow><m:msup><m:mrow><m:mrow><m:mo>(</m:mo><m:mrow><m:mfrac><m:mn>1</m:mn><m:mrow><m:msqrt><m:mrow><m:mn>2</m:mn><m:mi>&#960;</m:mi></m:mrow></m:msqrt><m:mo>&#8901;</m:mo><m:msub><m:mi>&#963;</m:mi><m:mi>i</m:mi></m:msub></m:mrow></m:mfrac></m:mrow><m:mo>)</m:mo></m:mrow></m:mrow><m:mi>m</m:mi></m:msup><m:mi>exp</m:mi><m:mo>&#8289;</m:mo><m:mrow><m:mo>(</m:mo><m:mrow><m:mo>&#8722;</m:mo><m:mfrac><m:mrow><m:mo>|</m:mo><m:mo>|</m:mo><m:mstyle mathsize="normal" mathvariant="bold"><m:mi>v</m:mi></m:mstyle><m:mo>&#8722;</m:mo><m:msub><m:mstyle mathsize="normal" mathvariant="bold"><m:mi>s</m:mi></m:mstyle><m:mi>i</m:mi></m:msub><m:mo>|</m:mo><m:msup><m:mo>|</m:mo><m:mn>2</m:mn></m:msup></m:mrow><m:mrow><m:mn>2</m:mn><m:msubsup><m:mi>&#963;</m:mi><m:mi>i</m:mi><m:mn>2</m:mn></m:msubsup></m:mrow></m:mfrac></m:mrow><m:mo>)</m:mo></m:mrow></m:mrow></m:mstyle><m:mo>,</m:mo></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGafmOzayMbaKaacqGGOaakcqWH2bGDcqGGPaqkcqGH9aqpjuaGdaWcaaqaaiabigdaXaqaaiabcYha8jabd6gaUjabcYha8baakmaaqafabaWaaeWaaKqbagaadaWcaaqaaiabigdaXaqaamaakaaabaGaeGOmaiJaeqiWdahabeaacqGHflY1cqaHdpWCdaWgaaqaaiabdMgaPbqabaaaaaGccaGLOaGaayzkaaWaaWbaaSqabeaacqWGTbqBaaGccyGGLbqzcqGG4baEcqGGWbaCdaqadaqaaiabgkHiTKqbaoaalaaabaGaeiiFaWNaeiiFaWNaeCODayNaeyOeI0IaeC4Cam3aaSbaaeaacqWGPbqAaeqaaiabcYha8jabcYha8naaCaaabeqaaiabikdaYaaaaeaacqaIYaGmcqaHdpWCdaqhaaqaaiabdMgaPbqaaiabikdaYaaaaaaakiaawIcacaGLPaaaaSqaaiabhohaZnaaBaaameaacqWGPbqAaeqaaaWcbeqdcqGHris5aOGaeiilaWcaaa@63AB@</m:annotation></m:semantics></m:math></inline-formula>, where</p>
            <p>1) <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-9-S12-S2-i2"><m:semantics><m:mrow><m:msub><m:mi>&#963;</m:mi><m:mi>i</m:mi></m:msub><m:mo>=</m:mo><m:mi>&#946;</m:mi><m:mfrac><m:mrow><m:mi>R</m:mi><m:mo stretchy="false">(</m:mo><m:msub><m:mstyle mathsize="normal" mathvariant="bold"><m:mi>s</m:mi></m:mstyle><m:mi>i</m:mi></m:msub><m:mo stretchy="false">)</m:mo><m:msqrt><m:mi>&#960;</m:mi></m:msqrt></m:mrow><m:mrow><m:mroot><m:mrow><m:mo stretchy="false">(</m:mo><m:mi>k</m:mi><m:mo>+</m:mo><m:mn>1</m:mn><m:mo stretchy="false">)</m:mo><m:mi>&#915;</m:mi><m:mo stretchy="false">(</m:mo><m:mstyle scriptlevel="+1"><m:mfrac><m:mi>m</m:mi><m:mn>2</m:mn></m:mfrac></m:mstyle><m:mo>+</m:mo><m:mn>1</m:mn><m:mo stretchy="false">)</m:mo></m:mrow><m:mi>m</m:mi></m:mroot></m:mrow></m:mfrac></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeq4Wdm3aaSbaaSqaaiabdMgaPbqabaGccqGH9aqpcqaHYoGyjuaGdaWcaaqaaiabdkfasjabcIcaOiabhohaZnaaBaaabaGaemyAaKgabeaacqGGPaqkdaGcaaqaaiabec8aWbqabaaabaWaaOqaaeaacqGGOaakcqWGRbWAcqGHRaWkcqaIXaqmcqGGPaqkcqqHtoWrcqGGOaakdaWcbaqaaiabd2gaTbqaaiabikdaYaaacqGHRaWkcqaIXaqmcqGGPaqkaeaacqWGTbqBaaaaaaaa@47B1@</m:annotation></m:semantics></m:math></inline-formula>;</p>
            <p>2) <it>R</it>(<b>s</b><sub><it>i</it></sub>) is the maximum distance between <b>s</b><sub><it>i </it></sub>and its <it>ks </it>nearest training instances;</p>
            <p>3) &#915; (&#183;) is the Gamma function <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>;</p>
            <p>4) <it>&#946; </it>and <it>ks </it>are parameters to be set either through cross-validation or by the user.</p>
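            <p>The estimator defined above can be sketched in a few lines. This is a minimal illustration rather than the RVKDE implementation used in this study. It assumes that the <it>k </it>appearing in the expression for the kernel width is the parameter <it>ks</it>, that distances are Euclidean, and that no two training instances coincide (otherwise a kernel width of zero would result). The function names are hypothetical.</p>
            <p>
```python
import math


def rvkde_sigmas(samples, ks, beta):
    """Kernel width sigma_i for every training instance s_i."""
    m = len(samples[0])                           # dimension of the vector space
    denom = ((ks + 1) * math.gamma(m / 2 + 1)) ** (1.0 / m)
    sigmas = []
    for i, s in enumerate(samples):
        dists = sorted(math.dist(s, t)
                       for j, t in enumerate(samples) if j != i)
        radius = dists[ks - 1]                    # R(s_i): distance to ks-th neighbour
        sigmas.append(beta * radius * math.sqrt(math.pi) / denom)
    return sigmas


def rvkde_density(v, samples, sigmas):
    """Estimated density at point v: average of per-instance Gaussian kernels."""
    m = len(v)
    total = 0.0
    for s, sigma in zip(samples, sigmas):
        norm = (1.0 / (math.sqrt(2 * math.pi) * sigma)) ** m
        total += norm * math.exp(-math.dist(v, s) ** 2 / (2 * sigma ** 2))
    return total / len(samples)
```
            </p>
            <p>As a sanity check, for one-dimensional samples at 0, 1, 2 and 3 with <it>ks </it>= 1 and <it>&#946; </it>= 1, every kernel width evaluates to 1, so the estimate reduces to an average of four unit-variance Gaussians.</p>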
            <p>For the prediction of pre-miRNAs, two kernel density estimators are constructed to approximate the distributions of pre-miRNAs and pseudo hairpins in the training set, respectively. As mentioned above, in our implementation, each RNA sequence is represented as a 33-dimensional feature vector. Then, a query instance located at <b>v </b>is assigned to the class that gives the maximum value among the likelihood functions defined as follows:</p>
            <p>
               <display-formula>
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-9-S12-S2-i3">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>L</m:mi>
                              <m:mi>j</m:mi>
                           </m:msub>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mstyle mathsize="normal" mathvariant="bold">
                              <m:mi>v</m:mi>
                           </m:mstyle>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:mo>|</m:mo>
                                 <m:msub>
                                    <m:mi>S</m:mi>
                                    <m:mi>j</m:mi>
                                 </m:msub>
                                 <m:mo>|</m:mo>
                                 <m:mo>&#8901;</m:mo>
                                 <m:msub>
                                    <m:mover accent="true">
                                       <m:mi>f</m:mi>
                                       <m:mo>^</m:mo>
                                    </m:mover>
                                    <m:mi>j</m:mi>
                                 </m:msub>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mstyle mathsize="normal" mathvariant="bold">
                                    <m:mi>v</m:mi>
                                 </m:mstyle>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                              <m:mrow>
                                 <m:mstyle displaystyle="true">
                                    <m:munder>
                                       <m:mo>&#8721;</m:mo>
                                       <m:mi>h</m:mi>
                                    </m:munder>
                                    <m:mrow>
                                       <m:mo>|</m:mo>
                                       <m:msub>
                                          <m:mi>S</m:mi>
                                          <m:mi>h</m:mi>
                                       </m:msub>
                                       <m:mo>|</m:mo>
                                       <m:mo>&#8901;</m:mo>
                                       <m:msub>
                                          <m:mover accent="true">
                                             <m:mi>f</m:mi>
                                             <m:mo>^</m:mo>
                                          </m:mover>
                                          <m:mi>h</m:mi>
                                       </m:msub>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mstyle mathsize="normal" mathvariant="bold">
                                          <m:mi>v</m:mi>
                                       </m:mstyle>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                 </m:mstyle>
                              </m:mrow>
                           </m:mfrac>
                           <m:mo>,</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemitaW0aaSbaaSqaaiabdQgaQbqabaGccqGGOaakcqWH2bGDcqGGPaqkcqGH9aqpjuaGdaWcaaqaaiabcYha8jabdofatnaaBaaabaGaemOAaOgabeaacqGG8baFcqGHflY1cuWGMbGzgaqcamaaBaaabaGaemOAaOgabeaacqGGOaakcqWH2bGDcqGGPaqkaeaadaaeqbqaaiabcYha8jabdofatnaaBaaabaGaemiAaGgabeaacqGG8baFcqGHflY1cuWGMbGzgaqcamaaBaaabaGaemiAaGgabeaacqGGOaakcqWH2bGDcqGGPaqkaeaacqWGObaAaeqacqGHris5aaaakiabcYcaSaaa@53F6@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where |<it>S</it><sub><it>j</it></sub>| is the number of class-<it>j </it>training instances, and <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-9-S12-S2-i4"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>f</m:mi><m:mo>^</m:mo></m:mover><m:mi>j</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGafmOzayMbaKaadaWgaaWcbaGaemOAaOgabeaaaaa@2EC3@</m:annotation></m:semantics></m:math></inline-formula>(<b>v</b>) is the kernel density estimator corresponding to class-<it>j </it>training instances. In our current implementation, in order to improve the efficiency of the predictor, we include only a limited number, denoted by <it>kt</it>, of the nearest class-<it>j </it>training instances of <b>v </b>while computing <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-9-S12-S2-i4"><m:semantics><m:mrow><m:msub><m:mover accent="true"><m:mi>f</m:mi><m:mo>^</m:mo></m:mover><m:mi>j</m:mi></m:msub></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGafmOzayMbaKaadaWgaaWcbaGaemOAaOgabeaaaaa@2EC3@</m:annotation></m:semantics></m:math></inline-formula>(<b>v</b>). <it>kt </it>is also a parameter to be set either through cross-validation or by the user.</p>
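            <p>Given the per-class density estimates at a query point, the likelihood function and the resulting class decision can be sketched as below. The sketch takes the density values as inputs, so it is independent of how they are computed (for example, from only the <it>kt </it>nearest class-<it>j </it>training instances); the function names and the class labels in the example are assumptions made for illustration.</p>
            <p>
```python
def class_likelihoods(class_sizes, densities):
    """L_j(v) = |S_j| * f_j(v) / sum_h |S_h| * f_h(v) for every class j."""
    weighted = {c: class_sizes[c] * densities[c] for c in class_sizes}
    total = sum(weighted.values())
    return {c: w / total for c, w in weighted.items()}


def predict_class(class_sizes, densities):
    """Return the class with the maximum likelihood value."""
    scores = class_likelihoods(class_sizes, densities)
    return max(scores, key=scores.get)


# Hypothetical density values at one query point, for illustration only.
sizes = {"pre-miRNA": 200, "pseudo hairpin": 200}
density_at_v = {"pre-miRNA": 0.03, "pseudo hairpin": 0.01}
scores = class_likelihoods(sizes, density_at_v)
label = predict_class(sizes, density_at_v)
```
            </p>
            <p>With equal class sizes, as in HU400, the class sizes cancel and the decision depends only on which class has the larger estimated density at <b>v</b>.</p>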
         </sec>
         <sec>
            <st>
               <p>Comparison between RVKDE and SVM</p>
            </st>
            <p>This subsection highlights some characteristics of RVKDE by comparing it to SVM. RVKDE belongs to the family of radial basis function networks (RBFNs), a special type of neural network with several distinctive features <abbrgrp><abbr bid="B52">52</abbr><abbr bid="B53">53</abbr></abbrgrp>. The decision function of two-class RVKDE can be simplified as follows:</p>
            <p>
               <display-formula id="M1">
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-9-S12-S2-i5">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>f</m:mi>
                              <m:mrow>
                                 <m:mtext>RVKDE</m:mtext>
                              </m:mrow>
                           </m:msub>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mstyle mathsize="normal" mathvariant="bold">
                              <m:mi>v</m:mi>
                           </m:mstyle>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munder>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:msub>
                                       <m:mstyle mathsize="normal" mathvariant="bold">
                                          <m:mi>s</m:mi>
                                       </m:mstyle>
                                       <m:mi>i</m:mi>
                                    </m:msub>
                                 </m:mrow>
                              </m:munder>
                              <m:mrow>
                                 <m:msub>
                                    <m:mi>y</m:mi>
                                    <m:mi>i</m:mi>
                                 </m:msub>
                                 <m:mo>&#8901;</m:mo>
                                 <m:mfrac>
                                    <m:mn>1</m:mn>
                                    <m:mrow>
                                       <m:msub>
                                          <m:mi>&#963;</m:mi>
                                          <m:mi>i</m:mi>
                                       </m:msub>
                                    </m:mrow>
                                 </m:mfrac>
                                 <m:mo>&#8901;</m:mo>
                                 <m:mi>exp</m:mi>
                                 <m:mo>&#8289;</m:mo>
                                 <m:mrow>
                                    <m:mo>(</m:mo>
                                    <m:mrow>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mfrac>
                                          <m:mrow>
                                             <m:mo>|</m:mo>
                                             <m:mo>|</m:mo>
                                             <m:mstyle mathsize="normal" mathvariant="bold">
                                                <m:mi>v</m:mi>
                                             </m:mstyle>
                                             <m:mo>&#8722;</m:mo>
                                             <m:msub>
                                                <m:mstyle mathsize="normal" mathvariant="bold">
                                                   <m:mi>s</m:mi>
                                                </m:mstyle>
                                                <m:mi>i</m:mi>
                                             </m:msub>
                                             <m:mo>|</m:mo>
                                             <m:msup>
                                                <m:mo>|</m:mo>
                                                <m:mn>2</m:mn>
                                             </m:msup>
                                          </m:mrow>
                                          <m:mrow>
                                             <m:mn>2</m:mn>
                                             <m:msubsup>
                                                <m:mi>&#963;</m:mi>
                                                <m:mi>i</m:mi>
                                                <m:mn>2</m:mn>
                                             </m:msubsup>
                                          </m:mrow>
                                       </m:mfrac>
                                    </m:mrow>
                                    <m:mo>)</m:mo>
                                 </m:mrow>
                              </m:mrow>
                           </m:mstyle>
                           <m:mo>,</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemOzay2aaSbaaSqaaiabbkfasjabbAfawjabbUealjabbseaejabbweafbqabaGccqGGOaakcqWH2bGDcqGGPaqkcqGH9aqpdaaeqbqaaiabdMha5naaBaaaleaacqWGPbqAaeqaaOGaeyyXICDcfa4aaSaaaeaacqaIXaqmaeaacqaHdpWCdaWgaaqaaiabdMgaPbqabaaaaOGaeyyXICTagiyzauMaeiiEaGNaeiiCaa3aaeWaaeaacqGHsisljuaGdaWcaaqaaiabcYha8jabcYha8jabhAha2jabgkHiTiabhohaZnaaBaaabaGaemyAaKgabeaacqGG8baFcqGG8baFdaahaaqabeaacqaIYaGmaaaabaGaeGOmaiJaeq4Wdm3aa0baaeaacqWGPbqAaeaacqaIYaGmaaaaaaGccaGLOaGaayzkaaaaleaacqWHZbWCdaWgaaadbaGaemyAaKgabeaaaSqab0GaeyyeIuoakiabcYcaSaaa@62E3@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where <b>v </b>is a testing sample, <it>y</it><sub><it>i </it></sub>is the class label of a training sample <b>s</b><sub><it>i</it></sub>, either +1 (positive) or -1 (negative), and <it>&#963;</it><sub><it>i </it></sub>is the local density of the proximity of <b>s</b><sub><it>i</it></sub>, estimated by the kernel density estimation algorithm. The testing sample <b>v </b>is classified as positive if <it>f</it><sub>RVKDE</sub>(<b>v</b>) &#8805; 0, and as negative otherwise. Interestingly, the decision function in Eq. (1) is very similar to the one in SVM using the radial basis function (RBF) kernel:</p>
            <p>
               <display-formula id="M2">
                  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-9-S12-S2-i6">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>f</m:mi>
                              <m:mrow>
                                 <m:mtext>SVM</m:mtext>
                              </m:mrow>
                           </m:msub>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mstyle mathsize="normal" mathvariant="bold">
                              <m:mi>v</m:mi>
                           </m:mstyle>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munder>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:msub>
                                       <m:mstyle mathsize="normal" mathvariant="bold">
                                          <m:mi>s</m:mi>
                                       </m:mstyle>
                                       <m:mi>i</m:mi>
                                    </m:msub>
                                 </m:mrow>
                              </m:munder>
                              <m:mrow>
                                 <m:msub>
                                    <m:mi>y</m:mi>
                                    <m:mi>i</m:mi>
                                 </m:msub>
                                 <m:mo>&#8901;</m:mo>
                                 <m:msub>
                                    <m:mi>&#945;</m:mi>
                                    <m:mi>i</m:mi>
                                 </m:msub>
                                 <m:mo>&#8901;</m:mo>
                                 <m:mi>exp</m:mi>
                                 <m:mo>&#8289;</m:mo>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mi>&#947;</m:mi>
                                 <m:mo>|</m:mo>
                                 <m:mo>|</m:mo>
                                 <m:mstyle mathsize="normal" mathvariant="bold">
                                    <m:mi>v</m:mi>
                                 </m:mstyle>
                                 <m:mo>&#8722;</m:mo>
                                 <m:msub>
                                    <m:mstyle mathsize="normal" mathvariant="bold">
                                       <m:mi>s</m:mi>
                                    </m:mstyle>
                                    <m:mi>i</m:mi>
                                 </m:msub>
                                 <m:mo>|</m:mo>
                                 <m:msup>
                                    <m:mo>|</m:mo>
                                    <m:mn>2</m:mn>
                                 </m:msup>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                           </m:mstyle>
                           <m:mo>,</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaemOzay2aaSbaaSqaaiabbofatjabbAfawjabb2eanbqabaGccqGGOaakcqWH2bGDcqGGPaqkcqGH9aqpdaaeqbqaaiabdMha5naaBaaaleaacqWGPbqAaeqaaOGaeyyXICTaeqySde2aaSbaaSqaaiabdMgaPbqabaGccqGHflY1cyGGLbqzcqGG4baEcqGGWbaCcqGGOaakcqGHsislcqaHZoWzcqGG8baFcqGG8baFcqWH2bGDcqGHsislcqWHZbWCdaWgaaWcbaGaemyAaKgabeaakiabcYha8jabcYha8naaCaaaleqabaGaeGOmaidaaOGaeiykaKcaleaacqWHZbWCdaWgaaadbaGaemyAaKgabeaaaSqab0GaeyyeIuoakiabcYcaSaaa@5B50@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where <it>&#945;</it><sub><it>i </it></sub>(corresponds to <inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-9-S12-S2-i7"><m:semantics><m:mrow><m:msubsup><m:mi>&#963;</m:mi><m:mi>i</m:mi><m:mrow><m:mo>&#8722;</m:mo><m:mn>1</m:mn></m:mrow></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeq4Wdm3aa0baaSqaaiabdMgaPbqaaiabgkHiTiabigdaXaaaaaa@30FD@</m:annotation></m:semantics></m:math></inline-formula> in Eq. (1)) is determined by a constrained quadratic optimization <abbrgrp><abbr bid="B54">54</abbr></abbrgrp> and <it>&#947;</it>(corresponds to 1/2<inline-formula><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" name="1471-2105-9-S12-S2-i8"><m:semantics><m:mrow><m:msubsup><m:mi>&#963;</m:mi><m:mi>i</m:mi><m:mn>2</m:mn></m:msubsup></m:mrow><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xH8viVGI8Gi=hEeeu0xXdbba9frFj0xb9qqpG0dXdb9aspeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGaciGaaiaabeqaaeqabiWaaaGcbaGaeq4Wdm3aa0baaSqaaiabdMgaPbqaaiabikdaYaaaaaa@3012@</m:annotation></m:semantics></m:math></inline-formula> in Eq. (1)) is a user-specified parameter. According to Eqs. (1) and (2), the mathematical models of RVKDE and SVM are analogous. The main difference between RVKDE and SVM lies in the criteria used to determine <it>&#963;</it><sub><it>i </it></sub>in Eq. (1) and <it>&#945;</it><sub><it>i </it></sub>in Eq. (2).</p>
            <p>SVM uses support vectors to construct a special kind of linear model, the maximum margin hyperplane, that separates the samples of different classes <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>. The <it>&#945;</it><sub><it>i </it></sub>in SVM is determined from the global distribution of samples by maximizing the separation between the classes. Conversely, RVKDE uses only a few samples (&lt;10 in this study) in the proximity of a training instance and thus determines <it>&#963;</it><sub><it>i </it></sub>based on local information. As the decision boundary plots reported in the 'Decision boundaries of SVM and RVKDE' subsection of this study show, the effect of using global versus local information is crucial in predicting pre-miRNAs.</p>
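            <p>The analogy between Eq. (1) and Eq. (2) can be made concrete with a short sketch. The two functions below transcribe the decision functions as printed above (Eq. (2) is taken without a bias term, as in the text); the function names are hypothetical, and the sketch is meant only to show the correspondence between the parameters, not to reproduce either classifier's training procedure.</p>
            <p>
```python
import math


def f_rvkde(v, samples, labels, sigmas):
    """Eq. (1): each training instance has its own kernel width sigma_i."""
    return sum(y / sigma * math.exp(-math.dist(v, s) ** 2 / (2 * sigma ** 2))
               for s, y, sigma in zip(samples, labels, sigmas))


def f_svm(v, samples, labels, alphas, gamma):
    """Eq. (2): one global gamma, per-instance weights alpha_i."""
    return sum(y * alpha * math.exp(-gamma * math.dist(v, s) ** 2)
               for s, y, alpha in zip(samples, labels, alphas))


# Toy example: two training instances sharing the same kernel width.
samples = [(0.0, 0.0), (2.0, 0.0)]
labels = [+1, -1]
sigma = 0.5
v = (0.5, 0.0)
rvkde_value = f_rvkde(v, samples, labels, [sigma, sigma])
svm_value = f_svm(v, samples, labels, [1 / sigma, 1 / sigma], 1 / (2 * sigma ** 2))
```
            </p>
            <p>When all kernel widths are equal, the two functions return the same value once the weights are set to the reciprocal of the kernel width and the SVM kernel parameter to one over twice its square. The models differ in practice because RVKDE assigns a different width to each training instance from its local neighbourhood, whereas SVM obtains its weights from a global optimization.</p>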
         </sec>
         <sec>
            <st>
               <p>Experiment design</p>
            </st>
            <p>The proposed miR-KDE is evaluated by three experiments: 1) a five-fold cross-validation on the human pre-miRNA set HU400, 2) using the model trained in the first experiment to predict another human pre-miRNA set, HU216, and 3) using the model trained in the first experiment to predict the non-human pre-miRNA set NH3350. Two SVM-based predictors, triplet-SVM and miPred, are included in these experiments for comparison. Parameters of these alternative predictors are selected to maximize the accuracy in the first experiment. Five widely used indices for binary classification problems are used to evaluate the classifiers. Table <tblr tid="T7">7</tblr> lists these performance measures.</p>
            <tbl id="T7">
               <title>
                  <p>Table 7</p>
               </title>
               <caption>
                  <p>Evaluation measures employed in this study.</p>
               </caption>
               <tblbdy cols="3">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Measure</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Abbreviation</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Equation</b>
                           <sup>1</sup>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Sensitivity (recall)</p>
                     </c>
                     <c ca="left">
                        <p>%SE</p>
                     </c>
                     <c ca="left">
                        <p>TP/(TP+FN)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Specificity</p>
                     </c>
                     <c ca="left">
                        <p>%SP</p>
                     </c>
                     <c ca="left">
                        <p>TN/(TN+FP)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Accuracy</p>
                     </c>
                     <c ca="left">
                        <p>%ACC</p>
                     </c>
                     <c ca="left">
                        <p>(TP+TN)/(TP+TN+FP+FN)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>F-measure</p>
                     </c>
                     <c ca="left">
                        <p>%Fm</p>
                     </c>
                     <c ca="left">
                        <p>2TP/(2TP+FP+FN)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Matthews' correlation coefficient</p>
                     </c>
                     <c ca="left">
                        <p>%MCC</p>
                     </c>
                     <c ca="left">
                        <p>(TP &#215; TN-FP &#215; FN)/sqrt((TP+FP) &#215; (TN+FN) &#215; (TP+FN) &#215; (TN+FP))</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p><sup>1</sup>Definitions of the abbreviations used: TP is the number of real pre-miRNAs detected; FN is the number of real pre-miRNAs missed; TN is the number of pseudo hairpins correctly classified; and FP is the number of pseudo hairpins incorrectly classified as pre-miRNAs.</p>
               </tblfn>
            </tbl>
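            <p>As an illustration only, the measures in Table <tblr tid="T7">7</tblr> can be computed from the four confusion counts as in the following Python sketch. The sketch is not part of miR-KDE; the names <it>binary_metrics</it> and <it>_ratio</it>, and the convention of returning 0 when a denominator is empty, are assumptions made for this example. Values are returned as fractions and must be multiplied by 100 to obtain the percentages denoted by the abbreviations in the table.</p>

```python
import math


def _ratio(num, den):
    # Assumption of this sketch: an empty denominator yields 0.0
    # instead of raising ZeroDivisionError.
    return num / den if den else 0.0


def binary_metrics(tp, fn, tn, fp):
    """Return the five measures of Table 7 as fractions in a dictionary."""
    mcc_den = math.sqrt((tp + fp) * (tn + fn) * (tp + fn) * (tn + fp))
    return {
        "SE": _ratio(tp, tp + fn),                    # sensitivity (recall)
        "SP": _ratio(tn, tn + fp),                    # specificity
        "ACC": _ratio(tp + tn, tp + tn + fp + fn),    # accuracy
        "Fm": _ratio(2 * tp, 2 * tp + fp + fn),       # F-measure
        "MCC": _ratio(tp * tn - fp * fn, mcc_den),    # Matthews' correlation
    }
```

            <p>For instance, with the hypothetical counts TP = 90, FN = 10, TN = 80 and FP = 20, the function returns a sensitivity of 0.90, a specificity of 0.80 and an accuracy of 0.85.</p>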
         </sec>
         <sec>
            <st>
               <p>Decision boundary plot</p>
            </st>
            <p>Before constructing a two-dimensional decision boundary plot, two of the 29 features must be selected as the <it>x</it>-axis and <it>y</it>-axis. In this study, we want to identify the two features that most influence the classification decision for the testing sample. A heuristic method is used to estimate the influence of each feature on the classification decision. According to Eq. (1) and Eq. (2), the classification decision is largely influenced by the training samples nearest to the testing sample, since the value of a Gaussian function decreases exponentially as the distance increases. Furthermore, the distance ||<b>v </b>- <b>s</b><sub><it>i</it></sub>|| in Eq. (1) and Eq. (2) is influenced more by the dimensions with larger differences. Thus, the influence of a feature on the classification is estimated by the average difference in that feature between the testing sample and its <it>kt </it>nearest training samples (<it>kt </it>= 37 in this study). For each testing sample selected to generate a decision boundary plot, we estimate the influences of all 29 features. The feature with the most influence is selected as the <it>x</it>-axis, and the feature with the second most influence is selected as the <it>y</it>-axis.</p>
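            <p>The heuristic above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the code used in this study: the "difference" is taken to be the absolute per-feature difference, the nearest training samples are found by Euclidean distance (matching ||<b>v </b>- <b>s</b><sub><it>i</it></sub>||), the training set is assumed to be non-empty, and the function names <it>feature_influence</it> and <it>pick_axes</it> are hypothetical.</p>

```python
import math


def feature_influence(train, test, kt=37):
    """Mean absolute per-feature difference between the testing sample
    and its kt nearest training samples (assumed reading of the heuristic)."""
    # Sort training samples by Euclidean distance to the testing sample.
    by_distance = sorted(train, key=lambda s: math.dist(s, test))
    nearest = by_distance[:kt]  # fewer than kt samples are used if train is small
    return [
        sum(abs(s[j] - test[j]) for s in nearest) / len(nearest)
        for j in range(len(test))
    ]


def pick_axes(influence):
    """Return (x_index, y_index): the most and second most influential features."""
    ranked = sorted(range(len(influence)), key=lambda j: influence[j], reverse=True)
    return ranked[0], ranked[1]
```

            <p>With 29 features, <it>feature_influence</it> returns 29 values and <it>pick_axes</it> returns the indices of the two features to plot. Ties are broken in favour of the lower feature index, which is a choice of this sketch only.</p>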
            <p>In the decision boundary plots of this study, the black circle represents the testing sample, red circles represent the training pre-miRNAs and blue circles represent the training pseudo hairpins. The background color at each point indicates the predictor's decision for a hypothetical sample whose two selected features take the values given by the <it>x</it>- and <it>y</it>-coordinates and whose remaining 27 features equal those of the testing sample. The boundary between the red and blue backgrounds is the decision boundary of the classifier on the <it>xy</it>-plane. Notice that a blue circle over a red background, or vice versa, does not indicate that the predictor misclassifies that training sample. The training samples are projected onto this plane, and their remaining 27 features differ from those of the samples represented by the background. In other words, these decision boundary plots show a slice of the vector space near the testing sample.</p>
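            <p>The construction of the background can be sketched as follows. The Python fragment is illustrative only: <it>predict</it> stands for any trained classifier that maps a feature vector to a class label, and the function name <it>decision_slice</it> and its parameters are assumptions of this example, not part of miR-KDE.</p>

```python
def decision_slice(predict, test, x_idx, y_idx, x_values, y_values):
    """Evaluate predict on a two-dimensional slice through the testing sample.

    Returns a list of rows, one per value in y_values, each holding one
    prediction per value in x_values.
    """
    grid = []
    for y in y_values:
        row = []
        for x in x_values:
            point = list(test)   # copy: all other features stay at the testing sample
            point[x_idx] = x     # feature plotted on the x-axis
            point[y_idx] = y     # feature plotted on the y-axis
            row.append(predict(point))
        grid.append(row)
    return grid
```

            <p>Coloring each grid cell by its predicted label gives the red and blue background, and the boundary between the two colors is the decision boundary on the slice. Training samples drawn on top of the grid are projections, so their color need not match the background beneath them.</p>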
         </sec>
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>The authors declare that they have no competing interests.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>Author DTHC participated in the development of RVKDE and conceived of this study. Both CCW and JWC designed the experiments and performed all calculations and analyses. All authors have read and approved this manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The authors would like to thank the National Science Council of the Republic of China, Taiwan, for financially supporting this research under Contract Nos NSC 97-2627-P-001-002, NSC 96-2320-B-006-027-MY2 and NSC 96-2221-E-006-232-MY2.</p>
            <p>This article has been published as part of <it>BMC Bioinformatics </it>Volume 9 Supplement 12, 2008: Asia Pacific Bioinformatics Network (APBioNet) Seventh International Conference on Bioinformatics (InCoB2008). The full contents of the supplement are available online at <url>http://www.biomedcentral.com/1471-2105/9?issue=S12</url>.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>MicroRNAs: Genomics, biogenesis, mechanism, and function</p>
            </title>
            <aug>
               <au>
                  <snm>Bartel</snm>
                  <fnm>DP</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2004</pubdate>
            <volume>116</volume>
            <issue>2</issue>
            <fpage>281</fpage>
            <lpage>297</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0092-8674(04)00045-5</pubid>
                  <pubid idtype="pmpid" link="fulltext">14744438</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>The functions of animal microRNAs</p>
            </title>
            <aug>
               <au>
                  <snm>Ambros</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2004</pubdate>
            <volume>431</volume>
            <issue>7006</issue>
            <fpage>350</fpage>
            <lpage>355</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature02871</pubid>
                  <pubid idtype="pmpid" link="fulltext">15372042</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>MicroRNAs: Small RNAs with a big role in gene regulation</p>
            </title>
            <aug>
               <au>
                  <snm>He</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Hannon</snm>
                  <fnm>GJ</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <issue>8</issue>
            <fpage>522</fpage>
            <lpage>531</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrg1379</pubid>
                  <pubid idtype="pmpid" link="fulltext">15211354</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14</p>
            </title>
            <aug>
               <au>
                  <snm>Lee</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Feinbaum</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Ambros</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1993</pubdate>
            <volume>75</volume>
            <issue>5</issue>
            <fpage>843</fpage>
            <lpage>854</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0092-8674(93)90529-Y</pubid>
                  <pubid idtype="pmpid" link="fulltext">8252621</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>The 21-nucleotide let-7 RNA regulates developmental timing in Caenorhabditis elegans</p>
            </title>
            <aug>
               <au>
                  <snm>Reinhart</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Slack</snm>
                  <fnm>FJ</fnm>
               </au>
               <au>
                  <snm>Basson</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pasquinelli</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Bettinger</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Rougvie</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Horvitz</snm>
                  <fnm>HR</fnm>
               </au>
               <au>
                  <snm>Ruvkun</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2000</pubdate>
            <volume>403</volume>
            <issue>6772</issue>
            <fpage>901</fpage>
            <lpage>906</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35002607</pubid>
                  <pubid idtype="pmpid" link="fulltext">10706289</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>miRBase: tools for microRNA genomics</p>
            </title>
            <aug>
               <au>
                  <snm>Griffiths-Jones</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Saini</snm>
                  <fnm>HK</fnm>
               </au>
               <au>
                  <snm>van Dongen</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Enright</snm>
                  <fnm>AJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2008</pubdate>
            <volume>36</volume>
            <fpage>D154</fpage>
            <lpage>D158</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2238936</pubid>
                  <pubid idtype="pmpid" link="fulltext">17991681</pubid>
                  <pubid idtype="doi">10.1093/nar/gkm952</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Approaches to microRNA discovery</p>
            </title>
            <aug>
               <au>
                  <snm>Berezikov</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Cuppen</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Plasterk</snm>
                  <fnm>RHA</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2006</pubdate>
            <volume>38</volume>
            <fpage>S2</fpage>
            <lpage>S7</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng1794</pubid>
                  <pubid idtype="pmpid" link="fulltext">16736019</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>BLAST: at the core of a powerful and diverse set of sequence analysis tools</p>
            </title>
            <aug>
               <au>
                  <snm>McGinnis</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Madden</snm>
                  <fnm>TL</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>W20</fpage>
            <lpage>W25</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">441573</pubid>
                  <pubid idtype="pmpid" link="fulltext">15215342</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh435</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>New human and mouse microRNA genes found by homology search</p>
            </title>
            <aug>
               <au>
                  <snm>Weber</snm>
                  <fnm>MJ</fnm>
               </au>
            </aug>
            <source>FEBS J</source>
            <pubdate>2005</pubdate>
            <volume>272</volume>
            <issue>1</issue>
            <fpage>59</fpage>
            <lpage>73</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1432-1033.2004.04389.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">15634332</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Profile-based detection of microRNA precursors in animal genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Legendre</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lambert</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gautheret</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>7</issue>
            <fpage>841</fpage>
            <lpage>845</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti073</pubid>
                  <pubid idtype="pmpid" link="fulltext">15509608</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Vertebrate MicroRNA genes</p>
            </title>
            <aug>
               <au>
                  <snm>Lim</snm>
                  <fnm>LP</fnm>
               </au>
               <au>
                  <snm>Glasner</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Yekta</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Burge</snm>
                  <fnm>CB</fnm>
               </au>
               <au>
                  <snm>Bartel</snm>
                  <fnm>DP</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2003</pubdate>
            <volume>299</volume>
            <issue>5612</issue>
            <fpage>1540</fpage>
            <lpage>1540</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1080372</pubid>
                  <pubid idtype="pmpid" link="fulltext">12624257</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>The microRNAs of Caenorhabditis elegans</p>
            </title>
            <aug>
               <au>
                  <snm>Lim</snm>
                  <fnm>LP</fnm>
               </au>
               <au>
                  <snm>Lau</snm>
                  <fnm>NC</fnm>
               </au>
               <au>
                  <snm>Weinstein</snm>
                  <fnm>EG</fnm>
               </au>
               <au>
                  <snm>Abdelhakim</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Yekta</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Rhoades</snm>
                  <fnm>MW</fnm>
               </au>
               <au>
                  <snm>Burge</snm>
                  <fnm>CB</fnm>
               </au>
               <au>
                  <snm>Bartel</snm>
                  <fnm>DP</fnm>
               </au>
            </aug>
            <source>Genes Dev</source>
            <pubdate>2003</pubdate>
            <volume>17</volume>
            <issue>8</issue>
            <fpage>991</fpage>
            <lpage>1008</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">196042</pubid>
                  <pubid idtype="pmpid" link="fulltext">12672692</pubid>
                  <pubid idtype="doi">10.1101/gad.1074403</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Computational identification of Drosophila microRNA genes</p>
            </title>
            <aug>
               <au>
                  <snm>Lai</snm>
                  <fnm>EC</fnm>
               </au>
               <au>
                  <snm>Tomancak</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Williams</snm>
                  <fnm>RW</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>GM</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <issue>7</issue>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">193629</pubid>
                  <pubid idtype="pmpid" link="fulltext">12844358</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Computational and experimental identification of C. elegans microRNAs</p>
            </title>
            <aug>
               <au>
                  <snm>Grad</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Aach</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hayes</snm>
                  <fnm>GD</fnm>
               </au>
               <au>
                  <snm>Reinhart</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Church</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Ruvkun</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Mol Cell</source>
            <pubdate>2003</pubdate>
            <volume>11</volume>
            <issue>5</issue>
            <fpage>1253</fpage>
            <lpage>1263</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1097-2765(03)00153-9</pubid>
                  <pubid idtype="pmpid" link="fulltext">12769849</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Computational identification of plant MicroRNAs and their targets, including a stress-induced miRNA</p>
            </title>
            <aug>
               <au>
                  <snm>Jones-Rhoades</snm>
                  <fnm>MW</fnm>
               </au>
               <au>
                  <snm>Bartel</snm>
                  <fnm>DP</fnm>
               </au>
            </aug>
            <source>Mol Cell</source>
            <pubdate>2004</pubdate>
            <volume>14</volume>
            <issue>6</issue>
            <fpage>787</fpage>
            <lpage>799</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.molcel.2004.05.027</pubid>
                  <pubid idtype="pmpid" link="fulltext">15200956</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Detection of 91 potential conserved plant microRNAs in Arabidopsis thaliana and Oryza sativa identifies important target genes</p>
            </title>
            <aug>
               <au>
                  <snm>Bonnet</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Wuyts</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Rouze</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Peer</snm>
                  <mnm>Van de</mnm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2004</pubdate>
            <volume>101</volume>
            <issue>31</issue>
            <fpage>11511</fpage>
            <lpage>11516</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">509231</pubid>
                  <pubid idtype="pmpid" link="fulltext">15272084</pubid>
                  <pubid idtype="doi">10.1073/pnas.0404025101</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Computational prediction of miRNAs in Arabidopsis thaliana</p>
            </title>
            <aug>
               <au>
                  <snm>Adai</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Johnson</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Mlotshwa</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Archer-Evans</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Manocha</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Vance</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Sundaresan</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <issue>1</issue>
            <fpage>78</fpage>
            <lpage>91</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">540280</pubid>
                  <pubid idtype="pmpid" link="fulltext">15632092</pubid>
                  <pubid idtype="doi">10.1101/gr.2908205</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Identification of hundreds of conserved and nonconserved human microRNAs</p>
            </title>
            <aug>
               <au>
                  <snm>Bentwich</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Avniel</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Karov</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Aharonov</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Gilad</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Barad</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Barzilai</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Einat</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Einav</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Meiri</snm>
                  <fnm>E</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2005</pubdate>
            <volume>37</volume>
            <issue>7</issue>
            <fpage>766</fpage>
            <lpage>770</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng1590</pubid>
                  <pubid idtype="pmpid" link="fulltext">15965474</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>MicroRNA identification based on sequence and structure alignment</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>XW</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Gu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>He</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>XG</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>YD</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>18</issue>
            <fpage>3610</fpage>
            <lpage>3614</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti562</pubid>
                  <pubid idtype="pmpid" link="fulltext">15994192</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Patterns of flanking sequence conservation and a characteristic upstream motif for microRNA gene identification</p>
            </title>
            <aug>
               <au>
                  <snm>Ohler</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Yekta</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lim</snm>
                  <fnm>LP</fnm>
               </au>
               <au>
                  <snm>Bartel</snm>
                  <fnm>DP</fnm>
               </au>
               <au>
                  <snm>Burge</snm>
                  <fnm>CB</fnm>
               </au>
            </aug>
            <source>RNA</source>
            <pubdate>2004</pubdate>
            <volume>10</volume>
            <issue>9</issue>
            <fpage>1309</fpage>
            <lpage>1322</lpage>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Clustering and conservation patterns of human microRNAs</p>
            </title>
            <aug>
               <au>
                  <snm>Altuvia</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Landgraf</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Lithwick</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Elefant</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Pfeffer</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Aravin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Brownstein</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Tuschl</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Margalit</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <issue>8</issue>
            <fpage>2697</fpage>
            <lpage>2706</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1110742</pubid>
                  <pubid idtype="pmpid" link="fulltext">15891114</pubid>
                  <pubid idtype="doi">10.1093/nar/gki567</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Phylogenetic shadowing and computational identification of human microRNA genes</p>
            </title>
            <aug>
               <au>
                  <snm>Berezikov</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Guryev</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>van de Belt</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wienholds</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Plasterk</snm>
                  <fnm>RHA</fnm>
               </au>
               <au>
                  <snm>Cuppen</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2005</pubdate>
            <volume>120</volume>
            <issue>1</issue>
            <fpage>21</fpage>
            <lpage>24</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.cell.2004.12.031</pubid>
                  <pubid idtype="pmpid" link="fulltext">15652478</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Phylogenetic shadowing of primate sequences to find functional regions of the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Boffelli</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>McAuliffe</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ovcharenko</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Lewis</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Ovcharenko</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Pachter</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>EM</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2003</pubdate>
            <volume>299</volume>
            <issue>5611</issue>
            <fpage>1391</fpage>
            <lpage>1394</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1081331</pubid>
                  <pubid idtype="pmpid" link="fulltext">12610304</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Identification of clustered microRNAs using an ab initio prediction method</p>
            </title>
            <aug>
               <au>
                  <snm>Sewer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Paul</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Landgraf</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Aravin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Pfeffer</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Brownstein</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Tuschl</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>van Nimwegen</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Zavolan</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1315341</pubid>
                  <pubid idtype="pmpid" link="fulltext">16274478</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine</p>
            </title>
            <aug>
               <au>
                  <snm>Xue</snm>
                  <fnm>CH</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>He</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>GP</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>YD</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>XG</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1360673</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381612</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Combining multi-species genomic data for microRNA identification using a Naive Bayes classifier</p>
            </title>
            <aug>
               <au>
                  <snm>Yousef</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Nebozhyn</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Shatkay</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Kanterakis</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Showe</snm>
                  <fnm>LC</fnm>
               </au>
               <au>
                  <snm>Showe</snm>
                  <fnm>MK</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <issue>11</issue>
            <fpage>1325</fpage>
            <lpage>1334</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btl094</pubid>
                  <pubid idtype="pmpid" link="fulltext">16543277</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Hairpins in a Haystack: recognizing microRNA precursors in comparative genomics data</p>
            </title>
            <aug>
               <au>
                  <snm>Hertel</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Stadler</snm>
                  <fnm>PF</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <issue>14</issue>
            <fpage>E197</fpage>
            <lpage>E202</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btl257</pubid>
                  <pubid idtype="pmpid" link="fulltext">16873472</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>De novo SVM classification of precursor microRNAs from genomic pseudo hairpins using global and intrinsic folding measures</p>
            </title>
            <aug>
               <au>
                  <snm>Ng</snm>
                  <fnm>KLS</fnm>
               </au>
               <au>
                  <snm>Mishra</snm>
                  <fnm>SK</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>23</volume>
            <issue>11</issue>
            <fpage>1321</fpage>
            <lpage>1330</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btm026</pubid>
                  <pubid idtype="pmpid" link="fulltext">17267435</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Ab initio identification of human microRNAs based on structure motifs</p>
            </title>
            <aug>
               <au>
                  <snm>Brameier</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Wiuf</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2238772</pubid>
                  <pubid idtype="pmpid" link="fulltext">18088431</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Identifications of conserved 7-mers in 3'-UTRs and microRNAs in Drosophila</p>
            </title>
            <aug>
               <au>
                  <snm>Gu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Fu</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>8</volume>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Reliable prediction of Drosha processing sites improves microRNA gene prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Helvik</snm>
                  <fnm>SA</fnm>
               </au>
               <au>
                  <snm>Snove</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Saetrom</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2007</pubdate>
            <volume>23</volume>
            <issue>2</issue>
            <fpage>142</fpage>
            <lpage>149</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btl570</pubid>
                  <pubid idtype="pmpid" link="fulltext">17105718</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Prediction of RNA-binding proteins from primary sequence by a support vector machine approach</p>
            </title>
            <aug>
               <au>
                  <snm>Han</snm>
                  <fnm>LY</fnm>
               </au>
               <au>
                  <snm>Cai</snm>
                  <fnm>CZ</fnm>
               </au>
               <au>
                  <snm>Lo</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Chung</snm>
                  <fnm>MCM</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>YZ</fnm>
               </au>
            </aug>
            <source>RNA</source>
            <pubdate>2004</pubdate>
            <volume>10</volume>
            <issue>3</issue>
            <fpage>355</fpage>
            <lpage>368</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1370931</pubid>
                  <pubid idtype="pmpid" link="fulltext">14970381</pubid>
                  <pubid idtype="doi">10.1261/rna.5890304</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Accurate identification of alternatively spliced exons using support vector machine</p>
            </title>
            <aug>
               <au>
                  <snm>Dror</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Sorek</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Shamir</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>7</issue>
            <fpage>897</fpage>
            <lpage>901</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti132</pubid>
                  <pubid idtype="pmpid" link="fulltext">15531599</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Distinguishing protein-coding from non-coding RNAs through support vector machines</p>
            </title>
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Gough</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Rost</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>PLoS Genet</source>
            <pubdate>2006</pubdate>
            <volume>2</volume>
            <issue>4</issue>
            <fpage>529</fpage>
            <lpage>536</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1371/journal.pgen.0020029</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Data classification with radial basis function networks based on a novel kernel density estimation algorithm</p>
            </title>
            <aug>
               <au>
                  <snm>Oyang</snm>
                  <fnm>YJ</fnm>
               </au>
               <au>
                  <snm>Hwang</snm>
                  <fnm>SC</fnm>
               </au>
               <au>
                  <snm>Ou</snm>
                  <fnm>YY</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>CY</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>ZW</fnm>
               </au>
            </aug>
            <source>IEEE Transactions on Neural Networks</source>
            <pubdate>2005</pubdate>
            <volume>16</volume>
            <issue>1</issue>
            <fpage>225</fpage>
            <lpage>236</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1109/TNN.2004.836229</pubid>
                  <pubid idtype="pmpid" link="fulltext">15732402</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>SV40-encoded microRNAs regulate viral gene expression and reduce susceptibility to cytotoxic T cells</p>
            </title>
            <aug>
               <au>
                  <snm>Sullivan</snm>
                  <fnm>CS</fnm>
               </au>
               <au>
                  <snm>Grundhoff</snm>
                  <fnm>AT</fnm>
               </au>
               <au>
                  <snm>Tevethia</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Pipas</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Ganem</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2005</pubdate>
            <volume>435</volume>
            <issue>7042</issue>
            <fpage>682</fpage>
            <lpage>686</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature03576</pubid>
                  <pubid idtype="pmpid" link="fulltext">15931223</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Viruses and microRNAs</p>
            </title>
            <aug>
               <au>
                  <snm>Cullen</snm>
                  <fnm>BR</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2006</pubdate>
            <volume>38</volume>
            <fpage>S25</fpage>
            <lpage>S30</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng1793</pubid>
                  <pubid idtype="pmpid" link="fulltext">16736021</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>MicroRNAs: expression, avoidance and subversion by vertebrate viruses</p>
            </title>
            <aug>
               <au>
                  <snm>Sarnow</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Jopling</snm>
                  <fnm>CL</fnm>
               </au>
               <au>
                  <snm>Norman</snm>
                  <fnm>KL</fnm>
               </au>
               <au>
                  <snm>Schutz</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wehner</snm>
                  <fnm>KA</fnm>
               </au>
            </aug>
            <source>Nature Reviews Microbiology</source>
            <pubdate>2006</pubdate>
            <volume>4</volume>
            <issue>9</issue>
            <fpage>651</fpage>
            <lpage>659</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrmicro1473</pubid>
                  <pubid idtype="pmpid" link="fulltext">16912711</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>miRBase: microRNA sequences, targets and gene nomenclature</p>
            </title>
            <aug>
               <au>
                  <snm>Griffiths-Jones</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Grocock</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>van Dongen</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bateman</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Enright</snm>
                  <fnm>AJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>D140</fpage>
            <lpage>D144</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1347474</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381832</pubid>
                  <pubid idtype="doi">10.1093/nar/gkj112</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>WZ</fnm>
               </au>
               <au>
                  <snm>Godzik</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <issue>13</issue>
            <fpage>1658</fpage>
            <lpage>1659</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btl158</pubid>
                  <pubid idtype="pmpid" link="fulltext">16731699</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>RefSeq and LocusLink: NCBI gene-centered resources</p>
            </title>
            <aug>
               <au>
                  <snm>Pruitt</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Maglott</snm>
                  <fnm>DR</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <issue>1</issue>
            <fpage>137</fpage>
            <lpage>140</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">29787</pubid>
                  <pubid idtype="pmpid" link="fulltext">11125071</pubid>
                  <pubid idtype="doi">10.1093/nar/29.1.137</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>The UCSC Genome Browser Database</p>
            </title>
            <aug>
               <au>
                  <snm>Karolchik</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Baertsch</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Diekhans</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Furey</snm>
                  <fnm>TS</fnm>
               </au>
               <au>
                  <snm>Hinrichs</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>YT</fnm>
               </au>
               <au>
                  <snm>Roskin</snm>
                  <fnm>KM</fnm>
               </au>
               <au>
                  <snm>Schwartz</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sugnet</snm>
                  <fnm>CW</fnm>
               </au>
               <au>
                  <snm>Thomas</snm>
                  <fnm>DJ</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <issue>1</issue>
            <fpage>51</fpage>
            <lpage>54</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">165576</pubid>
                  <pubid idtype="pmpid" link="fulltext">12519945</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg129</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Vienna RNA secondary structure server</p>
            </title>
            <aug>
               <au>
                  <snm>Hofacker</snm>
                  <fnm>IL</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <issue>13</issue>
            <fpage>3429</fpage>
            <lpage>3431</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">169005</pubid>
                  <pubid idtype="pmpid" link="fulltext">12824340</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg599</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Estimating the contributions of selection and self-organization in RNA secondary structure</p>
            </title>
            <aug>
               <au>
                  <snm>Schultes</snm>
                  <fnm>EA</fnm>
               </au>
               <au>
                  <snm>Hraber</snm>
                  <fnm>PT</fnm>
               </au>
               <au>
                  <snm>LaBean</snm>
                  <fnm>TH</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1999</pubdate>
            <volume>49</volume>
            <issue>1</issue>
            <fpage>76</fpage>
            <lpage>83</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/PL00006536</pubid>
                  <pubid idtype="pmpid" link="fulltext">10368436</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>mRNAs have greater negative folding free energies than shuffled or codon choice randomized sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Seffens</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Digby</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1999</pubdate>
            <volume>27</volume>
            <issue>7</issue>
            <fpage>1578</fpage>
            <lpage>1584</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">148359</pubid>
                  <pubid idtype="pmpid" link="fulltext">10075987</pubid>
                  <pubid idtype="doi">10.1093/nar/27.7.1578</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>A comparison of RNA folding measures</p>
            </title>
            <aug>
               <au>
                  <snm>Freyhult</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Gardner</snm>
                  <fnm>PP</fnm>
               </au>
               <au>
                  <snm>Moulton</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Evidence that miRNAs are different from other RNAs</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>BH</fnm>
               </au>
               <au>
                  <snm>Pan</snm>
                  <fnm>XP</fnm>
               </au>
               <au>
                  <snm>Cox</snm>
                  <fnm>SB</fnm>
               </au>
               <au>
                  <snm>Cobb</snm>
                  <fnm>GP</fnm>
               </au>
               <au>
                  <snm>Anderson</snm>
                  <fnm>TA</fnm>
               </au>
            </aug>
            <source>Cell Mol Life Sci</source>
            <pubdate>2006</pubdate>
            <volume>63</volume>
            <issue>2</issue>
            <fpage>246</fpage>
            <lpage>254</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s00018-005-5467-7</pubid>
                  <pubid idtype="pmpid" link="fulltext">16395542</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Metrics on RNA secondary structures</p>
            </title>
            <aug>
               <au>
                  <snm>Moulton</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Zuker</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Steel</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pointon</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Penny</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>2000</pubdate>
            <volume>7</volume>
            <issue>1&#8211;2</issue>
            <fpage>277</fpage>
            <lpage>292</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/10665270050081522</pubid>
                  <pubid idtype="pmpid" link="fulltext">10890402</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>RAG: RNA-As-Graphs web resource</p>
            </title>
            <aug>
               <au>
                  <snm>Fera</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Shiffeldrim</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Zorn</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Laserson</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Gan</snm>
                  <fnm>HH</fnm>
               </au>
               <au>
                  <snm>Schlick</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">471545</pubid>
                  <pubid idtype="pmpid" link="fulltext">15238163</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>RAG: RNA-As-Graphs database &#8211; concepts, analysis, and features</p>
            </title>
            <aug>
               <au>
                  <snm>Gan</snm>
                  <fnm>HH</fnm>
               </au>
               <au>
                  <snm>Fera</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Zorn</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Shiffeldrim</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Tang</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Laserson</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Schlick</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <issue>8</issue>
            <fpage>1285</fpage>
            <lpage>1291</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bth084</pubid>
                  <pubid idtype="pmpid" link="fulltext">14962931</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>The Gamma Function</p>
            </title>
            <aug>
               <au>
                  <snm>Artin</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <publisher>New York: Holt, Rinehart and Winston</publisher>
            <pubdate>1964</pubdate>
         </bibl>
         <bibl id="B52">
            <title>
               <p>Machine learning</p>
            </title>
            <aug>
               <au>
                  <snm>Mitchell</snm>
                  <fnm>TM</fnm>
               </au>
            </aug>
            <publisher>New York: McGraw-Hill</publisher>
            <pubdate>1997</pubdate>
         </bibl>
         <bibl id="B53">
            <title>
               <p>Learning and soft computing: support vector machines, neural networks, and fuzzy logic models</p>
            </title>
            <aug>
               <au>
                  <snm>Kecman</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <publisher>Cambridge, Mass.: MIT Press</publisher>
            <pubdate>2001</pubdate>
         </bibl>
         <bibl id="B54">
            <title>
               <p>Data mining: practical machine learning tools and techniques</p>
            </title>
            <aug>
               <au>
                  <snm>Witten</snm>
                  <fnm>IH</fnm>
               </au>
               <au>
                  <snm>Frank</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <publisher>Amsterdam; Boston, MA: Morgan Kaufmann</publisher>
            <edition>2</edition>
            <pubdate>2005</pubdate>
         </bibl>
      </refgrp>
   </bm>
</art>
