<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1748-7188-3-2</ui>
   <ji>1748-7188</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Learning from positive examples when the negative class is undetermined- microRNA gene identification</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Yousef</snm>
               <fnm>Malik</fnm>
               <insr iid="I1"/>
               <insr iid="I3"/>
               <email>yousef@gal-soc.org</email>
            </au>
            <au id="A2">
               <snm>Jung</snm>
               <fnm>Segun</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <insr iid="I4"/>
               <email>sj801@med.nyu.edu</email>
            </au>
            <au id="A3">
               <snm>Showe</snm>
               <mi>C</mi>
               <fnm>Louise</fnm>
               <insr iid="I1"/>
               <email>lshowe@wistar.org</email>
            </au>
            <au id="A4" ca="yes">
               <snm>Showe</snm>
               <mi>K</mi>
               <fnm>Michael</fnm>
               <insr iid="I1"/>
               <email>showe@wistar.org</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Systems Biology Division, The Wistar Institute, Philadelphia, PA 19104, USA</p>
            </ins>
            <ins id="I2">
               <p>School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA 19104, USA</p>
            </ins>
            <ins id="I3">
               <p>Computer Science, The College of Sakhnin, Sakhnin, Israel</p>
            </ins>
            <ins id="I4">
               <p>Sackler Institute of Graduate Biomedical Sciences, N.Y.U School of Medicine, New York, NY 10016, USA</p>
            </ins>
         </insg>
         <source>Algorithms for Molecular Biology</source>
         <issn>1748-7188</issn>
         <pubdate>2008</pubdate>
         <volume>3</volume>
         <issue>1</issue>
         <fpage>2</fpage>
         <url>http://www.almob.org/content/3/1/2</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18226233</pubid>
               <pubid idtype="doi">10.1186/1748-7188-3-2</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>22</day>
               <month>6</month>
               <year>2007</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>28</day>
               <month>1</month>
               <year>2008</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>28</day>
               <month>1</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Yousef et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>The application of machine learning to classification problems that depend only on positive examples is gaining attention in the computational biology community. We and others have described the use of two-class machine learning to identify novel miRNAs. These methods require the generation of an artificial negative class. However, designating the negative class can be problematic; if done improperly, it can dramatically affect the performance of the classifier and/or yield a biased estimate of performance. We present a study using one-class machine learning for microRNA (miRNA) discovery and compare one-class to two-class approaches using na&#239;ve Bayes and Support Vector Machines. These results are compared to published two-class miRNA prediction approaches. We also examine the ability of the one-class and two-class techniques to identify miRNAs in newly sequenced species.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Of all methods tested, we found that two-class na&#239;ve Bayes and Support Vector Machines gave the best accuracy using our selected features and optimally chosen negative examples. One-class methods showed average accuracies of 70&#8211;80% versus 90% for both two-class methods on the same feature sets. However, some one-class methods outperform some recently published two-class approaches with different selected features. Using the EBV genome as an external validation of the method, we found one-class machine learning to work as well as or better than a two-class approach in identifying true miRNAs as well as predicting new miRNAs.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>One-class and two-class methods can both give useful classification accuracies when the negative class is well characterized. The advantage of one-class methods is that they eliminate guessing at the optimal features for the negative class when these are not well defined. In such cases one-class methods can be superior to two-class methods, provided that the features chosen to represent the positive class are well defined.</p>
            </sec>
            <sec>
               <st>
                  <p>Availability</p>
               </st>
               <p>The OneClassmiRNA program is available at: <abbrgrp><abbr bid="B1">1</abbr></abbrgrp></p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>MicroRNAs (miRNAs) are single-stranded, non-coding RNAs averaging 21 nucleotides in length. The mature miRNA is cleaved from a 70&#8211;110 nucleotide (<it>nt</it>) "hairpin" precursor with a double-stranded region containing one or more single-stranded loops. MiRNAs target messenger RNAs (mRNAs) for cleavage, primarily by repressing translation and causing mRNA degradation <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>.</p>
         <p>Several computational approaches have been applied to miRNA gene prediction using methods based on sequence conservation and/or structural similarity <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. All of these methods rely on binary classifications that artificially generate a non-miRNA class based on the absence of features used to define the positive class. Nam, <it>et al</it>. <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> constructed a highly specific probabilistic hidden Markov model (HMM) using the features of miRNA sequence and secondary structure; a negative class consisting of 1,000 extended stem-loop structures was generated based on several criteria, including sequence length (64&#8211;90 <it>nt</it>), stem length (above 22 <it>nt</it>), bulge size (under 15 <it>nt</it>), loop size (3&#8211;20 <it>nt</it>), and folding free energy (under -25 kcal/mol). Pfeffer, <it>et al</it>. <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> used support vector machines (SVMs) for predicting conserved miRNAs in herpes viruses. Features were extracted from the stem-loop and represented in a vector space. The negative class was generated from mRNAs, rRNAs, or tRNAs from human and viral genomes. The same technique was also applied to clustered miRNAs <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. Xue, <it>et al</it>. <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> defined a negative class called pseudo pre-miRNAs. The criteria for this negative class included a minimum of 18 paired bases, a maximum of -15 kcal/mol folding free energy and no multiple loops. See <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> for a full review of miRNA discovery approaches.</p>
         <p>In a recent publication we described a two-class machine learning approach for miRNA prediction using the na&#239;ve Bayes classifier <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. Four criteria were used to select a pool of negative examples from candidate stem loops: stem length outside the range 42&#8211;85 <it>nt</it>, at most -25 kcal/mol of folding free energy, loop length greater than 26 <it>nt </it>and a number of base pairs (bp) outside the range (16&#8211;45) of the positive class. This approach, like all of the binary classifiers mentioned earlier, does not address how many negative examples are best to use, a choice that influences the balance between false positive and false negative predictions. A comparison of a genuine negative class with one generated from random data for miRNA target prediction has been reported <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>, showing that the two negative classes did not produce the same results.</p>
         <p>Recently, Wang, <it>et al</it>. <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> developed an elegant algorithm, positive sample only learning (PSoL), to predict non-coding RNA (ncRNA) genes by generating an optimized negative class of ncRNA from so-called "unlabeled" data using two-class SVM. This method predicts ncRNA genes without using negative training examples, but the procedure is quite complicated. Using their data set, we tested one of the one-class approaches, OC-SVM, to demonstrate a solution to the problem they addressed.</p>
         <p>The method we now describe uses only the known miRNAs (positive class) to train the miRNA classifier. We emphasize that the one-class approach is a good tool not only for its simplicity but also because it avoids generating a negative class where the basis for defining this class is not clear. The only required input for this tool is the miRNA sequences from a specific genome (or multiple genomes) for building the model to be used later as a miRNA predictor. In addition, we have tested the accuracy of the one-class method in the identification of miRNA in "newly sequenced" organisms such as the <it>Epstein Barr virus </it>genome, which was not used for training the classifier. The results are comparable to our two-class approach with high sensitivity and similar numbers of new predictions.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Performance evaluation</p>
            </st>
            <p>Table <tblr tid="T1">1</tblr> shows the performance of five one-class classifiers as well as two-class na&#239;ve Bayes and two-class SVM for comparison. The results of the one-class approaches show a slight superiority for OC-Gaussian and OC-KNN over the other one-class methods based on the average of the MCC measurement. However, their accuracy is about 8%&#8211;10% lower than that of the two-class approaches. During the training stage of each one-class classifier we designated as "outliers" the 10% of the positive data that fit the estimated distribution of the positive class least well, in order to produce a compact classifier. This may cause a loss of 10% of the information about the target class, which might also reduce performance compared to the two-class approach.</p>
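            <p>The effect of setting aside 10% of the positive data as outliers can be illustrated with a minimal sketch. It is not the implementation used in this study: it assumes an independent (diagonal) Gaussian per feature, synthetic two-feature data, and hypothetical names (<it>fit_one_class_gaussian</it>, <it>is_target</it>, <it>outlier_fraction</it>). The acceptance threshold is placed so that the stated fraction of training examples with the lowest log-likelihood falls outside the model.</p>

```python
import math
import random
from statistics import mean, pvariance


def fit_one_class_gaussian(train, outlier_fraction=0.10):
    """Fit a per-feature Gaussian on positive examples only (illustrative assumption)."""
    columns = list(zip(*train))
    mus = [mean(col) for col in columns]
    # A small floor avoids division by zero for constant features.
    variances = [max(pvariance(col), 1e-9) for col in columns]

    def log_likelihood(x):
        return sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - mu) ** 2 / (2 * var)
            for xi, mu, var in zip(x, mus, variances)
        )

    # Sort training scores and reject the lowest-scoring fraction as outliers.
    scores = sorted(log_likelihood(x) for x in train)
    n_rejected = int(outlier_fraction * len(scores))
    threshold = scores[n_rejected]
    return log_likelihood, threshold


def is_target(x, log_likelihood, threshold):
    """Accept x as a member of the positive class if it scores at or above the threshold."""
    return log_likelihood(x) >= threshold


# Synthetic positive class with two features (made-up data, for illustration only).
random.seed(0)
positives = [[random.gauss(0, 1), random.gauss(5, 2)] for _ in range(200)]
log_likelihood, threshold = fit_one_class_gaussian(positives)
accepted = sum(is_target(x, log_likelihood, threshold) for x in positives)
far_point_accepted = is_target([100.0, -100.0], log_likelihood, threshold)
print(accepted, far_point_accepted)
```

            <p>In this sketch roughly 90% of the training examples are accepted by construction, which mirrors the trade-off described above: a more compact decision region at the cost of some sensitivity on genuine positives.</p>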
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>One-class results obtained from the secondary features plus sequence features.</p>
               </caption>
               <tblbdy cols="14">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3" ca="center">
                        <p>
                           <it>C. elegans</it>
                        </p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>
                           <it>Mouse</it>
                        </p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>
                           <it>Human</it>
                        </p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>
                           <it>All-miRNA</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="13">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Method</p>
                     </c>
                     <c ca="center">
                        <p>Sen</p>
                     </c>
                     <c ca="center">
                        <p>Spe</p>
                     </c>
                     <c ca="center">
                        <p>MCC</p>
                     </c>
                     <c ca="center">
                        <p>Sen</p>
                     </c>
                     <c ca="center">
                        <p>Spe</p>
                     </c>
                     <c ca="center">
                        <p>MCC</p>
                     </c>
                     <c ca="center">
                        <p>Sen</p>
                     </c>
                     <c ca="center">
                        <p>Spe</p>
                     </c>
                     <c ca="center">
                        <p>MCC</p>
                     </c>
                     <c ca="center">
                        <p>Sen</p>
                     </c>
                     <c ca="center">
                        <p>Spe</p>
                     </c>
                     <c ca="center">
                        <p>MCC</p>
                     </c>
                     <c ca="center">
                        <p>Average MCC</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="14">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>OC-SVM</p>
                     </c>
                     <c ca="center">
                        <p>0.73</p>
                     </c>
                     <c ca="center">
                        <p>0.93</p>
                     </c>
                     <c ca="center">
                        <p>0.67</p>
                     </c>
                     <c ca="center">
                        <p>0.80</p>
                     </c>
                     <c ca="center">
                        <p>0.93</p>
                     </c>
                     <c ca="center">
                        <p>0.74</p>
                     </c>
                     <c ca="center">
                        <p>0.72</p>
                     </c>
                     <c ca="center">
                        <p>0.99</p>
                     </c>
                     <c ca="center">
                        <p>0.74</p>
                     </c>
                     <c ca="center">
                        <p>0.69</p>
                     </c>
                     <c ca="center">
                        <p>0.91</p>
                     </c>
                     <c ca="center">
                        <p>0.62</p>
                     </c>
                     <c ca="center">
                        <p>0.70</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>OC-Gaussian</p>
                     </c>
                     <c ca="center">
                        <p>0.84</p>
                     </c>
                     <c ca="center">
                        <p>0.93</p>
                     </c>
                     <c ca="center">
                        <p>0.77</p>
                     </c>
                     <c ca="center">
                        <p>0.89</p>
                     </c>
                     <c ca="center">
                        <p>0.93</p>
                     </c>
                     <c ca="center">
                        <p>0.82</p>
                     </c>
                     <c ca="center">
                        <p>0.82</p>
                     </c>
                     <c ca="center">
                        <p>0.99</p>
                     </c>
                     <c ca="center">
                        <p>0.82</p>
                     </c>
                     <c ca="center">
                        <p>0.82</p>
                     </c>
                     <c ca="center">
                        <p>0.99</p>
                     </c>
                     <c ca="center">
                        <p>0.82</p>
                     </c>
                     <c ca="center">
                        <p>0.81</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>OC-Kmeans</p>
                     </c>
                     <c ca="center">
                        <p>0.79</p>
                     </c>
                     <c ca="center">
                        <p>0.93</p>
                     </c>
                     <c ca="center">
                        <p>0.73</p>
                     </c>
                     <c ca="center">
                        <p>0.85</p>
                     </c>
                     <c ca="center">
                        <p>0.92</p>
                     </c>
                     <c ca="center">
                        <p>0.77</p>
                     </c>
                     <c ca="center">
                        <p>0.89</p>
                     </c>
                     <c ca="center">
                        <p>0.92</p>
                     </c>
                     <c ca="center">
                        <p>0.81</p>
                     </c>
                     <c ca="center">
                        <p>0.89</p>
                     </c>
                     <c ca="center">
                        <p>0.80</p>
                     </c>
                     <c ca="center">
                        <p>0.69</p>
                     </c>
                     <c ca="center">
                        <p>0.75</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>OC-PCA</p>
                     </c>
                     <c ca="center">
                        <p>0.87</p>
                     </c>
                     <c ca="center">
                        <p>0.89</p>
                     </c>
                     <c ca="center">
                        <p>0.76</p>
                     </c>
                     <c ca="center">
                        <p>0.88</p>
                     </c>
                     <c ca="center">
                        <p>0.92</p>
                     </c>
                     <c ca="center">
                        <p>0.80</p>
                     </c>
                     <c ca="center">
                        <p>0.90</p>
                     </c>
                     <c ca="center">
                        <p>0.79</p>
                     </c>
                     <c ca="center">
                        <p>0.69</p>
                     </c>
                     <c ca="center">
                        <p>0.90</p>
                     </c>
                     <c ca="center">
                        <p>0.86</p>
                     </c>
                     <c ca="center">
                        <p>0.76</p>
                     </c>
                     <c ca="center">
                        <p>0.77</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>OC-KNN</p>
                     </c>
                     <c ca="center">
                        <p>0.90</p>
                     </c>
                     <c ca="center">
                        <p>0.86</p>
                     </c>
                     <c ca="center">
                        <p>0.76</p>
                     </c>
                     <c ca="center">
                        <p>0.90</p>
                     </c>
                     <c ca="center">
                        <p>0.92</p>
                     </c>
                     <c ca="center">
                        <p>0.82</p>
                     </c>
                     <c ca="center">
                        <p>0.90</p>
                     </c>
                     <c ca="center">
                        <p>0.96</p>
                     </c>
                     <c ca="center">
                        <p>0.86</p>
                     </c>
                     <c ca="center">
                        <p>0.90</p>
                     </c>
                     <c ca="center">
                        <p>0.93</p>
                     </c>
                     <c ca="center">
                        <p>0.83</p>
                     </c>
                     <c ca="center">
                        <p>0.82</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="14">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c cspan="14" ca="center">
                        <p>Two-Class</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="14">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Na&#239;ve Bayes</p>
                     </c>
                     <c ca="center">
                        <p>0.89</p>
                     </c>
                     <c ca="center">
                        <p>0.93</p>
                     </c>
                     <c ca="center">
                        <p>0.82 (125)</p>
                     </c>
                     <c ca="center">
                        <p>0.93</p>
                     </c>
                     <c ca="center">
                        <p>0.97</p>
                     </c>
                     <c ca="center">
                        <p>0.90 (200)</p>
                     </c>
                     <c ca="center">
                        <p>0.99</p>
                     </c>
                     <c ca="center">
                        <p>0.92</p>
                     </c>
                     <c ca="center">
                        <p>0.92 (300)</p>
                     </c>
                     <c ca="center">
                        <p>0.97</p>
                     </c>
                     <c ca="center">
                        <p>0.96</p>
                     </c>
                     <c ca="center">
                        <p>0.93 (4000)</p>
                     </c>
                     <c ca="center">
                        <p>0.88</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>SVM</p>
                     </c>
                     <c ca="center">
                        <p>0.90</p>
                     </c>
                     <c ca="center">
                        <p>0.97</p>
                     </c>
                     <c ca="center">
                        <p>0.87 (200)</p>
                     </c>
                     <c ca="center">
                        <p>0.95</p>
                     </c>
                     <c ca="center">
                        <p>0.98</p>
                     </c>
                     <c ca="center">
                        <p>0.93 (500)</p>
                     </c>
                     <c ca="center">
                        <p>0.99</p>
                     </c>
                     <c ca="center">
                        <p>0.99</p>
                     </c>
                     <c ca="center">
                        <p>0.98 (300)</p>
                     </c>
                     <c ca="center">
                        <p>0.98</p>
                     </c>
                     <c ca="center">
                        <p>0.95</p>
                     </c>
                     <c ca="center">
                        <p>0.93 (900)</p>
                     </c>
                     <c ca="center">
                        <p>0.92</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Sen = sensitivity, Spe = specificity, and MCC = Matthews Correlation Coefficient. Results are presented for three genomes individually (<it>C. elegans, Mouse, and Human</it>) and for <it>All-miRNA </it>as a mixture of miRNAs from multiple species. The number in parentheses is the corresponding number of optimal negative examples giving the highest MCC.</p>
               </tblfn>
            </tbl>
            <p>Xue, <it>et al </it><abbrgrp><abbr bid="B11">11</abbr></abbrgrp> reported a sensitivity of 0.93 and specificity of 0.88 using two-class SVM on the human miRNA with the same number of negative examples (1,000) as we used. Computing the MCC for their results gives MCC = 0.81. OC-KNN with the same data <it>(Human) </it>achieves slightly better results with MCC = 0.86, while comparable results are obtained with OC-Gaussian. See the column "MCC" under <it>Human </it>and the rows of "OC-Gaussian" and "OC-KNN" in Table <tblr tid="T1">1</tblr>. The two-class implementations in Table <tblr tid="T1">1</tblr> are also superior with <it>Human </it>(MCC = 0.98 for SVM and MCC = 0.92 for na&#239;ve Bayes).</p>
            <p>Nam, <it>et al </it><abbrgrp><abbr bid="B8">8</abbr></abbrgrp> used a hidden Markov model (HMM) to classify the human miRNA along with 1,000 negative examples to estimate the performance of their approach. They report 0.73 for sensitivity and 0.96 for specificity (MCC = 0.71). All the OC-methods outperform this algorithm except the OC-SVM which is about the same.</p>
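            <p>The MCC values quoted for the published methods can be recomputed from sensitivity and specificity alone. The following is a minimal sketch, not the authors' code: the function name <it>mcc_from_rates</it> is hypothetical, and it assumes that the positive and negative classes are weighted equally, an assumption that is not stated in the text but that reproduces the quoted values.</p>

```python
import math


def mcc_from_rates(sensitivity, specificity):
    """Matthews Correlation Coefficient assuming equally weighted classes."""
    # Treat each class as having unit size, so rates act as counts.
    tp, fn = sensitivity, 1.0 - sensitivity
    tn, fp = specificity, 1.0 - specificity
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


xue = mcc_from_rates(0.93, 0.88)  # triplet-SVM rates quoted from Xue et al.
nam = mcc_from_rates(0.73, 0.96)  # HMM rates quoted from Nam et al.
print(round(xue, 2), round(nam, 2))
```

            <p>Under this equal-weighting assumption the sketch prints 0.81 and 0.71, matching the values given above. With unequal class sizes the MCC would differ, so the comparison holds only when all methods are scored in the same way.</p>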
         </sec>
         <sec>
            <st>
               <p>Comparison with other prediction methods</p>
            </st>
            <p>The aim of this section is to evaluate the performance of the one-class classification considering different features suggested by other studies <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B17">17</abbr></abbrgrp>. We used the MCC measurement for comparison purposes.</p>
            <p>The triplet-SVM classifier is a 2-class tool developed by Xue, <it>et al </it><abbrgrp><abbr bid="B11">11</abbr></abbrgrp> that does not rely on comparative genomic approaches. Their data consist of a training set and a testing set, which we used to evaluate the performance of the one-class approaches. We used the 163 positive human pre-miRNAs for training and then tested with the 30 human pre-miRNAs as the positive class and 1,000 pseudo pre-miRNAs as the negative class. The performance of the different one-class approaches is presented in Table <tblr tid="T2">2</tblr>. Many of the results have higher sensitivity but lower specificity than the 2-class tool, although some of the difference may be attributable to the different feature set. However, two-class na&#239;ve Bayes and two-class SVM (using our features) outperform these results by about 11% and 17% respectively based on the MCC measurement with <it>Human </it>miRNAs in Table <tblr tid="T1">1</tblr>.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>One-class results obtained from triplet-SVM and RNAmicro1.1 tools based on their specific features.</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3" ca="center">
                        <p>triplet-SVM (<it>Human</it>)</p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>RNAmicro1.1</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Method</p>
                     </c>
                     <c ca="center">
                        <p>Sen</p>
                     </c>
                     <c ca="center">
                        <p>Spe</p>
                     </c>
                     <c ca="center">
                        <p>MCC</p>
                     </c>
                     <c ca="center">
                        <p>Sen</p>
                     </c>
                     <c ca="center">
                        <p>Spe</p>
                     </c>
                     <c ca="center">
                        <p>MCC</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>OC-SVM</p>
                     </c>
                     <c ca="center">
                        <p>0.93</p>
                     </c>
                     <c ca="center">
                        <p>0.78</p>
                     </c>
                     <c ca="center">
                        <p>0.72</p>
                     </c>
                     <c ca="center">
                        <p>0.93</p>
                     </c>
                     <c ca="center">
                        <p>0.94</p>
                     </c>
                     <c ca="center">
                        <p>0.87</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>OC-Gaussian</p>
                     </c>
                     <c ca="center">
                        <p>0.90</p>
                     </c>
                     <c ca="center">
                        <p>0.88</p>
                     </c>
                     <c ca="center">
                        <p>0.78</p>
                     </c>
                     <c ca="center">
                        <p>0.90</p>
                     </c>
                     <c ca="center">
                        <p>0.96</p>
                     </c>
                     <c ca="center">
                        <p>0.87</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>OC-Kmeans</p>
                     </c>
                     <c ca="center">
                        <p>0.98</p>
                     </c>
                     <c ca="center">
                        <p>0.80</p>
                     </c>
                     <c ca="center">
                        <p>0.79</p>
                     </c>
                     <c ca="center">
                        <p>0.93</p>
                     </c>
                     <c ca="center">
                        <p>0.92</p>
                     </c>
                     <c ca="center">
                        <p>0.84</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>OC-PCA</p>
                     </c>
                     <c ca="center">
                        <p>0.97</p>
                     </c>
                     <c ca="center">
                        <p>0.79</p>
                     </c>
                     <c ca="center">
                        <p>0.77</p>
                     </c>
                     <c ca="center">
                        <p>0.90</p>
                     </c>
                     <c ca="center">
                        <p>0.96</p>
                     </c>
                     <c ca="center">
                        <p>0.86</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>OC-KNN</p>
                     </c>
                     <c ca="center">
                        <p>0.93</p>
                     </c>
                     <c ca="center">
                        <p>0.84</p>
                     </c>
                     <c ca="center">
                        <p>0.77</p>
                     </c>
                     <c ca="center">
                        <p>0.91</p>
                     </c>
                     <c ca="center">
                        <p>0.95</p>
                     </c>
                     <c ca="center">
                        <p>0.87</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Original study results</p>
                     </c>
                     <c ca="center">
                        <p>0.93</p>
                     </c>
                     <c ca="center">
                        <p>0.88</p>
                     </c>
                     <c ca="center">
                        <p>0.81</p>
                     </c>
                     <c ca="center">
                        <p>0.84</p>
                     </c>
                     <c ca="center">
                        <p>0.99</p>
                     </c>
                     <c ca="center">
                        <p>0.84</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The last row has the originally reported results.</p>
               </tblfn>
            </tbl>
            <p>RNAmicro1.1 is another miRNA prediction tool, developed by Hertel and Stadler <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, that relies mainly on comparative sequence analysis using a two-class SVM. The positive set includes 295 alignments of distinct miRNA families obtained from the union of animal miRNAs contained in Rfam 6.0 (276 are considered in the refined list provided by the authors). The negative set (about 10,000, provided as a new list by the authors) is constructed mainly from tRNA alignments. We randomly chose 1,000 of these to match the size of the negative class used in our own and other studies. The results of the one-class approaches (Table <tblr tid="T2">2</tblr>) are comparable to those reported by the authors, with most of the one-class methods showing an advantage of about 3%. As observed earlier, two-class na&#239;ve Bayes and two-class SVM (based on our features) outperform these results by about 9% with similar data (<it>All-miRNA</it>).</p>
            <p>PSoL is an iterative method developed by Wang <it>et al</it>. <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> to predict ncRNA genes from the <it>E. coli </it>genome and to define an optimized negative class using a two-class SVM. It selects an initial negative set from an unlabeled set and then uses the two-class SVM to expand the negative set gradually by reassigning examples from the unlabeled data. The expansion continues until the remaining unlabeled set is reduced to a predefined size <it>N</it>, and this remaining set is taken as the positive predictions. For the comparison with OC-SVM using a linear kernel, we used the same data as the authors &#8211; 321 positive examples along with 11,818 unlabeled examples &#8211; and followed their assessment steps using 5-fold cross-validation. OC-SVM reached a sensitivity of 0.73 with a specificity of 0.92. This is comparable to the PSoL recovery rate (sensitivity) of about 0.8 when the expansion is stopped at <it>N </it>= 1,000.</p>
         </sec>
         <sec>
            <st>
               <p>Predicting miRNA genes in the <it>Epstein Barr Virus (EBV) </it>genome</p>
            </st>
            <p>The <it>EBV </it>genome has been extensively studied <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp> and an estimated 20&#8211;30 <it>EBV </it>miRNAs have been reported. However, additional miRNAs may remain to be discovered in the <it>EBV </it>genome. We downloaded the whole genome of the <it>Epstein Barr virus </it>(Human herpes virus 4, NC_007605 version NC_007605.1 GI: 82503188), with a length of 171,823 <it>nt</it>, from the NCBI website <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> and passed it through the pipeline shown in Fig. <figr fid="F1">1</figr>, which is similar to the one used in Yousef et al. <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. Thirty-two mature miRNAs reported in Rfam <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> (Release 8.1: May 2006) were used to estimate the sensitivity of each type of trained classifier (Table <tblr tid="T3">3</tblr>). For comparison with the two-class approach, the same experiment was carried out using the BayesMiRNAfind classifier <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. We generated 5,207 candidates at step 2 (Fig. <figr fid="F1">1</figr>), but only 1,251 passed the potential stem-loop filter at step 3. At step 4, 68,702 mature miRNA candidates were produced from the 1,251 pre-miRNA candidates.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Components of the one-class computational procedure</p>
               </caption>
               <text>
                  <p>Components of the one-class computational procedure.</p>
               </text>
               <graphic file="1748-7188-3-2-1"/>
            </fig>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Prediction of miRNAs in <it>Epstein Barr Virus </it>with the one-class methods.</p>
               </caption>
               <tblbdy cols="9">
                  <r>
                     <c ca="center">
                        <p>Train</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>
                           <it>All-miRNA</it>
                        </p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>
                           <it>Mouse</it>
                        </p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>
                           <it>Human</it>
                        </p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>Recent <it>Human</it></p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Sen</p>
                     </c>
                     <c ca="center">
                        <p>New</p>
                     </c>
                     <c ca="center">
                        <p>Sen</p>
                     </c>
                     <c ca="center">
                        <p>New</p>
                     </c>
                     <c ca="center">
                        <p>Sen</p>
                     </c>
                     <c ca="center">
                        <p>New</p>
                     </c>
                     <c ca="center">
                        <p>Sen</p>
                     </c>
                     <c ca="center">
                        <p>New</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="9">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>OC-SVM</p>
                     </c>
                     <c ca="center">
                        <p>0.84 (27/32)</p>
                     </c>
                     <c ca="center">
                        <p>236</p>
                     </c>
                     <c ca="center">
                        <p>0.72 (23/32)</p>
                     </c>
                     <c ca="center">
                        <p>236</p>
                     </c>
                     <c ca="center">
                        <p>0.81 (26/32)</p>
                     </c>
                     <c ca="center">
                        <p>279</p>
                     </c>
                     <c ca="center">
                        <p>0.94 (30/32)</p>
                     </c>
                     <c ca="center">
                        <p>198</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>OC-Gaussian</p>
                     </c>
                     <c ca="center">
                        <p>0.88 (28/32)</p>
                     </c>
                     <c ca="center">
                        <p>258</p>
                     </c>
                     <c ca="center">
                        <p>0.81 (26/32)</p>
                     </c>
                     <c ca="center">
                        <p>233</p>
                     </c>
                     <c ca="center">
                        <p>0.81 (26/32)</p>
                     </c>
                     <c ca="center">
                        <p>266</p>
                     </c>
                     <c ca="center">
                        <p>0.84 (27/32)</p>
                     </c>
                     <c ca="center">
                        <p>275</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>OC-Kmeans</p>
                     </c>
                     <c ca="center">
                        <p>0.91 (29/32)</p>
                     </c>
                     <c ca="center">
                        <p>284</p>
                     </c>
                     <c ca="center">
                        <p>0.97 (31/32)</p>
                     </c>
                     <c ca="center">
                        <p>266</p>
                     </c>
                     <c ca="center">
                        <p>0.78 (25/32)</p>
                     </c>
                     <c ca="center">
                        <p>269</p>
                     </c>
                     <c ca="center">
                        <p>0.97 (31/32)</p>
                     </c>
                     <c ca="center">
                        <p>271</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>OC-PCA</p>
                     </c>
                     <c ca="center">
                        <p>0.97 (31/32)</p>
                     </c>
                     <c ca="center">
                        <p>284</p>
                     </c>
                     <c ca="center">
                        <p>0.91 (29/32)</p>
                     </c>
                     <c ca="center">
                        <p>255</p>
                     </c>
                     <c ca="center">
                        <p>0.91 (29/32)</p>
                     </c>
                     <c ca="center">
                        <p>259</p>
                     </c>
                     <c ca="center">
                        <p>0.94 (30/32)</p>
                     </c>
                     <c ca="center">
                        <p>283</p>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>OC-KNN</p>
                     </c>
                     <c ca="center">
                        <p>0.88 (28/32)</p>
                     </c>
                     <c ca="center">
                        <p>272</p>
                     </c>
                     <c ca="center">
                        <p>0.84 (27/32)</p>
                     </c>
                     <c ca="center">
                        <p>266</p>
                     </c>
                     <c ca="center">
                        <p>0.81 (26/32)</p>
                     </c>
                     <c ca="center">
                        <p>283</p>
                     </c>
                     <c ca="center">
                        <p>0.91 (29/32)</p>
                     </c>
                     <c ca="center">
                        <p>269</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="9">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>na&#239;ve Bayes</p>
                     </c>
                     <c ca="center">
                        <p>0.84 (27/32)</p>
                     </c>
                     <c ca="center">
                        <p>165</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="center">
                        <p>N/A</p>
                     </c>
                     <c ca="center">
                        <p>0.94 (30/32)</p>
                     </c>
                     <c ca="center">
                        <p>276</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p><it>All-miRNA</it>, <it>Mouse</it>, <it>Human</it>, or Recent <it>Human </it>(the human miRNAs listed in Rfam Release 8.1) served as training data sets. Sen = sensitivity; New = new miRNA predictions.</p>
               </tblfn>
            </tbl>
            <p>As shown in Table <tblr tid="T3">3</tblr>, all the one-class methods recognize most of the reported viral miRNAs, with sensitivities of 72%&#8211;97% across the <it>All-miRNA</it>, <it>Mouse </it>and <it>Human </it>training sets. OC-PCA has the highest sensitivity when trained on <it>All-miRNA </it>or <it>Human </it>miRNAs, whereas OC-Kmeans is superior when trained on <it>Mouse </it>miRNAs. BayesMiRNAfind achieves 84% sensitivity along with 165 new predictions.</p>
            <p>The Rfam miRNA registry, Release 8.1 (May 2006) <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>, includes a new list of human miRNAs (462 stem-loop sequences), and we also used these new data to train the one-class methods. The results are presented in the last column of Table <tblr tid="T3">3</tblr>. In this study, 18 of the 462 new human miRNAs were discarded because they failed to form a stem-loop structure according to mfold. The one-class results with this data set are generally better than those obtained with the previous list of human miRNAs or with the other data sets included in Table <tblr tid="T3">3</tblr>. We believe this is because the "recent human" list is richer and cleaner: the number of miRNAs listed is almost double that of the previous list, and it is not surprising that classifier performance improves as the number of positive training examples increases. The two-class BayesMiRNAfind was also retrained with the new human miRNA sequences and with different numbers of negative examples. The best results were obtained with 200 negative examples, yielding 94% (30/32) sensitivity along with 276 new miRNA predictions.</p>
            <p>Generally, approximately 4% of the candidates (~200/5,207) were identified as new miRNAs by the computational procedure (Fig. <figr fid="F1">1</figr>, compare step 6 with step 3), while about 88% (28/32) of the known miRNAs were retrieved (Table <tblr tid="T3">3</tblr>). Using different filters (score, conservation, common, etc.) can reduce the number of miRNA predictions. For example, selecting 0.25 as a threshold (step 7 in Fig. <figr fid="F1">1</figr>) for OC-Gaussian with the <it>All-miRNA </it>model (see Fig. <figr fid="F2">2</figr>) recovers 97% of the captured true miRNAs (0.97*28) while reducing the new miRNA predictions by 42%. A threshold of 0.3 recovers 40% of the captured miRNAs (0.4*28) and reduces the new miRNA predictions by about 95%. The choice of threshold is arbitrary, and it determines the number of final predictions. However, one can set a threshold that captures 70&#8211;80% of the true miRNAs to obtain reliable predictions. To assess our predictions, we used the triplet-SVM classifier tools <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> to evaluate the OC-Gaussian results. Of the known miRNAs captured by the OC-Gaussian classifier, 87% were confirmed by the triplet-SVM classifier, and 13% of our new miRNA predictions were confirmed as well. This result suggests that combining different methods may lead to more accurate classification of miRNAs. It may also support our main purpose: to reduce false positive predictions while retaining high sensitivity when analyzing a large genomic sequence.</p>
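            <p>The trade-off controlled by the score threshold can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure used in this study: the helper name apply_threshold is hypothetical, a candidate is assumed to be kept when its score is at least the threshold, and the scores are made-up values rather than data from Fig. 2.</p>

```python
def apply_threshold(known_scores, new_scores, threshold):
    """Return (fraction of known miRNAs kept, fraction of new predictions removed).

    Assumption: a candidate is kept when its score is at least `threshold`.
    """
    kept_known = sum(1 for s in known_scores if s >= threshold)
    kept_new = sum(1 for s in new_scores if s >= threshold)
    return kept_known / len(known_scores), 1 - kept_new / len(new_scores)


# Made-up scores for illustration only; they are not values from this study.
known = [0.22, 0.27, 0.29, 0.31, 0.33]
new = [0.10, 0.20, 0.24, 0.26, 0.28, 0.32]

for t in (0.25, 0.30):
    kept, removed = apply_threshold(known, new, t)
    print(f"threshold={t:.2f}: kept {kept:.0%} of known, removed {removed:.0%} of new")
```

            <p>Raising the threshold removes more new predictions at the cost of losing more known miRNAs, which is the behaviour described above for the thresholds 0.25 and 0.3.</p>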
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>One-Class Gaussian classification scores</p>
               </caption>
               <text>
                  <p><b>One-Class Gaussian classification scores</b>. Distribution of OC-Gaussian classifier scores over the miRNA class and over the new miRNA predictions from the EBV genome sequence. <it>All-miRNA </it>is used for training.</p>
               </text>
               <graphic file="1748-7188-3-2-2"/>
            </fig>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>The one-class approach in machine learning has been receiving increasing attention, particularly for solving problems where the negative class is not well defined <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr></abbrgrp>; moreover, the one-class approach has been successfully applied in various fields, including text mining <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>, functional Magnetic Resonance Imaging (fMRI) <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> and signature verification <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>.</p>
         <p>In this paper we have presented a one-class approach to predicting miRNAs based on their secondary structure and sequence features from other studies, using information only from the positive (miRNA) class. We took this approach because an arbitrary selection of the negative class in these predictive studies can be difficult and can bias the results. This may be particularly true as new organisms are surveyed for which examples of a negative class are not clearly defined. We find that the accuracy of prediction using one-class methods depends on the features used and in some cases may be better than that of a two-class approach, judging by our own and others' studies. We found slightly greater accuracy for two-class than for one-class methods using our feature set, but this was not generally true with different feature sets (see Table <tblr tid="T2">2</tblr>).</p>
         <p>We find that the miRNA features used in our studies appear to describe the miRNA class more accurately than those used in some previous studies <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B17">17</abbr></abbrgrp>. The features we proposed are more likely to capture the functionality of the miRNA because they take bulges, loops and asymmetric loops into account. We also show that combining the triplet-SVM classifier tools <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> with other classifiers (either one-class or two-class) that use our suggested features is a reasonable way to reduce false positive predictions while preserving high sensitivity. This approach could be usefully applied to large genomes (such as human or mouse), especially when conservation is not considered as a feature for a cross-species analysis.</p>
         <p>Among the different one-class approaches, including Support Vector Machines (SVMs), Gaussian, Kmeans, Principal Component Analysis (PCA), and K-Nearest Neighbor (K-NN), we found that OC-KNN and OC-Gaussian are superior to the others in terms of prediction specificity, as measured by their ability to accurately capture only the known miRNAs. High specificity is very important in genome-wide analyses, where the number of predictions can be very large and false positives must be minimized.</p>
         <p>The principal advantage of the one-class approach lies in not having to define the characteristics of a negative class. Two-class classifiers are a natural choice in the many instances where the negative class is clear, e.g., comparison of tissue from healthy controls with tumor tissue from a cancer patient. When searching a genome for miRNA, however, non-miRNA is not well defined, so many false positives may be predicted and some true miRNA species may not be detected. We have applied this one-class approach to miRNA discovery, and a similar application might also be useful for miRNA target prediction, in which the definition of a negative class is also ambiguous.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Choosing structural and sequence features</p>
            </st>
            <p>We begin by describing features of miRNA extracted from both secondary structure and sequence. We adopted the structural features from our two-class miRNA prediction method <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> for the development of a one-class method. For the positive (miRNA) class, the 21 nt of the mature miRNA are mapped onto the associated stem-loop (generated by the mfold program <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>) and features are then extracted as described below. Similarly, we used sliding 21 nt windows along each stem-loop strand to extract features for the negative (non-miRNA) class.</p>
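            <p>The sliding-window step can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in this study: the function name sliding_windows, the step of 1 nt between consecutive windows and the example strand are all assumptions made here.</p>

```python
def sliding_windows(strand, width=21, step=1):
    """Yield (start, window) for each `width`-nt window along one stem-loop strand."""
    # The step of 1 nt is an assumption; the text only states the window width.
    for start in range(0, len(strand) - width + 1, step):
        yield start, strand[start:start + width]


# A made-up 25-nt strand used only to show how many windows are produced.
strand = "UGAGGUAGUAGGUUGUAUAGUUUGA"
windows = list(sliding_windows(strand))
print(len(windows), windows[0])
```

            <p>A strand shorter than the window width yields no windows, so short strands simply contribute no candidates in this sketch.</p>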
            <p>For the structural features, 62 features are derived from three parts of the associated hairpin (stem-loop) (see Fig. <figr fid="F3">3</figr>) &#8211; foot, mature, and head &#8211; and include the following for each of these parts: (1) the total number of base pairs (bp), (2) the number of bulges, (3) the number of loops, (4) the number of asymmetric loops, (5) eight features representing the number of bulges of lengths 1&#8211;7 and greater than 7, (6) eight features representing the number of symmetric loops of lengths 1&#8211;7 and greater than 7, (7) the distance from the mature miRNA candidate to the first paired base of the foot and head parts.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Partition stem-loop into 3 parts</p>
               </caption>
               <text>
                  <p><b>Partition stem-loop into 3 parts</b>. Foot, Mature and Head features to determine potential stem-loops.</p>
               </text>
               <graphic file="1748-7188-3-2-3"/>
            </fig>
            <p>For the sequence features, we define "words" as sequences of length 3 or less. The frequency of each word in the first 9 nt of the 21 nt putative mature miRNA is extracted to form a representation in the vector space. To justify the use of the first 9 nt and of 1-, 2- and 3-mers ("words"), a comparison between different word lengths was conducted, as presented in Table A and Table B [Additional file <supplr sid="S1">1</supplr>]. More detailed information can be found in <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. When using a two-class method, we chose values for features of the negative class that lie outside the distributions of the values of those features in the positive class <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. For one-class methods this arbitrary choice is unnecessary, since there is no need to describe a negative class.</p>
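            <p>The word-based representation can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used in this study: the names WORDS and word_counts are hypothetical, raw counts are used where the study may normalize to frequencies, overlapping occurrences are counted, and the input is an arbitrary example sequence.</p>

```python
from itertools import product

ALPHABET = "ACGU"
MAX_WORD_LEN = 3
# All 1-, 2- and 3-mers over the RNA alphabet: 4 + 16 + 64 = 84 words.
WORDS = ["".join(p) for k in range(1, MAX_WORD_LEN + 1)
         for p in product(ALPHABET, repeat=k)]


def word_counts(mature, prefix_len=9):
    """Count each word in the first `prefix_len` nt of a putative mature miRNA."""
    prefix = mature[:prefix_len].upper().replace("T", "U")
    counts = dict.fromkeys(WORDS, 0)
    for k in range(1, MAX_WORD_LEN + 1):
        for i in range(len(prefix) - k + 1):
            word = prefix[i:i + k]
            if word in counts:  # skip words containing ambiguous bases such as N
                counts[word] += 1
    return [counts[w] for w in WORDS]


vector = word_counts("UGAGGUAGUAGGUUGUAUAGUU")  # arbitrary example sequence
print(len(vector), sum(vector))
```

            <p>Each candidate is thus mapped to a fixed-length vector with one entry per word, which can be appended to the structural features before a classifier is trained.</p>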
            <suppl id="S1">
               <title>
                  <p>Additional File 1</p>
               </title>
               <text>
                  <p>Annotation of species used and additional data on accuracy associated with various one-class parameters. <b>Table A. </b>Sensitivity (Sen) and specificity (Spe) from one-class SVM using various word-lengths and the first 9 nt of the mature miRNA. <b>Table B. </b>Sensitivity (Sen) and specificity (Spe) obtained from one-class SVM to find the optimal number of the first <it>k </it>nucleotides using word length &#8804; 3. <b>Table C. </b>Importance of the sequence features alone for classification. <b>Table D. </b>Optimized parameters for each one-class method. <b>Table E. </b>Annotation for all used species. <b>Table F. </b>The size of each dataset after removing similar structures of mature microRNAs. <b>Table G. </b>Accuracy in classification of <it>All-miRNA </it>dataset after masking to remove homologs. <b>Table H. </b>One-Class results obtained from the secondary features only and secondary features plus sequence features.</p>
               </text>
               <file name="1748-7188-3-2-S1.doc">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>One-class methods</p>
            </st>
            <p>In general, a binary learning (two-class) approach to miRNA discovery considers both positive (miRNA) and negative (non-miRNA) classes by providing examples of the two classes to a learning algorithm in order to build a classifier that attempts to discriminate between them. The most common term for this kind of learning is <it>supervised learning</it>, where the labels of the two classes are known beforehand. One-class learning uses only the information for the target (positive) class to build a classifier that recognizes the examples belonging to its target and rejects all others as outliers.</p>
            <p>Among the many classification algorithms available, we chose five one-class algorithms to compare for miRNA discovery. We give a brief description of each one-class classifier and refer the reader to references <abbrgrp><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr></abbrgrp> for additional details, including a description of parameters and thresholds. The LIBSVM library <abbrgrp><abbr bid="B32">32</abbr></abbrgrp> was used as the implementation of the SVM (both one-class and two-class, using the RBF kernel function) and DDtools <abbrgrp><abbr bid="B33">33</abbr></abbrgrp> for the other one-class methods. See Table D [Additional file <supplr sid="S1">1</supplr>] for the optimized parameters and the parameter values used.</p>
            <p>Each classifier returns a score that measures the likelihood that the candidate being tested belongs to the positive class. The highest score determines the preferred candidate associated with a given hairpin structure (see Fig. <figr fid="F1">1</figr>).</p>
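            <p>Selecting the preferred candidate for each hairpin can be sketched in Python as follows. This is a minimal illustration, not the implementation used in this study: the function name best_candidate_per_hairpin, the tuple layout (hairpin identifier, window start, score) and the example values are assumptions, and ties are resolved here by keeping the first candidate seen.</p>

```python
def best_candidate_per_hairpin(scored):
    """Map each hairpin id to the (start, score) of its highest-scoring candidate.

    `scored` is an iterable of (hairpin_id, start, score) tuples.
    """
    best = {}
    for hairpin_id, start, score in scored:
        # Keep the first candidate seen when scores tie (an arbitrary choice).
        if hairpin_id not in best or score > best[hairpin_id][1]:
            best[hairpin_id] = (start, score)
    return best


# Made-up candidates for two hypothetical hairpins.
candidates = [
    ("hp1", 0, 0.21), ("hp1", 1, 0.34), ("hp1", 2, 0.30),
    ("hp2", 0, 0.12), ("hp2", 5, 0.12),
]
print(best_candidate_per_hairpin(candidates))
```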
         </sec>
         <sec>
            <st>
               <p>One-class support vector machines (OC-SVM)</p>
            </st>
            <p>Support Vector Machines (SVMs) are learning machines originally developed as a two-class approach <abbrgrp><abbr bid="B34">34</abbr><abbr bid="B35">35</abbr></abbrgrp>. The use of one-class SVM was originally suggested by Sch&#246;lkopf et al. <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. One-class SVM is an algorithmic method that produces a prediction function trained to "capture" most of the training data. For that purpose, a kernel function is used to map the data into a feature space, where the SVM is employed to find the hyper-plane with maximum margin from the origin of the feature space. In this setting, the margin that is maximized between the two classes in a two-class SVM becomes the distance between the origin and the support vectors, which define the boundary of the surrounding circle (or hyper-sphere in high-dimensional space) that encloses the single class.</p>
         </sec>
         <sec>
            <st>
               <p>One-class Gaussian (OC-Gaussian)</p>
            </st>
            <p>The Gaussian model is a density estimation model. The assumption is that the target samples form a multivariate normal distribution; therefore, for a given test sample <it>z </it>in <it>n</it>-dimensional space, the probability density function can be calculated as:</p>
            <p>
               <display-formula id="M1">
                  <m:math name="1748-7188-3-2-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>p</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:mi>z</m:mi>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mfrac>
                              <m:mn>1</m:mn>
                              <m:mrow>
                                 <m:msup>
                                    <m:mrow>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mn>2</m:mn>
                                       <m:mi>&#960;</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                    <m:mrow>
                                       <m:mi>n</m:mi>
                                       <m:mo>/</m:mo>
                                       <m:mn>2</m:mn>
                                    </m:mrow>
                                 </m:msup>
                                 <m:msup>
                                    <m:mrow>
                                       <m:mrow>
                                          <m:mo>|</m:mo>
                                          <m:mi>&#931;</m:mi>
                                          <m:mo>|</m:mo>
                                       </m:mrow>
                                    </m:mrow>
                                    <m:mrow>
                                       <m:mn>1</m:mn>
                                       <m:mo>/</m:mo>
                                       <m:mn>2</m:mn>
                                    </m:mrow>
                                 </m:msup>
                              </m:mrow>
                           </m:mfrac>
                           <m:msup>
                              <m:mi>e</m:mi>
                              <m:mrow>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mn>1</m:mn>
                                 <m:mo>/</m:mo>
                                 <m:mn>2</m:mn>
                                 <m:msup>
                                    <m:mrow>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>z</m:mi>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mi>&#956;</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                    <m:mi>T</m:mi>
                                 </m:msup>
                                 <m:msup>
                                    <m:mi>&#931;</m:mi>
                                    <m:mrow>
                                       <m:mo>&#8722;</m:mo>
                                       <m:mn>1</m:mn>
                                    </m:mrow>
                                 </m:msup>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>z</m:mi>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mi>&#956;</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                           </m:msup>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemiCaaNaeiikaGIaemOEaONaeiykaKIaeyypa0tcfa4aaSaaaeaacqaIXaqmaeaacqGGOaakcqaIYaGmcqqHGoaucqGGPaqkdaahaaqabeaacqWGUbGBcqGGVaWlcqaIYaGmaaWaaqWaaeaacqqHJoWuaiaawEa7caGLiWoadaahaaqabeaacqaIXaqmcqGGVaWlcqaIYaGmaaaaaOGaemyzau2aaWbaaSqabeaacqGGOaakcqGHsislcqaIXaqmcqGGVaWlcqaIYaGmcqGGOaakcqWG6bGEcqGHsisliiGacqWF8oqBcqGGPaqkdaahaaadbeqaaiabdsfaubaaliabfo6atnaaCaaameqabaGaeyOeI0IaeGymaedaaSGaeiikaGIaemOEaONaeyOeI0Iae8hVd0MaeiykaKcaaaaa@591F@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where <it>&#956; </it>and <it>&#931; </it>are the mean and covariance matrix of the target class estimated from the training samples.</p>
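            <p>The density above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: the unbiased covariance estimate, the Gaussian-elimination solver, and the names <it>mean_and_covariance</it>, <it>gaussian_density</it> and <it>gaussian_accept</it> (with its density threshold) are all hypothetical choices made for the example.</p>
            <p><![CDATA[
```python
import math


def mean_and_covariance(samples):
    """Estimate the mean vector and covariance matrix from training rows.

    Assumption: the unbiased (m - 1) estimator is used for the covariance.
    """
    m = len(samples)
    n = len(samples[0])
    mu = [sum(row[j] for row in samples) / m for j in range(n)]
    cov = [[sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in samples) / (m - 1)
            for j in range(n)] for i in range(n)]
    return mu, cov


def solve_and_det(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Returns the solution x together with det(a).
    """
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det
        det *= aug[col][col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x, det


def gaussian_density(z, mu, cov):
    """Evaluate the multivariate normal density p(z) for mean mu and covariance cov."""
    n = len(z)
    diff = [z[i] - mu[i] for i in range(n)]
    solved, det = solve_and_det(cov, diff)          # solved = cov^-1 (z - mu)
    mahalanobis = sum(diff[i] * solved[i] for i in range(n))
    normaliser = (2.0 * math.pi) ** (n / 2.0) * math.sqrt(det)
    return math.exp(-0.5 * mahalanobis) / normaliser


def gaussian_accept(z, mu, cov, threshold):
    """Hypothetical decision rule: accept z when its density reaches the threshold."""
    return gaussian_density(z, mu, cov) >= threshold
```
]]></p>
            <p>A typical use would estimate <it>&#956; </it>and <it>&#931; </it>once from the positive training samples and then call <it>gaussian_accept </it>for each candidate, with the threshold chosen by the user.</p>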
         </sec>
         <sec>
            <st>
               <p>One-class Kmeans (OC-Kmeans)</p>
            </st>
            <p>Kmeans is a simple and well-known unsupervised machine learning algorithm used to partition the data into <it>k </it>clusters. With OC-Kmeans we describe the data as <it>k </it>clusters, or more specifically as <it>k </it>centroids, one derived from each cluster. For a new sample, <it>z</it>, the distance <it>d</it>(<it>z</it>) is calculated as the minimum of the distances from <it>z </it>to the <it>k </it>centroids. The classification decision is then made using a user-defined threshold: if <it>d</it>(<it>z</it>) is less than the threshold, the new sample belongs to the target class; otherwise it is rejected.</p>
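            <p>The decision rule just described can be sketched as follows. The code is a minimal illustration rather than the implementation used in this study: it assumes Euclidean distance, a plain Lloyd-style iteration and a naive initialisation from the first <it>k </it>training samples, and the function names are hypothetical.</p>
            <p><![CDATA[
```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_centroids(samples, k, iterations=50):
    """Return k centroids found by a basic Lloyd iteration.

    Assumption: the first k samples serve as initial centroids.
    """
    centroids = [list(p) for p in samples[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in samples:
            nearest = min(range(k), key=lambda c: euclidean(p, centroids[c]))
            clusters[nearest].append(p)
        new_centroids = []
        for c in range(k):
            if clusters[c]:
                dim = len(clusters[c][0])
                size = len(clusters[c])
                new_centroids.append(
                    [sum(p[j] for p in clusters[c]) / size for j in range(dim)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def oc_kmeans_distance(z, centroids):
    """d(z): the minimum of the distances from z to the centroids."""
    return min(euclidean(z, c) for c in centroids)


def oc_kmeans_accept(z, centroids, threshold):
    """Accept z as a target-class member when d(z) is below the user threshold."""
    return oc_kmeans_distance(z, centroids) < threshold
```
]]></p>
            <p>Only the centroids need to be stored after training, so scoring a new sample costs <it>k </it>distance computations.</p>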
         </sec>
         <sec>
            <st>
               <p>One-class principal component analysis (OC-PCA)</p>
            </st>
            <p>Principal component analysis (PCA) is a classical statistical method, a linear transform that has been widely used in data analysis and compression. PCA is mainly a projection method for reducing the dimensionality of a given dataset by capturing most of the variance in a few orthogonal directions called principal components (PCs). For the one-class approach (OC-PCA), the PCA model is built from the training set; then, for a given test example <it>z</it>, the distance to the PCA model, PCA(<it>z</it>), is calculated and used as the decision factor for acceptance or rejection.</p>
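            <p>The text does not specify how the distance to the PCA model is measured. One common choice, given here only as an assumption, is the reconstruction error: the test sample is centred on the training mean <it>&#956;</it>, projected onto the retained principal components, and compared with its projection. The matrix <it>W </it>(whose orthonormal columns are the retained PCs) and the threshold <it>&#952; </it>are notation introduced for this illustration.</p>
            <p><![CDATA[
```latex
\[
d(z) = \bigl\lVert (z - \mu) - W W^{T} (z - \mu) \bigr\rVert ,
\qquad
z \text{ is accepted if } d(z) < \theta
\]
```
]]></p>
            <p>Under this choice, samples that lie close to the subspace spanned by the retained components have a small <it>d</it>(<it>z</it>) and are accepted.</p>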
         </sec>
         <sec>
            <st>
               <p>One-class K-nearest neighbor (OC-KNN)</p>
            </st>
            <p>The one-class nearest neighbor classifier (OC-KNN) is a modification of the well-known two-class nearest neighbor classifier that learns from positive examples only. The algorithm stores all the training examples as its model; then, for a given test example <it>z</it>, the distance to its nearest neighbor <it>y </it>(<it>y </it>= <it>NN</it>(<it>z</it>)) is calculated as <it>d</it>(<it>z</it>, <it>y</it>). The new sample belongs to the target class when:</p>
            <p>
               <display-formula id="M2">
                  <m:math name="1748-7188-3-2-i2" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mfrac>
                              <m:mrow>
                                 <m:mi>d</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>z</m:mi>
                                 <m:mo>,</m:mo>
                                 <m:mi>y</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                              <m:mrow>
                                 <m:mi>d</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>y</m:mi>
                                 <m:mo>,</m:mo>
                                 <m:mi>N</m:mi>
                                 <m:mi>N</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>y</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                           </m:mfrac>
                           <m:mo>&lt;</m:mo>
                           <m:mi>&#948;</m:mi>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaqcfa4aaSaaaeaacqWGKbazcqGGOaakcqWG6bGEcqGGSaalcqWG5bqEcqGGPaqkaeaacqWGKbazcqGGOaakcqWG5bqEcqGGSaalcqWGobGtcqWGobGtcqGGOaakcqWG5bqEcqGGPaqkcqGGPaqkaaGccqGH8aapiiGacqWF0oazaaa@4129@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where <it>NN</it>(<it>y</it>) is the nearest neighbor of <it>y</it>; in other words, it is the nearest neighbor of the nearest neighbor of <it>z</it>. The default value of <it>&#948; </it>is 1. In the OC-KNN implementation, the average distance to the <it>k </it>nearest neighbors is used.</p>
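            <p>The acceptance rule can be sketched for the simplest case of a single nearest neighbor (<it>k </it>= 1). This is a minimal illustration, not the implementation used in this study: it assumes Euclidean distance and at least two training examples, the function names are hypothetical, and the handling of duplicate training points (a zero denominator) is a convention chosen for the example.</p>
            <p><![CDATA[
```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor(point, training, exclude_index=None):
    """Return (index, distance) of the training example closest to point.

    exclude_index lets a training example skip itself when looking
    for its own nearest neighbor.
    """
    best_index, best_dist = None, math.inf
    for i, candidate in enumerate(training):
        if i == exclude_index:
            continue
        dist = euclidean(point, candidate)
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index, best_dist


def oc_knn_accept(z, training, delta=1.0):
    """Accept z when d(z, y) / d(y, NN(y)) < delta, with y = NN(z)."""
    y_index, d_zy = nearest_neighbor(z, training)
    _, d_y_nn = nearest_neighbor(training[y_index], training, exclude_index=y_index)
    if d_y_nn == 0.0:
        # Assumption: y has an exact duplicate in the training set, so the
        # ratio is undefined; accept only if z coincides with y.
        return d_zy == 0.0
    return d_zy / d_y_nn < delta
```
]]></p>
            <p>With the default <it>&#948; </it>= 1, a sample is accepted when it is closer to its nearest training example than that example is to its own nearest neighbor.</p>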
         </sec>
         <sec>
            <st>
               <p>Classification performance evaluation</p>
            </st>
            <p>To evaluate classification performance, we used the data generated from the positive class and 1,000 negative examples chosen at random from the negative class pool (candidates that failed one of four initial criteria, as previously described <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>). The negative class is not used for training the one-class classifiers, but only for estimating specificity.</p>
            <p>The positive class data includes 117 miRNAs from <it>C. elegans</it>, 224 miRNAs from <it>Mouse</it>, 243 miRNAs from <it>Human</it>, and all 1,359 known miRNAs from other species, called <it>All-miRNA </it><abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. In <it>All-miRNA</it>, 100 homologous precursors were removed from the dataset to avoid bias, but this had little effect on accuracy (compare Table F with Table G [Additional file <supplr sid="S1">1</supplr>]). See <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> for more details.</p>
            <p>The two-class na&#239;ve Bayes classifier and two-class SVM were trained with 90% of the positive miRNA data and with a negative class ranging from 50 examples to 900 chosen randomly from the pool of 1,000 negative examples. The test was done with the remaining 10% from the miRNA class and the remaining negative examples. The evaluation procedure was repeated 100 times and the results are reported in Table <tblr tid="T1">1</tblr> under the title "Two-Class." For the na&#239;ve Bayes test with the set <it>All-miRNA</it>, the number of negative examples was extended to 55,000.</p>
            <p>Each one-class algorithm was trained using 90% of the positive class, and the remaining 10% was used for sensitivity evaluation. The randomly selected 1,000 negative examples were used for the evaluation of specificity. The whole process was repeated 100 times in order to evaluate the stability of the methods. Additionally, the Matthews Correlation Coefficient (MCC) <abbrgrp><abbr bid="B36">36</abbr></abbrgrp> was used to take into account both over-prediction and under-prediction in imbalanced data sets. It is defined as:</p>
            <p>
               <display-formula id="M3">
                  <m:math name="1748-7188-3-2-i3" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>M</m:mi>
                           <m:mi>C</m:mi>
                           <m:mi>C</m:mi>
                           <m:mo>=</m:mo>
                           <m:mfrac>
                              <m:mrow>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:mi>T</m:mi>
                                 <m:mi>p</m:mi>
                                 <m:mi>T</m:mi>
                                 <m:mi>n</m:mi>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mi>F</m:mi>
                                 <m:mi>p</m:mi>
                                 <m:mi>F</m:mi>
                                 <m:mi>n</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                              <m:mrow>
                                 <m:msqrt>
                                    <m:mrow>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>T</m:mi>
                                       <m:mi>p</m:mi>
                                       <m:mo>+</m:mo>
                                       <m:mi>F</m:mi>
                                       <m:mi>p</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>T</m:mi>
                                       <m:mi>p</m:mi>
                                       <m:mo>+</m:mo>
                                       <m:mi>F</m:mi>
                                       <m:mi>n</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>T</m:mi>
                                       <m:mi>n</m:mi>
                                       <m:mo>+</m:mo>
                                       <m:mi>F</m:mi>
                                       <m:mi>n</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>T</m:mi>
                                       <m:mi>n</m:mi>
                                       <m:mo>+</m:mo>
                                       <m:mi>F</m:mi>
                                       <m:mi>p</m:mi>
                                       <m:mo stretchy="false">)</m:mo>
                                    </m:mrow>
                                 </m:msqrt>
                              </m:mrow>
                           </m:mfrac>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaagaart1ev2aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacPC6xNi=xI8qiVKYPFjYdHaVhbbf9v8qqaqFr0xc9vqFj0dXdbba91qpepeI8k8fiI+fsY=rqGqVepae9pg0db9vqaiVgFr0xfr=xfr=xc9adbaqaaeGacaGaaiaabeqaaeqabiWaaaGcbaGaemyta0Kaem4qamKaem4qamKaeyypa0tcfa4aaSaaaeaacqGGOaakcqWGubavcqWGWbaCcqWGubavcqWGUbGBcqGHsislcqWGgbGrcqWGWbaCcqWGgbGrcqWGUbGBcqGGPaqkaeaadaGcaaqaaiabcIcaOiabdsfaujabdchaWjabgUcaRiabdAeagjabdchaWjabcMcaPiabcIcaOiabdsfaujabdchaWjabgUcaRiabdAeagjabd6gaUjabcMcaPiabcIcaOiabdsfaujabd6gaUjabgUcaRiabdAeagjabd6gaUjabcMcaPiabcIcaOiabdsfaujabd6gaUjabgUcaRiabdAeagjabdchaWjabcMcaPaqabaaaaaaa@5C7D@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>The MCC score lies in the interval [-1, 1], where 1 indicates perfect separation and 0 is the expected value for random scores.</p>
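            <p>The MCC, together with the sensitivity and specificity used in the evaluation, can be computed directly from the confusion-matrix counts. The sketch below is illustrative only; the function names are hypothetical, and returning 0 when the denominator vanishes is a common convention assumed here rather than something stated in the text.</p>
            <p><![CDATA[
```python
import math


def sensitivity(tp, fn):
    """Fraction of positive examples that are correctly accepted."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of negative examples that are correctly rejected."""
    return tn / (tn + fp)


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fn) * (tn + fp))
    if denominator == 0:
        # Assumption: treat the undefined case (an empty row or column
        # of the confusion matrix) as 0.
        return 0.0
    return (tp * tn - fp * fn) / denominator
```
]]></p>
            <p>Because all four counts enter the formula, the MCC remains informative when the positive and negative test sets differ greatly in size.</p>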
            <p>In Table <tblr tid="T1">1</tblr>, we present the performance of each one-class classifier (the performance using secondary structural features without any sequence information is shown separately in Table H [Additional file <supplr sid="S1">1</supplr>]). The performance of the two-class methods is presented as well. Only the results for the number of negative examples giving the highest MCC are shown.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>MY originated the project, supervised programming and drafted the paper; SJ carried out calculations and programming; MKS and LCS provided the biological applications, reviewed data and finalized the manuscript. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>This project is funded in part under a grant with the Pennsylvania Department of Health (PA DOH Commonwealth Universal Research Enhancement Program), and Tobacco Settlement grants ME01-740 (L.C. Showe). S. Jung is supported by the Greater Philadelphia Bioinformatics Alliance (GPBA) internship grant. We would like to thank Jana Hertel, Chenghai Xue, and Stephen Holbrook for providing us with the data used in their study.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>One Class MiRNAfind Gene Prediction Web Server</p>
            </title>
            <url>http://wotan.wistar.upenn.edu/OneClassmiRNA/</url>
         </bibl>
         <bibl id="B2">
            <title>
               <p>MicroRNAs: Genomics, Biogenesis, Mechanism, and Function</p>
            </title>
            <aug>
               <au>
                  <snm>Bartel</snm>
                  <fnm>DP</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2004</pubdate>
            <volume>116</volume>
            <issue>2</issue>
            <fpage>281</fpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">14744438</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Vertebrate MicroRNA Genes</p>
            </title>
            <aug>
               <au>
                  <snm>Lim</snm>
                  <fnm>LP</fnm>
               </au>
               <au>
                  <snm>Glasner</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Yekta</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Burge</snm>
                  <fnm>CB</fnm>
               </au>
               <au>
                  <snm>Bartel</snm>
                  <fnm>DP</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2003</pubdate>
            <volume>299</volume>
            <issue>5612</issue>
            <fpage>1540</fpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12624257</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>The microRNAs of Caenorhabditis elegans</p>
            </title>
            <aug>
               <au>
                  <snm>Lim</snm>
                  <fnm>LP</fnm>
               </au>
               <au>
                  <snm>Lau</snm>
                  <fnm>NC</fnm>
               </au>
               <au>
                  <snm>Weinstein</snm>
                  <fnm>EG</fnm>
               </au>
               <au>
                  <snm>Abdelhakim</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Yekta</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Rhoades</snm>
                  <fnm>MW</fnm>
               </au>
               <au>
                  <snm>Burge</snm>
                  <fnm>CB</fnm>
               </au>
               <au>
                  <snm>Bartel</snm>
                  <fnm>DP</fnm>
               </au>
            </aug>
            <source>Genes Dev</source>
            <pubdate>2003</pubdate>
            <volume>17</volume>
            <issue>8</issue>
            <fpage>991</fpage>
            <lpage>1008</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">196042</pubid>
                  <pubid idtype="pmpid" link="fulltext">12672692</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>New human and mouse microRNA genes found by homology search</p>
            </title>
            <aug>
               <au>
                  <snm>Weber</snm>
                  <fnm>MJ</fnm>
               </au>
            </aug>
            <source>FEBS Journal</source>
            <pubdate>2005</pubdate>
            <volume>272</volume>
            <issue>1</issue>
            <fpage>59</fpage>
            <lpage>73</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15634332</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Computational identification of Drosophila microRNA genes</p>
            </title>
            <aug>
               <au>
                  <snm>Lai</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Tomancak</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Williams</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Genome Biology</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <issue>7</issue>
            <fpage>R42</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">193629</pubid>
                  <pubid idtype="pmpid" link="fulltext">12844358</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Computational and Experimental Identification of C. elegans microRNAs</p>
            </title>
            <aug>
               <au>
                  <snm>Grad</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Aach</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hayes</snm>
                  <fnm>GD</fnm>
               </au>
               <au>
                  <snm>Reinhart</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Church</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Ruvkun</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Molecular Cell</source>
            <pubdate>2003</pubdate>
            <volume>11</volume>
            <issue>5</issue>
            <fpage>1253</fpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12769849</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Human microRNA prediction through a probabilistic co-learning model of sequence and structure</p>
            </title>
            <aug>
               <au>
                  <snm>Nam</snm>
                  <fnm>J-W</fnm>
               </au>
               <au>
                  <snm>Shin</snm>
                  <fnm>K-R</fnm>
               </au>
               <au>
                  <snm>Han</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>VN</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>B-T</fnm>
               </au>
            </aug>
            <source>Nucl Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <issue>11</issue>
            <fpage>3570</fpage>
            <lpage>3581</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1159118</pubid>
                  <pubid idtype="pmpid" link="fulltext">15987789</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Identification of microRNAs of the herpesvirus family</p>
            </title>
            <aug>
               <au>
                  <snm>Pfeffer</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sewer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lagos-Quintana</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sheridan</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Sander</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Grasser</snm>
                  <fnm>FA</fnm>
               </au>
               <au>
                  <snm>van Dyk</snm>
                  <fnm>LF</fnm>
               </au>
               <au>
                  <snm>Ho</snm>
                  <fnm>CK</fnm>
               </au>
               <au>
                  <snm>Shuman</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Chien</snm>
                  <fnm>M</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nat Meth</source>
            <pubdate>2005</pubdate>
            <volume>2</volume>
            <issue>4</issue>
            <fpage>269</fpage>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Identification of clustered microRNAs using an ab initio prediction method</p>
            </title>
            <aug>
               <au>
                  <snm>Sewer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Paul</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Landgraf</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Aravin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Pfeffer</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Brownstein</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Tuschl</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>van Nimwegen</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Zavolan</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <issue>1</issue>
            <fpage>267</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1315341</pubid>
                  <pubid idtype="pmpid" link="fulltext">16274478</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine</p>
            </title>
            <aug>
               <au>
                  <snm>Xue</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>He</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>G-P</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>X</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <issue>1</issue>
            <fpage>310</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1360673</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381612</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Approaches to microRNA discovery</p>
            </title>
            <aug>
               <au>
                  <snm>Berezikov</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Cuppen</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Plasterk</snm>
                  <fnm>RHA</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2006</pubdate>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Combining multi-species genomic data for microRNA identification using a Naive Bayes classifier</p>
            </title>
            <aug>
               <au>
                  <snm>Yousef</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Nebozhyn</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Shatkay</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Kanterakis</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Showe</snm>
                  <fnm>LC</fnm>
               </au>
               <au>
                  <snm>Showe</snm>
                  <fnm>MK</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <issue>11</issue>
            <fpage>1325</fpage>
            <lpage>1334</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16543277</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>miTarget: microRNA target gene prediction using a support vector machine</p>
            </title>
            <aug>
               <au>
                  <snm>Kim</snm>
                  <fnm>S-K</fnm>
               </au>
               <au>
                  <snm>Nam</snm>
                  <fnm>J-W</fnm>
               </au>
               <au>
                  <snm>Rhee</snm>
                  <fnm>J-K</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>W-J</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>B-T</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>411</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1594580</pubid>
                  <pubid idtype="pmpid" link="fulltext">16978421</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>A Kernel Method for MicroRNA Target Prediction Using Sensible Data and Position-Based Features</p>
            </title>
            <aug>
               <au>
                  <snm>Kim</snm>
                  <fnm>S-K</fnm>
               </au>
               <au>
                  <snm>Nam</snm>
                  <fnm>J-W</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>W-J</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>B-T</fnm>
               </au>
            </aug>
            <source>IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology: 2005</source>
            <pubdate>2005</pubdate>
            <fpage>46</fpage>
            <lpage>52</lpage>
         </bibl>
         <bibl id="B16">
            <title>
               <p>PSoL: a positive sample only learning algorithm for finding non-coding RNA genes</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Ding</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Meraz</snm>
                  <fnm>RF</fnm>
               </au>
               <au>
                  <snm>Holbrook</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <fpage>btl441</fpage>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Hairpins in a Haystack: recognizing microRNA precursors in comparative genomics data</p>
            </title>
            <aug>
               <au>
                  <snm>Hertel</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Stadler</snm>
                  <fnm>PF</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <issue>14</issue>
            <fpage>e197</fpage>
            <lpage>e202</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16873472</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Epstein-Barr Virus MicroRNAs Are Evolutionarily Conserved and Differentially Expressed</p>
            </title>
            <aug>
               <au>
                  <snm>Cai</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Sch&#228;fer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bilello</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Desrosiers</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Edwards</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Raab-Traub</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Cullen</snm>
                  <fnm>BR</fnm>
               </au>
            </aug>
            <source>PLoS Pathogens</source>
            <pubdate>2006</pubdate>
            <volume>2</volume>
            <issue>3</issue>
            <fpage>e23</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1409806</pubid>
                  <pubid idtype="pmpid" link="fulltext">16557291</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>A combined computational and microarray-based approach identifies novel microRNAs encoded by human gamma-herpesviruses</p>
            </title>
            <aug>
               <au>
                  <snm>Grundhoff</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sullivan</snm>
                  <fnm>CS</fnm>
               </au>
               <au>
                  <snm>Ganem</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>RNA</source>
            <pubdate>2006</pubdate>
            <volume>12</volume>
            <issue>5</issue>
            <fpage>733</fpage>
            <lpage>750</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1440911</pubid>
                  <pubid idtype="pmpid" link="fulltext">16540699</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>NCBI</p>
            </title>
            <url>http://www.ncbi.nlm.nih.gov</url>
         </bibl>
         <bibl id="B21">
            <title>
               <p>The microRNA Registry</p>
            </title>
            <aug>
               <au>
                  <snm>Griffiths-Jones</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Nucl Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <issue>90001</issue>
            <fpage>D109</fpage>
            <lpage>D111</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308757</pubid>
                  <pubid idtype="pmpid" link="fulltext">14681370</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>One Class SVM for Yeast Regulation Prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Kowalczyk</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Raskutti</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>SIGKDD Explorations</source>
            <pubdate>2002</pubdate>
            <volume>4</volume>
            <issue>2</issue>
            <fpage>99</fpage>
            <lpage>100</lpage>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Support vector machines for novel class detection in Bioinformatics</p>
            </title>
            <aug>
               <au>
                  <snm>Spinosa</snm>
                  <fnm>EJ</fnm>
               </au>
               <au>
                  <snm>de Carvalho</snm>
                  <fnm>ACPLF</fnm>
               </au>
            </aug>
            <source>Genetics and Molecular Research (GMR)</source>
            <pubdate>2005</pubdate>
            <volume>4</volume>
            <issue>3</issue>
            <fpage>608</fpage>
            <lpage>615</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16342046</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>A Needle in a Haystack: Local One-Class Optimization</p>
            </title>
            <aug>
               <au>
                  <snm>Crammer</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Chechik</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Proceedings of the Twenty-First International Conference on Machine Learning (ICML): 2004</source>
            <pubdate>2004</pubdate>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Robust one-class clustering using hybrid global and local search</p>
            </title>
            <aug>
               <au>
                  <snm>Gupta</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Ghosh</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Proceedings of the 22nd international conference on Machine learning 2005 Bonn, Germany</source>
            <publisher>ACM Press</publisher>
            <pubdate>2005</pubdate>
            <fpage>273</fpage>
            <lpage>280</lpage>
         </bibl>
         <bibl id="B26">
            <title>
               <p>One-Class SVMs for Document Classification</p>
            </title>
            <aug>
               <au>
                  <snm>Manevitz</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Yousef</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Journal of Machine Learning Research</source>
            <pubdate>2001</pubdate>
            <fpage>139</fpage>
            <lpage>154</lpage>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Feature characterization in fMRI data: the Information Bottleneck approach</p>
            </title>
            <aug>
               <au>
                  <snm>Thirion</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Faugeras</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Medical Image Analysis</source>
            <pubdate>2004</pubdate>
            <volume>8</volume>
            <issue>4</issue>
            <fpage>403</fpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15567705</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Authorship verification as a one-class classification problem</p>
            </title>
            <aug>
               <au>
                  <snm>Koppel</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Schler</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Proceedings of the twenty-first international conference on Machine learning 2004; Banff, Alberta, Canada</source>
            <publisher>ACM Press</publisher>
            <pubdate>2004</pubdate>
            <fpage>62</fpage>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Mfold web server for nucleic acid folding and hybridization prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Zuker</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <issue>13</issue>
            <fpage>3406</fpage>
            <lpage>3415</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">169194</pubid>
                  <pubid idtype="pmpid" link="fulltext">12824337</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>One-class classification; Concept-learning in the absence of counter-examples</p>
            </title>
            <aug>
               <au>
                  <snm>Tax</snm>
                  <fnm>DMJ</fnm>
               </au>
            </aug>
            <pubdate>2001</pubdate>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Estimating the Support of a High-Dimensional Distribution</p>
            </title>
            <aug>
               <au>
                  <snm>Sch&#246;lkopf</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Platt</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Shawe-Taylor</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Smola</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Williamson</snm>
                  <fnm>RC</fnm>
               </au>
            </aug>
            <source>Neural Comp</source>
            <pubdate>2001</pubdate>
            <volume>13</volume>
            <issue>7</issue>
            <fpage>1443</fpage>
            <lpage>1471</lpage>
         </bibl>
         <bibl id="B32">
            <title>
               <p>LIBSVM: a library for support vector machines</p>
            </title>
            <aug>
               <au>
                  <snm>Chang</snm>
                  <fnm>C-C</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>C-J</fnm>
               </au>
            </aug>
            <pubdate>2001</pubdate>
         </bibl>
         <bibl id="B33">
            <title>
               <p>DDtools, the Data Description Toolbox for Matlab</p>
            </title>
            <aug>
               <au>
                  <snm>Tax</snm>
                  <fnm>DMJ</fnm>
               </au>
            </aug>
            <pubdate>2005</pubdate>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Advances in Kernel Methods</p>
            </title>
            <aug>
               <au>
                  <snm>Sch&#246;lkopf</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Burges</snm>
                  <fnm>CJC</fnm>
               </au>
               <au>
                  <snm>Smola</snm>
                  <fnm>AJ</fnm>
               </au>
            </aug>
            <publisher>Cambridge, MA: MIT Press</publisher>
            <pubdate>1999</pubdate>
         </bibl>
         <bibl id="B35">
            <title>
               <p>The Nature of Statistical Learning Theory</p>
            </title>
            <aug>
               <au>
                  <snm>Vapnik</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <publisher>Springer</publisher>
            <pubdate>1995</pubdate>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Comparison of the predicted and observed secondary structure of T4 phage lysozyme</p>
            </title>
            <aug>
               <au>
                  <snm>Matthews</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Biochim Biophys Acta</source>
            <pubdate>1975</pubdate>
            <volume>405</volume>
            <issue>2</issue>
            <fpage>442</fpage>
            <lpage>451</lpage>
            <xrefbib>
               <pubid idtype="pmpid">1180967</pubid>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
