This article is part of the supplement: Selected papers from the Seventh Asia-Pacific Bioinformatics Conference (APBC 2009)
HHMMiR: efficient de novo prediction of microRNAs using hierarchical hidden Markov models
1 Lane Center for Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
2 Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
3 Department of Computational Biology, University of Pittsburgh, Pittsburgh, PA 15260, USA
BMC Bioinformatics 2009, 10(Suppl 1):S35 doi:10.1186/1471-2105-10-S1-S35
Published: 30 January 2009
MicroRNAs (miRNAs) are small non-coding single-stranded RNAs (20–23 nts) that are known to act as post-transcriptional and translational regulators of gene expression. Although they were initially overlooked, their role in many important biological processes, such as development, cell differentiation, and cancer, has since been established. Despite their biological significance, the identification of miRNA genes in newly sequenced organisms still relies, to a large degree, on evolutionary conservation, which is not always available.
We have developed HHMMiR, a novel approach for de novo miRNA hairpin prediction in the absence of evolutionary conservation. Our method implements a Hierarchical Hidden Markov Model (HHMM) that utilizes region-based structural as well as sequence information of miRNA precursors. We first established a template for the structure of a typical miRNA hairpin by summarizing data from publicly available databases. We then used this template to develop the HHMM topology.
Our algorithm achieved an average sensitivity of 84% and specificity of 88% on 10-fold cross-validation of human miRNA precursor data. We also show that this model, trained on human sequences, works well on hairpins from other vertebrate as well as invertebrate species. Furthermore, the human-trained model correctly classified ~97% of plant miRNA precursors. The success of this approach across such a diverse set of species indicates that sequence conservation is not necessary for miRNA prediction. This may lead to efficient prediction of miRNA genes in virtually any organism.
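For readers unfamiliar with the metrics, the following is a minimal sketch of how average sensitivity and specificity over cross-validation folds are commonly computed. It assumes the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)) and a simple per-fold mean; the function names and the toy labels are hypothetical and are not taken from HHMMiR or its evaluation data.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, where 1 = miRNA hairpin."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("each fold needs at least one positive and one negative example")
    return tp / (tp + fn), tn / (tn + fp)


def mean_over_folds(folds: List[Tuple[Sequence[int], Sequence[int]]]) -> Tuple[float, float]:
    """Average per-fold sensitivity and specificity (assumed averaging scheme)."""
    scores = [sensitivity_specificity(y_true, y_pred) for y_true, y_pred in folds]
    mean_sens = sum(s for s, _ in scores) / len(scores)
    mean_spec = sum(s for _, s in scores) / len(scores)
    return mean_sens, mean_spec


# Toy data only: two hypothetical folds of (true labels, predicted labels).
toy_folds = [
    ([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1]),
    ([1, 1, 0, 0], [1, 1, 0, 0]),
]
avg_sens, avg_spec = mean_over_folds(toy_folds)
```

In a 10-fold setting, `toy_folds` would hold ten such pairs, one per held-out partition.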