Combination of measures distinguishes pre-miRNAs from other stem-loops in the genome of the newly sequenced Anopheles darlingi
1 Équipe BAOBAB, Laboratoire de Biométrie et Biologie Évolutive (UMR 5558); CNRS; Univ. Lyon 1, 43 bd du 11 nov 1918, 69622, Villeurbanne Cedex, France
2 IST/INESC-ID, 9 Rua Alves Redol, 1000-029 Lisbon, Portugal
3 BAMBOO Team, INRIA Rhone-Alpes, 655 avenue de l'Europe, 38330 Montbonnot Saint-Martin, France
4 Bioinformatics Laboratory, National Laboratory of Scientific Computation (LNCC), Avenida Getúlio Vargas, 333, Petrópolis, Brazil
BMC Genomics 2010, 11:529 doi:10.1186/1471-2164-11-529
Published: 29 September 2010
Computational efforts to enumerate the full set of miRNAs of an organism have been limited by a strong reliance on arguments of precursor conservation and feature similarity. However, miRNA precursors may arise anew or be lost across the evolutionary history of a species, and a newly sequenced genome may be evolutionarily too distant from other genomes for an adequate comparative analysis. In addition, learning intricate classification rules based purely on features shared by currently known miRNA precursors may reflect a perpetuating identification bias rather than a sound means to tell true miRNAs from other genomic stem-loops.
We show that annotated pre-miRNAs in the genomes of Drosophila melanogaster and Anopheles gambiae are strongly biased towards robust stem-loops, and we propose a scoring scheme for precursor candidates that combines four robustness measures. We also identify several known pre-miRNA homologs in the newly sequenced Anopheles darlingi and show that most are found amongst the top-scoring precursor candidates. Finally, we compare the performance of our approach against two single-genome pre-miRNA classification methods.
In this paper we present a strategy to sieve through the vast number of stem-loops found in metazoan genomes in search of pre-miRNAs, significantly reducing the set of candidates while retaining most known miRNA precursors. This approach makes no use of conservation data and relies solely on properties derived from our knowledge of miRNA biogenesis.
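As an illustration of the sieving idea summarised above, the following minimal sketch ranks stem-loop candidates by a combined score over four measures and keeps only the top-scoring ones. The abstract does not state how the four robustness measures are defined or combined, so everything here is an assumption made for illustration: the measure names (`measure_a` to `measure_d`), the toy values, the convention that higher means more robust, and the choice of averaging per-measure rank fractions.

```python
from typing import Dict, List, Sequence


def rank_fractions(values: Sequence[float]) -> List[float]:
    """For each value, the fraction of all values that are <= it (ties share a rank)."""
    n = len(values)
    return [sum(1 for w in values if w <= v) / n for v in values]


def combined_scores(measures: Dict[str, Sequence[float]]) -> List[float]:
    """Average the per-measure rank fractions (an assumed combination rule)."""
    columns = [rank_fractions(vals) for vals in measures.values()]
    n = len(columns[0])
    return [sum(col[i] for col in columns) / len(columns) for i in range(n)]


def top_candidates(names: Sequence[str],
                   measures: Dict[str, Sequence[float]],
                   keep: int) -> List[str]:
    """Return the `keep` candidates with the highest combined score."""
    scores = combined_scores(measures)
    order = sorted(range(len(names)), key=lambda i: scores[i], reverse=True)
    return [names[i] for i in order[:keep]]


# Hypothetical stem-loop candidates and toy measure values (higher = more robust).
names = ["hp1", "hp2", "hp3", "hp4"]
measures = {
    "measure_a": [0.90, 0.20, 0.50, 0.10],
    "measure_b": [0.80, 0.30, 0.60, 0.20],
    "measure_c": [0.70, 0.10, 0.90, 0.30],
    "measure_d": [0.95, 0.40, 0.50, 0.10],
}

print(top_candidates(names, measures, keep=2))
```

Rank fractions are used in this sketch so that measures on different scales contribute equally; a weighted sum or a per-measure threshold would be equally plausible alternatives.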