This article is part of the supplement: Ninth International Conference on Bioinformatics (InCoB2010): Bioinformatics
DiScRIBinATE: a rapid method for accurate taxonomic classification of metagenomic sequences
Bio-Sciences Division, Innovation Labs, Tata Consultancy Services, 1 Software Units Layout, Hyderabad 500 081, Andhra Pradesh, India
BMC Bioinformatics 2010, 11(Suppl 7):S14. doi:10.1186/1471-2105-11-S7-S14. Published: 15 October 2010
In metagenomic sequence data, the majority of sequences (reads) originate from new or partially characterized genomes whose sequences are absent from existing reference databases. Since taxonomic assignment of reads is based on their similarity to sequences from known organisms, the presence of reads originating from new organisms poses a major challenge to taxonomic binning methods. The recently published SOrt-ITEMS algorithm uses an elaborate workflow to assign reads originating from hitherto unknown genomes with significant accuracy and specificity. Nevertheless, a significant proportion of reads are still misclassified. Moreover, the alignment-based orthology step that SOrt-ITEMS uses to improve the specificity of its assignments increases its total binning time.
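To illustrate why reads from unknown genomes are hard to bin, here is a minimal sketch of generic similarity-based assignment: a read is placed at the lowest common ancestor (LCA) of the reference taxa it hits within a bit-score window of its best hit. This is the general LCA idea used by similarity-based binning tools, not the DiScRIBinATE or SOrt-ITEMS algorithm. The toy taxonomy, the `assign_read` helper, the hit lists and the 0.9 window are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy taxonomy: each taxon maps to its parent (root maps to None).
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Escherichia": "Proteobacteria",
    "Salmonella": "Proteobacteria",
    "Firmicutes": "Bacteria",
    "Bacillus": "Firmicutes",
}


def lineage(taxon: str) -> List[str]:
    """Return the path from the root down to `taxon`."""
    path: List[str] = []
    node: Optional[str] = taxon
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path[::-1]


def lowest_common_ancestor(taxa: List[str]) -> str:
    """Deepest taxon shared by the lineages of all given taxa."""
    lineages = [lineage(t) for t in taxa]
    lca = "root"
    for ranks in zip(*lineages):
        if all(r == ranks[0] for r in ranks):
            lca = ranks[0]
        else:
            break
    return lca


def assign_read(hits: List[Tuple[str, float]], window: float = 0.9) -> Optional[str]:
    """Assign a read to the LCA of hits scoring within `window` of the best bit score.

    `hits` is a list of (taxon, bit_score) pairs, e.g. parsed from a similarity
    search against a reference database. Returns None (unassigned) if there are no hits.
    """
    if not hits:
        return None
    best = max(score for _, score in hits)
    kept = [taxon for taxon, score in hits if score >= window * best]
    return lowest_common_ancestor(kept)
```

For example, `assign_read([("Escherichia", 200.0), ("Salmonella", 190.0), ("Bacillus", 80.0)])` returns `"Proteobacteria"`: a read from a genus missing from the reference set that matches two related known genera about equally well can only be placed at a higher rank, which avoids a wrong genus call at the cost of resolution.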
In this paper, we introduce a rapid binning approach called DiScRIBinATE (
A significant reduction in binning time, coupled with superior assignment accuracy compared with existing binning methods, indicates that the proposed algorithm is well suited to rapidly mapping the taxonomic diversity of large metagenomic samples with high accuracy and specificity.
The program is available on request from the authors.