Open Access Highly Accessed Methodology article

Accurate genome relative abundance estimation for closely related species in a metagenomic sample

Michael B Sohn1, Lingling An1,2*, Naruekamol Pookhao2 and Qike Li1

Author Affiliations

1 Interdisciplinary Program in Statistics, University of Arizona, Tucson AZ 85721, USA

2 Department of Agricultural and Biosystems Engineering, University of Arizona, Tucson AZ 85721, USA

BMC Bioinformatics 2014, 15:242  doi:10.1186/1471-2105-15-242

Published: 16 July 2014

Abstract

Background

Metagenomics has great potential to reveal previously unattainable information about microbial communities. An important prerequisite for such discoveries is accurate estimation of the composition of those communities. Most prevalent homology-based approaches rely solely on the results of an alignment tool such as BLAST, which limits their estimation accuracy to high ranks of the taxonomy tree.

Results

We developed a new homology-based approach called Taxonomic Analysis by Elimination and Correction (TAEC), which uses genomic sequence similarity in addition to the results of an alignment tool. We tested the proposed method comprehensively on simulated benchmark datasets that vary in the complexity of their microbial structure. Compared with other available methods designed for estimating taxonomic composition at a relatively low taxonomic rank, TAEC quantifies the genomes in a given microbial sample more accurately. We also applied TAEC to two real metagenomic datasets, an oral cavity dataset and a Crohn’s disease dataset. Our results agree with previous findings at higher ranks of the taxonomy tree and also provide accurate estimates of taxonomic composition at the species/strain level, narrowing down which species/strains need more attention in studies of the oral cavity and Crohn’s disease.

Conclusions

By taking genomic sequence similarity into account, TAEC outperforms other available tools in estimating taxonomic composition at a very low rank, especially when closely related species/strains are present in a metagenomic sample.

Keywords:
Metagenomics; Alignment similarity; Genomic similarity; Closely related species