Research article (Open Access)

Compression-based distance (CBD): a simple, rapid, and accurate method for microbiota composition comparison

Fang Yang 1,2, Nicholas Chia 2,3,5*, Bryan A White 1,2,4 and Lawrence B Schook 1,2,4

Author Affiliations

1 Division of Nutritional Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA

2 Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA

3 Loomis Laboratory of Physics, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA

4 Department of Animal Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA

5 Department of Surgical Research and Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA

BMC Bioinformatics 2013, 14:136  doi:10.1186/1471-2105-14-136

Published: 23 April 2013

Abstract

Background

Perturbations in intestinal microbiota composition have been associated with a variety of gastrointestinal tract-related diseases. Symptoms have been alleviated by treatments that shift the gastrointestinal tract microbiota toward that of healthy individuals. Identifying differences in microbiota composition through 16S rRNA gene hypervariable tag sequencing therefore has profound health implications. Current computational methods for comparing microbial communities are usually based on multiple alignments and phylogenetic inference, which makes them time-consuming and demands exceptional expertise and computational resources. As sequencing datasets rapidly grow in size, simpler analysis methods are needed to meet the growing computational burden of microbiota comparisons. Thus, we have developed a simple, rapid, and accurate method, independent of multiple alignments and phylogenetic inference, to support microbiota comparisons.

Results

We created a metric, called compression-based distance (CBD), for quantifying the degree of similarity between microbial communities. CBD uses the repetitive nature of hypervariable tag datasets and well-established compression algorithms to approximate the total information shared between two datasets. Three published microbiota datasets were used as test cases to assess the applicability of CBD. CBD recaptured 100% of the statistically significant conclusions reported in the previous studies, while requiring less computational time than similar tools and no expert user intervention.
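
The idea of approximating shared information with a compressor can be illustrated with a minimal sketch. The abstract does not give the CBD formula, so the example below substitutes the closely related normalized compression distance (NCD), computed as (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. The choice of zlib, the helper names, and the synthetic tag data are all assumptions made for illustration; they are not the authors' implementation or data.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input (maximum compression level)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    This is a stand-in for CBD; the paper's exact formula may differ.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def random_tag(rng: random.Random, length: int = 100) -> str:
    """Hypothetical helper: one random nucleotide string standing in for a tag."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def synthetic_sample(taxa: list, n_reads: int, rng: random.Random) -> bytes:
    """Hypothetical helper: draw reads with replacement from a pool of tags."""
    reads = [rng.choice(taxa) for _ in range(n_reads)]
    return "\n".join(reads).encode("ascii")


# Synthetic data only: two unrelated "communities" of 50 tags each.
rng = random.Random(0)
community_1 = [random_tag(rng) for _ in range(50)]
community_2 = [random_tag(rng) for _ in range(50)]

sample_a = synthetic_sample(community_1, 100, rng)  # drawn from community 1
sample_b = synthetic_sample(community_1, 100, rng)  # second draw, community 1
sample_c = synthetic_sample(community_2, 100, rng)  # drawn from community 2

print("same community:       ", round(ncd(sample_a, sample_b), 3))
print("different communities:", round(ncd(sample_a, sample_c), 3))
```

Two samples drawn from the same tag pool share many repeated sequences, so their concatenation compresses almost as well as either sample alone and the distance is small. Note that zlib's DEFLATE only looks back over a 32 KiB window, so this sketch is suited to small inputs; real tag datasets would likely need a compressor with a much larger window.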

Conclusion

CBD provides a simple, rapid, and accurate method for assessing distances between gastrointestinal tract microbiota 16S hypervariable tag datasets.

Keywords:
Microbiota comparison; Microbiome analysis; Compression-based distance