Open Access Highly Accessed Correspondence

Equivalent input produces different output in the UniFrac significance test

Jeffrey R Long1*, Vanessa Pittet4, Brett Trost1, Qingxiang Yan3, David Vickers1, Monique Haakensen2 and Anthony Kusalik1

Author Affiliations

1 Department of Computer Science, University of Saskatchewan, 176 Thorvaldson Bldg, 110 Science Place, Saskatoon, Canada

2 Contango Strategies Ltd, LFK Biotechnology Complex, 15-410 Downey Road, Saskatoon, Canada

3 Department of Mathematics and Statistics, University of Saskatchewan, 106 Wiggins Road, Saskatoon, Canada

4 Department of Pathology and Laboratory Medicine, University of Saskatchewan, 106 Wiggins Road, Saskatoon, Canada

BMC Bioinformatics 2014, 15:278  doi:10.1186/1471-2105-15-278

Published: 13 August 2014

Abstract

Background

UniFrac is a well-known tool for comparing microbial communities and assessing statistically significant differences between communities. In this paper we identify a discrepancy in the UniFrac methodology that causes semantically equivalent inputs to produce different outputs in tests of statistical significance.

Results

The phylogenetic trees that are input into UniFrac may or may not contain abundance counts. An isomorphic transform can be defined that will convert trees between these two formats without altering the semantic meaning of the trees. UniFrac produces different outputs for these equivalent forms of the same input tree. This is illustrated using metagenomics data from a lake sediment study.
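As a rough illustration of what such an isomorphic transform might look like, the sketch below converts a toy tree between a form with abundance counts and a form without them. It is not the authors' script. The `Node` class, the `expand` and `collapse` functions, and the rule itself (a leaf with count n becomes n zero-length child leaves named `name_1` ... `name_n`) are all assumptions made for illustration, and counts are assumed to be at least 1.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node; `count` is only meaningful on leaves."""
    name: Optional[str]
    length: float
    count: int = 1
    children: List["Node"] = field(default_factory=list)


def expand(node: Node) -> Node:
    """Return a copy in which abundance counts are replaced by repeated leaves.

    Assumed rule: a leaf with count n > 1 becomes an internal node that keeps
    the original name and branch length and has n zero-length child leaves.
    """
    if node.children:
        return Node(node.name, node.length, node.count,
                    [expand(child) for child in node.children])
    if node.count <= 1:
        return Node(node.name, node.length, node.count)
    copies = [Node(f"{node.name}_{i}", 0.0, 1)
              for i in range(1, node.count + 1)]
    return Node(node.name, node.length, 1, copies)


def collapse(node: Node) -> Node:
    """Inverse of `expand`: fold groups of zero-length copies back into a count."""
    if not node.children:
        return Node(node.name, node.length, node.count)
    kids = [collapse(child) for child in node.children]
    is_expanded_leaf = node.name is not None and all(
        not kid.children
        and kid.length == 0.0
        and kid.name == f"{node.name}_{i}"
        for i, kid in enumerate(kids, start=1)
    )
    if is_expanded_leaf:
        return Node(node.name, node.length, len(kids))
    return Node(node.name, node.length, node.count, kids)


def total_abundance(node: Node) -> int:
    """Sum of leaf counts; unchanged by either transform."""
    if not node.children:
        return node.count
    return sum(total_abundance(child) for child in node.children)
```

Under these assumptions, `collapse(expand(tree))` returns a tree equal to the original and the total abundance is the same in both forms, which is the sense in which the two representations carry the same information.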

Conclusions

Results from the UniFrac tool can vary greatly for the same input depending on the arbitrary choice of input format. Practitioners should be aware of this issue and use the tool with caution to ensure consistency and validity in their analyses. We provide a script to transform inputs between equivalent formats to help researchers achieve this consistency.

Keywords:
UniFrac; Computer methodologies; Metagenomics