Equivalent input produces different output in the UniFrac significance test
1 Department of Computer Science, University of Saskatchewan, 176 Thorvaldson Bldg, 110 Science Place, Saskatoon, Canada
2 Contango Strategies Ltd, LFK Biotechnology Complex, 15-410 Downey Road, Saskatoon, Canada
3 Department of Mathematics and Statistics, University of Saskatchewan, 106 Wiggins Road, Saskatoon, Canada
4 Department of Pathology and Laboratory Medicine, University of Saskatchewan, 106 Wiggins Road, Saskatoon, Canada
BMC Bioinformatics 2014, 15:278 doi:10.1186/1471-2105-15-278
Published: 13 August 2014
UniFrac is a well-known tool for comparing microbial communities and assessing whether the differences between them are statistically significant. In this paper we identify a discrepancy in the UniFrac methodology that causes semantically equivalent inputs to produce different outputs in tests of statistical significance.
The phylogenetic trees that are input into UniFrac may or may not contain abundance counts. An isomorphic transform can be defined that will convert trees between these two formats without altering the semantic meaning of the trees. UniFrac produces different outputs for these equivalent forms of the same input tree. This is illustrated using metagenomics data from a lake sediment study.
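The abstract does not spell out the transform, so the following is only a minimal sketch of one plausible reading, assumed here for illustration: a leaf carrying an abundance count k is replaced by an internal node with k child leaves on zero-length branches, each with count 1. The `Node` class and the `expand`, `leaf_total`, and `total_length` helpers are hypothetical names; this is not the script provided by the authors.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical tree node; `count` is only meaningful on leaves."""
    name: str
    length: float = 0.0
    count: int = 1
    children: List["Node"] = field(default_factory=list)


def expand(node: Node) -> Node:
    """Return a copy of the tree in which no leaf has a count above 1.

    Assumption: a leaf with count k becomes an internal node that keeps
    the original branch length and has k zero-length child leaves.
    """
    if node.children:
        return Node(node.name, node.length, 1,
                    [expand(child) for child in node.children])
    if node.count <= 1:
        return Node(node.name, node.length, 1)
    copies = [Node(f"{node.name}_{i}", 0.0, 1)
              for i in range(1, node.count + 1)]
    return Node(node.name, node.length, 1, copies)


def leaf_total(node: Node) -> int:
    """Sum of abundance counts over all leaves."""
    if not node.children:
        return node.count
    return sum(leaf_total(child) for child in node.children)


def total_length(node: Node) -> float:
    """Sum of all branch lengths in the tree."""
    return node.length + sum(total_length(child) for child in node.children)


# Toy tree: leaf A was observed three times, leaf B once.
tree = Node("root", 0.0, 1, [Node("A", 0.25, 3), Node("B", 0.5, 1)])
expanded = expand(tree)
```

Under this assumption, both forms describe the same number of observed sequences and the same total branch length, which is the sense in which the two input formats are semantically equivalent.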
Results from the UniFrac tool can vary greatly for the same input depending on the arbitrary choice of input format. Practitioners should be aware of this issue and use the tool with caution to ensure consistency and validity in their analyses. We provide a script to transform inputs between equivalent formats to help researchers achieve this consistency.