Comparison of the UniFrac significance test and the P-test with raw and de-replicated data. This figure illustrates how the same tree can have a significant P-test P-value and a non-significant UniFrac significance test P-value. The trees drawn in A and B have the same topology but different branch lengths. The squares and triangles represent sequences from two different environments. The trees on the left are being evaluated to determine whether the square and triangle communities are significantly different. The trees on the right are example trees in which the environment assignments have been randomized. The parsimony changes that are calculated with the P-test are represented by red dots. The color of the branches represents calculations made for the UniFrac significance test: branches that lead to descendants from only one of the two environments are black, and branches that lead to descendants from both environments are grey.
A) A tree that would have a significant P-test result and a non-significant UniFrac significance test result. The sequences from the square and triangle environments cluster by environment on the tree, and it thus takes only 2 changes between environments to explain their distribution. This is fewer than would be expected if the sequences were randomly distributed between environments, as shown on the right, and thus the P-test P-value is likely to be significant (note that in practice, the true tree is compared to many randomized trees and not just one). The monophyletic lineages occur near the tips of the tree, however, and are not associated with a substantial amount of unique branch length (black branches). The UniFrac metric value would thus be low, and randomization of the tree could easily result in more unique (black) branch length, as shown on the right, resulting in a non-significant UniFrac P-value.
B) A tree that would have a significant result for both the P-test and the UniFrac significance test. The P-test results are the same as for the tree in A because the topology is the same. However, because the monophyletic lineages in the square and triangle environments represent a substantial amount of branch length in the tree, the UniFrac value is high. The permutations of environment assignments would thus typically result in less unique branch length, leading to a significant result.
C) The same analysis as in B, except that the diversity at the tips of the tree has been removed by choosing OTUs. The UniFrac distance is essentially unchanged, but randomization over the reduced number of taxa results in non-significant P-values for both the UniFrac significance test and the P-test.
Lozupone et al. BMC Bioinformatics 2006 7:371 doi:10.1186/1471-2105-7-371
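
The two statistics contrasted in the caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the tree encoding and the names `toy_tree`, `unifrac`, `parsimony_changes`, and `permutation_p` are hypothetical, the toy tree is not the tree drawn in the figure, parsimony changes are assumed to be counted with the Fitch algorithm on a binary tree, and the P-value is assumed to be the fraction of label permutations that are at least as extreme as the observed value.

```python
import random

# A node is (branch_length, payload); payload is a leaf index (int)
# or a list of child nodes. This encoding is an assumption of the sketch.

def toy_tree(tip, stem, deep):
    """Eight-leaf binary tree: four 2-leaf clades joined by two deep branches."""
    def pair(i):
        return (stem, [(tip, i), (tip, i + 1)])
    def half(i):
        return (deep, [pair(i), pair(i + 2)])
    return (0.0, [half(0), half(4)])

def unifrac(tree, envs):
    """Unweighted UniFrac: unique branch length divided by total branch length."""
    def walk(node):
        length, payload = node
        if isinstance(payload, int):
            seen, unique, total = {envs[payload]}, 0.0, 0.0
        else:
            seen, unique, total = set(), 0.0, 0.0
            for child in payload:
                s, u, t = walk(child)
                seen |= s
                unique += u
                total += t
        if len(seen) == 1:  # branch leads to one environment only ("black")
            unique += length
        return seen, unique, total + length
    _, unique, total = walk(tree)
    return unique / total

def parsimony_changes(tree, envs):
    """Fitch parsimony count of environment changes (binary trees only)."""
    def walk(node):
        _, payload = node
        if isinstance(payload, int):
            return {envs[payload]}, 0
        (ls, lc), (rs, rc) = (walk(child) for child in payload)
        common = ls & rs
        if common:
            return common, lc + rc
        return ls | rs, lc + rc + 1
    return walk(tree)[1]

def permutation_p(stat, tree, envs, larger_is_extreme, n_perm=999, seed=0):
    """Fraction of shuffled environment labelings at least as extreme as observed."""
    rng = random.Random(seed)
    observed = stat(tree, envs)
    shuffled = list(envs)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        value = stat(tree, shuffled)
        if (value >= observed) if larger_is_extreme else (value <= observed):
            hits += 1
    return hits / n_perm

envs = ["sq", "sq", "tri", "tri", "sq", "sq", "tri", "tri"]
tree_a = toy_tree(tip=0.1, stem=0.1, deep=2.0)  # clades sit near the tips
tree_b = toy_tree(tip=0.1, stem=2.0, deep=0.1)  # clades carry most of the length

for name, tree in (("A-like", tree_a), ("B-like", tree_b)):
    print(name,
          "changes:", parsimony_changes(tree, envs),
          "UniFrac:", round(unifrac(tree, envs), 3),
          "P-test p:", permutation_p(parsimony_changes, tree, envs, False),
          "UniFrac p:", permutation_p(unifrac, tree, envs, True))
```

Both toy trees share one topology, so the parsimony count is 2 in each case and the P-test cannot tell them apart. The UniFrac value is about 0.23 when the single-environment clades sit near the tips and about 0.98 when those clades carry most of the branch length.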