|
Figure 3.
Comparison of the UniFrac Significance test and the P-test with raw and de-replicated
data. This figure illustrates how the same tree can have a significant P-test P-value and a non-significant UniFrac significance test P-value. The trees drawn in A and B have the same topology but different branch lengths.
The squares and triangles represent sequences from two different environments. The trees
on the left are being evaluated to determine whether the square and triangle communities
are significantly different. The trees on the right are example trees in which the
environment assignments have been randomized. The parsimony changes that are calculated
with the P-test are represented by red dots. The color of the branches represents calculations
made for the UniFrac significance test; branches that lead to only one of the two
environments are black and branches that lead to descendants of both environments
are grey.
A.) A tree that would have a significant P-test result and a non-significant UniFrac
Significance test result. The sequences from the square and triangle environments
are clustered together on the tree, and it thus takes only two changes between environments
to explain their distribution. This is less than would be expected if the sequences
were randomly distributed between environments as shown on the right, and thus the
P-value is likely to be significant (note that in practice, the true tree is compared
to many randomized trees and not just one). The monophyletic lineages occur near the
tips of the tree, however, and are not associated with a significant amount of unique
branch length (black branches). The UniFrac metric value would thus be low and randomization
of the tree could easily result in more unique (black) branch length, as shown on the right,
resulting in a non-significant P-value.
B.) A tree that would have a significant result for both the P-test and the UniFrac
significance test. The P-test results are the same as for the tree in A because the
topology is the same. However, because the monophyletic lineages in the square and
triangle environments represent a substantial amount of branch length in the tree,
the UniFrac value is high. The permutations of environment assignments would thus
typically result in less unique branch length, leading to a significant result.
C.) The same analysis as B, except that the diversity at the tips of the tree has been
removed by choosing OTUs. The UniFrac distance is essentially unchanged, but randomization
over the reduced number of taxa results in non-significant P-values for both the UniFrac Significance test and the P-test.
Lozupone et al. BMC Bioinformatics 2006 7:371 doi:10.1186/1471-2105-7-371 |
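
The two statistics contrasted in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the eight-tip tree, its branch lengths, and the names `make_tree`, `score`, `unifrac_and_parsimony`, and `permutation_p_values` are all hypothetical, the parsimony count uses Fitch's algorithm on a binary tree, and the P-values use a common (hits + 1) / (permutations + 1) convention. The toy tree is far too small to reproduce the significance pattern described above; the point is only that the parsimony count depends on topology alone, while the fraction of unique branch length also depends on where the branch length sits.

```python
import random

# Hypothetical toy tree, NOT the tree drawn in the figure.
# A tip is (branch_length, name); an internal node is (branch_length, [left, right]).
def make_tree(tip, stem, deep):
    def cherry(a, b):
        return (stem, [(tip, a), (tip, b)])
    half1 = (deep, [cherry("s1", "s2"), cherry("t1", "t2")])
    half2 = (deep, [cherry("s3", "s4"), cherry("t3", "t4")])
    return (0.0, [half1, half2])  # the root has no branch above it

# Environment of each tip: "s" (square) or "t" (triangle).
ENV = {name: name[0] for name in ("s1", "s2", "s3", "s4", "t1", "t2", "t3", "t4")}

def score(node, env):
    """Return (envs below, Fitch state set, unique length, total length, changes)."""
    length, payload = node
    if isinstance(payload, str):  # tip: its own branch always leads to one environment
        state = frozenset([env[payload]])
        return state, state, length, length, 0
    left, right = (score(child, env) for child in payload)
    below = left[0] | right[0]
    common = left[1] & right[1]
    fitch = common if common else left[1] | right[1]
    changes = left[4] + right[4] + (0 if common else 1)  # Fitch: empty intersection = 1 change
    unique = left[2] + right[2] + (length if len(below) == 1 else 0.0)
    total = left[3] + right[3] + length
    return below, fitch, unique, total, changes

def unifrac_and_parsimony(tree, env):
    """Fraction of branch length leading to one environment only, and parsimony changes."""
    _, _, unique, total, changes = score(tree, env)
    return unique / total, changes

def permutation_p_values(tree, env, n_perm=999, seed=0):
    """Shuffle environment labels over the tips; assumed (hits + 1) / (n_perm + 1) estimate."""
    rng = random.Random(seed)
    names = list(env)
    labels = [env[n] for n in names]
    obs_u, obs_c = unifrac_and_parsimony(tree, env)
    u_hits = c_hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        u, c = unifrac_and_parsimony(tree, dict(zip(names, labels)))
        u_hits += u >= obs_u  # as much or more unique branch length than observed
        c_hits += c <= obs_c  # as few or fewer parsimony changes than observed
    return (u_hits + 1) / (n_perm + 1), (c_hits + 1) / (n_perm + 1)

# Same topology, different branch lengths (loosely in the spirit of panels A and B).
tree_a = make_tree(tip=0.1, stem=0.05, deep=1.0)  # single-environment stems are short
tree_b = make_tree(tip=0.1, stem=1.0, deep=0.05)  # single-environment stems are long

for label, tree in (("A-like", tree_a), ("B-like", tree_b)):
    u, c = unifrac_and_parsimony(tree, ENV)
    p_u, p_c = permutation_p_values(tree, ENV)
    print(f"{label}: UniFrac={u:.3f} changes={c} P(UniFrac)={p_u:.3f} P(parsimony)={p_c:.3f}")
```

Both toy trees need the same number of parsimony changes because they share a topology, whereas the UniFrac value rises when the single-environment lineages carry more of the branch length. In practice the tree would have many more tips, and a tested library would be preferable to this sketch.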