Simulated metagenomic samples cluster by community; some pairs are much more strongly distinguished than others. Correspondence analysis of a matrix containing the number of sequences belonging to each genus (genera as rows, samples as columns) in simulated shotgun read samples of three simulated communities, as well as in the simulated communities themselves and the simulated reference database. Each panel shows two of the first three components plotted against each other. Although these components capture a large part of the variation among the samples and the reference database, read samples of Pop1 and Pop3 are not strongly distinguished by either component 1 or component 3 (bottom panel), while samples from the other two pairs of communities are easier to distinguish. This analysis is consistent with the performance of weighted Fast UniFrac in distinguishing the three different pairs of communities (see Table 3).
Riesenfeld and Pollard BMC Genomics 2013 14:419 doi:10.1186/1471-2164-14-419
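
For readers unfamiliar with the method, the correspondence analysis described in the caption can be sketched as a singular value decomposition of the standardized residuals of a genus-by-sample count matrix. The following is a minimal illustration under stated assumptions, not the authors' implementation: the function name `correspondence_analysis`, the toy `counts` matrix, and the choice of NumPy are all hypothetical, and the counts are invented placeholders rather than data from the study.

```python
import numpy as np


def correspondence_analysis(counts, n_components=3):
    """Textbook correspondence analysis via SVD (illustrative sketch).

    counts: 2-D array of non-negative counts, genera as rows and samples
    as columns. Every row and every column must have a non-zero total.
    Returns principal coordinates for rows and columns, plus the fraction
    of total inertia captured by each retained component.
    """
    counts = np.asarray(counts, dtype=float)
    P = counts / counts.sum()      # correspondence matrix
    r = P.sum(axis=1)              # row masses (genera)
    c = P.sum(axis=0)              # column masses (samples)
    expected = np.outer(r, c)      # expectation under independence
    S = (P - expected) / np.sqrt(expected)  # standardized residuals
    U, sing, Vt = np.linalg.svd(S, full_matrices=False)
    k = n_components
    row_coords = (U[:, :k] * sing[:k]) / np.sqrt(r)[:, None]
    col_coords = (Vt.T[:, :k] * sing[:k]) / np.sqrt(c)[:, None]
    inertia = sing ** 2
    explained = inertia[:k] / inertia.sum()
    return row_coords, col_coords, explained


# Hypothetical toy matrix: 5 genera (rows) x 6 samples (columns),
# imagined as two read samples from each of three communities.
counts = np.array([
    [40, 38, 10, 12, 35, 33],
    [5, 7, 30, 28, 9, 11],
    [20, 22, 25, 24, 21, 19],
    [3, 2, 15, 17, 6, 5],
    [12, 11, 4, 5, 14, 16],
])

row_coords, col_coords, explained = correspondence_analysis(counts)
```

Plotting pairs of columns of `col_coords` against each other would give sample scatter plots analogous to the panels of the figure, and `explained` indicates how much of the total inertia each component captures.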