Branch-length distortion varies with reference database size, read length, and number of reads. A branch DF of 1.0 is optimal: it signifies that a topologically correct branch is estimated to have the same length in the read tree as in the source tree. The amount of branch-length distortion is inversely related to read length and reference database size and increases with the number of reads. Variation in the DF quartiles indicates that larger reference databases reduce the overestimation of branch lengths (DF third quartile), particularly in scenarios with 200 reads, while longer read lengths consistently reduce the underestimation of branch lengths (DF first quartile). In scenarios with a small reference database, increasing the number of reads raises the DF third quartile regardless of read length; this trend is especially pronounced with pplacer. Each panel shows the mean values of the DF median, first quartile, and third quartile, averaged over 30 simulations for each parameter combination. Vertical error bars show one standard deviation above and below the mean. Data for the rpoB family are shown; trends were similar across the gene families tested (Additional file 3: Figure S2).
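The per-panel summary described in this caption (per-simulation DF quartiles, then their mean and standard deviation over 30 simulations) can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' code: a branch's DF is taken to be its read-tree length divided by its source-tree length (consistent with a DF of 1.0 meaning equal lengths), branches are assumed to be already matched between the two trees, the error bars use the sample standard deviation, and the function names and synthetic data are hypothetical.

```python
import random
import statistics


def distortion_factors(read_lengths, source_lengths):
    """Per-branch DF, ASSUMED here to be read-tree length / source-tree length.

    Inputs are paired lengths for branches already matched between the
    read tree and the source tree (topologically correct branches).
    Zero-length source branches are skipped to avoid division by zero.
    """
    return [r / s for r, s in zip(read_lengths, source_lengths) if s > 0]


def df_summary(dfs):
    """Return (Q1, median, Q3) of one simulation's DF values."""
    q1, med, q3 = statistics.quantiles(dfs, n=4)  # middle cut point is the median
    return q1, med, q3


def aggregate(simulations):
    """Mean and sample standard deviation of each DF quartile across simulations."""
    summaries = [df_summary(dfs) for dfs in simulations]
    result = {}
    for name, values in zip(("q1", "median", "q3"), zip(*summaries)):
        result[name] = (statistics.mean(values), statistics.stdev(values))
    return result


if __name__ == "__main__":
    # Hypothetical synthetic data only; these are not results from the study.
    rng = random.Random(0)
    sims = []
    for _ in range(30):  # 30 simulations per parameter combination
        source = [rng.uniform(0.01, 0.5) for _ in range(50)]
        read = [s * rng.lognormvariate(0.0, 0.3) for s in source]
        sims.append(distortion_factors(read, source))
    for name, (mean, sd) in aggregate(sims).items():
        print(f"{name}: {mean:.3f} ± {sd:.3f}")
```

Each printed pair corresponds to one plotted point and its error bar half-width; a median near 1.0 with a wide gap between Q1 and Q3 would indicate that branch lengths are both under- and overestimated.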
Riesenfeld and Pollard BMC Genomics 2013 14:419 doi:10.1186/1471-2164-14-419