Table 4

Results for the MetaPhyler simulated metagenomic data set (73,086 sequences, 300 bp)
                                        actual   CARMA   MEGAN  MetaPhyler  MG-RAST
percentage of sequences classified           -    93.6    88.2        80.9     29.8
Proteobacteria                            47.0    47.6    44.5        48.3     46.7
Firmicutes                                21.9    22.2    24.0        21.8     23.1
Actinobacteria                             9.7     8.7     8.8         9.1      9.3
Bacteroidetes                              4.8     4.5     4.8         4.3      4.4
Cyanobacteria                              3.9     3.6     3.8         3.9      3.7
Tenericutes                                2.2     2.5     2.7         2.4      2.3
Spirochaetes                               1.9     2.4     2.6         2.3      2.2
Chlamydiae                                 1.3     1.9     2.0         1.8      1.8
Thermotogae                                0.9     1.2     1.2         1.1      1.2
Chlorobi                                   0.9     1.4     1.5         1.3      1.4
percentage of sequences misclassified        -     0.3     0.3         0.3      0.2
correlation coefficient                      -   ≈ 1.0   ≈ 1.0       ≈ 1.0    ≈ 1.0

The actual distribution of sequences compared to the distribution inferred by the alignment-based programs.
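
As a rough illustration of the correlation coefficient row, the sketch below recomputes a correlation between the actual phylum distribution and each program's inferred distribution, using the ten phylum rows of Table 4. It assumes the Pearson correlation coefficient, which the table itself does not specify; the names `pearson`, `ACTUAL`, and `INFERRED` are hypothetical and introduced only for this example.

```python
import math

# Phylum percentages copied from Table 4, in row order
# (Proteobacteria ... Chlorobi).
ACTUAL = [47.0, 21.9, 9.7, 4.8, 3.9, 2.2, 1.9, 1.3, 0.9, 0.9]
INFERRED = {
    "CARMA":      [47.6, 22.2, 8.7, 4.5, 3.6, 2.5, 2.4, 1.9, 1.2, 1.4],
    "MEGAN":      [44.5, 24.0, 8.8, 4.8, 3.8, 2.7, 2.6, 2.0, 1.2, 1.5],
    "MetaPhyler": [48.3, 21.8, 9.1, 4.3, 3.9, 2.4, 2.3, 1.8, 1.1, 1.3],
    "MG-RAST":    [46.7, 23.1, 9.3, 4.4, 3.7, 2.3, 2.2, 1.8, 1.2, 1.4],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumption: Pearson's r is the coefficient meant in the table;
    the source does not say which coefficient was used.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of the deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# One coefficient per program, each compared against the actual distribution.
CORRELATIONS = {name: pearson(ACTUAL, values) for name, values in INFERRED.items()}

if __name__ == "__main__":
    for name, r in CORRELATIONS.items():
        print(f"{name}: r = {r:.3f}")
```

Under this assumption, each coefficient rounds to 1.0 at one decimal place, in line with the "≈ 1.0" entries in the table. Only the ten listed phyla are included, so the result is a check on the tabulated rows and not a reproduction of the authors' calculation.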

Bazinet and Cummings, BMC Bioinformatics 2012, 13:92. doi:10.1186/1471-2105-13-92