Table 1

Effect of genome quality on annotation efficacy

Uncorrected

                                                    Pfam (a)               COG (a)                KEGG (a)
Including PP-C42
  % Partial ORF fragments vs. % all ORFs annotated  r = -0.854, P < 0.001  r = -0.551, P = 0.012  r = 0.586, P = 0.007
  Mean ORF length vs. % all ORFs annotated          r = 0.785, P < 0.001   r = 0.526, P = 0.017   r = -0.403, P = 0.078
Excluding PP-C42
  % Partial ORF fragments vs. % all ORFs annotated  r = -0.421, P = 0.073  r = -0.019, P = 0.939  r = 0.415, P = 0.078
  Mean ORF length vs. % all ORFs annotated          r = 0.406, P = 0.084   r = 0.157, P = 0.520   r = -0.016, P = 0.949

Corrected using matched partial ORF sets

                                                    Pfam                   COG                    KEGG
Including PP-C42
  % Partial ORF fragments vs. % all ORFs annotated  r = -0.861, P < 0.001  r = -0.595, P = 0.007  r = 0.469, P = 0.050
  Mean ORF length vs. % all ORFs annotated          r = 0.787, P < 0.001   r = 0.563, P = 0.012   r = -0.284, P = 0.253
Excluding PP-C42
  % Partial ORF fragments vs. % all ORFs annotated  r = -0.338, P = 0.170  r = 0.027, P = 0.915   r = 0.350, P = 0.168
  Mean ORF length vs. % all ORFs annotated          r = 0.378, P = 0.122   r = 0.155, P = 0.538   r = 0.052, P = 0.842

(a) Pearson correlations between annotation frequency and genome quality, as represented by the percent of the predicted ORFs composed of partial sequences and mean ORF length. Complete genomes are excluded in all cases; including them has essentially no effect.
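
As a reading aid, the following minimal sketch shows how a Pearson correlation coefficient of the kind reported in this table can be computed, and how one extreme genome can dominate it, which is why the table reports values both including and excluding PP-C42. The function names and the toy values are hypothetical and invented for illustration only; they are not data from this study, and this is not necessarily the software the authors used. The t statistic shown is the standard quantity from which a two-sided P value is obtained using a t distribution with n - 2 degrees of freedom.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length, n >= 3")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))  # co-deviation sum
    sxx = sum(a * a for a in dx)              # squared deviations of x
    syy = sum(b * b for b in dy)              # squared deviations of y
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Toy values (NOT from the study): percent partial ORFs and percent of
# ORFs annotated for five hypothetical genomes; the last is an outlier.
partial_pct = [5, 10, 15, 20, 60]
annotated_pct = [72, 70, 71, 69, 40]

r_all = pearson_r(partial_pct, annotated_pct)
r_without = pearson_r(partial_pct[:-1], annotated_pct[:-1])

if __name__ == "__main__":
    print("including outlier:", round(r_all, 3))
    print("excluding outlier:", round(r_without, 3))
    print("t (excluding outlier):", round(t_statistic(r_without, 4), 3))
```

In this toy example the single extreme point makes the negative correlation stronger than it is among the remaining points, the same pattern as the Pfam and COG columns above when PP-C42 is included.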

Klassen and Currie BMC Genomics 2012 13:14   doi:10.1186/1471-2164-13-14
