Table 2

Gene model accuracy using unmatched species parameters

| Reference Organism | Performance Category | Augustus (ab initio) | GeneMark (ab initio) | SNAP (ab initio) | Augustus (MAKER) | GeneMark (MAKER) | SNAP (MAKER) |
|---|---|---|---|---|---|---|---|
| A. thaliana | Nucleotide Accuracy | 57.85% | 48.62% | 43.84% | 68.56% | 57.96% | 73.77% |
| A. thaliana | Exon Accuracy | 30.71% | 16.51% | 18.58% | 53.31% | 28.87% | 60.11% |
| D. melanogaster | Nucleotide Accuracy | 67.47% | 66.51% | 48.92% | 73.78% | 72.83% | 74.44% |
| D. melanogaster | Exon Accuracy | 30.62% | 26.25% | 19.94% | 43.10% | 39.74% | 53.69% |
| C. elegans | Nucleotide Accuracy | 66.18% | 67.26% | 68.24% | 74.32% | 71.92% | 85.02% |
| C. elegans | Exon Accuracy | 28.33% | 30.01% | 35.44% | 38.52% | 39.42% | 63.14% |

The effect of limited or insufficient training data on ab initio gene prediction is simulated by providing the algorithms Augustus, GeneMark, and SNAP with mismatched species parameter files: the A. thaliana parameters were used to produce gene models for C. elegans and D. melanogaster, and the C. elegans parameters were used to produce gene models for A. thaliana. The same predictors, when run as part of the MAKER2 gene annotation pipeline, perform substantially better even with the same mismatched parameter files; for example, SNAP exon accuracy for A. thaliana rises from 18.58% to 60.11%.
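The comparison described in the caption can be made concrete by computing, for each organism, category, and predictor, the percentage-point difference between the MAKER-annotated accuracy and the ab initio accuracy. The following is a minimal sketch of that arithmetic using only the values reported in the table; the variable names (`PREDICTORS`, `table`, `gains`) are illustrative and not part of the original work.

```python
# Percentage-point gain of each predictor when run inside MAKER2,
# relative to running it ab initio with the same mismatched parameters.
# All numbers are copied from Table 2; names here are illustrative only.

PREDICTORS = ("Augustus", "GeneMark", "SNAP")

# (organism, category) -> (ab initio accuracies, MAKER accuracies),
# each ordered as in PREDICTORS.
table = {
    ("A. thaliana", "Nucleotide Accuracy"): ((57.85, 48.62, 43.84), (68.56, 57.96, 73.77)),
    ("A. thaliana", "Exon Accuracy"): ((30.71, 16.51, 18.58), (53.31, 28.87, 60.11)),
    ("D. melanogaster", "Nucleotide Accuracy"): ((67.47, 66.51, 48.92), (73.78, 72.83, 74.44)),
    ("D. melanogaster", "Exon Accuracy"): ((30.62, 26.25, 19.94), (43.10, 39.74, 53.69)),
    ("C. elegans", "Nucleotide Accuracy"): ((66.18, 67.26, 68.24), (74.32, 71.92, 85.02)),
    ("C. elegans", "Exon Accuracy"): ((28.33, 30.01, 35.44), (38.52, 39.42, 63.14)),
}

# (organism, category, predictor) -> gain in percentage points.
gains = {}
for (organism, category), (ab_initio, maker) in table.items():
    for name, before, after in zip(PREDICTORS, ab_initio, maker):
        gains[(organism, category, name)] = round(after - before, 2)

for (organism, category, name), gain in gains.items():
    print(f"{organism:16s} {category:20s} {name:9s} {gain:+6.2f} points")
```

Every difference computed this way is positive, which is the pattern the caption summarizes as "substantially better".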

Holt and Yandell BMC Bioinformatics 2011 12:491   doi:10.1186/1471-2105-12-491
