Table 4

Annotation comparison methods
Organism                  A. thaliana         D. melanogaster     G. max              H. sapiens
Reference annotations     TAIR9               FlyBase 5.39        NCBI Entrez         UCSC knownGene (hg19)
Prediction annotations    TAIR10              Ensembl r65         JGI / Phytozome     Ensembl r65
Average runtime (sec)     Text      HTML      Text      HTML      Text      HTML      Text      HTML
n=1                       36.3      859.4     91.1      1,350.5   85.3      1,461.1   294.3     6,422.0
n=2                       32.8      449.2     56.6      859.5     79.4      768.4     181.3     4,089.5
n=4                       30.7      246.5     39.2      633.7     76.5      439.9     130.1     2,751.2
n=8                       29.8      168.7     32.4      546.6     76.3      330.5     108.0     2,323.3
Gene loci                 25,618              10,976              47,877              17,865
shared                    25,590              10,944              37,942              7,779
unique to reference       6                   32                  3,363               9,569
unique to prediction      22                  0                   6,572               517
Comparisons               33,002              22,474              38,734              16,168
perfect matches           31,750    96.2%     22,446    99.9%     2,489     6.4%      2,517     15.6%
CDS structure matches     420       1.3%      0         0.0%      17,450    45.1%     8,269     51.1%
exon structure matches    8         0.0%      21        0.1%      26        0.1%      27        0.2%
UTR structure matches     159       0.5%      1         0.0%      647       1.7%      58        0.4%
non-matches               665       2.0%      6         0.0%      18,122    46.8%     5,297     32.8%

As a demonstration of ParsEval’s speed and scalability, we obtained pairs of whole-genome annotations for Arabidopsis thaliana (thale cress), Drosophila melanogaster (fruit fly), Glycine max (soybean), and Homo sapiens (human). For each organism, we used ParsEval to compare the two corresponding sets of annotations. Runtimes are shown for both text and HTML/PNG output modes, using 1, 2, 4, and 8 processors. For each organism, we also show the number of gene loci identified, how many were shared between the two sets of annotations, and how many were unique to one set. Finally, we show the number of reported comparisons for each organism and how many were perfect gene structure matches, how many were CDS structure matches, and how many were non-matches. All of the results shown in this table were easily obtained from the summary reports generated by ParsEval.
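To make the scalability claim easier to read off the runtime rows, the following minimal sketch computes speedup and parallel efficiency relative to the single-processor run. It is not part of ParsEval: the names `RUNTIMES`, `PROCESSORS`, and `speedup_and_efficiency` are hypothetical, and the only inputs are the average runtimes transcribed from the table above.

```python
# Average runtimes in seconds, transcribed from Table 4.
# Keys are (organism, output mode); values are runtimes for n = 1, 2, 4, 8.
PROCESSORS = [1, 2, 4, 8]
RUNTIMES = {
    ("A. thaliana", "text"): [36.3, 32.8, 30.7, 29.8],
    ("A. thaliana", "html"): [859.4, 449.2, 246.5, 168.7],
    ("D. melanogaster", "text"): [91.1, 56.6, 39.2, 32.4],
    ("D. melanogaster", "html"): [1350.5, 859.5, 633.7, 546.6],
    ("G. max", "text"): [85.3, 79.4, 76.5, 76.3],
    ("G. max", "html"): [1461.1, 768.4, 439.9, 330.5],
    ("H. sapiens", "text"): [294.3, 181.3, 130.1, 108.0],
    ("H. sapiens", "html"): [6422.0, 4089.5, 2751.2, 2323.3],
}


def speedup_and_efficiency(runtimes, processors=PROCESSORS):
    """Return (n, speedup, efficiency) tuples relative to the first runtime.

    speedup    = T(1) / T(n)
    efficiency = speedup / n
    """
    base = runtimes[0]
    return [(n, base / t, base / t / n) for n, t in zip(processors, runtimes)]


# Print one line per organism, output mode, and processor count.
for (organism, mode), runtimes in RUNTIMES.items():
    for n, speedup, efficiency in speedup_and_efficiency(runtimes):
        print(f"{organism:16s} {mode:5s} n={n}  "
              f"speedup={speedup:5.2f}  efficiency={efficiency:5.2f}")
```

Run this way, the figures suggest that the HTML/PNG mode benefits more from additional processors than the text mode, whose runtimes are already short at n=1.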

Standage and Brendel. BMC Bioinformatics 2012, 13:187. doi:10.1186/1471-2105-13-187