Table 4
Annotation comparison methods
| | A. thaliana | | D. melanogaster | | G. max | | H. sapiens | |
| Reference annotations | TAIR9 | | FlyBase 5.39 | | NCBI Entrez | | UCSC knownGene (hg19) | |
| Prediction annotations | TAIR10 | | Ensembl r65 | | JGI / Phytozome | | Ensembl r65 | |
| Average runtime (sec) | Text | HTML | Text | HTML | Text | HTML | Text | HTML |
| n=1 | 36.3 | 859.4 | 91.1 | 1,350.5 | 85.3 | 1,461.1 | 294.3 | 6,422.0 |
| n=2 | 32.8 | 449.2 | 56.6 | 859.5 | 79.4 | 768.4 | 181.3 | 4,089.5 |
| n=4 | 30.7 | 246.5 | 39.2 | 633.7 | 76.5 | 439.9 | 130.1 | 2,751.2 |
| n=8 | 29.8 | 168.7 | 32.4 | 546.6 | 76.3 | 330.5 | 108.0 | 2,323.3 |
| Gene loci | 25,618 | | 10,976 | | 47,877 | | 17,865 | |
| shared | 25,590 | | 10,944 | | 37,942 | | 7,779 | |
| unique to reference | 6 | | 32 | | 3,363 | | 9,569 | |
| unique to prediction | 22 | | 0 | | 6,572 | | 517 | |
| Comparisons | 33,002 | | 22,474 | | 38,734 | | 16,168 | |
| perfect matches | 31,750 | 96.2% | 22,446 | 99.9% | 2,489 | 6.4% | 2,517 | 15.6% |
| CDS structure matches | 420 | 1.3% | 0 | 0.0% | 17,450 | 45.1% | 8,269 | 51.1% |
| exon structure matches | 8 | 0.0% | 21 | 0.1% | 26 | 0.1% | 27 | 0.2% |
| UTR structure matches | 159 | 0.5% | 1 | 0.0% | 647 | 1.7% | 58 | 0.4% |
| non-matches | 665 | 2.0% | 6 | 0.0% | 18,122 | 46.8% | 5,297 | 32.8% |
As a demonstration of ParsEval’s speed and scalability, we obtained pairs of whole-genome annotations for Arabidopsis thaliana (thale cress), Drosophila melanogaster (fruit fly), Glycine max (soybean), and Homo sapiens (human). For each organism, we used ParsEval to compare the two corresponding sets of annotations. Runtimes are shown for both text and HTML/PNG output modes, using 1, 2, 4, and 8 processors (n). For each organism, we also show the number of gene loci identified, how many were shared between the two sets of annotations, and how many were unique to one set. Finally, we show the number of reported comparisons for each organism and how many were perfect gene structure matches, CDS structure matches, exon structure matches, UTR structure matches, and non-matches. All of the results shown in this table were easily obtained from the summary reports generated by ParsEval.
Standage and Brendel BMC Bioinformatics 2012 13:187 doi:10.1186/1471-2105-13-187
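
One way to read the scalability figures in Table 4 is to convert the average runtimes into parallel speedup (single-processor runtime divided by n-processor runtime) and parallel efficiency (speedup divided by n). The sketch below is illustrative only and is not part of ParsEval: the runtimes are copied from the table, while the names `RUNTIMES`, `speedup`, and `efficiency` are assumptions introduced here.

```python
# Average runtimes in seconds, copied from Table 4.
# Keys: (organism, output mode) -> {number of processors: seconds}.
RUNTIMES = {
    ("A. thaliana", "text"): {1: 36.3, 2: 32.8, 4: 30.7, 8: 29.8},
    ("A. thaliana", "html"): {1: 859.4, 2: 449.2, 4: 246.5, 8: 168.7},
    ("D. melanogaster", "text"): {1: 91.1, 2: 56.6, 4: 39.2, 8: 32.4},
    ("D. melanogaster", "html"): {1: 1350.5, 2: 859.5, 4: 633.7, 8: 546.6},
    ("G. max", "text"): {1: 85.3, 2: 79.4, 4: 76.5, 8: 76.3},
    ("G. max", "html"): {1: 1461.1, 2: 768.4, 4: 439.9, 8: 330.5},
    ("H. sapiens", "text"): {1: 294.3, 2: 181.3, 4: 130.1, 8: 108.0},
    ("H. sapiens", "html"): {1: 6422.0, 2: 4089.5, 4: 2751.2, 8: 2323.3},
}


def speedup(times, n):
    """Speedup on n processors relative to the single-processor runtime."""
    return times[1] / times[n]


def efficiency(times, n):
    """Parallel efficiency: speedup divided by the number of processors."""
    return speedup(times, n) / n


# Print one line per organism, output mode, and processor count.
for (organism, mode), times in RUNTIMES.items():
    for n in sorted(times):
        print(
            f"{organism:16s} {mode:4s} n={n}: "
            f"speedup {speedup(times, n):.2f}, "
            f"efficiency {efficiency(times, n):.2f}"
        )
```

For example, the A. thaliana HTML run on 8 processors gives a speedup of 859.4 / 168.7, or about 5.1, whereas the corresponding text-mode run gives only about 1.2, so in these measurements the HTML/PNG output mode benefits far more from additional processors than text mode does.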