Table 2

Specificity, sensitivity and precision estimates for different gene finders in E. coli.

Data set     EasyGene    Glim       rbs-Glim  Orpheus    Gm24  GmS   Gmhmm  Frame
A' % found   98.4        98.9/98.9  98.9      98.0/95.3  91.5  97.2  98.1   97.0
A' % exact   93.8        98.9/95.3  84.1      95.1/92.4  41.6  88.0  85.7   93.2
B' % found   98.4        98.5/98.6  98.6      95.9/96.5  90.2  96.6  97.2   96.4
T % found    98.1(98.0)  98.3/98.4  98.4      96.5/95.6  89.8  96.3  97.1   96.1
Genome       4145        6827/5756  5756      9333/7543  3552  4064  4230   4064
---------------------------------------------------------------------------------
zero order   7           169/211    211       6761/5430  6     153   1459   0
first order  7           545/723    723       6836/4804  13    241   830    0
third order  1           2423/2694  2694      6582/4817  43    659   866    1
shadows      0           19/21      21        22/9       1     0     2      0

The upper part of the table shows the percentage of genes found exactly (both 5' and 3' ends correct) and partially (only the 3' end correct) by the different gene finders on the sets of high-confidence genes in E. coli. For Glimmer and Orpheus, the numbers before the "/" are based exclusively on their ORF scores and recommended thresholds, whereas the numbers after the "/" are based on their post-processing procedures. The "Genome" row gives the number of genes predicted in the whole genome, which should be compared with the 4288 annotated genes in E. coli. The lower part of the table shows the number of false positives predicted in random sequences generated by Markov chains of order 0, 1 and 3. The last row shows the number of false predictions in the shadows of the high-confidence genes in data set A. All values listed for EasyGene are based on an R-value threshold of R = 2.
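The false-positive counts in the lower part of the table come from running each gene finder on random sequences drawn from Markov chains of order 0, 1 and 3. As a rough illustration, the sketch below trains a k-th order nucleotide Markov chain and samples a random sequence from it. It makes several assumptions that are not taken from the paper: the function names `train_markov_chain` and `sample_markov_chain`, the choice of training sequence, the sequence lengths, and the handling of contexts that never occur in the training data. It is not the procedure used to produce this table.

```python
import random
from collections import defaultdict


def train_markov_chain(seq, order):
    """Count how often each nucleotide follows each length-`order` context."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(order, len(seq)):
        context = seq[i - order:i]  # empty string when order == 0
        counts[context][seq[i]] += 1
    # Plain dicts so that .get() on an unseen context returns None.
    return {ctx: dict(nxt) for ctx, nxt in counts.items()}


def sample_markov_chain(counts, order, length, rng):
    """Draw a random sequence of `length` nucleotides from a trained chain."""
    contexts = list(counts)
    out = list(rng.choice(contexts))  # start from a context seen in training
    while len(out) < length:
        context = "".join(out[len(out) - order:])  # last `order` symbols
        nexts = counts.get(context)
        if not nexts:
            # Assumption (hypothetical choice): if a context is never followed
            # by anything in the training data, restart from a random seen one.
            out.extend(rng.choice(contexts))
            continue
        letters = list(nexts)
        weights = [nexts[c] for c in letters]
        # Next nucleotide is drawn in proportion to its observed frequency.
        out.append(rng.choices(letters, weights=weights)[0])
    return "".join(out[:length])


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical toy training sequence; a real run would use genomic DNA.
    training = "ATGCGTACGTTAGCATGCAAGCTTGACGATCG" * 50
    for k in (0, 1, 3):
        chain = train_markov_chain(training, k)
        print(k, sample_markov_chain(chain, k, 60, rng))
```

A gene finder would then be run on sequences generated for k = 0, 1 and 3. Every gene it predicts there would count as a false positive, since such sequences contain no real genes. Higher-order chains reproduce more of the local composition of real DNA, so they are typically harder to reject.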

Larsen and Krogh, BMC Bioinformatics 2003, 4:21. doi:10.1186/1471-2105-4-21
