Table 5

MisPred analysis of human genes predicted by the EnsEMBL and NCBI's GNOMON pipelines

EnsEMBL


| Conflict 1 | Number of proteins | Identified as containing an extracellular domain | Percentage | Identified as suspicious by MisPred | Percentage* | False Positives | Percentage* | True errors | Percentage* |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Homo sapiens | 2772 | 147 | 5.3% | 23 | 15.65% | ND | ND | ND | ND |
| Monodelphis domestica | 10519 | 680 | 6.46% | 137 | 20.15% | ND | ND | ND | ND |
| Gallus gallus | 6139 | 345 | 5.62% | 113 | 32.75% | ND | ND | ND | ND |
| Danio rerio | 10289 | 860 | 8.36% | 317 | 36.86% | ND | ND | ND | ND |


| Conflict 2 | Number of proteins | Identified as containing an extra- and an intracellular domain | Percentage | Identified as suspicious by MisPred | Percentage* | False Positives | Percentage* | True errors | Percentage* |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Homo sapiens | 2772 | 1 | 0.04% | 0 | 0.00% | 0 | 0.00% | 0 | 0.00% |
| Monodelphis domestica | 10519 | 10 | 0.1% | 0 | 0.00% | 0 | 0.00% | 0 | 0.00% |
| Gallus gallus | 6139 | 2 | 0.03% | 0 | 0.00% | 0 | 0.00% | 0 | 0.00% |
| Danio rerio | 10289 | 20 | 0.19% | 5 | 25% | 4 | 20% | 1 | 5% |


| Conflict 3 | Number of proteins | Identified as suspicious by MisPred | Percentage* | False Positives | Percentage* | True errors | Percentage* |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Homo sapiens | 2772 | 0 | 0.00% | 0 | 0.00% | 0 | 0.00% |
| Monodelphis domestica | 10519 | 0 | 0.00% | 0 | 0.00% | 0 | 0.00% |
| Gallus gallus | 6139 | 0 | 0.00% | 0 | 0.00% | 0 | 0.00% |
| Danio rerio | 10289 | 0 | 0.00% | 0 | 0.00% | 0 | 0.00% |


| Conflict 4 | Number of proteins | Proteins containing domains suitable for the study of domain integrity | Percentage | Identified as suspicious by MisPred | Percentage* | False Positives | Percentage* | True errors | Percentage* |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Homo sapiens | 2772 | 722 | 26.05% | 48 | 6.65% | ND | ND | ND | ND |
| Monodelphis domestica | 10519 | 3726 | 35.42% | 119 | 3.19% | ND | ND | ND | ND |
| Gallus gallus | 6139 | 1640 | 26.72% | 159 | 9.70% | ND | ND | ND | ND |
| Danio rerio | 10289 | 2565 | 24.93% | 197 | 7.68% | ND | ND | ND | ND |


| Conflict 5 | Number of proteins | Identified as suspicious by MisPred | Percentage* | False Positives | Percentage* | True errors | Percentage* |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Homo sapiens | 2772 | 0 | 0.00% | 0 | 0.00% | 0 | 0.00% |
| Danio rerio | 10289 | 0 | 0.00% | 0 | 0.00% | 0 | 0.00% |


NCBI/GNOMON


| Conflict 1 | Number of proteins | Identified as containing an extracellular domain | Percentage | Identified as suspicious by MisPred | Percentage* | False Positives | Percentage* | True errors | Percentage* |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Homo sapiens | 3012 | 139 | 4.61% | 32 | 23.02% | ND | ND | ND | ND |
| Monodelphis domestica | 9703 | 642 | 6.62% | 112 | 17.45% | ND | ND | ND | ND |
| Gallus gallus | 5604 | 310 | 5.53% | 88 | 28.39% | ND | ND | ND | ND |
| Danio rerio | 8905 | 742 | 8.33% | 158 | 21.29% | ND | ND | ND | ND |


| Conflict 2 | Number of proteins | Identified as containing an extra- and an intracellular domain | Percentage | Identified as suspicious by MisPred | Percentage* | False Positives | Percentage* | True errors | Percentage* |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Homo sapiens | 3012 | 2 | 0.07% | 0 | 0% | 0 | 0.00% | 0 | 0.00% |
| Monodelphis domestica | 9703 | 17 | 0.18% | 4 | 23.53% | 2 | 11.76% | 2 | 11.76% |
| Gallus gallus | 5604 | 3 | 0.05% | 1 | 33.33% | 1 | 33.33% | 0 | 0.00% |
| Danio rerio | 8905 | 16 | 0.18% | 6 | 37.5% | 4 | 25% | 2 | 12.5% |


| Conflict 3 | Number of proteins | Identified as suspicious by MisPred | Percentage* | False Positives | Percentage* | True errors | Percentage* |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Homo sapiens | 3012 | 0 | 0.00% | 0 | 0.00% | 0 | 0.00% |
| Monodelphis domestica | 9703 | 1 | 0.01% | 0 | 0.00% | 1 | 0.01% |
| Gallus gallus | 5604 | 0 | 0.00% | 0 | 0.00% | 0 | 0.00% |
| Danio rerio | 8905 | 2 | 0.02% | 1 | 0.01% | 1 | 0.01% |


| Conflict 4 | Number of proteins | Proteins containing domains suitable for the study of domain integrity | Percentage | Identified as suspicious by MisPred | Percentage* | False Positives | Percentage* | True errors | Percentage* |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Homo sapiens | 3012 | 792 | 26.3% | 41 | 5.18% | ND | ND | ND | ND |
| Monodelphis domestica | 9703 | 3420 | 35.25% | 39 | 1.14% | ND | ND | ND | ND |
| Gallus gallus | 5604 | 1500 | 26.77% | 208 | 13.87% | ND | ND | ND | ND |
| Danio rerio | 8905 | 2059 | 23.12% | 300 | 14.57% | ND | ND | ND | ND |


| Conflict 5 | Number of proteins | Identified as suspicious by MisPred | Percentage* | False Positives | Percentage* | True errors | Percentage* |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Homo sapiens | 3012 | 1 | 0.03% | 0 | 0.00% | 1 | 0.03% |
| Danio rerio | 8905 | 5 | 0.06% | 5 | 0.06% | 0 | 0.00% |


*Values for suspicious, false positive and true error sequences are expressed as a percentage of the proteins relevant to the given conflict.

ND – not determined.

The data refer to human genes for which both gene prediction pipelines generated at least one gene model.
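
As a worked illustration of the footnote on percentages, the short sketch below recomputes two entries of the EnsEMBL Conflict 1 row for Homo sapiens from the counts in the table. The helper name `pct` and the two-decimal rounding are assumptions made for illustration; they are not taken from the MisPred software. For Conflicts 3 and 5, which have no "relevant proteins" column, the tabulated percentages appear to use the total number of proteins as the denominator.

```python
def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals.

    Hypothetical helper for illustration; returns 0.0 when the
    denominator is zero so that empty categories do not raise.
    """
    if whole == 0:
        return 0.0
    return round(100.0 * part / whole, 2)


# Counts copied from the table: EnsEMBL, Conflict 1, Homo sapiens.
total_proteins = 2772   # number of proteins analysed
relevant = 147          # proteins containing an extracellular domain
suspicious = 23         # relevant proteins flagged as suspicious by MisPred

# Unstarred "Percentage" column: relevant proteins among all proteins.
print(pct(relevant, total_proteins))   # 5.3

# Starred "Percentage*" column: suspicious proteins among relevant ones.
print(pct(suspicious, relevant))       # 15.65
```

The same call with `pct(1, 9703)` gives 0.01, matching the NCBI/GNOMON Conflict 3 entry for Monodelphis domestica under the total-proteins assumption.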

Nagy et al. BMC Bioinformatics 2008 9:353   doi:10.1186/1471-2105-9-353
