Table 5

Example of an article containing multiple gene and species mentions (PMC2680910)

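PMCID 2680910. Columns "Curated 1" to "Curated 5": curated output (a). Columns "Team 78" to "Team 89": system raw output, by team.

| Gene ID | Gene name | Species | Central vote | Curated 1 | Curated 2 | Curated 3 | Curated 4 | Curated 5 | Team 78 | Team 68 | Team 65 | Team 93 | Team 89 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10015 | ALIX | human | 7 | Y, C | Y, C | Y, C | Y, C | Y, C | Y, C | Y, C | Y, C | Y, C | Y, C |
| 57630 | POSH | human | 7 | Y, C | Y, C | Y, C | Y, C | Y, C | Y, C | Y, C | Y, C | Y, C | Y, C |
| 155030 | Gag | HIV-1 | 6 | Y, C | Y, C | - | Y, C | Y, C | Y, C | - | - | Y, C | - |
| 36990 | POSH | Drosophila | | Y | Y | Y | Y | Y | Y | Y | - | Y | Y |
| 43330 | ALIX | Drosophila | | Y | Y | Y | Y | Y | Y | Y | - | - | Y |
| 128866 | CHMP4B | human | | Y | Y | Y | Y | - | Y | - | Y | - | Y |
| 39659 | TAK-1 | Drosophila | | Y | Y | Y | Y | Y | - | Y | - | - | Y |
| 3355106 | ALG-2 | Drosophila | | Y | Y | Y | Y | Y | - | - | - | Y | - |
| 7323 | UbcH5c | human | | Y | Y | Y | Y | - | - | Y | Y | - | - |
| 1489984 | p9 | EIAV | | Y | Y | Y | Y | - | - | - | - | - | - |
| 137492 | HCRP1 | human | | Y | Y | Y | Y | - | Y | - | Y | - | - |
| 7251 | TSG101 | human | | Y | Y | Y | Y | - | Y | - | Y | - | - |
| 155030 | p6 | HIV-1 | | Y | - | Y | Y | - | - | - | - | - | - |
| 7334 | UBC13 | human | 1 | Y | - | Y, C | Y | - | Y | Y | Y | - | - |
| Total genes detected | | | | 14 | 19 | 13 | 26 | 10 | 90 | 22 | 120 | 9 | 52 |
| FP | | | | 0 | 5 | 0 | 0 | 3 | 81 | 15 | 113 | 4 | 46 |
| FN | | | | 0 | 2 | 1 | 0 | 7 | 5 | 7 | 7 | 8 | 8 |
| TP | | | | 14 | 12 | 13 | 14 | 7 | 9 | 7 | 7 | 5 | 6 |
| Precision | | | | 1.00 | 0.71 | 1.00 | 1.00 | 0.70 | 0.10 | 0.32 | 0.06 | 0.56 | 0.12 |
| Recall | | | | 1.00 | 0.86 | 0.93 | 1.00 | 0.50 | 0.64 | 0.50 | 0.50 | 0.38 | 0.43 |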

List of Entrez Gene IDs, gene names and species found in PMC2680910. The Central Vote column gives the number of curators who selected the gene as central. "Y": a gene mentioned in the article was detected; "-": a gene mentioned in the article was missed; "C": central gene as determined by majority vote; for the systems, "C" means that the system ranked the gene higher than the non-central genes. "Total genes detected": the total number of gene mentions returned by a given system (everything the system considered a gene). TP, FP and FN stand for true positives, false positives and false negatives, respectively. (a) Curated output from manual curation (2 curators, columns 1-2) and system-assisted curation (5 curators, of which 3 are shown, columns 3-5).
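The Precision and Recall rows can be reproduced from the TP, FP and FN rows using the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). The following is a minimal sketch under that assumption; the function name `precision_recall` and the dictionary layout are illustrative and do not come from the article.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) rounded to two decimals.

    Assumes the standard definitions; a zero denominator yields 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return round(precision, 2), round(recall, 2)


# (TP, FP, FN) counts copied from four columns of Table 5.
counts = {
    "curator 2": (12, 5, 2),
    "curator 5": (7, 3, 7),
    "team 78": (9, 81, 5),
    "team 93": (5, 4, 8),
}

for name, (tp, fp, fn) in counts.items():
    p, r = precision_recall(tp, fp, fn)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

For these columns the sketch gives the same values as the table, for example 0.71 and 0.86 for curator 2 and 0.10 and 0.64 for team 78.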

Arighi et al. BMC Bioinformatics 2011 12(Suppl 8):S4   doi:10.1186/1471-2105-12-S8-S4
