Table 4

Example of an article that presents name ambiguity between gene names, and between a gene name and a term from another domain (PMC2275796).

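PMC2275796

| Gene ID | Gene names | Species | Central Vote | Curated Output (a) | Team 78 | Team 68 | Team 65 | Team 93 | Team 89 |
|---|---|---|---|---|---|---|---|---|---|
| 56606 | GLUT9/SLC2A9 | human | 7 | Y, C | Y, C | Y, C | Y, C | Y, C | Y, C |
| 9948 | WDR1/AIP1 | human | | Y | Y | Y | Y | Y | - |

Teams 78, 68, 65, 93, and 89 are the System Raw Output columns.

Some examples of ambiguity found in the systems' output

| Gene ID | Gene names | Species | System raw output entries |
|---|---|---|---|
| 11182 | GLUT9/SLC2A6 | human | N, C; N, C; N, C |
| | CAD | | N; N |
| | MI | | N; N |
| 139741 | MAGI2/AIP1 | human | N; N; N |

Performance for total of genes in the article

| | Curated Output (a) | Team 78 | Team 68 | Team 65 | Team 93 | Team 89 |
|---|---|---|---|---|---|---|
| Total genes detected | 2 | 6 | 4 | 44 | 4 | 15 |
| FP | 0 | 4 | 2 | 42 | 2 | 14 |
| FN | 0 | 0 | 0 | 0 | 0 | 1 |
| TP | 2 | 2 | 2 | 2 | 2 | 1 |
| Precision | 1 | 0.33 | 0.50 | 0.05 | 0.50 | 0.07 |
| Recall | 1 | 1 | 1 | 1 | 1 | 0.5 |

Performance for detecting central genes (b)

| | Curated Output (a) | Team 78 | Team 68 | Team 65 | Team 93 | Team 89 |
|---|---|---|---|---|---|---|
| Total central genes | 1 | 1 | 2 | 2 | 2 | 1 |
| FP | 0 | 0 | 1 | 1 | 1 | 0 |
| FN | 0 | 0 | 0 | 0 | 0 | 0 |
| TP | 1 | 1 | 1 | 1 | 1 | 1 |
| Precision | 1 | 1 | 0.50 | 0.50 | 0.50 | 1 |
| Recall | 1 | 1 | 1 | 1 | 1 | 1 |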


List of Entrez Gene IDs, gene names, and species found in PMC2275796. The Central Vote column indicates the number of curators who selected the gene as central. “Y”: a gene mentioned in the article was detected; “-”: a gene mentioned in the article was missed; “N”: the detected entity was not a gene or was the wrong gene; “C”: central gene as determined by majority vote; for the systems, it means that the gene was ranked high (above non-central genes). “Total genes detected”: the total number of gene mentions returned by a given system (what the system considered a gene). FP and FN stand for false positive and false negative, respectively. (a) Curated output by manual curation (2 curators) and system-assisted curation (5 curators) was identical, so it is shown as a single column. (b) The FP for central gene performance was calculated by comparing the list of manually curated central genes with the system's gene ranking: any non-central gene ranked higher than a central one is counted as an FP.
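
The precision and recall values in the table are consistent with the usual definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). The following is a minimal sketch that recomputes them from the counts reported for all genes in the article. The function name `precision_recall` and the convention of returning 0.0 for an empty denominator are assumptions made for illustration, not part of the original evaluation.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from raw counts.

    Assumption: a zero denominator yields 0.0 rather than an error.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# (TP, FP, FN) per column, copied from the rows for all genes in the article.
all_genes = {
    "curated": (2, 0, 0),
    "78": (2, 4, 0),
    "68": (2, 2, 0),
    "65": (2, 42, 0),
    "93": (2, 2, 0),
    "89": (1, 14, 1),
}

# Round to two decimals, matching the precision shown in the table.
scores = {
    name: tuple(round(value, 2) for value in precision_recall(*counts))
    for name, counts in all_genes.items()
}

for name, (p, r) in scores.items():
    print(f"{name:>7}: precision={p:.2f} recall={r:.2f}")
```

The same function applies to the central-gene rows once FP is counted with the ranking rule described in note (b).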

Arighi et al. BMC Bioinformatics 2011 12(Suppl 8):S4   doi:10.1186/1471-2105-12-S8-S4
