Table 1

Overall results

Dataset  Setting | UMass                    | Stanford                 | FAUST
                 | Recall  Precision  F1    | Recall  Precision  F1    | Recall  Precision  F1
GE       Task 1  | 48.5    64.1       55.2  | 42.4    61.1       50.0  | 49.4    64.8       56.0
GE       Task 2  | 43.9    60.9       51.0  | --      --         --    | 46.7    63.8       53.9
EPI      FULL    | 28.1    41.6       33.5  | 26.6    37.9       31.2  | 28.9    44.5       35.0
EPI      CORE    | 57.0    73.3       64.2  | 56.9    70.2       62.8  | 59.9    80.3       68.6
ID       FULL    | 46.9    62.0       53.4  | 46.3    55.9       50.6  | 48.0    66.0       55.6
ID       CORE    | 49.7    62.4       55.3  | 49.2    56.4       52.5  | 50.8    66.4       57.6

Dataset  Setting | FAUST (without novel)
                 | Recall  Precision  F1
GE       Task 1  | 47.6    69.7       56.6

Results on the test sets of all tasks we submitted to, for three models. We list recall, precision, and F1 using the standard BioNLP approximate recursive metric. For the GE and ID datasets, the Stanford model used all four decoders with the reranker. For EPI, the Stanford model used only the 1N decoder with the reranker. In all three domains, the stacked UMass←Stanford model (FAUST) used all four decoders from the Stanford model as inputs. The "FAUST (without novel)" row is created by removing all events that occur in neither the UMass nor the Stanford output (i.e., events that are novel to the stacked output).
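
The "without novel" filtering described in the caption can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: events are assumed to be hashable tuples that compare equal when they are the same event, and the names `remove_novel`, `f1_score`, and the toy events are hypothetical. The F1 helper is the usual harmonic mean of recall and precision, which agrees with, for example, the UMass GE Task 1 row (48.5, 64.1, 55.2).

```python
from typing import Hashable, Iterable, List, Set


def f1_score(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (same scale as the inputs)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


def remove_novel(stacked: Iterable[Hashable],
                 umass: Iterable[Hashable],
                 stanford: Iterable[Hashable]) -> List[Hashable]:
    """Keep only stacked events that also appear in the UMass or Stanford output."""
    seen: Set[Hashable] = set(umass) | set(stanford)
    return [event for event in stacked if event in seen]


# Toy events as (type, trigger, argument) tuples; purely illustrative (assumption).
umass_events = [("Phosphorylation", "T1", "P1"), ("Binding", "T2", "P2")]
stanford_events = [("Binding", "T2", "P2"), ("Gene_expression", "T3", "P3")]
faust_events = [
    ("Phosphorylation", "T1", "P1"),
    ("Binding", "T2", "P2"),
    ("Regulation", "T4", "P4"),  # in neither base output, so novel
]

filtered = remove_novel(faust_events, umass_events, stanford_events)
print(filtered)
print(round(f1_score(48.5, 64.1), 1))
```

In this toy run, the `Regulation` event is dropped because neither base output contains it. Dropping novel events can only remove predictions, which is in line with the lower recall and higher precision reported for "FAUST (without novel)" relative to FAUST on GE Task 1.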

McClosky et al. BMC Bioinformatics 2012 13(Suppl 11):S9   doi:10.1186/1471-2105-13-S11-S9
