Table 5

Stacking and other model combination techniques

Model            F1
UMass            54.7
UMass←Stanford   55.8

Model                      Alone   Intersection with UMass   Union with UMass
Stanford (1N)              49.9    49.0                      54.7
Stanford (1P)              49.0    48.3                      54.6
Stanford (2N)              46.5    45.4                      54.8
Stanford (2P)              49.5    49.1                      54.4
Stanford (all)             --      42.4                      53.0
Stanford (1N, reranked)    50.2    49.7                      54.4
Stanford (1P, reranked)    49.4    50.2                      53.2
Stanford (2N, reranked)    47.8    46.9                      54.6
Stanford (2P, reranked)    50.4    50.0                      54.4
Stanford (all, reranked)   50.7    50.0                      54.7

Model            Intersection   Union
Stanford (all)   43.9           50.2


Stacking and reranking outperform the intersection and union model combination baselines. All values are F1 scores. The first section of the table summarizes the results from the UMass and stacked models. The second section gives the performance of each Stanford model alone and when combined with the pure UMass model via the intersection and union methods. In the last section, we evaluate the intersection and union baselines using only the four Stanford models as inputs. The "Stanford (all)" line uses all four individual decoders without model combination; its Alone entry in the second section is left empty because the four decoders do not form a single set of outputs and so cannot be evaluated on their own. In "Stanford (all, reranked)", the reranker was used to combine the four decoders into a single output before that output was intersected or unioned. All results are on the development set for the Genia track.
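As a rough illustration of the intersection and union baselines named in the caption, the following minimal sketch treats each model's output as a set of events and scores the combined set against gold annotations. The event representation, the toy data, and the function names (`f1`, `intersect`, `union`) are assumptions made for this example, and it uses exact matching of events, which may differ from the matching criteria of the actual Genia track evaluation.

```python
from typing import Iterable, Set, Tuple

# Hypothetical event representation: (trigger, event type, arguments).
Event = Tuple[str, str, Tuple[str, ...]]


def f1(predicted: Set[Event], gold: Set[Event]) -> float:
    """F1 (as a percentage) of a predicted event set, assuming exact match."""
    correct = len(predicted & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 100.0 * 2 * precision * recall / (precision + recall)


def intersect(outputs: Iterable[Set[Event]]) -> Set[Event]:
    """Keep only events that every model predicted."""
    outputs = list(outputs)
    return set.intersection(*outputs) if outputs else set()


def union(outputs: Iterable[Set[Event]]) -> Set[Event]:
    """Keep every event that at least one model predicted."""
    return set().union(*outputs)


# Toy data (invented for illustration, not taken from the paper).
a: Event = ("binds", "Binding", ("P1", "P2"))
b: Event = ("expressed", "Gene_expression", ("P3",))
c: Event = ("phosphorylates", "Phosphorylation", ("P4",))
d: Event = ("transcribed", "Transcription", ("P5",))
e: Event = ("binds", "Binding", ("P6",))
g: Event = ("localized", "Localization", ("P7",))

gold = {a, b, c, d}
umass_output = {a, b, e}
stanford_output = {b, c, g}

print(round(f1(intersect([umass_output, stanford_output]), gold), 1))
print(round(f1(union([umass_output, stanford_output]), gold), 1))
```

In this toy case the intersection keeps only the events both models agree on, which favors precision over recall, while the union keeps everything either model proposed, which favors recall over precision.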

McClosky et al. BMC Bioinformatics 2012 13(Suppl 11):S9   doi:10.1186/1471-2105-13-S11-S9
