The NLM-WSD test set and some of its subsets. The 12 terms that Weeber et al. described as "problematic" due to low levels of agreement between annotators are shown in italics. The test set used by Joshi et al. comprises the union of the terms used by Liu et al. and by Leroy and Rindflesch, while the "common subset" is formed from their intersection.
Stevenson et al. BMC Bioinformatics 2008 9(Suppl 11):S7 doi:10.1186/1471-2105-9-S11-S7