Steps involved in the construction of test and reference fingerprints. Two sets of abstracts containing symbols with known gene or non-gene meaning were constructed. One set consisted of abstracts with short-form/long-form combinations culled from Medline; the other consisted of abstracts mentioned in OMIM annotations of genes. The two sets were merged by selecting symbols that occurred in both sets and had at least six abstracts for each of their gene senses. The OMIM annotations for the genes in the merged set were stored separately. A reference set was generated by randomly selecting five abstracts per gene sense from the merged set; the remaining abstracts were used for testing. All abstracts in the test and reference sets, as well as the OMIM annotations, were indexed using the combined gene thesaurus, and the resulting "concept fingerprints" were used for reference fingerprint construction and testing of the disambiguation algorithm.
Schijvenaars et al. BMC Bioinformatics 2005 6:149 doi:10.1186/1471-2105-6-149
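The merge and split steps described in the caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `merge_sets` and `split_reference_test`, the nested-dictionary layout (symbol, then sense, then a list of abstract identifiers), and the fixed random seed are all assumptions made for illustration. The sketch also applies the six-abstract threshold to every sense of a symbol, which is one possible reading of the caption.

```python
import random

MIN_ABSTRACTS = 6   # caption: at least six abstracts per sense
REFERENCE_SIZE = 5  # caption: five abstracts per sense form the reference set


def merge_sets(set_a, set_b, min_abstracts=MIN_ABSTRACTS):
    """Keep symbols found in both sets whose senses all have enough abstracts.

    Each input maps symbol -> sense -> list of abstract IDs (assumed layout).
    """
    merged = {}
    for symbol in sorted(set_a.keys() & set_b.keys()):
        senses = {}
        # Pool the abstracts of each sense across the two sources.
        for source in (set_a, set_b):
            for sense, abstracts in source[symbol].items():
                senses.setdefault(sense, set()).update(abstracts)
        # Assumption: the threshold must hold for every sense of the symbol.
        if all(len(ids) >= min_abstracts for ids in senses.values()):
            merged[symbol] = {s: sorted(ids) for s, ids in senses.items()}
    return merged


def split_reference_test(merged, reference_size=REFERENCE_SIZE, seed=0):
    """Randomly pick `reference_size` abstracts per sense; the rest are test."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    reference, test = {}, {}
    for symbol, senses in merged.items():
        for sense, abstracts in senses.items():
            chosen = set(rng.sample(abstracts, reference_size))
            reference[(symbol, sense)] = sorted(chosen)
            test[(symbol, sense)] = [a for a in abstracts if a not in chosen]
    return reference, test
```

Because every retained sense has at least six abstracts and five go to the reference set, each sense keeps at least one abstract for testing. Indexing the abstracts with the gene thesaurus to obtain concept fingerprints is a separate step that this sketch does not cover.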