Open Access Methodology article

Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts

AM Cohen*, WR Hersh, C Dubay and K Spackman

Author affiliations

Department of Medical Informatics and Clinical Epidemiology, School of Medicine, Oregon Health & Science University, 3181 S.W. Sam Jackson Park Road, Mail Code: BICC, Portland, Oregon, 97239-3098, USA

Citation and License

BMC Bioinformatics 2005, 6:103 doi:10.1186/1471-2105-6-103

Published: 22 April 2005

Abstract

Background

Text mining can help biomedical researchers reduce information overload by extracting useful knowledge from large collections of text. We developed a novel text-mining method that analyzes the network structure created by symbol co-occurrences, as a way to extend the capabilities of knowledge extraction. We applied the method to the task of automatically extracting gene and protein name synonyms.
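
To make the idea of a symbol co-occurrence network concrete, the following is a minimal illustrative sketch in Python. It is not the authors' algorithm: the abstract does not specify how symbols are tokenized or how network structure is scored, so the tokenizer, the function names (`build_cooccurrence_network`, `neighbor_overlap`), and the use of Jaccard overlap between neighbor sets are all assumptions chosen for illustration. The intuition shown is only that two symbols sharing many co-occurrence neighbors are plausible synonym candidates.

```python
import re
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_network(abstracts):
    """Build an undirected network {symbol: {neighbor: count}}.

    Two symbols are linked when they appear in the same abstract.
    Tokenization here is a naive assumption, not the paper's method.
    """
    network = defaultdict(lambda: defaultdict(int))
    for text in abstracts:
        # Unique lower-cased alphanumeric tokens in this abstract.
        symbols = set(re.findall(r"[a-z0-9-]+", text.lower()))
        for a, b in combinations(sorted(symbols), 2):
            network[a][b] += 1
            network[b][a] += 1
    return network


def neighbor_overlap(network, a, b):
    """Jaccard overlap of the neighbor sets of a and b.

    This score is a stand-in (an assumption) for whatever structural
    measure a real system would use to rank synonym candidates.
    """
    neighbors_a = set(network.get(a, {})) - {b}
    neighbors_b = set(network.get(b, {})) - {a}
    union = neighbors_a | neighbors_b
    if not union:
        return 0.0
    return len(neighbors_a & neighbors_b) / len(union)


# Toy usage with made-up sentences standing in for abstracts.
toy_abstracts = ["p53 binds mdm2", "tp53 binds mdm2"]
toy_network = build_cooccurrence_network(toy_abstracts)
toy_score = neighbor_overlap(toy_network, "p53", "tp53")
```

In this toy input, `p53` and `tp53` never co-occur but share all of their neighbors, so they receive the highest possible overlap score. A real system would need filtering of common words and a far larger corpus before such a score became meaningful.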

Results

Performance was measured on a test set consisting of about 50,000 abstracts from one year of MEDLINE. Synonyms retrieved from curated genomics databases were used as a gold standard. The system obtained a maximum F-score of 22.21% (23.18% precision and 21.36% recall), with high efficiency in the use of seed pairs.

Conclusion

The method performs comparably with other studied methods, does not rely on sophisticated named-entity recognition, and requires little initial seed knowledge.