This article is part of the supplement: Workshop on Advances in Bio Text Mining
Species identification for gene name normalization
* Corresponding author: Illés Solt solt@informatik.hu-berlin.de
1 Knowledge Management in Bioinformatics, Institute for Computer Science, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany
2 Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, H-1117 Budapest, Magyar Tudósok krt 2., Hungary
BMC Bioinformatics 2010, 11(Suppl 5):P5 doi:10.1186/1471-2105-11-S5-P5
Published: 6 October 2010
First paragraph (this article has no abstract)
Protein interaction networks are expensive to construct experimentally. Researchers therefore usually turn to the literature or to domain-specific databases to gather what is currently known about interactions. Yet manually collecting knowledge from scientific papers is labor intensive and should be automated to the extent possible. An important step toward this is identifying gene and protein names (termed entities) in text. Once identified, gene names must be mapped to database identifiers to connect them to structured knowledge. One particular problem in this mapping step is homonymy, i.e., identical names referring to different genes in different species.
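The homonymy problem can be illustrated with a minimal Python sketch. It is not the method of this article: the lexicon, the `normalize` function, and the `EXAMPLE:` identifiers are hypothetical placeholders chosen for illustration. Only the NCBI Taxonomy IDs (9606 for human, 10090 for mouse) are real. The sketch shows that a name shared by several species cannot be mapped to a single identifier until the species is known.

```python
from typing import Dict, Optional

# Hypothetical toy lexicon: lowercased gene name -> {NCBI Taxonomy ID -> identifier}.
# The "EXAMPLE:" identifiers are placeholders, not real database records.
LEXICON: Dict[str, Dict[int, str]] = {
    "cdc2": {9606: "EXAMPLE:human-cdc2", 10090: "EXAMPLE:mouse-cdc2"},
    "brca1": {9606: "EXAMPLE:human-brca1"},
}


def normalize(name: str, species: Optional[int] = None) -> Optional[str]:
    """Map a gene name to an identifier, or return None if unknown or ambiguous."""
    candidates = LEXICON.get(name.lower(), {})
    if species is not None:
        # Species known: pick the candidate for that species, if any.
        return candidates.get(species)
    if len(candidates) == 1:
        # Species unknown, but the name occurs in only one species.
        return next(iter(candidates.values()))
    # Species unknown and the name is homonymous (or not in the lexicon).
    return None
```

Under these assumptions, `normalize("cdc2")` returns `None` because the name is ambiguous across species, while `normalize("cdc2", 10090)` resolves to the mouse entry. In practice the species would have to be inferred from the surrounding text, which is the task the title refers to.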