Open Access Highly Accessed Methodology article

Application of text-mining for updating protein post-translational modification annotation in UniProtKB

Anne-Lise Veuthey1*, Alan Bridge1, Julien Gobeill2, Patrick Ruch2, Johanna R McEntyre3, Lydie Bougueleret1 and Ioannis Xenarios1,4,5

Author Affiliations

1 Swiss-Prot group, SIB Swiss Institute of Bioinformatics, 1 Michel Servet, Geneva 4, 1211, Switzerland

2 BiTeM Group, Information Science Department, University of Applied Sciences, Carouge, 1227, Switzerland

3 EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK

4 Vital-IT group, SIB Swiss Institute of Bioinformatics, Quartier Sorge, Bâtiment Génopode, Lausanne, 1015, Switzerland

5 University of Lausanne, Lausanne, 1015, Switzerland

BMC Bioinformatics 2013, 14:104  doi:10.1186/1471-2105-14-104

Published: 22 March 2013

Abstract

Background

The annotation of protein post-translational modifications (PTMs) is an important task of UniProtKB curators and, with continuing improvements in experimental methodology, an ever greater number of articles are being published on this topic. To help curators cope with this growing body of information, we have developed a system that extracts information from the scientific literature for the most frequently annotated PTMs in UniProtKB.

Results

The procedure uses a pattern-matching and rule-based approach to extract sentences with information on the type and site of modification. A ranked list of protein candidates for the modification is also provided. For PTM extraction, precision varies from 57% to 94%, and recall from 75% to 95%, depending on the type of modification. The procedure was used to track new publications on PTMs and to recover potential supporting evidence for phosphorylation sites that had been annotated on the basis of large-scale proteomics experiments.
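As an illustration of the kind of pattern-matching step described above, the following minimal Python sketch flags sentences that mention both a PTM trigger word and a residue position, and computes precision and recall against a hand-labelled set. It is not the authors' implementation: the trigger lexicon, the site pattern, and the function names (`extract_ptm_sentences`, `precision_recall`) are assumptions made for this example, and protein candidate ranking is not covered.

```python
import re

# Hypothetical trigger patterns for a few PTM types (illustrative only,
# not the lexicon used by the system described in the article).
PTM_TRIGGERS = {
    "phosphorylation": r"phosphorylat\w*",
    "acetylation": r"acetylat\w*",
    "methylation": r"methylat\w*",
}

# Residue-plus-position mentions such as "Ser-473", "Thr308" or "lysine 120".
SITE_PATTERN = re.compile(
    r"\b(Ser|Thr|Tyr|Lys|Arg|serine|threonine|tyrosine|lysine|arginine)"
    r"[\s-]?(\d+)\b",
    re.IGNORECASE,
)


def extract_ptm_sentences(text):
    """Return (sentence, ptm_type, sites) for sentences naming a PTM and a site."""
    hits = []
    # Naive sentence split: terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        sites = [f"{m.group(1)}-{m.group(2)}" for m in SITE_PATTERN.finditer(sentence)]
        if not sites:
            continue  # rule: a sentence without a site mention is not reported
        for ptm_type, trigger in PTM_TRIGGERS.items():
            if re.search(trigger, sentence, re.IGNORECASE):
                hits.append((sentence, ptm_type, sites))
    return hits


def precision_recall(predicted, gold):
    """Compute precision and recall of a predicted set against a gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    sample = (
        "AKT1 is phosphorylated at Ser-473 by mTORC2. "
        "The protein localizes to the cytoplasm. "
        "Acetylation of lysine 120 was also observed."
    )
    for sentence, ptm_type, sites in extract_ptm_sentences(sample):
        print(ptm_type, sites, "|", sentence)
```

A real curation pipeline would need a much richer lexicon, handling of negation and of one-letter residue codes, and a step that links each site to a protein name; the sketch only shows how simple patterns and rules can select candidate sentences for a curator to review.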

Conclusions

The information retrieval and extraction method we have developed in this study forms the basis of a simple tool for the manual curation of protein post-translational modifications in UniProtKB/Swiss-Prot. Our work demonstrates that even simple text-mining tools can be effectively adapted for database curation tasks, provided that a thorough understanding of the working process and requirements is first obtained. This system can be accessed at http://eagl.unige.ch/PTM/.