Open Access, Highly Accessed, Research article

Glycosylation site prediction using ensembles of Support Vector Machine classifiers

Cornelia Caragea1,2, Jivko Sinapov1,2, Adrian Silvescu1,2, Drena Dobbs3,4 and Vasant Honavar1,2

1Artificial Intelligence Research Laboratory, Computer Science Department, Iowa State University, USA

2Center for Computational Intelligence, Learning, and Discovery, Iowa State University, USA

3Department of Genetics, Development and Cell Biology, Iowa State University, USA

4Bioinformatics and Computational Biology Program, Iowa State University, USA

BMC Bioinformatics 2007, 8:438 doi:10.1186/1471-2105-8-438

Published: 9 November 2007

Abstract

Background

Glycosylation is one of the most complex post-translational modifications (PTMs) of proteins in eukaryotic cells. It plays an important role in biological processes ranging from protein folding and subcellular localization to ligand recognition and cell-cell interactions. Experimental identification of glycosylation sites is expensive and laborious. Hence, there is significant interest in developing computational methods that reliably predict glycosylation sites from amino acid sequences.

Results

We explore machine learning methods for training classifiers to predict which amino acid residues are likely to be glycosylated, using information derived from the target residue and its sequence neighbors. We compare the performance of single Support Vector Machine classifiers and ensembles of Support Vector Machine classifiers trained on a dataset of experimentally determined N-linked, O-linked, and C-linked glycosylation sites extracted from O-GlycBase version 6.00, a database of 242 proteins from several different species. Our experiments show that the ensembles outperform single Support Vector Machine classifiers at predicting glycosylation sites across a range of standard measures of classifier performance. The resulting methods have been implemented in EnsembleGly, a web server for glycosylation site prediction.
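The idea of predicting from "the target amino acid residue and its sequence neighbors" and combining several classifiers can be illustrated with a minimal Python sketch. It is not the authors' implementation: the window half-width, the one-hot encoding, the padding symbol, the restriction to serine and threonine as O-linked candidates, and the use of a simple majority vote to combine ensemble members are all assumptions made for illustration, and the function names are hypothetical. The ensemble members are passed in as plain callables, so trained Support Vector Machine classifiers could be substituted for the stubs.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "-"  # assumed placeholder for positions beyond either end of the sequence


def candidate_sites(sequence, targets="ST"):
    """Return indices of residues that could be glycosylated (S/T assumed for O-linked)."""
    return [i for i, residue in enumerate(sequence) if residue in targets]


def extract_window(sequence, index, half_width=3):
    """Return the target residue with half_width neighbors on each side, padded at the ends."""
    padded = PAD * half_width + sequence + PAD * half_width
    return padded[index : index + 2 * half_width + 1]


def one_hot(window):
    """Encode each window position as 20 indicator values (all zeros for padding)."""
    features = []
    for residue in window:
        features.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return features


def majority_vote(classifiers, features):
    """Predict positive when strictly more than half of the ensemble members vote positive."""
    positive_votes = sum(1 for classify in classifiers if classify(features))
    return positive_votes * 2 > len(classifiers)


if __name__ == "__main__":
    sequence = "MKTSNAT"  # toy sequence, not taken from O-GlycBase
    # Stub ensemble members standing in for trained classifiers (illustrative only).
    ensemble = [lambda f: True, lambda f: True, lambda f: False]
    for site in candidate_sites(sequence):
        window = extract_window(sequence, site, half_width=2)
        print(site, window, majority_vote(ensemble, one_hot(window)))
```

Each candidate residue is turned into a fixed-length feature vector, so any classifier that accepts numeric vectors can serve as an ensemble member. Using an odd number of members avoids ties under the strict-majority rule.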

Conclusion

Ensembles of Support Vector Machine classifiers offer an accurate and reliable approach to automated identification of putative glycosylation sites in glycoprotein sequences.

