This article is part of the supplement: Proceedings of the 2012 International Conference on Intelligent Computing (ICIC 2012)

Open Access Proceedings

Protein localization prediction using random walks on graphs

Xiaohua Xu*, Lin Lu, Ping He and Ling Chen

Author Affiliations

Department of Computer Science, Yangzhou University, Yangzhou 225009, China

BMC Bioinformatics 2013, 14(Suppl 8):S4  doi:10.1186/1471-2105-14-S8-S4

Published: 9 May 2013

Abstract

Background

Understanding where proteins are localized in cells is vital to characterizing their functions and possible interactions. As a result, identifying the (sub)cellular compartment in which a protein is located has become an important problem in protein classification. This classification problem involves predicting labels in a dataset for which only a limited number of labeled data points are available. Random walk techniques, which operate on a graph representation of protein data, have performed well in sequence classification and functional prediction; however, they have not yet been applied to protein localization. Accordingly, we propose a novel classifier for predicting protein localization sites, based on random walks on a graph.

Results

We propose a graph-theoretic model for predicting protein localization and evaluate it on data from yeast and Gram-negative (Gneg) bacteria. We tested the performance of our classifier on both datasets, tuning the model by varying the laziness value and the number of steps taken during the random walk. Using 10-fold cross-validation, we achieved an accuracy above 61% for the yeast data and about 93% for the Gram-negative bacteria data.

Conclusions

This study presents a new classifier derived from the random walk technique and applies this classifier to investigate the cellular localization of proteins. The prediction accuracy and additional validation demonstrate an improvement over previous methods, such as support vector machine (SVM)-based classifiers.
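
To make the roles of the laziness value and the number of steps concrete, the following is a minimal sketch of a lazy random walk classifier on a small weighted graph. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy similarity weights, the compartment labels, and the scoring rule (assign an unlabeled protein the class whose labeled nodes collect the most walk probability) are all hypothetical choices made for this example.

```python
def lazy_transition(weights, laziness):
    """Build a lazy transition matrix: with probability `laziness` the walker
    stays put, otherwise it moves to a neighbor in proportion to edge weight."""
    n = len(weights)
    matrix = []
    for i in range(n):
        total = sum(weights[i])
        row = []
        for j in range(n):
            stay = 1.0 if i == j else 0.0
            if total > 0:
                row.append(laziness * stay + (1.0 - laziness) * weights[i][j] / total)
            else:
                # Isolated node: the walker can only stay where it is.
                row.append(stay)
        matrix.append(row)
    return matrix


def walk_distribution(matrix, start, steps):
    """Probability of being at each node after `steps` steps from `start`."""
    n = len(matrix)
    dist = [0.0] * n
    dist[start] = 1.0
    for _ in range(steps):
        dist = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return dist


def predict(weights, labels, laziness=0.5, steps=3):
    """Label each unlabeled node (label None) with the class whose labeled
    nodes receive the most probability mass from a walk started at that node.
    Assumes at least one node is labeled."""
    matrix = lazy_transition(weights, laziness)
    predictions = {}
    for node, label in enumerate(labels):
        if label is not None:
            continue
        dist = walk_distribution(matrix, node, steps)
        scores = {}
        for j, known in enumerate(labels):
            if known is not None:
                scores[known] = scores.get(known, 0.0) + dist[j]
        predictions[node] = max(scores, key=scores.get)
    return predictions


# Hypothetical toy graph: node 2 is unlabeled, strongly similar to nodes 0 and 1
# and only weakly similar to node 3.
toy_weights = [
    [0.0, 1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]
toy_labels = ["cytoplasm", "cytoplasm", None, "nucleus", "nucleus"]
result = predict(toy_weights, toy_labels, laziness=0.5, steps=3)
print(result)
```

In this sketch, a higher laziness keeps the walker closer to its starting node, while more steps let label information from more distant nodes influence the prediction; both would be tuned by cross-validation, as the Results describe.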