This article is part of the supplement: NIPS workshop on New Problems and Methods in Computational Biology

Open Access Proceedings

Protein Ranking by Semi-Supervised Network Propagation

Jason Weston1*, Rui Kuang2,4, Christina Leslie2 and William Stafford Noble3

Author Affiliations

1 NEC LABS AMERICA, 4 Independence Way, Princeton, NJ, USA

2 Center for Computational Learning Systems, Columbia University, Interchurch Center, 475 Riverside Dr., New York, USA

3 Department of Genome Sciences, Department of Computer Science and Engineering, University of Washington, 1705 NE Pacific Street, Seattle, WA, USA

4 Department of Computer Science, Columbia University, 1214 Amsterdam Avenue, New York, NY, USA

BMC Bioinformatics 2006, 7(Suppl 1):S10 doi:10.1186/1471-2105-7-S1-S10

Published: 20 March 2006

Abstract

Background

Biologists regularly search DNA or protein databases for sequences that share an evolutionary or functional relationship with a given query sequence. Traditional search methods, such as BLAST and PSI-BLAST, focus on detecting statistically significant pairwise sequence alignments and often miss more subtle sequence similarity. Recent work in the machine learning community has shown that exploiting the global structure of the network defined by these pairwise similarities can help detect more remote relationships than a purely local measure.

Methods

We review RankProp, a ranking algorithm that exploits the global network structure of similarity relationships among proteins in a database by performing a diffusion operation on a protein similarity network with weighted edges. The original RankProp algorithm is unsupervised. Here, we describe a semi-supervised version of the algorithm that uses labeled examples. Three possible ways of incorporating label information are considered: (i) as a validation set for model selection, (ii) to learn a new network, by choosing which transfer function to use for a given query, and (iii) to estimate edge weights, which measure the probability of inferring structural similarity.
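To make the diffusion idea concrete, the following is a minimal sketch of ranking by propagation on a weighted similarity network. It is not the RankProp implementation: the abstract does not state the update rule, so the rule used here (scores are seeded by the query's direct edges and then repeatedly spread along row-normalised edge weights, damped by a factor alpha) is an assumption, as are the function name rank_by_diffusion, the parameters alpha and n_iter, and the toy weight matrix.

```python
import numpy as np

def rank_by_diffusion(weights, query, alpha=0.9, n_iter=100):
    """Rank nodes against a query by damped diffusion (illustrative sketch).

    weights[i, j] is a non-negative similarity weight on the edge i -> j.
    The update rule below is an assumption, not taken from the abstract.
    """
    W = np.asarray(weights, dtype=float)
    # Row-normalise so the weights leaving each node sum to 1;
    # nodes with no outgoing edges keep an all-zero row.
    row_sums = W.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0.0] = 1.0
    K = W / row_sums

    seed = K[query]              # direct (local) similarity to the query
    scores = np.zeros(len(W))
    for _ in range(n_iter):
        # Each node receives its direct similarity to the query plus a
        # damped share of the scores held by the nodes pointing to it.
        scores = seed + alpha * (K.T @ scores)

    order = np.argsort(-scores, kind="stable")
    ranking = [int(i) for i in order if i != query]
    return ranking, scores

# Toy chain network: 0 - 1 - 2 - 3, with a weak edge between 2 and 3.
# Protein 2 has no direct edge to the query (protein 0).
W = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.1],
    [0.0, 0.0, 0.1, 0.0],
])
ranking, scores = rank_by_diffusion(W, query=0)
print(ranking)
```

In this toy network, protein 2 has no direct edge to the query, so a purely local search would give it no score; the diffusion assigns it a positive score through its shared neighbour, protein 1. With alpha below 1 and row-normalised weights, the iteration converges.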

Results

Benchmarked on a human-curated database of protein structures, the original RankProp algorithm provides significant improvement over local network search algorithms such as PSI-BLAST. Furthermore, we show here that labeled data can be used to learn a network without any need for estimating parameters of the transfer function, and that diffusion on this learned network produces better results than the original RankProp algorithm with a fixed network.

Conclusion

To gain maximal information from a network, labeled and unlabeled data should be used to extract both local and global structure.