Open Access Methodology article

Multiconstrained gene clustering based on generalized projections

Jia Zeng1,2*, Shanfeng Zhu3,4, Alan Wee-Chung Liew5 and Hong Yan6,7

Author Affiliations

1 School of Computer Science and Technology, Soochow University, Suzhou 215006, China

2 Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong

3 School of Computer Science and Technology, Fudan University, Shanghai 200433, China

4 Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai 200433, China

5 School of Information and Communication Technology, Griffith University, Gold Coast Campus, QLD 4222, Queensland, Australia

6 Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong

7 School of Electronic and Information Engineering, University of Sydney, NSW 2006, Australia

BMC Bioinformatics 2010, 11:164  doi:10.1186/1471-2105-11-164

Published: 31 March 2010

Abstract

Background

Gene clustering for annotating gene functions is one of the fundamental problems in bioinformatics. The best clustering solution is often regularized by multiple constraints, such as gene expression data, Gene Ontology (GO) annotations and gene network structures. How to integrate multiple constraints to obtain an optimal clustering solution remains an unsolved problem.

Results

We propose a novel multiconstrained gene clustering (MGC) method within the generalized projection onto convex sets (POCS) framework, which is widely used in image reconstruction. Each constraint is formulated as a corresponding set. The generalized projector iteratively projects the clustering solution onto these sets in order to find a consistent solution in their intersection, which satisfies all constraints. Compared with previous MGC methods, POCS can integrate multiple constraints of different natures without distorting the original constraints. To evaluate the clustering solution, we also propose a new performance measure, Gene Log Likelihood (GLL), which accounts for genes that have more than one function and hence belong to more than one cluster. Comparative experimental results show that our POCS-based gene clustering method outperforms current state-of-the-art MGC methods.
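The abstract describes the generalized projector only at a high level and does not specify the sets used in the paper. As a rough illustration of the underlying idea, the sketch below runs plain cyclic POCS (successive projections onto closed convex sets, which converge to a point in their intersection when that intersection is nonempty) on a soft membership matrix `U` with one row per gene and one column per cluster. The three constraint sets are hypothetical stand-ins chosen because their projections are simple: rows on the probability simplex, a must-link set that ties network-linked genes to identical rows, and lower bounds on memberships standing in for annotation priors. This is not the authors' MGC formulation, and their generalized projector may weight or relax the projections differently.

```python
import numpy as np


def project_rows_to_simplex(U):
    """Euclidean projection of each row of U onto the probability simplex
    (nonnegative entries summing to 1), via the standard sort-based method."""
    n, k = U.shape
    S = -np.sort(-U, axis=1)                  # each row sorted in descending order
    css = np.cumsum(S, axis=1) - 1.0          # cumulative sums minus the target sum 1
    j = np.arange(1, k + 1)
    cond = S - css / j > 0                    # True for j = 1..rho, False afterwards
    rho = k - np.argmax(cond[:, ::-1], axis=1)    # 1-based index of the last True
    theta = css[np.arange(n), rho - 1] / rho
    return np.maximum(U - theta[:, None], 0.0)


def project_must_link(U, groups):
    """Hypothetical constraint: genes in the same (disjoint) group share one
    membership row. Projecting onto this linear subspace replaces each
    group's rows by their mean."""
    V = U.copy()
    for g in groups:
        V[g] = U[g].mean(axis=0)
    return V


def project_annotation(U, annotations):
    """Hypothetical constraint: U[gene, cluster] >= alpha for each annotation.
    Projecting onto these half-spaces clips the entries upward."""
    V = U.copy()
    for gene, cluster, alpha in annotations:
        V[gene, cluster] = max(V[gene, cluster], alpha)
    return V


def pocs(U0, projectors, max_iter=5000, tol=1e-10):
    """Cyclic POCS: apply each projector in turn until the iterate stops changing."""
    U = U0.copy()
    for _ in range(max_iter):
        U_prev = U
        for project in projectors:
            U = project(U)
        if np.max(np.abs(U - U_prev)) < tol:
            break
    return U


# Toy example: 4 genes, 3 clusters, random soft initialization.
rng = np.random.default_rng(0)
U0 = rng.random((4, 3))
groups = [[0, 1], [2, 3]]                  # hypothetical network links
annotations = [(0, 0, 0.6), (3, 2, 0.5)]   # hypothetical annotation priors
projectors = [
    lambda U: project_must_link(U, groups),
    lambda U: project_annotation(U, annotations),
    project_rows_to_simplex,               # last, so rows end exactly on the simplex
]
U = pocs(U0, projectors)
```

Because the must-link projection averages gene 0 with gene 1, the prior placed on gene 0 propagates to gene 1 over the iterations, so the result satisfies all three sets at once rather than any single one. The simplex projection is applied last so that the returned rows are valid soft memberships.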

Conclusions

The POCS-based MGC method can successfully combine multiple constraints of different natures for gene clustering. In addition, the proposed GLL is an effective performance measure for soft clustering solutions.