Open Access Highly Accessed Methodology article

Knowledge-guided gene ranking by coordinative component analysis

Chen Wang1, Jianhua Xuan1*, Huai Li2, Yue Wang1, Ming Zhan2, Eric P Hoffman3 and Robert Clarke4

Author affiliations

1 Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, USA

2 Bioinformatics Unit, Research Resources Branch, National Institute on Aging, NIH, Baltimore, MD, USA

3 Research Center for Genetic Medicine, Children's National Medical Center, Washington, DC, USA

4 Departments of Oncology and Physiology & Biophysics, Georgetown University School of Medicine, Washington, DC, USA

Citation and License

BMC Bioinformatics 2010, 11:162  doi:10.1186/1471-2105-11-162

Published: 30 March 2010

Abstract

Background

In cancer, gene networks and pathways often exhibit dynamic behavior, particularly during carcinogenesis. It is therefore important to prioritize the genes that are strongly associated with the functionality of a network. Traditional statistical methods are often poorly suited to identifying biologically relevant member genes, which has motivated researchers to incorporate biological knowledge into gene ranking methods. However, current integration strategies are often heuristic and fail to fully capture the interplay between biological knowledge and gene expression data.

Results

To improve knowledge-guided gene ranking, we propose a novel method called coordinative component analysis (COCA). COCA explicitly captures those genes within a specific biological context that are likely to be expressed in a coordinative manner. Formulated as an optimization problem that maximizes the coordinative effort, COCA first extracts the coordinative components under partial guidance from knowledge genes and then ranks the genes according to their participation strengths. An embedded bootstrapping procedure improves the statistical robustness of the solutions. COCA was first tested on simulation data and then on published gene expression microarray data, where it showed improved performance compared with traditional statistical methods. Finally, COCA was applied to stem cell data to identify biologically relevant genes in signaling pathways, uncovering novel pathway members that may shed light on pathway deregulation in cancers.
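
The extract-then-rank workflow with bootstrapping described above can be illustrated with a minimal sketch. This is not the COCA optimization itself, which the abstract does not specify. As a stand-in, the sketch assumes the guiding component can be approximated by the leading singular vector of the knowledge-gene expression submatrix, and it assumes participation strength can be approximated by the absolute cosine between each centered gene profile and that component. The function name, its parameters, and the scoring rule are all hypothetical.

```python
import numpy as np


def rank_genes_by_guided_component(expr, knowledge_idx, n_boot=100, seed=0):
    """Rank genes by their average association with a knowledge-guided component.

    expr          : array of shape (n_genes, n_samples), expression values
    knowledge_idx : indices of the genes known to belong to the pathway
    n_boot        : number of bootstrap resamples over the samples (columns)

    Returns (order, scores): gene indices sorted from strongest to weakest,
    and the mean score of every gene in the original gene order.
    NOTE: the SVD step is an assumed stand-in, not the published COCA objective.
    """
    expr = np.asarray(expr, dtype=float)
    rng = np.random.default_rng(seed)
    n_genes, n_samples = expr.shape
    scores = np.zeros(n_genes)

    for _ in range(n_boot):
        # Bootstrap: resample the samples with replacement.
        cols = rng.choice(n_samples, size=n_samples, replace=True)
        x = expr[:, cols]
        # Center each gene profile across the resampled samples.
        x = x - x.mean(axis=1, keepdims=True)

        # Assumed component extraction: leading right singular vector
        # (unit norm) of the knowledge-gene submatrix.
        _, _, vt = np.linalg.svd(x[knowledge_idx], full_matrices=False)
        component = vt[0]

        # Assumed participation strength: absolute cosine between each
        # centered gene profile and the component.
        norms = np.linalg.norm(x, axis=1)
        norms[norms == 0] = 1.0  # guard against constant profiles
        scores += np.abs(x @ component) / norms

    scores /= n_boot
    order = np.argsort(-scores)
    return order, scores
```

Calling `rank_genes_by_guided_component(expr, knowledge_idx)` on a genes-by-samples matrix returns the ranking together with the averaged scores. Averaging over bootstrap resamples is one simple way to reduce sensitivity to individual samples; the published method may aggregate its bootstrap results differently.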

Conclusion

We have developed a new integrative strategy that combines biological knowledge and microarray data for gene ranking. The method uses knowledge genes as guidance to first extract coordinative components and then ranks the genes according to their contribution to a network or pathway. The experimental results show that such a knowledge-guided strategy can provide context-specific gene ranking with improved performance in pathway member identification.