Detecting disease associated modules and prioritizing active genes based on high throughput data
* Corresponding authors: Xiang-Sun Zhang zxs@amt.ac.cn - Luonan Chen chen@eic.osaka-sandai.ac.jp
- Equal contributors
1 Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, PR China
2 Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, PR China
3 Key Laboratory of Systems Biology, SIBS-Novo Nordisk Translational Research Centre for Pre-diabetes, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 320 Yue-Yang Road, Shanghai, 200031, PR China
4 Department of Electrical Engineering and Electronics, Osaka Sangyo University, Osaka, 574-8530, Japan
BMC Bioinformatics 2010, 11:26 doi:10.1186/1471-2105-11-26
Published: 13 January 2010
Abstract
Background
The accumulation of high-throughput data has greatly promoted computational investigation of gene function in the context of complex biological systems. However, a biological function is rarely controlled by a single gene; genes act cooperatively to carry out biological processes. In the study of human diseases, identifying disease-associated pathways and modules, rather than individual disease-related genes, has therefore become an essential problem in systems biology.
Results
In this paper, we propose a novel method to detect disease-related gene modules or dysfunctional pathways based on global characteristics of the interactome coupled with gene expression data. Specifically, we exploit the interactions between genes to define an active score function for each gene based on the kernel trick, which can represent nonlinear effects of gene cooperativity. Modules or pathways are then inferred in a global and integrative manner from the active scores evaluated by support vector regression. The efficiency and robustness of the proposed method are comprehensively validated on both simulated and real data and compared with existing methods.
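The abstract does not give the exact form of the active score function. As a purely hypothetical illustration of how a kernel can encode nonlinear relationships between interacting genes, the sketch below scores each gene by summing Gaussian (RBF) kernel similarities between its expression profile and those of its interaction partners, then ranks genes by that score. The function names, the RBF kernel, the bandwidth `gamma`, and the toy network and expression values are all assumptions; this is not the authors' score function, and the support vector regression step is omitted.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def neighbour_kernel_scores(expression, edges, gamma=0.5):
    """Hypothetical 'active score': for each gene, sum the RBF similarities
    between its expression profile and those of its network neighbours."""
    neighbours = {gene: set() for gene in expression}
    for u, v in edges:  # interactions are treated as undirected
        neighbours[u].add(v)
        neighbours[v].add(u)
    return {
        gene: sum(rbf_kernel(expression[gene], expression[n], gamma)
                  for n in neighbours[gene])
        for gene in expression
    }

# Toy data (invented): expression profiles over three samples, and a
# small path-shaped interaction network A-B-C-D.
expression = {
    "A": [1.0, 2.0, 3.0],
    "B": [1.0, 2.0, 3.0],
    "C": [1.0, 2.0, 3.5],
    "D": [5.0, 0.0, 1.0],
}
edges = [("A", "B"), ("B", "C"), ("C", "D")]

scores = neighbour_kernel_scores(expression, edges)
ranking = sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    print(ranking)  # → ['B', 'A', 'C', 'D']
```

Gene B ranks highest because both of its partners have nearly identical profiles, while D, whose only partner has a very different profile, scores close to zero. Because the kernel is nonlinear, small expression differences barely reduce a pair's contribution while large ones suppress it almost entirely.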
Conclusions
By applying the proposed method to two cancer-related problems, breast cancer and prostate cancer, we successfully identified active modules or dysfunctional pathways related to these two types of cancer, supported by evidence in the literature. We show that this network-based method is highly efficient and can be applied to large-scale problems, especially the extraction of human disease-related modules or pathways. Moreover, the method can also be used to prioritize genes associated with a specific phenotype or disease.