This article is part of the supplement: Proceedings of the Great Lakes Bioinformatics Conference 2012
Identifying stage-specific protein subnetworks for colorectal cancer
1 Department of Electrical Engineering & Computer Science, Case Western Reserve University, Cleveland, OH, USA
2 School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
3 Case Center for Proteomics & Bioinformatics, Case Western Reserve University, Cleveland, OH, USA
4 Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
5 Case Comprehensive Cancer Center, Case Western Reserve University, Cleveland, OH, USA
6 Department of Genetics, Case Western Reserve University, Cleveland, OH, USA
BMC Proceedings 2012, 6(Suppl 7):S1 doi:10.1186/1753-6561-6-S7-S1
Published: 13 November 2012
In recent years, many algorithms have been developed for network-based analysis of differential gene expression in complex diseases. These algorithms use protein-protein interaction (PPI) networks as an integrative framework and identify subnetworks that are coordinately dysregulated in the phenotype of interest.
While such dysregulated subnetworks have demonstrated significant improvement over individual gene markers for classifying phenotypes, the current state of the art in dysregulated subnetwork discovery is almost exclusively limited to binary phenotype classes. However, many clinical applications require identification of molecular markers for multiple classes.
We consider the problem of discovering groups of genes whose expression signatures can discriminate multiple phenotype classes. We formulate this problem in two alternative ways: (i) an all-vs-all approach that aims to discover subnetworks distinguishing all classes, and (ii) a one-vs-all approach that aims to discover subnetworks distinguishing each class from the rest of the classes. For the one-vs-all formulation, we develop a set-cover based algorithm that aims to identify groups of genes such that at least one gene in the group exhibits differential expression in the target class.
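The set-cover idea behind the one-vs-all formulation can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes that differential expression has already been reduced to a binary relation (a hypothetical `covers` mapping from each gene to the target-class samples in which that gene is differentially expressed) and applies the standard greedy set-cover heuristic; the gene and sample names are invented for illustration, and the PPI network constraint is omitted.

```python
def greedy_set_cover(universe, covers):
    """Greedily pick genes until every sample in `universe` is covered.

    `covers` maps a gene name to the set of target-class samples in which
    that gene is (assumed to be) differentially expressed.
    Returns the chosen genes and any samples that no gene could cover.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the gene covering the most still-uncovered samples;
        # iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(covers), key=lambda g: len(covers[g] & uncovered))
        gained = covers[best] & uncovered
        if not gained:
            break  # the remaining samples are not covered by any gene
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered


# Toy, made-up data: four samples of the target class and four candidate genes.
target_samples = {"s1", "s2", "s3", "s4"}
covers = {
    "geneA": {"s1", "s2"},
    "geneB": {"s2", "s3"},
    "geneC": {"s3", "s4"},
    "geneD": {"s4"},
}

genes, left = greedy_set_cover(target_samples, covers)
print(genes, left)
```

In this toy case the selected group is such that every target-class sample has at least one gene in the group that is differentially expressed in it. A network-guided variant would additionally restrict the candidates at each step to genes adjacent, in the PPI network, to those already chosen.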
We test the proposed algorithms in the context of predicting stages of colorectal cancer. Our results show that the set-cover based algorithm identifying "stage-specific" subnetworks outperforms the all-vs-all approaches in classification. We also investigate the merits of utilizing PPI networks in the search for multiple markers, and show that, with appropriate parameter settings, network-guided search improves performance. Furthermore, we show that assessing statistical significance when selecting features greatly improves classification performance.