This article is part of the supplement: Selected articles from the International Conference on Intelligent Biology and Medicine (ICIBM 2013): Genomics

Open Access Research

Molecular pathway identification using biological network-regularized logistic models

Wen Zhang (1,2), Ying-wooi Wan (1,2,3), Genevera I Allen (2,4), Kaifang Pang (1,2), Matthew L Anderson (3,6) and Zhandong Liu (1,2,5,6)*

Author Affiliations

1 Department of Pediatrics-Neurology, Baylor College of Medicine, Houston, TX, USA

2 Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston, TX, USA

3 Department of Obstetrics and Gynecology, Baylor College of Medicine, Houston, TX, USA

4 Department of Statistics and Electrical Engineering, Rice University, Houston, TX, USA

5 Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX, USA

6 Dan L. Duncan Cancer Center, Baylor College of Medicine, Houston, TX, USA

BMC Genomics 2013, 14(Suppl 8):S7. doi:10.1186/1471-2164-14-S8-S7

Published: 9 December 2013

Abstract

Background

Selecting genes and pathways indicative of disease is a central problem in computational biology. The problem is especially challenging for multi-dimensional genomic data. Several tools, such as L1-norm-based regularization (the lasso) and its extensions, the elastic net and the fused lasso, have been introduced to address this challenge. However, these approaches tend to ignore the large body of a priori biological network information curated in the literature.
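For reference, the penalties named above can be written as functions of the coefficient vector beta. The sketch below uses their standard textbook forms; the `alpha` mixing parameterization of the elastic net follows the glmnet convention, which is an assumption here, and the function names are hypothetical. Note that none of these functions takes a gene network as input.

```python
import numpy as np

# Standard penalty forms applied to a coefficient vector `beta`.
# None of these functions receives a gene network as input.

def lasso_penalty(beta, lam):
    """Lasso (L1-norm) penalty: lam * sum_j |beta_j|."""
    return lam * np.sum(np.abs(beta))

def elastic_net_penalty(beta, lam, alpha):
    """Elastic net, glmnet-style parameterization (assumed here):
    lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2)."""
    l1 = np.sum(np.abs(beta))
    l2_sq = np.sum(beta ** 2)
    return lam * (alpha * l1 + (1 - alpha) / 2 * l2_sq)

def fused_lasso_penalty(beta, lam1, lam2):
    """Fused lasso: L1 sparsity plus L1 differences between consecutive
    coefficients. The only structure it uses is the fixed feature order."""
    return lam1 * np.sum(np.abs(beta)) + lam2 * np.sum(np.abs(np.diff(beta)))

beta = np.array([1.0, -2.0, 0.0, 0.5])
print(lasso_penalty(beta, lam=0.1))
print(elastic_net_penalty(beta, lam=0.1, alpha=0.5))
print(fused_lasso_penalty(beta, lam1=0.1, lam2=0.1))
```

Because the lasso and elastic net penalties are unchanged when the genes are reordered, they cannot express that two interacting genes should receive similar weights; the fused lasso can, but only along a single linear ordering of the features.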

Results

We propose using graph Laplacian-regularized logistic regression to integrate biological networks into disease classification and pathway association problems. In simulation studies, the proposed algorithm outperforms both the elastic net and the lasso. Its utility is further validated by its ability to reliably differentiate breast cancer subtypes in a large breast cancer dataset recently generated by The Cancer Genome Atlas (TCGA) consortium. Many of the protein-protein interaction modules identified by our approach are also supported by evidence published in the literature. Source code for the proposed algorithm is freely available at http://www.github.com/zhandong/Logit-Lapnet.
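The abstract does not spell out the objective, so the following is only a minimal sketch, assuming one common form of network-regularized logistic regression: the mean logistic loss plus an L1 term lam1 * ||beta||_1 and a graph Laplacian term lam2 * beta^T L beta, where L is the normalized Laplacian of the gene network, fitted by proximal gradient descent (ISTA). The function names (`normalized_laplacian`, `fit_laplacian_logistic`), the parameters `lam1` and `lam2`, the solver, and the toy data are hypothetical choices for illustration; the authors' actual formulation and code are in the Logit-Lapnet repository and may differ.

```python
import numpy as np


def normalized_laplacian(adj):
    """Normalized graph Laplacian L = I - D^{-1/2} A D^{-1/2}.

    Genes with no edges (degree 0) get an all-zero row and column, so the
    network term leaves them unpenalized.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    connected = deg > 0
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[connected] = 1.0 / np.sqrt(deg[connected])
    return np.diag(connected.astype(float)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]


def soft_threshold(z, t):
    """Proximal operator of t * ||z||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def fit_laplacian_logistic(X, y, adj, lam1=0.01, lam2=0.1, n_iter=2000):
    """Proximal gradient (ISTA) for
        mean logistic loss + lam1 * ||beta||_1 + lam2 * beta^T L beta
    with an unpenalized intercept; y is coded 0/1. (Hypothetical helper.)
    """
    n, p = X.shape
    L = normalized_laplacian(adj)
    # Fixed step 1/lip, where lip bounds the gradient's Lipschitz constant:
    # the logistic Hessian is at most Xa^T Xa / (4n) (Xa includes the intercept),
    # and the network term contributes 2 * lam2 * ||L||_2.
    Xa = np.hstack([np.ones((n, 1)), X])
    lip = np.linalg.norm(Xa, 2) ** 2 / (4 * n) + 2 * lam2 * np.linalg.norm(L, 2)
    step = 1.0 / lip
    b0, beta = 0.0, np.zeros(p)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(b0 + X @ beta)))
        resid = prob - y
        grad_b0 = resid.mean()
        grad_beta = X.T @ resid / n + 2 * lam2 * (L @ beta)
        b0 -= step * grad_b0
        # Gradient step on the smooth part, then soft-thresholding for the L1 term.
        beta = soft_threshold(beta - step * grad_beta, step * lam1)
    return b0, beta


# Toy example (synthetic): genes 0-2 form a connected module that drives the
# class label; genes 3-5 are isolated noise genes.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 6))
true_beta = np.array([1.5, 1.5, 1.5, 0.0, 0.0, 0.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))
adj = np.zeros((6, 6))
adj[:3, :3] = 1.0
np.fill_diagonal(adj, 0.0)

b0, beta = fit_laplacian_logistic(X, y, adj)
print("intercept:", round(b0, 3))
print("coefficients:", np.round(beta, 3))
```

For the normalized Laplacian, beta^T L beta equals the sum over network edges (i, j) of (beta_i / sqrt(d_i) - beta_j / sqrt(d_j))^2, so this term pulls the degree-scaled coefficients of interacting genes toward each other, while the L1 term zeroes out genes with little evidence. In practice lam1 and lam2 would be tuned, for example by cross-validation.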

Conclusion

Logistic regression with graph Laplacian regularization is an effective algorithm for identifying key pathways and modules associated with disease subtypes. As our knowledge of biological regulatory networks rapidly expands, this approach will become more accurate and increasingly useful for mining transcriptomic, epigenomic, and other types of genome-wide association studies.