Identification of genes associated with multiple cancers via integrative analysis
1 School of Public Health, Yale University, New Haven, CT 06520, USA
2 Department of Statistics and Actuarial Science, University of Iowa, Iowa City, IA 52242, USA
3 Department of Therapeutic Radiology, Yale University, New Haven, CT 06520, USA
BMC Genomics 2009, 10:535 doi:10.1186/1471-2164-10-535Published: 17 November 2009
Advancement in gene profiling techniques makes it possible to measure expressions of thousands of genes and identify genes associated with development and progression of cancer. The identified cancer-associated genes can be used for diagnosis, prognosis prediction, and treatment selection. Most existing cancer microarray studies have been focusing on the identification of genes associated with a specific type of cancer. Recent biomedical studies suggest that different cancers may share common susceptibility genes. A comprehensive description of the associations between genes and cancers requires identification of not only multiple genes associated with a specific type of cancer but also genes associated with multiple cancers.
In this article, we propose the Mc.TGD (Multi-cancer Threshold Gradient Descent), an integrative analysis approach capable of analyzing multiple microarray studies on different cancers. The Mc.TGD is the first regularized approach to conduct "two-dimensional" selection of genes with joint effects on cancer development. Simulation studies show that the Mc.TGD can more accurately identify genes associated with multiple cancers than meta analysis based on "one-dimensional" methods. As a byproduct, identification accuracy of genes associated with only one type of cancer may also be improved. We use the Mc.TGD to analyze seven microarray studies investigating development of seven different types of cancers. We identify one gene associated with six types of cancers and four genes associated with five types of cancers. In addition, we also identify 11, 9, 18, and 17 genes associated with 4 to 1 types of cancers, respectively. We evaluate prediction performance using a Leave-One-Out cross validation approach and find that only 4 (out of 570) subjects cannot be properly predicted.
The Mc.TGD can identify a short list of genes associated with one or multiple types of cancers. The identified genes are considerably different from those identified using meta analysis or analysis of marginal effects.