Additive risk survival model with microarray data
* Corresponding author: Shuangge Ma shuangge.ma@yale.edu
1 Department of Epidemiology and Public Health, Yale University, New Haven, CT 06520, USA
2 Department of Statistics and Actuarial Science, University of Iowa, Iowa City, IA 52242, USA
BMC Bioinformatics 2007, 8:192 doi:10.1186/1471-2105-8-192
Published: 8 June 2007
Abstract
Background
Microarray techniques survey gene expression on a global scale. Extensive biomedical studies have been designed to discover subsets of genes that are associated with survival risk in diseases such as lymphoma, and to construct predictive models using the selected genes. In this article, we investigate simultaneous estimation and gene selection with right-censored survival data and high-dimensional gene expression measurements.
Results
We model the survival time using the additive risk model, which provides a useful alternative to the proportional hazards model and is adopted when the absolute effects, rather than the relative effects, of multiple predictors on the hazard function are of interest. A Lasso (least absolute shrinkage and selection operator) type estimate is proposed for simultaneous estimation and gene selection. The tuning parameter is selected using V-fold cross validation. We propose leave-one-out cross validation based methods for evaluating the relative stability of individual genes and the overall prediction significance.
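As an illustration of the idea only, the following minimal sketch fits a Lasso-penalized additive risk model on simulated data. It assumes the Lin and Ying form of the additive risk model with time-fixed covariates, hazard(t | Z) = baseline(t) + beta'Z, for which the unpenalized estimate solves A beta = b. It also assumes the Lagrangian form of the Lasso penalty and a plain coordinate descent solver, which may differ from the computational algorithm used in the paper. The function names (`lin_ying_summaries`, `lasso_additive_risk`), the simulated data, and the penalty values are hypothetical, and tuning by V-fold cross validation is not shown.

```python
import random


def lin_ying_summaries(times, events, Z):
    """Return (A, b) such that the unpenalized estimate solves A beta = b.

    times  : observed times (event or censoring)
    events : 1 if the event was observed, 0 if right-censored
    Z      : list of covariate rows (time-fixed covariates)
    """
    n, p = len(times), len(Z[0])
    A = [[0.0] * p for _ in range(p)]
    b = [0.0] * p
    prev = 0.0
    for t in sorted(set(times)):
        # Subjects still at risk at time t; this set is constant on (prev, t].
        risk = [i for i in range(n) if times[i] >= t]
        zbar = [sum(Z[i][j] for i in risk) / len(risk) for j in range(p)]
        w = (t - prev) / n
        for i in risk:
            c = [Z[i][j] - zbar[j] for j in range(p)]
            for j in range(p):
                for k in range(p):
                    A[j][k] += w * (c[j] * c[k])
        # Observed events at time t contribute centered covariates to b.
        for i in range(n):
            if times[i] == t and events[i] == 1:
                for j in range(p):
                    b[j] += (Z[i][j] - zbar[j]) / n
        prev = t
    return A, b


def soft_threshold(x, t):
    """Shrink x toward zero by t, returning 0.0 when |x| <= t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def lasso_additive_risk(A, b, lam, n_iter=500):
    """Coordinate descent for beta'A beta - 2 b'beta + lam * sum_j |beta_j|."""
    p = len(b)
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if A[j][j] <= 0.0:
                continue  # no information about this coordinate
            r = sum(A[j][k] * beta[k] for k in range(p) if k != j)
            beta[j] = soft_threshold(b[j] - r, lam / 2.0) / A[j][j]
    return beta


# Simulated data (hypothetical): only the first covariate affects the hazard.
rng = random.Random(0)
n, p = 200, 5
true_beta = [1.0, 0.0, 0.0, 0.0, 0.0]
Z = [[rng.random() for _ in range(p)] for _ in range(n)]
times, events = [], []
for z in Z:
    rate = 1.0 + sum(bj * zj for bj, zj in zip(true_beta, z))  # baseline hazard 1.0
    t_event = rng.expovariate(rate)
    t_cens = rng.expovariate(0.5)  # independent right censoring
    times.append(min(t_event, t_cens))
    events.append(1 if t_event <= t_cens else 0)

A, b = lin_ying_summaries(times, events, Z)
for lam in (0.0, 0.02, 10.0):
    beta = lasso_additive_risk(A, b, lam)
    print(lam, [round(v, 3) for v in beta])
```

With `lam = 0` the solver returns the unpenalized estimate, and a sufficiently large `lam` sets every coefficient to exactly zero. Intermediate values leave only some coefficients nonzero, which is the sense in which the penalty performs estimation and gene selection at the same time.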
Conclusion
We analyze the mantle cell lymphoma (MCL) and diffuse large B-cell lymphoma (DLBCL) data using the proposed approach. A small number of the probes represented on the microarrays are identified, most of which have sound biological implications in lymphoma development. The selected probes are relatively stable, and the proposed approach has satisfactory overall prediction power.