A method for analyzing censored survival phenotype with gene expression data
- Equal contributors
1 Department of Epidemiology and Biostatistics, University of Maryland, College Park, MD 20742, USA
2 Department of Biostatistics, Genetics, Carolina Center for Genome Science, University of North Carolina, Chapel Hill, NC 27599, USA
3 Institute of Statistical Science, Academia Sinica, Taipei 115, Taiwan
4 Department of Statistics, University of California, Los Angeles, CA, 90095-1554, USA
BMC Bioinformatics 2008, 9:417 doi:10.1186/1471-2105-9-417
Published: 6 October 2008
Survival time is an important clinical trait in many disease studies. Previous work has shown a relationship between patients' gene expression profiles and survival time. However, because survival times are often censored and gene expression data are high-dimensional, effective and unbiased selection of a gene expression signature for predicting survival probabilities requires further study.
We propose a method for the integrated study of survival time and gene expression. The method can be summarized as a two-step procedure. In the first step, a moderate number of genes is pre-selected using correlation or liquid association (LA); imputation and transformation methods are employed for the correlation/LA calculation. In the second step, the dimension of the predictors is further reduced using modified sliced inverse regression for censored data (censorSIR).
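The two-step structure can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `prescreen_by_correlation` and `sir_directions` are hypothetical, the pre-selection uses plain Pearson correlation with log survival time (no imputation and no LA), and the second step is ordinary sliced inverse regression rather than censorSIR, so every observation is assumed to be uncensored.

```python
import numpy as np


def prescreen_by_correlation(X, y, k):
    """Step 1 (simplified): indices of the k columns of X with the largest
    absolute Pearson correlation with y. No censoring adjustment is made."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = (Xc.T @ yc) / denom
    return np.argsort(-np.abs(corr))[:k]


def sir_directions(X, y, n_slices=10, n_dirs=1):
    """Step 2 (simplified): ordinary SIR, assuming all responses are observed.
    Returns a (p, n_dirs) array of unit-length direction estimates."""
    n, p = X.shape
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)

    # Whiten the predictors with the inverse square root of the covariance.
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ inv_sqrt

    # Slice the sample by the ordered response and average Z within slices.
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)

    # Leading eigenvectors of the slice-mean covariance, mapped back to
    # the original predictor scale.
    _, vecs = np.linalg.eigh(M)
    top = vecs[:, ::-1][:, :n_dirs]
    B = inv_sqrt @ top
    return B / np.linalg.norm(B, axis=0)


# Toy usage on simulated data (all values here are illustrative assumptions).
rng = np.random.default_rng(0)
n, p = 300, 200
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [1.0, -1.0, 1.0]
time = np.exp(X @ beta + 0.5 * rng.standard_normal(n))  # uncensored times

selected = prescreen_by_correlation(X, np.log(time), k=20)
direction = sir_directions(X[:, selected], time, n_slices=10, n_dirs=1)
score = X[:, selected] @ direction[:, 0]  # one-dimensional expression index
```

Because SIR depends on the response only through the slicing order, the second step gives the same result for `time` and `np.log(time)`. Handling censored observations, as the method described above does, would require replacing both steps with their censoring-aware counterparts.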
The new method is tested on both simulated and real data. For the real data application, we used a set of 295 breast cancer patients and found a linear combination of 22 gene expression profiles that is significantly correlated with patients' survival rate.
By appropriately combining feature selection and dimension reduction, we obtain a method for identifying gene expression signatures that is effective for survival prediction.