Open Access Methodology article

A method for analyzing censored survival phenotype with gene expression data

Tongtong Wu1, Wei Sun2, Shinsheng Yuan3, Chun-Houh Chen3 and Ker-Chau Li3,4*

Author Affiliations

1 Department of Epidemiology and Biostatistics, University of Maryland, College Park, MD 20742, USA

2 Department of Biostatistics, Genetics, Carolina Center for Genome Science, University of North Carolina, Chapel Hill, NC 27599, USA

3 Institute of Statistical Science, Academia Sinica, Taipei 115, Taiwan

4 Department of Statistics, University of California, Los Angeles, CA, 90095-1554, USA

BMC Bioinformatics 2008, 9:417  doi:10.1186/1471-2105-9-417

Published: 6 October 2008

Abstract

Background

Survival time is an important clinical trait in many disease studies. Previous work has shown a relationship between patients' gene expression profiles and survival time. However, because survival times are often censored and gene expression data are high-dimensional, effective and unbiased selection of a gene expression signature for predicting survival probabilities requires further study.

Method

We propose a method for the integrated study of survival time and gene expression. The method is a two-step procedure. In the first step, a moderate number of genes is pre-selected using correlation or liquid association (LA); imputation and transformation methods are employed for the correlation/LA calculation. In the second step, the dimension of the predictors is further reduced using the modified sliced inverse regression for censored data (censorSIR).
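The two-step structure can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a fully observed (or already imputed) log survival time as the response, uses only correlation for the pre-selection (LA is omitted), and applies plain sliced inverse regression rather than censorSIR. The function names `preselect_by_correlation` and `sir_directions`, the simulated data, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def preselect_by_correlation(X, y, k):
    """Step 1 (sketch): indices of the k columns of X with largest |Pearson r| to y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = (Xc.T @ yc) / denom
    return np.argsort(-np.abs(corr))[:k]


def sir_directions(X, y, n_slices=5, n_dirs=1):
    """Step 2 (sketch): plain sliced inverse regression, NOT censorSIR.

    Returns a (p, n_dirs) matrix of estimated directions on the original
    scale of X. Censoring is ignored here by assumption.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / n
    # Whiten the predictors with the inverse square root of the covariance.
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ inv_sqrt
    # Slice the observations by the ordered response and average Z per slice.
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvectors of the weighted covariance of the slice means.
    _, v = np.linalg.eigh(M)
    top = v[:, ::-1][:, :n_dirs]
    return inv_sqrt @ top


# Simulated data (hypothetical): 300 patients, 200 genes, 3 informative genes.
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 200))
index = X[:, :3] @ np.array([1.0, -1.0, 1.0])
log_time = index + 0.5 * rng.standard_normal(300)

selected = preselect_by_correlation(X, log_time, k=20)
direction = sir_directions(X[:, selected], log_time, n_slices=5, n_dirs=1)
score = X[:, selected] @ direction[:, 0]
```

Here `score` plays the role of a one-dimensional gene expression signature: a single linear combination of the pre-selected genes. Handling censored observations would require replacing the second step with a censoring-aware slicing scheme, which this sketch does not attempt.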

Results

The new method is tested on both simulated and real data. For the real data application, we used a data set of 295 breast cancer patients and found a linear combination of 22 gene expression profiles that is significantly correlated with patients' survival rate.

Conclusion

By appropriately combining feature selection and dimension reduction, we obtain a method for identifying gene expression signatures that is effective for survival prediction.