This article is part of the supplement: The 2010 International Conference on Bioinformatics and Computational Biology (BIOCOMP 2010): Genomics

Open Access Research article

Maximum predictive power of the microarray-based models for clinical outcomes is limited by correlation between endpoint and gene expression profile

Chen Zhao1, Leming Shi2, Weida Tong2, John D Shaughnessy3, André Oberthuer4, Lajos Pusztai5, Youping Deng6, W Fraser Symmans5 and Tieliu Shi1*

Author affiliations

1 The Center for Bioinformatics and The Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, Shanghai, 200241, China

2 National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR Road, Jefferson, Arkansas 72079, USA

3 Laboratory of Myeloma Genetics, Myeloma Institute for Research and Therapy, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA

4 MD, Children's Hospital, Department of Pediatric Oncology and Hematology, University of Cologne, Kerpener Strasse 62, D-50924 Cologne, Germany

5 Department of Breast Medical Oncology and Department of Pathology, The University of Texas M.D. Anderson Cancer Center, Unit 1354, PO Box 301439, Houston, TX 77230-1439, USA

6 Rush University Cancer Center, Department of Internal Medicine, Rush University Medical Center, Chicago, IL 60612, USA

Citation and License

BMC Genomics 2011, 12(Suppl 5):S3  doi:10.1186/1471-2164-12-S5-S3

Published: 23 December 2011



Microarray data have been used for gene signature selection to predict clinical outcomes. Many studies have attempted to identify factors that affect model performance, with little success. Fine-tuning model parameters and optimizing each step of the modeling process often result in over-fitting without improving performance.


We propose a quantitative measure, termed the consistency degree, to quantify the correlation between a disease endpoint and the gene expression profile. Different endpoints were shown to have different consistency degrees with gene expression profiles. The validity of this measure as an estimate of consistency was tested and found significant, with a p-value less than 2.2e-16 for all of the studied endpoints. Based on the consistency degree score, we propose extending the overall survival milestone outcome for multiple myeloma from 730 days to 1561 days, which is more consistent with the gene expression profile.


For various clinical endpoints, the maximum predictive power of different microarray-based models is limited by the correlation between the endpoint and the gene expression profile of disease samples, as indicated by the consistency degree score. In addition, previously defined clinical outcomes can be reassessed and refined to be more coherent with the related disease gene expression profile. Our findings point to an entirely new direction for assessing microarray-based predictive models and provide important information for gene-signature-based clinical applications.