Research article

Building prognostic models for breast cancer patients using clinical variables and hundreds of gene expression signatures

Cheng Fan (1), Aleix Prat (1, 2), Joel S Parker (1, 2), Yufeng Liu (3, 7), Lisa A Carey (4), Melissa A Troester (5) and Charles M Perou (1, 2, 6, 7)*

Author Affiliations

1 Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, USA

2 Department of Genetics, University of North Carolina, Chapel Hill, USA

3 Department of Statistics & Operations Research, University of North Carolina, Chapel Hill, USA

4 Department of Medicine, Division of Oncology, University of North Carolina, Chapel Hill, USA

5 Department of Epidemiology, University of North Carolina, Chapel Hill, USA

6 Department of Pathology & Laboratory Medicine, University of North Carolina, Chapel Hill, USA

7 Carolina Center for Genome Sciences, University of North Carolina, Chapel Hill, USA


BMC Medical Genomics 2011, 4:3 doi:10.1186/1755-8794-4-3

Published: 9 January 2011



Multiple breast cancer gene expression profiles have been developed that appear to have similar abilities to predict outcome and may outperform clinical-pathological criteria; however, it is not known to what extent seemingly disparate profiles provide additive prognostic information, nor whether prognostic profiles perform equally well across clinically defined breast cancer subtypes. We evaluated whether combining the prognostic power of standard breast cancer clinical variables with a large set of gene expression signatures could improve our ability to predict patient outcomes.


Using clinical-pathological variables and a collection of 323 gene expression "modules", including 115 previously published signatures, we built multivariate Cox proportional hazards models using a dataset of 550 node-negative, systemically untreated breast cancer patients. Models predictive of pathological complete response (pCR) to neoadjuvant chemotherapy were also built using this approach.
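The abstract does not give model-fitting details. As a minimal sketch, assuming only a couple of covariates per model (a hypothetical clinical variable, tumor size, and one hypothetical module score), the pure-Python code below fits a Cox proportional hazards model by maximizing the Breslow partial likelihood with Newton-Raphson, after administratively censoring follow-up at 7 years to mirror the 7-year RFS endpoint. The toy data, the function names (`censor_at`, `cox_partial_likelihood`, `fit_cox`) and the choice of Newton's method are illustrative assumptions, not the authors' pipeline; the selection of variables from the 323 modules is not shown.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def censor_at(times: Sequence[float], events: Sequence[int],
              horizon: float = 7.0) -> Tuple[List[float], List[int]]:
    """Administratively censor follow-up at `horizon` years (e.g. 7-year RFS)."""
    new_t = [min(t, horizon) for t in times]
    # Relapses observed after the horizon become censored observations.
    new_e = [e if t <= horizon else 0 for t, e in zip(times, events)]
    return new_t, new_e


def _solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def cox_partial_likelihood(beta, X, times, events):
    """Breslow log partial likelihood, with its gradient and Hessian in beta."""
    p, n = len(beta), len(times)
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]  # linear predictors
    ll, grad = 0.0, [0.0] * p
    hess = [[0.0] * p for _ in range(p)]
    for i in range(n):
        if not events[i]:
            continue
        # Risk set: subjects still relapse-free and under follow-up at times[i].
        risk = [j for j in range(n) if times[j] >= times[i]]
        w = {j: math.exp(eta[j]) for j in risk}
        s0 = sum(w.values())
        mean = [sum(w[j] * X[j][k] for j in risk) / s0 for k in range(p)]
        ll += eta[i] - math.log(s0)
        for k in range(p):
            grad[k] += X[i][k] - mean[k]
            for l in range(p):
                second = sum(w[j] * X[j][k] * X[j][l] for j in risk) / s0
                # Hessian = minus the weighted covariance of covariates in the risk set.
                hess[k][l] -= second - mean[k] * mean[l]
    return ll, grad, hess


def fit_cox(X, times, events, max_iter=50, tol=1e-10):
    """Maximize the partial likelihood by Newton-Raphson with step halving."""
    beta = [0.0] * len(X[0])
    ll, grad, hess = cox_partial_likelihood(beta, X, times, events)
    for _ in range(max_iter):
        step = _solve(hess, grad)  # H^{-1} g, so beta - step is the Newton update
        scale = 1.0
        while True:
            cand = [b - scale * s for b, s in zip(beta, step)]
            c_ll, c_grad, c_hess = cox_partial_likelihood(cand, X, times, events)
            if c_ll >= ll - 1e-12 or scale < 1e-8:
                break
            scale /= 2.0  # halve the step if the likelihood did not improve
        beta, ll, grad, hess = cand, c_ll, c_grad, c_hess
        if max(abs(scale * s) for s in step) < tol:
            break
    return beta, ll


# Hypothetical toy cohort; columns are [tumor size (cm), module score].
TIMES = [1.0, 2.5, 3.0, 4.0, 5.5, 6.0, 8.0, 9.0, 10.0, 12.0]
EVENTS = [1, 1, 0, 1, 1, 0, 1, 0, 1, 0]
X = [[2.0, 0.0], [3.0, 1.0], [1.0, -1.0], [2.5, 0.5], [1.5, 0.8],
     [3.0, -1.0], [1.0, 1.0], [1.2, -0.5], [2.2, -0.8], [2.8, 0.3]]

T7, E7 = censor_at(TIMES, EVENTS, horizon=7.0)
beta_hat, loglik = fit_cox(X, T7, E7)
print("log hazard ratios:", beta_hat, "log partial likelihood:", loglik)
```

Each fitted coefficient is a log hazard ratio. In this sketch a "combined" clinical-plus-genomic model simply places both kinds of covariates side by side as columns of `X`; screening hundreds of module scores would additionally need a variable-selection step, which is omitted here.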


We identified statistically significant prognostic models for relapse-free survival (RFS) at 7 years for the entire population and for the subgroups of patients with ER-positive or Luminal tumors. Furthermore, combined models that included both clinical and genomic parameters improved prognostication compared with models containing either clinical or genomic variables alone. Finally, we were able to build statistically significant combined models for predicting pCR in the entire population.
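The abstract does not state how the improvement of combined models over clinical-only or genomic-only models was quantified. One widely used measure is Harrell's concordance index (C-index): among comparable patient pairs, the fraction in which the patient with the higher predicted risk relapsed first. The sketch below computes it so that a clinical-only risk score and a combined score can be compared on the same patients; the function name and the toy risk scores are hypothetical, and the C-index is not necessarily the metric the authors used.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Fraction of comparable pairs whose predicted risks are correctly ordered.

    A pair (i, j) is comparable when subject i had an observed relapse strictly
    before subject j's follow-up time. Higher `risk` should mean earlier relapse.
    Tied predicted risks count as half-concordant; pairs with tied follow-up
    times are skipped in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up data and risk scores from two competing models.
TIMES = [1, 2, 3, 4, 5, 6]
EVENTS = [1, 1, 0, 1, 0, 0]
CLINICAL_RISK = [0.5, 0.2, 0.4, 0.3, 0.1, 0.6]
COMBINED_RISK = [0.9, 0.7, 0.4, 0.5, 0.1, 0.3]

print("clinical-only C-index:", harrell_c_index(TIMES, EVENTS, CLINICAL_RISK))
print("combined C-index:", harrell_c_index(TIMES, EVENTS, COMBINED_RISK))
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so a combined model that ranks patients better than the clinical-only model shows a higher value on the same cohort.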


Integrating gene expression signatures with clinical-pathological factors improves on either variable type alone. Highly prognostic models could be created using all patients and for the subset of patients with lymph node-negative, ER-positive breast cancers. Variables beyond gene expression and clinical-pathological factors, such as gene mutation status or DNA copy number changes, will be needed to build robust prognostic models for ER-negative breast cancer patients. This combined clinical and genomic modeling approach can also be used to build predictors of therapy responsiveness and could ultimately be applied to other tumor types.