Open Access Research article

Consistent metagenes from cancer expression profiles yield agent specific predictors of chemotherapy response

Qiyuan Li1,2, Aron C Eklund1, Nicolai J Birkbak1, Christine Desmedt3, Benjamin Haibe-Kains4, Christos Sotiriou3, W Fraser Symmans5, Lajos Pusztai6, Søren Brunak1, Andrea L Richardson7* and Zoltan Szallasi1,8*

Author Affiliations

1 Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, 2800 Lyngby, Denmark

2 Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02115, USA

3 Medical Oncology Department, Jules Bordet Institute, Brussels, 1000, Belgium

4 Department of Biostatistics, Dana-Farber Cancer Institute, Boston, MA 02115, USA

5 Department of Pathology, University of Texas M.D. Anderson Cancer Center, Houston, TX 77030, USA

6 Department of Breast Medical Oncology, University of Texas M.D. Anderson Cancer Center, Houston, TX 77030, USA

7 Department of Pathology, Brigham and Women's Hospital, Boston, MA 02115, USA

8 Children's Hospital Informatics Program at the Harvard-MIT Division of Health Sciences and Technology (CHIP@HST), Harvard Medical School, Boston, MA 02115, USA


BMC Bioinformatics 2011, 12:310  doi:10.1186/1471-2105-12-310

Published: 28 July 2011



Genome-scale expression profiling of human tumor samples is likely to yield improved cancer treatment decisions. However, identifying clinically predictive or prognostic classifiers can be challenging when a large number of genes are measured in a small number of tumors.


We describe an unsupervised method to extract robust, consistent metagenes from multiple analogous data sets. We applied this method to expression profiles from five "double negative breast cancer" (DNBC; tumors not expressing ESR1 or HER2) cohorts and derived four metagenes. We assessed these metagenes in four similar but independent cohorts and found strong associations between three of the metagenes and agent-specific response to neoadjuvant therapy. Furthermore, we applied the method to ovarian and early-stage lung cancer, two tumor types that lack reliable predictors of outcome, and found that the metagenes yield predictors of survival for both.
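The abstract does not spell out the extraction procedure, so the following is only an illustrative sketch of one way "consistent" metagenes could be derived from several data sets, not the authors' algorithm. It assumes that each data set is a genes x samples matrix with identical gene ordering, that consistency means a gene pair is correlated above a cutoff in every data set, and that a metagene is a connected group of such genes. The function names, the `threshold` of 0.6, and `min_size` are hypothetical choices made for this example.

```python
import numpy as np

def consistent_metagenes(datasets, threshold=0.6, min_size=3):
    """Group genes whose pairwise correlation is high in every data set.

    datasets: list of (genes x samples) arrays sharing the same gene order.
    Returns a list of gene-index lists (hypothetical "metagenes").
    """
    # Gene-gene Pearson correlation computed separately within each data set.
    corrs = [np.corrcoef(d) for d in datasets]
    # Assumption: a gene pair is "consistent" only if its correlation
    # reaches the threshold in all data sets (element-wise minimum).
    consistent = np.min(corrs, axis=0) >= threshold
    n_genes = consistent.shape[0]
    visited = np.zeros(n_genes, dtype=bool)
    groups = []
    for seed in range(n_genes):
        if visited[seed]:
            continue
        # Depth-first search over the thresholded correlation graph.
        stack, members = [seed], []
        visited[seed] = True
        while stack:
            gene = stack.pop()
            members.append(int(gene))
            for neighbor in np.flatnonzero(consistent[gene]):
                if not visited[neighbor]:
                    visited[neighbor] = True
                    stack.append(neighbor)
        groups.append(sorted(members))
    # Discard singletons and very small groups.
    return [g for g in groups if len(g) >= min_size]

def metagene_scores(expr, genes):
    """Per-sample score: mean of the z-scored expression of member genes."""
    sub = expr[genes]
    z = (sub - sub.mean(axis=1, keepdims=True)) / sub.std(axis=1, keepdims=True)
    return z.mean(axis=0)
```

Requiring agreement across every data set is what filters out structure that appears in only one cohort. The resulting per-sample scores could then be tested for association with treatment response in independent cohorts, as the abstract describes.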


These results suggest that using multiple data sets to derive potential biomarkers can filter out data set-specific noise and can increase the efficiency of identifying clinically accurate biomarkers.