Consistent metagenes from cancer expression profiles yield agent specific predictors of chemotherapy response
- Equal contributors
1 Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, 2800 Lyngby, Denmark
2 Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02115, USA
3 Medical Oncology Department, Jules Bordet Institute, Brussels, 1000, Belgium
4 Department of Biostatistics, Dana-Farber Cancer Institute, Boston, MA 02115, USA
5 Department of Pathology, University of Texas M.D. Anderson Cancer Center, Houston, TX 77030, USA
6 Department of Breast Medical Oncology, University of Texas M.D. Anderson Cancer Center, Houston, TX 77030, USA
7 Department of Pathology, Brigham and Women's Hospital, Boston, MA 02115, USA
8 Children's Hospital Informatics Program at the Harvard-MIT Division of Health Sciences and Technology (CHIP@HST), Harvard Medical School, Boston, MA 02115, USA
BMC Bioinformatics 2011, 12:310 doi:10.1186/1471-2105-12-310
Published: 28 July 2011
Genome-scale expression profiling of human tumor samples is likely to yield improved cancer treatment decisions. However, identifying clinically predictive or prognostic classifiers can be challenging when a large number of genes is measured in a small number of tumors.
We describe an unsupervised method to extract robust, consistent metagenes from multiple analogous data sets. We applied this method to expression profiles from five "double negative breast cancer" (DNBC; not expressing ESR1 or HER2) cohorts and derived four metagenes. We assessed these metagenes in four similar but independent cohorts and found strong associations between three of the metagenes and agent-specific response to neoadjuvant therapy. Furthermore, we applied the method to ovarian cancer and early-stage lung cancer, two tumor types that lack reliable predictors of outcome, and found that the metagenes yield predictors of survival for both.
These results suggest that using multiple data sets to derive potential biomarkers can filter out data set-specific noise and can increase the efficiency of identifying clinically accurate biomarkers.
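The general idea of keeping only structure that recurs across data sets can be illustrated with a small sketch. This is not the procedure used in the paper, whose details are not given here. It is a minimal illustration that assumes a gene pair counts as "consistent" when its Pearson correlation clears a threshold in every data set, and that a metagene is a connected group of such genes scored by its mean expression. The function names, the threshold of 0.5, and the simulated data are all hypothetical.

```python
import numpy as np

def consistent_metagenes(datasets, threshold=0.5, min_size=2):
    """Group genes whose co-expression recurs in every data set.

    datasets: list of arrays shaped (genes, samples), same gene order in each.
    Illustrative assumption only, not the paper's algorithm.
    """
    # Gene-gene Pearson correlation, computed within each data set separately.
    corrs = [np.corrcoef(d) for d in datasets]
    # Keep a gene pair only if it is correlated in *all* data sets.
    consistent = np.minimum.reduce(corrs) >= threshold
    n_genes = consistent.shape[0]
    seen = np.zeros(n_genes, dtype=bool)
    metagenes = []
    # Connected components of the "consistent" graph, by depth-first search.
    for start in range(n_genes):
        if seen[start]:
            continue
        seen[start] = True
        stack, members = [start], []
        while stack:
            g = stack.pop()
            members.append(g)
            for nb in np.flatnonzero(consistent[g] & ~seen):
                seen[nb] = True
                stack.append(nb)
        if len(members) >= min_size:
            metagenes.append(sorted(int(m) for m in members))
    return metagenes

def metagene_scores(expr, metagenes):
    """Per-sample score of each metagene: mean expression of its member genes."""
    return np.vstack([expr[m].mean(axis=0) for m in metagenes])

def simulate(rng, n_samples, pair_correlated):
    """Hypothetical toy data: 6 genes x n_samples."""
    factor = rng.normal(size=n_samples)
    block = factor + 0.1 * rng.normal(size=(3, n_samples))   # genes 0-2: shared signal
    if pair_correlated:                                       # genes 3-4: data set-specific
        pair = rng.normal(size=n_samples) + 0.1 * rng.normal(size=(2, n_samples))
    else:
        pair = rng.normal(size=(2, n_samples))
    noise = rng.normal(size=(1, n_samples))                   # gene 5: pure noise
    return np.vstack([block, pair, noise])

rng = np.random.default_rng(0)
cohort_a = simulate(rng, 200, pair_correlated=True)
cohort_b = simulate(rng, 200, pair_correlated=False)
metagenes = consistent_metagenes([cohort_a, cohort_b])
scores = metagene_scores(cohort_b, metagenes)
```

In this toy setup, genes 3 and 4 are co-expressed in only one simulated cohort, so requiring agreement across both cohorts excludes them. That is the sense in which deriving gene groups from several data sets can filter out data set-specific signal.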