The molecular portraits of breast tumors are conserved across microarray platforms
1 Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC 27599, USA
2 Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA
3 Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC 27599, USA
4 Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, USA
5 Department of Pathology and Laboratory Medicine, University of North Carolina, Chapel Hill, NC 27599, USA
6 Department of Medicine, University of North Carolina, Chapel Hill, NC 27599, USA
7 Constella Health Sciences, 2605 Meridian Parkway, Durham, NC 27713, USA
8 Section of Hematology/Oncology, Department of Medicine, Committees on Genetics and Cancer Biology, University of Chicago, 5841 South Maryland Avenue, Chicago, IL 60637-1463, USA
9 Department of Pathology, Thomas Jefferson University, 132 South 10th Street Philadelphia, PA 19107, USA
10 The ARUP Institute for Clinical and Experimental Pathology, 500 Chipeta Way, Salt Lake City, Utah 84108, USA
11 Department of Surgery, University of Utah School of Medicine, 30 N 1900 E, Salt Lake City, Utah 84132, USA
12 Department of Pathology, University of Utah School of Medicine, 30 N 1900 E, Salt Lake City, Utah 84132, USA
13 Department of Medicine, Division of Oncology, Washington University School of Medicine and Siteman Cancer Center, St Louis, Missouri, USA
BMC Genomics 2006, 7:96 doi:10.1186/1471-2164-7-96
Published: 27 April 2006
Validation of a novel gene expression signature in independent data sets is a critical step in the development of a clinically useful test for cancer patient risk-stratification. However, validation is often unconvincing because the test set is typically small. To overcome this problem, we used publicly available breast cancer gene expression data sets and a novel approach to data fusion to validate a new breast tumor intrinsic list.
A 105-tumor training set containing 26 sample pairs was used to derive a new breast tumor intrinsic gene list. This intrinsic list contained 1300 genes and a proliferation signature that was not present in previous breast intrinsic gene sets. We tested this list as a survival predictor on a data set of 311 tumors compiled from three independent microarray studies that were fused into a single data set using Distance Weighted Discrimination. When the new intrinsic gene set was used to hierarchically cluster this combined test set, tumors were grouped into the LumA, LumB, Basal-like, HER2+/ER-, and Normal Breast-like tumor subtypes that we had demonstrated in previous data sets. These subtypes were associated with significant differences in Relapse-Free and Overall Survival. Multivariate Cox analysis of the combined test set showed that the intrinsic subtype classifications added significant prognostic information that was independent of standard clinical predictors. From the combined test set, we developed an objective and unchanging classifier based upon five intrinsic subtype mean expression profiles (i.e. centroids), which is designed for single sample predictions (SSP). The SSP approach was applied to two additional independent data sets and consistently predicted survival in both systemically treated and untreated patient groups.
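The centroid-based single sample predictor described above can be illustrated with a minimal sketch. It assumes that each subtype centroid is the mean expression profile of the training tumors assigned to that subtype, and that a new sample is assigned to the centroid with which it correlates most strongly. The Pearson correlation, the function names `build_centroids` and `predict_single_sample`, and the genes-by-samples array layout are illustrative assumptions, not details given in this abstract.

```python
import numpy as np


def build_centroids(expr, labels):
    """Compute one mean expression profile (centroid) per subtype.

    expr   : 2-D array, rows = genes, columns = training samples (assumed layout)
    labels : sequence of subtype names, one per column of expr
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    centroids = {}
    for subtype in np.unique(labels):
        # Mean over all training samples carrying this subtype label
        centroids[str(subtype)] = expr[:, labels == subtype].mean(axis=1)
    return centroids


def predict_single_sample(sample, centroids):
    """Assign one sample to the subtype whose centroid it correlates with best.

    The similarity measure (Pearson correlation) is an assumption made for
    this sketch; other correlation or distance measures could be substituted.
    """
    sample = np.asarray(sample, dtype=float)
    scores = {
        subtype: float(np.corrcoef(sample, centroid)[0, 1])
        for subtype, centroid in centroids.items()
    }
    best = max(scores, key=scores.get)
    return best, scores
```

Because the centroids are fixed once they have been computed, each new tumor can be classified on its own, without re-clustering the whole cohort. That is the sense in which such a classifier is "unchanging".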
This study validates the "breast tumor intrinsic" subtype classification as an objective means of tumor classification that should be translated into a clinical assay for further retrospective and prospective validation. In addition, our method of combining existing data sets can be used to robustly validate the potential clinical value of any new gene expression profile.