Open Access Highly Accessed Research article

Discovery and validation of breast cancer subtypes

Amy V Kapp (1*), Stefanie S Jeffrey (2), Anita Langerød (3), Anne-Lise Børresen-Dale (3,4), Wonshik Han (5), Dong-Young Noh (5), Ida RK Bukholm (6,7), Monica Nicolau (2), Patrick O Brown (8) and Robert Tibshirani (1,9)

Author Affiliations

1 Department of Statistics, Stanford University, Stanford, CA, USA

2 Department of Surgery, Stanford University School of Medicine, Stanford, CA, USA

3 Department of Genetics, Institute for Cancer Research, Rikshospitalet-Radiumhospitalet Medical Center, Oslo, Norway

4 Medical Faculty, University of Oslo, Oslo, Norway

5 Department of Surgery, Seoul National University College of Medicine, Seoul, Korea

6 Department of Surgery, Akershus University Hospital, Nordbyhagen, Norway

7 University of Oslo, Oslo, Norway

8 Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA

9 Department of Health Research and Policy, Stanford University School of Medicine, Stanford, CA, USA

BMC Genomics 2006, 7:231  doi:10.1186/1471-2164-7-231

Published: 11 September 2006



Previous studies demonstrated that breast cancer tumor tissue samples can be classified into different subtypes based on DNA microarray profiles. The most recent study presented evidence for the existence of five subtypes: normal breast-like, basal, luminal A, luminal B, and ERBB2+.


Based on the analysis of 599 microarrays (five separate cDNA microarray datasets) using a novel approach, we present evidence that the most consistently identifiable subtypes of breast cancer tumor tissue microarrays are ESR1+/ERBB2-, ESR1-/ERBB2-, and ERBB2+ (collectively called the ESR1/ERBB2 subtypes). We validate all three subtypes statistically and show that the subtype to which a sample belongs is a significant predictor of overall survival and distant-metastasis-free probability.


As a consequence of the statistical validation procedure, we have a set of centroids that can be applied to any microarray (indexed by UniGene Cluster ID) to classify it into one of the ESR1/ERBB2 subtypes. Moreover, the method used to define the ESR1/ERBB2 subtypes is not specific to breast cancer: it can be used to identify subtypes in any disease for which there are at least two independent microarray datasets of disease samples.
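
To make the idea of applying centroids to a new microarray concrete, the following is a minimal sketch of a nearest-centroid classifier in Python. It is not the authors' implementation. The use of Pearson correlation as the similarity measure, the function names `pearson` and `classify_sample`, the placeholder UniGene-style IDs, and all expression values are assumptions made purely for illustration; the abstract does not specify them.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def classify_sample(sample, centroids):
    """Assign a sample to the centroid it correlates with most strongly.

    `sample` and each centroid map a gene identifier to an expression value.
    Only identifiers present in both the sample and the centroid are compared.
    Using correlation as the similarity measure is an assumption of this sketch.
    """
    best_label, best_score = None, -math.inf
    for label, centroid in centroids.items():
        shared = sorted(set(sample) & set(centroid))
        if len(shared) < 2:
            continue  # too few shared genes to compute a correlation
        score = pearson([sample[g] for g in shared],
                        [centroid[g] for g in shared])
        if score > best_score:
            best_label, best_score = label, score
    if best_label is None:
        raise ValueError("sample shares too few genes with every centroid")
    return best_label, best_score


# Hypothetical centroids: the IDs and values are invented placeholders,
# not the centroids published with the paper.
centroids = {
    "ESR1+/ERBB2-": {"Hs.0001": 1.5, "Hs.0002": -0.5, "Hs.0003": 0.2},
    "ESR1-/ERBB2-": {"Hs.0001": -1.5, "Hs.0002": -0.5, "Hs.0003": 0.3},
    "ERBB2+": {"Hs.0001": -0.3, "Hs.0002": 2.0, "Hs.0003": -0.1},
}

# A hypothetical sample; "Hs.0004" is absent from the centroids and is ignored.
sample = {"Hs.0001": 1.2, "Hs.0002": -0.4, "Hs.0003": 0.0, "Hs.0004": 0.9}

label, score = classify_sample(sample, centroids)
print(label)
```

Keying both the sample and the centroids by cluster identifier lets arrays from different platforms be compared on whatever genes they share, which is the practical benefit of indexing by UniGene Cluster ID.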