Open Access Research article

Overrepresentation of transcription factor families in the genesets underlying breast cancer subtypes

Himanshu Joshi1,2, Silje H Nord3, Arnoldo Frigessi4, Anne-Lise Børresen-Dale2,3 and Vessela N Kristensen1,2,3*

Author affiliations

1 Department of Clinical Molecular Biology and Laboratory Sciences (EpiGen), Division of Medicine, Akershus University Hospital, Lorenskog, Norway

2 Institute for Clinical Medicine, University of Oslo, Oslo, Norway

3 Department of Genetics, Institute for Cancer Research, Oslo University Hospital, Radiumhospitalet, Norway

4 Department of Biostatistics, Institute of Basic Medical Sciences, University of Oslo, Blindern, 0317, Oslo, Norway


Citation and License

BMC Genomics 2012, 13:199  doi:10.1186/1471-2164-13-199

Published: 22 May 2012



The human genome contains a large number of cis-regulatory DNA elements that direct both spatial and temporal gene-expression patterns. Previous studies have shown that breast tumors can be divided, based on their mRNA expression, into five subgroups (Luminal A, Luminal B, Basal, ErbB2+ and Normal-like), each with a distinct molecular portrait. Whole-genome gene expression analysis of independent sets of breast tumors has repeatedly confirmed the robustness of this classification. Furthermore, breast tumors carrying a TP53 mutation show a distinct gene expression profile that is strongly associated with these molecular portraits. The mRNA expression of 552 genes, which varied considerably among different tumors but little between two samples of the same tumor, has been shown to be sufficient to separate these tumor subgroups.


We analyzed in silico the transcriptional regulation of the genes defining the subgroups at three levels. (1) We studied the pathways in which the genes distinguishing the breast cancer subgroups may be jointly involved, including upstream regulators (first and second level of regulation) as well as downstream targets of these genes. (2) We analyzed the promoter regions of these genes (−500 bp to +100 bp relative to the transcription start site) for canonical transcription factor (TF) binding sites using Genomatix. (3) We examined the expression levels of the identified TFs and how they correlate with the overrepresentation of their binding sites in the separate subgroups. We report that the promoter composition of the genes that most strongly predict the patient subgroups is distinct: the class-predictive genes showed clearly different degrees of overrepresentation of transcription factor families in their promoter sequences.
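To make the promoter window and the notion of overrepresentation concrete, the following is a minimal sketch and not the Genomatix procedure used in the study. It assumes a simplified coordinate convention in which the transcription start site sits at offset 0, and it swaps in a one-sided hypergeometric test as a common generic choice for overrepresentation; the function names `promoter_window` and `hypergeom_enrichment_p` and all numbers in the usage lines are hypothetical.

```python
from math import comb


def promoter_window(tss, strand, upstream=500, downstream=100):
    """Return (start, end) genomic coordinates of a promoter window.

    Assumption: the TSS is treated as offset 0, so the window spans
    tss - upstream .. tss + downstream on the plus strand and is
    mirrored on the minus strand.
    """
    if strand == "+":
        return tss - upstream, tss + downstream
    if strand == "-":
        return tss - downstream, tss + upstream
    raise ValueError("strand must be '+' or '-'")


def hypergeom_enrichment_p(hits_in_set, set_size, hits_in_background, background_size):
    """One-sided hypergeometric p-value P(X >= hits_in_set).

    X is the number of promoters carrying a binding site for a given TF
    family when set_size promoters are drawn without replacement from a
    background of background_size promoters, hits_in_background of which
    carry the site.
    """
    max_hits = min(set_size, hits_in_background)
    total = comb(background_size, set_size)
    tail = sum(
        comb(hits_in_background, k)
        * comb(background_size - hits_in_background, set_size - k)
        for k in range(hits_in_set, max_hits + 1)
    )
    return tail / total


# Illustrative usage with made-up numbers (not data from the study).
start, end = promoter_window(tss=1_000_000, strand="-")
p_value = hypergeom_enrichment_p(
    hits_in_set=30, set_size=100, hits_in_background=400, background_size=5000
)
print(start, end, p_value)
```

A small p-value would indicate that the TF family's binding site occurs in the subgroup's promoters more often than expected from the background. In practice such p-values would also need correction for testing many TF families at once.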


The study suggests that transcription factors responsible for the observed expression pattern in breast cancers may lead us to important biological pathways.