Biclustering reveals breast cancer tumour subgroups with common clinical features and improves prediction of disease recurrence
1 Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand
2 Department of Molecular Medicine and Pathology, University of Auckland, Auckland, New Zealand
3 New Zealand Bioinformatics Institute, University of Auckland, Auckland, New Zealand
4 Maurice Wilkins Centre for Molecular Biodiscovery, University of Auckland, Auckland, New Zealand
5 Department of Engineering Science, University of Auckland, Auckland, New Zealand
6 Melbourne School of Engineering, University of Melbourne, Victoria, Australia
BMC Genomics 2013, 14:102 doi:10.1186/1471-2164-14-102
Published: 13 February 2013
Additional file 1: Figure S1:
Clinical information for each individual bicluster determined by cMonkey from gene expression data of breast cancer patients who had not received adjuvant treatment (437 patients). The clinical variables include cohort, lymph node status (LN.status: LN+, LN- and NA), estrogen receptor status (ER.status: ER+, ER- and NA), progesterone receptor status (PgR.status: PgR+, PgR- and NA), tumour grade (Grade: G1, G2, G3 and NA), molecular subtype (Subtype: Basal, Her2, Luminal A, Luminal B, Normal and None), tumour size (Size, in mm) and patient age (Age, in years). Missing information is noted as NA (Not Available). The percentage shown on each bar is the percentage of tumour samples in that bicluster that fall into the given category.
Figure S2. Kaplan-Meier estimates of survival distribution of randomly selected patient groups. Kaplan-Meier estimates of disease-free survival (DFS) for 44 groups with arbitrary patient membership (all patients were randomly allocated into 44 groups), superimposed on the survival curves of (A) histological grades and (B) tumour subtypes, respectively. The distribution of bicluster-associated survival curves in Figure 4 is much broader than the distribution of survival curves for patient groups formed by chance.
Figure S3. Kaplan-Meier plots of pairs of biclusters showing differences in survival. Kaplan-Meier estimates of disease-free survival (DFS) for pairs of biclusters whose survival distributions differ significantly (logrank p-value < 0.01). In each subplot, the blue curve represents the patient group associated with good prognosis and the green curve represents the patient group associated with poor prognosis. The logrank p-value shown at the top of each subplot indicates the significance of the difference in survival distribution between the two biclusters.
Figure S4. Volcano plots showing genes that are differentially expressed between biclusters that differ significantly in their survival curves (logrank p-value < 0.01) and are associated with good and poor prognosis, respectively. The statistical significance of differential gene expression was quantified by the two-sample t-test p-value, and the fold change in the expression level of each gene between the two biclusters was evaluated. We identified genes whose expression level differed significantly, by at least 2-fold, between biclusters associated with good and poor survival. We plotted the statistical significance of differential expression against the fold change in expression level between low- and high-risk biclusters to identify genes that are both statistically and biologically significant. With the significance cut-off set at 0.05 (horizontal red dashed line) and the fold-change cut-off set at 2 (two vertical red dashed lines), the volcano plots identify genes with elevated expression levels, which represent the most significantly differentially expressed genes between biclusters associated with good and poor survival. The volcano plots were generated with mavolcanoplot in MATLAB.
Figure S5. Functional enrichment heatmap for PAM gene classifiers of biclusters that differ significantly in their survival curves (logrank p-value < 0.01) and are associated with good and poor prognosis.
Format: PDF Size: 8.6MB
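The selection rule described for Figure S4 (a t-test p-value below 0.05 together with at least a 2-fold change in expression) could be sketched roughly as follows. This is only an illustration in Python, not the authors' MATLAB mavolcanoplot workflow: the function name volcano_select, the gene labels and all numbers are hypothetical, expression means are assumed to be on a linear scale, and the p-values are assumed to have been computed beforehand with a two-sample t-test.

```python
import math


def volcano_select(records, fc_cutoff=2.0, p_cutoff=0.05):
    """Return genes passing both the fold-change and the p-value cut-off.

    records: iterable of (gene, mean_good, mean_poor, p_value), where the
    means are linear-scale expression levels in the good- and
    poor-prognosis biclusters (hypothetical input layout).
    """
    log2_cutoff = math.log2(fc_cutoff)
    selected = []
    for gene, mean_good, mean_poor, p_value in records:
        # x-axis of a volcano plot: log2 fold change between the two groups
        log2_fc = math.log2(mean_poor / mean_good)
        # y-axis of a volcano plot: -log10 of the t-test p-value
        neg_log10_p = -math.log10(p_value)
        # keep genes outside the vertical lines and above the horizontal line
        if abs(log2_fc) >= log2_cutoff and p_value < p_cutoff:
            selected.append((gene, log2_fc, neg_log10_p))
    return selected


# Hypothetical example values, not taken from the study.
example = [
    ("GENE_A", 100.0, 400.0, 0.001),   # 4-fold up, significant
    ("GENE_B", 100.0, 150.0, 0.0001),  # significant, but only 1.5-fold
    ("GENE_C", 100.0, 25.0, 0.01),     # 4-fold down, significant
    ("GENE_D", 100.0, 300.0, 0.2),     # 3-fold up, not significant
]

for gene, log2_fc, neg_log10_p in volcano_select(example):
    print(gene, round(log2_fc, 2), round(neg_log10_p, 2))
```

Requiring both thresholds at once is what separates genes that are merely statistically significant (GENE_B here) from those that also show a large change in expression.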
Additional file 2:
Summarises the significantly enriched (GeneSetDB p-value < 0.01) functional annotations of genes that were most strongly associated with clinical outcome for each bicluster.
Format: XLSX Size: 260KB
Additional file 3: Table S1:
Functional annotation enrichment in biclusters associated with low risk of relapse and in biclusters associated with high risk of relapse.
Table S2. Differentially expressed genes between Bicluster 7 and Bicluster 21. Genes are ranked in ascending order of the p-values from the two-sample t-test, and the related statistics for each gene are summarised, including t-scores, p-values, False Discovery Rate (FDR), q-values and Benjamini-Hochberg (BH) adjusted FDR.
Table S3. Enrichment analysis of differentially expressed genes between Bicluster 7 and Bicluster 21, determined by GeneSetDB at a significance level < 0.001.
Table S4. Gene classifiers determined by PAM: the maximum number of probe sets that can accurately characterise Bicluster 7 and Bicluster 21. The highlighted genes overlap with the OncotypeDx commercial gene list.
Table S5. Multivariate analyses of the prognostic importance of biclusters in comparison to conventional clinical factors, molecular tumour subtypes, and genetic grade. The significance of the bicluster classification for prognosis was compared to chance by randomly assigning tumours to 44 arbitrary groups and estimating the association between membership of these arbitrary groups and DFS. Univariate Cox PH analysis revealed no statistically significant association between arbitrary group membership and survival outcome. Significance is indicated by p-value range: 0 to 0.001 ‘***’; 0.001 to 0.01 ‘**’; 0.01 to 0.05 ‘*’; 0.05 to 0.1 ‘.’; 0.1 to 1 ‘-’.
Table S6. Summary of clinical information of the 437 patients who had not received adjuvant treatment.
Format: XLSX Size: 39KB
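Table S2 reports Benjamini-Hochberg (BH) adjusted values alongside the raw t-test p-values. The listing does not say how these were computed, so the following is only a sketch of the standard BH step-up adjustment in Python, on the assumption that this is the procedure meant; the function name benjamini_hochberg and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # indices of the p-values sorted in ascending order
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values, not taken from the study.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)

# List genes in ascending order of raw p-value, as Table S2 does.
for i in sorted(range(len(raw)), key=lambda i: raw[i]):
    print(f"p={raw[i]:.3f}  BH-adjusted={adj[i]:.3f}")
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values monotone, so a gene with a smaller raw p-value never receives a larger adjusted value.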
Additional file 4:
Summarises probe sets which were able to accurately characterise survival outcome differences between the biclusters.
Format: XLSX Size: 136KB