Additional file 1: Figure S1.

Clinical information for each individual bicluster determined by cMonkey from gene expression data of breast cancer patients who had not received adjuvant treatment (437 patients). The clinical variables include cohort, lymph node status (LN.status: LN+, LN- and NA), estrogen receptor status (ER.status: ER+, ER- and NA), progesterone receptor status (PgR.status: PgR+, PgR- and NA), tumour grade (Grade: G1, G2, G3 and NA), molecular subtype (Subtype: Basal, Her2, Luminal A, Luminal B, Normal and None), tumour size (Size, in mm) and patient age (Age, in years). Missing information is noted as NA (Not Available). The percentage noted on each bar indicates the percentage of tumour samples of the specific category in each bicluster.

Figure S2. Kaplan-Meier estimates of survival distribution of randomly selected patient groups. Kaplan-Meier estimates of disease free survival (DFS) for 44 groups with arbitrary patient membership (all patients were randomly allocated into 44 groups), superimposed on the survival curves of (A) histological grades and (B) tumour subtypes, respectively. The distribution of bicluster-associated survival curves in Figure 4 is much broader than the distribution of survival curves for patient groups formed by chance.

Figure S3. Kaplan-Meier plots of pairs of biclusters showing differences in survival. Kaplan-Meier estimates of the disease free survival (DFS) distribution for pairwise bicluster comparisons with a significant difference in survival distribution (logrank p-value < 0.01). In each subplot, the blue curve represents the patient group associated with good prognosis, while the green curve represents the patient group associated with poor prognosis. The logrank p-value shown at the top of each subplot indicates the significance of the difference in survival distribution between the two biclusters.

Figure S4. Volcano plots showing genes that are differentially expressed between biclusters that differ significantly in their survival curves (logrank p-value < 0.01) and are associated with good and poor prognosis, respectively. The statistical significance of differential gene expression was quantified by the two-sample t-test p-value, and the fold change in the expression level of each gene between the two biclusters was evaluated. We identified genes whose expression level differed significantly, by at least 2-fold, between biclusters associated with good and poor survival. We plot the statistical significance of differential expression against the fold change in expression level between low and high risk biclusters to identify genes that are both statistically and biologically significant. With the significance cut-off set at 0.05 (horizontal red dashed line) and the fold change cut-off set at 2 (two vertical red dashed lines), the volcano plots identify genes with elevated expression levels, which represent the most significantly differentially expressed genes between biclusters associated with good and poor survival. The volcano plots were generated with mavolcanoplot in MATLAB.

Figure S5. Functional enrichment heatmap for PAM gene classifiers of biclusters that differ significantly in their survival curves (logrank p-value < 0.01) and are associated with good and poor prognosis.
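The gene selection described for the volcano plots (Figure S4) can be illustrated with a short sketch. This is not the mavolcanoplot implementation used by the authors. It is a minimal Python example that assumes linear-scale mean expression values per bicluster and p-values already computed by a two-sample t-test. The function name `volcano_hits`, the gene names and all numbers are hypothetical and serve only to show how the two cut-offs (p < 0.05 and at least 2-fold change) combine.

```python
import math


def volcano_hits(mean_good, mean_poor, p_values, fc_cutoff=2.0, alpha=0.05):
    """Classify genes by the two volcano-plot cut-offs.

    mean_good / mean_poor: dicts of mean expression (assumed linear scale)
    in the good- and poor-prognosis bicluster.
    p_values: dict of two-sample t-test p-values (assumed precomputed).
    Returns {gene: (log2_fold_change, minus_log10_p, label)}.
    """
    log2_cut = math.log2(fc_cutoff)  # a 2-fold change is 1 on the log2 axis
    result = {}
    for gene, p in p_values.items():
        # x-axis of the volcano plot: log2 ratio poor / good
        log2_fc = math.log2(mean_poor[gene] / mean_good[gene])
        # y-axis of the volcano plot: -log10 of the p-value
        minus_log10_p = -math.log10(p)
        significant = p < alpha
        if significant and log2_fc >= log2_cut:
            label = "up"      # higher in the poor-prognosis bicluster
        elif significant and log2_fc <= -log2_cut:
            label = "down"    # lower in the poor-prognosis bicluster
        else:
            label = "ns"      # fails at least one of the two cut-offs
        result[gene] = (log2_fc, minus_log10_p, label)
    return result


# Hypothetical toy data, for illustration only.
mean_good = {"geneA": 10.0, "geneB": 8.0, "geneC": 40.0, "geneD": 5.0}
mean_poor = {"geneA": 40.0, "geneB": 9.0, "geneC": 10.0, "geneD": 20.0}
p_values = {"geneA": 0.001, "geneB": 0.0005, "geneC": 0.01, "geneD": 0.2}

hits = volcano_hits(mean_good, mean_poor, p_values)
for gene, (log2_fc, mlp, label) in sorted(hits.items()):
    print(f"{gene}: log2FC={log2_fc:+.2f}, -log10(p)={mlp:.2f}, {label}")
```

In this toy example geneB is statistically significant but changes by less than 2-fold, and geneD changes 4-fold but is not statistically significant, so neither passes both cut-offs. Only genes in the upper left and upper right regions of the plot are selected.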

Format: PDF Size: 8.6MB

Wang et al. BMC Genomics 2013 14:102   doi:10.1186/1471-2164-14-102