Table 1

Clinical characteristics of the clusters at k = 2, 3 and 7. The table gives the stage, ER, PR, Her2, lymph node (Node) and grade status of the samples in each cluster for k = 2, 3 and 7. ND stands for "Not Determined". At k = 2, the clustering splits the data into normal samples (N) and disease samples (BCA). The one exception is a single ADH sample, which is grouped with the normals. At k = 3, the BCA samples split into high-grade (HG; grade 2 or 3) and low-grade (LG; grade 1 or 2) clusters. At k = 7, the low-grade samples split into two clusters, LG1 and LG2, and the high-grade samples split into four, HG1 to HG4. The normal cluster N is not subdivided further, so it is listed only once. The HG1 samples are all ER- and PR-, and mostly Her2-. The HG3 and HG4 clusters are mostly ER+, PR+ and Her2-. The HG2 cluster has mixed ER, PR and Her2 status. Using the Sorlie et al. classification, we identify HG1 as the Basal-like subtype, LG1 as Luminal A, LG2, HG3 and HG4 as Luminal B, and HG2 as the Her2+ subtype. Where the entries for ER, PR, Her2, Node or Grade do not add up to the cluster size, the corresponding information was missing from the dataset [10]. In the Stage columns, ADH, DCIS and IDC denote atypical ductal hyperplasia, ductal carcinoma in situ and invasive ductal carcinoma, and N denotes normal tissue. A dash (-) marks an empty cell.

                 Stage                 ER               PR               Her2             Node        Grade
k  Group Size    ADH DCIS  IDC    N    Pos  Neg   ND    Pos  Neg   ND    Pos  Neg   ND    Pos  Neg      1    2    3
2  N       33      1    -    -   32      -    -    -      -    -    -      -    -    -      -    -      -    -    -
   BCA     60      7   30   23    -     47   10    3     42   15    3     10   37    9     44   14     18   22   19
3  LG      28      7   13    8    -     26    -    2     21    5    2      4   18    6     20    8     18    9    -
   HG      32      -   17   15    -     21   10    1     21   10    1      6   19    3     24    6      -   13   19
7  LG1     11      4    5    2    -     11    -    -      8    3    -      1   10    -      7    4      9    2    -
   LG2     17      3    8    6    -     15    -    2     13    2    2      3    8    6     13    4      9    7    -
   HG1      5      -    2    3    -      -    5    -      -    5    -      1    4    -      3    -      -    -    5
   HG2     10      -    7    3    -      7    3    -      5    5    -      3    4    1      9    1      -    2    8
   HG3     13      -    6    7    -     10    2    1     12    -    1      2    7    2     10    3      -    7    6
   HG4      4      -    2    2    -      4    -    -      4    -    -      -    4    -      2    2      -    4    -

Dalgin et al. BMC Bioinformatics 2007, 8:291. doi:10.1186/1471-2105-8-291
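Two properties stated in the caption can be checked directly from the counts: the k = 7 clusters refine the k = 3 clusters, and any shortfall between an attribute's entries and the cluster size reflects missing data. The Python sketch below transcribes the ER, PR, Her2, Node and Grade columns for the BCA clusters. It reads every empty cell as 0, which is an assumption about the table's convention. For each cluster it computes how many samples lack a recorded value for each attribute, and it checks whether each parent cluster's counts equal the sum over its children. The names `TABLE`, `CHILDREN`, `missing_counts` and `nesting_is_consistent` are hypothetical and for illustration only; they do not come from the paper.

```python
# Counts transcribed from Table 1; every empty cell is read as 0 (an assumption).
# Tuple order: ER, PR, Her2 = (Pos, Neg, ND); Node = (Pos, Neg); Grade = (1, 2, 3).
TABLE = {
    "BCA": {"size": 60, "ER": (47, 10, 3), "PR": (42, 15, 3),
            "Her2": (10, 37, 9), "Node": (44, 14), "Grade": (18, 22, 19)},
    "LG":  {"size": 28, "ER": (26, 0, 2), "PR": (21, 5, 2),
            "Her2": (4, 18, 6), "Node": (20, 8), "Grade": (18, 9, 0)},
    "HG":  {"size": 32, "ER": (21, 10, 1), "PR": (21, 10, 1),
            "Her2": (6, 19, 3), "Node": (24, 6), "Grade": (0, 13, 19)},
    "LG1": {"size": 11, "ER": (11, 0, 0), "PR": (8, 3, 0),
            "Her2": (1, 10, 0), "Node": (7, 4), "Grade": (9, 2, 0)},
    "LG2": {"size": 17, "ER": (15, 0, 2), "PR": (13, 2, 2),
            "Her2": (3, 8, 6), "Node": (13, 4), "Grade": (9, 7, 0)},
    "HG1": {"size": 5, "ER": (0, 5, 0), "PR": (0, 5, 0),
            "Her2": (1, 4, 0), "Node": (3, 0), "Grade": (0, 0, 5)},
    "HG2": {"size": 10, "ER": (7, 3, 0), "PR": (5, 5, 0),
            "Her2": (3, 4, 1), "Node": (9, 1), "Grade": (0, 2, 8)},
    "HG3": {"size": 13, "ER": (10, 2, 1), "PR": (12, 0, 1),
            "Her2": (2, 7, 2), "Node": (10, 3), "Grade": (0, 7, 6)},
    "HG4": {"size": 4, "ER": (4, 0, 0), "PR": (4, 0, 0),
            "Her2": (0, 4, 0), "Node": (2, 2), "Grade": (0, 4, 0)},
}

# Cluster hierarchy as described in the caption: BCA -> LG/HG (k = 3) -> k = 7.
CHILDREN = {
    "BCA": ["LG", "HG"],
    "LG": ["LG1", "LG2"],
    "HG": ["HG1", "HG2", "HG3", "HG4"],
}

ATTRIBUTES = ("ER", "PR", "Her2", "Node", "Grade")


def missing_counts(cluster):
    """Per attribute, the cluster size minus the sum of that attribute's entries,
    i.e. the number of samples whose value is missing from the dataset."""
    row = TABLE[cluster]
    return {a: row["size"] - sum(row[a]) for a in ATTRIBUTES}


def nesting_is_consistent(parent):
    """True if the parent's size and every count equal the sums over its children."""
    kids = [TABLE[c] for c in CHILDREN[parent]]
    if TABLE[parent]["size"] != sum(k["size"] for k in kids):
        return False
    # zip(*...) groups the children's tuples column by column before summing.
    return all(
        TABLE[parent][a] == tuple(map(sum, zip(*(k[a] for k in kids))))
        for a in ATTRIBUTES
    )


if __name__ == "__main__":
    for name in TABLE:
        gaps = {a: n for a, n in missing_counts(name).items() if n}
        print(name, gaps or "complete")
    print("nesting consistent:", all(nesting_is_consistent(p) for p in CHILDREN))
```

With these counts, every parent cluster is an exact sum of its children. The missing values the caption mentions fall in Her2 status (two samples each in HG2 and HG3), Node status (two samples in HG1) and grade (one sample in LG2).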