Table 1

Characteristics of the Combined Dataset.

Characteristic      Category             Total N       %  Training N (~2/3)  Testing N (~1/3)  P-value*
Subjects                                     550                        359               191         -
ER                  +                        395   71.8%                259               136      0.89
                    -                        155   28.2%                100                55
Size                < 2 cm                   309   56.2%                198               111      0.56
                    ≥ 2 cm                   241   43.8%                161                80
HER2*               +                        110   20.0%                 73                37      0.88
                    -                        440   80.0%                286               154
Grade               1                         98   17.8%                 63                35      0.45
                    2                        182   33.1%                113                69
                    3                        270   49.1%                183                87
Published Dataset^  Ivshina                  137   24.9%                 89                48         1
                    Loi                       42    7.6%                 28                14
                    NKI                      141   25.6%                 92                49
                    UNC                       33    6.0%                 22                11
                    Wang                     197   35.8%                128                69
Platform            Affymetrix               376   68.4%                245               131      0.99
                    Agilent                  174   31.6%                114                60
Subtype (PAM50)     Luminal A                156   28.4%                 98                58      0.92
                    Luminal B                131   23.8%                 85                46
                    HER2-enriched             83   15.1%                 56                27
                    Basal-like               106   19.3%                 72                34
                    Normal Breast-like        74   13.5%                 48                26

*HER2 status is based on ERBB2 mRNA levels. P-values were calculated with a chi-square test comparing the training and testing sets.

^Compiled from Ivshina et al., 2006; Loi et al., 2007; van de Vijver et al., 2002; Wang et al., 2005; and http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE15393.

Fan et al. BMC Medical Genomics 2011 4:3   doi:10.1186/1755-8794-4-3
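The footnote says only that the P-values come from a chi-square test. The sketch below assumes a chi-square test of independence on the 2 × k table of training versus testing counts for each characteristic, with a Yates continuity correction applied to 2 × 2 tables. The function names and the `TABLE_1` dictionary are hypothetical, introduced for illustration. It uses only the Python standard library, computing the upper-tail probability from the regularized incomplete gamma function for integer degrees of freedom. Under these assumptions the printed p-values round to the ones in the table, which is consistent with, but does not confirm, how they were computed.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    z = x / 2.0
    # Start the recurrence from Q(1, z) = exp(-z) for even df,
    # or from Q(1/2, z) = erfc(sqrt(z)) for odd df.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    # Regularized upper incomplete gamma recurrence:
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        if z > 0:
            q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q


def chi_square_train_test(train, test, yates=True):
    """Chi-square test of independence for a 2 x k table (rows: training, testing).

    Assumption: the textbook Yates correction, (|O - E| - 0.5)**2 / E,
    is applied to every cell, but only when the table is 2 x 2.
    """
    rows = [train, test]
    k = len(train)
    row_totals = [sum(r) for r in rows]
    col_totals = [train[j] + test[j] for j in range(k)]
    n = sum(row_totals)
    correction = 0.5 if (yates and k == 2) else 0.0
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (abs(observed - expected) - correction) ** 2 / expected
    df = k - 1  # (rows - 1) * (columns - 1) with two rows
    return stat, df, chi2_sf(stat, df)


# Training and testing counts transcribed from Table 1 (hypothetical structure).
TABLE_1 = {
    "ER": ([259, 100], [136, 55]),
    "Size": ([198, 161], [111, 80]),
    "HER2": ([73, 286], [37, 154]),
    "Grade": ([63, 113, 183], [35, 69, 87]),
    "Published Dataset": ([89, 28, 92, 22, 128], [48, 14, 49, 11, 69]),
    "Platform": ([245, 114], [131, 60]),
    "Subtype (PAM50)": ([98, 85, 56, 72, 48], [58, 46, 27, 34, 26]),
}

for name, (train, test) in TABLE_1.items():
    stat, df, p = chi_square_train_test(train, test)
    print(f"{name:<18} chi2 = {stat:.4f}  df = {df}  p = {p:.2f}")
```

The correction here subtracts 0.5 from every |O − E|, as in the textbook formula. A variant that caps the correction at |O − E| would give p = 1.00 rather than 0.99 for the Platform row, where |O − E| is about 0.43.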
