Table 1. Seven pairs (fourteen datasets in total) of independent microarray datasets used for benchmarking

| Phenotype | Dataset name and reference | Class 1 (control group) samples | Class 2 (case group) samples | Data source | Platform |
|---|---|---|---|---|---|
| Effect of smoking on bronchial epithelium | | Never smokers | Current smokers | | |
| | Beane [18] | 21 | 52 | GSE7895 | U133A |
| | Vanni [19] | 22 | 37 | GSE10135 | U133 Plus 2 |
| Subtypes of non-small cell lung cancer (NSCLC) | | AD (adenocarcinoma) | SCC (squamous cell carcinoma) | | |
| | Bild [20] | 58 | 53 | GSE3141 | U133 Plus 2 |
| | Lee [21] | 63 | 75 | GSE8894 | U133 Plus 2 |
| Subtypes of primary high grade glioma | | AA (anaplastic astrocytoma) | GBM (glioblastoma multiforme) | | |
| | Phillips [22] | 21 | 56 | GSE4271 | U133 Set |
| | Sun [23] | 19 | 77 | GSE4290 | U133 Plus 2 |
| Estrogen receptor (ER) status in breast cancer | | ER-negative | ER-positive | | |
| | Chin [24] | 46 | 84 | E-TABM-158 | U133A |
| | Minn [25] | 42 | 57 | GSE2603 | U133A |
| Breast cancer grade | | Grade 1 | Grade 3 | | |
| | Desmedt [26] | 30 | 83 | GSE7390 | U133A |
| | Sotiriou [27] | 28 | 32 | GSE2990 | U133A |
| Lung cancer grade | | Grade 1 | Grade 3 | | |
| | Dana-Farber [28] | 13 | 37 | Author's website | U133A |
| | Michigan [28] | 26 | 66 | | |
| Clear cell renal cell carcinoma (CCRCC) vs Normal kidney | | Normal kidney | Tumorous kidney | | |
| | Jones [29] | 23 | 32 | GSE15641 | U133A |
| | Kort [30] | 12 | 10 | GSE11024 | U133 Plus 2 |

Each dataset is referred to in the main text by its dataset name, which is either the first author's name or the cohort name. In each phenotype row, the Class 1 and Class 2 columns give the class labels; in each dataset row, they give the number of samples in that class.

Hwang BMC Genomics 2012 13(Suppl 7):S26   doi:10.1186/1471-2164-13-S7-S26
