BMC Bioinformatics

Open Access Highly Accessed Research article

A comprehensive sensitivity analysis of microarray breast cancer classification under feature variability

Herman MJ Sontrop1*, Perry D Moerland2, René van den Ham3, Marcel JT Reinders4 and Wim FJ Verhaegh1

Author Affiliations

1 Molecular Diagnostics Department, Philips Research, High Tech Campus 12a, 5656 AE Eindhoven, the Netherlands

2 Bioinformatics Laboratory, Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Center, Meibergdreef 9, 1100 AZ Amsterdam, the Netherlands

3 Biomolecular Engineering Department, Philips Research, High Tech Campus 11, 5656 AE Eindhoven, the Netherlands

4 Delft Bioinformatics Lab, Delft University of Technology, Mekelweg 4, 2628 CD Delft, the Netherlands

BMC Bioinformatics 2009, 10:389 doi:10.1186/1471-2105-10-389

Published: 26 November 2009

Additional files

Additional file 1:

Overview of 947 Affymetrix hybridizations. The column DataSetName indicates the study to which each hybridization belongs. For each study, the repository and corresponding accession number can be found in Table 1 in the main text. The column FileName gives the exact file name of each hybridization as used in the corresponding repository. For the datasets of Desmedt, Minn, Loi and Chin, the class label was based on the time of distant metastasis-free survival (t.dmfs, in months) and the corresponding event indicator e.dmfs. For the datasets of Miller and Pawitan, the class label was based on the time of breast cancer-specific overall survival (t.sos, in months) and the corresponding event indicator e.sos.

Format: PDF Size: 133KB

Open Data

Additional file 2:

Overview of 87 additional hybridizations from the Van de Vijver dataset. The column SampleID gives the identifiers as used by Van de Vijver et al. [34]. The selected hybridizations represent all lymph-node negative cases that were not yet contained in the original publication by Van 't Veer et al. [2].

Format: PDF Size: 160KB

Additional file 3:

Supplementary information on perturbation schemes and classifiers. The file contains additional information on how to construct perturbed expression profiles for MAS5.0, dChip and the Rosetta data. In addition, information on parameter settings for the SVM and RF classifiers is provided.

Format: PDF Size: 155KB

Additional file 4:

Performance curves for the Affymetrix datasets using the single-rank feature sets. Rows represent different preprocessing pairs, while columns represent curves for different datasets. Within each cell, performance curves corresponding to different classifiers are shown in separate colors. The color scheme is shown at the bottom of the figure. Within a cell the x-axis provides the signature size, while the y-axis gives the average balanced accuracy over 50 splits. For each dataset and split, the top-100 feature set was computed using the single-rank strategy and this ranking was subsequently used for all classifiers in order to construct signatures.

Format: PS Size: 314KB
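
The y-axis quantity of the performance curves (balanced accuracy averaged over splits) can be illustrated with a minimal sketch. It assumes the standard definition of balanced accuracy as the mean of sensitivity and specificity for two classes; the paper's exact definition is in the main text. The function names and the label coding (1 for poor prognosis, 0 for good prognosis) are hypothetical.

```python
from statistics import mean


def balanced_accuracy(y_true, y_pred, positive=1):
    """Mean of sensitivity and specificity for binary labels (assumed definition)."""
    n_pos = sum(1 for t in y_true if t == positive)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present in y_true")
    # Correctly predicted positives and negatives
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return 0.5 * (tp / n_pos + tn / n_neg)


def average_balanced_accuracy(splits):
    """Average the balanced accuracy over (y_true, y_pred) pairs, one per split."""
    return mean(balanced_accuracy(y_true, y_pred) for y_true, y_pred in splits)


# Two toy splits (hypothetical labels); the curves in the figure use 50 splits.
toy_splits = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),
    ([1, 0, 0, 0], [1, 1, 0, 0]),
]
print(average_balanced_accuracy(toy_splits))
```

Each point on a performance curve would correspond to one such average, computed for a fixed signature size, dataset, classifier and preprocessing pair.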

Additional file 5:

Discordance curves for the Affymetrix datasets using the single-rank feature sets. Rows represent different preprocessing pairs, while columns represent curves for different datasets. Within each cell, discordance curves corresponding to different classifiers are shown in separate colors. The color scheme is shown at the bottom of the figure. Within a cell the x-axis provides the signature size, while the y-axis gives the average percentage of cases, over 50 splits, of inconsistent class assignments on the unperturbed validation sets. For each dataset and split, the top-100 feature set was computed using the single-rank strategy and this ranking was subsequently used for all classifiers in order to construct signatures.

Format: PS Size: 406KB
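
The discordance plotted on the y-axis can be sketched as follows. The sketch assumes that discordance is the percentage of validation cases that receive different class labels under the two preprocessing methods of a pair, averaged over splits; this reading follows the caption, and the precise definition is given in the main text. The function names and toy labels are hypothetical.

```python
def discordance(labels_a, labels_b):
    """Percentage of cases assigned different classes in two label vectors."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must cover the same cases")
    if not labels_a:
        raise ValueError("at least one case is required")
    differing = sum(1 for a, b in zip(labels_a, labels_b) if a != b)
    return 100.0 * differing / len(labels_a)


def average_discordance(split_pairs):
    """Average discordance over (labels_a, labels_b) pairs, one pair per split."""
    values = [discordance(a, b) for a, b in split_pairs]
    return sum(values) / len(values)


# Toy example: class assignments for the same four validation cases
# under two preprocessing methods (hypothetical values).
print(discordance([1, 0, 1, 1], [1, 1, 1, 0]))
```

Restricting both label vectors to the cases of one true class before calling `discordance` would give a per-class variant, which is one plausible way to obtain separate curves for poor and good prognosis cases.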

Additional file 6:

Stability curves for the Affymetrix datasets using the single-rank feature sets. Rows represent curves obtained using different classifiers, while columns represent curves for different datasets. Within each cell, stability curves associated with different preprocessing methods are shown in separate colors. The color scheme is shown at the bottom of the figure. Within a cell the x-axis provides the signature size, while the y-axis gives the average percentage of cases over 50 splits with a map-score larger than 35. For each dataset and split, the top-100 feature set was computed using the single-rank strategy and this ranking was subsequently used for all classifiers in order to construct signatures.

Format: PS Size: 316KB

Additional file 7:

Discordance curves for the poor prognosis cases in the Affymetrix datasets using the multi-rank feature sets. Rows represent different preprocessing pairs, while columns represent curves for different datasets. Within each cell, discordance curves corresponding to different classifiers are shown in separate colors. The color scheme is shown at the bottom of the figure. Within a cell the x-axis provides the signature size, while the y-axis gives the average percentage of cases, over 50 splits, of inconsistent class assignments on the unperturbed validation sets. For each dataset and split, the top-100 feature set was computed using the multi-rank strategy and this ranking was subsequently used for all classifiers in order to construct signatures.

Format: PS Size: 399KB

Additional file 8:

Discordance curves for the good prognosis cases in the Affymetrix datasets using the multi-rank feature sets. Rows represent different preprocessing pairs, while columns represent curves for different datasets. Within each cell, discordance curves corresponding to different classifiers are shown in separate colors. The color scheme is shown at the bottom of the figure. Within a cell the x-axis provides the signature size, while the y-axis gives the average percentage of cases, over 50 splits, of inconsistent class assignments on the unperturbed validation sets. For each dataset and split, the top-100 feature set was computed using the multi-rank strategy and this ranking was subsequently used for all classifiers in order to construct signatures.

Format: PS Size: 404KB