Open Access Research article

Correcting for intra-experiment variation in Illumina BeadChip data is necessary to generate robust gene-expression profiles

Robert R Kitchen1, Vicky S Sabine2, Andrew H Sims3*, E Jane Macaskill4, Lorna Renshaw4, Jeremy S Thomas4, Jano I van Hemert5, J Michael Dixon4 and John MS Bartlett2

Author Affiliations

1 School of Physics, University of Edinburgh, 10 Crichton Street, Edinburgh, EH8 9AB, UK

2 Endocrine Cancer Group, Edinburgh Cancer Research Centre, Institute of Genetics and Molecular Medicine, Crewe Road South, Edinburgh, EH4 2XR, UK

3 Applied Bioinformatics of Cancer Group, Edinburgh Cancer Research Centre, Institute of Genetics and Molecular Medicine, Crewe Road South, Edinburgh, EH4 2XR, UK

4 Breast Cancer Research Group, Western General Hospital, Crewe Road South, Edinburgh, EH4 2XU, UK

5 School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh, EH8 9AB, UK

BMC Genomics 2010, 11:134  doi:10.1186/1471-2164-11-134

Published: 24 February 2010

Additional files

Additional file 1:

Coefficient of variation amongst replicate UHRR samples. Two box-and-whisker plots of the coefficient of variation (CV) of the replicate UHRR samples. Plot A shows the experiment-wide CV. The left-most of its four main sections shows the CV of the raw (detection-filtered) data; the section to its right shows the CV after each of four popular normalisation algorithms: quantile, loess, cubic-spline (qspline), and median. The final two sections show the CV after batch-correcting each normalised dataset with either mean-centring or ComBat. In plot B, the first four segments from the left each contain five box-plots showing the CV within each of the five runs, for raw (white), quantile-normalised (dark blue), mean-centred (lighter blue), and ComBat-corrected (pale blue) data respectively. The right-most segment shows the experiment-wide CV of the UHRR samples (coloured as in the previous segments), calculated without regard to the individual runs. All data were detection-filtered prior to analysis.

Format: PDF Size: 173KB
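As an illustration of the quantities plotted in Additional file 1, the following is a minimal sketch of a per-probe CV across replicate samples, computed before and after a simple per-run mean-centring. It is not the authors' code. The function names, the toy expression matrix, and the run labels are hypothetical, and two choices are assumptions made for this sketch: the CV is taken as the sample standard deviation divided by the mean, and the overall probe mean is added back after centring so that the CV stays defined.

```python
import numpy as np


def coefficient_of_variation(x):
    """Per-probe CV: sample standard deviation / mean across replicate columns."""
    x = np.asarray(x, dtype=float)
    return x.std(axis=1, ddof=1) / x.mean(axis=1)


def mean_centre_by_run(x, runs):
    """Subtract each probe's within-run mean, then add back its overall mean.

    Adding the overall mean back is an assumption of this sketch; plain
    mean-centring would leave every probe with a mean of zero in each run.
    """
    x = np.asarray(x, dtype=float)
    runs = np.asarray(runs)
    centred = x.copy()
    for run in np.unique(runs):
        cols = runs == run  # boolean mask selecting the samples of this run
        centred[:, cols] -= x[:, cols].mean(axis=1, keepdims=True)
    return centred + x.mean(axis=1, keepdims=True)


# Hypothetical toy data: 3 probes x 4 replicate samples from two runs,
# where run "b" is shifted upwards by one unit relative to run "a".
expr = np.array([
    [8.0, 8.2, 9.0, 9.2],
    [10.0, 10.4, 11.0, 11.4],
    [6.0, 6.1, 7.0, 7.1],
])
runs = np.array(["a", "a", "b", "b"])

corrected = mean_centre_by_run(expr, runs)
cv_before = coefficient_of_variation(expr)
cv_after = coefficient_of_variation(corrected)
```

In this toy case the run-to-run shift dominates the spread of the replicates, so removing it lowers the CV of every probe. ComBat is a different, empirical Bayes adjustment and is not reproduced here.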

Additional file 2:

UHRR inter-run pairwise differences. Pairwise differences between each of the five runs calculated using UHRR samples for raw, quantile-normalised, mean-centred, and ComBat-corrected data.

Format: JPEG Size: 372KB

Additional file 3:

Number of differentially expressed genes identified in replicate analyses. Numbers of genes reported to be differentially expressed after a standard analysis with quantile normalisation (left), after a standard analysis with mean-centring (middle), and after a standard analysis augmented with ComBat batch correction (right). A and B refer to the results from independent analyses of the duplicate sample groups, while C refers to the results from the pooled duplicate samples. The rows of Venn diagrams show the results with (i) limma, (ii) SAM, (iii) limma using UHRR, and (iv) SAM using UHRR.

Format: PDF Size: 914KB

Additional file 4:

Heatmaps. Full heatmaps of quantile-normalised, quantile-normalised plus mean-centred, and quantile-normalised plus ComBat-corrected data, including probe and sample annotations.

Format: PDF Size: 599KB