Open Access Methodology article

Quantized correlation coefficient for measuring reproducibility of ChIP-chip data

Shouyong Peng1,2, Mitzi I Kuroda1,2 and Peter J Park1,3,4*

Author Affiliations

1 Department of Medicine, Brigham and Women's Hospital, Boston, MA, 02115 USA

2 Department of Genetics, Harvard Medical School, Boston, MA, 02115 USA

3 HST Informatics Program at Children's Hospital, Boston, MA, 02115 USA

4 Center for Biomedical Informatics, Harvard Medical School, Boston, MA 02115 USA

BMC Bioinformatics 2010, 11:399 doi:10.1186/1471-2105-11-399

Published: 27 July 2010

Abstract

Background

Chromatin immunoprecipitation followed by microarray hybridization (ChIP-chip) is used to study protein-DNA interactions and histone modifications on a genome-wide scale. To ensure data quality, these experiments are usually performed in replicate, and a correlation coefficient between replicates is often used to assess reproducibility. However, the correlation coefficient can be misleading because it is affected not only by the reproducibility of the signal but also by the amount of binding signal present in the data.

Results

We develop the Quantized correlation coefficient (QCC), which is much less dependent on the amount of signal. This involves discretization of the data into a set of quantiles (quantization), a merging procedure to group the background probes, and recalculation of the Pearson correlation coefficient. This procedure reduces the influence of the background noise on the statistic, which then focuses more on the reproducibility of the signal. The performance of this procedure is tested on both simulated and real ChIP-chip data. For replicates with different levels of enrichment over background and coverage, we find that QCC reflects reproducibility more accurately and is more robust than the standard Pearson or Spearman correlation coefficients. The quantization and the merging procedure can also suggest a proper quantile threshold for separating signal from background for further analysis.
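
The three steps named above (quantization, merging of background probes, recalculation of the Pearson coefficient) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify how the merging procedure decides which quantiles are background, so the fixed `background_bins` cutoff, the default `n_bins`, and all function names here are assumptions made for the example.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either input is constant
    return sxy / math.sqrt(sxx * syy)


def quantize(values, n_bins):
    """Replace each value by its rank-based quantile bin index (0 .. n_bins-1)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def merge_background(bins, background_bins):
    """Collapse the lowest `background_bins` bins into a single bin 0.

    Assumption: a fixed cutoff (>= 1) stands in for the paper's merging
    procedure, whose details are not given in the abstract.
    """
    return [max(0, b - (background_bins - 1)) for b in bins]


def qcc(rep1, rep2, n_bins=10, background_bins=7):
    """Sketch of a quantized correlation between two replicates."""
    q1 = merge_background(quantize(rep1, n_bins), background_bins)
    q2 = merge_background(quantize(rep2, n_bins), background_bins)
    return pearson(q1, q2)
```

Calling `qcc(rep1, rep2)` on two lists of probe intensities measured on the same probes returns a single coefficient. Because every probe in the merged background bin receives the same value, variation among background probes no longer contributes to the statistic; only the ordering of probes in the upper quantiles does.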

Conclusions

To measure reproducibility of ChIP-chip data correctly, a correlation coefficient that is robust to the amount of signal present should be used. QCC is one such measure. The QCC statistic can also be applied in a variety of other contexts for measuring reproducibility, including analysis of array CGH data for DNA copy number and gene expression data.