Normalization and experimental design for ChIP-chip data
1 Howard Hughes Medical Institute, Brigham and Women's Hospital, Boston, Massachusetts 02115, USA
2 Harvard-Partners Center for Genetics and Genomics, Brigham and Women's Hospital, Boston, Massachusetts 02115, USA
3 Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA
4 Children's Hospital Informatics Program, Boston, Massachusetts 02115, USA
BMC Bioinformatics 2007, 8:219 doi:10.1186/1471-2105-8-219
Published: 25 June 2007
Chromatin immunoprecipitation on tiling arrays (ChIP-chip) has been widely used to investigate the DNA binding sites of a variety of proteins on a genome-wide scale. However, several issues in the processing and analysis of ChIP-chip data have not been fully resolved, including the effect of background (mock control) subtraction and normalization within and across arrays.
The binding profiles of the Drosophila male-specific lethal (MSL) complex on a tiling array provide a unique opportunity for investigating these topics, because the complex is known to bind the X chromosome but not the autosomes. Having large bound and control regions on the same array allows a clear evaluation of analytical methods.
We introduce a novel normalization scheme specifically designed for ChIP-chip data from dual-channel arrays and demonstrate that this step is critical for correcting systematic dye-bias that may exist in the data. Subtraction of the mock (non-specific antibody or no antibody) control data is generally needed to eliminate the bias, but appropriate normalization obviates the need for mock experiments and increases the correlation among replicates. The idea underlying the normalization can be used subsequently to estimate the background noise level in each array for normalization across arrays. We demonstrate the effectiveness of the methods with the MSL complex binding data and other publicly available data.
Proper normalization is essential for ChIP-chip experiments. The proposed normalization technique can correct systematic errors and compensate for the lack of mock control data, thus reducing the experimental cost and producing more accurate results.
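The abstract does not spell out the normalization scheme, so the following is only a generic illustration of the two ideas it mentions: correcting intensity-dependent dye bias within an array, and using a per-array background noise estimate to put arrays on a common scale. It is a minimal sketch, not the authors' method. The binned-median trend removal, the MAD-based noise estimate, and all function names and the `unbound` mask (probes assumed to lie in unbound regions, such as the autosomes in the MSL data) are assumptions made for illustration.

```python
import numpy as np

def ma_values(ip, control):
    """Return M (log2 ratio) and A (mean log2 intensity) for one dual-channel array."""
    ip = np.asarray(ip, dtype=float)
    control = np.asarray(control, dtype=float)
    m = np.log2(ip) - np.log2(control)
    a = 0.5 * (np.log2(ip) + np.log2(control))
    return m, a

def remove_intensity_trend(m, a, unbound, n_bins=10):
    """Subtract an intensity-dependent offset from M.

    Probes are grouped into quantile bins of A; within each bin the median M
    of the probes assumed unbound is subtracted from every probe in the bin.
    This is a generic stand-in for a dye-bias correction (assumption).
    """
    edges = np.quantile(a, np.linspace(0.0, 1.0, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, a, side="right") - 1, 0, n_bins - 1)
    corrected = m.copy()
    for b in range(n_bins):
        in_bin = bins == b
        ref = in_bin & unbound
        if ref.any():
            corrected[in_bin] -= np.median(m[ref])
    return corrected

def noise_level(m, unbound):
    """Robust background noise estimate: scaled MAD of the unbound probes."""
    bg = m[unbound]
    return 1.4826 * np.median(np.abs(bg - np.median(bg)))

def scale_to_unit_noise(m, unbound):
    """Rescale one array so that its background noise level equals 1."""
    return m / noise_level(m, unbound)
```

Applied to each replicate in turn (`ma_values`, then `remove_intensity_trend`, then `scale_to_unit_noise`), this would express every array in units of its own background noise, which is one simple way to make replicates comparable before averaging.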