This article is part of the supplement: Highlights from the Third International Society for Computational Biology (ISCB) Student Council Symposium at the Fifteenth Annual International Conference on Intelligent Systems for Molecular Biology (ISMB)

Open Access Oral presentation

ChIP-on-chip significance analysis reveals ubiquitous transcription factor binding

Adam A Margolin1,2,3*, Teresa Palomero2, Adolfo A Ferrando2, Andrea Califano1,2 and Gustavo Stolovitzky3

Author Affiliations

1 Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA

2 Joint Centers for Systems Biology, Columbia University, New York, NY 10032, USA

3 Systems Biology Group, IBM Research, Yorktown Heights, NY 10598, USA

BMC Bioinformatics 2007, 8(Suppl 8):S2  doi:10.1186/1471-2105-8-S8-S2


The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2105/8/S8/S2


Published: 20 November 2007

© 2007 Margolin et al; licensee BioMed Central Ltd.

Background

ChIP-on-chip technology provides a genome-scale view of transcription factor (TF)/target interactions and a systems-level window into transcriptional regulatory networks. However, while many studies have used ChIP-on-chip data to discover new TF targets effectively, statistical methods have not yet produced an accurate model for separating signals caused by experimental noise from those caused by true biological variation, a model that would allow the technology to provide high-confidence predictions of the full range of interactions.

Method

This paper presents a novel method to accurately model the significance of binding events measured by ChIP-on-chip data. For each arrayed probe representing a genomic segment, a ChIP-on-chip microarray measures intensity levels for the IP (immunoprecipitation) channel, which is enriched in genomic fragments bound by an immunoprecipitated TF, and the WCE (whole-cell extract) channel, which represents random genomic fragments. Statistical significance is inferred by computing the conditional probability $p(M \mid A)$, where $M = \log_2(\mathrm{IP}/\mathrm{WCE})$ and $A = \tfrac{1}{2}\log_2(\mathrm{IP} \cdot \mathrm{WCE})$ (Fig. 1). A kernel density estimation procedure is used to calculate the joint probability $p(M, A)$, and for each average intensity value $A$, the mean of the null distribution (i.e. the distribution for unbound probes), denoted $\mu(A)$, is inferred from this density. The part of $p(M \mid A)$ with $M < \mu(A)$ is then mirrored across $\mu(A)$ to yield the inferred null distribution, which is used to assign statistical significance scores.
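The null-model construction described above can be sketched in Python as follows. This is a minimal, hypothetical illustration rather than the authors' implementation: the class name `ReflectedNull`, the Gaussian kernels, the bandwidths `h_m` and `h_a`, and the choice of the mode of $p(M \mid A)$ as the estimate of $\mu(A)$ are all assumptions. The sketch computes $M$ and $A$ from linear IP and WCE intensities, weights probes by their proximity in $A$, mirrors the part of the conditional density below $\mu(A)$, and returns a one-sided p-value for enrichment.

```python
import math


def ma_transform(ip, wce):
    """Return (M, A) lists for paired IP and WCE intensities (linear, > 0).

    M = log2(IP / WCE) is the log enrichment ratio and
    A = 0.5 * log2(IP * WCE) is the average log2 intensity of both channels.
    """
    m = [math.log2(i / w) for i, w in zip(ip, wce)]
    a = [0.5 * math.log2(i * w) for i, w in zip(ip, wce)]
    return m, a


def _gauss(u):
    # Unnormalized Gaussian kernel; constants cancel in the ratios below.
    return math.exp(-0.5 * u * u)


def _norm_cdf(u):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


class ReflectedNull:
    """Hypothetical intensity-dependent null model for M given A.

    p(M | A) is estimated with a Gaussian product-kernel density estimate.
    ASSUMPTION: mu(A) is taken as the mode of p(M | A), and the part of
    p(M | A) below mu(A) is mirrored across mu(A) to form the null.
    """

    def __init__(self, m, a, h_m=0.2, h_a=0.5, grid_size=200):
        self.m, self.a = list(m), list(a)
        self.h_m, self.h_a = h_m, h_a  # assumed bandwidths for M and A
        self.grid_size = grid_size

    def _weights(self, a0):
        # Kernel weights in A: probes with similar average intensity dominate.
        return [_gauss((ai - a0) / self.h_a) for ai in self.a]

    def _cdf(self, x, w):
        # Conditional CDF F(x | A = a0) of the kernel estimate in M.
        num = sum(wi * _norm_cdf((x - mi) / self.h_m) for wi, mi in zip(w, self.m))
        return num / sum(w)

    def null_mean(self, a0):
        """Estimate mu(a0) as the mode of p(M | A = a0) on an even grid."""
        w = self._weights(a0)
        lo, hi = min(self.m), max(self.m)
        step = (hi - lo) / (self.grid_size - 1)
        grid = [lo + k * step for k in range(self.grid_size)]
        dens = [sum(wi * _gauss((g - mi) / self.h_m) for wi, mi in zip(w, self.m))
                for g in grid]
        return grid[dens.index(max(dens))]

    def p_value(self, m0, a0):
        """One-sided P(M >= m0 | A = a0) under the reflected null."""
        w = self._weights(a0)
        mu = self.null_mean(a0)
        f_mu = self._cdf(mu, w)  # mass below mu; the mirrored null has mass 2 * f_mu
        if m0 >= mu:
            # By mirror symmetry, the upper tail beyond m0 equals the
            # lower tail of the observed density below 2 * mu - m0.
            return self._cdf(2.0 * mu - m0, w) / (2.0 * f_mu)
        return 1.0 - self._cdf(m0, w) / (2.0 * f_mu)
```

Because the mirrored null is built from the probes with $M < \mu(A)$, the bound probes, which sit mostly in the upper tail, have little influence on its estimated spread. Using a Gaussian kernel in $M$ also gives the conditional CDF in closed form through `math.erf`.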
Scores from replicate experiments, and from probes whose genomic locations lie within the fragment length (~500 bp) of one another, are integrated to produce a single significance score for each genomic region.
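The abstract does not say how replicate and neighboring probes are integrated. One simple possibility, shown here purely as an assumption, chains probes on the same chromosome that lie within the fragment length of the previous probe into one region and combines their p-values with Fisher's method. Fisher's method is a swapped-in technique, not necessarily the authors' choice, and it assumes independent p-values, which overlapping probes violate, so the combined scores are illustrative only. The function names `fisher_combine` and `region_scores` are hypothetical.

```python
import math


def fisher_combine(pvalues):
    """Fisher's method: combine k independent p-values (each > 0) into one.

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees
    of freedom under the null; for even df the survival function has the
    closed form exp(-x/2) * sum_{j<k} (x/2)^j / j!.
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is x / 2
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_x / j  # (x/2)^j / j!, built incrementally
        total += term
    return math.exp(-half_x) * total


def region_scores(probes, frag_len=500):
    """Group probes into regions and return one combined p-value per region.

    ASSUMPTION: `probes` is a list of (chrom, position, p_value) tuples,
    possibly pooled from several replicate arrays. A probe joins the current
    region when it is on the same chromosome and within `frag_len` bp of the
    previous probe. Returns a list of (chrom, start, end, combined_p).
    """
    regions = []
    for chrom, pos, p in sorted(probes):
        if regions and regions[-1][0] == chrom and pos - regions[-1][2] <= frag_len:
            regions[-1][2] = pos        # extend the region's end coordinate
            regions[-1][3].append(p)    # collect the probe's p-value
        else:
            regions.append([chrom, pos, pos, [p]])
    return [(c, s, e, fisher_combine(ps)) for c, s, e, ps in regions]
```

Replicate probes at the same coordinate fall into the same region (distance 0), so replicates and neighbors are pooled by one rule.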

Figure 1. (Left) Magnitude versus amplitude (MA) plot of a ChIP-on-chip hybridization. The x-axis represents the average log2 intensity of the IP and WCE channels, and the y-axis represents the log2 ratio of IP/WCE. The black line represents the mean of the inferred null distribution, and the colored lines represent confidence intervals at probabilities of 0.1, 0.01, and 0.001. The model reveals an intensity-dependent mean and variance of the null distribution, and a large number of probes are significantly enriched in the IP channel. (Right) The axes are the same as in the left panel, and colors represent the -log10 p-value under the null distribution.

Results

The method is tested on six different ChIP-on-chip arrays representing replicate experiments for three different TFs (NOTCH1, MYC and HES1). For each experiment, this analysis reveals an order of magnitude more genomic binding events than traditional methods detect, predicting several thousand interactions for each TF and suggesting previously unappreciated complexity in transcriptional regulatory networks. Several independent experiments provide evidence for the validity of these predictions. First, biochemical validation of more than 20 predicted targets by gene-specific ChIP and qPCR confirms the accuracy of the false discovery rate statistics computed by the method. Second, binding site enrichment analysis indicates that the strength of the binding site signal is maintained over several thousand promoters. Finally, gene expression analysis reveals coordinated downregulation across the entire range of predicted NOTCH1-bound genes upon NOTCH1 inhibition in cell lines, indicating that a large percentage of bound genes are also functionally regulated by NOTCH1.
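The abstract reports false discovery rate statistics without describing how they are computed. As one standard way to obtain them (an assumption, not necessarily the authors' procedure), the Benjamini-Hochberg step-up adjustment converts per-region p-values into q-values. The function names `bh_qvalues` and `discoveries` are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order.

    ASSUMPTION: this standard procedure stands in for the method's own FDR
    computation, which the abstract does not specify.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices by ascending p
    q = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        q[i] = running_min
    return q


def discoveries(pvalues, alpha=0.05):
    """Indices of regions whose BH q-value is at most alpha."""
    return [i for i, q in enumerate(bh_qvalues(pvalues)) if q <= alpha]
```

Regions with a q-value at or below the chosen level form the predicted target set; under independence (or positive dependence) of the p-values, the expected fraction of false positives among them is at most that level.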