BMC Bioinformatics

Open Access Methodology article

Effect of false positive and false negative rates on inference of binding target conservation across different conditions and species from ChIP-chip data

Debayan Datta1 and Hongyu Zhao3,2*

Author Affiliations

1 Department of Biomedical Engineering, Yale University, New Haven, CT 06520, USA

2 Department of Epidemiology and Public Health, Yale University, New Haven, CT 06520, USA

3 Department of Genetics, Yale University, New Haven, CT 06520, USA

BMC Bioinformatics 2009, 10:23 doi:10.1186/1471-2105-10-23

Published: 19 January 2009

Abstract

Background

ChIP-chip data are routinely used to identify transcription factor binding targets. However, false positives and false negatives in ChIP-chip data complicate analyses, especially when the binding targets of a specific transcription factor are compared across conditions or species.

Results

We propose an Expectation-Maximization (EM) based approach to infer the underlying true counts of "positives" and "negatives" from the observed counts. Using this approach, we study the effect of false positives and false negatives on inferences related to transcription regulation.
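As an illustration of the idea, the following is a minimal sketch of how an EM iteration could recover true counts from observed counts. It is not the authors' implementation. It assumes a 2x2 table of binding calls for one transcription factor in two conditions, known false positive and false negative rates (`fp`, `fn`) that are the same in both conditions, and independent errors across conditions. The function names and the example counts are hypothetical.

```python
from itertools import product


def emission(obs, true, fp, fn):
    """P(observed call | true state) for one gene in one condition."""
    if true == 1:
        return 1.0 - fn if obs == 1 else fn
    return fp if obs == 1 else 1.0 - fp


def em_true_counts(observed, fp, fn, n_iter=500):
    """Estimate the true 2x2 counts behind an observed 2x2 table.

    observed[(i, j)] is the number of genes called i in condition A and
    j in condition B (1 = bound, 0 = not bound). Assumes 0 < fp, fn < 1.
    """
    cells = list(product((0, 1), repeat=2))
    total = sum(observed.values())
    pi = {c: 0.25 for c in cells}  # uniform starting guess for true cell probabilities
    for _ in range(n_iter):
        expected = {c: 0.0 for c in cells}
        for (i, j) in cells:
            # E-step: posterior over the true cell (k, l) given the observed cell (i, j)
            weights = {
                (k, l): pi[(k, l)] * emission(i, k, fp, fn) * emission(j, l, fp, fn)
                for (k, l) in cells
            }
            norm = sum(weights.values())
            for c in cells:
                expected[c] += observed[(i, j)] * weights[c] / norm
        # M-step: re-estimate true cell probabilities from the expected counts
        pi = {c: expected[c] / total for c in cells}
    return {c: pi[c] * total for c in cells}


# Hypothetical observed table, for illustration only
observed = {(1, 1): 60, (1, 0): 90, (0, 1): 90, (0, 0): 5760}
estimated = em_true_counts(observed, fp=0.005, fn=0.4)
for cell in sorted(estimated):
    print(cell, round(estimated[cell], 1))
```

The estimated counts always sum to the observed total, because each observed gene is redistributed across the four true cells in the E-step.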

Conclusion

Our results indicate that if the binding targets are strongly associated across conditions or species (log odds ratio > 4), moderate false positive and false negative rates (0.005 and 0.4, respectively) would not change our inference qualitatively (i.e., the presence or absence of conservation) based on the observed experimental data, even though the observed counts change substantially. However, if the underlying association is marginal, with odds ratios close to 1, moderate to large false positive and false negative rates (0.01 and 0.2, respectively) could mask the underlying association.
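The attenuation described here can be illustrated with a short forward calculation. The sketch below is an assumption-laden example rather than the paper's analysis: it takes two hypothetical true 2x2 tables (one strongly associated, one marginal), applies the error rates quoted above under the assumption of independent, identical errors in both conditions, and compares the true and expected observed log odds ratios. All counts and function names are made up for illustration.

```python
import math
from itertools import product


def expected_observed(true_counts, fp, fn):
    """Expected observed 2x2 counts given true counts and error rates."""
    def emit(obs, true):
        # P(observed call | true state) in a single condition
        if true == 1:
            return 1.0 - fn if obs == 1 else fn
        return fp if obs == 1 else 1.0 - fp

    cells = list(product((0, 1), repeat=2))
    return {
        (i, j): sum(true_counts[(k, l)] * emit(i, k) * emit(j, l) for (k, l) in cells)
        for (i, j) in cells
    }


def log_odds_ratio(table):
    """Natural log of the odds ratio of a 2x2 table keyed by (row, column)."""
    return math.log(table[(1, 1)] * table[(0, 0)] / (table[(1, 0)] * table[(0, 1)]))


# Hypothetical true tables for 6000 genes (not taken from the paper)
strong = {(1, 1): 150, (1, 0): 50, (0, 1): 50, (0, 0): 5750}   # log odds ratio > 4
marginal = {(1, 1): 8, (1, 0): 192, (0, 1): 192, (0, 0): 5608}  # odds ratio near 1

for name, table, fp, fn in [("strong", strong, 0.005, 0.4),
                            ("marginal", marginal, 0.01, 0.2)]:
    observed = expected_observed(table, fp, fn)
    print(name, round(log_odds_ratio(table), 2), round(log_odds_ratio(observed), 2))
```

In both hypothetical cases the observed log odds ratio shrinks toward zero. For the strongly associated table it stays well above zero, whereas for the marginal table it ends up close enough to zero that sampling noise in a real experiment could plausibly hide it.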