Open Access Highly Accessed Methodology article

NEXT-peak: a normal-exponential two-peak model for peak-calling in ChIP-seq data

Nak-Kyeong Kim1*, Rasika V Jayatillake1 and John L Spouge2

Author Affiliations

1 Mathematics and Statistics Department, Old Dominion University, Norfolk, VA 23529, USA

2 Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894-6075, USA

BMC Genomics 2013, 14:349  doi:10.1186/1471-2164-14-349

Published: 25 May 2013

Abstract

Background

Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) can locate transcription factor binding sites on a genomic scale. Although many models and programs are available to call peaks, none has dominated its competitors in comparison studies.

Results

We propose a rigorous statistical model, the normal-exponential two-peak (NEXT-peak) model, which parallels the physical processes generating the empirical data and which can naturally incorporate mappability information. The model therefore estimates the total strength of binding, even if some binding locations do not map uniquely into a reference genome (which effectively censors them); it also assigns an error to each estimated binding location. A comparison study with existing programs on real ChIP-seq datasets (STAT1, NRSF, and ZNF143) demonstrates that the NEXT-peak model performs well both in calling peaks and in locating them. The model also provides a goodness-of-fit test to screen out spurious peaks and to infer multiple binding events in a region.

Conclusions

The NEXT-peak program calls peaks on any test dataset about as accurately as any other program, but provides unusual accuracy in the estimated location of the peaks it calls. Because NEXT-peak is based on rigorous statistics, its model also provides a principled foundation for more elaborate statistical analysis of ChIP-seq data.

Keywords:
ChIP-seq; Normal-exponential distribution; Continuous mixture; Poisson regression; Goodness-of-fit