Open Access software article

HiChIP: a high-throughput pipeline for integrative analysis of ChIP-Seq data

Huihuang Yan1,2, Jared Evans1, Mike Kalmbach1, Raymond Moore1, Sumit Middha1, Stanislav Luban1,3, Liguo Wang1, Aditya Bhagwate1, Ying Li1, Zhifu Sun1, Xianfeng Chen1 and Jean-Pierre A Kocher1*

Author Affiliations

1 Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, 200 1st St SW, Rochester, MN 55905, USA

2 Epigenomics Translational Program, Center for Individualized Medicine, Mayo Clinic, Rochester, MN 55905, USA

3 Current address: Interdisciplinary Bioinformatics and Systems Biology Program, University of California at San Diego, La Jolla, CA 92093-0419, USA


BMC Bioinformatics 2014, 15:280  doi:10.1186/1471-2105-15-280

Published: 15 August 2014

Abstract

Background

Chromatin immunoprecipitation (ChIP) followed by next-generation sequencing (ChIP-Seq) has been widely used to identify genomic loci of transcription factor (TF) binding and histone modifications. ChIP-Seq data analysis involves multiple steps, from read mapping and peak calling to data integration and interpretation. Processing large amounts of ChIP-Seq data derived from different antibodies or experimental designs with a single, consistent approach remains challenging and time-consuming. A comprehensive analysis pipeline with flexible settings is therefore needed to accelerate the use of this powerful technology in epigenetics research.

Results

We have developed a highly integrative pipeline, termed HiChIP, for the systematic analysis of ChIP-Seq data. HiChIP incorporates several open-source software packages, selected on the basis of internal assessments and published comparisons, together with a set of tools developed in-house. The workflow handles both paired-end and single-end ChIP-Seq reads, with or without replicates, for the characterization and annotation of both punctate and diffuse binding sites. The main functionality of HiChIP includes: (a) read quality checking; (b) read mapping and filtering; (c) peak calling and peak consistency analysis; and (d) result visualization. In addition, the pipeline contains modules for generating binding profiles over selected genomic features, de novo motif finding from TF binding sites, and functional annotation of peak-associated genes.
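The abstract does not say how peak consistency between replicates is measured; the keywords point to the irreproducible discovery rate (IDR), which ranks peaks by signal and is considerably more involved. As a much simpler illustration, and explicitly not HiChIP's method, the sketch below computes the fraction of replicate-1 peaks that overlap at least one replicate-2 peak. It assumes peaks are given as hypothetical `(chrom, start, end)` tuples with half-open, BED-style coordinates; the function names `build_index`, `has_overlap` and `overlap_fraction` are illustrative only.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed peak representation: (chromosome, start, end), half-open [start, end)
# as in BED files. This is an illustrative format, not HiChIP's internal one.
Peak = Tuple[str, int, int]
Index = Dict[str, Tuple[List[int], List[int]]]


def build_index(peaks: List[Peak]) -> Index:
    """Group peaks by chromosome, sort by start, and store prefix maxima of ends."""
    grouped = defaultdict(list)
    for chrom, start, end in peaks:
        grouped[chrom].append((start, end))
    index: Index = {}
    for chrom, intervals in grouped.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        prefix_max_end = []
        running = float("-inf")
        for _, e in intervals:
            running = max(running, e)
            prefix_max_end.append(running)
        index[chrom] = (starts, prefix_max_end)
    return index


def has_overlap(index: Index, peak: Peak) -> bool:
    """True if any indexed peak on the same chromosome shares at least one base."""
    chrom, start, end = peak
    if chrom not in index:
        return False
    starts, prefix_max_end = index[chrom]
    # Peaks [0, k) are exactly those whose start is < this peak's end.
    k = bisect.bisect_left(starts, end)
    # Among them, an overlap exists iff the largest end exceeds this peak's start.
    return k > 0 and prefix_max_end[k - 1] > start


def overlap_fraction(rep1: List[Peak], rep2: List[Peak]) -> float:
    """Fraction of rep1 peaks overlapped by at least one rep2 peak."""
    if not rep1:
        return 0.0
    index = build_index(rep2)
    hits = sum(1 for p in rep1 if has_overlap(index, p))
    return hits / len(rep1)


# Toy replicates (made-up coordinates for illustration only).
rep1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
rep2 = [("chr1", 150, 250), ("chr2", 80, 120)]
print(overlap_fraction(rep1, rep2))  # 1 of 3 peaks overlaps
```

Sorting by start and keeping a running maximum of end coordinates lets each query run in logarithmic time instead of comparing every pair of peaks. Note that the two chr2 peaks only touch at position 80 and, with half-open coordinates, do not count as overlapping. A simple overlap fraction like this ignores peak strength, which is precisely what rank-based methods such as IDR take into account.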

Conclusions

HiChIP is a comprehensive analysis pipeline that can be configured to analyze ChIP-Seq data derived from varying antibodies and experimental designs. Using public ChIP-Seq data, we demonstrate that HiChIP is a fast and reliable pipeline for processing large amounts of ChIP-Seq data.

Keywords:
ChIP-Seq; Next-generation sequencing; Peak calling; Duplicate filtering; Irreproducible discovery rate