This article is part of the supplement: Selected articles from the Tenth Asia Pacific Bioinformatics Conference (APBC 2012)

Open Access Proceedings

Improving ChIP-seq peak-calling for functional co-regulator binding by integrating multiple sources of biological information

Hatice Ulku Osmanbeyoglu¹, Ryan J Hartmaier², Steffi Oesterreich² and Xinghua Lu¹*

Author affiliations

1 Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA

2 Department of Pharmacology and Chemical Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA

Citation and License

BMC Genomics 2012, 13(Suppl 1):S1  doi:10.1186/1471-2164-13-S1-S1

Published: 17 January 2012

Abstract

Background

Chromatin immunoprecipitation coupled with massively parallel sequencing (ChIP-seq) is increasingly being applied to study genome-wide binding sites of transcription factors. There is growing interest in understanding the mechanism of action of co-regulator proteins, which do not bind DNA directly but exert their effects by binding to transcription factors such as the estrogen receptor (ER). However, because ChIP-seq detects these proteins only through indirect protein-DNA interactions, signals from co-regulators can be relatively weak, and biologically meaningful interactions remain difficult to identify.

Results

In this study, we investigated and compared different statistical and machine learning approaches, including unsupervised, supervised, and semi-supervised (self-training) classification, to integrate multiple types of genomic and transcriptomic information derived from our experiments and public databases. The goal was to overcome the difficulty of identifying functional DNA binding sites of the co-regulator SRC-1 in the context of estrogen response. Our results indicate that supervised learning with the naïve Bayes algorithm significantly enhances peak calling of weak ChIP-seq signals and outperforms the other machine learning algorithms evaluated. Our integrative approach revealed many potential ERα/SRC-1 DNA binding sites that would otherwise be missed by conventional peak-calling algorithms with default settings.
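To make the idea of integrating several evidence types in a supervised naïve Bayes classifier concrete, the following is a minimal sketch and not the implementation used in this study. It assumes a Gaussian naïve Bayes variant and three hypothetical features per candidate region (ChIP-seq signal, an ER motif score, and the expression change of the nearest gene); the function names and all numeric values are invented purely for illustration.

```python
import math

def fit_gaussian_nb(X, y):
    """Estimate a log prior and per-feature (mean, variance) for each class."""
    model = {}
    n = len(y)
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        stats = []
        for col in zip(*rows):
            mean = sum(col) / len(col)
            # Small floor keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-6
            stats.append((mean, var))
        model[label] = (math.log(len(rows) / n), stats)
    return model

def posterior(model, x, label=1):
    """Posterior probability of `label`, treating features as independent."""
    scores = {}
    for cls, (log_prior, stats) in model.items():
        s = log_prior
        for v, (mean, var) in zip(x, stats):
            s += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        scores[cls] = s
    # Normalise in log space to avoid numerical underflow.
    top = max(scores.values())
    total = sum(math.exp(s - top) for s in scores.values())
    return math.exp(scores[label] - top) / total

# Hypothetical training regions (made-up values, not data from the study).
# Columns: ChIP-seq signal, ER motif score, expression change of nearest gene.
train_X = [
    [8.0, 0.9, 1.5],
    [6.5, 0.8, 1.2],
    [7.2, 0.7, 1.8],
    [2.0, 0.2, 0.1],
    [1.5, 0.3, 0.0],
    [2.5, 0.1, 0.2],
]
train_y = [1, 1, 1, 0, 0, 0]  # 1 = functional binding site, 0 = background

model = fit_gaussian_nb(train_X, train_y)

# Two candidate regions with the same weak ChIP-seq signal but different
# supporting evidence from the other two features.
p_supported = posterior(model, [4.0, 0.85, 1.4])
p_unsupported = posterior(model, [4.0, 0.15, 0.1])
print(p_supported, p_unsupported)
```

In this toy setting the two candidate regions share an identical ChIP-seq signal, so a threshold on signal alone cannot separate them; the classifier assigns them different posterior probabilities only because of the additional features.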

Conclusions

Our results indicate that a supervised classification approach enables one to utilize limited amounts of prior knowledge together with multiple types of biological data to enhance the sensitivity and specificity of identifying DNA binding sites of co-regulator proteins.