Open Access Methodology article

A motif-independent metric for DNA sequence specificity

Luca Pinello1,2, Giosuè Lo Bosco3*, Bret Hanlon4 and Guo-Cheng Yuan1,2*

Author Affiliations

1 Department of Biostatistics, Harvard School of Public Health, 677 Huntington Avenue, Boston MA 02115, USA

2 Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 44 Binney Street, Boston MA 02115, USA

3 Dipartimento di Matematica ed Informatica, Via Archirafi 34, Palermo 90123, Italy

4 Department of Statistics, University of Wisconsin, 1300 University Ave, Madison, WI 53706, USA

BMC Bioinformatics 2011, 12:408  doi:10.1186/1471-2105-12-408

Published: 21 October 2011

Abstract

Background

Genome-wide mapping of protein-DNA interactions has been widely used to investigate biological functions of the genome. An important question is to what extent such interactions are regulated at the DNA sequence level. However, current investigation is hampered by the lack of computational methods for systematically evaluating sequence specificity.

Results

We present a simple, unbiased quantitative measure of DNA sequence specificity called the Motif Independent Measure (MIM). By analyzing both simulated and real experimental data, we found that the MIM can detect sequence specificity independent of the presence of transcription factor (TF) binding motifs. We also found that the level of specificity associated with H3K4me1 target sequences is highly cell-type specific and is highest in embryonic stem (ES) cells. We predicted H3K4me1 target sequences using the N-score model and found that the prediction accuracy is indeed high in ES cells. The software to compute the MIM is freely available at: https://github.com/lucapinello/mim.

Conclusions

Our method provides a unified framework for quantifying DNA sequence specificity and serves as a guide for development of sequence-based prediction models.