Open Access Research article

The value of position-specific priors in motif discovery using MEME

Timothy L Bailey*, Mikael Bodén, Tom Whitington and Philip Machanick

Author Affiliations

Institute for Molecular Bioscience, The University of Queensland, Brisbane 4072, Queensland, Australia

BMC Bioinformatics 2010, 11:179  doi:10.1186/1471-2105-11-179

Published: 9 April 2010

Abstract

Background

Position-specific priors have been shown to be a flexible and elegant way to extend the power of Gibbs sampler-based motif discovery algorithms. Information of many types, including sequence conservation, nucleosome positioning, and negative examples, can be converted into a prior over the location of motif sites, which then guides the sequence motif discovery algorithm. This approach has been shown to confer many of the benefits of conservation-based and discriminative motif discovery approaches on Gibbs sampler-based motif discovery methods, but it has not previously been studied with methods based on expectation maximization (EM).

Results

We extend the popular EM-based MEME algorithm to utilize position-specific priors and demonstrate their effectiveness for discovering transcription factor (TF) motifs in yeast and mouse DNA sequences. Utilizing a discriminative, conservation-based prior dramatically improves MEME's ability to discover motifs in 156 yeast TF ChIP-chip datasets, more than doubling the number of datasets where it finds the correct motif. On these datasets, MEME using the prior has a higher success rate than eight other conservation-based motif discovery approaches. We also show that the same type of prior improves the accuracy of motifs discovered by MEME in mouse TF ChIP-seq data, and that the motifs tend to be of slightly higher quality than those found by a Gibbs sampling algorithm using the same prior.
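
To illustrate how a position-specific prior can guide an EM-style algorithm, the following Python sketch computes the posterior probability of each candidate site start in a single sequence. It is a simplified illustration, not MEME's implementation: it assumes exactly one motif site per sequence, and the names `site_posteriors`, `pwm`, `background`, and `prior` are hypothetical. The posterior for each start position is taken to be proportional to the prior at that position times the motif-versus-background likelihood ratio of the word found there.

```python
import math


def site_posteriors(seq, pwm, background, prior):
    """Posterior over motif start positions in one sequence (illustrative E-step).

    Assumes exactly one site per sequence (an assumption of this sketch).
    seq        -- DNA string
    pwm        -- list of dicts, one per motif column, mapping base -> probability
    background -- dict mapping base -> background probability
    prior      -- prior probability of a site starting at each position
                  (length len(seq) - len(pwm) + 1)
    """
    w = len(pwm)
    n_starts = len(seq) - w + 1
    if len(prior) != n_starts:
        raise ValueError("prior must have one entry per candidate start position")
    if not any(p > 0 for p in prior):
        raise ValueError("prior must give positive mass to at least one position")

    log_scores = []
    for j in range(n_starts):
        if prior[j] <= 0:
            # A zero prior rules this position out entirely.
            log_scores.append(float("-inf"))
            continue
        # Log likelihood ratio of the word at j under the motif vs. the background.
        llr = sum(
            math.log(pwm[k][seq[j + k]] / background[seq[j + k]])
            for k in range(w)
        )
        log_scores.append(math.log(prior[j]) + llr)

    # Normalize in log space for numerical stability.
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [x / total for x in weights]


def column(preferred):
    """Hypothetical motif column strongly favouring one base."""
    return {b: (0.85 if b == preferred else 0.05) for b in "ACGT"}


# Toy example: a 3-column motif favouring the word "GTA".
toy_pwm = [column("G"), column("T"), column("A")]
toy_background = {b: 0.25 for b in "ACGT"}
toy_seq = "ACGTACGT"  # contains "GTA" starting at index 2

uniform_prior = [1.0 / 6] * 6
post_uniform = site_posteriors(toy_seq, toy_pwm, toy_background, uniform_prior)

# A prior that places all of its mass on position 0 overrides the sequence evidence.
forced_prior = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
post_forced = site_posteriors(toy_seq, toy_pwm, toy_background, forced_prior)
```

With a uniform prior the posterior is driven by the sequence alone, so the best match to the motif dominates. A non-uniform prior shifts probability mass toward positions that external evidence, such as conservation, marks as more plausible.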

Conclusions

We conclude that using position-specific priors can substantially increase the power of EM-based motif discovery algorithms such as MEME.