Open Access Highly Accessed Methodology article

Optimized mixed Markov models for motif identification

Weichun Huang (1, 2, 3)*, David M Umbach (2), Uwe Ohler (3) and Leping Li (2)

Author affiliations

1 Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27606, USA

2 Biostatistics Branch, The National Institute of Environmental Health Sciences, National Institutes of Health, RTP, NC 27709, USA

3 Institute for Genome Sciences & Policy, Duke University Medical Center, Durham, NC 27708, USA

Citation and License

BMC Bioinformatics 2006, 7:279  doi:10.1186/1471-2105-7-279

Published: 2 June 2006

Abstract

Background

Identifying functional elements, such as transcription factor binding sites, is a fundamental step in reconstructing gene regulatory networks. It remains challenging, largely because few training samples are available.

Results

We introduce a novel and flexible model, the Optimized Mixture Markov model (OMiMa), and related methods that allow model complexity to be adjusted for different motifs. Compared with other leading methods, OMiMa can incorporate more than the pairwise dependencies modeled by NNSplice, avoids model over-fitting better than the Permuted Variable Length Markov Model (PVLMM), and requires smaller training samples than the Maximum Entropy Model (MEM). We tested OMiMa on both simulated and actual data (regulatory cis-elements and splice sites). It outperformed the other leading methods in prediction accuracy, required training-data size, or computational time. To our knowledge, our OMiMa system is the only motif-finding tool that automatically selects the best model. OMiMa is freely available at [1].
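The abstract does not give OMiMa's equations. The Python sketch below illustrates what mixing Markov models of different orders, and adjusting model complexity, can look like in general. It trains position-specific Markov chains of orders 0 to K on aligned motif sites and mixes their conditional probabilities at every position. It then picks K by held-out log-likelihood. Several choices here are assumptions made for illustration:

- the equal mixing weights;
- the pseudocount of 1;
- the uniform fallback for unseen contexts;
- the uniform background;
- every function name.

OMiMa's actual weight optimization and model-selection criterion are not reproduced here.

```python
import math

ALPHABET = "ACGT"


def train_chain(sites, order, pseudo=1.0):
    """Position-specific Markov chain: P(x_i | x_{i-order}..x_{i-1}) per motif position.

    Contexts are truncated near the start of the motif, where fewer than
    `order` preceding positions exist. The pseudocount is an assumption.
    """
    tables = []
    for i in range(len(sites[0])):
        ctx_len = min(order, i)
        counts = {}
        for s in sites:
            row = counts.setdefault(s[i - ctx_len:i], {a: pseudo for a in ALPHABET})
            row[s[i]] += 1
        probs = {}
        for ctx, row in counts.items():
            total = sum(row.values())
            probs[ctx] = {a: n / total for a, n in row.items()}
        tables.append((ctx_len, probs))
    return tables


def cond_prob(tables, seq, i):
    """P(x_i | context) under one chain; unseen contexts fall back to uniform."""
    ctx_len, probs = tables[i]
    row = probs.get(seq[i - ctx_len:i])
    return row[seq[i]] if row else 1.0 / len(ALPHABET)


def train_mixture(sites, max_order):
    """Chains of orders 0..max_order with equal mixing weights (hypothetical choice)."""
    chains = [train_chain(sites, k) for k in range(max_order + 1)]
    return chains, [1.0 / len(chains)] * len(chains)


def log_likelihood(model, seq):
    """log P(seq) = sum_i log( sum_k w_k * P_k(x_i | context_k) )."""
    chains, weights = model
    return sum(
        math.log(sum(w * cond_prob(c, seq, i) for c, w in zip(chains, weights)))
        for i in range(len(seq))
    )


def log_odds(model, seq, background=None):
    """Motif-vs-background score; a uniform background is assumed by default."""
    bg = background or {a: 1.0 / len(ALPHABET) for a in ALPHABET}
    return log_likelihood(model, seq) - sum(math.log(bg[x]) for x in seq)


def select_max_order(train, heldout, candidates=(0, 1, 2)):
    """Choose the largest chain order K by total held-out log-likelihood."""
    scores = {k: sum(log_likelihood(train_mixture(train, k), s) for s in heldout)
              for k in candidates}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy sites in which positions 1 and 2 are coupled ("CG" or "GC").
    train = ["ACGT"] * 5 + ["AGCT"] * 5
    best, scores = select_max_order(train, ["ACGT", "AGCT"])
    print(best, {k: round(v, 3) for k, v in scores.items()})
    model = train_mixture(train, best)
    print(log_odds(model, "ACGT") > log_odds(model, "ACCT"))
```

On the toy sites, an order-0 model (a position weight matrix) cannot represent the coupling between positions 1 and 2, so the selection step prefers a mixture that includes higher-order chains. Mixing orders, rather than using only the highest one, keeps some probability mass from the better-estimated low-order chains. This is one generic way to limit over-fitting when training sites are scarce.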

Conclusion

Our optimized mixture of Markov models represents an alternative to the existing methods for modeling dependent structures within a biological motif. Our model is conceptually simple and effective, and can improve prediction accuracy and/or computational speed over other leading methods.
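A standard diagnostic can make "dependent structures within a biological motif" concrete: the empirical mutual information (MI) between two columns of aligned sites. It is not specific to OMiMa. The MI is zero when the two columns are independent in the sample, which is what a position weight matrix assumes.

The sketch below is illustrative only. The 0.5-bit threshold and the function names are hypothetical. The plug-in MI estimate is also biased upward for small samples, so a real analysis would need a significance test or a bias correction.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(sites, i, j):
    """Plug-in mutual information (bits) between columns i and j of aligned sites."""
    n = len(sites)
    col_i = Counter(s[i] for s in sites)
    col_j = Counter(s[j] for s in sites)
    joint = Counter((s[i], s[j]) for s in sites)
    return sum(
        (c / n) * math.log2((c / n) / ((col_i[a] / n) * (col_j[b] / n)))
        for (a, b), c in joint.items()
    )


def dependent_pairs(sites, threshold=0.5):
    """Position pairs whose MI exceeds a hypothetical threshold, in bits."""
    pairs = []
    for i, j in combinations(range(len(sites[0])), 2):
        mi = mutual_information(sites, i, j)
        if mi > threshold:
            pairs.append((i, j, mi))
    return pairs


if __name__ == "__main__":
    # Toy sites: positions 1 and 2 are always "CG" or "GC".
    sites = ["ACGT"] * 5 + ["AGCT"] * 5
    print(dependent_pairs(sites))
```

On these toy sites, only the pair (1, 2) is flagged, with 1 bit of mutual information. Every other pair scores 0 because at least one of its columns is constant.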