PMS5: an efficient exact algorithm for the (ℓ, d)-motif finding problem
Department of CSE, University of Connecticut, Storrs, CT 06269, USA
BMC Bioinformatics 2011, 12:410 doi:10.1186/1471-2105-12-410Published: 24 October 2011
Motifs are patterns found in biological sequences that are vital for understanding gene function, human disease, drug design, etc. They are helpful in finding transcriptional regulatory elements, transcription factor binding sites, and so on. As a result, the problem of identifying motifs is very crucial in biology.
Many facets of the motif search problem have been identified in the literature. One of them is (ℓ, d)-motif search (or Planted Motif Search (PMS)). The PMS problem has been well investigated and shown to be NP-hard. Any algorithm for PMS that always finds all the (ℓ, d)-motifs on a given input set is called an exact algorithm. In this paper we focus on exact algorithms only. All the known exact algorithms for PMS take exponential time in some of the underlying parameters in the worst case scenario. But it does not mean that we cannot design exact algorithms for solving practical instances within a reasonable amount of time. In this paper, we propose a fast algorithm that can solve the well-known challenging instances of PMS: (21, 8) and (23, 9). No prior exact algorithm could solve these instances. In particular, our proposed algorithm takes about 10 hours on the challenging instance (21, 8) and about 54 hours on the challenging instance (23, 9). The algorithm has been run on a single 2.4GHz PC with 3GB RAM. The implementation of PMS5 is freely available on the web at http://www.pms.engr.uconn.edu/downloads/PMS5.zip webcite.
We present an efficient algorithm PMS5 that uses some novel ideas and combines them with well-known algorithm PMS1 and PMSPrune. PMS5 can tackle the large challenging instances (21, 8) and (23, 9). Therefore, we hope that PMS5 will help biologists discover longer motifs in the futures.