
This article is part of the supplement: Selected papers from the Seventh Asia-Pacific Bioinformatics Conference (APBC 2009).

Open Access Research

ModuleDigger: an itemset mining framework for the detection of cis-regulatory modules

Hong Sun1, Tijl De Bie2, Valerie Storms3, Qiang Fu3, Thomas Dhollander1, Karen Lemmens1, Annemieke Verstuyf4, Bart De Moor1 and Kathleen Marchal3

1Department of Electrical Engineering, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, 3001 Leuven, Belgium

2Department of Engineering Mathematics, University of Bristol, Bristol BS8 1TR, UK

3Department of Microbial and Molecular Systems, Katholieke Universiteit Leuven, Kasteelpark Arenberg 20, 3001 Leuven, Belgium

4Laboratory for Experimental Medicine and Endocrinology, Katholieke Universiteit Leuven, 3000 Leuven, Belgium


BMC Bioinformatics 2009, 10(Suppl 1):S30 doi:10.1186/1471-2105-10-S1-S30

Published: 30 January 2009

Abstract

Background

The detection of cis-regulatory modules (CRMs) that mediate transcriptional responses in eukaryotes remains a key challenge in the postgenomic era. A CRM is characterized by a set of co-occurring transcription factor binding sites (TFBSs). In silico methods have been developed to search for CRMs by determining which combinations of TFBSs are statistically overrepresented in a given gene set. Most of these methods solve this combinatorial problem with computationally intensive optimization techniques. As a result, their use is limited to small datasets (containing only a few genes) and to binding sites for a restricted number of transcription factors (TFs), from which the optimal module is then selected.
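"Statistically overrepresented" can be made concrete with a one-sided hypergeometric test, a common choice for this kind of enrichment question. This is an assumption for illustration only: the abstract does not say which scoring scheme ModuleDigger uses. The sketch below computes the probability of seeing at least k module-carrying genes in a target set of n genes, when K of the N background genes carry the module. The function name `hypergeom_pvalue` is hypothetical.

```python
from math import comb


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """Upper-tail p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: number of genes in the background (e.g. the genome)
    K: background genes whose promoters carry the candidate module
    n: genes in the target (e.g. coregulated) gene set
    k: target genes whose promoters carry the candidate module
    """
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probabilities of observing exactly i, i+1, ..., upper hits.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Toy numbers (made up): all 3 target genes carry a module that only
# 3 of 10 background genes carry.
print(hypergeom_pvalue(k=3, n=3, K=3, N=10))
```

A small p-value indicates that the module occurs in the gene set more often than random sampling from the background would predict; with many candidate modules, a multiple-testing correction would also be needed.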

Results

We present an itemset mining-based strategy for computationally detecting cis-regulatory modules (CRMs) in a set of genes. We tested our method by applying it to a large benchmark dataset derived from a ChIP-chip analysis, and compared its performance with that of other well-known CRM detection tools.
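To illustrate what "itemset mining" means in this setting, the sketch below treats each gene as a transaction and each TF motif predicted in its promoter as an item. It then enumerates motif combinations that co-occur in at least `min_support` genes, using a generic level-wise, Apriori-style search. This is a textbook frequent-itemset algorithm, not necessarily the miner used by ModuleDigger. The function name, motif labels and toy gene sets are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Level-wise (Apriori-style) enumeration of co-occurring motifs.

    transactions: list of sets; each set holds the TF motifs (items)
                  predicted in one gene's promoter (one set per gene).
    min_support:  minimum number of genes an itemset must occur in.
    Returns a dict mapping frozenset(itemset) -> support count.
    """
    # Level 1: count how many genes contain each single motif.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {k: v for k, v in counts.items() if v >= min_support}
    result = dict(current)

    size = 1
    while current and size < max_size:
        size += 1
        # Candidate generation: join pairs of frequent (size-1)-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != size:
                continue
            # Apriori pruning: every (size-1)-subset must itself be frequent.
            if all(frozenset(s) in current for s in combinations(union, size - 1)):
                candidates.add(union)
        # Support counting: one pass over the genes per candidate.
        current = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t)
            if support >= min_support:
                current[c] = support
        result.update(current)
    return result


# Hypothetical promoter annotations for four genes.
genes = [
    {"CREB", "SP1", "NFY"},
    {"CREB", "SP1"},
    {"SP1", "NFY"},
    {"CREB", "SP1", "NFY", "AP1"},
]
found = frequent_itemsets(genes, min_support=3)
for itemset, support in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

The Apriori pruning step reflects why itemset mining scales well. Here {CREB, NFY} occurs in only two genes, so the triple {CREB, NFY, SP1} is discarded without ever being counted. Each surviving combination would then be a candidate CRM to be ranked by a statistical score.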

Conclusion

We show that by exploiting the computational efficiency of an itemset mining approach and combining it with a well-designed statistical scoring scheme, we were able to prioritize biologically valid CRMs in a large set of coregulated genes, using binding sites for a large number of potential TFs as input.

