This article is part of the supplement: Selected papers from the Seventh Asia-Pacific Bioinformatics Conference (APBC 2009)
ModuleDigger: an itemset mining framework for the detection of cis-regulatory modules
-
* Corresponding author: Kathleen Marchal kathleen.marchal@biw.kuleuven.be
1 Department of Electrical Engineering, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, 3001 Leuven, Belgium
2 Department of Engineering Mathematics, university of Bristol, Bristol BS8 1TR, UK
3 Department of Microbial and Molecular systems, Katholieke Universiteit Leuven, Kasteelpark Arenberg 20, 3001 Leuven, Belgium
4 Laboratory for experimental medicine and endocrinology, Katholieke Universiteit Leuven, 3000 Leuven, Belgium
BMC Bioinformatics 2009, 10(Suppl 1):S30 doi:10.1186/1471-2105-10-S1-S30
Published: 30 January 2009Abstract
Background
The detection of cis-regulatory modules (CRMs) that mediate transcriptional responses in eukaryotes remains a key challenge in the postgenomic era. A CRM is characterized by a set of co-occurring transcription factor binding sites (TFBS). In silico methods have been developed to search for CRMs by determining the combination of TFBS that are statistically overrepresented in a certain geneset. Most of these methods solve this combinatorial problem by relying on computational intensive optimization methods. As a result their usage is limited to finding CRMs in small datasets (containing a few genes only) and using binding sites for a restricted number of transcription factors (TFs) out of which the optimal module will be selected.
Results
We present an itemset mining based strategy for computationally detecting cis-regulatory modules (CRMs) in a set of genes. We tested our method by applying it on a large benchmark data set, derived from a ChIP-Chip analysis and compared its performance with other well known cis-regulatory module detection tools.
Conclusion
We show that by exploiting the computational efficiency of an itemset mining approach and combining it with a well-designed statistical scoring scheme, we were able to prioritize the biologically valid CRMs in a large set of coregulated genes using binding sites for a large number of potential TFs as input.