This article is part of the supplement: Highlights from the Fifth International Society for Computational Biology (ISCB) Student Council Symposium

Open Access Oral presentation

KIRMES: kernel-based identification of regulatory modules in euchromatic sequences

Sebastian J Schultheiss (1,2)*, Wolfgang Busch (2,3), Jan Lohmann (2,4), Oliver Kohlbacher (5) and Gunnar Rätsch (1)

  • * Corresponding author: Sebastian J Schultheiss

Author Affiliations

1 Machine Learning in Biology Research Group, Friedrich Miescher Laboratory of the Max Planck Society, 72076 Tuebingen, Germany

2 Max Planck Institute for Developmental Biology, 72076 Tuebingen, Germany

3 Biology Department, Duke University, Durham, NC 27710, USA

4 Department of Stem Cell Research, University of Heidelberg, 69120 Heidelberg, Germany

5 Simulation of Biological Systems, Wilhelm Schickard Institute for Computer Science, University of Tuebingen, 72076 Tuebingen, Germany

BMC Bioinformatics 2009, 10(Suppl 13):O1  doi:10.1186/1471-2105-10-S13-O1


The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2105/10/S13/O1


Published: 19 October 2009

© 2009 Schultheiss et al; licensee BioMed Central Ltd.

Background

We predict transcription factor (TF) target genes based on their regulatory sequences. A TF binding site is a short segment (~10 bp) near a gene's regulatory region that is recognized by the corresponding TF. In a set of genes enriched with targets of a specific TF, overrepresented motifs can be identified in the regulatory sequences. Gibbs-sampling methods, which try to identify position weight matrices that characterize binding sites, have been successful for small genomes but are problematic in higher eukaryotes, where motifs are degenerate and form cis-regulatory modules [1].

Methods

Our method classifies genes as TF targets. We first apply de novo motif finding and then train a Support Vector Machine with a kernel that captures information about the motifs, their relative locations, and sequence conservation (see Figure 1). The weighted degree kernel with shifts (WDS) computes the similarity of fixed-length sequences. We extend this kernel with conservation information and information about motif co-occurrence to obtain the Regulatory Modules kernel [2]. KIRMES is available on our Galaxy server at http://galaxy.tuebingen.mpg.de. Using positional oligomer importance matrices [3], we make the output of the kernel interpretable by displaying a sequence logo of the oligomers that contributed most to the correct classification.
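To make the notion of a weighted degree kernel more concrete, the following is a minimal sketch in Python and not the KIRMES implementation. The function names wd_kernel and wds_kernel, the parameters max_degree and max_shift, and the specific weights (beta for oligomer length, delta for shift) are assumptions for illustration; they follow the weighting commonly given for these kernels. Conservation weighting and motif co-occurrence are not included.

```python
def _beta(d, max_degree):
    # Assumed weighting: shorter oligomers get a larger weight.
    return 2.0 * (max_degree - d + 1) / (max_degree * (max_degree + 1))


def wd_kernel(x, y, max_degree=3):
    """Weighted degree kernel: count oligomers of length 1..max_degree
    that match at the same position in two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    n = len(x)
    total = 0.0
    for d in range(1, max_degree + 1):
        matches = sum(1 for l in range(n - d + 1) if x[l:l + d] == y[l:l + d])
        total += _beta(d, max_degree) * matches
    return total


def wds_kernel(x, y, max_degree=3, max_shift=1):
    """Weighted degree kernel with shifts: additionally credit oligomers
    that match when one sequence is displaced by up to max_shift positions."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    n = len(x)
    total = 0.0
    for d in range(1, max_degree + 1):
        beta = _beta(d, max_degree)
        for l in range(n - d + 1):
            for s in range(max_shift + 1):
                if l + s + d > n:
                    break  # shifted oligomer would run past the sequence end
                delta = 1.0 / (2 * (s + 1))  # assumed: larger shifts count less
                hits = int(x[l + s:l + s + d] == y[l:l + d]) \
                    + int(x[l:l + d] == y[l + s:l + s + d])
                total += beta * delta * hits
    return total


if __name__ == "__main__":
    a, b = "AACGT", "ACGTA"  # b is a shifted by one position
    print(wd_kernel(a, b, max_degree=2))
    print(wds_kernel(a, b, max_degree=2, max_shift=1))
```

With max_shift set to 0, this sketch reduces to the plain weighted degree kernel. For the example pair, which differs by a one-position displacement, allowing a shift of 1 yields a higher similarity than the position-exact comparison.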

Figure 1. The idea behind the Regulatory Modules kernel: A motif finder is applied to regulatory sequences (long, gray bars) and identifies overrepresented motifs (colored segments). In every sequence, we excise 20 base pairs around the center of each best-matching motif (boxed). Conservation information and the pairwise distances of motifs to each other and to the end of the sequence are added to form the Regulatory Modules kernel, concatenating feature spaces.
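The feature construction described in the caption can be sketched roughly as follows. This is an illustration under stated assumptions, not the published code: the helper names excise_window and positional_features are hypothetical, the motif center positions are assumed to come from an external motif finder, and padding with "N" at sequence boundaries is an assumption made here so that every window has the same length.

```python
from itertools import combinations


def excise_window(seq, center, width=20, pad="N"):
    """Return a fixed-width window of seq around a motif center.
    Positions outside the sequence are filled with pad (an assumption)."""
    start = center - width // 2
    end = start + width
    left_pad = pad * max(0, -start)
    right_pad = pad * max(0, end - len(seq))
    return left_pad + seq[max(0, start):min(len(seq), end)] + right_pad


def positional_features(seq_len, centers):
    """Pairwise distances between motif centers, followed by the
    distance of each center to the end of the sequence."""
    pairwise = [abs(a - b) for a, b in combinations(centers, 2)]
    to_end = [seq_len - c for c in centers]
    return pairwise + to_end


if __name__ == "__main__":
    seq = "ACGT" * 10       # toy regulatory sequence, 40 bp
    centers = [5, 20, 38]   # hypothetical best-match centers of three motifs
    windows = [excise_window(seq, c) for c in centers]
    print(windows)
    print(positional_features(len(seq), centers))
```

The excised windows all have the same length, which is what a fixed-length sequence kernel such as WDS requires, while the distance vector carries the information about where the motifs sit relative to each other.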

Results

We compared the classification performance of our method with that of a state-of-the-art Gibbs sampler, PRIORITY [4], on its own dataset with the published settings. We achieve correct predictions on 74% of their sets vs. 63% for PRIORITY. We also applied KIRMES to gene sets obtained from Arabidopsis thaliana microarray experiments. Using conservation as a weighting for the WDS kernel improves performance. These results illustrate the power of our approach in exploiting the relationships between motifs, as well as conservation, to improve the recognition of TF targets. Interpretable results and an easy-to-use web service make this a valuable tool for any researcher interested in gene regulation.

References

  1. Gupta M, Liu J: De novo cis-regulatory module elicitation for eukaryotic genomes.

    Proc Natl Acad Sci USA 2005, 102(20):7079-7084.

  2. Schultheiss SJ, Busch W, Lohmann JU, Kohlbacher O, Rätsch G: KIRMES: Kernel-based identification of regulatory modules in euchromatic sequences.

    Bioinformatics 2009. epub: 23 April 2009.

  3. Sonnenburg S, Zien A, Philips P, Rätsch G: POIMs: Positional Oligomer Importance Matrices – understanding support vector machine-based signal detectors.

    Bioinformatics 2008, 24(13):i6-14.

  4. Gordan R, Narlikar L, Hartemink A: A fast, alignment-free, conservation-based method for transcription factor binding site discovery.

    In Lecture Notes in Computer Science: RECOMB 2008. Volume 4955. Springer, Heidelberg, Germany; 98-111.