
This article is part of the supplement: Neural Information Processing Systems (NIPS) workshop on New Problems and Methods in Computational Biology.

Open Access Proceedings

Accurate splice site prediction using support vector machines

Sören Sonnenburg*1, Gabriele Schweikert*2,3,4, Petra Philips*2, Jonas Behr2 and Gunnar Rätsch2

1Fraunhofer Institute FIRST, Kekuléstr. 7, 12489 Berlin, Germany

2Friedrich Miescher Laboratory of the Max Planck Society, Spemannstr. 39, 72076 Tübingen, Germany

3Max Planck Institute for Biological Cybernetics, Spemannstr. 38, 72076 Tübingen, Germany

4Max Planck Institute for Developmental Biology, Spemannstr. 35, 72076 Tübingen, Germany

* Contributed equally

BMC Bioinformatics 2007, 8(Suppl 10):S7. doi:10.1186/1471-2105-8-S10-S7

Published: 21 December 2007

Abstract

Background

For splice site recognition, two classification problems must be solved: discriminating true from decoy splice sites, once for acceptor sites and once for donor sites. Gene finding systems typically rely on Markov chains to solve these tasks.
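
To make the Markov chain baseline concrete, here is a minimal sketch of one common way such a discriminator can be built: fit one position-specific (inhomogeneous) first-order Markov chain on windows around true sites and one on decoy windows, then score a candidate by the log-likelihood ratio. The chain order, the add-one pseudocount and the names `train_mc`, `log_likelihood` and `mc_score` are illustrative assumptions, not the exact models used by any particular gene finder or in the comparisons reported here.

```python
import math

ALPHABET = "ACGT"


def train_mc(seqs, pseudocount=1.0):
    """Fit a first-order, position-specific Markov chain (illustrative sketch).

    All training windows are assumed to have the same length.
    Returns (init, trans), where init[a] = log P(x_0 = a) and
    trans[i - 1][(a, b)] = log P(x_i = b | x_{i-1} = a) for i >= 1.
    """
    length = len(seqs[0])
    # Initial-position distribution with pseudocounts.
    init_counts = {a: pseudocount for a in ALPHABET}
    for s in seqs:
        init_counts[s[0]] += 1
    z = sum(init_counts.values())
    init = {a: math.log(c / z) for a, c in init_counts.items()}
    # One transition table per position, because splice site signals are
    # strongly position dependent (hence "inhomogeneous").
    trans = []
    for i in range(1, length):
        counts = {(a, b): pseudocount for a in ALPHABET for b in ALPHABET}
        for s in seqs:
            counts[(s[i - 1], s[i])] += 1
        table = {}
        for a in ALPHABET:
            row_total = sum(counts[(a, b)] for b in ALPHABET)
            for b in ALPHABET:
                table[(a, b)] = math.log(counts[(a, b)] / row_total)
        trans.append(table)
    return init, trans


def log_likelihood(model, seq):
    """Log-probability of a window under a trained chain."""
    init, trans = model
    ll = init[seq[0]]
    for i in range(1, len(seq)):
        ll += trans[i - 1][(seq[i - 1], seq[i])]
    return ll


def mc_score(true_model, decoy_model, seq):
    """Log-likelihood ratio; positive values favour a true splice site."""
    return log_likelihood(true_model, seq) - log_likelihood(decoy_model, seq)
```

A candidate window is then classified by thresholding `mc_score(true_model, decoy_model, window)`, with one pair of chains trained for acceptor sites and another for donor sites.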

Results

In this work we consider Support Vector Machines for splice site recognition. We employ the so-called weighted degree kernel, which turns out to be well suited for this task, as we illustrate in several experiments comparing its prediction accuracy with that of recently proposed systems. We apply our method to the genome-wide recognition of splice sites in Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, Danio rerio, and Homo sapiens. Our performance estimates indicate that splice sites can be recognized very accurately in these genomes and that our method outperforms many other methods, including Markov chains, GeneSplicer and SpliceMachine. We provide genome-wide predictions of splice sites and a stand-alone prediction tool ready for incorporation into a gene finder.

Availability

Data, splits, additional information on the model selection, the whole-genome predictions, and the stand-alone prediction tool are available for download at http://www.fml.mpg.de/raetsch/projects/splice.

