This article is part of the supplement: NIPS workshop on New Problems and Methods in Computational Biology

Learning Interpretable SVMs for Biological Sequence Classification

Gunnar Rätsch1*, Sören Sonnenburg2 and Christin Schäfer2

Author Affiliations

1 Friedrich Miescher Laboratory, Max Planck Society, Spemannstr. 39, Tübingen, Germany

2 Fraunhofer Institute FIRST, Kekuléstr. 7, 12489 Berlin, Germany

BMC Bioinformatics 2006, 7(Suppl 1):S9  doi:10.1186/1471-2105-7-S1-S9

Published: 20 March 2006

Abstract

Background

Support Vector Machines (SVMs), using a variety of string kernels, have been successfully applied to biological sequence classification problems. While SVMs achieve high classification accuracy, they lack interpretability. In many applications it is not enough for an algorithm to detect a biological signal in the sequence; it should also provide a means to interpret its solution in order to gain biological insight.

Results

We propose novel and efficient algorithms for solving the so-called Support Vector Multiple Kernel Learning problem. The developed techniques can be used to understand the obtained support vector decision function in order to extract biologically relevant knowledge about the sequence analysis problem at hand. We apply the proposed methods to the task of acceptor splice site prediction and to the problem of recognizing alternatively spliced exons. Our algorithms compute sparse weightings of substring locations, highlighting which parts of the sequence are important for discrimination.
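To illustrate what a sparse weighting over position-specific kernels means, the following is a minimal sketch, not the algorithm proposed in the article. It assumes a toy kernel that compares single nucleotides at one position, and the weights beta are fixed by hand rather than learned. The names position_kernel and combine_kernels and the example sequences are hypothetical.

```python
import numpy as np

def position_kernel(seqs, pos, k=1):
    """Gram matrix of a toy kernel: 1 if two sequences share the
    k-mer starting at `pos`, else 0 (an illustrative assumption)."""
    kmers = [s[pos:pos + k] for s in seqs]
    n = len(seqs)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = 1.0 if kmers[i] == kmers[j] else 0.0
    return K

def combine_kernels(kernels, beta):
    """Convex combination sum_j beta_j * K_j with beta_j >= 0, sum(beta) = 1."""
    beta = np.asarray(beta, dtype=float)
    if np.any(beta < 0) or not np.isclose(beta.sum(), 1.0):
        raise ValueError("beta must be non-negative and sum to 1")
    return sum(b * K for b, K in zip(beta, kernels))

# Hypothetical toy data: three sequences of length 4, one kernel per position.
seqs = ["ACGT", "ACGA", "TCGT"]
kernels = [position_kernel(seqs, p) for p in range(4)]

# Hand-picked sparse weights: only positions 0 and 3 contribute.
beta = [0.5, 0.0, 0.0, 0.5]
K = combine_kernels(kernels, beta)

# Positions with non-zero weight are the ones a reader would inspect.
important_positions = [p for p, b in enumerate(beta) if b > 0]
print(K)
print(important_positions)
```

In a multiple kernel learning setting the weights would be optimized together with the SVM instead of being chosen by hand; the sketch only shows how zero weights remove positions from the combined kernel, which is what makes the result interpretable.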

Conclusion

The proposed method is able to deal with thousands of examples while combining hundreds of kernels within reasonable time, and reliably identifies a few statistically significant positions.