Open Access Research article

Profiled support vector machines for antisense oligonucleotide efficacy prediction

Gustavo Camps-Valls1*, Alistair M Chalk2, Antonio J Serrano-López1, José D Martín-Guerrero1 and Erik LL Sonnhammer2

Author Affiliations

1 Grup de Processament Digital de Senyals, Universitat de València, Spain. C/ Dr. Moliner, 50. 46100 Burjassot, València, Spain

2 Center for Genomics and Bioinformatics (CGB), Karolinska Institutet, S-17177, Stockholm, Sweden

BMC Bioinformatics 2004, 5:135  doi:10.1186/1471-2105-5-135

Published: 22 September 2004

Abstract

Background

This paper presents the use of Support Vector Machines (SVMs) for prediction and analysis of antisense oligonucleotide (AO) efficacy. The collected database comprises 315 AO molecules, each described by 68 features, which makes the problem well suited to SVMs. Feature selection is crucial given the presence of noisy or redundant features and the well-known curse of dimensionality. We propose a two-stage strategy to develop an optimal model: (1) feature selection using correlation analysis, mutual information, and SVM-based recursive feature elimination (SVM-RFE), and (2) AO prediction using standard and profiled SVM formulations. A profiled SVM gives different weights to different parts of the training data to focus the training on the most important regions.
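As a rough illustration of the profiling idea, the sketch below assigns one training weight per AO according to its measured efficacy. The shape of the profile is an assumption made here for illustration: the function name `profile_weights`, the `emphasis` factor, and the choice to emphasise the low (<0.25) and high (>0.75) efficacy regions are hypothetical and are not taken from the paper's formulation.

```python
def profile_weights(efficacies, low=0.25, high=0.75, emphasis=2.0):
    """Return one training weight per sample (hypothetical profile).

    Samples whose efficacy lies in the assumed regions of interest
    (below `low` or above `high`) receive `emphasis`; all other
    samples receive a weight of 1.0.
    """
    weights = []
    for efficacy in efficacies:
        if efficacy < low or efficacy > high:
            weights.append(emphasis)  # assumed "important" region
        else:
            weights.append(1.0)       # default weight elsewhere
    return weights


if __name__ == "__main__":
    # Toy efficacy values in [0, 1]; not data from the study.
    print(profile_weights([0.10, 0.50, 0.90]))
```

Weights of this kind could then be handed to any learner that accepts per-sample weights, for example through the `sample_weight` argument of scikit-learn's `SVR.fit`, which scales the regularisation constant per sample.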

Results

In the first stage, the SVM-RFE technique was the most efficient and robust given the low number of samples and the high input space dimension. This method yielded an optimal subset of 14 representative features, all of which were related to energy and sequence motifs. The second stage evaluated the performance of the predictors (overall correlation coefficient between observed and predicted efficacy, r; mean error, ME; and root-mean-square error, RMSE) using 8-fold and minus-one-RNA cross-validation methods. The profiled SVM produced the best results (r = 0.44, ME = 0.022, and RMSE = 0.278) and predicted high-efficacy (>75% inhibition of gene expression) and low-efficacy (<25%) AOs with success rates of 83.3% and 82.9%, respectively, which is better than previous approaches. A web server for AO prediction is available online at http://aosvm.cgb.ki.se/.
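The three performance measures named above can be computed as in the following minimal sketch. It assumes that r is the Pearson correlation coefficient and that ME is the signed mean of predicted minus observed efficacy; the abstract does not state the sign convention, so that choice, like the function name `evaluate`, is an assumption.

```python
import math


def evaluate(observed, predicted):
    """Return (r, ME, RMSE) for paired observed and predicted values.

    r    : Pearson correlation coefficient (assumed definition)
    ME   : mean of (predicted - observed), sign convention assumed
    RMSE : root-mean-square error
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long sequences of length >= 2")

    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    # Pearson r from centred sums of products and squares.
    cov = sum((o - mean_obs) * (p - mean_pred)
              for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if var_obs == 0 or var_pred == 0:
        raise ValueError("r is undefined when either sequence is constant")
    r = cov / math.sqrt(var_obs * var_pred)

    errors = [p - o for o, p in zip(observed, predicted)]
    me = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return r, me, rmse


if __name__ == "__main__":
    # Toy values for illustration only; not data from the study.
    r, me, rmse = evaluate([0.1, 0.5, 0.9], [0.2, 0.6, 1.0])
    print(f"r={r:.3f} ME={me:.3f} RMSE={rmse:.3f}")
```

In a cross-validation setting, the predictions from all held-out folds would typically be pooled before calling such a function, so that one overall r, ME, and RMSE is reported.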

Conclusions

The SVM approach is well suited to the AO prediction problem, and yields a prediction accuracy superior to previous methods. The profiled SVM was found to perform better than the standard SVM, suggesting that it could lead to improvements in other prediction problems as well.