Email updates

Keep up to date with the latest news and content from BMC Bioinformatics and BioMed Central.

Open Access Research article

Improving model predictions for RNA interference activities that use support vector machine regression by combining and filtering features

Andrew S Peek

Author Affiliations

Department of Bioinformatics, Integrated DNA Technologies, Inc., 1710 Commercial Park, Coralville, IA 52241, USA

BMC Bioinformatics 2007, 8:182  doi:10.1186/1471-2105-8-182

Published: 6 June 2007

Abstract

Background

RNA interference (RNAi) is a naturally occurring phenomenon that results in the suppression of a target RNA sequence utilizing a variety of possible methods and pathways. To dissect the factors that result in effective siRNA sequences a regression kernel Support Vector Machine (SVM) approach was used to quantitatively model RNA interference activities.

Results

Eight overall feature mapping methods were compared in their abilities to build SVM regression models that predict published siRNA activities. The primary factors in predictive SVM models are position specific nucleotide compositions. The secondary factors are position independent sequence motifs (N-grams) and guide strand to passenger strand sequence thermodynamics. Finally, the factors that are least contributory but are still predictive of efficacy are measures of intramolecular guide strand secondary structure and target strand secondary structure. Of these, the site of the 5' most base of the guide strand is the most informative.

Conclusion

The capacity of specific feature mapping methods and their ability to build predictive models of RNAi activity suggests a relative biological importance of these features. Some feature mapping methods are more informative in building predictive models and overall t-test filtering provides a method to remove some noisy features or make comparisons among datasets. Together, these features can yield predictive SVM regression models with increased predictive accuracy between predicted and observed activities both within datasets by cross validation, and between independently collected RNAi activity datasets. Feature filtering to remove features should be approached carefully in that it is possible to reduce feature set size without substantially reducing predictive models, but the features retained in the candidate models become increasingly distinct. Software to perform feature prediction and SVM training and testing on nucleic acid sequences can be found at the following site: ftp://scitoolsftp.idtdna.com/SEQ2SVM/ webcite.