This article is part of the supplement: The ISIBM International Joint Conferences on Bioinformatics, Systems Biology and Intelligent Computing (IJCBS)
Predicting siRNA potency with random forests and support vector machines
1 Department of Genetics and Biochemistry, Clemson University, Clemson, SC 29634, USA
2 School of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana 47907, USA
3 Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indiana University Purdue University, Indianapolis, Indiana 46202, USA
4 Center for Research in Biological Systems, University of California at San Diego, La Jolla, California 92093-0043, USA
BMC Genomics 2010, 11(Suppl 3):S2 doi:10.1186/1471-2164-11-S3-S2
Published: 1 December 2010
Short interfering RNAs (siRNAs) can be used to knock down gene expression in functional genomics. For a target gene of interest, many siRNA molecules may be designed, but their efficiency in inhibiting expression often varies.
To facilitate gene functional studies, we have developed a new machine learning method to predict siRNA potency based on random forests and support vector machines. Because there were many potential sequence features, random forests were used to select the features most relevant to gene expression inhibition. Support vector machine classifiers were then constructed from the selected sequence features to predict siRNA potency. Interestingly, gene expression inhibition is significantly affected by the nucleotide dimer and trimer compositions of the siRNA sequence.
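As an illustration of what dimer and trimer composition features might look like, the following is a minimal sketch that assumes composition means the relative frequency of each overlapping 2-mer and 3-mer in the sequence. The abstract does not specify the exact encoding, so this definition, the function names `kmer_composition` and `sequence_features`, and the conversion of T to U are assumptions made for illustration only.

```python
from itertools import product

ALPHABET = "ACGU"  # RNA alphabet; DNA-style input is converted below


def kmer_composition(seq, k):
    """Return the relative frequency of every possible k-mer in seq.

    Assumption: overlapping windows, normalised by the number of windows.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = len(seq) - k + 1
    for i in range(windows):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
    total = max(windows, 1)  # avoid division by zero for very short input
    return {kmer: n / total for kmer, n in counts.items()}


def sequence_features(seq):
    """Combine dimer (16) and trimer (64) compositions into one feature dict."""
    features = {}
    for k in (2, 3):
        features.update(kmer_composition(seq, k))
    return features
```

Feature vectors of this kind (80 values per siRNA under these assumptions) could then be ranked with a random forest and passed to a support vector machine classifier, in the spirit of the two-stage approach described above.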
The findings of this study should help in designing potent siRNAs for functional genomics, and might also provide further insights into the molecular mechanism of RNA interference.