Protein-specific prediction of mRNA binding using RNA sequences, binding motifs and predicted secondary structures
1 Department of Information Engineering and Computer Science, University of Trento, Via Sommarive 5, Trento, Italy
2 Current address: Gene Function and Evolution, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, Barcelona, Spain
3 Universitat Pompeu Fabra (UPF), Barcelona, Spain
BMC Bioinformatics 2014, 15:123 doi:10.1186/1471-2105-15-123Published: 29 April 2014
RNA-binding proteins interact with specific RNA molecules to regulate important cellular processes. It is therefore necessary to identify the RNA interaction partners in order to understand the precise functions of such proteins. Protein-RNA interactions are typically characterized using in vivo and in vitro experiments but these may not detect all binding partners. Therefore, computational methods that capture the protein-dependent nature of such binding interactions could help to predict potential binding partners in silico.
We have developed three methods to predict whether an RNA can interact with a particular RNA-binding protein using support vector machines and different features based on the sequence (the Oli method), the motif score (the OliMo method) and the secondary structure (the OliMoSS method). We applied these approaches to different experimentally-derived datasets and compared the predictions with RNAcontext and RPISeq. Oli outperformed OliMoSS and RPISeq, confirming our protein-specific predictions and suggesting that tetranucleotide frequencies are appropriate discriminative features. Oli and RNAcontext were the most competitive methods in terms of the area under curve. A precision-recall curve analysis achieved higher precision values for Oli. On a second experimental dataset including real negative binding information, Oli outperformed RNAcontext with a precision of 0.73 vs. 0.59.
Our experiments showed that features based on primary sequence information are sufficiently discriminating to predict specific RNA-protein interactions. Sequence motifs and secondary structure information were not necessary to improve these predictions. Finally we confirmed that protein-specific experimental data concerning RNA-protein interactions are valuable sources of information that can be used for the efficient training of models for in silico predictions. The scripts are available upon request to the corresponding author.