Evolving stochastic context-free grammars for RNA secondary structure prediction
1 Department of Statistics, University of Oxford, 1 South Parks Road, Oxford, UK
2 Bioinformatics Research Centre, Aarhus University, C.F. Møllers Allé 8, DK–8000 Aarhus C, Denmark
3 Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
BMC Bioinformatics 2012, 13:78 doi:10.1186/1471-2105-13-78
Published: 4 May 2012
Stochastic Context-Free Grammars (SCFGs) were applied successfully to RNA secondary structure prediction in the early 1990s, and used in combination with comparative methods in the late 1990s. The set of SCFGs potentially useful for RNA secondary structure prediction is very large, but a few intuitively designed grammars have remained dominant. In this paper we investigate two automatic techniques for finding effective grammars: exhaustive search for very compact grammars, and an evolutionary algorithm for larger grammars. We also examine whether grammar ambiguity is as problematic for structure prediction as has been previously suggested.
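To make concrete what a compact SCFG for RNA secondary structure looks like, the following minimal sketch samples dot-bracket structures from a small grammar in the style of the Knudsen-Hein grammar, one of the hand-designed grammars of the kind referred to above. The rule probabilities, the names RULES, sample_structure and is_balanced, and the choice to emit structure symbols rather than nucleotides are assumptions made for illustration; they are not taken from the paper.

```python
import random

# Grammar in the Knudsen-Hein style, written over structure symbols:
# "." is an unpaired base, "(" and ")" are the two sides of a base pair.
# The probabilities are illustrative assumptions, not trained values.
RULES = {
    "S": [(0.6, ["L", "S"]), (0.4, ["L"])],
    "L": [(0.8, ["."]), (0.2, ["(", "F", ")"])],
    "F": [(0.7, ["(", "F", ")"]), (0.3, ["L", "S"])],
}


def sample_structure(rng, start="S"):
    """Sample one dot-bracket string by expanding nonterminals left to right."""
    stack = [start]
    out = []
    while stack:
        symbol = stack.pop()
        if symbol in RULES:
            weights = [p for p, _ in RULES[symbol]]
            bodies = [rhs for _, rhs in RULES[symbol]]
            rhs = rng.choices(bodies, weights=weights)[0]
            # Push in reverse so the leftmost symbol is expanded first.
            stack.extend(reversed(rhs))
        else:
            out.append(symbol)
    return "".join(out)


def is_balanced(structure):
    """Check that every ")" closes an earlier "(" and none is left open."""
    depth = 0
    for ch in structure:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        print(sample_structure(rng))
```

Every string the grammar derives is a well-nested structure, which is why such grammars can assign probabilities to secondary structures. A search over grammars, as described above, varies the set of rules itself; a sketch like this fixes one rule set and only shows what a single candidate generates.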
These search techniques were applied to predict RNA secondary structure on a maximal data set and revealed new and interesting grammars, though none is dramatically better than the classic grammars. In general, the results showed that many grammars with quite different structures can have very similar predictive ability. Many ambiguous grammars were found that were at least as effective as the best current unambiguous grammars.
Overall, the method of evolving SCFGs for RNA secondary structure prediction proved effective in finding many grammars with strong predictive accuracy, as good as or slightly better than those designed manually. Furthermore, several of the best grammars found were ambiguous, demonstrating that such grammars should not be disregarded.