Open Access Research article

Quantifying variances in comparative RNA secondary structure prediction

James WJ Anderson1*, Ádám Novák1, Zsuzsanna Sükösd2, Michael Golden3, Preeti Arunapuram4, Ingolfur Edvardsson5 and Jotun Hein1

Author Affiliations

1 Department of Statistics, South Parks Road, Oxford, UK

2 Bioinformatics Research Center, Aarhus University, Aarhus, Denmark

3 Computational Biology Group, University of Cape Town, Rondebosch, South Africa

4 Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA

5 Department of Mathematics, Reykjavik University, Reykjavik, Iceland

BMC Bioinformatics 2013, 14:149  doi:10.1186/1471-2105-14-149

Published: 1 May 2013

Abstract

Background

With the advancement of next-generation sequencing and transcriptomics technologies, regulatory effects involving RNA, in particular RNA structural changes, are being detected. These results often rely on RNA secondary structure predictions. However, current approaches to RNA secondary structure modelling produce predictions with a high variance in predictive accuracy, and we have little quantifiable knowledge about the reasons for these variances.

Results

In this paper we explore a number of factors which can contribute to poor RNA secondary structure prediction quality. We establish a quantified relationship between alignment quality and loss of accuracy. Furthermore, we define two new measures to quantify uncertainty in alignment-based structure predictions. One of the measures improves on the “reliability score” reported by PPfold, and considers alignment uncertainty as well as base-pair probabilities. The other measure considers the information entropy for SCFGs over a space of input alignments.

Conclusions

Our predictive accuracy improves on the PPfold reliability score. We can successfully characterize many of the underlying reasons for, and variances in, poor prediction. However, there is still variability unaccounted for, which we therefore suggest comes from the RNA secondary structure predictive model itself.