<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-12-429</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Lost in folding space? Comparing four variants of the thermodynamic model for RNA secondary structure prediction</p>
         </title>
         <aug>
            <au id="A1"><snm>Janssen</snm><fnm>Stefan</fnm><insr iid="I1"/><email>sjanssen@techfak.uni-bielefeld.de</email></au>
            <au id="A2"><snm>Schudoma</snm><fnm>Christian</fnm><insr iid="I2"/><email>schudoma@mpimp-golm.mpg.de</email></au>
            <au id="A3" ca="yes"><snm>Steger</snm><fnm>Gerhard</fnm><insr iid="I3"/><email>steger@biophys.uni-duesseldorf.de</email></au>
            <au id="A4" ca="yes"><snm>Giegerich</snm><fnm>Robert</fnm><insr iid="I1"/><email>robert@techfak.uni-bielefeld.de</email></au>
         </aug>
         <insg>
            <ins id="I1"><p>Faculty of Technology, Bielefeld University, 33615 Bielefeld, Germany</p></ins>
            <ins id="I2"><p>Bioinformatics Group, Max Planck Institute of Molecular Plant Physiology, 14476 Potsdam, Germany</p></ins>
            <ins id="I3"><p>Institut f&#252;r Physikalische Biologie, Heinrich-Heine-Universit&#228;t D&#252;sseldorf, 40204 D&#252;sseldorf, Germany</p></ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2011</pubdate>
         <volume>12</volume>
         <issue>1</issue>
         <fpage>429</fpage>
         <url>http://www.biomedcentral.com/1471-2105/12/429</url>
         <xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-12-429</pubid><pubid idtype="pmpid">22051375</pubid></pubidlist></xrefbib>
      </bibl>
      <history><rec><date><day>10</day><month>6</month><year>2011</year></date></rec><acc><date><day>3</day><month>11</month><year>2011</year></date></acc><pub><date><day>3</day><month>11</month><year>2011</year></date></pub></history>
      <cpyrt><year>2011</year><collab>Janssen et al; licensee BioMed Central Ltd.</collab><note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Many bioinformatics tools for RNA secondary structure analysis are based on a thermodynamic model of RNA folding. They predict a single, "optimal" structure by free energy minimization, they enumerate near-optimal structures, they compute base pair probabilities and dot plots, representative structures of different abstract shapes, or Boltzmann probabilities of structures and shapes. Although all programs refer to the same physical model, they implement it with considerable variation for different tasks, and little is known about the effects of heuristic assumptions and model simplifications used by the programs on the outcome of the analysis.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We extract four different models of the thermodynamic folding space which underlie the programs RNA<smcaps>FOLD</smcaps>, RNA<smcaps>SHAPES</smcaps>, and RNA<smcaps>SUBOPT</smcaps>. Their differences lie within the details of the energy model and the granularity of the folding space. We implement probabilistic shape analysis for all models, and introduce the <it>shape probability shift </it>as a robust measure of model similarity. Using four data sets derived from experimentally solved structures, we provide a quantitative evaluation of the model differences.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>We find that search space granularity affects the computed shape probabilities less than the over- or underapproximation of free energy by a simplified energy model. Still, the approximations perform similarly enough to implementations of the full model to justify their continued use in settings where computational constraints call for simpler algorithms. As a side observation, the rarely used level 2 shapes, which predict the complete arrangement of helices, multiloops, internal loops and bulges, include the "true" shape among a rather small number of predicted high-probability shapes. This calls for an investigation of new strategies to extract high-probability members from the (very large) level 2 shape space of an RNA sequence. We provide implementations of all four models, written in a declarative style that makes them easy to modify. Future work on thermodynamic RNA folding can choose a model based on our empirical data and can take our implementations as a starting point for further program development.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <sec>
            <st>
               <p>Motivation</p>
            </st>
            <p>A wide variety of bioinformatics tools exists for analyzing RNA secondary structure based on an experimentally supported thermodynamic model of RNA folding <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Typical tasks performed by such tools are</p>
            <p indent="1">&#8226; prediction of a single, "optimal" structure of minimal free energy,</p>
            <p indent="1">&#8226; computation of near-optimal structures, either by complete enumeration up to a certain energy threshold, or by sampling from the folding space,</p>
            <p indent="1">&#8226; computation of base pair probabilities and dot plots,</p>
            <p indent="1">&#8226; computation of representative structures of different abstract shapes, or</p>
            <p indent="1">&#8226; computation of Boltzmann probabilities, either of individual structures, or accumulated over all structures of the same abstract shape.</p>
            <p>From a macroscopic point of view, all these approaches are based on the same thermodynamic model, but on closer inspection this does not hold. Algorithms for different tasks make different assumptions about the folding space, and little is known about the extent to which these assumptions influence the outcome of the analysis.</p>
            <p>The present study is designed to fill this gap. We explicate the details of four different models of the RNA folding space, named NoDangle, OverDangle, MicroState and MacroState, as they are implemented in the programs RNA<smcaps>FOLD</smcaps><abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, RNA<smcaps>SHAPES</smcaps><abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, and RNA<smcaps>SUBOPT</smcaps><abbrgrp><abbr bid="B4">4</abbr></abbrgrp>.<sup>1 </sup>We compare the outcome of predictions from the different models, and evaluate them against four data sets derived from experimentally determined structures.</p>
         </sec>
         <sec>
            <st>
               <p>Goals of the evaluation</p>
            </st>
            <p>The goal of this study is not to define a "correct" or "best" way of modeling the RNA folding space. Different definitions may retain their merits in the light of different computational constraints. We want to explicate the differences in the results which are due to the choice of a particular model. Aside from being interesting in its own right, this allows future algorithm designers to make a well-founded choice of the model they base their work on.</p>
            <p>How should the performance of different models be compared? A first idea would be to evaluate them with respect to prediction of the structure of minimum free energy (MFE; for details see below), using a reference set of trusted structures. This has been done occasionally <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B5">5</abbr></abbrgrp>, and we will include such an evaluation here for the sake of completeness. However, MFE structure prediction is notoriously sensitive: a slight offset in energy can lead to a radically different structure. This is a consequence of the underlying thermodynamic model, and not due to its inadequate implementation. For a more robust evaluation, we need a measure which characterizes the overall folding space of an RNA molecule more comprehensively, including evidence for competing near-optimal structures of significant structural variation.</p>
            <p>Abstract shapes of RNA <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B6">6</abbr></abbrgrp> provide such a measure. This approach provides two essential types of analysis: (1) to compute a manageable set of representative, near-optimal structures, which are different enough to be of interest, and (2) to compute shape probabilities, which accumulate individual Boltzmann probabilities over all structures of the same shape. The shape probability is a robust measure of structural well-definedness, and in contrast to folding energy, it is independent of base composition and meaningful for comparing foldings of different sequences of similar length.</p>
            <p>Types (1) and (2) of abstract shape analysis are achieved by different algorithms, using different models of the folding space, in the program RNA<smcaps>SHAPES</smcaps>. A similar situation prevails within the Vienna RNA package, where different models of the folding space are used with various functions of RNA<smcaps>FOLD</smcaps> and RNA<smcaps>SUBOPT</smcaps> under different parameter settings.</p>
            <p>For our evaluation, we implement probabilistic shape analysis in four different ways, three of which closely correspond to the folding space models implemented for MFE prediction in RNA<smcaps>FOLD</smcaps><sup>2</sup>, and two of which correspond to the algorithms used in RNA<smcaps>SHAPES</smcaps>. This set of programs will allow us to derive observations about the underlying folding space models.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <p>In this section, we recall the definitions underlying the thermodynamic model of RNA folding, and then proceed to specify four different implementations of this model.</p>
         <sec>
            <st>
               <p>The thermodynamic model</p>
            </st>
            <sec>
               <st>
                  <p>Free energy and partition function</p>
               </st>
               <p>Structure formation of a single-stranded nucleic acid sequence <it>x</it>--from an unfolded, random coil structure <it>c </it>into the folded structure <it>s</it>--is a standard equilibrium reaction with temperature-dependent free energy <inline-formula><m:math name="1471-2105-12-429-i1" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi mathvariant="normal">&#916;</m:mi>
   <m:msubsup>
      <m:mrow>
         <m:mi>G</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>T</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mn>0</m:mn>
      </m:mrow>
   </m:msubsup>
</m:mrow>
</m:math></inline-formula> and equilibrium constant <it>K<sub>T</sub></it>:</p>
               <p>
                  <display-formula>
                     <m:math name="1471-2105-12-429-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mtable class="aligned">
      <m:mtr>
         <m:mtd columnalign="right">
            <m:mi>c</m:mi>
         </m:mtd>
         <m:mtd columnalign="left">
            <m:mi>&#8652;</m:mi>
            <m:mi>s</m:mi>
         </m:mtd>
         <m:mtd columnalign="right"/>
      </m:mtr>
      <m:mtr>
         <m:mtd columnalign="right">
            <m:msub>
               <m:mrow>
                  <m:mi>K</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mi>T</m:mi>
               </m:mrow>
            </m:msub>
         </m:mtd>
         <m:mtd columnalign="left">
            <m:mo class="MathClass-rel">=</m:mo>
            <m:mfrac>
               <m:mrow>
                  <m:mrow>
                     <m:mo class="MathClass-open">[</m:mo>
                     <m:mrow>
                        <m:mi>s</m:mi>
                     </m:mrow>
                     <m:mo class="MathClass-close">]</m:mo>
                  </m:mrow>
               </m:mrow>
               <m:mrow>
                  <m:mrow>
                     <m:mo class="MathClass-open">[</m:mo>
                     <m:mrow>
                        <m:mi>c</m:mi>
                     </m:mrow>
                     <m:mo class="MathClass-close">]</m:mo>
                  </m:mrow>
               </m:mrow>
            </m:mfrac>
         </m:mtd>
      </m:mtr>
      <m:mtr>
         <m:mtd columnalign="right">
            <m:mi mathvariant="normal">&#916;</m:mi>
            <m:msubsup>
               <m:mrow>
                  <m:mi>G</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mi>T</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mn>0</m:mn>
               </m:mrow>
            </m:msubsup>
         </m:mtd>
         <m:mtd columnalign="left">
            <m:mo class="MathClass-rel">=</m:mo>
            <m:mo class="MathClass-bin">-</m:mo>
            <m:mi>R</m:mi>
            <m:mi>T</m:mi>
            <m:mo class="qopname">ln</m:mo>
            <m:msub>
               <m:mrow>
                  <m:mi>K</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mi>T</m:mi>
               </m:mrow>
            </m:msub>
            <m:mspace width="2.77695pt" class="tmspace"/>
            <m:mo class="MathClass-punc">.</m:mo>
         </m:mtd>
      </m:mtr>
      <m:mtr>
         <m:mtd columnalign="right"/>
      </m:mtr>
   </m:mtable>
</m:mrow>
</m:math>
                  </display-formula>
               </p>
               <p>The number of possible secondary structures of a single sequence <it>x</it>, i.e., the size of its folding space <it>F</it>(<it>x</it>), grows exponentially with the sequence length <it>n </it><abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>. These possible structures <it>s<sub>i </sub></it>of a single sequence coexist in solution with concentrations dependent on their free energies &#916;<it>G</it><sup>0</sup>(<it>s<sub>i</sub></it>); that is, each structure is present as a fraction <inline-formula><m:math name="1471-2105-12-429-i3" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:msub>
      <m:mrow>
         <m:mi>p</m:mi>
      </m:mrow>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>s</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:msub>
</m:mrow>
</m:math></inline-formula> according to its Boltzmann probability</p>
               <p>
                  <display-formula>
                     <m:math name="1471-2105-12-429-i4" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:msub>
      <m:mrow>
         <m:mi>p</m:mi>
      </m:mrow>
      <m:mrow>
         <m:msub>
            <m:mrow>
               <m:mi>s</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
      </m:mrow>
   </m:msub>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:mo class="qopname"> exp</m:mo>
   <m:mfenced separators="" open="(" close=")">
      <m:mrow>
         <m:mo class="MathClass-bin">-</m:mo>
         <m:mfrac>
            <m:mrow>
               <m:mi mathvariant="normal">&#916;</m:mi>
               <m:msubsup>
                  <m:mrow>
                     <m:mi>G</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>T</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mn>0</m:mn>
                  </m:mrow>
               </m:msubsup>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:msub>
                        <m:mrow>
                           <m:mi>s</m:mi>
                        </m:mrow>
                        <m:mrow>
                           <m:mi>i</m:mi>
                        </m:mrow>
                     </m:msub>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
            </m:mrow>
            <m:mrow>
               <m:mi>R</m:mi>
               <m:mi>T</m:mi>
            </m:mrow>
         </m:mfrac>
      </m:mrow>
   </m:mfenced>
   <m:mo class="MathClass-bin">&#8725;</m:mo>
   <m:mi>Q</m:mi>
</m:mrow>
</m:math>
                  </display-formula>
               </p>
               <p>given by its molar Boltzmann weight <inline-formula><m:math name="1471-2105-12-429-i5" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mo class="qopname">exp</m:mo>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:mo class="MathClass-bin">-</m:mo>
         <m:mi mathvariant="normal">&#916;</m:mi>
         <m:msubsup>
            <m:mrow>
               <m:mi>G</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>T</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mn>0</m:mn>
            </m:mrow>
         </m:msubsup>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:msub>
                  <m:mrow>
                     <m:mi>s</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>i</m:mi>
                  </m:mrow>
               </m:msub>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
         <m:mo class="MathClass-bin">&#8725;</m:mo>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:mi>R</m:mi>
               <m:mi>T</m:mi>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
</m:mrow>
</m:math></inline-formula> and the partition function <it>Q </it>for the ensemble of all possible structures</p>
               <p>
                  <display-formula>
                     <m:math name="1471-2105-12-429-i6" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>Q</m:mi>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:munder class="msub">
      <m:mrow>
         <m:mo mathsize="big"> &#8721;</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">all</m:mtext>
         </m:mstyle>
         <m:mspace width="2.77695pt" class="tmspace"/>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">structures&#160;</m:mtext>
         </m:mstyle>
         <m:msub>
            <m:mrow>
               <m:mstyle class="text">
                  <m:mtext class="textsf" mathvariant="sans-serif">s</m:mtext>
               </m:mstyle>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
         <m:mo class="MathClass-rel">&#8712;</m:mo>
         <m:mi>F</m:mi>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">(</m:mtext>
         </m:mstyle>
         <m:mi>x</m:mi>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">)</m:mtext>
         </m:mstyle>
      </m:mrow>
   </m:munder>
   <m:mo class="qopname">exp</m:mo>
   <m:mfenced separators="" open="(" close=")">
      <m:mrow>
         <m:mo class="MathClass-bin">-</m:mo>
         <m:mfrac>
            <m:mrow>
               <m:mi mathvariant="normal">&#916;</m:mi>
               <m:msubsup>
                  <m:mrow>
                     <m:mi>G</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>T</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mn>0</m:mn>
                  </m:mrow>
               </m:msubsup>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:msub>
                        <m:mrow>
                           <m:mi>s</m:mi>
                        </m:mrow>
                        <m:mrow>
                           <m:mi>i</m:mi>
                        </m:mrow>
                     </m:msub>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
            </m:mrow>
            <m:mrow>
               <m:mi>R</m:mi>
               <m:mi>T</m:mi>
            </m:mrow>
         </m:mfrac>
      </m:mrow>
   </m:mfenced>
   <m:mspace width="2.77695pt" class="tmspace"/>
   <m:mo class="MathClass-punc">.</m:mo>
</m:mrow>
</m:math>
                  </display-formula>
               </p>
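               <p>The relation between free energies, the partition function <it>Q</it> and Boltzmann probabilities can be illustrated with a minimal Python sketch. The four energy values form a hypothetical toy folding space and are not taken from any data set; the function name and the temperature of 37 degrees Celsius are likewise assumptions made for illustration.</p>

```python
import math

R = 0.0019872   # gas constant in kcal/(mol*K)
T = 310.15      # assumed temperature: 37 degrees Celsius in kelvin


def boltzmann_probabilities(energies, temperature=T):
    """Map free energies (kcal/mol) to Boltzmann probabilities.

    Returns the list of probabilities and the partition function Q.
    """
    rt = R * temperature
    # Boltzmann weight exp(-dG / RT) of every structure
    weights = [math.exp(-g / rt) for g in energies]
    # Partition function: sum of the weights over the whole folding space
    q = sum(weights)
    return [w / q for w in weights], q


# Hypothetical free energies of four structures (illustration only)
energies = [-5.0, -4.5, -3.0, 0.0]
probs, q = boltzmann_probabilities(energies)
for g, p in zip(energies, probs):
    print(f"dG = {g:5.1f} kcal/mol  p = {p:.4f}")
```

               <p>In this toy ensemble the structure of lowest free energy receives the largest probability, but a competitor only 0.5 kcal/mol higher still holds a substantial share of the ensemble.</p>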
               <p>The structure of lowest free energy is called the (thermodynamically) optimal structure or structure of minimum free energy (MFE).</p>
               <p>RNA secondary structures are conveniently represented as dot-bracket strings, such as</p>
               <p>
                  <display-formula id="M1">
                     <m:math name="1471-2105-12-429-i7" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mstyle class="text">
      <m:mtext class="textsf" mathvariant="sans-serif">&#8220;</m:mtext>
   </m:mstyle>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:mo class="MathClass-punc">.</m:mo>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:mrow>
                        <m:mo class="MathClass-open">(</m:mo>
                        <m:mrow>
                           <m:mrow>
                              <m:mo class="MathClass-open">(</m:mo>
                              <m:mrow>
                                 <m:mrow>
                                    <m:mo class="MathClass-open">(</m:mo>
                                    <m:mrow>
                                       <m:mo class="MathClass-punc">.</m:mo>
                                       <m:mo class="MathClass-punc">.</m:mo>
                                       <m:mrow>
                                          <m:mo class="MathClass-open">(</m:mo>
                                          <m:mrow>
                                             <m:mrow>
                                                <m:mo class="MathClass-open">(</m:mo>
                                                <m:mrow>
                                                   <m:mrow>
                                                      <m:mo class="MathClass-open">(</m:mo>
                                                      <m:mrow>
                                                         <m:mo class="MathClass-punc">.</m:mo>
                                                         <m:mo class="MathClass-punc">.</m:mo>
                                                         <m:mo class="MathClass-punc">.</m:mo>
                                                      </m:mrow>
                                                      <m:mo class="MathClass-close">)</m:mo>
                                                   </m:mrow>
                                                </m:mrow>
                                                <m:mo class="MathClass-close">)</m:mo>
                                             </m:mrow>
                                          </m:mrow>
                                          <m:mo class="MathClass-close">)</m:mo>
                                       </m:mrow>
                                       <m:mo class="MathClass-punc">.</m:mo>
                                       <m:mo class="MathClass-punc">.</m:mo>
                                       <m:mo class="MathClass-punc">.</m:mo>
                                       <m:mo class="MathClass-punc">.</m:mo>
                                       <m:mo class="MathClass-punc">.</m:mo>
                                       <m:mrow>
                                          <m:mo class="MathClass-open">(</m:mo>
                                          <m:mrow>
                                             <m:mrow>
                                                <m:mo class="MathClass-open">(</m:mo>
                                                <m:mrow>
                                                   <m:mo class="MathClass-punc">.</m:mo>
                                                   <m:mrow>
                                                      <m:mo class="MathClass-open">(</m:mo>
                                                      <m:mrow>
                                                         <m:mrow>
                                                            <m:mo class="MathClass-open">(</m:mo>
                                                            <m:mrow>
                                                               <m:mo class="MathClass-punc">.</m:mo>
                                                               <m:mo class="MathClass-punc">.</m:mo>
                                                               <m:mo class="MathClass-punc">.</m:mo>
                                                               <m:mo class="MathClass-punc">.</m:mo>
                                                               <m:mo class="MathClass-punc">.</m:mo>
                                                            </m:mrow>
                                                            <m:mo class="MathClass-close">)</m:mo>
                                                         </m:mrow>
                                                      </m:mrow>
                                                      <m:mo class="MathClass-close">)</m:mo>
                                                   </m:mrow>
                                                   <m:mo class="MathClass-punc">.</m:mo>
                                                   <m:mo class="MathClass-punc">.</m:mo>
                                                   <m:mo class="MathClass-punc">.</m:mo>
                                                </m:mrow>
                                                <m:mo class="MathClass-close">)</m:mo>
                                             </m:mrow>
                                          </m:mrow>
                                          <m:mo class="MathClass-close">)</m:mo>
                                       </m:mrow>
                                       <m:mo class="MathClass-punc">.</m:mo>
                                    </m:mrow>
                                    <m:mo class="MathClass-close">)</m:mo>
                                 </m:mrow>
                              </m:mrow>
                              <m:mo class="MathClass-close">)</m:mo>
                           </m:mrow>
                        </m:mrow>
                        <m:mo class="MathClass-close">)</m:mo>
                     </m:mrow>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mstyle class="text">
      <m:mtext class="textsf" mathvariant="sans-serif">&#8221;</m:mtext>
   </m:mstyle>
</m:mrow>
</m:math>
                  </display-formula>
               </p>
               <p>where matched parentheses indicate a base pair and dots indicate unpaired bases.</p>
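               <p>As an illustration, a dot-bracket string can be converted into an explicit list of base pairs with a simple stack. The following Python sketch uses the example structure (1); the function name is an assumption made here and does not refer to any of the programs discussed.</p>

```python
def base_pairs(dot_bracket):
    """Return the base pairs (i, j), 0-based, encoded by a dot-bracket string."""
    stack = []   # positions of opening brackets that are still unmatched
    pairs = []
    for pos, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


# Example structure (1) from the text
structure = "((.((((..(((...))).....((.((.....))...)).))))))"
pairs = base_pairs(structure)
print(len(structure), "bases,", len(pairs), "base pairs")
```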
            </sec>
            <sec>
               <st>
                  <p>Abstract shapes</p>
               </st>
               <p>Many of the possible structures differ from each other by only tiny structural rearrangements like addition or removal of a base pair, or a slight shift in position of a small bulge loop. Such structures can be pooled according to their abstract shape. Generally, an abstract shape gives information about the arrangement of structural elements such as helices, but not about concrete base pairs <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B6">6</abbr></abbrgrp>. The MFE structure within each shape class is called "shrep", which is short for shape representative structure. The partition function <it>Q<sub>p </sub></it>for the ensemble of all structures of shape <it>p </it>is</p>
               <p>
                  <display-formula>
                     <m:math name="1471-2105-12-429-i8" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:msub>
      <m:mrow>
         <m:mi>Q</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>p</m:mi>
      </m:mrow>
   </m:msub>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:munder class="msub">
      <m:mrow>
         <m:mo mathsize="big"> &#8721;</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">all&#160;structures&#160;</m:mtext>
         </m:mstyle>
         <m:msub>
            <m:mrow>
               <m:mi>s</m:mi>
            </m:mrow>
            <m:mrow>
               <m:mi>i</m:mi>
            </m:mrow>
         </m:msub>
         <m:mo class="MathClass-rel">&#8712;</m:mo>
         <m:mi>p</m:mi>
      </m:mrow>
   </m:munder>
   <m:mo class="qopname"> exp</m:mo>
   <m:mfenced separators="" open="(" close=")">
      <m:mrow>
         <m:mo class="MathClass-bin">-</m:mo>
         <m:mfrac>
            <m:mrow>
               <m:mi mathvariant="normal">&#916;</m:mi>
               <m:msubsup>
                  <m:mrow>
                     <m:mi>G</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>T</m:mi>
                  </m:mrow>
                  <m:mrow>
                     <m:mn>0</m:mn>
                  </m:mrow>
               </m:msubsup>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:msub>
                        <m:mrow>
                           <m:mi>s</m:mi>
                        </m:mrow>
                        <m:mrow>
                           <m:mi>i</m:mi>
                        </m:mrow>
                     </m:msub>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
            </m:mrow>
            <m:mrow>
               <m:mi>R</m:mi>
               <m:mi>T</m:mi>
            </m:mrow>
         </m:mfrac>
      </m:mrow>
   </m:mfenced>
   <m:mspace width="2.77695pt" class="tmspace"/>
   <m:mo class="MathClass-punc">.</m:mo>
</m:mrow>
</m:math>
                  </display-formula>
               </p>
               <p>Since every structure belongs to exactly one shape class, the partition functions of all shape classes sum up to the partition function of the full ensemble:</p>
               <p>
                  <display-formula>
                     <m:math name="1471-2105-12-429-i9" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>Q</m:mi>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:munder class="msub">
      <m:mrow>
         <m:mo mathsize="big"> &#8721;</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">all&#160;shape&#160;classes&#160;</m:mtext>
         </m:mstyle>
         <m:mi>p</m:mi>
      </m:mrow>
   </m:munder>
   <m:msub>
      <m:mrow>
         <m:mi>Q</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>p</m:mi>
      </m:mrow>
   </m:msub>
</m:mrow>
</m:math>
                  </display-formula>
               </p>
               <p>and the probability of shape <it>p </it>is</p>
               <p>
                  <display-formula>
                     <m:math name="1471-2105-12-429-i10" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>P</m:mi>
   <m:mi>r</m:mi>
   <m:mi>o</m:mi>
   <m:mi>b</m:mi>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:mi>p</m:mi>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:msub>
      <m:mrow>
         <m:mi>Q</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>p</m:mi>
      </m:mrow>
   </m:msub>
   <m:mo class="MathClass-bin">&#8725;</m:mo>
   <m:mi>Q</m:mi>
   <m:mspace width="2.77695pt" class="tmspace"/>
   <m:mo class="MathClass-punc">.</m:mo>
</m:mrow>
</m:math>
                  </display-formula>
               </p>
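               <p>As a minimal sketch of how these quantities relate, the following Python fragment accumulates Boltzmann weights per shape class and normalizes them by their sum. The function name <monospace>shape_probabilities</monospace>, the input format (an explicit list of structures with their shape strings), and the free energies in the example are assumptions made for illustration; this is not the RNA<smcaps>SHAPES</smcaps> implementation.</p>

```python
import math
from collections import defaultdict

# Gas constant in kcal/(mol*K); free energies are assumed to be in kcal/mol.
R_KCAL = 0.0019872


def shape_probabilities(structures, temperature_k=310.15):
    """Return Prob(p) = Q_p / Q for every shape class p.

    structures: iterable of (shape_string, free_energy) pairs, one per
    structure of the ensemble (hypothetical input format).
    """
    q_p = defaultdict(float)
    for shape, free_energy in structures:
        # Boltzmann weight exp(-dG / RT) of a single structure
        q_p[shape] += math.exp(-free_energy / (R_KCAL * temperature_k))
    q_total = sum(q_p.values())  # Q is the sum of all Q_p
    return {shape: weight / q_total for shape, weight in q_p.items()}


# Toy ensemble with made-up energies: two structures in one shape class,
# one structure in another, all with the same free energy.
toy_ensemble = [("[]", -2.0), ("[]", -2.0), ("[][]", -2.0)]
probabilities = shape_probabilities(toy_ensemble)
```

               <p>Because the three hypothetical structures have equal energy, the sketch assigns a probability of 2/3 to the shape holding two of them and 1/3 to the other.</p>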
               <p>Shape abstraction can be defined in various ways. RNA<smcaps>SHAPES</smcaps> provides shape abstraction functions <it>&#960;</it><sub>1</sub>, ..., <it>&#960;</it><sub>5 </sub>which implement different levels of abstraction, with <it>&#960;</it><sub>5 </sub>being the most abstract. Shapes can be represented as strings, similar to structure representations, where a single pair of square brackets marks a helix (of any length), and an underscore marks a stretch of unpaired bases, also of any length. Levels of abstraction differ in the amount of information they retain about unpaired regions. The above RNA structure (1) is mapped to shape strings on abstraction levels 2 and 5 as follows:</p>
               <p>
                  <display-formula>
                     <m:math name="1471-2105-12-429-i11" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mtable class="gathered">
      <m:mtr>
         <m:mtd>
            <m:msub>
               <m:mrow>
                  <m:mi>&#960;</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mn>2</m:mn>
               </m:mrow>
            </m:msub>
            <m:mo class="MathClass-punc">:</m:mo>
            <m:mspace width="1em" class="quad"/>
            <m:mspace width="1em" class="quad"/>
            <m:mstyle class="text">
               <m:mtext class="textsf" mathvariant="sans-serif">&#8220;</m:mtext>
            </m:mstyle>
            <m:mrow>
               <m:mo class="MathClass-open">[</m:mo>
               <m:mrow>
                  <m:mstyle class="text">
                     <m:mtext>_</m:mtext>
                  </m:mstyle>
                  <m:mrow>
                     <m:mo class="MathClass-open">[</m:mo>
                     <m:mrow>
                        <m:mrow>
                           <m:mo class="MathClass-open">[</m:mo>
                           <m:mrow/>
                           <m:mo class="MathClass-close">]</m:mo>
                        </m:mrow>
                        <m:mrow>
                           <m:mo class="MathClass-open">[</m:mo>
                           <m:mrow>
                              <m:mstyle class="text">
                                 <m:mtext>_</m:mtext>
                              </m:mstyle>
                              <m:mrow>
                                 <m:mo class="MathClass-open">[</m:mo>
                                 <m:mrow/>
                                 <m:mo class="MathClass-close">]</m:mo>
                              </m:mrow>
                              <m:mstyle class="text">
                                 <m:mtext>_</m:mtext>
                              </m:mstyle>
                           </m:mrow>
                           <m:mo class="MathClass-close">]</m:mo>
                        </m:mrow>
                     </m:mrow>
                     <m:mo class="MathClass-close">]</m:mo>
                  </m:mrow>
               </m:mrow>
               <m:mo class="MathClass-close">]</m:mo>
            </m:mrow>
            <m:mstyle class="text">
               <m:mtext class="textsf" mathvariant="sans-serif">&#8221;</m:mtext>
            </m:mstyle>
         </m:mtd>
      </m:mtr>
      <m:mtr>
         <m:mtd>
            <m:msub>
               <m:mrow>
                  <m:mi>&#960;</m:mi>
               </m:mrow>
               <m:mrow>
                  <m:mn>5</m:mn>
               </m:mrow>
            </m:msub>
            <m:mo class="MathClass-punc">:</m:mo>
            <m:mspace width="1em" class="quad"/>
            <m:mspace width="1em" class="quad"/>
            <m:mstyle class="text">
               <m:mtext class="textsf" mathvariant="sans-serif">&#8220;</m:mtext>
            </m:mstyle>
            <m:mrow>
               <m:mo class="MathClass-open">[</m:mo>
               <m:mrow>
                  <m:mrow>
                     <m:mo class="MathClass-open">[</m:mo>
                     <m:mrow/>
                     <m:mo class="MathClass-close">]</m:mo>
                  </m:mrow>
                  <m:mrow>
                     <m:mo class="MathClass-open">[</m:mo>
                     <m:mrow/>
                     <m:mo class="MathClass-close">]</m:mo>
                  </m:mrow>
               </m:mrow>
               <m:mo class="MathClass-close">]</m:mo>
            </m:mrow>
            <m:mstyle class="text">
               <m:mtext class="textsf" mathvariant="sans-serif">&#8221;</m:mtext>
            </m:mstyle>
         </m:mtd>
      </m:mtr>
      <m:mtr>
         <m:mtd/>
      </m:mtr>
   </m:mtable>
</m:mrow>
</m:math>
                  </display-formula>
               </p>
               <p>Both shapes indicate that the structure is a so-called Y-shape, a multiloop with a two-way branch. This most abstract view is conveyed by abstraction level 5. The less abstract level 2 shape indicates, in addition, that the outer stem is interrupted by a bulge on the 5' side, and that the 3' branch inside the multiloop is interrupted by an internal loop. For a detailed definition of shape abstraction levels, see <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>.</p>
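               <p>Read this way, abstraction level 5 can be sketched in a few lines of Python. The sketch assumes that a base pair enclosing exactly one other base pair (a stack, bulge, or internal loop) continues the same helix, so that only hairpin-closing and multiloop-closing pairs emit a bracket pair. The function name <monospace>shape_level5</monospace> and the input string are illustrative assumptions, not part of RNA<smcaps>SHAPES</smcaps>, and a completely unpaired structure simply yields an empty string here.</p>

```python
def shape_level5(dot_bracket: str) -> str:
    """Map a dot-bracket string to a level-5-like shape string (sketch)."""
    exterior = -1                      # pseudo-node for the exterior loop
    children = {exterior: []}          # pair opening position -> enclosed pairs
    stack = [exterior]
    for position, symbol in enumerate(dot_bracket):
        if symbol == "(":
            children[stack[-1]].append(position)
            children[position] = []
            stack.append(position)
        elif symbol == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced closing bracket")
            stack.pop()
        elif symbol != ".":
            raise ValueError("unexpected symbol: " + symbol)
    if len(stack) != 1:
        raise ValueError("unbalanced opening bracket")

    def render(node: int) -> str:
        inner = "".join(render(child) for child in children[node])
        # A pair with exactly one enclosed pair continues the same helix;
        # unpaired bases are ignored entirely at this level.
        if node == exterior or len(children[node]) == 1:
            return inner
        return "[" + inner + "]"

    return render(exterior)


# Illustrative input: a bulged outer stem closing a two-way multiloop.
y_shape = shape_level5("((..((...)).((...)).))")
```

               <p>For this illustrative input the sketch returns <monospace>[[][]]</monospace>, the Y-shape discussed above.</p>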
            </sec>
            <sec>
               <st>
                  <p>Implementing the basic energy model - no dangling bases</p>
               </st>
               <p>In the usual approximation, the free energy of an individual structure <it>s </it>is the sum of the energetic contributions of all structural elements of <it>s</it>:</p>
               <p>
                  <display-formula>
                     <m:math name="1471-2105-12-429-i12" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi mathvariant="normal">&#916;</m:mi>
   <m:msubsup>
      <m:mrow>
         <m:mi>G</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>T</m:mi>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:mi>s</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mn>0</m:mn>
      </m:mrow>
   </m:msubsup>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:munder class="msub">
      <m:mrow>
         <m:mo mathsize="big"> &#8721;</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">helices</m:mtext>
         </m:mstyle>
         <m:mspace width="2.77695pt" class="tmspace"/>
         <m:mi>j</m:mi>
      </m:mrow>
   </m:munder>
   <m:mi mathvariant="normal">&#916;</m:mi>
   <m:msubsup>
      <m:mrow>
         <m:mi>G</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>T</m:mi>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:mi>j</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mn>0</m:mn>
      </m:mrow>
   </m:msubsup>
   <m:mo class="MathClass-bin">+</m:mo>
   <m:munder class="msub">
      <m:mrow>
         <m:mo mathsize="big"> &#8721;</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">loops</m:mtext>
         </m:mstyle>
         <m:mspace width="2.77695pt" class="tmspace"/>
         <m:mi>k</m:mi>
      </m:mrow>
   </m:munder>
   <m:mi mathvariant="normal">&#916;</m:mi>
   <m:msubsup>
      <m:mrow>
         <m:mi>G</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>T</m:mi>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:mi>k</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mn>0</m:mn>
      </m:mrow>
   </m:msubsup>
</m:mrow>
</m:math>
                  </display-formula>
               </p>
               <p>with the energy of an individual helix given by:</p>
               <p>
                  <display-formula>
                     <m:math name="1471-2105-12-429-i13" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi mathvariant="normal">&#916;</m:mi>
   <m:msubsup>
      <m:mrow>
         <m:mi>G</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>T</m:mi>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:mstyle class="text">
            <m:mtext class="textsf" mathvariant="sans-serif">helix</m:mtext>
         </m:mstyle>
      </m:mrow>
      <m:mrow>
         <m:mn>0</m:mn>
      </m:mrow>
   </m:msubsup>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:munder class="msub">
      <m:mrow>
         <m:mo mathsize="big"> &#8721;</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mtable class="subarray-c" rowspacing="0" columnalign="center">
            <m:mtr>
               <m:mtd>
                  <m:mstyle class="text">
                     <m:mtext class="textsf" mathvariant="sans-serif">base&#160;pair</m:mtext>
                  </m:mstyle>
               </m:mtd>
            </m:mtr>
            <m:mtr>
               <m:mtd>
                  <m:mstyle class="text">
                     <m:mtext class="textsf" mathvariant="sans-serif">stacks&#160;</m:mtext>
                  </m:mstyle>
                  <m:mi>m</m:mi>
               </m:mtd>
            </m:mtr>
         </m:mtable>
      </m:mrow>
   </m:munder>
   <m:mi mathvariant="normal">&#916;</m:mi>
   <m:msubsup>
      <m:mrow>
         <m:mi>G</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>T</m:mi>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:mi>m</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mn>0</m:mn>
      </m:mrow>
   </m:msubsup>
   <m:mo class="MathClass-punc">.</m:mo>
</m:mrow>
</m:math>
                  </display-formula>
               </p>
               <p>That is, the energy of a helix depends only on the types of its base pairs (G:C, C:G, A:U, U:A, G:U, U:G) and on how each pair stacks on its neighboring base pair <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. The minimum length of a helix is two base pairs (one base pair stack). Single (lonely) pairs should not exist. The energy of a loop depends on its type (hairpin loop closed by one helix, internal or bulge loop closed by two helices, and multiloop or junction closed by more than two helices), the sequence(s) of the loop nucleotides, and the type(s) of the closing base pair(s). Thus, the free energy of a given secondary structure <it>s </it>is obtained by decomposing <it>s </it>into its structural elements and summing the values returned by the corresponding elementary energy functions listed in Table <tblr tid="T1">1</tblr>. For the example shown in Figure <figr fid="F1">1</figr>, this amounts to three calls to <it>sr</it>_<it>energy </it>for the three base pair stacks (<inline-formula><m:math name="1471-2105-12-429-i14" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mtable equalrows="false" columnlines="none none none none none none none none none none none none none none none none none none none" equalcolumns="false" class="array">
   <m:mtr>
      <m:mtd class="array" columnalign="center">
         <m:msup>
            <m:mrow/>
            <m:mrow>
               <m:msup>
                  <m:mrow>
                     <m:mn>5</m:mn>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>&#8242;</m:mi>
                  </m:mrow>
               </m:msup>
            </m:mrow>
         </m:msup>
         <m:mstyle class="text">
            <m:mtext class="texttt" mathvariant="monospace">A</m:mtext>
         </m:mstyle>
         <m:msup>
            <m:mrow>
               <m:mstyle class="text">
                  <m:mtext class="texttt" mathvariant="monospace">C</m:mtext>
               </m:mstyle>
            </m:mrow>
            <m:mrow>
               <m:msup>
                  <m:mrow>
                     <m:mn>3</m:mn>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>&#8242;</m:mi>
                  </m:mrow>
               </m:msup>
            </m:mrow>
         </m:msup>
      </m:mtd>
   </m:mtr>
   <m:mtr>
      <m:mtd class="array" columnalign="center">
         <m:msub>
            <m:mrow/>
            <m:mrow>
               <m:msup>
                  <m:mrow>
                     <m:mn>3</m:mn>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>&#8242;</m:mi>
                  </m:mrow>
               </m:msup>
            </m:mrow>
         </m:msub>
         <m:mstyle class="text">
            <m:mtext class="texttt" mathvariant="monospace">U</m:mtext>
         </m:mstyle>
         <m:msub>
            <m:mrow>
               <m:mstyle class="text">
                  <m:mtext class="texttt" mathvariant="monospace">G</m:mtext>
               </m:mstyle>
            </m:mrow>
            <m:mrow>
               <m:msup>
                  <m:mrow>
                     <m:mn>5</m:mn>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>&#8242;</m:mi>
                  </m:mrow>
               </m:msup>
            </m:mrow>
         </m:msub>
      </m:mtd>
   </m:mtr>
   <m:mtr>
      <m:mtd class="array" columnalign="center"/>
   </m:mtr>
</m:mtable>
</m:math></inline-formula>, <inline-formula><m:math name="1471-2105-12-429-i15" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mtable equalrows="false" columnlines="none none none none none none none none none none none none none none none none none none none" equalcolumns="false" class="array">
   <m:mtr>
      <m:mtd class="array" columnalign="center">
         <m:msup>
            <m:mrow/>
            <m:mrow>
               <m:msup>
                  <m:mrow>
                     <m:mn>5</m:mn>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>&#8242;</m:mi>
                  </m:mrow>
               </m:msup>
            </m:mrow>
         </m:msup>
         <m:mstyle class="text">
            <m:mtext class="texttt" mathvariant="monospace">C</m:mtext>
         </m:mstyle>
         <m:msup>
            <m:mrow>
               <m:mstyle class="text">
                  <m:mtext class="texttt" mathvariant="monospace">C</m:mtext>
               </m:mstyle>
            </m:mrow>
            <m:mrow>
               <m:msup>
                  <m:mrow>
                     <m:mn>3</m:mn>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>&#8242;</m:mi>
                  </m:mrow>
               </m:msup>
            </m:mrow>
         </m:msup>
      </m:mtd>
   </m:mtr>
   <m:mtr>
      <m:mtd class="array" columnalign="center">
         <m:msub>
            <m:mrow/>
            <m:mrow>
               <m:msup>
                  <m:mrow>
                     <m:mn>3</m:mn>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>&#8242;</m:mi>
                  </m:mrow>
               </m:msup>
            </m:mrow>
         </m:msub>
         <m:mstyle class="text">
            <m:mtext class="texttt" mathvariant="monospace">G</m:mtext>
         </m:mstyle>
         <m:msub>
            <m:mrow>
               <m:mstyle class="text">
                  <m:mtext class="texttt" mathvariant="monospace">G</m:mtext>
               </m:mstyle>
            </m:mrow>
            <m:mrow>
               <m:msup>
                  <m:mrow>
                     <m:mn>5</m:mn>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>&#8242;</m:mi>
                  </m:mrow>
               </m:msup>
            </m:mrow>
         </m:msub>
      </m:mtd>
   </m:mtr>
   <m:mtr>
      <m:mtd class="array" columnalign="center"/>
   </m:mtr>
</m:mtable>
</m:math></inline-formula>, and <inline-formula><m:math name="1471-2105-12-429-i16" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mtable equalrows="false" columnlines="none none none none none none none none none none none none none none none none none none none" equalcolumns="false" class="array">
   <m:mtr>
      <m:mtd class="array" columnalign="center">
         <m:msup>
            <m:mrow/>
            <m:mrow>
               <m:msup>
                  <m:mrow>
                     <m:mn>5</m:mn>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>&#8242;</m:mi>
                  </m:mrow>
               </m:msup>
            </m:mrow>
         </m:msup>
         <m:mstyle class="text">
            <m:mtext class="texttt" mathvariant="monospace">X</m:mtext>
         </m:mstyle>
         <m:msup>
            <m:mrow>
               <m:mstyle class="text">
                  <m:mtext class="texttt" mathvariant="monospace">Y</m:mtext>
               </m:mstyle>
            </m:mrow>
            <m:mrow>
               <m:msup>
                  <m:mrow>
                     <m:mn>3</m:mn>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>&#8242;</m:mi>
                  </m:mrow>
               </m:msup>
            </m:mrow>
         </m:msup>
      </m:mtd>
   </m:mtr>
   <m:mtr>
      <m:mtd class="array" columnalign="center">
         <m:msub>
            <m:mrow/>
            <m:mrow>
               <m:msup>
                  <m:mrow>
                     <m:mn>3</m:mn>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>&#8242;</m:mi>
                  </m:mrow>
               </m:msup>
            </m:mrow>
         </m:msub>
         <m:mstyle class="text">
            <m:mtext class="texttt" mathvariant="monospace">Y</m:mtext>
         </m:mstyle>
         <m:msub>
            <m:mrow>
               <m:mstyle class="text">
                  <m:mtext class="texttt" mathvariant="monospace">X</m:mtext>
               </m:mstyle>
            </m:mrow>
            <m:mrow>
               <m:msup>
                  <m:mrow>
                     <m:mn>5</m:mn>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>&#8242;</m:mi>
                  </m:mrow>
               </m:msup>
            </m:mrow>
         </m:msub>
      </m:mtd>
   </m:mtr>
   <m:mtr>
      <m:mtd class="array" columnalign="center"/>
   </m:mtr>
</m:mtable>
</m:math></inline-formula>), a call to <it>termau</it>_<it>energy </it>for the terminal <inline-formula><m:math name="1471-2105-12-429-i17" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mtable equalrows="false" columnlines="none none none none none none none none none none none none none none none none none none none" equalcolumns="false" class="array">
   <m:mtr>
      <m:mtd class="array" columnalign="center">
         <m:msup>
            <m:mrow/>
            <m:mrow>
               <m:msup>
                  <m:mrow>
                     <m:mn>5</m:mn>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>&#8242;</m:mi>
                  </m:mrow>
               </m:msup>
            </m:mrow>
         </m:msup>
         <m:mstyle class="text">
            <m:mtext class="texttt" mathvariant="monospace">A</m:mtext>
         </m:mstyle>
      </m:mtd>
   </m:mtr>
   <m:mtr>
      <m:mtd class="array" columnalign="center">
         <m:msub>
            <m:mrow/>
            <m:mrow>
               <m:msup>
                  <m:mrow>
                     <m:mn>3</m:mn>
                  </m:mrow>
                  <m:mrow>
                     <m:mi>&#8242;</m:mi>
                  </m:mrow>
               </m:msup>
            </m:mrow>
         </m:msub>
         <m:mstyle class="text">
            <m:mtext class="texttt" mathvariant="monospace">U</m:mtext>
         </m:mstyle>
      </m:mtd>
   </m:mtr>
   <m:mtr>
      <m:mtd class="array" columnalign="center"/>
   </m:mtr>
</m:mtable>
</m:math></inline-formula> pair, and a call to <it>bl</it>_<it>energy </it>for a bulge loop with sequence <sup>5'</sup><monospace>N</monospace>--<monospace>N</monospace><sup>3' </sup>and closing pairs <inline-formula><m:math name="1471-2105-12-429-i18" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mtable>
      <m:mtr>
         <m:mtd>
            <m:mrow>
               <m:msup>
                  <m:mrow/>
                  <m:msup>
                     <m:mn>5</m:mn>
                     <m:mo>&#8242;</m:mo>
                  </m:msup>
               </m:msup>
               <m:mtext class="texttt" mathvariant="monospace">C</m:mtext>
            </m:mrow>
         </m:mtd>
      </m:mtr>
      <m:mtr>
         <m:mtd>
            <m:mrow>
               <m:msub>
                  <m:mrow/>
                  <m:msup>
                     <m:mn>3</m:mn>
                     <m:mo>&#8242;</m:mo>
                  </m:msup>
               </m:msub>
               <m:mtext class="texttt" mathvariant="monospace">G</m:mtext>
            </m:mrow>
         </m:mtd>
      </m:mtr>
   </m:mtable>
</m:mrow>
</m:math></inline-formula> and <inline-formula><m:math name="1471-2105-12-429-i19" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mtable>
      <m:mtr>
         <m:mtd>
            <m:mrow>
               <m:msup>
                  <m:mrow/>
                  <m:msup>
                     <m:mn>5</m:mn>
                     <m:mo>&#8242;</m:mo>
                  </m:msup>
               </m:msup>
               <m:mtext class="texttt" mathvariant="monospace">Y</m:mtext>
            </m:mrow>
         </m:mtd>
      </m:mtr>
      <m:mtr>
         <m:mtd>
            <m:mrow>
               <m:msub>
                  <m:mrow/>
                  <m:msup>
                     <m:mn>3</m:mn>
                     <m:mo>&#8242;</m:mo>
                  </m:msup>
               </m:msub>
               <m:mtext class="texttt" mathvariant="monospace">X</m:mtext>
            </m:mrow>
         </m:mtd>
      </m:mtr>
   </m:mtable>
</m:mrow>
</m:math></inline-formula>.</p>
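               <p>The helix part of this decomposition can be sketched as follows. The Python fragment enumerates the base pair stacks of a helix and sums one stacking term per stack, as in the helix formula above. The function names and the numbers in <monospace>TOY_STACK_ENERGY</monospace> are placeholders chosen for illustration; they are not measured nearest-neighbor parameters.</p>

```python
from typing import Dict, List, Tuple

# A stack is written as (5'->3' dinucleotide, its partner read 3'->5').
Stack = Tuple[str, str]


def helix_stacks(five_prime: str, three_prime: str) -> List[Stack]:
    """List the base pair stacks of a helix.

    Both strands are given 5'->3'; the second strand is reversed so that
    partner[i] is the base paired with five_prime[i].
    """
    if len(five_prime) != len(three_prime):
        raise ValueError("both strands must have the same length")
    if len(five_prime) in (0, 1):
        raise ValueError("a helix needs at least two base pairs")
    partner = three_prime[::-1]
    return [
        (five_prime[i:i + 2], partner[i:i + 2])
        for i in range(len(five_prime) - 1)
    ]


def helix_energy(five_prime: str, three_prime: str,
                 stack_energy: Dict[Stack, float]) -> float:
    """Sum one stacking contribution per base pair stack of the helix."""
    return sum(stack_energy[s] for s in helix_stacks(five_prime, three_prime))


# Placeholder values for illustration only, NOT measured parameters.
TOY_STACK_ENERGY = {("AC", "UG"): -1.0, ("CC", "GG"): -2.0}

stacks = helix_stacks("ACC", "GGU")
energy = helix_energy("ACC", "GGU", TOY_STACK_ENERGY)
```

               <p>For the three-base-pair helix of Figure <figr fid="F1">1</figr> (<monospace>ACC</monospace> paired with <monospace>GGU</monospace>), the sketch finds the two stacks AC/UG and CC/GG; a helix of <it>n </it>base pairs always contributes <it>n </it>- 1 stacking terms.</p>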
               <tbl id="T1"><title><p>Table 1</p></title><caption><p>Elementary functions in the basic thermodynamic energy model</p></caption><tblbdy cols="2">
      <r>
         <c ca="left">
            <p>
               <b>Function</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Description</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>sr</it>_<it>energy</it></p>
         </c>
         <c ca="left">
            <p>The most important source for stabilizing an RNA secondary structure is stacking of two (or more) base pairs.</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>termau</it>_<it>energy</it></p>
         </c>
         <c ca="left">
            <p>A base pair A:U at the terminal end of a stacking region adds less stabilizing energy than one <it>within </it>a stacking region.</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>hl</it>_<it>energy</it></p>
         </c>
         <c ca="left">
            <p>Stabilizing contribution of the loop-closing base pair stack, plus a destabilizing contribution of the hairpin loop region, plus a bonus energy for special loop sequences (e.g. extra-stable tetraloops).</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>bl</it>_<it>energy</it></p>
         </c>
         <c ca="left">
            <p>Analogous to <it>hl</it>_<it>energy</it>, but for a destabilizing loop region bulged out to the left.</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>br</it>_<it>energy</it></p>
         </c>
         <c ca="left">
            <p>Symmetric case to <it>bl</it>_<it>energy</it>.</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>il</it>_<it>energy</it></p>
         </c>
         <c ca="left">
            <p>Analogous to <it>hl</it>_<it>energy</it>, but with two destabilizing loop regions.</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>ml</it>_<it>energy</it></p>
         </c>
         <c ca="left">
            <p>Since a multiloop of <it>x </it>stems is less stable than <it>x </it>adjacent stems, it gets a penalty.</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>ul</it>_<it>energy</it></p>
         </c>
         <c ca="left">
            <p>Each stem in a multiloop gets an initial penalty.</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>ss</it>_<it>energy</it></p>
         </c>
         <c ca="left">
            <p>Regions of unpaired bases could get penalized, but we set this value to zero.</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>sbase</it>_<it>energy</it></p>
         </c>
         <c ca="left">
            <p>Same as <it>ss</it>_<it>energy</it>, but for a single unpaired base.</p>
         </c>
      </r>
   </tblbdy></tbl>
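               <p>To illustrate how the total energy arises from calls to the elementary functions, the following Python sketch lists the structural elements of the Figure <figr fid="F1">1</figr> example and dispatches each to a function of the corresponding name. The element list follows the text; the function bodies are stand-ins that return arbitrary placeholder values (only their signs follow the descriptions in the table), and the dictionary-based dispatch is an assumption made for illustration, not the authors' implementation.</p>

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# (name of the elementary function, informal description of its argument)
Element = Tuple[str, str]

FIGURE1_ELEMENTS: List[Element] = [
    ("sr_energy", "AC/UG"),      # first stack of the three-base-pair helix
    ("sr_energy", "CC/GG"),      # second stack of the same helix
    ("sr_energy", "XY/YX"),      # stack of the two-base-pair helix
    ("termau_energy", "A:U"),    # terminal A:U pair
    ("bl_energy", "N--N"),       # bulge loop on the 5' side
]


def total_energy(elements: List[Element],
                 functions: Dict[str, Callable[[str], float]]) -> float:
    """Sum the values returned by one elementary call per element."""
    return sum(functions[name](argument) for name, argument in elements)


# Stand-in functions with placeholder return values, NOT real parameters.
PLACEHOLDER_FUNCTIONS: Dict[str, Callable[[str], float]] = {
    "sr_energy": lambda stack: -1.0,     # stabilizing, hence negative
    "termau_energy": lambda pair: 0.5,   # penalty, hence positive
    "bl_energy": lambda loop: 2.0,       # destabilizing, hence positive
}

call_counts = Counter(name for name, _ in FIGURE1_ELEMENTS)
example_total = total_energy(FIGURE1_ELEMENTS, PLACEHOLDER_FUNCTIONS)
```

               <p>The call counts reproduce the decomposition given in the text: three calls to <it>sr</it>_<it>energy</it>, one to <it>termau</it>_<it>energy</it>, and one to <it>bl</it>_<it>energy</it>.</p>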
               <fig id="F1"><title><p>Figure 1</p></title><caption><p>Example of structure representations</p></caption><text>
   <p><b>Example of structure representations</b>. A sequence, shown in A), folds into a structure that is represented by the three equivalent illustrations in B-D). The structure consists of a helix with three base pairs (<monospace>ACC</monospace> paired with <monospace>GGU</monospace>), a bulge loop (<monospace>N--N</monospace>; <monospace>N</monospace> meaning aNy nucleotide), and a helix with two base pairs formed by any complementary nucleotides. The dashes designate omitted sequence stretches. The structure in B) is in dot-bracket notation; that is, dots mark unpaired nucleotides and each pair of opening and closing brackets marks a base pair. The structure in C) is the usual squiggly representation. D) is the tree representation of the same structure: a stacked region (sr) is formed by an A:U pair stacked on top of a bulge loop (bl) including two stacking pairs (C:G/C:G) and a loop region with one or more residues (r) on the left (5') side. The helix continues with a "closed" structural element (which is defined as any substructure starting with a base stack).</p>
</text><graphic file="1471-2105-12-429-1" hint_layout="single"/></fig>
            </sec>
            <sec>
               <st>
                  <p>Implementing the full energy model - with dangling bases</p>
               </st>
               <p>In addition to the basic energy model described above, unpaired bases at the end of a helix can stabilize the helix by stacking on the terminal base pair <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp><sup>3</sup>.</p>
               <p>Introducing dangling bases effectively refines our notion of structure. Any secondary structure, as defined solely by its set of base pairs, can now have several variants according to different choices of dangling bases. Such refinement can be reflected in our structure representation by replacing certain dot symbols by "d", indicating a base dangling onto a helix to its left, and "b" for a base dangling onto a helix to its right. For example, a structure like</p>
               <p>
                  <display-formula>
                     <m:math name="1471-2105-12-429-i20" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mstyle class="text">
      <m:mtext class="textsf" mathvariant="sans-serif">&#8220;</m:mtext>
   </m:mstyle>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:mo class="MathClass-punc">.</m:mo>
               <m:mo class="MathClass-punc">.</m:mo>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:mrow>
                        <m:mo class="MathClass-open">(</m:mo>
                        <m:mrow>
                           <m:mo class="MathClass-punc">.</m:mo>
                           <m:mo class="MathClass-punc">.</m:mo>
                           <m:mo class="MathClass-punc">.</m:mo>
                        </m:mrow>
                        <m:mo class="MathClass-close">)</m:mo>
                     </m:mrow>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
               <m:mo class="MathClass-punc">.</m:mo>
               <m:mrow>
                  <m:mo class="MathClass-open">(</m:mo>
                  <m:mrow>
                     <m:mrow>
                        <m:mo class="MathClass-open">(</m:mo>
                        <m:mrow>
                           <m:mo class="MathClass-punc">.</m:mo>
                           <m:mo class="MathClass-punc">.</m:mo>
                           <m:mo class="MathClass-punc">.</m:mo>
                        </m:mrow>
                        <m:mo class="MathClass-close">)</m:mo>
                     </m:mrow>
                  </m:mrow>
                  <m:mo class="MathClass-close">)</m:mo>
               </m:mrow>
               <m:mo class="MathClass-punc">.</m:mo>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mstyle class="text">
      <m:mtext class="textsf" mathvariant="sans-serif">&#8221;</m:mtext>
   </m:mstyle>
</m:mrow>
</m:math>
                  </display-formula>
               </p>
               <p>now has dangle variants such as</p>
               <p>
                  <display-formula>
                     <m:math name="1471-2105-12-429-i21" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mtable class="gathered">
      <m:mtr>
         <m:mtd>
            <m:mstyle class="text">
               <m:mtext class="textsf" mathvariant="sans-serif">&#8220;</m:mtext>
            </m:mstyle>
            <m:mrow>
               <m:mo class="MathClass-open">(</m:mo>
               <m:mrow>
                  <m:mrow>
                     <m:mo class="MathClass-open">(</m:mo>
                     <m:mrow>
                        <m:mstyle class="text">
                           <m:mtext class="textsf" mathvariant="sans-serif">d</m:mtext>
                        </m:mstyle>
                        <m:mo class="MathClass-punc">.</m:mo>
                        <m:mrow>
                           <m:mo class="MathClass-open">(</m:mo>
                           <m:mrow>
                              <m:mrow>
                                 <m:mo class="MathClass-open">(</m:mo>
                                 <m:mrow>
                                    <m:mo class="MathClass-punc">.</m:mo>
                                    <m:mo class="MathClass-punc">.</m:mo>
                                    <m:mo class="MathClass-punc">.</m:mo>
                                 </m:mrow>
                                 <m:mo class="MathClass-close">)</m:mo>
                              </m:mrow>
                           </m:mrow>
                           <m:mo class="MathClass-close">)</m:mo>
                        </m:mrow>
                        <m:mstyle class="text">
                           <m:mtext class="textsf" mathvariant="sans-serif">b</m:mtext>
                        </m:mstyle>
                        <m:mrow>
                           <m:mo class="MathClass-open">(</m:mo>
                           <m:mrow>
                              <m:mrow>
                                 <m:mo class="MathClass-open">(</m:mo>
                                 <m:mrow>
                                    <m:mo class="MathClass-punc">.</m:mo>
                                    <m:mo class="MathClass-punc">.</m:mo>
                                    <m:mo class="MathClass-punc">.</m:mo>
                                 </m:mrow>
                                 <m:mo class="MathClass-close">)</m:mo>
                              </m:mrow>
                           </m:mrow>
                           <m:mo class="MathClass-close">)</m:mo>
                        </m:mrow>
                        <m:mstyle class="text">
                           <m:mtext class="textsf" mathvariant="sans-serif">b</m:mtext>
                        </m:mstyle>
                     </m:mrow>
                     <m:mo class="MathClass-close">)</m:mo>
                  </m:mrow>
               </m:mrow>
               <m:mo class="MathClass-close">)</m:mo>
            </m:mrow>
            <m:mstyle class="text">
               <m:mtext class="textsf" mathvariant="sans-serif">&#8221;</m:mtext>
            </m:mstyle>
         </m:mtd>
      </m:mtr>
      <m:mtr>
         <m:mtd>
            <m:mstyle class="text">
               <m:mtext class="textsf" mathvariant="sans-serif">&#8220;</m:mtext>
            </m:mstyle>
            <m:mrow>
               <m:mo class="MathClass-open">(</m:mo>
               <m:mrow>
                  <m:mrow>
                     <m:mo class="MathClass-open">(</m:mo>
                     <m:mrow>
                        <m:mo class="MathClass-punc">.</m:mo>
                        <m:mstyle class="text">
                           <m:mtext class="textsf" mathvariant="sans-serif">b</m:mtext>
                        </m:mstyle>
                        <m:mrow>
                           <m:mo class="MathClass-open">(</m:mo>
                           <m:mrow>
                              <m:mrow>
                                 <m:mo class="MathClass-open">(</m:mo>
                                 <m:mrow>
                                    <m:mo class="MathClass-punc">.</m:mo>
                                    <m:mo class="MathClass-punc">.</m:mo>
                                    <m:mo class="MathClass-punc">.</m:mo>
                                 </m:mrow>
                                 <m:mo class="MathClass-close">)</m:mo>
                              </m:mrow>
                           </m:mrow>
                           <m:mo class="MathClass-close">)</m:mo>
                        </m:mrow>
                        <m:mstyle class="text">
                           <m:mtext class="textsf" mathvariant="sans-serif">b</m:mtext>
                        </m:mstyle>
                        <m:mrow>
                           <m:mo class="MathClass-open">(</m:mo>
                           <m:mrow>
                              <m:mrow>
                                 <m:mo class="MathClass-open">(</m:mo>
                                 <m:mrow>
                                    <m:mo class="MathClass-punc">.</m:mo>
                                    <m:mo class="MathClass-punc">.</m:mo>
                                    <m:mo class="MathClass-punc">.</m:mo>
                                 </m:mrow>
                                 <m:mo class="MathClass-close">)</m:mo>
                              </m:mrow>
                           </m:mrow>
                           <m:mo class="MathClass-close">)</m:mo>
                        </m:mrow>
                        <m:mstyle class="text">
                           <m:mtext class="textsf" mathvariant="sans-serif">b</m:mtext>
                        </m:mstyle>
                     </m:mrow>
                     <m:mo class="MathClass-close">)</m:mo>
                  </m:mrow>
               </m:mrow>
               <m:mo class="MathClass-close">)</m:mo>
            </m:mrow>
            <m:mstyle class="text">
               <m:mtext class="textsf" mathvariant="sans-serif">&#8221;</m:mtext>
            </m:mstyle>
         </m:mtd>
      </m:mtr>
      <m:mtr>
         <m:mtd>
            <m:mstyle class="text">
               <m:mtext class="textsf" mathvariant="sans-serif">&#8220;</m:mtext>
            </m:mstyle>
            <m:mrow>
               <m:mo class="MathClass-open">(</m:mo>
               <m:mrow>
                  <m:mrow>
                     <m:mo class="MathClass-open">(</m:mo>
                     <m:mrow>
                        <m:mstyle class="text">
                           <m:mtext class="textsf" mathvariant="sans-serif">db</m:mtext>
                        </m:mstyle>
                        <m:mrow>
                           <m:mo class="MathClass-open">(</m:mo>
                           <m:mrow>
                              <m:mrow>
                                 <m:mo class="MathClass-open">(</m:mo>
                                 <m:mrow>
                                    <m:mo class="MathClass-punc">.</m:mo>
                                    <m:mo class="MathClass-punc">.</m:mo>
                                    <m:mo class="MathClass-punc">.</m:mo>
                                 </m:mrow>
                                 <m:mo class="MathClass-close">)</m:mo>
                              </m:mrow>
                           </m:mrow>
                           <m:mo class="MathClass-close">)</m:mo>
                        </m:mrow>
                        <m:mstyle class="text">
                           <m:mtext class="textsf" mathvariant="sans-serif">b</m:mtext>
                        </m:mstyle>
                        <m:mrow>
                           <m:mo class="MathClass-open">(</m:mo>
                           <m:mrow>
                              <m:mrow>
                                 <m:mo class="MathClass-open">(</m:mo>
                                 <m:mrow>
                                    <m:mo class="MathClass-punc">.</m:mo>
                                    <m:mo class="MathClass-punc">.</m:mo>
                                    <m:mo class="MathClass-punc">.</m:mo>
                                 </m:mrow>
                                 <m:mo class="MathClass-close">)</m:mo>
                              </m:mrow>
                           </m:mrow>
                           <m:mo class="MathClass-close">)</m:mo>
                        </m:mrow>
                        <m:mstyle class="text">
                           <m:mtext class="textsf" mathvariant="sans-serif">b</m:mtext>
                        </m:mstyle>
                     </m:mrow>
                     <m:mo class="MathClass-close">)</m:mo>
                  </m:mrow>
               </m:mrow>
               <m:mo class="MathClass-close">)</m:mo>
            </m:mrow>
            <m:mstyle class="text">
               <m:mtext class="textsf" mathvariant="sans-serif">&#8221;</m:mtext>
            </m:mstyle>
         </m:mtd>
      </m:mtr>
      <m:mtr>
         <m:mtd>
            <m:mstyle class="text">
               <m:mtext class="textsf" mathvariant="sans-serif">&#8220;</m:mtext>
            </m:mstyle>
            <m:mrow>
               <m:mo class="MathClass-open">(</m:mo>
               <m:mrow>
                  <m:mrow>
                     <m:mo class="MathClass-open">(</m:mo>
                     <m:mrow>
                        <m:mo class="MathClass-punc">.</m:mo>
                        <m:mo class="MathClass-punc">.</m:mo>
                        <m:mrow>
                           <m:mo class="MathClass-open">(</m:mo>
                           <m:mrow>
                              <m:mrow>
                                 <m:mo class="MathClass-open">(</m:mo>
                                 <m:mrow>
                                    <m:mo class="MathClass-punc">.</m:mo>
                                    <m:mo class="MathClass-punc">.</m:mo>
                                    <m:mo class="MathClass-punc">.</m:mo>
                                 </m:mrow>
                                 <m:mo class="MathClass-close">)</m:mo>
                              </m:mrow>
                           </m:mrow>
                           <m:mo class="MathClass-close">)</m:mo>
                        </m:mrow>
                        <m:mstyle class="text">
                           <m:mtext class="textsf" mathvariant="sans-serif">d</m:mtext>
                        </m:mstyle>
                        <m:mrow>
                           <m:mo class="MathClass-open">(</m:mo>
                           <m:mrow>
                              <m:mrow>
                                 <m:mo class="MathClass-open">(</m:mo>
                                 <m:mrow>
                                    <m:mo class="MathClass-punc">.</m:mo>
                                    <m:mo class="MathClass-punc">.</m:mo>
                                    <m:mo class="MathClass-punc">.</m:mo>
                                 </m:mrow>
                                 <m:mo class="MathClass-close">)</m:mo>
                              </m:mrow>
                           </m:mrow>
                           <m:mo class="MathClass-close">)</m:mo>
                        </m:mrow>
                        <m:mstyle class="text">
                           <m:mtext class="textsf" mathvariant="sans-serif">b</m:mtext>
                        </m:mstyle>
                     </m:mrow>
                     <m:mo class="MathClass-close">)</m:mo>
                  </m:mrow>
               </m:mrow>
               <m:mo class="MathClass-close">)</m:mo>
            </m:mrow>
            <m:mstyle class="text">
               <m:mtext class="textsf" mathvariant="sans-serif">&#8221;</m:mtext>
            </m:mstyle>
         </m:mtd>
      </m:mtr>
      <m:mtr>
         <m:mtd>
            <m:mstyle class="text">
               <m:mtext class="textsf" mathvariant="sans-serif">&#8220;</m:mtext>
            </m:mstyle>
            <m:mrow>
               <m:mo class="MathClass-open">(</m:mo>
               <m:mrow>
                  <m:mrow>
                     <m:mo class="MathClass-open">(</m:mo>
                     <m:mrow>
                        <m:mo class="MathClass-punc">.</m:mo>
                        <m:mo class="MathClass-punc">.</m:mo>
                        <m:mrow>
                           <m:mo class="MathClass-open">(</m:mo>
                           <m:mrow>
                              <m:mrow>
                                 <m:mo class="MathClass-open">(</m:mo>
                                 <m:mrow>
                                    <m:mo class="MathClass-punc">.</m:mo>
                                    <m:mo class="MathClass-punc">.</m:mo>
                                    <m:mo class="MathClass-punc">.</m:mo>
                                 </m:mrow>
                                 <m:mo class="MathClass-close">)</m:mo>
                              </m:mrow>
                           </m:mrow>
                           <m:mo class="MathClass-close">)</m:mo>
                        </m:mrow>
                        <m:mstyle class="text">
                           <m:mtext class="textsf" mathvariant="sans-serif">b</m:mtext>
                        </m:mstyle>
                        <m:mrow>
                           <m:mo class="MathClass-open">(</m:mo>
                           <m:mrow>
                              <m:mrow>
                                 <m:mo class="MathClass-open">(</m:mo>
                                 <m:mrow>
                                    <m:mo class="MathClass-punc">.</m:mo>
                                    <m:mo class="MathClass-punc">.</m:mo>
                                    <m:mo class="MathClass-punc">.</m:mo>
                                 </m:mrow>
                                 <m:mo class="MathClass-close">)</m:mo>
                              </m:mrow>
                           </m:mrow>
                           <m:mo class="MathClass-close">)</m:mo>
                        </m:mrow>
                        <m:mo class="MathClass-punc">.</m:mo>
                     </m:mrow>
                     <m:mo class="MathClass-close">)</m:mo>
                  </m:mrow>
               </m:mrow>
               <m:mo class="MathClass-close">)</m:mo>
            </m:mrow>
            <m:mstyle class="text">
               <m:mtext class="textsf" mathvariant="sans-serif">&#8221;</m:mtext>
            </m:mstyle>
         </m:mtd>
      </m:mtr>
   </m:mtable>
</m:mrow>
</m:math>
                  </display-formula>
               </p>
               <p>and 31 more. Each end of a helix can have dangling bases, except an end that leads to a hairpin loop. In that case, the energy contributions from dangling bases are already incorporated in the energy parameters for the loop.</p>
               <p>Given a concrete secondary structure, it is straightforward to consider all possible dangles and compute the optimal energy for this structure. The program RNA<smcaps>EVAL</smcaps> from the Vienna Package can be used for this purpose. For structure prediction from a primary RNA sequence, however, dangles mean trouble, as we shall see shortly.</p>
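               <p>The total of 36 annotated variants (the five displayed plus 31 more) can be reproduced by a short enumeration. The following Python sketch is not taken from the paper. The alternatives allowed at each unpaired position of the multiloop are read off the displayed variants and are an assumption of this sketch, chosen because their product matches that total; the names <monospace>FIRST</monospace>, <monospace>SECOND</monospace>, <monospace>MIDDLE</monospace>, <monospace>LAST</monospace> and <monospace>dangle_variants</monospace> are hypothetical.</p>

```python
from itertools import product

# Assumed alternatives for each unpaired base inside the example multiloop.
# "." means the base does not dangle; "d" and "b" are the two dangle
# annotations used in the displayed variants. These sets are an assumption
# of this sketch, not code or data from the paper.
FIRST = ".d"    # first unpaired base after the closing stack
SECOND = ".b"   # unpaired base directly before the first inner helix
MIDDLE = ".db"  # unpaired base between the two inner helices
LAST = ".db"    # unpaired base between the second helix and the closing stack


def dangle_variants():
    """Enumerate all annotated strings for the example multiloop."""
    variants = []
    for a, b, c, e in product(FIRST, SECOND, MIDDLE, LAST):
        variants.append("((" + a + b + "((...))" + c + "((...))" + e + "))")
    return variants


variants = dangle_variants()
print(len(variants))  # 36 = 2 * 2 * 3 * 3
```

               <p>Even for this small structure with four unpaired multiloop bases, the number of variants is the product of the per-position alternatives, which is why dangles enlarge the search space so quickly.</p>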
            </sec>
         </sec>
         <sec>
            <st>
               <p>Modeling folding spaces with tree grammars</p>
            </st>
            <sec>
               <st>
                  <p>Tree representation of structures</p>
               </st>
               <p>All approaches using the thermodynamic model are implemented via dynamic programming: structures are composed recursively from smaller substructures. Such a dynamic programming algorithm always has an underlying grammar, which describes all the candidates in the folding space of a given RNA sequence. Hence, by extracting the grammars behind different algorithms, we can analyze the differences between their respective folding spaces precisely, without being distracted by implementation detail.</p>
               <p>The grammars we use are tree grammars. Non-terminal symbols designate different components of secondary structure, such as a stacking region or a bulge loop. Function symbols in the tree grammar are used to indicate how structures are built up from smaller components. For example, a snippet of a tree structure such as the one shown in Figure <figr fid="F1">1</figr> designates at its bottom an unpaired stretch of one or more bases (<monospace>r</monospace>), 5' of a <it>closed </it>substructure of any type. This situation is indicated by the function symbol <b>bl</b>, which stands for "bulge left". The unpaired stretch and the substructure are surrounded by two stacking (<monospace>C:G</monospace>) base pairs, and enclosed in yet another base pair, added by function <b>sr</b>, which extends a "stacking region". These functions can be seen as actual constructors of a tree-like data structure representing secondary structures. They can (and will) also be seen as functions that call upon the energy functions of the thermodynamic model to compute either free energies or the corresponding Boltzmann weights. We can also interpret them as functions that count the base pairs in the structure they build, compose the dot-bracket string for that structure, compute its abstract shape, and so on. Modeling structures as trees built from functions that can be interpreted in different ways provides a uniform and flexible formalism for many purposes.</p>
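               <p>The idea of one tree with several interpretations can be illustrated with a minimal Python sketch. It is not the paper's GAP-L code: the constructor names <monospace>Sr</monospace> and <monospace>Bl</monospace> follow the function symbols <b>sr</b> and <b>bl</b> in the text, whereas the hairpin constructor <monospace>Hl</monospace>, the argument lists, and the two interpretations <monospace>DotBracket</monospace> and <monospace>BasePairCount</monospace> are assumptions made for illustration.</p>

```python
from dataclasses import dataclass


# Tree constructors. The argument lists are assumptions of this sketch.
@dataclass(frozen=True)
class Hl:
    """Hairpin loop: a closing base pair around unpaired loop bases."""
    left: str
    loop: str
    right: str


@dataclass(frozen=True)
class Bl:
    """Bulge left: a base pair around an unpaired stretch 5' of a closed substructure."""
    left: str
    region: str
    inner: object
    right: str


@dataclass(frozen=True)
class Sr:
    """Stacking region: one more base pair stacked onto a closed substructure."""
    left: str
    inner: object
    right: str


class DotBracket:
    """Interpretation 1: compose the dot-bracket string."""
    def hl(self, left, loop, right):
        return "(" + "." * len(loop) + ")"

    def bl(self, left, region, inner, right):
        return "(" + "." * len(region) + inner + ")"

    def sr(self, left, inner, right):
        return "(" + inner + ")"


class BasePairCount:
    """Interpretation 2: count the base pairs."""
    def hl(self, left, loop, right):
        return 1

    def bl(self, left, region, inner, right):
        return 1 + inner

    def sr(self, left, inner, right):
        return 1 + inner


def evaluate(tree, alg):
    """Fold the tree bottom-up, replacing each constructor by the algebra's function."""
    if isinstance(tree, Hl):
        return alg.hl(tree.left, tree.loop, tree.right)
    if isinstance(tree, Bl):
        return alg.bl(tree.left, tree.region, evaluate(tree.inner, alg), tree.right)
    if isinstance(tree, Sr):
        return alg.sr(tree.left, evaluate(tree.inner, alg), tree.right)
    raise TypeError("unknown tree node: " + repr(tree))


# A candidate structure for the sequence CCAGAAACGG.
tree = Sr("C", Bl("C", "A", Hl("G", "AAA", "C"), "G"), "G")
print(evaluate(tree, DotBracket()))     # ((.(...)))
print(evaluate(tree, BasePairCount()))  # 3
```

               <p>An energy interpretation would have the same form: each method would return the free-energy contribution of its component plus the values of its substructures.</p>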
            </sec>
            <sec>
               <st>
                  <p>From tree grammars to folding algorithms</p>
               </st>
               <p>Tree grammars modeling the folding space of RNA essentially constitute executable code. They can be literally transcribed into a language supporting the algebraic dynamic programming technique <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. We use the language GAP-L as provided in the recent Bellman's GAP programming system <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>. This approach is essential for the study at hand. It not only relieves us of the burden of implementing and debugging dynamic programming recurrences for each of the four algorithms; it also guarantees that the different algorithms correctly implement their respective models, share the energy model, are implemented with the same degree of optimization, and are independent of the programming skills of a bunch of graduate students.</p>
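               <p>How a grammar combined with an exchangeable evaluation algebra yields a dynamic programming algorithm can be sketched in a few lines of Python. This is neither GAP-L nor one of the four grammars studied here: it is a toy grammar with three alternatives (empty structure, unpaired base followed by a structure, base pair around a structure followed by a structure), and the names <monospace>fold</monospace>, <monospace>Count</monospace>, <monospace>MaxPairs</monospace> and the parameter <monospace>min_loop</monospace> are assumptions of this sketch.</p>

```python
from functools import lru_cache

# Canonical and wobble base pairs.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


class Count:
    """Algebra that counts the candidates in the folding space."""
    def nil(self):
        return 1

    def unpaired(self, rest):
        return rest

    def paired(self, inner, rest):
        return inner * rest

    def choice(self, candidates):
        return sum(candidates)


class MaxPairs:
    """Algebra that maximizes the number of base pairs."""
    def nil(self):
        return 0

    def unpaired(self, rest):
        return rest

    def paired(self, inner, rest):
        return 1 + inner + rest

    def choice(self, candidates):
        return max(candidates)


def fold(seq, alg, min_loop=3):
    """Evaluate the toy grammar on seq under the given algebra.

    struct(i, j) covers the subword seq[i:j]:
      struct = nil                                (empty subword)
             | unpaired(base, struct)
             | paired(base, struct, base, struct) (bases must pair)
    """
    @lru_cache(maxsize=None)
    def struct(i, j):
        if i == j:
            return alg.nil()
        # Alternative 1: base i stays unpaired.
        candidates = [alg.unpaired(struct(i + 1, j))]
        # Alternative 2: base i pairs with base k, enclosing at least min_loop bases.
        for k in range(i + min_loop + 1, j):
            if (seq[i], seq[k]) in PAIRS:
                candidates.append(alg.paired(struct(i + 1, k), struct(k + 1, j)))
        return alg.choice(candidates)

    return struct(0, len(seq))


print(fold("GAAAC", Count()))     # 2: the open chain and one hairpin
print(fold("GAAAC", MaxPairs()))  # 1
```

               <p>The recurrences are written once and the two analyses differ only in the algebra passed to <monospace>fold</monospace>. This separation is what allows different folding-space grammars to be compared under an identical scoring scheme.</p>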
            </sec>
            <sec>
               <st>
                  <p>Grammars and their relation to established structure prediction programs</p>
               </st>
               <p>We will present four grammars, NoDangle, OverDangle, MicroState and MacroState. The first three implement the folding space of RNA<smcaps>FOLD</smcaps> used with options -d0, -d2, and -d1, respectively. The grammars MicroState and MacroState implement the folding space of RNA<smcaps>SHAPES</smcaps> in its two functions. All four grammars will then be equipped with shape abstraction and used in our evaluation to compute shape probabilities under the different models.</p>
               <p>All grammars use the same energy parameters, but in different ways. The 16 functions of the energy model, as specified in Tables <tblr tid="T1">1</tblr> and <tblr tid="T2">2</tblr>, are used in different combinations by the evaluation functions in the grammars. For example, in all grammars the function <it>ml </it>calls the model functions <it>termau</it>_<it>energy</it>, <it>sr</it>_<it>energy</it>, and <it>ml</it>_<it>energy</it>. Table <tblr tid="T3">3</tblr> provides the cross-references between the energy functions in our programs, to be described below, and the energy functions of the thermodynamic model.</p>
               <tbl id="T2"><title><p>Table 2</p></title><caption><p>Energy functions for dangling bases</p></caption><tblbdy cols="2">
      <r>
         <c ca="left">
            <p>
               <b>Function</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>Description</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>dl</it>_<it>energy</it></p>
         </c>
         <c ca="left">
            <p>A single base left of a closed substructure can dangle onto this stack and thus might further stabilize it.</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>dr</it>_<it>energy</it></p>
         </c>
         <c ca="left">
            <p>Symmetric case to <it>dl</it>_<it>energy</it>.</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>ext</it>_<it>mismatch</it>_<it>energy</it></p>
         </c>
         <c ca="left">
            <p>Two bases left and right of a stack, which do not form a basepair (they mismatch), can dangle from both sides to the stack.</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>dli</it>_<it>energy</it></p>
         </c>
         <c ca="left">
            <p>A multiloop is closed by one stack. A single base at the inside of the multiloop and directly next to the closing stack might dangle from left onto this stack. The energy values are the same as <it>dr</it>_<it>energy</it>, but for a reversed subsequence.</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>dri</it>_<it>energy</it></p>
         </c>
         <c ca="left">
            <p>Symmetric case to <it>dli</it>_<it>energy</it>.</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>ml</it>_<it>mismatch</it>_<it>energy</it></p>
         </c>
         <c ca="left">
            <p>Two bases on both inner sides of a multiloop closing stack may dangle from inside onto this stack, but do not form a basepair (mismatch).</p>
         </c>
      </r>
   </tblbdy></tbl>
               <tbl id="T3"><title><p>Table 3</p></title><caption><p>Cross-reference between the energy functions in our programs, and which energy contributions (model functions) they call upon.</p></caption><tblbdy cols="5">
      <r>
         <c ca="left">
            <p>
               <b>Function</b>
            </p>
         </c>
         <c cspan="4" ca="center">
            <p>
               <b>Used in evaluation function</b>
            </p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>
               <b>NoDangle</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>OverDangle</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>MicroState</b>
            </p>
         </c>
         <c ca="left">
            <p>
               <b>MacroState</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>termau</it>_<it>energy</it>
            </p>
         </c>
         <c ca="left">
            <p>ml</p>
         </c>
         <c ca="left">
            <p>ml</p>
         </c>
         <c ca="left">
            <p>ml</p>
         </c>
         <c ca="left">
            <p>ml</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mldl</p>
         </c>
         <c ca="left">
            <p>mldl</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mldr</p>
         </c>
         <c ca="left">
            <p>mldr</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mldlr</p>
         </c>
         <c ca="left">
            <p>mldlr</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mladl</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mladr</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mladlr</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mldladr</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mladldr</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>drem</p>
         </c>
         <c ca="left">
            <p>drem</p>
         </c>
         <c ca="left">
            <p>drem</p>
         </c>
         <c ca="left">
            <p>drem</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>edl</p>
         </c>
         <c ca="left">
            <p>edl</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>edr</p>
         </c>
         <c ca="left">
            <p>edr</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>edlr</p>
         </c>
         <c ca="left">
            <p>edlr</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>dl</it>_<it>energy</it></p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>edl</p>
         </c>
         <c ca="left">
            <p>edl</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>ambd</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>ambd'</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>acomb</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mladl</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mladlr</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mladldr</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>dr</it>_<it>energy</it></p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>edr</p>
         </c>
         <c ca="left">
            <p>edr</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>ambd</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>ambd'</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>acomb</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mladr</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mladlr</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mldladr</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>ext</it>_<it>mismatch</it>_<it>energy</it>
            </p>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>drem</p>
         </c>
         <c ca="left">
            <p>edlr</p>
         </c>
         <c ca="left">
            <p>edlr</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>dli</it>_<it>energy</it></p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mldl</p>
         </c>
         <c ca="left">
            <p>mldl</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mldladr</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mladl</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mladlr</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mladldr</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>dri</it>_<it>energy</it></p>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mldr</p>
         </c>
         <c ca="left">
            <p>mldr</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mladldr</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mladr</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mladlr</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mldladr</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>ml_mismatch_energy</it>
            </p>
         </c>
         <c ca="left">
            <p/>
         </c>
         <c ca="left">
            <p>ml</p>
         </c>
         <c ca="left">
            <p>mldlr</p>
         </c>
         <c ca="left">
            <p>mldlr</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p><it>sr</it>_<it>energy</it></p>
         </c>
         <c ca="left">
            <p>sr</p>
         </c>
         <c ca="left">
            <p>sr</p>
         </c>
         <c ca="left">
            <p>sr</p>
         </c>
         <c ca="left">
            <p>sr</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>hl</p>
         </c>
         <c ca="left">
            <p>hl</p>
         </c>
         <c ca="left">
            <p>hl</p>
         </c>
         <c ca="left">
            <p>hl</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>bl</p>
         </c>
         <c ca="left">
            <p>bl</p>
         </c>
         <c ca="left">
            <p>bl</p>
         </c>
         <c ca="left">
            <p>bl</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>br</p>
         </c>
         <c ca="left">
            <p>br</p>
         </c>
         <c ca="left">
            <p>br</p>
         </c>
         <c ca="left">
            <p>br</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>il</p>
         </c>
         <c ca="left">
            <p>il</p>
         </c>
         <c ca="left">
            <p>il</p>
         </c>
         <c ca="left">
            <p>il</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>ml</p>
         </c>
         <c ca="left">
            <p>ml</p>
         </c>
         <c ca="left">
            <p>ml</p>
         </c>
         <c ca="left">
            <p>ml</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mldl</p>
         </c>
         <c ca="left">
            <p>mldl</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mldr</p>
         </c>
         <c ca="left">
            <p>mldr</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mldlr</p>
         </c>
         <c ca="left">
            <p>mldlr</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mladldr</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mladr</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mladl</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mladlr</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mldladr</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>hl_energy</it>
            </p>
         </c>
         <c ca="left">
            <p>hl</p>
         </c>
         <c ca="left">
            <p>hl</p>
         </c>
         <c ca="left">
            <p>hl</p>
         </c>
         <c ca="left">
            <p>hl</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>bl_energy</it>
            </p>
         </c>
         <c ca="left">
            <p>bl</p>
         </c>
         <c ca="left">
            <p>bl</p>
         </c>
         <c ca="left">
            <p>bl</p>
         </c>
         <c ca="left">
            <p>bl</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>br_energy</it>
            </p>
         </c>
         <c ca="left">
            <p>br</p>
         </c>
         <c ca="left">
            <p>br</p>
         </c>
         <c ca="left">
            <p>br</p>
         </c>
         <c ca="left">
            <p>br</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>il_energy</it>
            </p>
         </c>
         <c ca="left">
            <p>il</p>
         </c>
         <c ca="left">
            <p>il</p>
         </c>
         <c ca="left">
            <p>il</p>
         </c>
         <c ca="left">
            <p>il</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>ml_energy=3.4</it>
            </p>
         </c>
         <c ca="left">
            <p>ml</p>
         </c>
         <c ca="left">
            <p>ml</p>
         </c>
         <c ca="left">
            <p>ml</p>
         </c>
         <c ca="left">
            <p>ml</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mldl</p>
         </c>
         <c ca="left">
            <p>mldl</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mldr</p>
         </c>
         <c ca="left">
            <p>mldr</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mldlr</p>
         </c>
         <c ca="left">
            <p>mldlr</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mladlr</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mldladr</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mladldr</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mladr</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>mladl</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>ul_energy=0.4</it>
            </p>
         </c>
         <c ca="left">
            <p>incl</p>
         </c>
         <c ca="left">
            <p>incl</p>
         </c>
         <c ca="left">
            <p>incl</p>
         </c>
         <c ca="left">
            <p>incl</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>ml</p>
         </c>
         <c ca="left">
            <p>ml</p>
         </c>
         <c ca="left">
            <p>ml</p>
         </c>
         <c ca="left">
            <p>ml</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>ssadd</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>ss_energy=0</it>
            </p>
         </c>
         <c ca="left">
            <p>addss</p>
         </c>
         <c ca="left">
            <p>addss</p>
         </c>
         <c ca="left">
            <p>addss</p>
         </c>
         <c ca="left">
            <p>addss</p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c ca="left">
            <p>ssadd</p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>
               <it>sbase_energy=0</it>
            </p>
         </c>
         <c ca="left">
            <p>sadd</p>
         </c>
         <c ca="left">
            <p>sadd</p>
         </c>
         <c ca="left">
            <p>sadd</p>
         </c>
         <c ca="left">
            <p>sadd</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>This table shows the use of the very same energy functions for all grammars. Energy differences only stem from different combinations. In the first column, we list the energy model functions. The next four columns contain the evaluation functions of the four grammars.</p>
      <p>To retrieve the energy of the example structure of Figure 1 for NoDangle, read the table as follows: The first evaluation function of the structure is <it>sr</it>. Look for all rows in column two where <it>sr </it>appears; this is the case only for <it>sr</it>_<it>energy</it>. Next is <it>bl</it>, which again shows up in the row for <it>sr</it>_<it>energy</it>, but also in the row for <it>bl</it>_<it>energy</it>. The concrete energy values depend on the input bases, so the model functions should be understood as table look-ups with the bases as parameters. The energy of the whole structure is just the sum of all local energy contributions.</p>
      <p>Some evaluation functions do not use model functions. The four variants of the evaluation function <it>cadd </it>and <it>combine </it>just add energies from their left and right substructures. <it>Trafo </it>and <it>incl </it>do not change the energy value at all and <it>nil </it>simply returns 0.</p>
   </tblfn></tbl>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Model NoDangle</p>
            </st>
            <p>NoDangle is our grammar incorporating the elementary energy model, without considering dangling bases at all. It corresponds to the model underlying RNA<smcaps>FOLD</smcaps> when used with options -noLP -d0<sup>4</sup>. It is also used in RNA<smcaps>SUBOPT</smcaps>. We give a narrative explanation of how this grammar works.</p>
            <p>Each complete structure is a <b><it>struct</it></b>, i. e. it is derived from the axiom of the grammar (see Figure <figr fid="F2">2</figr>). It might have leading unpaired bases (<b>sadd</b>), hold one or more closed substructures (non-terminal <it>dangle</it>, function <b>cadd</b>), or just end with the empty word (<b>nil</b>). A <it>dangle </it>is a closed substructure whose directly neighboring bases <it>might </it>dangle onto the stack of base pairs. We keep the name <it>dangle </it>for consistency with the other grammars, but no dangle energies are considered in NoDangle; the function <b>drem </b>simply passes on the energy of its <it>closed </it>substructure, which may include a penalty for a terminal A:U pair if appropriate.</p>
            <fig id="F2"><title><p>Figure 2</p></title><caption><p>Grammar for "NoDangle" and "OverDangle"</p></caption><text>
   <p><b>Grammar for "NoDangle" and "OverDangle"</b>. The axiom is <b><it>struct</it></b>. Alternative productions starting at the same non-terminal are separated by vertical bars. Terminals, <monospace>b</monospace> (a single base), <monospace>r</monospace> (a region of bases), <monospace>&#949;</monospace> (the empty word) and <monospace>loc</monospace> (the position of a neighboring subword), are colored in blue. Green algebra function names, e. g. <it>sadd </it>or <it>hl</it>, help to write the structures as trees, and are used to associate thermodynamic energies with the structures. Magenta colored words beneath non-terminals are filters, e. g. "stackpairing" requires that the two leftmost bases of the substructure can make base pairs with the two rightmost ones. All different secondary structures for a given RNA sequence, i. e. its complete folding space, can be enumerated by parsing the sequence with grammar NoDangle. The grammar is non-ambiguous in the sense that each structure is found exactly once.</p>
</text><graphic file="1471-2105-12-429-2" hint_layout="double"/></fig>
            <p>A <it>closed </it>substructure is a <it>stack </it>of base pairs which eventually leads to one of five structural motifs: hairpin loop (<it>hairpin</it>), bulge to the left (<it>leftB</it>), bulge to the right (<it>rightB</it>), internal loop (<it>iloop</it>) or <it>multiloop</it>. The multiloop is a concatenation (<it>ml</it>_<it>comps </it>and <it>ml</it>_<it>comps1</it>) of two or more substructures, embraced by one closing stack. Note that all motifs have at least two closing base pairs which form a stack. This implements the convention of disallowing lonely pairs. The helix initiated by two closing pairs can be elongated by <b>sr</b>. A region (<monospace>r</monospace>) is a non-empty stretch of unpaired bases (<monospace><b>b</b></monospace>), whose length can be further constrained, e. g. to be at most 30 bases (<it>r30</it>) for internal loops or at least 3 bases (<it>r3</it>) for a hairpin loop.</p>
            <p>The algebra functions <b>drem </b>and <b>ml </b>control the dangling behavior, which is the only difference between NoDangle and OverDangle. In NoDangle, they do not make any dangling energy contributions at all.</p>
         </sec>
         <sec>
            <st>
               <p>Model OverDangle</p>
            </st>
            <p>OverDangle is the grammar which considers dangling base energies in a simplified form. It corresponds to RNA<smcaps>FOLD</smcaps> called with options -noLP -d2<sup>5</sup>. The grammar itself is identical to NoDangle (cf. Figure <figr fid="F2">2</figr>). It computes the same folding space, but evaluates energies differently. It assumes an energy contribution from dangling bases on every side of a helix, even if a base is not available for dangling, for example because it is itself engaged in another helix, or already dangling there. The algebra functions <b>drem </b>and <b>ml </b>control the dangling behavior, which is the only difference between NoDangle and OverDangle. In OverDangle, <b>drem </b>and <b>ml </b>always add dangling energies for left and right dangles. This is why the production using <b>drem </b>uses two <it>loc </it>symbols: <it>loc </it>recognizes the empty word, and returns its position in the sequence. These positions are used by <b>drem </b>to look at the two bases to the left and right of the <it>closed </it>substructure.</p>
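            <p>The contrast between the two <b>drem </b>variants can be illustrated with a minimal Python sketch. It is not the implementation used in any of the tools discussed here: the function names, the position convention (the closed substructure occupies the half-open index range from <monospace>left_pos</monospace> to <monospace>right_pos</monospace>), and all numeric values are assumptions made for illustration, and the hypothetical <monospace>ext_mismatch_energy</monospace> stand-in ignores the closing base pair, on which real parameters also depend.</p>

```python
# Hypothetical stand-in for the model function ext_mismatch_energy.
# The numbers are invented for illustration; real parameters are measured
# values that also depend on the closing base pair of the helix.
def ext_mismatch_energy(left_base, right_base):
    bonus = {"A": -0.5, "C": -0.2, "G": -0.4, "U": -0.3}
    return bonus[left_base] + bonus[right_base]


def drem_nodangle(closed_energy, seq, left_pos, right_pos):
    """NoDangle: ignore the neighbors and pass the energy through unchanged."""
    return closed_energy


def drem_overdangle(closed_energy, seq, left_pos, right_pos):
    """OverDangle: always charge both neighbors, even if they are paired
    elsewhere or already dangle onto another helix."""
    if left_pos == 0 or right_pos == len(seq):
        raise ValueError("this sketch assumes both neighboring bases exist")
    left_base = seq[left_pos - 1]   # base directly left of the closed substructure
    right_base = seq[right_pos]     # base directly right of the closed substructure
    return closed_energy + ext_mismatch_energy(left_base, right_base)


# Toy call: the closed substructure spans indices 1..7 of a 9-base sequence.
seq = "AGGAAACCG"
print(drem_nodangle(-3.0, seq, 1, 8))
print(drem_overdangle(-3.0, seq, 1, 8))
```

            <p>The two positions play the role of the two <it>loc </it>symbols: they carry no energy themselves and only tell the evaluation function where to look in the sequence.</p>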
            <p>This "overdangling" model is used because a correct treatment of dangles is much more complicated, as we shall see below. As a plausibility argument in favor of this heuristic, one may say that when a base is overdangled, for example between two adjacent helices, as with the midpoint in "<monospace>((...)).((...))</monospace>", this can be seen as a bonus for co-axial stacking of the two helices. Including full co-axial stacking could be considered as a further refinement of the folding space beyond the MicroState model, which will be described below. Still, due to overdangling, the MFE energy value computed may be smaller than actually assigned by the thermodynamic model to the underlying structure. Partition function computations in RNA<smcaps>FOLD</smcaps> use the OverDangle approach, and so does RNA<smcaps>SUBOPT</smcaps> with option -d2 (and even -d1, but see below).</p>
            <p>If we used both NoDangle and OverDangle to produce lists of all structures in the folding space, sorted by free energy, these lists would hold the same structures, but in a different order. The true MFE structure (under the full model with correct dangles) will be near the front of each list, but it is not guaranteed to come out in first place. Our next two grammars are designed to achieve this goal.</p>
         </sec>
         <sec>
            <st>
               <p>Model MicroState</p>
            </st>
            <p>Grammar MicroState is a grammar which refines our model of a secondary structure. It corresponds to <monospace>RNA<smcaps>FOLD</smcaps> -noLP -d1</monospace><sup>6 </sup>and is used in the 2004 release of RNA<smcaps>SHAPES</smcaps><abbrgrp><abbr bid="B3">3</abbr></abbrgrp> for the computation of representative structures of different shape.</p>
            <p>MicroState has separate rules for a helix end with both neighboring bases, only the left base, only the right base, or no base dangling onto it (see Figure <figr fid="F3">3</figr>). These four cases compete with each other for minimum free energy. If the surrounding bases are already base paired, only the <b>drem </b>case applies (no dangles). If it is decided (say) that the left neighboring base dangles onto the helix, then this base is no longer available for dangling onto another helix. In this way, grammar MicroState correctly finds the structure of minimal free energy, and could, in principle, also explicitly report the optimal dangles, as in "<monospace>..b((...))d((...))...</monospace>".</p>
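            <p>The competition between the dangle alternatives of a single helix end can be sketched as a minimization over candidates. The following Python fragment is an illustrative sketch only: the function name <monospace>best_dangle</monospace>, its parameters, and the energy values passed to it are hypothetical, and the flags <monospace>left_free</monospace> and <monospace>right_free</monospace> stand in for what the grammar decides structurally, namely whether a neighboring base is still available for dangling.</p>

```python
def best_dangle(closed_energy, dl, dr, dlr, left_free, right_free):
    """Pick the dangle alternative with minimum free energy for one helix end.

    dl, dr, dlr are hypothetical energy contributions for a left dangle,
    a right dangle, and both dangles; left_free and right_free say whether
    the neighboring base is unpaired and not yet used by another helix.
    Returns a pair (energy, name of the chosen alternative).
    """
    candidates = [(closed_energy, "drem")]          # no dangle is always possible
    if left_free:
        candidates.append((closed_energy + dl, "edl"))
    if right_free:
        candidates.append((closed_energy + dr, "edr"))
    if left_free and right_free:
        candidates.append((closed_energy + dlr, "edlr"))
    return min(candidates)                          # compares energies first


# Toy values, invented for illustration.
print(best_dangle(-5.0, -0.3, -0.8, -1.0, left_free=True, right_free=True))
print(best_dangle(-5.0, -0.3, -0.8, -1.0, left_free=False, right_free=True))
print(best_dangle(-5.0, -0.3, -0.8, -1.0, left_free=False, right_free=False))
```

            <p>In the grammar itself, availability is not passed around as a flag; it follows from which production is applied. The sketch only mirrors the local choice among the four cases.</p>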
            <fig id="F3"><title><p>Figure 3</p></title><caption><p>Grammar MicroState extends the rules of grammars NoDangle or OverDangle for the non-terminal symbols "dangle" and "multiloop"</p></caption><text>
   <p><b>Grammar MicroState extends the rules of grammars NoDangle or OverDangle for the non-terminal symbols "dangle" and "multiloop"</b>. Instead of just one way, we now have four alternatives to dangle bases onto a closed substructure: Neither neighboring base dangles (<it>drem </it>and <it>ml</it>), only the left neighboring base dangles onto the stack (<it>edl </it>and <it>mldl</it>), only the right one (<it>edr </it>and <it>mldr</it>), or both (<it>edlr </it>and <it>mldlr</it>).</p>
</text><graphic file="1471-2105-12-429-3" hint_layout="double"/></fig>
            <p>All variants of the same secondary structure, augmented with different dangles, are now separate members of the folding space. In contrast to the classical model, accounting only for base pairs, we call them "microstates". Let us derive a rough estimate of this folding space enlargement. The size of the folding space for a sequence of length <it>n </it>grows asymptotically with <it>a </it>&#183; <it>b<sup>n </sup></it>&#183; <it>n </it><sup>-3/2</sup>, with <it>b </it>= 1.44358 and <it>a </it>= 3.45373 <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. A structure has, on average, <it>k</it>(<it>n</it>) helices, where <it>k </it>grows with <it>n</it>. Each helix end has up to four ways to play with the dangles, but helix ends in hairpin loops do not count. Directly adjacent helices further reduce the number of dangling alternatives.</p>
            <p>Let us, for simplicity, assume that a helix has 4 dangle variants on average. Then, the above formula changes for the number of microstates to <it>a </it>&#183; 4<sup><it>k</it>(<it>n</it>) </sup>&#183; <it>b<sup>n </sup></it>&#183; <it>n </it><sup>-3/2</sup>. An empirical measurement is shown in Figure <figr fid="F4">4</figr>. From these measurements, and for the particular sequences and lengths used there, we can estimate <inline-formula><m:math name="1471-2105-12-429-i22" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>k</m:mi>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:mi>n</m:mi>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mo class="MathClass-rel">&#8776;</m:mo>
   <m:mfrac>
      <m:mrow>
         <m:mi>n</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mn>15</m:mn>
      </m:mrow>
   </m:mfrac>
</m:mrow>
</m:math></inline-formula>. For a sequence of length 100, for example, we see an increase by a factor of 10<sup>4</sup>. Clearly, this is a substantial enlargement of the folding space, and different structures are affected to a different extent. (For example, the open structure (no base pairs) gives rise to only one microstate.)</p>
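            <p>The estimate can be checked with a few lines of arithmetic. The Python sketch below evaluates the two expressions from the text on a log<sub>10</sub> scale; the function names are our own, and the assumption of 4 variants per helix with <it>k</it>(<it>n</it>) = <it>n</it>/15 is the simplification stated above, not an exact count.</p>

```python
import math

A = 3.45373   # constant a of the asymptotic formula in the text
B = 1.44358   # constant b of the asymptotic formula in the text


def log10_structures(n):
    """log10 of a * b^n * n^(-3/2), the asymptotic number of structures."""
    return math.log10(A) + n * math.log10(B) - 1.5 * math.log10(n)


def log10_microstate_factor(n, variants=4, bases_per_helix=15):
    """log10 of variants^k(n), assuming k(n) = n / bases_per_helix."""
    return (n / bases_per_helix) * math.log10(variants)


# Print length, log10(number of structures), log10(enlargement factor).
for n in (50, 100, 150):
    print(n, round(log10_structures(n), 2), round(log10_microstate_factor(n), 2))
```

            <p>For <it>n </it>= 100, the second function returns about 4.01, which corresponds to the factor of 10<sup>4</sup> quoted above.</p>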
            <fig id="F4"><title><p>Figure 4</p></title><caption><p>Growth of folding spaces for all four grammars</p></caption><text>
   <p><b>Growth of folding spaces for all four grammars</b>. We used uniformly distributed random sequences, with step-size 5 bp. The number of secondary structures heavily depends on sequence composition, thus we took the average over 100 sequences per data point. Curves for "MacroState" and "OverDangle" are not visible, because they are perfectly overlaid by "NoDangle", i. e. all three folding spaces have exactly the same size.</p>
</text><graphic file="1471-2105-12-429-4" hint_layout="single"/></fig>
            <p>This enlargement of the search space is not a problem for MFE structure prediction. The dynamic programming algorithm derived from the grammar MicroState only does a constant amount of extra work compared to NoDangle and OverDangle. But a severe problem arises with the desire to investigate near-optimal structures. The roughly 4<it><sup>k </sup></it>microstates of an optimal structure with <it>k </it>helices crowd the near-optimal folding space, while representing the same structure in the non-dangling sense. Enumerating suboptimals returns a tremendous amount of useless information. RNA<smcaps>SUBOPT</smcaps> therefore uses OverDangle for enumeration, even when option -d1 is specified. Afterwards, it re-evaluates the energy of predicted structures using correct dangling. Hence, the ranking of structures may change. Occasionally, we observe that the energy of the true MFE structure is so much above the energy of other, overdangled structures that it falls above the energy threshold for enumeration and is not returned at all.<sup>7</sup></p>
            <p>The second problem arises with computations that are based on Boltzmann statistics. The partition function <it>Q </it>sums up the Boltzmann-weighted energies of all members in the folding space. Each secondary structure contributes to the partition function as many times as it has microstates, hence the result would be skewed towards structures with many microstates. The significance of this bias is hard to judge<sup>8</sup>, and up to this study, it could not be evaluated empirically. For this reason, RNA<smcaps>FOLD</smcaps> does not support partition function computation with the MicroState model (option -d1).</p>
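            <p>The direction of this bias can be seen in a toy calculation. The Python sketch below is a deliberately crude illustration and not a model of any real sequence: the two structures, their energies and their microstate counts are invented, and all microstates of a structure are assumed to have the same energy, which is not true in general since different dangles contribute different energies.</p>

```python
import math

RT = 0.61632  # kcal/mol at 37 degrees Celsius (gas constant times 310.15 K)


def probabilities(structures, count_microstates):
    """Boltzmann probabilities for a toy folding space.

    structures: list of (name, energy in kcal/mol, number of microstates).
    If count_microstates is True, each structure enters the partition
    function once per microstate; otherwise it enters exactly once.
    """
    weights = {}
    for name, energy, n_micro in structures:
        multiplicity = n_micro if count_microstates else 1
        weights[name] = multiplicity * math.exp(-energy / RT)
    q = sum(weights.values())  # partition function Q of the toy space
    return {name: w / q for name, w in weights.items()}


# Invented example: equal energies, but very different microstate counts.
toy = [("few_helices", -10.0, 4), ("many_helices", -10.0, 64)]
print(probabilities(toy, count_microstates=False))
print(probabilities(toy, count_microstates=True))
```

            <p>With equal energies, counting each structure once gives both toy structures a probability of 0.5, whereas counting microstates shifts the probabilities to 4/68 and 64/68.</p>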
            <p>Fortunately, the partition function with correct dangles, avoiding overdangling as well as explosion of the folding space, can also be computed. To keep the folding space simple, we need a more sophisticated grammar: MacroState.</p>
         </sec>
         <sec>
            <st>
               <p>Model MacroState</p>
            </st>
            <p>Grammar MacroState (see Figure <figr fid="F5">5</figr>) follows the overall pattern of the other grammars, but is much more refined. This grammar was designed originally with the 2006 release of RNA<smcaps>SHAPES</smcaps><abbrgrp><abbr bid="B6">6</abbr></abbrgrp> to compute complete probabilistic shape analysis. Its rules are written to record and distinguish the situation where a helix (1) ends with a base pair, (2) already has a single unpaired base to its right or left, or (3) has several unpaired bases on either side. No dangle energies are added in cases (1) and (3), and in case (2), all possible dangle variants (up to four microstates) are evaluated and minimized over while considering the corresponding macrostate. This leads to a much larger number of non-terminal symbols and functions in the grammar. MacroState has 25 non-terminal symbols and 32 functions, compared to NoDangle with 11 non-terminals and 12 functions.</p>
            <fig id="F5"><title><p>Figure 5</p></title><caption><p>"MacroState" grammar</p></caption><text>
   <p><b>"MacroState" grammar</b>. The color code is identical to Figure 2. The basic structure of the "MacroState" grammar is inherited from the previous three grammars, but it has a more complex distinction of cases for dangling bases. "MacroState" has to consider all the different dangling situations as in "MicroState", but its search space is restricted to the <it>k</it>(<it>n</it>)-times smaller folding space of the input sequence. To achieve these contradicting goals, dangling alternatives do not exist as search space candidates but are implicitly examined within the evaluation algebra. The grammar has to ensure that a substructure is of a defined dangling type whenever its energy or partition function value is used in an algebra evaluation function. We know that any helix derived from <it>nodg </it>has no unpaired bases to its left or right, while helices from <it>edgl</it>, <it>edgr </it>or <it>edglr </it>have exactly one unpaired base dangling from the left, from the right, or exactly two unpaired bases dangling from both sides, respectively. In all four cases, there is no unpaired base left for a further dangling. Care must be taken where we cannot be sure whether, e. g., the leftmost unpaired base of a <it>block</it>_<it>dl </it>derivation is free to dangle to some helix to its left. The unpaired base would be available for a dangling if we use <it>ssadd</it>, but is occupied in <it>incl </it>situations. This uncertainty is passed to every calling function, but with a clever grammar design we can at least ensure that its type does not change. For example, every <it>mc1 </it>or <it>mcadd2 </it>derivation contains one or more helices with one or more unpaired bases at its 5' end and definitely no unpaired base at its 3' end. 
Furthermore, <it>mc2 </it>and <it>mcadd1 </it>always have no unpaired bases on either side, <it>mc3 </it>or <it>mcadd4 </it>have one or more unpaired bases only at their 3' end, and finally <it>mc4 </it>or <it>mcadd3 </it>are known to have one or more unpaired bases at both ends. The benefit of these distinctions can be demonstrated with the multiloop functions <it>mldl </it>and <it>mladl</it>. The important base is the one that is directly to the left of the <it>mc1 </it>or <it>mc2 </it>substructure. In principle, it can either dangle to the left, that is the closing stem of the multiloop, or the right, that is the leftmost helix within the multiloop. Actually, for <it>mldl </it>our base of interest can only dangle to the left, because every <it>mc1 </it>derivation already has at least one further base in front of the first inner helix. For <it>mladl </it>we truly have an <b>a</b>mbiguous situation, where the base of interest could dangle to either side. Please note that <it>mldl </it>and <it>mladl </it>correspond to two different dot-bracket structures. <it>mldl </it>handles macrostates of the type "<monospace>((...</monospace>" including microstates "<monospace>((...</monospace>" and "<monospace>((d..</monospace>", whereas <it>mladl </it>handles macrostates of type "<monospace>((.((...</monospace>" and includes the microstates "<monospace>((.((...</monospace>", "<monospace>((d((...</monospace>", and "<monospace>((b((...</monospace>". The mfe algebra function locally chooses the variant with the better free energy, even if a global analysis would reveal that the locally worse structure would become MFE in the end. This constitutes a rare case where the MFE structure may be missed. Our partition function algebra correctly keeps track of these situations.</p>
</text><graphic file="1471-2105-12-429-5" hint_layout="double"/></fig>
            <p>The important feature of MacroState is that for any sequence, it defines the identical folding space as NoDangle. This is hard to believe when just looking at the grammar, but has been shown in <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, and is further demonstrated by the measurements shown in Figure <figr fid="F4">4</figr>. The size of the folding space, as defined by MacroState, agrees with that of NoDangle and OverDangle not only on average, but also on each individual sequence.</p>
            <p>What is the effect of using either MicroState or MacroState? Does it really matter? Table <tblr tid="T4">4</tblr> shows an extreme example of how the choice of the state space affects the computed probabilities:</p>
            <tbl id="T4"><title><p>Table 4</p></title><caption><p>
   <monospace>Extreme probability shift example</monospace>
</p></caption><tblbdy cols="3">
      <r>
         <c ca="center" cspan="3">
            <p>GACCAAAGCCUUUGUCCCACAAAUUGCGAUCGCGUCGCGGAGC</p>
         </c>
      </r>
      <r>
         <c ca="right">
            <p><b>MacroState prob</b>.</p>
         </c>
         <c ca="right">
            <p><b>MicroState prob</b>.</p>
         </c>
         <c ca="left">
            <p>
               <b>shape class</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="3">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="right">
            <p>58.44%</p>
         </c>
         <c ca="right">
            <p>32.58%</p>
         </c>
         <c ca="left">
            <p>
               <monospace>[][]</monospace>
            </p>
         </c>
      </r>
      <r>
         <c ca="right">
            <p>29.32%</p>
         </c>
         <c ca="right">
            <p>63.43%</p>
         </c>
         <c ca="left">
            <p>
               <monospace>[[][]]</monospace>
            </p>
         </c>
      </r>
      <r>
         <c ca="right">
            <p>12.24%</p>
         </c>
         <c ca="right">
            <p>3.99%</p>
         </c>
         <c ca="left">
            <p>
               <monospace>[]</monospace>
            </p>
         </c>
      </r>
   </tblbdy></tbl>
            <p>In this example, about 34% of the probability mass (half the sum of the absolute differences between the two probability columns) is shifted by switching models, reversing the order of the two top-ranking shapes. Finding out whether this situation is the exception or the rule is a main motivation of this study.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results &amp; Discussion</p>
         </st>
         <sec>
            <st>
               <p>Data sets</p>
            </st>
            <p>The four data sets used in this study, DARTS, FR3D:3A, FR3D:4A, and RNAstrand:91, are based on RNA 3D structure data sets prepared in the context of previously published studies.</p>
            <sec>
               <st>
                  <p>Structures drawn from PDB</p>
               </st>
               <p>We examined three data sets (DARTS, FR3D:3A, and FR3D:4A) based on RNA 3D structural data sets prepared in the context of previously published studies. All three original data sets were created in order to reflect the currently available structural repertoire of RNA molecules as given by structures solved experimentally by X-ray and NMR analysis.</p>
               <p>The DARTS set was used for the analysis and classification of RNA tertiary structures in <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. It was built from all structures available in the March 2007 version of the Protein Data Bank (PDB) <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>. The DARTS data set is available at <url>http://bioinfo3d.cs.tau.ac.il/DARTS</url> and contains 244 structures. The creation of this data set involved dedicated structural comparisons to ensure pairwise structural and sequence variability. Unfortunately, the DARTS database is no longer updated and is therefore limited to data deposited in the PDB before March 2007.</p>
               <p>The two FR3D data sets <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp> are representative sets based on all RNA X-ray structures with a resolution of up to 3 &#197; (246 structures containing 653 chains) and up to 4 &#197; (293 structures containing 764 chains), respectively, that were contained in the PDB in 2010. Both sets contain one representative structure for each group of RNA structures found similar (or identical) according to the employed sequence (&gt; 95% identity) and structural (cf. <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>) similarity cutoffs. Both data sets FR3D:3A and FR3D:4A are available as weekly updated lists at <url>http://rna.bgsu.edu/FR3D</url>. The FR3D data sets were created taking recently solved structures into consideration and therefore represent the currently known RNA 3D structural space. Here, the FR3D:3A set is restricted to structures that have been solved at a better resolution and may therefore be more reliable than structures contained in the FR3D:4A set. In turn, the FR3D:4A set has a less strict resolution cutoff and therefore contains more structures.</p>
            </sec>
            <sec>
               <st>
                  <p>From PDB structures to "gold" structures</p>
               </st>
               <p>In order to generate the data sets for this study, we downloaded all 3D structures contained in the original data sets from the PDB and extracted the secondary structures of each RNA chain using the stereo-geometrical information encoded within the atomic coordinates. Each chain was processed with the base pair annotation software tool MC-Annotate <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>, resulting in a list of all intramolecular contacts in the chain. For this study, we only used base pair interactions that are formally involved in secondary structure formation, namely the <it>cis </it>Watson-Crick (cWC) base pairs (G:C, C:G, A:U, U:A, G:U, U:G). All other interactions, such as non-canonical base pairs, base stackings, and base-backbone interactions, were ignored since they are not part of the secondary structure. The secondary structure of an RNA chain could then be reconstructed directly from the ordered list of canonical base pairs. In a next step, this "preliminary" structure was scanned for lonely base pairs and pseudoknot interactions. Since lonely base pairs are thermodynamically unstable in a secondary structure, they were removed from the list. Because there is no unique way to remove the knot(s) from a pseudoknotted structure, such structures are unusable for the purpose of our study. Therefore, structures containing pseudoknots larger than one base pair were also discarded. We consider the set of structures reduced in this way as the set of "gold" structures. They constitute our standard of truth, but we are reluctant to call them "true" structures, not only because of our removal of information, but also since structures <it>in cristallo </it>may be different from structures <it>in vivo</it><sup>9</sup>.</p>
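               <p>The reduction from an annotated base pair list to a "gold" structure can be sketched as follows. This is a minimal illustration and not the pipeline used for this study: the function names, the 0-based indexing, and the order of the steps (remove lonely pairs first, then discard the chain if crossing pairs remain) are assumptions made for the example.</p>

```python
from typing import Optional, Set, Tuple

Pair = Tuple[int, int]  # 0-based positions (i, j); i is the 5' partner


def remove_lonely_pairs(pairs: Set[Pair]) -> Set[Pair]:
    """Keep only pairs that stack directly on a neighbouring pair."""
    return {
        (i, j)
        for (i, j) in pairs
        if (i - 1, j + 1) in pairs or (i + 1, j - 1) in pairs
    }


def has_crossing_pairs(pairs: Set[Pair]) -> bool:
    """True if two pairs (i, j) and (k, l) interleave in the order i, k, j, l."""
    ordered = sorted(pairs)
    for index, (i, j) in enumerate(ordered):
        for (k, l) in ordered[index + 1:]:
            # Sorting guarantees that k lies to the right of i.
            if j > k and l > j:
                return True
    return False


def gold_structure(length: int, pairs: Set[Pair]) -> Optional[str]:
    """Return a dot-bracket string, or None if the chain is pseudoknotted."""
    stacked = remove_lonely_pairs(pairs)
    if has_crossing_pairs(stacked):
        return None  # assumption: such chains are dropped from the data set
    symbols = ["."] * length
    for i, j in stacked:
        symbols[i], symbols[j] = "(", ")"
    return "".join(symbols)


# Hypothetical input: a two-pair helix plus one lonely pair (3, 6).
print(gold_structure(10, {(0, 9), (1, 8), (3, 6)}))
```

               <p>Because a lonely pair has, by definition, no stacking neighbour, removing all lonely pairs in a single pass cannot turn a previously stacked pair into a lonely one, so no iteration is needed in this sketch.</p>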
               <p>Our gold data sets resulting from DARTS, FR3D:3A, and FR3D:4A consist of 147, 111, and 136 structures, respectively.</p>
               <p>As a final detail: in a few cases, FR3D:3A and FR3D:4A contain the same sequence, with different resolution in 3D and with <it>different secondary structure </it>derived from it. No secondary structure prediction program can be expected to be correct in both cases.</p>
               <sec>
                  <st>
                     <p>A data set derived from RNAstrand</p>
                  </st>
                  <p>Aside from these data sets, we also created a data set RNAstrand:91 with 91 structures from the RNAstrand database <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. Since RNAstrand was designed as a source of validated structures, with an eye on the evaluation of RNA-related bioinformatics tools, it will be interesting to observe if the findings on this data set agree with the others.</p>
                  <p>Overall, we shall find that our four data sets deliver consistent sets of results. Therefore, the text of this article will discuss only selected measurements in detail, with the other ones given in the additional file <supplr sid="S1">1</supplr>, as well as all four raw data sets in additional file <supplr sid="S2">2</supplr>.</p>
                  <suppl id="S1">
                     <title>
                        <p>Additional file 1</p>
                     </title>
                     <text>
                        <p><b>Measurements on Data Sets FR3D:3A, FR3D:4A and RNAstrand:91</b>. File "supplement.pdf" contains detailed results for the three mentioned data sets FR3D:3A, FR3D:4A and RNAstrand:91, which have not been shown in the main paper. We also provide four Venn diagrams to demonstrate overlaps between the data sets.</p>
                     </text>
                     <file name="1471-2105-12-429-S1.TGZ">
   <p>Click here for file</p>
</file>
                  </suppl>
                  <suppl id="S2">
                     <title>
                        <p>Additional file 2</p>
                     </title>
                     <text>
                        <p><b>Data Sets</b>. Archive "datasets.tgz" contains all four data sets DARTS, FR3D:3A, FR3D:4A and RNAstrand:91 as FASTA like files. Format description is given in additional file <supplr sid="S1">1</supplr>: "supplement.pdf".</p>
                     </text>
                     <file name="1471-2105-12-429-S2.PDF">
   <p>Click here for file</p>
</file>
                  </suppl>
               </sec>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Evaluation of models for MFE structure prediction</p>
            </st>
            <p>While our main interest is in the effect of the chosen model on the partition function based computations, we here evaluate the four grammars with respect to prediction of a single MFE structure.</p>
            <sec>
               <st>
                  <p>Evaluation setup</p>
               </st>
               <p>In evaluating models with respect to MFE structure prediction, we include not only our programs NoDangle and OverDangle, MicroState and MacroState, but also the folding programs <smcaps>UNAFOLD</smcaps> and RNA<smcaps>FOLD</smcaps>, which our readers are rightfully curious about because of their practical importance. <it>Turner'99 </it>parameters <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> were used throughout<sup>10</sup>. These parameters are derived from melting experiments, with a few exceptions. Multiloop parameters such as <it>ml</it>_<it>energy </it>in <it>Turner'99 </it>are not derived from experiment, but are optimized from structure data to be used in conjunction with the MicroState model. Out of competition, we also include <smcaps>CENTROIDFOLD</smcaps>, which goes beyond strict energy minimization by producing a near-optimal ensemble of structures and choosing the eventual, single-structure prediction based on this sample.</p>
               <p>Relative performance of programs of different origin is, however, not our main interest here. Mainly, the evaluation should support that our four grammars faithfully reproduce the behavior of the models underlying RNA<smcaps>FOLD</smcaps> with options -d0, -d1, and -d2, as postulated at the outset of this study.</p>
               <p>The data set in this evaluation is DARTS. Evaluation results are summarized in Figure <figr fid="F6">6</figr>. We use an asymmetric base pair distance for comparison, as explained with Figure <figr fid="F6">6</figr>, where one structure (row entry) is treated as the prediction, the other as the reference (column entry).</p>
               <fig id="F6"><title><p>Figure 6</p></title><caption><p>Comparison of different MFE prediction programs</p></caption><text>
   <p><b>Comparison of different MFE prediction programs</b>. <b>Dataset: </b>we use the 147 sequences from the DARTS set, except pdb1ajt1B, pdb1kod1A, pdb1koc1A, pdb1lpw1B and pdb1t4x1B, which crashed under <smcaps>UNAFOLD</smcaps>. Together, all corresponding "PDB" structures contain 1,614 base pairs. All "gold" structures have 1,593 base pairs. <b>Distance: </b>One base pair set, i.e. secondary structure, is the reference (<it>R</it>: table columns), the other one is the prediction (<it>P</it>: table rows). Traditional base pair distance is defined as |<it>R </it>\<it>P</it>| + |<it>P </it>\<it>R</it>|. Following <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>, we decide to allow additional base pairs in the prediction, as long as they are compatible with the reference, i.e. both bases are unpaired in the reference and the additional base pair does not introduce a pseudoknot in the reference. The prediction without these compatible additional base pairs is <it>P</it><sup>-<it>c </it></sup>= <it>P</it>\{(<it>a</it>, <it>b</it>)|(<it>a</it>, <it>b</it>) &#8713; <it>R </it>&#8743; (<it>a</it>, <it>b</it>) compatible to <it>R</it>}. Then, our asymmetric base pair distance is: |<it>R </it>\<it>P</it>| + |<it>P</it><sup>-<it>c </it></sup>\<it>R</it>|. Table values are the sums of base pair distances for all 142 sequences. In the case of co-optimal results, the one with the smallest distance to the reference is chosen. Our distance function is rather strict and does not allow base pair slippage. If a gold base pair (<it>i</it>, <it>j</it>) is mispredicted as (<it>i </it>+ 1, <it>j</it>), this contributes a distance of 2. <b>Programs: </b>for each RNA sequence we called the programs with the following command line options: RNA<smcaps>FOLD </smcaps>(version 1.8.5): <monospace>echo sequence | RNAfold -noPS -noLP -dX</monospace>, where X is 0, 1 or 2.
<smcaps>UNAFOLD</smcaps> (version 3.8): <monospace>hybrid-ss-min --suffix=DAT --mfold --NA=RNA --tmin=37 --tinc=1 --tmax=37 --sodium=1 --magnesium=0 --noisolate --nodangle tmpseqfile >/dev/null &amp;&amp; ct2b.pl tmpseqfile.ct</monospace>, with and without the <monospace>--nodangle</monospace> switch, where "tmpseqfile" is a fasta file containing the sequence and "ct2b.pl" is a small Perl script from the Vienna Package, which converts RNA structures from "connect" to "dot-bracket" format. <smcaps>CENTROIDFOLD</smcaps> (version v0.0.9): <monospace>centroid_fold --engine=X tmpseqfile</monospace>, where <it>X </it>is the source of base pair probabilities and is either computed by RNA<smcaps>FOLD</smcaps> (McCaskill) or by CONTRA<smcaps>FOLD</smcaps>. Our ADP implementations of the four grammars "NoDangle", "OverDangle", "MicroState" and "MacroState" take the sequence as their sole input. The binaries can be built with the source code from the additional file <supplr sid="S3">3</supplr> and the Bellman's GAP compiler.</p>
</text><graphic file="1471-2105-12-429-6" hint_layout="double"/></fig>
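               <p>The asymmetric base pair distance described with Figure 6 can be written down in a few lines. The following is a sketch under stated assumptions rather than the evaluation script used for the figure: base pairs are represented as tuples of 0-based positions with the 5' partner first, and all function names are hypothetical.</p>

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[int, int]  # (a, b) with a being the 5' partner


def crosses(p: Pair, q: Pair) -> bool:
    """True if the two pairs interleave, i.e. would form a pseudoknot."""
    (i, j), (k, l) = sorted((p, q))
    return j > k and l > j


def is_compatible(pair: Pair, reference: Set[Pair]) -> bool:
    """Both bases unpaired in the reference and no crossing with it."""
    paired_positions = {pos for bp in reference for pos in bp}
    a, b = pair
    if a in paired_positions or b in paired_positions:
        return False
    return not any(crosses(pair, bp) for bp in reference)


def asymmetric_distance(reference: Set[Pair], prediction: Set[Pair]) -> int:
    """|R minus P| plus the number of incompatible extra pairs in P."""
    missed = reference - prediction
    extra = prediction - reference
    penalised = {bp for bp in extra if not is_compatible(bp, reference)}
    return len(missed) + len(penalised)


def closest_cooptimal(reference: Set[Pair],
                      candidates: Iterable[Set[Pair]]) -> Set[Pair]:
    """Among co-optimal predictions, pick the one nearest to the reference."""
    return min(candidates, key=lambda p: asymmetric_distance(reference, p))


# Slippage: gold pair (2, 10) predicted as (3, 10) costs a distance of 2.
print(asymmetric_distance({(2, 10)}, {(3, 10)}))
# A compatible extra pair (3, 6) inside an existing helix is not penalised.
print(asymmetric_distance({(0, 9), (1, 8)}, {(0, 9), (1, 8), (3, 6)}))
```

               <p>In the slippage case, the gold pair is missed (one unit) and the shifted pair reuses a base that is paired in the reference, so it is not compatible (a second unit). This reproduces the distance of 2 stated in the figure legend.</p>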
            </sec>
            <sec>
               <st>
                  <p>Observations from MFE prediction experiment</p>
               </st>
               <sec>
                  <st>
                     <p>Consistency of implementations</p>
                  </st>
                  <p>Naturally, comparing the results from the same tool leads to entries of zero base pair distance in the diagonal of Figure <figr fid="F6">6</figr>. The off-diagonal zero entries, however, are quite remarkable. When two different algorithms perfectly agree in their MFE predictions on the complete data set, this provides strong evidence that they both faithfully implement the same thermodynamic model of the folding space in each of its variants. In particular, this shows that our grammars NoDangle/OverDangle and MicroState indeed capture the analysis computed by RNA<smcaps>FOLD</smcaps> with options -d0/-d2 and -d1. The perfect zeroes might even make our reader suspicious! Occasionally, there must be two (or more) co-optimal structures of minimal free energy, and it is not formally defined which one a program should return in this situation. Hence, it is accidental whether or not two different programs, implemented by different programmers, make the same choice. We therefore have designed our new programs to report all co-optimal solutions in such a situation, and then choose the structure closest to the RNA<smcaps>FOLD</smcaps> prediction. This always delivered a perfect match.</p>
                  <p>We apply the same technique of safe-guarding against co-optimals when comparing to a database structure. Note that in practice, when predicting structure for a novel RNA, the users of a structure prediction program have no reference structure to resort to. In this case, reporting all co-optimal structures makes them aware of the ambiguity of the situation, and leaves them with the choice to make. This is somewhat preferable to quietly reporting a single MFE structure, selected from several by implementation peculiarities.</p>
                  <p>The perfect agreement of MacroState with the MFE prediction of RNA<smcaps>FOLD</smcaps> -d1 as well as with MicroState demonstrates that MacroState in fact computes the energy model of the other two programs, while avoiding (as explained above) their explosion of the state space. Taken together, these consistency results show that we have correct programs set up for our second experiment, where we will evaluate the effect of the chosen energy and state space model on partition function calculations.</p>
               </sec>
               <sec>
                  <st>
                     <p>Quality of MFE predictions</p>
                  </st>
                  <p>Overall, the quality of MFE predictions compared to "real" structures is moderate when measured on the individual base pair level, with errors<sup>11 </sup>ranging from 16% to 21% for the gold structures. This is expected and well-known. It is the reason why researchers have developed more advanced techniques, such as structure sampling, complete enumeration, or shape abstraction. The PDB structures contain base pairs which by definition are not predicted - non-standard pairs, 3D interactions, pseudoknots, and lonely pairs. As explained above, the data set of gold structures has been cleaned up in these respects, and as expected, the predictions come closer, but deviations are still considerable.</p>
                  <p>The gold structures are best predicted by MacroState and MicroState (distance 521) and RNA<smcaps>FOLD</smcaps> -d1 (distance 531). The small difference is accidental and arises from the rare case where RNA<smcaps>FOLD</smcaps> picks an unlucky choice from several co-optimal structures.</p>
               </sec>
               <sec>
                  <st>
                     <p>Performance of different dangling models</p>
                  </st>
                  <p>Comparing the full dangling model (MicroState, MacroState) to its upper and lower approximations NoDangle and OverDangle, we find that its proper implementation pays off. It reduces the accumulated distance by about 14% over NoDangle, and by 9% over OverDangle. Similar percentages apply for RNA<smcaps>FOLD</smcaps> option -d1 versus -d0 and -d2. This also shows that OverDangle approximates the correct model better than NoDangle and justifies its use as a substitute for the full model in partition function calculations with RNA<smcaps>FOLD</smcaps> and RNA<smcaps>SUBOPT</smcaps>, where the grammar MacroState is not available.</p>
               </sec>
               <sec>
                  <st>
                     <p><smcaps>UNAFOLD</smcaps> performance</p>
                  </st>
                  <p>The two versions of <smcaps>UNAFOLD</smcaps> consistently score a bit worse against the gold structures than all other programs. Compared to each other, we also observe that the distance is improved by considering dangling energies, here by 17%. Otherwise, the two <smcaps>UNAFOLD</smcaps> versions cluster with the NoDangle/MicroState groups, as they should<sup>12</sup>.</p>
               </sec>
               <sec>
                  <st>
                     <p>Looking deeper into the near-optimal folding space</p>
                  </st>
                  <p>We included <smcaps>CENTROIDFOLD</smcaps><abbrgrp><abbr bid="B24">24</abbr></abbrgrp> as a representative of methods which, in contrast to the above programs, look deeper into the Boltzmann ensemble of near-optimal structures. Our evaluation shows that the extra effort is well spent. <smcaps>CENTROIDFOLD</smcaps> comes closest to the gold structures, and with respect to the single structure predictors, it corresponds best with the group of RNA<smcaps>FOLD</smcaps> -d1, MacroState and MicroState.</p>
               </sec>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Evaluating models for partition function and related computations</p>
            </st>
            <p>We will explain our evaluations in detail based on our largest data set, DARTS. Results on the other data sets are obtained in an analogous way and are summarized at the end of this section.</p>
            <sec>
               <st>
                  <p>Evaluation Criteria</p>
               </st>
               <p>In this section, we apply probabilistic shape analysis to our data set. We are interested in the difference of performance of the four models NoDangle, OverDangle, MicroState and MacroState. For simplicity, we call the abstract shape of the reference structure the "reference shape", and refer to the most likely predicted shape as the "dominant shape", although its actual dominance within the Boltzmann ensemble will not be strong if there is another shape with similar probability. The shape string of the reference shape of sequence <it>s </it>is obtained by a call to <monospace>RNAshapes -t l -D</monospace> "<it>s</it>", where <monospace>l</monospace> is one of the five shape abstraction levels.</p>
               <p>We ask the following questions:</p>
               <p indent="1">&#8226; What are the differences in the shape probabilities computed with each of the four models?</p>
               <p indent="1">&#8226; How is the difference affected by the shape abstraction level considered?</p>
               <p>Since we do observe significant differences in model behavior, we also ask which model comes closer to the truth:</p>
               <p indent="1">&#8226; To what extent does the dominant shape agree with the reference shape?</p>
               <p indent="1">&#8226; What is the median (or the 75% and 90% quantile) of the rank of the reference shape among the predicted shapes?</p>
               <p>Finally, we consider</p>
               <p indent="1">&#8226; What are the runtime or memory trade-offs for computing with different models?</p>
               <sec>
                  <st>
                     <p>Evaluation method</p>
                  </st>
                  <p>Shape probabilities do not make a structure prediction per se. They provide holistic information by assigning probabilities to all shapes in the folding space of a sequence <it>x</it>. It is our responsibility how we interpret these data. The hope is, of course, to find the biologically functional structure among the high-probability shapes, to find two high-probability shapes for a riboswitch, to use the lack of any shape with high probability as an indicator of the absence of a well-defined structure, and so on. Such analysis goes beyond shape probabilities, and takes into account the concrete shreps returned for each shape.</p>
                  <p>Independent of what the shape probabilities will be used for, we want to focus on the agreement between the four grammars. To measure this, we use the <it>shape probability shift </it>(SPS). For a given sequence <it>x</it>, all grammars will report the same shape classes, but with different probabilities. Let <it>P </it>(<it>x</it>) be the <it>shape space</it>, i. e. the set of all shape classes for <it>x</it>, and <it>Prob<sub>G</sub></it>(<it>p</it>) the shape probability of <it>p </it>under grammar <it>G</it>. The shape probability shift for <it>x </it>and grammars <it>A </it>and <it>B </it>is defined as:</p>
                  <p>
                     <display-formula id="M2">
                        <m:math name="1471-2105-12-429-i23" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:mrow>
   <m:mi>S</m:mi>
   <m:mi>P</m:mi>
   <m:msub>
      <m:mrow>
         <m:mi>S</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>A</m:mi>
         <m:mo class="MathClass-punc">,</m:mo>
         <m:mi>B</m:mi>
      </m:mrow>
   </m:msub>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:mi>x</m:mi>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mo class="MathClass-rel">=</m:mo>
   <m:mfrac>
      <m:mrow>
         <m:mn>1</m:mn>
      </m:mrow>
      <m:mrow>
         <m:mn>2</m:mn>
      </m:mrow>
   </m:mfrac>
   <m:mo class="MathClass-bin">&#8901;</m:mo>
   <m:munder class="msub">
      <m:mrow>
         <m:mo mathsize="big">&#8721;</m:mo>
      </m:mrow>
      <m:mrow>
         <m:mi>p</m:mi>
         <m:mo class="MathClass-rel">&#8712;</m:mo>
         <m:mi>P</m:mi>
         <m:mrow>
            <m:mo class="MathClass-open">(</m:mo>
            <m:mrow>
               <m:mi>x</m:mi>
            </m:mrow>
            <m:mo class="MathClass-close">)</m:mo>
         </m:mrow>
      </m:mrow>
   </m:munder>
   <m:mo class="MathClass-rel">&#8739;</m:mo>
   <m:mi>P</m:mi>
   <m:mi>r</m:mi>
   <m:mi>o</m:mi>
   <m:msub>
      <m:mrow>
         <m:mi>b</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>A</m:mi>
      </m:mrow>
   </m:msub>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:mi>p</m:mi>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mo class="MathClass-bin">-</m:mo>
   <m:mi>P</m:mi>
   <m:mi>r</m:mi>
   <m:mi>o</m:mi>
   <m:msub>
      <m:mrow>
         <m:mi>b</m:mi>
      </m:mrow>
      <m:mrow>
         <m:mi>B</m:mi>
      </m:mrow>
   </m:msub>
   <m:mrow>
      <m:mo class="MathClass-open">(</m:mo>
      <m:mrow>
         <m:mi>p</m:mi>
      </m:mrow>
      <m:mo class="MathClass-close">)</m:mo>
   </m:mrow>
   <m:mo class="MathClass-rel">&#8739;</m:mo>
</m:mrow>
</m:math>
                     </display-formula>
                  </p>
                  <p>Note that 0 &#8804; <it>SPS</it>(<it>x</it>) &#8804; 1, where the extreme case of 1 would only be achieved when all shapes with positive probability by grammar <it>A </it>have zero probability by grammar <it>B </it>and vice versa. The SPS can be interpreted as the overall probability mass that moves between shapes.</p>
                  <p>We chose the SPS measure because of this nice interpretation. We also evaluated two alternative measures. The squared distance of base pair probability matrices is correlated with the SPS by a factor around 0.83 at shape level 5 and not much lower on less abstract shape levels. The Kullback-Leibler divergence turned out to be unsuitable for the purpose, as it is not symmetric and both versions (KL(x, y) versus KL(y, x)) show the poorest correlation among all methods tested. Details of this investigation of alternatives are given in additional file <supplr sid="S1">1</supplr>.</p>
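                  <p>As a worked example of the SPS definition, the following sketch computes the shift for the three shape classes listed in Table 4. The function name and the dictionary representation of a shape distribution are assumptions made for illustration; only the probabilities are taken from the table.</p>

```python
from typing import Dict


def shape_probability_shift(prob_a: Dict[str, float],
                            prob_b: Dict[str, float]) -> float:
    """Half the sum of absolute probability differences over all shapes."""
    shapes = set(prob_a) | set(prob_b)
    # A shape missing from one distribution counts as probability 0 there.
    return 0.5 * sum(
        abs(prob_a.get(p, 0.0) - prob_b.get(p, 0.0)) for p in shapes
    )


# Shape probabilities from Table 4 (MacroState versus MicroState).
macro = {"[][]": 0.5844, "[[][]]": 0.2932, "[]": 0.1224}
micro = {"[][]": 0.3258, "[[][]]": 0.6343, "[]": 0.0399}

print(round(shape_probability_shift(macro, micro), 4))
```

                  <p>For these values the absolute differences are 0.2586, 0.3411 and 0.0825, so the SPS is 0.6822 / 2 = 0.3411. The factor 1/2 reflects that every unit of probability mass leaving one shape must arrive at another and would otherwise be counted twice.</p>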
               </sec>
               <sec>
                  <st>
                     <p>Observations</p>
                  </st>
                  <p>The values in Figure <figr fid="F7">7</figr> are average SPS<sup>13 </sup>over all <it>x </it>&#8712; DARTS, which is the largest of our data sets.</p>
                  <fig id="F7"><title><p>Figure 7</p></title><caption><p>Model similarity: shape probability shift</p></caption><text>
   <p><b>Model similarity: shape probability shift</b>.</p>
</text><graphic file="1471-2105-12-429-7" hint_layout="double"/></fig>
                  <p>First, consider shape abstraction level 5. We find that models MacroState and MicroState show the most agreement, with an SPS of around 3.7%. MacroState shows a significant SPS against the others, strongest against NoDangle (9.6%) but also against OverDangle (5.7%). An SPS in this range means that while in many cases the predicted dominant shape will be the same for all models, this need not hold in general.</p>
                  <p>This justifies the question of which model finds the gold shape as the dominant shape more often (see below). Incidentally, the dominant shape and the shape of the MFE structure agree for MacroState in 143 out of 147 cases.</p>
                  <p>Let us next turn from level 5 to decreasing levels of abstraction. Moving to abstraction levels 4, 3, 2, and 1, the number of shapes increases with each step, while each shape class holds a smaller number of structures. The overall relationship between the models on levels 4 through 1 is consistent with what we observe for level 5. Overall, the SPS values increase. A closer inspection of the raw data shows that SPS values actually decrease for each individual shape, but due to the larger number of (smaller) shifts, their sum increases. Evidence is provided in Figure <figr fid="F8">8</figr>.</p>
                  <fig id="F8"><title><p>Figure 8</p></title><caption><p>Model similarity: average shape probability shift per shape</p></caption><text>
   <p><b>Model similarity: average shape probability shift per shape</b>.</p>
</text><graphic file="1471-2105-12-429-8" hint_layout="double"/></fig>
               </sec>
               <sec>
                  <st>
                     <p>Dominant shape is gold shape?</p>
                  </st>
                  <p>The values in Table <tblr tid="T5">5</tblr> show the ratios of correct shape predictions to the size of the test set, which is 147 in the case of DARTS. We observe the following:</p>
                  <tbl id="T5"><title><p>Table 5</p></title><caption><p>Ratio of agreement between dominant shape and gold shape for the different grammars (columns) and different shape abstraction levels (rows).</p></caption><tblbdy cols="5">
      <r>
         <c ca="center">
            <p>
               <b>Level</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>MacroState</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>MicroState</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>OverDangle</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>NoDangle</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <b>5</b>
            </p>
         </c>
         <c ca="center">
            <p>0.823</p>
         </c>
         <c ca="center">
            <p>0.816</p>
         </c>
         <c ca="center">
            <p>0.796</p>
         </c>
         <c ca="center">
            <p>0.810</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <b>4</b>
            </p>
         </c>
         <c ca="center">
            <p>0.694</p>
         </c>
         <c ca="center">
            <p>0.694</p>
         </c>
         <c ca="center">
            <p>0.660</p>
         </c>
         <c ca="center">
            <p>0.687</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <b>3</b>
            </p>
         </c>
         <c ca="center">
            <p>0.687</p>
         </c>
         <c ca="center">
            <p>0.680</p>
         </c>
         <c ca="center">
            <p>0.660</p>
         </c>
         <c ca="center">
            <p>0.673</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <b>2</b>
            </p>
         </c>
         <c ca="center">
            <p>0.653</p>
         </c>
         <c ca="center">
            <p>0.653</p>
         </c>
         <c ca="center">
            <p>0.612</p>
         </c>
         <c ca="center">
            <p>0.646</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <b>1</b>
            </p>
         </c>
         <c ca="center">
            <p>0.585</p>
         </c>
         <c ca="center">
            <p>0.551</p>
         </c>
         <c ca="center">
            <p>0.565</p>
         </c>
         <c ca="center">
            <p>0.592</p>
         </c>
      </r>
   </tblbdy></tbl>
                  <p>The best ratio of agreement between dominant shape and gold shape is 82.3%. Because this value is not higher, investigators in practice look into several high-probability shapes and their shreps. Comparing the models, we find no clear winner: the margin between the best and the worst performer is only 2.7 percentage points. (Moreover, the first position varies over our data sets.) Here, MacroState finds agreement most often, with a margin of 0.7 percentage points over MicroState and 1.3 percentage points over NoDangle. OverDangle performs worst (79.6%), but not hopelessly so, considering that one will look at a number of top-ranking shapes anyway.</p>
                  <p>Thus, the more interesting question is how the gold shape is placed among the predicted shapes - cf. Table <tblr tid="T6">6</tblr>. We investigate this aspect by compiling a list of <it>rank</it>(<it>p</it><sup>gold</sup>) for all 147 test sequences, sorting this list in ascending order, and reporting the median (50%), the 75%, and the 90% quantile of the list, as well as the largest rank, which covers the complete list (100%). For example, the value 2 for MacroState in shape abstraction level 5 in the 90% column means that, if we decide to take only the top two shapes for closer study, the gold shape is among them in 90% of the cases. Three top shapes suffice to reach this coverage with MicroState and OverDangle. Overall, the advantage of MacroState over the other grammars appears marginal on level 5, and appears somewhat randomized for weaker abstraction levels.</p>
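                  <p>The rank statistics described above can be sketched as follows. This is a minimal illustration in Python, not the evaluation code used in this study: the nearest-rank definition of a quantile, the function name <monospace>rank_quantile</monospace>, and the example ranks are assumptions made for the sake of the example.</p>

```python
import math


def rank_quantile(ranks, q):
    """Return the smallest rank that covers a fraction q of the test sequences.

    Uses the nearest-rank definition (an assumption): after sorting in
    ascending order, take the element at position ceil(q * n), counting from 1.
    """
    ordered = sorted(ranks)
    position = max(1, math.ceil(q * len(ordered)))
    return ordered[position - 1]


# Hypothetical gold-shape ranks for ten test sequences (not data from the study).
example_ranks = [1, 1, 2, 1, 3, 1, 2, 8, 1, 2]

for q in (0.5, 0.75, 0.9, 1.0):
    print(q, rank_quantile(example_ranks, q))
```

                  <p>Read this way, an entry of Table <tblr tid="T6">6</tblr> is the number of top-ranking shapes one has to inspect so that the gold shape is among them for the given fraction of test sequences.</p>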
                  <tbl id="T6"><title><p>Table 6</p></title><caption><p>Positions of correct shapes.</p></caption><tblbdy cols="17">
      <r>
         <c ca="center">
            <p>
               <b>Level</b>
            </p>
         </c>
         <c cspan="4" ca="center">
            <p>
               <b>MacroState</b>
            </p>
         </c>
         <c cspan="4" ca="center">
            <p>
               <b>MicroState</b>
            </p>
         </c>
         <c cspan="4" ca="center">
            <p>
               <b>OverDangle</b>
            </p>
         </c>
         <c cspan="4" ca="center">
            <p>
               <b>NoDangle</b>
            </p>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c ca="right">
            <p>
               <b>50%</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>75%</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>90%</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>100%</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>50%</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>75%</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>90%</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>100%</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>50%</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>75%</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>90%</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>100%</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>50%</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>75%</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>90%</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>100%</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="17">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <b>5</b>
            </p>
         </c>
         <c ca="right">
            <p>2</p>
         </c>
         <c ca="right">
            <p>2</p>
         </c>
         <c ca="right">
            <p>2</p>
         </c>
         <c ca="right">
            <p>8</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>2</p>
         </c>
         <c ca="right">
            <p>3</p>
         </c>
         <c ca="right">
            <p>12</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>2</p>
         </c>
         <c ca="right">
            <p>3</p>
         </c>
         <c ca="right">
            <p>10</p>
         </c>
         <c ca="right">
            <p>2</p>
         </c>
         <c ca="right">
            <p>2</p>
         </c>
         <c ca="right">
            <p>2</p>
         </c>
         <c ca="right">
            <p>4</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <b>4</b>
            </p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>2</p>
         </c>
         <c ca="right">
            <p>4</p>
         </c>
         <c ca="right">
            <p>85</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>2</p>
         </c>
         <c ca="right">
            <p>4</p>
         </c>
         <c ca="right">
            <p>124</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>2</p>
         </c>
         <c ca="right">
            <p>4</p>
         </c>
         <c ca="right">
            <p>217</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>2</p>
         </c>
         <c ca="right">
            <p>4</p>
         </c>
         <c ca="right">
            <p>64</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <b>3</b>
            </p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>2</p>
         </c>
         <c ca="right">
            <p>4</p>
         </c>
         <c ca="right">
            <p>108</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>2</p>
         </c>
         <c ca="right">
            <p>6</p>
         </c>
         <c ca="right">
            <p>192</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>2</p>
         </c>
         <c ca="right">
            <p>5</p>
         </c>
         <c ca="right">
            <p>315</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>2</p>
         </c>
         <c ca="right">
            <p>5</p>
         </c>
         <c ca="right">
            <p>54</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <b>2</b>
            </p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>2</p>
         </c>
         <c ca="right">
            <p>16</p>
         </c>
         <c ca="right">
            <p>759</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>3</p>
         </c>
         <c ca="right">
            <p>21</p>
         </c>
         <c ca="right">
            <p>3729</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>3</p>
         </c>
         <c ca="right">
            <p>13</p>
         </c>
         <c ca="right">
            <p>2404</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>3</p>
         </c>
         <c ca="right">
            <p>21</p>
         </c>
         <c ca="right">
            <p>6534</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <b>1</b>
            </p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>4</p>
         </c>
         <c ca="right">
            <p>46</p>
         </c>
         <c ca="right">
            <p>1373</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>4</p>
         </c>
         <c ca="right">
            <p>68</p>
         </c>
         <c ca="right">
            <p>6395</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>3</p>
         </c>
         <c ca="right">
            <p>58</p>
         </c>
         <c ca="right">
            <p>2349</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>5</p>
         </c>
         <c ca="right">
            <p>42</p>
         </c>
         <c ca="right">
            <p>4674</p>
         </c>
      </r>
   </tblbdy></tbl>
                  <p>An unexpected observation is the strong performance of shape level 2. Considering the 75% quantile, three shapes suffice to find the gold shape, independent of the model chosen. We will return to this observation in the Conclusion.</p>
               </sec>
               <sec>
                  <st>
                     <p>Relative runtime and memory consumption</p>
                  </st>
                  <p>Using the Unix tool "memtime", we logged the "resident set size" as an estimate of memory consumption (Table <tblr tid="T7">7</tblr>) and the sum of "user-space" and "kernel-space" times as an estimate of process runtime (Table <tblr tid="T8">8</tblr>). Over all test sequences, we summed the runtimes and took the maximum of the memory values. Since the actual values depend strongly on hardware and software issues, e.g. 64 vs. 32 bit or compiler optimizations, we set the MacroState level 5 value (first row, first column) to 1.0 and give all other values relative to it.</p>
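                  <p>The aggregation and normalization just described can be illustrated by a short Python sketch. It is not the script used for our measurements: the function names <monospace>aggregate</monospace> and <monospace>relative_to</monospace>, the input layout, and all numbers are hypothetical and serve only to show the arithmetic.</p>

```python
def aggregate(measurements):
    """Combine per-sequence (runtime_seconds, resident_set_kb) pairs.

    Runtimes are summed over all test sequences; for memory, the maximum
    over all test sequences is taken.
    """
    total_runtime = sum(runtime for runtime, _ in measurements)
    peak_memory = max(memory for _, memory in measurements)
    return total_runtime, peak_memory


def relative_to(baseline, value):
    """Express a value relative to the baseline, rounded to two decimals."""
    return round(value / baseline, 2)


# Hypothetical measurements for two test sequences per configuration.
macrostate_level5 = [(2.0, 1000), (3.0, 1500)]  # plays the role of the baseline
microstate_level5 = [(0.5, 300), (0.75, 390)]

base_runtime, base_memory = aggregate(macrostate_level5)
runtime, memory = aggregate(microstate_level5)
print(relative_to(base_runtime, runtime), relative_to(base_memory, memory))
```

                  <p>The baseline configuration itself maps to 1.0 by construction, so every other entry can be read as a multiple of the MacroState level 5 cost.</p>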
                  <tbl id="T7"><title><p>Table 7</p></title><caption><p>Relative memory.</p></caption><tblbdy cols="5">
      <r>
         <c ca="center">
            <p>
               <b>Level</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>MacroState</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>MicroState</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>OverDangle</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>NoDangle</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <b>5</b>
            </p>
         </c>
         <c ca="center">
            <p>1.00</p>
         </c>
         <c ca="center">
            <p>0.26</p>
         </c>
         <c ca="center">
            <p>0.26</p>
         </c>
         <c ca="center">
            <p>0.21</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <b>4</b>
            </p>
         </c>
         <c ca="center">
            <p>3.90</p>
         </c>
         <c ca="center">
            <p>0.76</p>
         </c>
         <c ca="center">
            <p>0.76</p>
         </c>
         <c ca="center">
            <p>0.53</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <b>3</b>
            </p>
         </c>
         <c ca="center">
            <p>6.62</p>
         </c>
         <c ca="center">
            <p>1.31</p>
         </c>
         <c ca="center">
            <p>1.24</p>
         </c>
         <c ca="center">
            <p>0.74</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <b>2</b>
            </p>
         </c>
         <c ca="center">
            <p>139.12</p>
         </c>
         <c ca="center">
            <p>6.93</p>
         </c>
         <c ca="center">
            <p>7.89</p>
         </c>
         <c ca="center">
            <p>7.36</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <b>1</b>
            </p>
         </c>
         <c ca="center">
            <p>795.14</p>
         </c>
         <c ca="center">
            <p>47.38</p>
         </c>
         <c ca="center">
            <p>51.21</p>
         </c>
         <c ca="center">
            <p>24.29</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>The MacroState level 5 value equals 31.8 MB resident set size.</p>
   </tblfn></tbl>
                  <tbl id="T8"><title><p>Table 8</p></title><caption><p>Relative runtime.</p></caption><tblbdy cols="5">
      <r>
         <c ca="center">
            <p>
               <b>Level</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>MacroState</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>MicroState</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>OverDangle</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>NoDangle</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="5">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <b>5</b>
            </p>
         </c>
         <c ca="center">
            <p>1.00</p>
         </c>
         <c ca="center">
            <p>0.25</p>
         </c>
         <c ca="center">
            <p>0.15</p>
         </c>
         <c ca="center">
            <p>0.12</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <b>4</b>
            </p>
         </c>
         <c ca="center">
            <p>3.70</p>
         </c>
         <c ca="center">
            <p>0.95</p>
         </c>
         <c ca="center">
            <p>0.59</p>
         </c>
         <c ca="center">
            <p>0.39</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <b>3</b>
            </p>
         </c>
         <c ca="center">
            <p>5.99</p>
         </c>
         <c ca="center">
            <p>1.59</p>
         </c>
         <c ca="center">
            <p>0.96</p>
         </c>
         <c ca="center">
            <p>0.60</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <b>2</b>
            </p>
         </c>
         <c ca="center">
            <p>145.56</p>
         </c>
         <c ca="center">
            <p>14.46</p>
         </c>
         <c ca="center">
            <p>9.39</p>
         </c>
         <c ca="center">
            <p>8.37</p>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <b>1</b>
            </p>
         </c>
         <c ca="center">
            <p>643.20</p>
         </c>
         <c ca="center">
            <p>117.16</p>
         </c>
         <c ca="center">
            <p>51.76</p>
         </c>
         <c ca="center">
            <p>28.99</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>The MacroState level 5 value equals 20.76 seconds on an Intel<sup>&#174; </sup>Xeon<sup>&#174; </sup>CPU L5420 @ 2.50 GHz.</p>
   </tblfn></tbl>
                  <p>MacroState is the most sophisticated grammar and hence the most expensive to compute with. On level 5, it is slower than MicroState, OverDangle, and NoDangle by factors of about 4.0, 6.7, and 8.3, respectively. These slowdown factors are about the same for levels 4 and 3, and increase for levels 2 and 1, but not consistently so. The largest slowdown measured is 643.20/28.99 = 22.2.</p>
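                  <p>The slowdown factors quoted above follow directly from the relative runtimes in Table <tblr tid="T8">8</tblr>. The following Python sketch recomputes them; the table values are copied verbatim, while the names <monospace>RELATIVE_RUNTIME</monospace> and <monospace>slowdown</monospace> are introduced here for illustration only.</p>

```python
# Relative runtimes copied from Table 8 (MacroState at level 5 is 1.00).
RELATIVE_RUNTIME = {
    5: {"MacroState": 1.00, "MicroState": 0.25, "OverDangle": 0.15, "NoDangle": 0.12},
    4: {"MacroState": 3.70, "MicroState": 0.95, "OverDangle": 0.59, "NoDangle": 0.39},
    3: {"MacroState": 5.99, "MicroState": 1.59, "OverDangle": 0.96, "NoDangle": 0.60},
    2: {"MacroState": 145.56, "MicroState": 14.46, "OverDangle": 9.39, "NoDangle": 8.37},
    1: {"MacroState": 643.20, "MicroState": 117.16, "OverDangle": 51.76, "NoDangle": 28.99},
}


def slowdown(level, model):
    """Factor by which MacroState is slower than `model` at one shape level."""
    row = RELATIVE_RUNTIME[level]
    return row["MacroState"] / row[model]


# Print the slowdown factors per level, from the most abstract level down.
for level in sorted(RELATIVE_RUNTIME, reverse=True):
    factors = {
        model: round(slowdown(level, model), 1)
        for model in ("MicroState", "OverDangle", "NoDangle")
    }
    print(level, factors)
```

                  <p>Because both entries of a ratio come from the same row, the factors do not depend on the choice of the baseline value.</p>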
                  <p>In terms of memory requirements, similar observations hold. This is clear, since all algorithms are implemented via dynamic programming, where a difference in the number of tables to be filled (with MacroState needing the most) directly maps to the difference in runtime as well as in space requirements.</p>
                  <p>Overall, the selected shape abstraction level affects resource requirements more than the chosen model does. For example, NoDangle (the most efficient) used with abstraction level 2 uses more time and space than MacroState (the least efficient) with abstraction level 5 or 4.</p>
               </sec>
            </sec>
            <sec>
               <st>
                  <p>Consistent results on data sets DARTS, FR3D:3A, FR3D:4A, and RNAstrand:91</p>
               </st>
               <p>We repeated the analysis described above for the data set DARTS on the data sets FR3D:3A, FR3D:4A, and RNAstrand:91. Our observations on these data sets are consistent with what was reported above. Therefore, measurement results on these data sets are reported in additional file <supplr sid="S1">1</supplr>, but not further discussed here.</p>
               <p>That RNAstrand:91 performs similarly to the PDB-derived data sets demonstrates that the RNAstrand database meets its design goal of providing a solid base of validated structures for tool evaluation <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. Structures from RNAstrand can be selected according to specific criteria of interest, and do not require the clean-up operations we had to perform with structures taken from PDB.</p>
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <sec>
            <st>
               <p>Model comparison</p>
            </st>
            <p>Summing up our observations from model comparison and model performance evaluation, we conclude the following:</p>
            <p><b>Conclusion 1 </b><it>For prediction of a single structure, there is no better alternative (among the models considered) than </it><monospace>RNA<smcaps>FOLD</smcaps> -<it>d1</it></monospace>, <it>possibly augmented to report ALL structures with the optimal MFE value as in MicroState, when several exist</it>.</p>
            <p>However, with such augmentation, a filter must be provided to safeguard against co-optimal microstates of the same optimal macrostate being reported.</p>
            <p><b>Conclusion 2 </b><it>The distortion of shape probabilities caused by state space explosion (MacroState versus MicroState) is smaller than the one caused by over- or underestimating energies (MacroState and MicroState versus NoDangle or OverDangle)</it>.</p>
            <p>The similarity of the models leads us to the question of runtime effort.</p>
            <p><b>Conclusion 3 </b><it>Since results between MacroState and MicroState differ only marginally, MicroState may be used for probability calculation. The higher computational effort of MacroState is not justified</it>.</p>
            <p>In the light of the previous conclusions we find:</p>
            <p><b>Conclusion 4 </b><it>On longer sequences, the only remaining virtue of MacroState appears to be its ability to enumerate suboptimal structures with correct energies, and without redundancy</it>.</p>
            <p>This answers the questions raised at the outset of this study.</p>
         </sec>
         <sec>
            <st>
               <p>Evaluation of further models</p>
            </st>
            <p>Our evaluation has concentrated on the models underlying the programs RNA<smcaps>FOLD</smcaps>, RNA<smcaps>SHAPES</smcaps>, and RNA<smcaps>SUBOPT</smcaps>. There are many other folding programs out there. If these implementations adhere to the abstract models we present here in the form of tree grammars, our evaluation pertains to them as well. More likely, each implementation has its own peculiarities. In fact, one may think of extending our evaluation to models that are not based on thermodynamics at all, but are derived via machine learning techniques <abbrgrp><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr></abbrgrp>. These programs could be evaluated in the setting of this study in one of two ways. Either the program source code is extended with the computation of abstract shapes and their shape probabilities (a useful feature anyway) and applied to our data sets directly, or the model behind the program is extracted as a tree grammar, coded in Bellman's GAP, and combined with existing modules for shape abstraction and partition function computations. Depending on the model differences, extracting the grammar behind the code may come down to a few minor changes to the four models provided here.</p>
            <p>Generally, the four models MacroState, MicroState, OverDangle and NoDangle are available as a starting point for future research into thermodynamic RNA folding. Implemented in the Bellman's GAP language, these programs are especially easy to modify or extend, while the Bellman's GAP compiler provides automatic translation into efficient and correct dynamic programming algorithms. The complete source code of our four models is included in additional file <supplr sid="S3">3</supplr>.</p>
            <suppl id="S3">
               <title>
                  <p>Additional file 3</p>
               </title>
               <text>
                  <p><b>Source Code of all models</b>. The archive "fold-grammars.tgz" holds the source code for all four models (NoDangle, OverDangle, MicroState and MacroState) in the Advanced Dynamic Programming language Bellman's GAP. Please see the enclosed readme file for further instructions on how to compile binaries.</p>
               </text>
               <file name="1471-2105-12-429-S3.TGZ">
   <p>Click here for file</p>
</file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>A new strategy for level-2 shape probabilities?</p>
            </st>
            <p>Our observations about the performance of shape level 2 give rise to the investigation of a new strategy. Recall that level 2 gives much stronger information than levels 5 or even 3. Level 2 records not only the overall arrangement of helices, but also reports and distinguishes internal loops and 5' and 3' bulges.</p>
            <p>Over all our data sets, consideration of (only) the five most likely level-2 shapes (using MicroState) reports the gold shape in 75% of the cases, while 25 level-2 shapes reach 90% coverage. However, the cost of level-2 shape analysis becomes prohibitive for longer sequences. Our data show a slowdown factor of 55 (for MicroState) over level-5 analysis, which should become even worse for longer sequences. Therefore, we conclude:</p>
            <p>
               <b>Conclusion 5 </b>
               <it>A strategy to efficiently compute level-2 shapes for long sequences is desirable</it>.
            </p>
            <p>Let us sketch how this can be achieved, borrowing ideas from the <smcaps>RAPIDSHAPES</smcaps> method <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. Directly accessing the complete level-2 shape space of a long sequence appears infeasible. But we can compute a level-5 analysis at 90% or 100% coverage quickly, by reporting a small number of top-ranking level-5 shapes (12 would suffice for 100% coverage on our data sets). For these shapes, we can generate a thermodynamic matcher <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> to perform a separate level-2 analysis within each of the reported level-5 shape classes. Generating such a matcher as a tree grammar, encoded in Bellman's GAP, plus its subsequent compilation has negligible runtime. This should considerably reduce the computational effort (which results from the number of shapes). While this is not mathematically guaranteed to yield the most likely level-2 shape, the idea appears promising.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>SJ suggested tackling the problem of model discrepancies empirically. SJ, RG and GS designed the study. CS prepared the data sets derived from 3D structural data, and GS provided background on the thermodynamic model. SJ implemented the four models and ran the evaluations. All authors closely cooperated in interpreting the results and writing the manuscript. All authors read and approved the final manuscript.</p>
      </sec>
      <sec>
         <st>
            <p>Notes</p>
         </st>
         <p><sup>1</sup>Our observations may pertain also to other popular programs such as <smcaps>MFOLD</smcaps><abbrgrp><abbr bid="B28">28</abbr></abbrgrp>, <smcaps>UNAFOLD</smcaps><abbrgrp><abbr bid="B29">29</abbr></abbrgrp> and RNA<smcaps>STRUCTURE</smcaps><abbrgrp><abbr bid="B30">30</abbr></abbrgrp>, but their folding space implementations have not been re-modeled here.</p>
         <p><sup>2</sup>One may view our re-engineering as adding shape probability functionality to the Vienna RNA package from outside.</p>
         <p><sup>3</sup>Similarly, stacking of helices <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr></abbrgrp> can further contribute free energy. This aspect is not considered here.</p>
         <p><sup>4</sup>RNA<smcaps>FOLD</smcaps>-manual: "-d or -d0 ignores dangling ends altogether (mostly for debugging)."</p>
         <p><sup>5</sup>RNA<smcaps>FOLD</smcaps>-manual: "With -d2 this check is ignored, dangling energies will be added for the bases adjacent to a helix on both sides in any case; this is the default for partition function folding (-p)."</p>
         <p><sup>6</sup>RNA<smcaps>FOLD</smcaps>-manual: "With -d1 only unpaired bases can participate in at most one dangling end, this is the default for mfe folding but unsupported for the partition function folding."</p>
         <p><sup>7</sup>A larger threshold will always help. However, one cannot tell whether this situation has occurred.</p>
         <p><sup>8</sup>Whether or not it is adequate in partition function computations to split a secondary structure into several microstates is an unresolved dispute among experts (M. Zuker, personal communication).</p>
         <p><sup>9</sup>This can be evaluated by experimental techniques <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>, but sufficient data are not yet available.</p>
         <p><sup>10</sup>While in press, <it>Turner'2004 </it>energy parameters became available. Results for all evaluations are listed in additional file <supplr sid="S4">4</supplr>.</p>
         <suppl id="S4">
            <title>
               <p>Additional file 4</p>
            </title>
            <text>
               <p><b>Evaluation results for Turner 2004 energy parameters</b>. File "turner2004.pdf" contains results for all our evaluations, but computed with the more recent <it>Turner 2004 </it>energy parameter set, which became available while our manuscript was in press.</p>
            </text>
            <file name="1471-2105-12-429-S4.PDF">
   <p>Click here for file</p>
</file>
         </suppl>
         <p><sup>11</sup>It is not obvious how to convert our absolute distances into error rates. Remember that a mispredicted base pair can contribute a distance of 2 (cf. Figure <figr fid="F6">6</figr>). Assuming that predictions hold about the same number of base pairs as the gold structures (1593), the interval of possible distance scores is [0, 3186], from which the above percentages are derived.</p>
         <p><sup>12</sup>We also looked at four further <smcaps>UNAFOLD</smcaps> variants in dangle and no-dangle mode. Their behavior deviates considerably, which is explained by differences in the implemented energy model (M. Zuker, personal communication).</p>
         <p><sup>13</sup>In theory, these tables should be symmetric. We see a small asymmetry in the last decimal position in eight cases. This results from the fact that our programs - for better speed - ignore shapes with an initial probability less than 10<sup>-6</sup>. This means our resulting shape lists are not perfectly identical in the low probability tail, and together with rounding errors, this leads to discrepancies &#8804; 0.002.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>Thanks go to Michael Zuker for comments on the energy model and the <smcaps>UNAFOLD</smcaps> program, and to Georg Sauthoff for support with the Bellman's GAP system. Additional thanks go to Craig Zirbel for providing the FR3D:3A data set.</p>
            <p>We acknowledge support of the publication fee by Deutsche Forschungsgemeinschaft and the Open Access Publication Funds of Bielefeld University.</p>
            <p>We also thank the anonymous reviewers for helpful comments on the manuscript.</p>
         </sec>
      </ack>
      <refgrp><bibl id="B1"><title><p>Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure</p></title><aug><au><snm>Mathews</snm><fnm>D</fnm></au><au><snm>Sabina</snm><fnm>J</fnm></au><au><snm>Zuker</snm><fnm>M</fnm></au><au><snm>Turner</snm><fnm>D</fnm></au></aug><source>J Mol Biol</source><pubdate>1999</pubdate><volume>288</volume><fpage>911</fpage><lpage>940</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1006/jmbi.1999.2700</pubid><pubid idtype="pmpid" link="fulltext">10329189</pubid></pubidlist></xrefbib></bibl><bibl id="B2"><title><p>Fast Folding and Comparison of RNA Secondary Structures</p></title><aug><au><snm>Hofacker</snm><fnm>IL</fnm></au><au><snm>Fontana</snm><fnm>W</fnm></au><au><snm>Stadler</snm><fnm>PF</fnm></au><au><snm>Bonhoeffer</snm><fnm>SL</fnm></au><au><snm>Tacker</snm><fnm>M</fnm></au><au><snm>Schuster</snm><fnm>P</fnm></au></aug><source>Monatsh Chem</source><pubdate>1994</pubdate><volume>125</volume><fpage>167</fpage><lpage>188</lpage><xrefbib><pubid idtype="doi">10.1007/BF00818163</pubid></xrefbib></bibl><bibl id="B3"><title><p>Abstract shapes of RNA</p></title><aug><au><snm>Giegerich</snm><fnm>R</fnm></au><au><snm>Vo&#223;</snm><fnm>B</fnm></au><au><snm>Rehmsmeier</snm><fnm>M</fnm></au></aug><source>Nucleic Acids Research</source><pubdate>2004</pubdate><volume>32</volume><issue>16</issue><fpage>4843</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkh779</pubid><pubid idtype="pmcid">519098</pubid><pubid idtype="pmpid" link="fulltext">15371549</pubid></pubidlist></xrefbib></bibl><bibl id="B4"><title><p>Complete suboptimal folding of RNA and the stability of secondary 
structures</p></title><aug><au><snm>Wuchty</snm><fnm>S</fnm></au><au><snm>Fontana</snm><fnm>W</fnm></au><au><snm>Hofacker</snm><fnm>IL</fnm></au><au><snm>Schuster</snm><fnm>P</fnm></au></aug><source>Biopolymers</source><pubdate>1999</pubdate><volume>49</volume><issue>2</issue><fpage>145</fpage><lpage>165</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1002/(SICI)1097-0282(199902)49:2&lt;145::AID-BIP4&gt;3.0.CO;2-G</pubid><pubid idtype="pmpid" link="fulltext">10070264</pubid></pubidlist></xrefbib></bibl><bibl id="B5"><title><p>Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction</p></title><aug><au><snm>Dowell</snm><fnm>R</fnm></au><au><snm>Eddy</snm><fnm>S</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2004</pubdate><volume>5</volume><fpage>71</fpage><url>http://www.biomedcentral.com/1471-2105/5/71</url><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-5-71</pubid><pubid idtype="pmcid">442121</pubid><pubid idtype="pmpid" link="fulltext">15180907</pubid></pubidlist></xrefbib></bibl><bibl id="B6"><title><p>Complete probabilistic analysis of RNA shapes</p></title><aug><au><snm>Vo&#223;</snm><fnm>B</fnm></au><au><snm>Giegerich</snm><fnm>R</fnm></au><au><snm>Rehmsmeier</snm><fnm>M</fnm></au></aug><source>BMC Biology</source><pubdate>2006</pubdate><volume>4</volume><fpage>5</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1741-7007-4-5</pubid><pubid idtype="pmcid">1479382</pubid><pubid idtype="pmpid" link="fulltext">16480488</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><aug><au><snm>Waterman</snm><fnm>M</fnm></au></aug><source>Introduction to computational biology. 
Maps, sequences and genomes</source><publisher>London: Chapman &amp; Hall</publisher><pubdate>1995</pubdate></bibl><bibl id="B8"><title><p>On quantitative effects of RNA shape abstraction</p></title><aug><au><snm>Nebel</snm><fnm>M</fnm></au><au><snm>Scheid</snm><fnm>A</fnm></au></aug><source>Theory in Biosciences</source><pubdate>2009</pubdate><volume>128</volume><fpage>211</fpage><lpage>225</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1007/s12064-009-0074-z</pubid><pubid idtype="pmpid">19756808</pubid></pubidlist></xrefbib></bibl><bibl id="B9"><title><p>Shape based indexing for faster search of RNA family databases</p></title><aug><au><snm>Janssen</snm><fnm>S</fnm></au><au><snm>Reeder</snm><fnm>J</fnm></au><au><snm>Giegerich</snm><fnm>R</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2008</pubdate><volume>9</volume><fpage>131</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-9-131</pubid><pubid idtype="pmcid">2277397</pubid><pubid idtype="pmpid" link="fulltext">18312625</pubid></pubidlist></xrefbib></bibl><bibl id="B10"><title><p>Stability of ribonucleic acid double-stranded helices</p></title><aug><au><snm>Borer</snm><fnm>P</fnm></au><au><snm>Dengler</snm><fnm>B</fnm></au><au><snm>Tinoco</snm><fnm>I</fnm><suf>Jr</suf></au><au><snm>Uhlenbeck</snm><fnm>O</fnm></au></aug><source>J Mol Biol</source><pubdate>1974</pubdate><volume>86</volume><fpage>843</fpage><lpage>853</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/0022-2836(74)90357-X</pubid><pubid idtype="pmpid" link="fulltext">4427357</pubid></pubidlist></xrefbib></bibl><bibl id="B11"><title><p>Thermodynamics of unpaired terminal nucleotides on short RNA helixes correlates with stacking at helix termini in larger RNAs</p></title><aug><au><snm>Burkard</snm><fnm>M</fnm></au><au><snm>Kierzek</snm><fnm>R</fnm></au><au><snm>Turner</snm><fnm>D</fnm></au></aug><source>J Mol 
Biol</source><pubdate>1999</pubdate><volume>290</volume><fpage>967</fpage><lpage>982</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1006/jmbi.1999.2906</pubid><pubid idtype="pmpid" link="fulltext">10438596</pubid></pubidlist></xrefbib></bibl><bibl id="B12"><title><p>Long RNA dangling end has large energetic contribution to duplex stability</p></title><aug><au><snm>Ohmichi</snm><fnm>T</fnm></au><au><snm>Nakano</snm><fnm>S</fnm></au><au><snm>Miyoshi</snm><fnm>D</fnm></au><au><snm>Sugimoto</snm><fnm>N</fnm></au></aug><source>J Am Chem Soc</source><pubdate>2002</pubdate><volume>124</volume><fpage>10367</fpage><lpage>10372</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1021/ja0255406</pubid><pubid idtype="pmpid" link="fulltext">12197739</pubid></pubidlist></xrefbib></bibl><bibl id="B13"><title><p>The dynamic structural basis of differential enhancement of conformational stability by 5'- and 3'-dangling ends in RNA</p></title><aug><au><snm>Liu</snm><fnm>J</fnm></au><au><snm>Zhao</snm><fnm>L</fnm></au><au><snm>Xia</snm><fnm>T</fnm></au></aug><source>Biochemistry</source><pubdate>2008</pubdate><volume>47</volume><fpage>5962</fpage><lpage>5975</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1021/bi800210t</pubid><pubid idtype="pmpid" link="fulltext">18457418</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><title><p>A discipline of dynamic programming over sequence data</p></title><aug><au><snm>Giegerich</snm><fnm>R</fnm></au><au><snm>Meyer</snm><fnm>C</fnm></au><au><snm>Steffen</snm><fnm>P</fnm></au></aug><source>Science of Computer Programming</source><pubdate>2004</pubdate><volume>51</volume><issue>3</issue><fpage>215</fpage><lpage>263</lpage><xrefbib><pubid idtype="doi">10.1016/j.scico.2003.12.005</pubid></xrefbib></bibl><bibl id="B15"><title><p>Yield grammar analysis in the Bellman's GAP compiler</p></title><aug><au><snm>Giegerich</snm><fnm>R</fnm></au><au><snm>Sauthoff</snm><fnm>G</fnm></au></aug><source>Proceedings of the Eleventh Workshop on Language 
Descriptions, Tools and Applications</source><publisher>LDTA 2011, ACM</publisher><pubdate>2011</pubdate></bibl><bibl id="B16"><title><p>Bellman's GAP - A Declarative Language for Dynamic Programming</p></title><aug><au><snm>Sauthoff</snm><fnm>G</fnm></au><au><snm>Janssen</snm><fnm>S</fnm></au><au><snm>Giegerich</snm><fnm>R</fnm></au></aug><source>13th International ACM SIGPLAN Symposium on Principles and Practice of Declarative Programming, PPDP</source><pubdate>2011</pubdate><note>ACM 2011</note></bibl><bibl id="B17"><title><p>Analysis and classification of RNA tertiary structures</p></title><aug><au><snm>Abraham</snm><fnm>M</fnm></au><au><snm>Dror</snm><fnm>O</fnm></au><au><snm>Nussinov</snm><fnm>R</fnm></au><au><snm>Wolfson</snm><fnm>H</fnm></au></aug><source>RNA</source><pubdate>2008</pubdate><volume>14</volume><issue>11</issue><fpage>2274</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1261/rna.853208</pubid><pubid idtype="pmcid">2578864</pubid><pubid idtype="pmpid" link="fulltext">18824509</pubid></pubidlist></xrefbib></bibl><bibl id="B18"><title><p>The protein data bank</p></title><aug><au><snm>Berman</snm><fnm>H</fnm></au><au><snm>Westbrook</snm><fnm>J</fnm></au><au><snm>Feng</snm><fnm>Z</fnm></au><au><snm>Gilliland</snm><fnm>G</fnm></au><au><snm>Bhat</snm><fnm>T</fnm></au><au><snm>Weissig</snm><fnm>H</fnm></au><au><snm>Shindyalov</snm><fnm>I</fnm></au><au><snm>Bourne</snm><fnm>P</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2000</pubdate><volume>28</volume><fpage>235</fpage><lpage>242</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/28.1.235</pubid><pubid idtype="pmcid">102472</pubid><pubid idtype="pmpid" link="fulltext">10592235</pubid></pubidlist></xrefbib></bibl><bibl id="B19"><title><p>The RCSB Protein Data Bank: redesigned web site and web 
services</p></title><aug><au><snm>Rose</snm><fnm>P</fnm></au><au><snm>Beran</snm><fnm>B</fnm></au><au><snm>Bi</snm><fnm>C</fnm></au><au><snm>Bluhm</snm><fnm>W</fnm></au><au><snm>Dimitropoulos</snm><fnm>D</fnm></au><au><snm>Goodsell</snm><fnm>D</fnm></au><au><snm>Prlic</snm><fnm>A</fnm></au><au><snm>Quesada</snm><fnm>M</fnm></au><au><snm>Quinn</snm><fnm>G</fnm></au><au><snm>Westbrook</snm><fnm>J</fnm></au><au><snm>Young</snm><fnm>J</fnm></au><au><snm>Yukich</snm><fnm>B</fnm></au><au><snm>Zardecki</snm><fnm>C</fnm></au><au><snm>Berman</snm><fnm>H</fnm></au><au><snm>Bourne</snm><fnm>P</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2010</pubdate><volume>39</volume><fpage>D392</fpage><lpage>401</lpage><xrefbib><pubidlist><pubid idtype="pmcid">3013649</pubid><pubid idtype="pmpid" link="fulltext">21036868</pubid></pubidlist></xrefbib></bibl><bibl id="B20"><title><p>FR3D: finding local and composite recurrent structural motifs in RNA 3D structures</p></title><aug><au><snm>Sarver</snm><fnm>M</fnm></au><au><snm>Zirbel</snm><fnm>CL</fnm></au><au><snm>Stombaugh</snm><fnm>J</fnm></au><au><snm>Mokdad</snm><fnm>A</fnm></au><au><snm>Leontis</snm><fnm>NB</fnm></au></aug><source>J Math Biol</source><pubdate>2008</pubdate><volume>56</volume><issue>1-2</issue><fpage>215</fpage><lpage>252</lpage><xrefbib><pubidlist><pubid idtype="pmcid">2837920</pubid><pubid idtype="pmpid" link="fulltext">17694311</pubid></pubidlist></xrefbib></bibl><bibl id="B21"><title><p>Frequency and isostericity of RNA base pairs</p></title><aug><au><snm>Stombaugh</snm><fnm>J</fnm></au><au><snm>Zirbel</snm><fnm>CL</fnm></au><au><snm>Westhof</snm><fnm>E</fnm></au><au><snm>Leontis</snm><fnm>NB</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2009</pubdate><volume>37</volume><issue>7</issue><fpage>2294</fpage><lpage>2312</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkp011</pubid><pubid idtype="pmcid">2673412</pubid><pubid idtype="pmpid" 
link="fulltext">19240142</pubid></pubidlist></xrefbib></bibl><bibl id="B22"><title><p>Quantitative analysis of nucleic acid three-dimensional structures</p></title><aug><au><snm>Gendron</snm><fnm>P</fnm></au><au><snm>Lemieux</snm><fnm>S</fnm></au><au><snm>Major</snm><fnm>F</fnm></au></aug><source>J Mol Biol</source><pubdate>2001</pubdate><volume>308</volume><issue>5</issue><fpage>919</fpage><lpage>936</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1006/jmbi.2001.4626</pubid><pubid idtype="pmpid" link="fulltext">11352582</pubid></pubidlist></xrefbib></bibl><bibl id="B23"><title><p>RNA STRAND: The RNA Secondary Structure and Statistical Analysis Database</p></title><aug><au><snm>Andronescu</snm><fnm>M</fnm></au><au><snm>Bereg</snm><fnm>V</fnm></au><au><snm>Hoos</snm><fnm>H</fnm></au><au><snm>Condon</snm><fnm>A</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2008</pubdate><volume>9</volume><fpage>340</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-9-340</pubid><pubid idtype="pmcid">2536673</pubid><pubid idtype="pmpid" link="fulltext">18700982</pubid></pubidlist></xrefbib></bibl><bibl id="B24"><title><p>Prediction of RNA secondary structure using generalized centroid estimators</p></title><aug><au><snm>Hamada</snm><fnm>M</fnm></au><au><snm>Kiryu</snm><fnm>H</fnm></au><au><snm>Sato</snm><fnm>K</fnm></au><au><snm>Mituyama</snm><fnm>T</fnm></au><au><snm>Asai</snm><fnm>K</fnm></au></aug><source>Bioinformatics</source><pubdate>2009</pubdate><volume>25</volume><issue>4</issue><fpage>465</fpage><lpage>473</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btn601</pubid><pubid idtype="pmpid" link="fulltext">19095700</pubid></pubidlist></xrefbib></bibl><bibl id="B25"><title><p>CONTRAfold: RNA secondary structure prediction without physics-based 
models</p></title><aug><au><snm>Do</snm><fnm>CB</fnm></au><au><snm>Woods</snm><fnm>DA</fnm></au><au><snm>Batzoglou</snm><fnm>S</fnm></au></aug><source>Bioinformatics</source><pubdate>2006</pubdate><volume>22</volume><issue>14</issue><fpage>e90</fpage><lpage>98</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btl246</pubid><pubid idtype="pmpid" link="fulltext">16873527</pubid></pubidlist></xrefbib></bibl><bibl id="B26"><title><p>Computational approaches for RNA energy parameter estimation</p></title><aug><au><snm>Andronescu</snm><fnm>M</fnm></au><au><snm>Condon</snm><fnm>A</fnm></au><au><snm>Hoos</snm><fnm>H</fnm></au><au><snm>Mathews</snm><fnm>DH</fnm></au><au><snm>Murphy</snm><fnm>KP</fnm></au></aug><source>RNA</source><pubdate>2010</pubdate><volume>16</volume><issue>12</issue><fpage>2304</fpage><lpage>2318</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1261/rna.1950510</pubid><pubid idtype="pmcid">2995392</pubid><pubid idtype="pmpid" link="fulltext">20940338</pubid></pubidlist></xrefbib></bibl><bibl id="B27"><title><p>Faster computation of exact RNA shape probabilities</p></title><aug><au><snm>Janssen</snm><fnm>S</fnm></au><au><snm>Giegerich</snm><fnm>R</fnm></au></aug><source>Bioinformatics</source><pubdate>2010</pubdate><volume>26</volume><issue>5</issue><fpage>632</fpage><lpage>639</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/btq014</pubid><pubid idtype="pmcid">2828121</pubid><pubid idtype="pmpid" link="fulltext">20080511</pubid></pubidlist></xrefbib></bibl><bibl id="B28"><title><p>Mfold web server for nucleic acid folding and hybridization prediction</p></title><aug><au><snm>Zuker</snm><fnm>M</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2003</pubdate><volume>31</volume><fpage>3406</fpage><lpage>3415</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkg595</pubid><pubid idtype="pmcid">169194</pubid><pubid idtype="pmpid" link="fulltext">12824337</pubid></pubidlist></xrefbib></bibl><bibl 
id="B29"><title><p>UNAFold: software for nucleic acid folding and hybridization</p></title><aug><au><snm>Markham</snm><fnm>NR</fnm></au><au><snm>Zuker</snm><fnm>M</fnm></au></aug><source>Methods in molecular biology (Clifton, N.J.)</source><pubdate>2008</pubdate><volume>453</volume><fpage>3</fpage><lpage>31</lpage><xrefbib><pubid idtype="doi">10.1007/978-1-60327-429-6_1</pubid></xrefbib></bibl><bibl id="B30"><title><p>RNAstructure: software for RNA secondary structure prediction and analysis</p></title><aug><au><snm>Reuter</snm><fnm>J</fnm></au><au><snm>Mathews</snm><fnm>D</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2010</pubdate><volume>11</volume><fpage>129</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-11-129</pubid><pubid idtype="pmcid">2984261</pubid><pubid idtype="pmpid" link="fulltext">20230624</pubid></pubidlist></xrefbib></bibl><bibl id="B31"><title><p>Coaxial stacking of helixes enhances binding of oligoribonucleotides and improves predictions of RNA folding</p></title><aug><au><snm>Walter</snm><fnm>A</fnm></au><au><snm>Turner</snm><fnm>D</fnm></au><au><snm>Kim</snm><fnm>J</fnm></au><au><snm>Lyttle</snm><fnm>M</fnm></au><au><snm>M&#252;ller</snm><fnm>P</fnm></au><au><snm>Mathews</snm><fnm>D</fnm></au><au><snm>Zuker</snm><fnm>M</fnm></au></aug><source>Proc Nat Acad Sci USA</source><pubdate>1994</pubdate><volume>91</volume><fpage>9218</fpage><lpage>9222</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.91.20.9218</pubid><pubid idtype="pmcid">44783</pubid><pubid idtype="pmpid" link="fulltext">7524072</pubid></pubidlist></xrefbib></bibl><bibl id="B32"><title><p>Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson-Crick base 
pairs</p></title><aug><au><snm>Xia</snm><fnm>T</fnm></au><au><snm>SantaLucia</snm><fnm>J</fnm></au><au><snm>Burkard</snm><fnm>M</fnm></au><au><snm>Kierzek</snm><fnm>R</fnm></au><au><snm>Schroeder</snm><fnm>S</fnm></au><au><snm>Jiao</snm><fnm>X</fnm></au><au><snm>Cox</snm><fnm>C</fnm></au><au><snm>Turner</snm><fnm>D</fnm></au></aug><source>Biochemistry</source><pubdate>1998</pubdate><volume>37</volume><fpage>14719</fpage><lpage>14735</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1021/bi9809425</pubid><pubid idtype="pmpid" link="fulltext">9778347</pubid></pubidlist></xrefbib></bibl><bibl id="B33"><title><p>Strong correlation between SHAPE chemistry and the generalized NMR order parameter (S2) in RNA</p></title><aug><au><snm>Gherghe</snm><fnm>C</fnm></au><au><snm>Shajani</snm><fnm>Z</fnm></au><au><snm>Wilkinson</snm><fnm>K</fnm></au><au><snm>Varani</snm><fnm>G</fnm></au><au><snm>Weeks</snm><fnm>K</fnm></au></aug><source>J Am Chem Soc</source><pubdate>2008</pubdate><volume>130</volume><issue>37</issue><fpage>12244</fpage><lpage>5</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1021/ja804541s</pubid><pubid idtype="pmcid">2712629</pubid><pubid idtype="pmpid" link="fulltext">18710236</pubid></pubidlist></xrefbib></bibl><bibl id="B34"><title><p>A comprehensive comparison of comparative RNA structure prediction approaches</p></title><aug><au><snm>Gardner</snm><fnm>P</fnm></au><au><snm>Giegerich</snm><fnm>R</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2004</pubdate><volume>5</volume><fpage>140</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-5-140</pubid><pubid idtype="pmcid">526219</pubid><pubid idtype="pmpid" link="fulltext">15458580</pubid></pubidlist></xrefbib></bibl></refgrp>
   </bm>
</art>