<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-8-357</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>An evolutionary method for learning HMM structure: prediction of protein secondary structure</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Won</snm>
               <fnm>Kyoung-Jae</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <insr iid="I3"/>
               <email>won@binf.ku.dk</email>
            </au>
            <au id="A2">
               <snm>Hamelryck</snm>
               <fnm>Thomas</fnm>
               <insr iid="I1"/>
               <email>thamelry@binf.ku.dk</email>
            </au>
            <au id="A3">
               <snm>Pr&#252;gel-Bennett</snm>
               <fnm>Adam</fnm>
               <insr iid="I2"/>
               <email>apb@ecs.soton.ac.uk</email>
            </au>
            <au id="A4">
               <snm>Krogh</snm>
               <fnm>Anders</fnm>
               <insr iid="I1"/>
               <email>krogh@binf.ku.dk</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Bioinformatics Centre, Department of Molecular Biology, University of Copenhagen, Ole Maaloes Vej 5, DK-2200 Copenhagen, Denmark</p>
            </ins>
            <ins id="I2">
               <p>School of Electronics and Computer Science, University of Southampton, SO17 1BJ, UK</p>
            </ins>
            <ins id="I3">
               <p>Department of Chemistry &amp; Biochemistry, UCSD, 9500 Gilman Drive, Mail Code 0359, La Jolla, CA, 92093-0359, USA</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>1</issue>
         <fpage>357</fpage>
         <url>http://www.biomedcentral.com/1471-2105/8/357</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17888163</pubid>
               <pubid idtype="doi">10.1186/1471-2105-8-357</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>28</day>
               <month>4</month>
               <year>2007</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>21</day>
               <month>9</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>21</day>
               <month>9</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Won et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>The prediction of the secondary structure of proteins is one of the most studied problems in bioinformatics. Despite their success in many problems of biological sequence analysis, Hidden Markov Models (HMMs) have not been used much for this problem, as the complexity of the task makes manual design of HMMs difficult. Therefore, we have developed a method for evolving the structure of HMMs automatically, using Genetic Algorithms (GAs).</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
                <p>In the GA procedure, populations of HMMs are assembled from biologically meaningful building blocks. Mutation and crossover operators were designed to explore the space of such Block-HMMs. After each step of the GA, the standard HMM estimation algorithm (the Baum-Welch algorithm) was used to update model parameters. The final HMM captures several features of protein sequence and structure, with its own HMM grammar. In contrast to neural-network-based predictors, the evolved HMM also calculates the probabilities associated with the predictions. We carefully examined the performance of the HMM-based predictor under both the multiple- and single-sequence conditions.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
                <p>We have shown that the proposed evolutionary method can automatically design the topology of HMMs. The method reads the grammar of protein sequences and converts it into the grammar of an HMM. It improves on previously suggested evolutionary methods and increases the prediction quality. In particular, it performs well under the single-sequence condition and provides probabilistic information on the prediction result. The protein secondary structure predictor using HMMs (P.S.HMM) is available online at http://www.binf.ku.dk/~won/pshmm.htm. It runs under the single-sequence condition.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
          <p>Prediction of protein secondary structure is an important step towards understanding protein structure and function from protein sequences. This task has attracted considerable attention and consequently represents one of the most studied problems in bioinformatics. Early prediction methods were developed based on stereochemical principles <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> and statistics <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>. Since then, prediction accuracy has risen steadily owing to both algorithmic developments and the growth of available data. The first machine learning predictions of secondary structure were made using neural networks <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp>. Later methods using neural networks include PHD <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, PSIPRED <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, SSpro <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, SSpro8 <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> and YASPIN <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. Support vector machines have also been used and show promising results <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. Recently, the prediction accuracy has been improved by cascading a second layer of support vector machines <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp>. The currently used machine learning methods typically improve their performance by combining several predictors and using evolutionary information obtained from PSI-BLAST <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. Combining results from different predictors has been shown to improve the performance of secondary structure prediction <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>.</p>
         <p>Even though Hidden Markov Models (HMMs) have been successfully applied to many problems in biological sequence modelling, they have not been used much for protein secondary structure prediction. Asai et al. suggested the first HMM for the prediction of protein secondary structure <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. Later, an HMM with a hierarchical structure was suggested <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. However, both predictors had limited accuracy.</p>
         <p>HMMSTR <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> is a successful HMM predictor for this problem. It was constructed by identifying recurring protein backbone motifs (called invariant/initiation sites or I-sites) and representing them as a Markov chain. Consequently, the topology of HMMSTR can be interpreted as a description of the protein backbone in terms of consecutive I-sites. YASPIN <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, which is one of the most recent methods, builds on a combination of hidden Markov models and neural networks <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>.</p>
          <p>In this paper, we report a new method for optimizing the structure of HMMs for secondary structure prediction. Over the last couple of years we have developed a method for optimizing the structure of HMMs automatically using Genetic Algorithms (GAs) <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp>, which we previously applied to promoter finding in DNA. Here, we use this evolutionary method to optimize the structure of an HMM for secondary structure prediction. During the evolutionary optimization, the HMM's structure is assembled from biologically meaningful building blocks <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>; hence, we call our evolutionary method Block-HMM. The HMM evolved with Block-HMM models the training protein sequences and gives, for each amino acid, the probability of each secondary structure conformation.</p>
          <p>A few other methods for learning HMM structure have been described in the literature. Stolcke developed a state merging method, which starts from an HMM with a large number of states <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. Conversely, a state splitting method was suggested in <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr></abbrgrp>. A structure evolving method using GAs was first proposed to change the structure of a TATA box HMM <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>; the same authors later improved this HMM structure learning method by taking statistical significance into account <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. A structure evolving method based on genetic programming, in which HMM structures are represented by probabilistic trees, has also been suggested <abbrgrp><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr></abbrgrp>. Evolutionary structure learning has also been applied to protein secondary structure prediction: Thomsen proposed a GA very similar to that of Yada et al. and achieved 49% prediction accuracy <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>.</p>
          <p>Our structure learning method differs from previous methods in that it uses block models inspired by HMM applications in biological sequence analysis. Instead of crossing over an arbitrary number of states, we cross over a number of blocks. Because blocks can contain different numbers of states, different numbers of states can be exchanged through the crossover operation. Mutation is restricted so that adding or deleting transitions does not break the properties of a block. As a result, our approach exploits the modularity of HMMs more strategically than previously suggested genetic methods. Genetic programming methods <abbrgrp><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr></abbrgrp> encode HMM networks as probabilistic trees, from which linguistic representations are derived for each particular HMM topology. Similarly, our approach encodes several types of blocks in linguistic form <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. The basic shapes of the linguistic blocks differ between the two methods, and these encoding differences affect the search space of the topology evolution. This also suggests that various types of topological encoding may be useful for other problems.</p>
          <p>We analyze one of the evolved HMM structures under the single-sequence condition. We also test it under the multiple-sequence condition, after designing a complete predictor that uses an ensemble of three independently trained predictors together with simple neural networks.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Block-HMM for labelled sequences</p>
            </st>
            <p>Block-HMM restricts its search to a subset of HMM topologies made up of blocks of states. Each block is assigned a label that corresponds to one of the three secondary structure classes. The states that make up the blocks emit amino acid symbols. Secondary structure prediction is done by inferring the values of the hidden states for a given amino acid sequence, and examining the secondary structure labels of the blocks these states belong to. Four types of blocks are used: linear, self-loop, forward-jump blocks and zero blocks (figure <figr fid="F1">1</figr>).</p>
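            <p>One standard way to turn the hidden states into per-residue class probabilities is posterior decoding. The sketch below illustrates that technique and is not the P.S.HMM implementation. It assumes the model is given as NumPy arrays (start, trans, emit) together with one label per state, runs the scaled forward-backward algorithm, and sums the posteriors of all states that share a label. All function and parameter names are hypothetical.</p>
            <p>
```python
import numpy as np

def label_posteriors(obs, start, trans, emit, state_labels, labels=("H", "E", "C")):
    """Per-residue posterior probability of each secondary structure label.
    obs: symbol indices; start: (K,); trans: (K, K); emit: (K, n_symbols);
    state_labels: the label of each of the K states (hypothetical interface)."""
    T, K = len(obs), len(start)
    alpha = np.zeros((T, K))
    scale = np.zeros(T)
    alpha[0] = start * emit[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):                      # scaled forward pass
        alpha[t] = (alpha[t - 1] @ trans) * emit[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    beta = np.ones((T, K))
    for t in range(T - 2, -1, -1):             # scaled backward pass
        beta[t] = trans @ (emit[:, obs[t + 1]] * beta[t + 1]) / scale[t + 1]
    post = alpha * beta                        # P(state at position t | whole sequence)
    post /= post.sum(axis=1, keepdims=True)
    lab = np.asarray(state_labels)
    # Sum the posteriors of all states that carry the same label.
    return np.stack([post[:, lab == name].sum(axis=1) for name in labels], axis=1)
```
            </p>
            <p>The predicted class of each residue is then the label with the largest posterior, for example P.argmax(axis=1). The posterior itself is the probability that accompanies each prediction.</p>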
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>HMM blocks that compose the whole HMM structure</p>
               </caption>
               <text>
                  <p><b>HMM blocks that compose the whole HMM structure</b>. (a) linear block (b) self-loop block (tying is optional) (c) forward-jump block (tying is optional) (d) zero block.</p>
               </text>
               <graphic file="1471-2105-8-357-1"/>
            </fig>
             <p>Linear blocks consist of <it>N </it>states (labelled from 1 to <it>N</it>) where state <it>n </it>is only connected to state <it>n </it>+ 1 (with 1 &#8804; <it>n </it>&lt;<it>N</it>). Self-loop blocks are linear blocks in which each state has an additional loop to itself. A forward-jump block is a linear block where the first state is also connected to the last <it>M </it>states (with 1 &#8804; <it>M </it>&lt;<it>N</it>). Zero blocks are empty blocks with no states: they can replace other block types during the GA procedure and thus allow the exploration of simpler topologies.</p>
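            <p>These definitions can be written down directly. The sketch below, with hypothetical function names, encodes each block type as a map from a state to the states it links to inside the block. States are numbered from 0 here instead of from 1 as in the text. Links from a block's end state to other blocks are added only when blocks are assembled and are not shown.</p>
            <p>
```python
def linear_block(n):
    """Linear block: state i links only to state i + 1; state n - 1 is the end state."""
    return {i: ([i + 1] if i != n - 1 else []) for i in range(n)}

def self_loop_block(n):
    """Self-loop block: a linear block in which every state also links to itself."""
    return {i: [i] + succ for i, succ in linear_block(n).items()}

def forward_jump_block(n, m):
    """Forward-jump block: a linear block whose first state also links to the last m states."""
    if m not in range(1, n):
        raise ValueError("need 1 ≤ m ≤ n - 1")
    edges = linear_block(n)
    edges[0] = sorted(set(edges[0]) | set(range(n - m, n)))
    return edges

def zero_block():
    """Zero block: no states at all."""
    return {}

print(forward_jump_block(5, 2))
```
            </p>
            <p>With N = 5 and M = 2, the first state (index 0) links to the second state and to the last two states. This is the starting point of the mutations shown in figure 4.</p>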
             <p>The self-loop and forward-jump blocks can be either tied (tied blocks are shaded in the figures) or untied. When a block is tied, all states inside the block share the same emission and transition probabilities. We did not consider tying for linear blocks, because a tied linear block is equivalent to a single-state self-loop block.</p>
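            <p>A common way to implement tying in Baum-Welch training is to pool the expected counts of all tied states in the M-step and give every state the same re-estimated distribution. This is an assumption here, since the authors' implementation is not described in this section. The sketch below shows that idea for emissions only, using hypothetical function names. Tied transition probabilities would be pooled in the same way.</p>
            <p>
```python
import numpy as np

def untied_emissions(expected_counts):
    """M-step without tying: each state normalizes its own expected emission counts."""
    return expected_counts / expected_counts.sum(axis=1, keepdims=True)

def tied_emissions(expected_counts):
    """M-step with tying: the counts of all states in the block are pooled, and the
    single resulting distribution is shared by every state of the block."""
    pooled = expected_counts.sum(axis=0)
    shared = pooled / pooled.sum()
    return np.tile(shared, (expected_counts.shape[0], 1))

# Expected emission counts for a hypothetical 2-state block over a 2-letter alphabet.
counts = np.array([[2.0, 2.0], [0.0, 4.0]])
print(untied_emissions(counts))   # rows differ: [0.5, 0.5] and [0.0, 1.0]
print(tied_emissions(counts))     # both rows are [0.25, 0.75]
```
            </p>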
             <p>The various blocks can model different types of sequence fragments. A linear block can model a particular conserved sequence pattern. A self-loop block can model a sequence of any length, while a forward-jump block can represent subsequences whose length varies up to a fixed maximum. Initially, the blocks are fully linked to form HMM architectures: the end state of each block is connected to the starting states of all other blocks and to its own starting state. Each block is labelled with one of the three protein structure classes 'H' (helix), 'E' (strand), or 'C' (coil). Figure <figr fid="F2">2</figr> shows a simple example of an HMM structure composed of 3 blocks, labelled from left to right 'H', 'C' and 'E'. Each block can also be tied. After training, most of the transition probabilities are close to zero, resulting in a final structure that is typically much simpler than the fully connected HMM shown in the figure.</p>
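            <p>The different roles of self-loop and forward-jump blocks show up in the distribution of segment lengths a block emits before it is left. The sketch below is a minimal illustration with hypothetical names. It assumes the within-block transition probabilities are given as a dictionary and treats any probability mass missing from 1 as the probability of leaving the block. It then propagates state occupancy one emitted residue at a time.</p>
            <p>
```python
def length_distribution(trans, max_len):
    """dist[L] = probability that the block is left after emitting exactly L residues.
    trans[i] maps each within-block successor j of state i to its probability;
    the mass missing from 1.0 is the probability of leaving the block after state i."""
    dist = [0.0] * (max_len + 1)
    occupancy = {0: 1.0}                 # a block is entered at its first state
    for length in range(1, max_len + 1):
        nxt = {}
        for i, p in occupancy.items():
            stay = sum(trans[i].values())
            dist[length] += p * (1.0 - stay)         # leave after this residue
            for j, q in trans[i].items():
                nxt[j] = nxt.get(j, 0.0) + p * q     # or move on inside the block
        occupancy = nxt
    return dist

# Single-state self-loop block: geometric, unbounded length distribution.
print(length_distribution({0: {0: 0.5}}, 4))
# 5-state forward-jump block with jumps into the last two states: lengths 2, 3 or 5 only.
fj = {0: {1: 0.5, 3: 0.25, 4: 0.25}, 1: {2: 1.0}, 2: {3: 1.0}, 3: {4: 1.0}, 4: {}}
print(length_distribution(fj, 6))
```
            </p>
            <p>The self-loop block puts probability on every length, whereas the forward-jump block never emits a segment longer than its number of states.</p>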
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>An example of an HMM composed of blocks resulting from the Block-HMM procedure</p>
               </caption>
               <text>
                  <p><b>An example of an HMM composed of blocks resulting from the Block-HMM procedure</b>. Three blocks are used in this model and all the blocks are fully connected to each other. The blocks are separated by dotted lines. The states in tied blocks are shaded in grey.</p>
               </text>
               <graphic file="1471-2105-8-357-2"/>
            </fig>
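            <p>The fully linked architecture described above can be assembled from individual blocks as sketched below. This is a hypothetical illustration, not the authors' code. Each block is a label plus a map from its local states to their local successors, and the last local state is the end state. Zero blocks contribute no states. The random initial transition values are an assumption, and each row is normalized to sum to 1.</p>
            <p>
```python
import numpy as np

def assemble_fully_linked(blocks, rng):
    """blocks: list of (label, edges); edges maps each local state to its local successors.
    Returns a row-stochastic transition matrix and the label of every global state."""
    offsets, labels = [], []
    for label, edges in blocks:
        offsets.append(len(labels))
        labels.extend([label] * len(edges))
    n = len(labels)
    allowed = np.zeros((n, n), dtype=bool)
    starts = [off for off, (_, edges) in zip(offsets, blocks) if edges]
    for off, (_, edges) in zip(offsets, blocks):
        if not edges:                                   # zero block: nothing to link
            continue
        for i, successors in edges.items():
            for j in successors:
                allowed[off + i, off + j] = True        # transitions inside the block
        allowed[off + len(edges) - 1, starts] = True    # end state -> first state of every block, itself included
    trans = rng.random((n, n)) * allowed
    trans /= trans.sum(axis=1, keepdims=True)
    return trans, labels

# A 2-state linear 'H' block, a zero 'C' block and a 1-state self-loop 'E' block.
blocks = [("H", {0: [1], 1: []}), ("C", {}), ("E", {0: [0]})]
trans, labels = assemble_fully_linked(blocks, np.random.default_rng(0))
print(labels)  # ['H', 'H', 'E']
```
            </p>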
         </sec>
         <sec>
            <st>
               <p>Genetic operators for Block-HMM</p>
            </st>
             <p>Genetic algorithms evolve a population of solutions with genetic operators. Inside the genetic cycle, members of the population (called parents) are selected and modified by the genetic operators to produce new members (called children). The new children, together with the remaining old members of the population, are then evaluated to calculate their fitness, and a selection procedure chooses members of the population for the next genetic cycle according to their fitness.</p>
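            <p>The cycle described above can be sketched generically as follows. The names are hypothetical, and truncation selection (keeping the fittest members) is a simplifying assumption. The actual selection procedure and fitness function of Block-HMM are given in Methods. In Block-HMM, the make_children step would apply the genetic operators and then re-estimate the parameters with Baum-Welch.</p>
            <p>
```python
import random

def evolve(population, fitness, make_children, n_generations, seed=0):
    """Generic genetic cycle: create children, pool them with the old members,
    evaluate fitness and keep the fittest members for the next cycle."""
    rng = random.Random(seed)
    size = len(population)
    for _ in range(n_generations):
        children = make_children(population, rng)   # genetic operators act here
        pool = population + children                # children plus old members
        pool.sort(key=fitness, reverse=True)        # evaluate fitness
        population = pool[:size]                    # select members for the next cycle
    return population

# Toy usage: integers "evolve" towards 10 under a +/-1 mutation operator.
def toy_children(pop, rng):
    return [x + rng.choice((-1, 1)) for x in pop]

def toy_fitness(x):
    return -abs(x - 10)

final = evolve([0, 1, 2, 3], toy_fitness, toy_children, n_generations=50)
print(max(final, key=toy_fitness))
```
            </p>
            <p>Because old members compete with their children, the best fitness in this sketch can never decrease from one cycle to the next.</p>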
             <p>We used three genetic operators in Block-HMM: crossover, mutation and type-mutation. The number of blocks is kept fixed, but the number of states in an HMM can be changed by the genetic operators. Crossover swaps a number of blocks between two parents to create two children; the crossover points and the number of blocks are chosen randomly. Figure <figr fid="F3">3</figr> shows an example of the crossover scheme, in which the last block of the first child crosses with the first block of the second child (transitions between blocks are not shown, to simplify the diagram). The crossover operator thus enables HMMs to exchange states without breaking the basic blocks, and because several blocks can be swapped at once, the GA can search a broad area of the solution space. Mutations can take place inside any block of the HMM. A forward-jump block can undergo 6 different types of mutation, which are illustrated in figure <figr fid="F4">4</figr>: deleting or inserting a transition (figure <figr fid="F4">4(a),(b)</figr>), deleting one state (figure <figr fid="F4">4(c),(d)</figr>), and adding one state (figure <figr fid="F4">4(e),(f)</figr>). For linear and self-loop blocks, only adding and deleting a state are possible.</p>
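            <p>The sketch below shows one possible encoding of these operators; it is not the authors' code. An individual is a Python list with a fixed number of blocks. A forward-jump block is summarized by its number of states N and its number of jump targets M, as in the block definitions above. Under that assumption, each of the six forward-jump mutations of figure 4 shifts (N, M) by a fixed offset. A mutation is allowed only if the result still satisfies 1 &#8804; M &#8804; N - 1. All names are hypothetical.</p>
            <p>
```python
import copy
import random

def block_crossover(parent_a, parent_b, rng):
    """Swap k consecutive blocks between two parents with the same number of blocks.
    The block count stays fixed, but state counts can change because blocks differ in length."""
    n_blocks = len(parent_a)
    k = rng.randint(1, n_blocks)            # how many blocks to swap
    i = rng.randint(0, n_blocks - k)        # crossover point in parent_a
    j = rng.randint(0, n_blocks - k)        # crossover point in parent_b
    child_a, child_b = copy.deepcopy(parent_a), copy.deepcopy(parent_b)
    child_a[i:i + k] = copy.deepcopy(parent_b[j:j + k])
    child_b[j:j + k] = copy.deepcopy(parent_a[i:i + k])
    return child_a, child_b

# Offsets (dN, dM) of the six forward-jump mutations, where the jump targets are the last M states.
FORWARD_JUMP_MUTATIONS = {
    "delete_transition": (0, -1),     # drop the jump into the earliest target, cf. 4(a)
    "insert_transition": (0, +1),     # add a jump into the state before the targets, cf. 4(b)
    "delete_plain_state": (-1, 0),    # remove a state the jumps skip over, cf. 4(c)
    "delete_target_state": (-1, -1),  # remove one of the jump targets, cf. 4(d)
    "add_target_state": (+1, +1),     # insert a state among the jump targets, cf. 4(e)
    "add_plain_state": (+1, 0),       # insert a state before the jump targets, cf. 4(f)
}

def valid_forward_jump_mutations(n, m):
    """All (name, new_n, new_m) that keep the block a valid forward-jump block."""
    result = []
    for name, (dn, dm) in FORWARD_JUMP_MUTATIONS.items():
        new_n, new_m = n + dn, m + dm
        if new_m in range(1, new_n):
            result.append((name, new_n, new_m))
    return result

def mutate_length(n, rng):
    """Linear and self-loop blocks: add or delete one state, keeping at least one."""
    return n + 1 if n == 1 else n + rng.choice((-1, 1))

rng = random.Random(0)
print(block_crossover(["H-lin-3", "C-loop-1", "E-jump-5"], ["x-1", "x-2", "x-3"], rng))
print(len(valid_forward_jump_mutations(5, 2)))   # all six apply to the 5-state block of figure 4
```
            </p>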
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Crossover in Block-HMM</p>
               </caption>
               <text>
                  <p><b>Crossover in Block-HMM</b>. Crossover swaps the HMM states without changing the properties of an individual HMM block. Here, the last block of the first child crosses with the first block of the second child. To simplify the diagram, transitions between blocks are not shown.</p>
               </text>
               <graphic file="1471-2105-8-357-3"/>
            </fig>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Mutation in Block-HMM</p>
               </caption>
               <text>
                  <p><b>Mutation in Block-HMM</b>. Six possible types of mutations from a 5-state forward-jump block: (a) a transition from the first to the fourth state is deleted (b) a transition from the first to the third state is added (c) the second or the third state is deleted (d) the fourth state is deleted (e) a state is added between the fourth and the fifth state (f) a state is added between the first and the fourth state.</p>
               </text>
               <graphic file="1471-2105-8-357-4"/>
            </fig>
             <p>In addition to changing the length of a block and its transitions, we also allow another form of mutation, called <it>type-mutation</it>, that changes the type or label of a block. Type-mutation to a zero block is also allowed (figure <figr fid="F5">5</figr>). When a type-mutation changes the type of a block, new transition probabilities are generated randomly. Self-loop and forward-jump blocks can also type-mutate between their tied and untied versions, and zero blocks can be type-mutated to any of the other block forms.</p>
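            <p>A hypothetical sketch of type-mutation is shown below. Blocks are small immutable records. A mutation relabels a block, changes its type (including to or from a zero block), or toggles tying for self-loop and forward-jump blocks. The block sizes drawn for former zero blocks and the flat Dirichlet draw for the new random transition probabilities are assumptions, not details taken from the paper.</p>
            <p>
```python
import random
from dataclasses import dataclass, replace

LABELS = ("H", "E", "C")
KINDS = ("linear", "self_loop", "forward_jump", "zero")

@dataclass(frozen=True)
class Block:
    kind: str
    label: str
    n_states: int = 0
    tied: bool = False
    jumps: int = 0          # M, only used by forward-jump blocks

def make_block(kind, label, n_states, rng, tied=False):
    """Build a block of the requested kind, enforcing the block definitions."""
    if kind == "zero":
        return Block("zero", label)
    if kind == "forward_jump":
        n_states = max(n_states, 2)                  # a jump needs at least two states
        return Block(kind, label, n_states, tied, rng.randint(1, n_states - 1))
    return Block(kind, label, n_states, tied and kind == "self_loop")  # linear blocks are never tied

def random_transition_probs(k, rng):
    """Fresh random probabilities for k transitions (a flat Dirichlet draw)."""
    weights = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]

def type_mutate(block, rng, max_states=6):
    """Change the type, label or tying of one block; zero blocks can become any other form."""
    if block.kind == "zero":
        kind = rng.choice(KINDS[:3])
        return make_block(kind, rng.choice(LABELS), rng.randint(1, max_states), rng, rng.random() >= 0.5)
    moves = ["relabel", "change_kind"]
    if block.kind in ("self_loop", "forward_jump"):
        moves.append("toggle_tie")
    move = rng.choice(moves)
    if move == "relabel":
        return replace(block, label=rng.choice([x for x in LABELS if x != block.label]))
    if move == "toggle_tie":
        return replace(block, tied=not block.tied)
    kind = rng.choice([k for k in KINDS if k != block.kind])
    return make_block(kind, block.label, block.n_states, rng, block.tied)

rng = random.Random(0)
print(type_mutate(Block("forward_jump", "E", 5, True, 2), rng))
```
            </p>
            <p>After a block changes type, its transitions would be re-initialized, for example with random_transition_probs, before the next round of Baum-Welch training.</p>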
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Type-mutation in Block-HMM</p>
               </caption>
               <text>
                  <p><b>Type-mutation in Block-HMM</b>. A forward-jump block is type-mutated (a) to a tied block (b) to a block with a different label (c) to a zero block (d) to a self-loop block or a linear block.</p>
               </text>
               <graphic file="1471-2105-8-357-5"/>
            </fig>
             <p>Our GA is a hybrid: the genetic operators above change the structure of the HMMs, while a parameter learning method (the Baum-Welch algorithm) re-estimates their parameters. The whole procedure is described in detail in Methods.</p>
         </sec>
         <sec>
            <st>
               <p>Analysis of the evolved HMM</p>
            </st>
            <sec>
               <st>
                  <p>The evolved model</p>
               </st>
                <p>Figure <figr fid="F6">6</figr> illustrates the structure of the best HMM obtained with Block-HMM. The simulation used 30 blocks, but the result shows only 26 blocks because the remaining 4 are zero blocks. Figure <figr fid="F7">7</figr> shows the full HMM structure. Each state is assigned one of the three secondary structure labels <it>l </it>&#8712; {<it>H</it>, <it>E</it>, <it>x</it>}, where <it>x</it> denotes coil (the class labelled 'C' above). The model is composed of 22 states for helix (<it>H</it>), 15 for <it>&#946;</it>-strand (<it>E</it>) and 15 for coil (<it>x</it>). Each state emits one of the 20 amino acids according to its emission probability distribution. The full HMM structure is trained using 1662 sequences (see Methods).</p>
               <fig id="F6">
                  <title>
                     <p>Figure 6</p>
                  </title>
                  <caption>
                     <p>The best HMM topology</p>
                  </caption>
                  <text>
                      <p><b>The best HMM topology</b>. The best HMM topology evolved using Block-HMM. It is composed of 26 non-zero blocks and 52 states. Transitions between blocks are not shown here (including the transition from a block to itself). Each state is assigned a label ('H' for helices, 'E' for <it>&#946;</it>-strands and 'x' for coils). Helix states are colored red and <it>&#946;</it>-strand states blue.</p>
                  </text>
                  <graphic file="1471-2105-8-357-6"/>
               </fig>
               <fig id="F7">
                  <title>
                     <p>Figure 7</p>
                  </title>
                  <caption>
                     <p>The full HMM structure</p>
                  </caption>
                  <text>
                      <p><b>The full HMM structure</b>. The full structure of the best HMM topology. Only transitions with probability above 0.1 are shown. States for helix (H), <it>&#946;</it>-strand (E) and coil (x) are colored red, blue and white, respectively.</p>
                  </text>
                  <graphic file="1471-2105-8-357-7"/>
               </fig>
                <p>The evolved model encodes information about protein sequences and their secondary structure in both its topology and its parameters. First, we checked the distribution of emission probabilities to see how well the evolved model learned biological information. Table <tblr tid="T1">1</tblr> summarizes the characteristics of 51 states, giving for each state the probabilities of emitting hydrophobic amino acids, hydrophilic amino acids, <it>Gly </it>and <it>Pro</it>. In this table, the linear blocks for <it>&#946;</it>-strands (<it>i.e. state43-state44</it>, <it>state7-state8-state9</it>, <it>state34-state35</it>) show a periodic alternation of hydrophilic and hydrophobic character.</p>
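                <p>The Table 1 columns can be derived from each state's 20-letter emission distribution by summing probabilities over amino acid groups. The grouping used in the paper is not stated in this section. The hydrophobic set below (A, V, L, I, M, F, W, C) is therefore a hypothetical choice, and every amino acid other than G and P outside that set is counted as hydrophilic. The alternates helper checks the high-low-high pattern on a run of at least three states, using the hydrophobic fractions of state7-state8-state9 from Table 1.</p>
                <p>
```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVLIMFWC")   # assumed grouping, not taken from the paper

def group_profile(emission):
    """Collapse a 20-letter emission distribution into the four Table 1 groups."""
    profile = {"hydrophobic": 0.0, "hydrophilic": 0.0, "G": 0.0, "P": 0.0}
    for aa, p in emission.items():
        if aa in ("G", "P"):
            profile[aa] += p
        elif aa in HYDROPHOBIC:
            profile["hydrophobic"] += p
        else:
            profile["hydrophilic"] += p
    return profile

def alternates(values):
    """True if consecutive differences change sign at every step (high-low-high ...)."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return all(0 > d1 * d2 for d1, d2 in zip(diffs, diffs[1:]))

uniform = {aa: 1 / 20 for aa in AMINO_ACIDS}
print(group_profile(uniform))
# Hydrophobic fractions of state7-state8-state9 from Table 1.
print(alternates([0.927, 0.485, 0.848]))
```
                </p>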
               <tbl id="T1">
                  <title>
                     <p>Table 1</p>
                  </title>
                  <caption>
                     <p>Information of all the trained states</p>
                  </caption>
                  <tblbdy cols="7">
                     <r>
                        <c ca="center">
                           <p>state</p>
                        </c>
                        <c>
                            <p>label</p>
                         </c>
                         <c ca="center">
                            <p>AA</p>
                         </c>
                         <c ca="center">
                            <p>hydrophobic</p>
                         </c>
                         <c ca="center">
                            <p>hydrophilic</p>
                        </c>
                        <c ca="center">
                           <p>G</p>
                        </c>
                        <c ca="center">
                           <p>P</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="7">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state0</p>
                        </c>
                        <c ca="center">
                           <p>H</p>
                        </c>
                        <c ca="center">
                           <p>1.4%</p>
                        </c>
                        <c ca="center">
                           <p>77.8%</p>
                        </c>
                        <c ca="center">
                           <p>21.7%</p>
                        </c>
                        <c ca="center">
                           <p>0.5%</p>
                        </c>
                        <c ca="center">
                           <p>0.0%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state1</p>
                        </c>
                        <c ca="center">
                           <p>H</p>
                        </c>
                        <c ca="center">
                           <p>1.3%</p>
                        </c>
                        <c ca="center">
                           <p>65.3%</p>
                        </c>
                        <c ca="center">
                           <p>24.6%</p>
                        </c>
                        <c ca="center">
                           <p>8.6%</p>
                        </c>
                        <c ca="center">
                           <p>1.5%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state2</p>
                        </c>
                        <c ca="center">
                           <p>H</p>
                        </c>
                        <c ca="center">
                           <p>0.9%</p>
                        </c>
                        <c ca="center">
                           <p>80.1%</p>
                        </c>
                        <c ca="center">
                           <p>14.2%</p>
                        </c>
                        <c ca="center">
                           <p>4.8%</p>
                        </c>
                        <c ca="center">
                           <p>0.9%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state3</p>
                        </c>
                        <c ca="center">
                           <p>H</p>
                        </c>
                        <c ca="center">
                           <p>1.3%</p>
                        </c>
                        <c ca="center">
                           <p>64.6%</p>
                        </c>
                        <c ca="center">
                           <p>27.4%</p>
                        </c>
                        <c ca="center">
                           <p>7.4%</p>
                        </c>
                        <c ca="center">
                           <p>0.6%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state4</p>
                        </c>
                        <c ca="center">
                           <p>E</p>
                        </c>
                        <c ca="center">
                           <p>2.1%</p>
                        </c>
                        <c ca="center">
                           <p>36.6%</p>
                        </c>
                        <c ca="center">
                           <p>53.2%</p>
                        </c>
                        <c ca="center">
                           <p>4.8%</p>
                        </c>
                        <c ca="center">
                           <p>5.5%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state5</p>
                        </c>
                        <c ca="center">
                           <p>E</p>
                        </c>
                        <c ca="center">
                           <p>0.5%</p>
                        </c>
                        <c ca="center">
                           <p>70.3%</p>
                        </c>
                        <c ca="center">
                           <p>28.8%</p>
                        </c>
                        <c ca="center">
                           <p>0.2%</p>
                        </c>
                        <c ca="center">
                           <p>0.7%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state6</p>
                        </c>
                        <c ca="center">
                           <p>E</p>
                        </c>
                        <c ca="center">
                           <p>2.1%</p>
                        </c>
                        <c ca="center">
                           <p>90.4%</p>
                        </c>
                        <c ca="center">
                           <p>9.2%</p>
                        </c>
                        <c ca="center">
                           <p>0.3%</p>
                        </c>
                        <c ca="center">
                           <p>0.1%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state7</p>
                        </c>
                        <c ca="center">
                           <p>E</p>
                        </c>
                        <c ca="center">
                           <p>1.7%</p>
                        </c>
                        <c ca="center">
                           <p>92.7%</p>
                        </c>
                        <c ca="center">
                           <p>1.5%</p>
                        </c>
                        <c ca="center">
                           <p>5.5%</p>
                        </c>
                        <c ca="center">
                           <p>0.3%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state8</p>
                        </c>
                        <c ca="center">
                           <p>E</p>
                        </c>
                        <c ca="center">
                           <p>1.6%</p>
                        </c>
                        <c ca="center">
                           <p>48.5%</p>
                        </c>
                        <c ca="center">
                           <p>47.8%</p>
                        </c>
                        <c ca="center">
                           <p>3.3%</p>
                        </c>
                        <c ca="center">
                           <p>0.5%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state9</p>
                        </c>
                        <c ca="center">
                           <p>E</p>
                        </c>
                        <c ca="center">
                           <p>1.7%</p>
                        </c>
                        <c ca="center">
                           <p>84.8%</p>
                        </c>
                        <c ca="center">
                           <p>10.8%</p>
                        </c>
                        <c ca="center">
                           <p>3.8%</p>
                        </c>
                        <c ca="center">
                           <p>0.6%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state10</p>
                        </c>
                        <c ca="center">
                           <p>x</p>
                        </c>
                        <c ca="center">
                           <p>2.8%</p>
                        </c>
                        <c ca="center">
                           <p>82.2%</p>
                        </c>
                        <c ca="center">
                           <p>17.8%</p>
                        </c>
                        <c ca="center">
                           <p>0.0%</p>
                        </c>
                        <c ca="center">
                           <p>0.0%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state11</p>
                        </c>
                        <c ca="center">
                           <p>x</p>
                        </c>
                        <c ca="center">
                           <p>2.8%</p>
                        </c>
                        <c ca="center">
                           <p>8.7%</p>
                        </c>
                        <c ca="center">
                           <p>50.8%</p>
                        </c>
                        <c ca="center">
                           <p>12.8%</p>
                        </c>
                        <c ca="center">
                           <p>27.7%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state12</p>
                        </c>
                        <c ca="center">
                           <p>H</p>
                        </c>
                        <c ca="center">
                           <p>0.9%</p>
                        </c>
                        <c ca="center">
                           <p>16.3%</p>
                        </c>
                        <c ca="center">
                           <p>79.4%</p>
                        </c>
                        <c ca="center">
                           <p>1.8%</p>
                        </c>
                        <c ca="center">
                           <p>2.5%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state13</p>
                        </c>
                        <c ca="center">
                           <p>H</p>
                        </c>
                        <c ca="center">
                           <p>0.7%</p>
                        </c>
                        <c ca="center">
                           <p>53.8%</p>
                        </c>
                        <c ca="center">
                           <p>44.9%</p>
                        </c>
                        <c ca="center">
                           <p>1.2%</p>
                        </c>
                        <c ca="center">
                           <p>0.2%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state14</p>
                        </c>
                        <c ca="center">
                           <p>H</p>
                        </c>
                        <c ca="center">
                           <p>0.9%</p>
                        </c>
                        <c ca="center">
                           <p>86.1%</p>
                        </c>
                        <c ca="center">
                           <p>13.8%</p>
                        </c>
                        <c ca="center">
                           <p>0.0%</p>
                        </c>
                        <c ca="center">
                           <p>0.0%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state15</p>
                        </c>
                        <c ca="center">
                           <p>x</p>
                        </c>
                        <c ca="center">
                           <p>10.5%</p>
                        </c>
                        <c ca="center">
                           <p>26.1%</p>
                        </c>
                        <c ca="center">
                           <p>50.7%</p>
                        </c>
                        <c ca="center">
                           <p>10.1%</p>
                        </c>
                        <c ca="center">
                           <p>13.1%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state16</p>
                        </c>
                        <c ca="center">
                           <p>x</p>
                        </c>
                        <c ca="center">
                           <p>2.9%</p>
                        </c>
                        <c ca="center">
                           <p>24.9%</p>
                        </c>
                        <c ca="center">
                           <p>45.9%</p>
                        </c>
                        <c ca="center">
                           <p>16.3%</p>
                        </c>
                        <c ca="center">
                           <p>13.0%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state17</p>
                        </c>
                        <c ca="center">
                           <p>H</p>
                        </c>
                        <c ca="center">
                           <p>1.5%</p>
                        </c>
                        <c ca="center">
                           <p>27.1%</p>
                        </c>
                        <c ca="center">
                           <p>62.8%</p>
                        </c>
                        <c ca="center">
                           <p>7.7%</p>
                        </c>
                        <c ca="center">
                           <p>2.3%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state18</p>
                        </c>
                        <c ca="center">
                           <p>H</p>
                        </c>
                        <c ca="center">
                           <p>1.5%</p>
                        </c>
                        <c ca="center">
                           <p>35.7%</p>
                        </c>
                        <c ca="center">
                           <p>59.3%</p>
                        </c>
                        <c ca="center">
                           <p>5.0%</p>
                        </c>
                        <c ca="center">
                           <p>0.0%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state19</p>
                        </c>
                        <c ca="center">
                           <p>E</p>
                        </c>
                        <c ca="center">
                           <p>1.0%</p>
                        </c>
                        <c ca="center">
                           <p>28.1%</p>
                        </c>
                        <c ca="center">
                           <p>56.2%</p>
                        </c>
                        <c ca="center">
                           <p>6.2%</p>
                        </c>
                        <c ca="center">
                           <p>9.5%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state20</p>
                        </c>
                        <c ca="center">
                           <p>E</p>
                        </c>
                        <c ca="center">
                           <p>1.5%</p>
                        </c>
                        <c ca="center">
                           <p>66.4%</p>
                        </c>
                        <c ca="center">
                           <p>27.3%</p>
                        </c>
                        <c ca="center">
                           <p>5.1%</p>
                        </c>
                        <c ca="center">
                           <p>1.2%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state21</p>
                        </c>
                        <c ca="center">
                           <p>E</p>
                        </c>
                        <c ca="center">
                           <p>1.1%</p>
                        </c>
                        <c ca="center">
                           <p>11.8%</p>
                        </c>
                        <c ca="center">
                           <p>75.0%</p>
                        </c>
                        <c ca="center">
                           <p>11.0%</p>
                        </c>
                        <c ca="center">
                           <p>2.2%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state22</p>
                        </c>
                        <c ca="center">
                           <p>H</p>
                        </c>
                        <c ca="center">
                           <p>2.2%</p>
                        </c>
                        <c ca="center">
                           <p>97.8%</p>
                        </c>
                        <c ca="center">
                           <p>2.1%</p>
                        </c>
                        <c ca="center">
                           <p>0.1%</p>
                        </c>
                        <c ca="center">
                           <p>0.0%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state23</p>
                        </c>
                        <c ca="center">
                           <p>H</p>
                        </c>
                        <c ca="center">
                           <p>2.2%</p>
                        </c>
                        <c ca="center">
                           <p>43.2%</p>
                        </c>
                        <c ca="center">
                           <p>51.1%</p>
                        </c>
                        <c ca="center">
                           <p>5.5%</p>
                        </c>
                        <c ca="center">
                           <p>0.1%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state24</p>
                        </c>
                        <c ca="center">
                           <p>H</p>
                        </c>
                        <c ca="center">
                           <p>1.2%</p>
                        </c>
                        <c ca="center">
                           <p>92.4%</p>
                        </c>
                        <c ca="center">
                           <p>6.8%</p>
                        </c>
                        <c ca="center">
                           <p>0.7%</p>
                        </c>
                        <c ca="center">
                           <p>0.1%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state25</p>
                        </c>
                        <c ca="center">
                           <p>H</p>
                        </c>
                        <c ca="center">
                           <p>1.2%</p>
                        </c>
                        <c ca="center">
                           <p>38.9%</p>
                        </c>
                        <c ca="center">
                           <p>60.1%</p>
                        </c>
                        <c ca="center">
                           <p>0.8%</p>
                        </c>
                        <c ca="center">
                           <p>0.2%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state26</p>
                        </c>
                        <c ca="center">
                           <p>H</p>
                        </c>
                        <c ca="center">
                           <p>1.2%</p>
                        </c>
                        <c ca="center">
                           <p>19.3%</p>
                        </c>
                        <c ca="center">
                           <p>79.0%</p>
                        </c>
                        <c ca="center">
                           <p>1.7%</p>
                        </c>
                        <c ca="center">
                           <p>0.0%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state27</p>
                        </c>
                        <c ca="center">
                           <p>E</p>
                        </c>
                        <c ca="center">
                           <p>2.4%</p>
                        </c>
                        <c ca="center">
                           <p>62.0%</p>
                        </c>
                        <c ca="center">
                           <p>33.0%</p>
                        </c>
                        <c ca="center">
                           <p>4.9%</p>
                        </c>
                        <c ca="center">
                           <p>0.1%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state28</p>
                        </c>
                        <c ca="center">
                           <p>x</p>
                        </c>
                        <c ca="center">
                           <p>2.0%</p>
                        </c>
                        <c ca="center">
                           <p>24.7%</p>
                        </c>
                        <c ca="center">
                           <p>54.8%</p>
                        </c>
                        <c ca="center">
                           <p>12.6%</p>
                        </c>
                        <c ca="center">
                           <p>7.9%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state29</p>
                        </c>
                        <c ca="center">
                           <p>x</p>
                        </c>
                        <c ca="center">
                           <p>2.0%</p>
                        </c>
                        <c ca="center">
                           <p>29.6%</p>
                        </c>
                        <c ca="center">
                           <p>45.0%</p>
                        </c>
                        <c ca="center">
                           <p>17.1%</p>
                        </c>
                        <c ca="center">
                           <p>8.4%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state30</p>
                        </c>
                        <c ca="center">
                           <p>H</p>
                        </c>
                        <c ca="center">
                           <p>1.3%</p>
                        </c>
                        <c ca="center">
                           <p>75.4%</p>
                        </c>
                        <c ca="center">
                           <p>20.8%</p>
                        </c>
                        <c ca="center">
                           <p>3.7%</p>
                        </c>
                        <c ca="center">
                           <p>0.0%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state31</p>
                        </c>
                        <c ca="center">
                           <p>x</p>
                        </c>
                        <c ca="center">
                           <p>4.6%</p>
                        </c>
                        <c ca="center">
                           <p>22.5%</p>
                        </c>
                        <c ca="center">
                           <p>63.0%</p>
                        </c>
                        <c ca="center">
                           <p>6.1%</p>
                        </c>
                        <c ca="center">
                           <p>8.5%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state32</p>
                        </c>
                        <c ca="center">
                           <p>H</p>
                        </c>
                        <c ca="center">
                           <p>1.8%</p>
                        </c>
                        <c ca="center">
                           <p>20.2%</p>
                        </c>
                        <c ca="center">
                           <p>45.7%</p>
                        </c>
                        <c ca="center">
                           <p>10.5%</p>
                        </c>
                        <c ca="center">
                           <p>23.5%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state33</p>
                        </c>
                        <c ca="center">
                           <p>E</p>
                        </c>
                        <c ca="center">
                           <p>1.0%</p>
                        </c>
                        <c ca="center">
                           <p>63.2%</p>
                        </c>
                        <c ca="center">
                           <p>33.7%</p>
                        </c>
                        <c ca="center">
                           <p>2.3%</p>
                        </c>
                        <c ca="center">
                           <p>0.8%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state34</p>
                        </c>
                        <c ca="center">
                           <p>E</p>
                        </c>
                        <c ca="center">
                           <p>1.0%</p>
                        </c>
                        <c ca="center">
                           <p>95.4%</p>
                        </c>
                        <c ca="center">
                           <p>2.9%</p>
                        </c>
                        <c ca="center">
                           <p>1.7%</p>
                        </c>
                        <c ca="center">
                           <p>0.0%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state35</p>
                        </c>
                        <c ca="center">
                           <p>E</p>
                        </c>
                        <c ca="center">
                           <p>1.0%</p>
                        </c>
                        <c ca="center">
                           <p>18.0%</p>
                        </c>
                        <c ca="center">
                           <p>65.5%</p>
                        </c>
                        <c ca="center">
                           <p>11.0%</p>
                        </c>
                        <c ca="center">
                           <p>5.5%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state36</p>
                        </c>
                        <c ca="center">
                           <p>x</p>
                        </c>
                        <c ca="center">
                           <p>1.6%</p>
                        </c>
                        <c ca="center">
                           <p>23.6%</p>
                        </c>
                        <c ca="center">
                           <p>65.9%</p>
                        </c>
                        <c ca="center">
                           <p>7.4%</p>
                        </c>
                        <c ca="center">
                           <p>3.1%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state37</p>
                        </c>
                        <c ca="center">
                           <p>x</p>
                        </c>
                        <c ca="center">
                           <p>1.4%</p>
                        </c>
                        <c ca="center">
                           <p>3.5%</p>
                        </c>
                        <c ca="center">
                           <p>40.3%</p>
                        </c>
                        <c ca="center">
                           <p>53.7%</p>
                        </c>
                        <c ca="center">
                           <p>2.5%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state38</p>
                        </c>
                        <c ca="center">
                           <p>x</p>
                        </c>
                        <c ca="center">
                           <p>1.6%</p>
                        </c>
                        <c ca="center">
                           <p>30.0%</p>
                        </c>
                        <c ca="center">
                           <p>57.4%</p>
                        </c>
                        <c ca="center">
                           <p>11.2%</p>
                        </c>
                        <c ca="center">
                           <p>1.4%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state39</p>
                        </c>
                        <c ca="center">
                           <p>H</p>
                        </c>
                        <c ca="center">
                           <p>1.7%</p>
                        </c>
                        <c ca="center">
                           <p>15.7%</p>
                        </c>
                        <c ca="center">
                           <p>71.9%</p>
                        </c>
                        <c ca="center">
                           <p>2.8%</p>
                        </c>
                        <c ca="center">
                           <p>9.6%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state40</p>
                        </c>
                        <c ca="center">
                           <p>H</p>
                        </c>
                        <c ca="center">
                           <p>1.7%</p>
                        </c>
                        <c ca="center">
                           <p>27.8%</p>
                        </c>
                        <c ca="center">
                           <p>67.1%</p>
                        </c>
                        <c ca="center">
                           <p>2.7%</p>
                        </c>
                        <c ca="center">
                           <p>2.3%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state41</p>
                        </c>
                        <c ca="center">
                           <p>H</p>
                        </c>
                        <c ca="center">
                           <p>1.5%</p>
                        </c>
                        <c ca="center">
                           <p>76.5%</p>
                        </c>
                        <c ca="center">
                           <p>21.0%</p>
                        </c>
                        <c ca="center">
                           <p>2.6%</p>
                        </c>
                        <c ca="center">
                           <p>0.0%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state42</p>
                        </c>
                        <c ca="center">
                           <p>H</p>
                        </c>
                        <c ca="center">
                           <p>1.4%</p>
                        </c>
                        <c ca="center">
                           <p>58.7%</p>
                        </c>
                        <c ca="center">
                           <p>40.8%</p>
                        </c>
                        <c ca="center">
                           <p>0.2%</p>
                        </c>
                        <c ca="center">
                           <p>0.2%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state43</p>
                        </c>
                        <c ca="center">
                           <p>E</p>
                        </c>
                        <c ca="center">
                           <p>2.0%</p>
                        </c>
                        <c ca="center">
                           <p>60.4%</p>
                        </c>
                        <c ca="center">
                           <p>34.5%</p>
                        </c>
                        <c ca="center">
                           <p>5.1%</p>
                        </c>
                        <c ca="center">
                           <p>0.0%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state44</p>
                        </c>
                        <c ca="center">
                           <p>E</p>
                        </c>
                        <c ca="center">
                           <p>2.0%</p>
                        </c>
                        <c ca="center">
                           <p>30.5%</p>
                        </c>
                        <c ca="center">
                           <p>57.0%</p>
                        </c>
                        <c ca="center">
                           <p>5.6%</p>
                        </c>
                        <c ca="center">
                           <p>6.9%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state45</p>
                        </c>
                        <c ca="center">
                           <p>x</p>
                        </c>
                        <c ca="center">
                           <p>0.6%</p>
                        </c>
                        <c ca="center">
                           <p>0.6%</p>
                        </c>
                        <c ca="center">
                           <p>35.1%</p>
                        </c>
                        <c ca="center">
                           <p>64.3%</p>
                        </c>
                        <c ca="center">
                           <p>0.0%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state46</p>
                        </c>
                        <c ca="center">
                           <p>x</p>
                        </c>
                        <c ca="center">
                           <p>0.6%</p>
                        </c>
                        <c ca="center">
                           <p>77.6%</p>
                        </c>
                        <c ca="center">
                           <p>19.4%</p>
                        </c>
                        <c ca="center">
                           <p>0.0%</p>
                        </c>
                        <c ca="center">
                           <p>2.9%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state47</p>
                        </c>
                        <c ca="center">
                           <p>x</p>
                        </c>
                        <c ca="center">
                           <p>0.6%</p>
                        </c>
                        <c ca="center">
                           <p>14.6%</p>
                        </c>
                        <c ca="center">
                           <p>71.2%</p>
                        </c>
                        <c ca="center">
                           <p>2.1%</p>
                        </c>
                        <c ca="center">
                           <p>12.2%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state48</p>
                        </c>
                        <c ca="center">
                           <p>H</p>
                        </c>
                        <c ca="center">
                           <p>3.6%</p>
                        </c>
                        <c ca="center">
                           <p>21.9%</p>
                        </c>
                        <c ca="center">
                           <p>74.3%</p>
                        </c>
                        <c ca="center">
                           <p>3.0%</p>
                        </c>
                        <c ca="center">
                           <p>0.7%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state49</p>
                        </c>
                        <c ca="center">
                           <p>H</p>
                        </c>
                        <c ca="center">
                           <p>3.5%</p>
                        </c>
                        <c ca="center">
                           <p>62.1%</p>
                        </c>
                        <c ca="center">
                           <p>34.9%</p>
                        </c>
                        <c ca="center">
                           <p>3.0%</p>
                        </c>
                        <c ca="center">
                           <p>0.1%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state50</p>
                        </c>
                        <c ca="center">
                           <p>x</p>
                        </c>
                        <c ca="center">
                           <p>4.7%</p>
                        </c>
                        <c ca="center">
                           <p>51.4%</p>
                        </c>
                        <c ca="center">
                           <p>32.0%</p>
                        </c>
                        <c ca="center">
                           <p>12.9%</p>
                        </c>
                        <c ca="center">
                           <p>3.7%</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state51</p>
                        </c>
                        <c ca="center">
                           <p>x</p>
                        </c>
                        <c ca="center">
                           <p>3.2%</p>
                        </c>
                        <c ca="center">
                           <p>27.6%</p>
                        </c>
                        <c ca="center">
                           <p>57.0%</p>
                        </c>
                        <c ca="center">
                           <p>15.4%</p>
                        </c>
                        <c ca="center">
                           <p>0.0%</p>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
               <p>At <it>state11</it> and <it>state32</it> we found a high emission probability for <it>Pro</it>. Of the 13637 visits to <it>state11</it> in the generated sequences, <it>Pro</it> was emitted 3765 times (27.6%), which closely matches the emission probability of 27.7%. These <it>Pro</it> emissions at <it>state11</it> occurred mostly in the label context 'xxx' (2783 times, 73.9%), followed by 'xxH' (685 times, 18.2%) and 'xxE' (286 times, 7.6%); the remaining 11 occurred at the end of a sequence ('xx'). This suggests that <it>state11</it> links a coil residue to a following coil, helix or strand segment. At <it>state32</it>, <it>Pro</it> was mostly used to model 'xHH' (1828 of 2084 cases, 87.7%) or 'HHH' (205 of 2084 cases, 9.8%).</p>
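               <p>The visit and context counts above can be reproduced from sampled state paths with a short tally. The Python sketch below is illustrative only: it assumes each generated sequence is stored as a residue string, a matching label string ('H', 'E', 'x') and a list of state names, and the function name <it>emission_contexts</it> is hypothetical, not part of the authors' software.</p>

```python
from collections import Counter


def emission_contexts(residues, labels, paths, target_state, target_residue):
    """Tally label contexts where target_state emits target_residue.

    residues, labels: parallel lists of equal-length strings (one-letter
    amino acids and secondary structure labels such as 'H', 'E', 'x').
    paths: list of state-name lists aligned with the strings (hypothetical
    input format). Returns (empirical emission frequency, context counts).
    """
    visits = 0
    hits = 0
    contexts = Counter()
    for seq, lab, path in zip(residues, labels, paths):
        for t, state in enumerate(path):
            if state != target_state:
                continue
            visits += 1
            if seq[t] != target_residue:
                continue
            hits += 1
            # Labels of the previous, current and next residue; at either
            # end of a sequence only two labels remain (e.g. 'xx').
            contexts[lab[max(t - 1, 0):t + 2]] += 1
    frequency = hits / visits if visits else 0.0
    return frequency, contexts


# Toy example: 'P' is emitted in two of the three visits to 's11'.
freq, ctx = emission_contexts(["APGP"], ["xxHH"], [["s11", "s11", "s3", "s11"]], "s11", "P")
print(round(freq, 3), dict(ctx))
```

<p>The same function applied to <it>Gly</it> at a coil state gives the kind of 'Hxx'/'xxx'/'Exx' breakdown reported below.</p>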
               <p><it>Gly</it> has a high emission probability at <it>state37</it> and <it>state45</it>. At <it>state37</it>, <it>Gly</it> occurs only between two coil residues (3570 times). <it>Gly</it> at <it>state45</it> works in the opposite way to <it>Pro</it> at <it>state11</it>: it appears in the contexts 'Hxx' (710 of 2018 times, 35.2%), 'xxx' (1300 times, 64.4%) and 'Exx' (2 times, 0.1%), with the remainder at the ends of sequences.</p>
               <p>We also examined the overall distribution of emission probabilities in the evolved HMM by averaging the emission probabilities of all states assigned to the same secondary structure label. Figure <figr fid="F8">8</figr> shows the resulting average distributions for helix, <it>&#946;</it>-strand and coil states. In helix states, <it>Ala</it> and <it>Leu</it> have higher emission probabilities than the other amino acids; <it>Gly</it> and <it>Pro</it> are prominent in coil states, and <it>Val</it> in <it>&#946;</it>-strand states.</p>
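               <p>A minimal Python sketch of the label-wise averaging behind Figure <figr fid="F8">8</figr>. It assumes the emission table is a dictionary from (hypothetical) state names to amino acid probability dictionaries, and that every state counts equally in the average; the text above does not say whether the average is weighted by state usage, so an unweighted mean is an assumption here.</p>

```python
def average_emissions(emissions, state_labels):
    """Average emission distributions over all states sharing a label.

    emissions: dict state -> dict amino acid -> probability.
    state_labels: dict state -> secondary structure label ('H', 'E', 'x').
    An amino acid missing from a state's table contributes probability 0.
    """
    sums = {}
    n_states = {}
    for state, dist in emissions.items():
        label = state_labels[state]
        acc = sums.setdefault(label, {})
        for aa, p in dist.items():
            acc[aa] = acc.get(aa, 0.0) + p
        n_states[label] = n_states.get(label, 0) + 1
    # Unweighted mean: each state contributes equally to its label's average.
    return {
        label: {aa: total / n_states[label] for aa, total in acc.items()}
        for label, acc in sums.items()
    }


toy = {
    "s1": {"A": 0.6, "L": 0.4},
    "s2": {"A": 0.2, "L": 0.8},
    "s3": {"G": 0.7, "P": 0.3},
}
print(average_emissions(toy, {"s1": "H", "s2": "H", "s3": "x"}))
```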
               <fig id="F8">
                  <title>
                     <p>Figure 8</p>
                  </title>
                  <caption>
                     <p>The averaged emission probabilities of all the states</p>
                  </caption>
                  <text>
                     <p><b>The averaged emission probabilities of all the states</b>. Emission probabilities from the states that share the same secondary structural label are averaged.</p>
                  </text>
                  <graphic file="1471-2105-8-357-8"/>
               </fig>
            </sec>
            <sec>
               <st>
                  <p>The HMM's grammar</p>
               </st>
               <p>We evaluated how well the evolved HMM models general features of protein structure. We generated 1662 random sequences from the evolved HMM, the same number as there are training sequences, and set the length of each generated sequence to the average length of the training sequences. The third column of Table <tblr tid="T2">2</tblr> shows how much each state is used to generate the random sequences. In the generated sequences the overall secondary structure contents are 35.5% helices, 23.5% <it>&#946;</it>-strands and 44.5% coils, similar to the composition of the training sequences (35.3% helices, 22.8% <it>&#946;</it>-strands and 41.9% coils). Figure <figr fid="F9">9</figr> shows the length distributions of helices, <it>&#946;</it>-strands and coils in the training set and in the generated set; the distributions closely match for all three element types. This agreement suggests that single blocks, or groups of blocks, model the grammar of protein secondary structure closely. We then checked how the evolved HMM expresses this grammar in its structure by counting, in the generated sequences, the transitions from one block to another. Table <tblr tid="T2">2</tblr> summarizes how many times each block transition is used in the generated sequences and the probability of that transition from its source state; only the dominant transitions, used more than 2000 times, are shown. These results show how the blocks are combined to model the sequences. <it>State0</it> belongs to a helix block and usually leads to other helix blocks (<it>state17</it> and <it>state39</it> are both in helix blocks). <it>State14</it> has a strong transition from helix to coil, and <it>state23</it> has transitions to helix blocks and to a coil block. Overall, transitions that link blocks with the same label dominate, probably because the HMM needs to model long secondary structure elements with very short blocks.</p>
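               <p>The composition and length statistics above can be computed from generated label strings as sketched below in Python. The sketch assumes a toy first-order transition table in which every state carries one label; it samples state paths of a fixed length, mirroring the fixed-length generation described above, but omits emissions and any begin or end states of the real evolved HMM. All state names and probabilities are hypothetical.</p>

```python
import random
from collections import Counter
from itertools import groupby


def sample_labels(trans, state_labels, start, length, rng):
    """Sample a fixed-length state path (length at least 1) and return its labels.

    trans: dict state -> list of (next_state, probability) pairs.
    state_labels: dict state -> label ('H', 'E' or 'x').
    """
    path = [start]
    while len(path) != length:
        next_states, probs = zip(*trans[path[-1]])
        path.append(rng.choices(next_states, weights=probs)[0])
    return "".join(state_labels[s] for s in path)


def composition(label_strings):
    """Fraction of residues carrying each label, pooled over all strings."""
    counts = Counter("".join(label_strings))
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def element_lengths(label_strings):
    """Histogram of run lengths (secondary structure element lengths) per label."""
    lengths = {}
    for s in label_strings:
        for label, run in groupby(s):
            lengths.setdefault(label, Counter())[len(list(run))] += 1
    return lengths


# Hypothetical three-state chain: one helix, one strand and one coil state.
toy_trans = {
    "h": [("h", 0.85), ("c", 0.15)],
    "e": [("e", 0.70), ("c", 0.30)],
    "c": [("c", 0.60), ("h", 0.25), ("e", 0.15)],
}
toy_labels = {"h": "H", "e": "E", "c": "x"}
rng = random.Random(0)
generated = [sample_labels(toy_trans, toy_labels, "c", 100, rng) for _ in range(50)]
print(composition(generated))
print(sorted(element_lengths(generated)["H"].items())[:5])
```

<p>Normalizing each label's length histogram by its total count gives probabilities comparable to the bars in a length histogram such as Figure <figr fid="F9">9</figr>.</p>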
               <tbl id="T2">
                  <title>
                     <p>Table 2</p>
                  </title>
                  <caption>
                     <p>Dominant block transitions in the generated sequences</p>
                  </caption>
                  <tblbdy cols="3">
                     <r>
                        <c ca="center">
                           <p>block transition</p>
                        </c>
                        <c ca="center">
                           <p>percentage of transitions out of the source state</p>
                        </c>
                        <c ca="center">
                           <p>number of times used in the generated sequence</p>
                        </c>
                     </r>
                     <r>
                        <c cspan="3">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state0 (H) &#8594; state17 (H)</p>
                        </c>
                        <c ca="center">
                           <p>36%</p>
                        </c>
                        <c ca="center">
                           <p>2468</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state0 (H) &#8594; state39 (H)</p>
                        </c>
                        <c ca="center">
                           <p>42%</p>
                        </c>
                        <c ca="center">
                           <p>2867</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state3 (H) &#8594; state1 (H)</p>
                        </c>
                        <c ca="center">
                           <p>56%</p>
                        </c>
                        <c ca="center">
                           <p>3477</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state6 (E) &#8594; state33 (E)</p>
                        </c>
                        <c ca="center">
                           <p>24%</p>
                        </c>
                        <c ca="center">
                           <p>2461</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state9 (E) &#8594; state27 (E)</p>
                        </c>
                        <c ca="center">
                           <p>30%</p>
                        </c>
                        <c ca="center">
                           <p>2377</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state9 (E) &#8594; state43 (E)</p>
                        </c>
                        <c ca="center">
                           <p>33%</p>
                        </c>
                        <c ca="center">
                           <p>2635</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state11 (x) &#8594; state15 (x)</p>
                        </c>
                        <c ca="center">
                           <p>30%</p>
                        </c>
                        <c ca="center">
                           <p>4143</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state11 (x) &#8594; state31 (x)</p>
                        </c>
                        <c ca="center">
                           <p>15%</p>
                        </c>
                        <c ca="center">
                           <p>2093</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state11 (x) &#8594; state50 (x)</p>
                        </c>
                        <c ca="center">
                           <p>16%</p>
                        </c>
                        <c ca="center">
                           <p>2165</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state14 (H) &#8594; state51 (x)</p>
                        </c>
                        <c ca="center">
                           <p>60%</p>
                        </c>
                        <c ca="center">
                           <p>2733</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state16 (x) &#8594; state4 (E)</p>
                        </c>
                        <c ca="center">
                           <p>24%</p>
                        </c>
                        <c ca="center">
                           <p>3310</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state16 (x) &#8594; state31 (x)</p>
                        </c>
                        <c ca="center">
                           <p>19%</p>
                        </c>
                        <c ca="center">
                           <p>2662</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state16 (x) &#8594; state50 (x)</p>
                        </c>
                        <c ca="center">
                           <p>35%</p>
                        </c>
                        <c ca="center">
                           <p>4809</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state18 (H) &#8594; state50 (x)</p>
                        </c>
                        <c ca="center">
                           <p>33%</p>
                        </c>
                        <c ca="center">
                           <p>2359</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state19 (E) &#8594; state7 (E)</p>
                        </c>
                        <c ca="center">
                           <p>42%</p>
                        </c>
                        <c ca="center">
                           <p>2086</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state21 (E) &#8594; state7 (E)</p>
                        </c>
                        <c ca="center">
                           <p>55%</p>
                        </c>
                        <c ca="center">
                           <p>2702</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state23 (H) &#8594; state12 (H)</p>
                        </c>
                        <c ca="center">
                           <p>31%</p>
                        </c>
                        <c ca="center">
                           <p>3347</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state23 (H) &#8594; state48 (H)</p>
                        </c>
                        <c ca="center">
                           <p>41%</p>
                        </c>
                        <c ca="center">
                           <p>4436</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state23 (H) &#8594; state51 (x)</p>
                        </c>
                        <c ca="center">
                           <p>21%</p>
                        </c>
                        <c ca="center">
                           <p>2272</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state26 (H) &#8594; state51 (x)</p>
                        </c>
                        <c ca="center">
                           <p>57%</p>
                        </c>
                        <c ca="center">
                           <p>3394</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state27 (E) &#8594; state15 (x)</p>
                        </c>
                        <c ca="center">
                           <p>26%</p>
                        </c>
                        <c ca="center">
                           <p>3005</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state27 (E) &#8594; state28 (x)</p>
                        </c>
                        <c ca="center">
                           <p>34%</p>
                        </c>
                        <c ca="center">
                           <p>3914</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state29 (x) &#8594; state28 (x)</p>
                        </c>
                        <c ca="center">
                           <p>22%</p>
                        </c>
                        <c ca="center">
                           <p>2193</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state29 (x) &#8594; state36 (x)</p>
                        </c>
                        <c ca="center">
                           <p>31%</p>
                        </c>
                        <c ca="center">
                           <p>3017</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state30 (H) &#8594; state48 (H)</p>
                        </c>
                        <c ca="center">
                           <p>75%</p>
                        </c>
                        <c ca="center">
                           <p>4577</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state31 (x) &#8594; state0 (H)</p>
                        </c>
                        <c ca="center">
                           <p>21%</p>
                        </c>
                        <c ca="center">
                           <p>4564</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state31 (x) &#8594; state10 (x)</p>
                        </c>
                        <c ca="center">
                           <p>12%</p>
                        </c>
                        <c ca="center">
                           <p>2588</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state31 (x) &#8594; state31 (x)</p>
                        </c>
                        <c ca="center">
                           <p>16%</p>
                        </c>
                        <c ca="center">
                           <p>3555</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state31 (x) &#8594; state32 (H)</p>
                        </c>
                        <c ca="center">
                           <p>21%</p>
                        </c>
                        <c ca="center">
                           <p>4551</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state31 (x) &#8594; state39 (H)</p>
                        </c>
                        <c ca="center">
                           <p>11%</p>
                        </c>
                        <c ca="center">
                           <p>2408</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state31 (x) &#8594; state50 (x)</p>
                        </c>
                        <c ca="center">
                           <p>13%</p>
                        </c>
                        <c ca="center">
                           <p>2822</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state32 (H) &#8594; state17 (H)</p>
                        </c>
                        <c ca="center">
                           <p>45%</p>
                        </c>
                        <c ca="center">
                           <p>4062</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state38 (x) &#8594; state4 (E)</p>
                        </c>
                        <c ca="center">
                           <p>42%</p>
                        </c>
                        <c ca="center">
                           <p>3304</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state40 (H) &#8594; state41 (H)</p>
                        </c>
                        <c ca="center">
                           <p>66%</p>
                        </c>
                        <c ca="center">
                           <p>5482</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state42 (H) &#8594; state48 (H)</p>
                        </c>
                        <c ca="center">
                           <p>87%</p>
                        </c>
                        <c ca="center">
                           <p>6105</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state44 (E) &#8594; state27 (E)</p>
                        </c>
                        <c ca="center">
                           <p>30%</p>
                        </c>
                        <c ca="center">
                           <p>2996</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state49 (H) &#8594; state22 (H)</p>
                        </c>
                        <c ca="center">
                           <p>39%</p>
                        </c>
                        <c ca="center">
                           <p>6674</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state49 (H) &#8594; state24 (H)</p>
                        </c>
                        <c ca="center">
                           <p>26%</p>
                        </c>
                        <c ca="center">
                           <p>4508</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state49 (H) &#8594; state30 (H)</p>
                        </c>
                        <c ca="center">
                           <p>18%</p>
                        </c>
                        <c ca="center">
                           <p>3162</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state50 (x) &#8594; state10 (x)</p>
                        </c>
                        <c ca="center">
                           <p>13%</p>
                        </c>
                        <c ca="center">
                           <p>2807</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state50 (x) &#8594; state31 (x)</p>
                        </c>
                        <c ca="center">
                           <p>34%</p>
                        </c>
                        <c ca="center">
                           <p>7362</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state50 (x) &#8594; state50 (x)</p>
                        </c>
                        <c ca="center">
                           <p>17%</p>
                        </c>
                        <c ca="center">
                           <p>3772</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state51 (x) &#8594; state15 (x)</p>
                        </c>
                        <c ca="center">
                           <p>14%</p>
                        </c>
                        <c ca="center">
                           <p>2056</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state51 (x) &#8594; state31 (x)</p>
                        </c>
                        <c ca="center">
                           <p>25%</p>
                        </c>
                        <c ca="center">
                           <p>3597</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state51 (x) &#8594; state50 (x)</p>
                        </c>
                        <c ca="center">
                           <p>20%</p>
                        </c>
                        <c ca="center">
                           <p>2986</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>state51 (x) &#8594; state51 (x)</p>
                        </c>
                        <c ca="center">
                           <p>20%</p>
                        </c>
                        <c ca="center">
                           <p>2892</p>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
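               <p>Counts like those in Table <tblr tid="T2">2</tblr> can be obtained by scanning sampled state paths for transitions that leave a block. The Python sketch below assumes that every transition out of a block's final state enters the first state of some block (possibly the same single-state block again, as in <it>state31</it> &#8594; <it>state31</it>), and it reports each transition's share of all departures from its source state. The set of block-final states and the state names in the example are hypothetical.</p>

```python
from collections import Counter


def block_transitions(paths, exit_states, min_count=2000):
    """Count block-to-block transitions in sampled state paths.

    paths: iterable of state-name lists.
    exit_states: set of states that end a block; any transition leaving
    one of them is treated as a block transition.
    Returns (source, target, share of departures from source, count) for
    transitions used more than min_count times, most frequent first.
    """
    pair_counts = Counter()
    departures = Counter()
    for path in paths:
        for a, b in zip(path, path[1:]):
            if a in exit_states:
                pair_counts[(a, b)] += 1
                departures[a] += 1
    return [
        (a, b, n / departures[a], n)
        for (a, b), n in pair_counts.most_common()
        if n > min_count
    ]


paths = [["s31", "s0", "s17", "s31", "s31", "s50"]]
for row in block_transitions(paths, exit_states={"s31", "s17"}, min_count=0):
    print(row)
```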
               <fig id="F9">
                  <title>
                     <p>Figure 9</p>
                  </title>
                  <caption>
                     <p>Histograms of secondary structure element length</p>
                  </caption>
                  <text>
                     <p><b>Histograms of secondary structure element length</b>. Histograms of the lengths of the secondary structure elements in the training set (white bars) and the generated set (black bars). Each bar gives the probability of the corresponding element length.</p>
                  </text>
                  <graphic file="1471-2105-8-357-9"/>
               </fig>
               <p>We also checked whether the model has a grammar for short secondary structure elements. The generated sequences contain 57 'H-x-H' motifs, i.e. two helices linked by a single coil residue. For this motif we found the HMM grammar '<it>state18</it>-<it>state50</it>-<it>state32</it>' 44 times (77.2%). We also checked how each coil state contributes to this motif: for the coil position, <it>state50</it>, <it>state51</it> and <it>state31</it> are used 44, 6 and 7 times, respectively. For the 666 'H-x-E' motifs in the generated sequences, the dominant HMM grammar is '<it>state18</it>-<it>state50</it>-<it>state43</it>' (27.6%), and the coil position is filled by <it>state51</it> 90 times (13.5%) and by <it>state50</it> 576 times (86.5%). Interestingly, <it>state31</it> was not used for this motif. Conversely, for 'E-x-H', <it>state51</it> was never used: <it>state31</it> fills the coil position in 97.2% of cases (923 of 950) and <it>state50</it> in the remaining 2.8% (27 times). This indicates that <it>state50</it>, <it>state51</it> and <it>state31</it> play different roles depending on the elements that flank them. For 'H-xx-H', the coil region was modelled mainly by '<it>state51</it>-<it>state31</it>' (1175 of 2146 times, 54.8%), '<it>state10</it>-<it>state11</it>' (22.2%) and '<it>state50</it>-<it>state31</it>' (19.5%).</p>
               <p>We then checked how the HMM is organized to model hairpin structures. For 'E-x-E', <it>state50</it> was dominant (81.1%, 310 of 382), with <it>state51</it> and <it>state31</it> covering 3.7% and 15.2%, respectively. For 'E-xx-E', '<it>state28</it>-<it>state29</it>' is used most often (58.2%, 1830 of 3142), followed by '<it>state15</it>-<it>state16</it>' (16.9%) and '<it>state36</it>-<it>state38</it>' (11.0%); the single-state blocks (<it>state50</it>, <it>state51</it> and <it>state31</it>) are rarely used here. For 'E-xxx-E', '<it>state36</it>-<it>state37</it>-<it>state38</it>' covers 68.2% (1937 of 2842) and '<it>state15</it>-<it>state15</it>-<it>state16</it>' 14.0%; each other combination accounts for less than 5%. Finally, no generated sequence contains the pattern 'x-H-x', a single-residue helix, which would violate the grammar of protein secondary structure.</p>
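               <p>Motif counts such as those for 'H-x-H' or 'E-xx-E' can be gathered with a regular-expression scan over the generated label strings, reading off the aligned states. In the Python sketch below the flanking labels bound the coil run, so 'H-x-H' matches exactly one coil residue between two helix residues; the returned grammars include the last state of the left element and the first state of the right element, in the style of '<it>state18</it>-<it>state50</it>-<it>state32</it>'. The input format and state names are assumptions for illustration.</p>

```python
import re
from collections import Counter


def motif_usage(label_strings, state_paths, left, n_coil, right):
    """Tally state grammars realising a left-(coil x n_coil)-right motif.

    label_strings: label strings such as 'HHxHHxxEE'.
    state_paths: state-name lists aligned with label_strings.
    Returns (grammars including flanking states, coil-state tuples).
    """
    # The lookahead allows overlapping matches; group 1 is the coil run.
    pattern = re.compile("(?=%s(x{%d})%s)" % (left, n_coil, right))
    grammars = Counter()
    coil_states = Counter()
    for labels, states in zip(label_strings, state_paths):
        for m in pattern.finditer(labels):
            start, end = m.span(1)
            grammars[tuple(states[start - 1:end + 1])] += 1
            coil_states[tuple(states[start:end])] += 1
    return grammars, coil_states


labels = "HHxHHxxEE"
states = ["s49", "s18", "s50", "s32", "s23", "s51", "s31", "s43", "s9"]
print(motif_usage([labels], [states], "H", 1, "H"))
print(motif_usage([labels], [states], "H", 2, "E"))
```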
            </sec>
         </sec>
         <sec>
            <st>
               <p>Prediction results with posterior decoding</p>
            </st>
            <p>The HMM predictor evolved with the Block-HMM method computes, for each residue, the posterior probability of being in each state. The posterior label probability (PLP) is the probability of each secondary structure label at a given amino acid position: the PLP of a label at position <it>t</it> is the sum of the posterior probabilities of all states that emit that label. The PLP for label <it>l</it> &#8712; {<it>H</it>, <it>E</it>, <it>C</it>} at position <it>t</it> is</p>
            <p>
               <display-formula id="M1">
                  <m:math name="1471-2105-8-357-i1" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mi>P</m:mi>
                           <m:mo stretchy="false">(</m:mo>
                           <m:msub>
                              <m:mi>y</m:mi>
                              <m:mi>t</m:mi>
                           </m:msub>
                           <m:mo>=</m:mo>
                           <m:mi>l</m:mi>
                           <m:mo>|</m:mo>
                           <m:mi>x</m:mi>
                           <m:mo>,</m:mo>
                           <m:mi>&#920;</m:mi>
                           <m:mo stretchy="false">)</m:mo>
                           <m:mo>=</m:mo>
                           <m:mstyle displaystyle="true">
                              <m:munder>
                                 <m:mo>&#8721;</m:mo>
                                 <m:mrow>
                                    <m:mi>i</m:mi>
                                    <m:mo>&#8712;</m:mo>
                                    <m:mi mathvariant="script">Q</m:mi>
                                 </m:mrow>
                              </m:munder>
                              <m:mrow>
                                 <m:mi>p</m:mi>
                                 <m:mo stretchy="false">(</m:mo>
                                 <m:msub>
                                    <m:mi>y</m:mi>
                                    <m:mi>t</m:mi>
                                 </m:msub>
                                 <m:mo>=</m:mo>
                                 <m:mi>l</m:mi>
                                 <m:mo>,</m:mo>
                                 <m:msub>
                                    <m:mi>q</m:mi>
                                    <m:mi>t</m:mi>
                                 </m:msub>
                                 <m:mo>=</m:mo>
                                 <m:mi>i</m:mi>
                                 <m:mo>|</m:mo>
                                 <m:mi>x</m:mi>
                                 <m:mo>,</m:mo>
                                 <m:mi>&#920;</m:mi>
                                 <m:mo stretchy="false">)</m:mo>
                              </m:mrow>
                           </m:mstyle>
                           <m:mo>.</m:mo>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGqbaucqGGOaakcqWG5bqEdaWgaaWcbaGaemiDaqhabeaakiabg2da9iabdYgaSjabcYha8Hqabiab=Hha4jabcYcaSiabfI5arjabcMcaPiabg2da9maaqafabaGaemiCaaNaeiikaGIaemyEaK3aaSbaaSqaaiabdsha0bqabaGccqGH9aqpcqWGSbaBcqGGSaalcqWGXbqCdaWgaaWcbaGaemiDaqhabeaakiabg2da9iabdMgaPjabcYha8jab=Hha4jabcYcaSiabfI5arjabcMcaPaWcbaGaemyAaKMaeyicI48enfgDOvwBHrxAJfwnHbqeg0uy0HwzTfgDPnwy1aaceaGae4heXhfabeqdcqGHris5aOGaeiOla4caaa@6132@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where <b>x</b> is an amino acid sequence, <b>y</b> is the accompanying sequence of secondary structure labels and <it>q</it><sub><it>t</it></sub> is the HMM state at position <it>t</it>. &#920; is the evolved HMM, and <inline-formula><m:math name="1471-2105-8-357-i2" xmlns:m="http://www.w3.org/1998/Math/MathML"><m:semantics><m:mi mathvariant="script">Q</m:mi><m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaat0uy0HwzTfgDPnwy1egaryqtHrhAL1wy0L2yHvdaiqaacqWFqeFuaaa@3840@</m:annotation></m:semantics></m:math></inline-formula> is the set of all the states in the HMM.</p>
            <p>We assign each state to one of the classes in the secondary structure. That is, we take the probability of a label given a state to be 1 if the state is assigned to that class and 0 otherwise. Thus the sum in equation (1) only gets contributions from states that have been assigned to class <it>l</it>.</p>
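            <p>Equation (1) can be illustrated with a small sketch. It assumes a standard first-order HMM given by initial probabilities, a transition matrix and an emission matrix. The names pi, A and B and all the toy numbers are hypothetical. The posterior state probabilities come from the forward-backward algorithm. With the hard state-to-label assignment described above, the PLP is a sum of state posteriors over the states assigned to each label. This is not the P.S.HMM implementation. The toy model has four states (two of them coil) and two symbols instead of the 20 amino acids.</p>

```python
import numpy as np


def state_posteriors(pi, A, B, obs):
    """Scaled forward-backward algorithm.

    Returns gamma with gamma[t, i] = p(q_t = i | x, Theta).
    pi:  (N,) initial state probabilities
    A:   (N, N) transitions, A[i, j] = p(q_t+1 = j | q_t = i)
    B:   (N, M) emissions, B[i, k] = p(x_t = k | q_t = i)
    obs: list of symbol indices
    """
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))
    scale = np.zeros(T)

    # Forward pass, rescaled at every position to avoid underflow.
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass, reusing the forward scaling factors.
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1]) / scale[t + 1]

    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)


def posterior_label_probs(gamma, state_labels, labels=("H", "E", "C")):
    """Equation (1) with a hard state-to-label assignment: column k of the
    result sums gamma over the states assigned to labels[k]."""
    plp = np.zeros((gamma.shape[0], len(labels)))
    for k, lab in enumerate(labels):
        members = [i for i, s in enumerate(state_labels) if s == lab]
        plp[:, k] = gamma[:, members].sum(axis=1)
    return plp


def decode(plp, labels=("H", "E", "C")):
    """Posterior decoding: pick the label with the highest PLP at each position."""
    return "".join(labels[k] for k in plp.argmax(axis=1))


# Toy model: one helix state, one strand state and two coil states.
pi = np.array([0.4, 0.3, 0.2, 0.1])
A = np.array([[0.7, 0.1, 0.1, 0.1],
              [0.1, 0.7, 0.1, 0.1],
              [0.2, 0.2, 0.3, 0.3],
              [0.25, 0.25, 0.25, 0.25]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.5, 0.5],
              [0.6, 0.4]])
state_labels = ["H", "E", "C", "C"]
obs = [0, 0, 1, 1, 0]

gamma = state_posteriors(pi, A, B, obs)
plp = posterior_label_probs(gamma, state_labels)
prediction = decode(plp)
print(prediction)
```

            <p>Because every state carries exactly one label, each row of the PLP matrix sums to one. The coil column adds up the posteriors of both coil states. Scaling the forward and backward variables keeps the computation stable for protein-length sequences.</p>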
            <p>Figure <figr fid="F10">10</figr> shows the PLP values along part of the protein sequence <it>1ciy</it>. The probability of each label is calculated and plotted, and the label with the highest probability is assigned to each amino acid as the prediction.</p>
            <fig id="F10">
               <title>
                  <p>Figure 10</p>
               </title>
               <caption>
                  <p>The decoding result with posterior decoding</p>
               </caption>
               <text>
                  <p><b>The decoding result with posterior decoding</b>. The PLP gives the probability of each label at each amino acid. The dominant label is assigned as the final prediction.</p>
               </text>
               <graphic file="1471-2105-8-357-10"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Prediction under single-sequence condition</p>
            </st>
            <sec>
               <st>
                  <p>Cross-validation results</p>
               </st>
               <p>We conducted a 5-fold cross-validation test under very stringent data set conditions (see Methods). For each cross-validation round we evolved a separate HMM structure with Block-HMM. Under the single-sequence condition, a single HMM predictor achieved an overall prediction accuracy (<it>Q</it><sub>3</sub>) of 68.3% (Table <tblr tid="T3">3</tblr>).</p>
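               <p>For reference, the per-residue scores in the tables can be computed as in the sketch below. It assumes the usual definitions. <it>Q</it><sub>3</sub> is the percentage of all residues whose three-state label is predicted correctly. <it>Q</it><sub><it>H</it></sub>, <it>Q</it><sub><it>E</it></sub> and <it>Q</it><sub><it>C</it></sub> are the percentages of residues observed in each class that are predicted correctly. The sketch pools residues over all proteins, and averaging per protein instead would give somewhat different numbers. <it>SOV</it> requires a segment-based comparison and is not shown. The function name q_scores is hypothetical.</p>

```python
def q_scores(observed, predicted, classes="HEC"):
    """Per-residue accuracies in percent, pooled over all residues.

    observed, predicted: lists of label strings (one string per protein, equal
    lengths pairwise) over the three-state alphabet in `classes`.
    Q3 is the fraction of all residues predicted correctly; Q_l is the fraction
    of residues observed in class l that are also predicted as l.
    """
    total = correct = 0
    observed_in = dict.fromkeys(classes, 0)
    hits_in = dict.fromkeys(classes, 0)
    for obs, pred in zip(observed, predicted):
        if len(obs) != len(pred):
            raise ValueError("observed and predicted strings differ in length")
        for o, p in zip(obs, pred):
            total += 1
            observed_in[o] += 1
            if o == p:
                correct += 1
                hits_in[o] += 1
    scores = {"Q3": 100.0 * correct / total}
    for c in classes:
        n = observed_in[c]
        # A class absent from the observed data has an undefined Q_l.
        scores["Q_" + c] = 100.0 * hits_in[c] / n if n else float("nan")
    return scores


scores = q_scores(["HHHCCEE"], ["HHCCCEC"])
print({name: round(value, 1) for name, value in scores.items()})
# prints {'Q3': 71.4, 'Q_H': 66.7, 'Q_E': 50.0, 'Q_C': 100.0}
```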
               <tbl id="T3">
                  <title>
                     <p>Table 3</p>
                  </title>
                  <caption>
                     <p>Prediction under the single-sequence condition</p>
                  </caption>
                  <tblbdy cols="9">
                     <r>
                        <c ca="center">
                           <p>Test</p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>Q</it>
                              <sub>3</sub>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>Q</it>
                              <sub>
                                 <it>H</it>
                              </sub>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>Q</it>
                              <sub>
                                 <it>E</it>
                              </sub>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>Q</it>
                              <sub>
                                 <it>C</it>
                              </sub>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>SOV</it>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>SOV</it>
                              <sub>
                                 <it>H</it>
                              </sub>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>SOV</it>
                              <sub>
                                 <it>E</it>
                              </sub>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>SOV</it>
                              <sub>
                                 <it>C</it>
                              </sub>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="9">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>5-fold cross-validation</p>
                        </c>
                        <c ca="center">
                           <p>68.3</p>
                        </c>
                        <c ca="center">
                           <p>65.9</p>
                        </c>
                        <c ca="center">
                           <p>56.4</p>
                        </c>
                        <c ca="center">
                           <p>74.8</p>
                        </c>
                        <c ca="center">
                           <p>63.9</p>
                        </c>
                        <c ca="center">
                           <p>63.8</p>
                        </c>
                        <c ca="center">
                           <p>59.8</p>
                        </c>
                        <c ca="center">
                           <p>65.8</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>Non-common (Block-HMM)</p>
                        </c>
                        <c ca="center">
                           <p>68.6</p>
                        </c>
                        <c ca="center">
                           <p>67.6</p>
                        </c>
                        <c ca="center">
                           <p>58.0</p>
                        </c>
                        <c ca="center">
                           <p>74.1</p>
                        </c>
                        <c ca="center">
                           <p>64.1</p>
                        </c>
                        <c ca="center">
                           <p>64.9</p>
                        </c>
                        <c ca="center">
                           <p>61.2</p>
                        </c>
                        <c ca="center">
                           <p>65.4</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>Non-common (PSIPRED)</p>
                        </c>
                        <c ca="center">
                           <p>67.3</p>
                        </c>
                        <c ca="center">
                           <p>65.8</p>
                        </c>
                        <c ca="center">
                           <p>58.9</p>
                        </c>
                        <c ca="center">
                           <p>70.5</p>
                        </c>
                        <c ca="center">
                           <p>63.6</p>
                        </c>
                        <c ca="center">
                           <p>64.2</p>
                        </c>
                        <c ca="center">
                           <p>60.7</p>
                        </c>
                        <c ca="center">
                           <p>63.1</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>Common (Block-HMM)</p>
                        </c>
                        <c ca="center">
                           <p>69.0</p>
                        </c>
                        <c ca="center">
                           <p>66.1</p>
                        </c>
                        <c ca="center">
                           <p>56.6</p>
                        </c>
                        <c ca="center">
                           <p>76.3</p>
                        </c>
                        <c ca="center">
                           <p>63.6</p>
                        </c>
                        <c ca="center">
                           <p>63.4</p>
                        </c>
                        <c ca="center">
                           <p>59.8</p>
                        </c>
                        <c ca="center">
                           <p>66.7</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>Common (PSIPRED)</p>
                        </c>
                        <c ca="center">
                           <p>67.6</p>
                        </c>
                        <c ca="center">
                           <p>63.4</p>
                        </c>
                        <c ca="center">
                           <p>56.0</p>
                        </c>
                        <c ca="center">
                           <p>73.8</p>
                        </c>
                        <c ca="center">
                           <p>63.1</p>
                        </c>
                        <c ca="center">
                           <p>62.8</p>
                        </c>
                        <c ca="center">
                           <p>58.2</p>
                        </c>
                        <c ca="center">
                           <p>63.8</p>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
            </sec>
            <sec>
               <st>
                  <p>Prediction comparison</p>
               </st>
               <p>We compared the performance of the best HMM topology, trained on all 1662 training sequences, with that of PSIPRED under the single-sequence condition. As a test set we used the data set published in October 2002 on the EVA server <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>, from which we prepared two sets. First, from its 1828 sequences we removed those that also occur in our training set, leaving 1584 sequences (non-common set). Second, we used only the sequences that are common to our training set and the PSIPRED training set, giving 153 sequences (common set). Table <tblr tid="T3">3</tblr> shows the comparison with PSIPRED on the two sets. Block-HMM reaches a <it>Q</it><sub>3</sub> of 68.6% on the non-common set and 69.0% on the common set, compared with 67.3% and 67.6% for PSIPRED. These tests at least suggest that Block-HMM performs well as a single-sequence secondary structure predictor.</p>
            </sec>
         </sec>
         <sec>
            <st>
               <p>Prediction under multi-sequence condition</p>
            </st>
            <p>We built a complete secondary structure predictor that uses information from multiple sequences. A structure-to-structure layer is added to refine the predictions. See Methods for more detail.</p>
            <sec>
               <st>
                  <p>Cross-validation &amp; comparison</p>
               </st>
               <p>Table <tblr tid="T4">4</tblr> shows the result of the 5-fold cross-validation test. Using multiple sequence alignments increased the <it>Q</it><sub>3</sub> value by about 6.8 percentage points, from 68.3% under the single-sequence condition to 75.1%.</p>
               <tbl id="T4">
                  <title>
                     <p>Table 4</p>
                  </title>
                  <caption>
                      <p>Prediction under the multiple-sequence condition</p>
                  </caption>
                  <tblbdy cols="9">
                     <r>
                        <c ca="center">
                           <p>Test</p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>Q</it>
                              <sub>3</sub>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>Q</it>
                              <sub>
                                 <it>H</it>
                              </sub>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>Q</it>
                              <sub>
                                 <it>E</it>
                              </sub>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>Q</it>
                              <sub>
                                 <it>C</it>
                              </sub>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>SOV</it>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>SOV</it>
                              <sub>
                                 <it>H</it>
                              </sub>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>SOV</it>
                              <sub>
                                 <it>E</it>
                              </sub>
                           </p>
                        </c>
                        <c ca="center">
                           <p>
                              <it>SOV</it>
                              <sub>
                                 <it>C</it>
                              </sub>
                           </p>
                        </c>
                     </r>
                     <r>
                        <c cspan="9">
                           <hr/>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>5-fold cross-validation</p>
                        </c>
                        <c ca="center">
                           <p>75.1</p>
                        </c>
                        <c ca="center">
                           <p>67.8</p>
                        </c>
                        <c ca="center">
                           <p>70.8</p>
                        </c>
                        <c ca="center">
                           <p>77.5</p>
                        </c>
                        <c ca="center">
                           <p>71.7</p>
                        </c>
                        <c ca="center">
                           <p>68.4</p>
                        </c>
                        <c ca="center">
                           <p>73.4</p>
                        </c>
                        <c ca="center">
                           <p>69.6</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                            <p>Non-common (PSIPRED)</p>
                        </c>
                        <c ca="center">
                           <p>78.9</p>
                        </c>
                        <c ca="center">
                           <p>76.7</p>
                        </c>
                        <c ca="center">
                           <p>74.5</p>
                        </c>
                        <c ca="center">
                           <p>77.3</p>
                        </c>
                        <c ca="center">
                           <p>75.6</p>
                        </c>
                        <c ca="center">
                           <p>76.3</p>
                        </c>
                        <c ca="center">
                           <p>75.6</p>
                        </c>
                        <c ca="center">
                           <p>71.3</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                            <p>Non-common (YASPIN)</p>
                        </c>
                        <c ca="center">
                           <p>73.4</p>
                        </c>
                        <c ca="center">
                           <p>68.8</p>
                        </c>
                        <c ca="center">
                           <p>83.0</p>
                        </c>
                        <c ca="center">
                           <p>68.9</p>
                        </c>
                        <c ca="center">
                           <p>71.1</p>
                        </c>
                        <c ca="center">
                           <p>70.1</p>
                        </c>
                        <c ca="center">
                           <p>76.5</p>
                        </c>
                        <c ca="center">
                           <p>65.8</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                            <p>Non-common (Block-HMM)</p>
                        </c>
                        <c ca="center">
                           <p>74.5</p>
                        </c>
                        <c ca="center">
                           <p>70.3</p>
                        </c>
                        <c ca="center">
                           <p>69.6</p>
                        </c>
                        <c ca="center">
                           <p>76.2</p>
                        </c>
                        <c ca="center">
                           <p>70.6</p>
                        </c>
                        <c ca="center">
                           <p>69.5</p>
                        </c>
                        <c ca="center">
                           <p>72.7</p>
                        </c>
                        <c ca="center">
                           <p>68.2</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>Common (PSIPRED)</p>
                        </c>
                        <c ca="center">
                           <p>79.5</p>
                        </c>
                        <c ca="center">
                           <p>74.6</p>
                        </c>
                        <c ca="center">
                           <p>71.7</p>
                        </c>
                        <c ca="center">
                           <p>79.6</p>
                        </c>
                        <c ca="center">
                           <p>75.8</p>
                        </c>
                        <c ca="center">
                           <p>74.4</p>
                        </c>
                        <c ca="center">
                           <p>73.2</p>
                        </c>
                        <c ca="center">
                           <p>72.6</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                           <p>Common (YASPIN)</p>
                        </c>
                        <c ca="center">
                           <p>74.6</p>
                        </c>
                        <c ca="center">
                           <p>68.2</p>
                        </c>
                        <c ca="center">
                           <p>80.1</p>
                        </c>
                        <c ca="center">
                           <p>71.0</p>
                        </c>
                        <c ca="center">
                           <p>71.3</p>
                        </c>
                        <c ca="center">
                           <p>68.4</p>
                        </c>
                        <c ca="center">
                           <p>74.7</p>
                        </c>
                        <c ca="center">
                           <p>67.2</p>
                        </c>
                     </r>
                     <r>
                        <c ca="center">
                            <p>Common (Block-HMM)</p>
                        </c>
                        <c ca="center">
                           <p>75.0</p>
                        </c>
                        <c ca="center">
                           <p>67.4</p>
                        </c>
                        <c ca="center">
                           <p>67.2</p>
                        </c>
                        <c ca="center">
                           <p>78.7</p>
                        </c>
                        <c ca="center">
                           <p>70.5</p>
                        </c>
                        <c ca="center">
                           <p>67.7</p>
                        </c>
                        <c ca="center">
                           <p>68.6</p>
                        </c>
                        <c ca="center">
                           <p>69.7</p>
                        </c>
                     </r>
                  </tblbdy>
               </tbl>
               <p>To benchmark our method against existing predictors, we compared our prediction results with those of YASPIN <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> and PSIPRED. We asked Dr. Kuang Lin to train YASPIN on the same data set that we used. PSIPRED, which is already trained and publicly available, was run on the same test data. We used the two test sets from the comparison under the single-sequence condition. Table <tblr tid="T4">4</tblr> shows the benchmarking results. On the non-common set, the <it>Q</it><sub>3</sub> of our method is about 1 percentage point higher than that of YASPIN (74.5% vs. 73.4%), although YASPIN's <it>SOV</it> is about 0.5 points higher (71.1% vs. 70.6%). YASPIN's <it>Q</it><sub><it>E</it></sub> (83.0%) is impressive, exceeding that of PSIPRED (74.5%). Overall, PSIPRED performed best. On the common set, PSIPRED again performed best, and the <it>Q</it><sub>3</sub> of Block-HMM is slightly higher than that of YASPIN (75.0% vs. 74.6%). This result is interesting, considering that under the single-sequence condition Block-HMM achieved a higher <it>Q</it><sub>3</sub> than PSIPRED on the same data sets (Table <tblr tid="T3">3</tblr>).</p>
            </sec>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
          <p>The HMM-based predictor inherits the general advantages of HMMs. For example, artificial protein sequences with secondary structure labels can be generated from the model. The generated sequences match the training data set in secondary structure content and in the length distributions of the secondary structure elements. The prediction also has a direct probabilistic interpretation. The analysis of the evolved model and the generated sequences shows that the evolving method captures the grammar of protein sequences and converts it into the grammar of the HMM. This is all the more noteworthy because the grammar and the biological information are constructed automatically, without human intervention.</p>
          <p>Recently, a hand-designed HMM-based protein secondary structure predictor was shown to perform well in predicting <it>beta</it>-strands under the single-sequence condition <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. A structure learning method based on the <it>Bayesian information criterion</it> has also been introduced <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. It increases the number of states while seeking the optimal balance between fit to the data and HMM size. Our method has more operations for changing the HMM structure. It also penalises overly large HMM structures by evaluating each trained model on a separate data set. As shown in the test under the single-sequence condition, the overall prediction accuracy of an evolved HMM is high. We do not claim that our HMM is superior under the single-sequence condition, since the test set we used may be biased in favour of HMMs. However, the result at least suggests that the evolving method is a good way to design an HMM for this problem and for further applications. Under the multiple-sequence condition, PSIPRED clearly performs better than Block-HMM. Its way of incorporating multiple sequence information, together with its structure-to-structure layer, works considerably better than our approach. How best to incorporate multiple sequence information remains an area for further study. Nevertheless, our results are comparable to those of YASPIN.</p>
          <p>Unlike most other secondary structure prediction methods, our method does not require a sliding window. In those methods the window size is chosen to obtain good performance (for example, PSIPRED uses a window of 15 residues <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>). The evolving HMM method uses the whole sequence as input, which avoids a fixed sequence window that might limit performance in specific cases.</p>
          <p>At present the Block-HMM method is relatively slow because it has to train every HMM in the population and calculate its fitness. Fortunately, the method is well suited to parallel computation. To evolve an HMM using a GA with 30 members in the population, we ran 31 Pentium 4 processors (2.4 GHz, 512 MB RAM each) in parallel, with each processor training one HMM. Ideally, the CPU time consumed on each processor is then the time needed to train and evaluate one HMM multiplied by the number of iterations. It took about 7 hours to produce an HMM with 40 states. Prediction using three trained HMMs without evolutionary information takes about 30 seconds.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
          <p>Optimizing HMM structures with an evolutionary algorithm has several benefits. First of all, the HMM structure is evolved automatically, without prior knowledge. This success is notable given that other secondary structure prediction methods require considerable calibration. Compared with the hand-designed HMMSTR <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, the evolutionary method produced good results with fewer states. For neural networks, the number of units must be chosen carefully. Here again, the evolving HMM method is an attractive alternative.</p>
          <p>Our approach also compares well with other methods for evolving HMM structures. Thomsen's result for secondary structure prediction (49%) indirectly suggests that our method is effective for this problem.</p>
          <p>The P.S.HMM (Protein Secondary structure predictor using HMMs) server is online. It provides secondary structure predictions together with the probability of each secondary structure conformation. The protein data set used in the tests is available at http://binf.ku.dk/~won/proseq.tar.gz.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Data set</p>
            </st>
            <p>The SABMark Twilight Zone data set (version 1.63) <abbrgrp><abbr bid="B34">34</abbr></abbrgrp> provides a set of representative structures. It consists of 2230 high-quality structures partitioned into 236 folds. Although many proteins in the data set share a common fold, no pair of protein sequences can be aligned with a BLAST E-value below 1 or a sequence identity above 25%. Consequently, for proteins that share a fold in this data set, a common evolutionary origin cannot be traced.</p>
            <p>Structures that caused problems with the DSSP program (see below) or that had chain breaks were removed. This resulted in a final data set of 1662 structures belonging to 234 fold groups; two fold groups were lost because no structures remained in them. With these 234 groups we performed five-fold cross-validation. To create a stringent test, we ensured that proteins with a common fold never appear in both the training and test sets.</p>
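            <p>The fold-level split can be sketched in Python as follows. This is a minimal illustration, not the code used in the study. Two details are assumptions: the input format (pairs of protein ID and fold-group ID), and the greedy rule that assigns the largest remaining fold group to the currently smallest split. The sketch guarantees the property stated above: every fold group falls into exactly one test split, so no fold is shared between training and test proteins.</p>

```python
import random
from collections import defaultdict


def fold_group_cv_splits(proteins, n_splits=5, seed=0):
    """Yield (train_ids, test_ids) for n_splits-fold cross-validation in which
    every fold group appears in exactly one test split.

    proteins: iterable of (protein_id, fold_group) pairs (assumed format).
    """
    by_group = defaultdict(list)
    for pid, group in proteins:
        by_group[group].append(pid)

    # Shuffle, then stable-sort by group size (largest first): ties are broken randomly.
    groups = sorted(by_group)
    random.Random(seed).shuffle(groups)
    groups.sort(key=lambda g: len(by_group[g]), reverse=True)

    # Greedy balancing (an assumption): add each whole group to the currently smallest split.
    splits = [[] for _ in range(n_splits)]
    for g in groups:
        smallest = min(range(n_splits), key=lambda k: len(splits[k]))
        splits[smallest].extend(by_group[g])

    for k in range(n_splits):
        train = [pid for j in range(n_splits) if j != k for pid in splits[j]]
        yield train, list(splits[k])
```

            <p>Because whole fold groups are moved between splits, the split sizes are only approximately equal.</p>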
            <p>The secondary structure was calculated using the program DSSP <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. DSSP assigns secondary structure to eight different classes: <it>&#945;</it>-helix (H), isolated <it>&#946;</it>-bridge (B), <it>&#946;</it>-strand (E), 3<sub>10</sub>-helix (G), <it>&#960;</it>-helix (I), turn (T), bend (S) and other. In this study, we used three classes: helix (consisting of DSSP classes H and G), strand (classes B and E) and coil (all other classes). The DSSP results were retrieved using the DSSP front end in the Biopython toolkit <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>.</p>
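            <p>The eight-to-three class reduction described above is a simple lookup. The sketch assumes the DSSP assignment is available as a one-letter string, one character per residue, with "-" for residues in no other class. The output letters H, E and C (helix, strand, coil) are a labeling choice made here for illustration.</p>

```python
# DSSP codes grouped into three classes as described in the text:
# helix = H, G; strand = B, E; coil = everything else (I, T, S and "-").
DSSP_TO_3 = {"H": "H", "G": "H", "B": "E", "E": "E"}


def reduce_dssp(dssp_string):
    """Map an 8-class DSSP string to the 3-class alphabet H/E/C."""
    return "".join(DSSP_TO_3.get(code, "C") for code in dssp_string)


print(reduce_dssp("-HHHHGGGTTEEEEBS-I"))  # → CHHHHHHHCCEEEEECCC
```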
         </sec>
         <sec>
            <st>
               <p>Training with Block-HMM</p>
            </st>
            <p>We used a hybrid GA that combines traditional GA operators, which explore the space of HMM topologies, with Baum-Welch optimization of the transition and emission probabilities.</p>
            <p>To obtain suitable HMM architectures, we tested HMMs with between 26 and 35 blocks. Labels are assigned randomly to the blocks, and the size of each block (the number of states it contains) is drawn at random between 1 and 4. Table <tblr tid="T5">5</tblr> shows the parameters used in the simulation.</p>
            <tbl id="T5">
               <title>
                  <p>Table 5</p>
               </title>
               <caption>
                  <p>Block-HMM parameters used in the experiment</p>
               </caption>
               <tblbdy cols="2">
                  <r>
                     <c ca="left">
                        <p>Parameter</p>
                     </c>
                     <c ca="center">
                        <p>Value</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Population size</p>
                     </c>
                     <c ca="center">
                        <p>30</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Number of iterations (generations)</p>
                     </c>
                     <c ca="center">
                        <p>400</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Number of blocks in an HMM</p>
                     </c>
                     <c ca="center">
                        <p>26&#8211;35</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Initial block length (states)</p>
                     </c>
                     <c ca="center">
                        <p>1&#8211;4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Number of crossovers per iteration</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Number of mutations per iteration</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Number of type-mutations per iteration</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
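            <p>For illustration, the random initialization summarized in Table 5 might look like the sketch below. The genome representation and the function names are hypothetical: each genome is a list of blocks, and each block pairs a secondary structure label with a number of states. How blocks are connected by transitions, and how emission probabilities are initialized, is not modeled here.</p>

```python
import random

LABELS = ("H", "E", "C")  # helix, strand, coil


def random_block_hmm(rng, n_blocks=(26, 35), block_len=(1, 4)):
    """Hypothetical Block-HMM genome: a list of (label, n_states) blocks.

    Both ranges are inclusive, matching the ranges in Table 5.
    """
    n = rng.randint(*n_blocks)
    return [(rng.choice(LABELS), rng.randint(*block_len)) for _ in range(n)]


def initial_population(size=30, seed=0):
    """Draw `size` independent random genomes (population size from Table 5)."""
    rng = random.Random(seed)
    return [random_block_hmm(rng) for _ in range(size)]


population = initial_population()
total_states = [sum(n for _, n in genome) for genome in population]
print(len(population), min(total_states), max(total_states))
```

            <p>Under these ranges, an initial HMM has between 26 and 140 states.</p>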
            <p>To find an HMM that does not overfit the training data, we split the training set into a subset used for Baum-Welch training (5/7 of the data) and a subset used for fitness evaluation (2/7 of the data). The fitness value is calculated from the fitness evaluation subset only. For an HMM with parameters &#920;, the fitness is the reciprocal of the negative length-normalized log-likelihood:</p>
            <p>
               <display-formula id="M2">
                  <m:math name="1471-2105-8-357-i3" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:msub>
                              <m:mi>E</m:mi>
                              <m:mi>&#956;</m:mi>
                           </m:msub>
                           <m:mo>=</m:mo>
                           <m:mfrac>
                              <m:mn>1</m:mn>
                              <m:mrow>
                                 <m:mo>&#8722;</m:mo>
                                 <m:mstyle displaystyle="true">
                                    <m:msub>
                                       <m:mo>&#8721;</m:mo>
                                       <m:mi>i</m:mi>
                                    </m:msub>
                                    <m:mrow>
                                       <m:mi>log</m:mi>
                                       <m:mo>&#8289;</m:mo>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:mi>P</m:mi>
                                       <m:mo stretchy="false">(</m:mo>
                                       <m:msub>
                                          <m:mi>x</m:mi>
                                          <m:mi>i</m:mi>
                                       </m:msub>
                                       <m:mo>|</m:mo>
                                       <m:msub>
                                          <m:mi>&#920;</m:mi>
                                          <m:mi>&#956;</m:mi>
                                       </m:msub>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo stretchy="false">)</m:mo>
                                       <m:mo>/</m:mo>
                                       <m:msub>
                                          <m:mi>l</m:mi>
                                          <m:mi>i</m:mi>
                                       </m:msub>
                                    </m:mrow>
                                 </m:mstyle>
                              </m:mrow>
                           </m:mfrac>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaacqWGfbqrdaWgaaWcbaacciGae8hVd0gabeaakiabg2da9maalaaabaGaeGymaedabaGaeyOeI0YaaabeaeaacyGGSbaBcqGGVbWBcqGGNbWzcqGGOaakcqWGqbaucqGGOaakcqWG4baEdaWgaaWcbaGaemyAaKgabeaakiabcYha8jabfI5arnaaBaaaleaacqWF8oqBaeqaaOGaeiykaKIaeiykaKIaei4la8IaemiBaW2aaSbaaSqaaiabdMgaPbqabaaabaGaemyAaKgabeqdcqGHris5aaaaaaa@4A39@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where <it>l</it><sub><it>i</it></sub> is the length of sequence <it>x</it><sub><it>i</it></sub>, and <it>&#956;</it> indexes the HMMs (with parameters &#920;<sub><it>&#956;</it></sub>) in the population. A member of the population is selected with the Boltzmann probability</p>
            <p>
               <display-formula id="M3">
                  <m:math name="1471-2105-8-357-i4" xmlns:m="http://www.w3.org/1998/Math/MathML">
                     <m:semantics>
                        <m:mrow>
                           <m:mtable>
                              <m:mtr>
                                 <m:mtd>
                                    <m:mrow>
                                       <m:msub>
                                          <m:mi>F</m:mi>
                                          <m:mi>&#956;</m:mi>
                                       </m:msub>
                                       <m:mo>=</m:mo>
                                       <m:mfrac>
                                          <m:mrow>
                                             <m:msub>
                                                <m:mi>m</m:mi>
                                                <m:mi>&#956;</m:mi>
                                             </m:msub>
                                          </m:mrow>
                                          <m:mrow>
                                             <m:mstyle displaystyle="true">
                                                <m:msubsup>
                                                   <m:mo>&#8721;</m:mo>
                                                   <m:mrow>
                                                      <m:mi>&#957;</m:mi>
                                                      <m:mo>=</m:mo>
                                                      <m:mn>1</m:mn>
                                                   </m:mrow>
                                                   <m:mi>N</m:mi>
                                                </m:msubsup>
                                                <m:mrow>
                                                   <m:msub>
                                                      <m:mi>m</m:mi>
                                                      <m:mi>&#957;</m:mi>
                                                   </m:msub>
                                                </m:mrow>
                                             </m:mstyle>
                                          </m:mrow>
                                       </m:mfrac>
                                       <m:mo>,</m:mo>
                                    </m:mrow>
                                 </m:mtd>
                                 <m:mtd>
                                    <m:mrow>
                                       <m:msub>
                                          <m:mi>m</m:mi>
                                          <m:mi>&#956;</m:mi>
                                       </m:msub>
                                       <m:mo>=</m:mo>
                                       <m:msup>
                                          <m:mtext>e</m:mtext>
                                          <m:mrow>
                                             <m:mi>s</m:mi>
                                             <m:msub>
                                                <m:mi>E</m:mi>
                                                <m:mi>&#956;</m:mi>
                                             </m:msub>
                                             <m:mo>/</m:mo>
                                             <m:mi>&#963;</m:mi>
                                          </m:mrow>
                                       </m:msup>
                                    </m:mrow>
                                 </m:mtd>
                              </m:mtr>
                           </m:mtable>
                        </m:mrow>
                        <m:annotation encoding="MathType-MTEF">
 MathType@MTEF@5@5@+=feaafiart1ev1aaatCvAUfKttLearuWrP9MDH5MBPbIqV92AaeXatLxBI9gBaebbnrfifHhDYfgasaacH8akY=wiFfYdH8Gipec8Eeeu0xXdbba9frFj0=OqFfea0dXdd9vqai=hGuQ8kuc9pgc9s8qqaq=dirpe0xb9q8qiLsFr0=vr0=vr0dc8meaabaqaciaacaGaaeqabaqabeGadaaakeaafaqabeqacaaabaGaemOray0aaSbaaSqaaGGaciab=X7aTbqabaGccqGH9aqpdaWcaaqaaiabd2gaTnaaBaaaleaacqWF8oqBaeqaaaGcbaWaaabmaeaacqWGTbqBdaWgaaWcbaGae8xVd4gabeaaaeaacqWF9oGBcqGH9aqpcqaIXaqmaeaacqWGobGta0GaeyyeIuoaaaGccqGGSaalaeaacqWGTbqBdaWgaaWcbaGae8hVd0gabeaakiabg2da9iabbwgaLnaaCaaaleqabaGaem4CamNaemyrau0aaSbaaWqaaiab=X7aTbqabaWccqGGVaWlcqWFdpWCaaaaaaaa@4BEF@</m:annotation>
                     </m:semantics>
                  </m:math>
               </display-formula>
            </p>
            <p>where <it>&#963;</it> is the standard deviation of the fitness values in the population and <it>s</it> is a constant that controls the strength of selection. In the work reported here, we used <it>s</it> = 0.3.</p>
            <p>The best member of a population is always selected, and a subset of the other members is selected using stochastic universal sampling <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. Some of the selected members are then mutated or subjected to crossover. Finally, all members of the new generation undergo Baum-Welch optimization on the Baum-Welch training subset.</p>
            <p>We saved the best HMM at each of the 400 generations, <it>i.e.</it> over the whole run of the GA, because the HMM in the last generation is not always the best one generated during the run. At the end of the run, the best HMM is selected and retrained with the Baum-Welch algorithm, this time using all the sequences in both the training and the evaluation subsets. Finally, the HMM is trained further using the discriminative training method <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. The Baum-Welch algorithm maximizes the likelihood of the training sequences (in our case, amino acids together with their secondary structure labels). However, we are more interested in maximizing the probability of the correct secondary structure labels given the amino acid sequences than in maximizing the probability of the full sequences themselves. Discriminative training increases the probability of the correct labels given the sequences, for a fixed HMM structure.</p>
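            <p>The fitness and selection formulas above translate directly into code. In this sketch the inputs are the log-likelihoods log <it>P</it>(<it>x</it><sub><it>i</it></sub>|&#920;<sub><it>&#956;</it></sub>), computed on the fitness-evaluation subset, and the sequence lengths <it>l</it><sub><it>i</it></sub>. Two details are assumptions not fixed by the text. First, <it>&#963;</it> is taken as the population standard deviation. Second, a population with zero spread falls back to uniform selection. Subtracting the largest exponent before exponentiating does not change <it>F</it><sub><it>&#956;</it></sub>, because the common factor cancels between numerator and denominator.</p>

```python
import math
import statistics


def fitness(log_likelihoods, lengths):
    """E_mu = 1 / (-sum_i log P(x_i | Theta_mu) / l_i) over the evaluation sequences."""
    return 1.0 / -sum(ll / l for ll, l in zip(log_likelihoods, lengths))


def boltzmann_probabilities(fitnesses, s=0.3):
    """F_mu = m_mu / sum_nu m_nu, with m_mu = exp(s * E_mu / sigma)."""
    sigma = statistics.pstdev(fitnesses)  # population standard deviation (assumed)
    if sigma == 0:
        return [1.0 / len(fitnesses)] * len(fitnesses)
    exponents = [s * e / sigma for e in fitnesses]
    top = max(exponents)  # shift for numerical stability; cancels in the ratio
    m = [math.exp(x - top) for x in exponents]
    total = sum(m)
    return [x / total for x in m]


print(fitness([-200.0, -300.0], [100, 100]))  # 1 / (2 + 3) → 0.2
```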
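            <p>Stochastic universal sampling draws all selections with a single random offset and evenly spaced pointers. As a result, each member is picked a number of times close to its expected count (the number of draws times <it>F</it><sub><it>&#956;</it></sub>). The sketch below keeps the best member and fills the remaining slots by sampling. Keeping the population size constant in this way is an assumption, since the text does not say how many other members are selected.</p>

```python
import random


def stochastic_universal_sampling(probs, n, rng):
    """Return n indices; index k is chosen the floor or ceiling of n * probs[k] times."""
    step = 1.0 / n
    start = rng.uniform(0.0, step)
    chosen, k, cumulative = [], 0, probs[0]
    for i in range(n):
        pointer = start + i * step
        # Advance to the member whose cumulative-probability interval holds the pointer.
        while pointer > cumulative and k + 1 != len(probs):
            k += 1
            cumulative += probs[k]
        chosen.append(k)
    return chosen


def select_parents(fitnesses, probs, rng):
    """Elitist selection (assumed scheme): the best member plus len(fitnesses) - 1 SUS draws."""
    best = max(range(len(fitnesses)), key=fitnesses.__getitem__)
    return [best] + stochastic_universal_sampling(probs, len(fitnesses) - 1, rng)


print(stochastic_universal_sampling([0.5, 0.25, 0.25], 4, random.Random(1)))  # → [0, 0, 1, 2]
```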
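            <p>The bookkeeping described above can be sketched with a generic loop: it keeps the best HMM seen in any generation rather than the best of the final generation. The callables <it>vary</it>, <it>retrain</it> and <it>evaluate</it> are hypothetical placeholders for, respectively, selection plus crossover and mutation, Baum-Welch training on the training subset, and fitness on the evaluation subset. The final retraining on all data and the discriminative training step are not shown.</p>

```python
def run_ga(population, vary, retrain, evaluate, generations=400):
    """Return the best (member, fitness) observed over all generations.

    vary, retrain and evaluate are hypothetical callables supplied by the caller.
    """
    best_member, best_fitness = None, float("-inf")
    for _ in range(generations):
        population = [retrain(m) for m in vary(population)]
        for member in population:
            f = evaluate(member)
            if f > best_fitness:
                best_member, best_fitness = member, f
    return best_member, best_fitness


# Toy run: members are numbers whose fitness drifts downward over time, so the
# best member overall comes from the first generation, not the last.
best, score = run_ga([0.0], vary=lambda pop: [m + 1.0 for m in pop],
                     retrain=lambda m: m, evaluate=lambda m: -m, generations=5)
print(best, score)  # → 1.0 -1.0
```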
         </sec>
         <sec>
            <st>
               <p>Incorporating evolutionary information</p>
            </st>
            <p>Secondary structure prediction rates can be boosted by using evolutionary information. Most systems use a position-specific scoring matrix (PSSM) as input to the predictor. Instead of using a PSSM, we ran our predictor on a set of homologous sequences and then combined the results. To obtain the homologous sequences, we ran PSI-BLAST <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> against the UniProt 90 protein sequence database <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>, downloaded on 17 February 2005, using 3 iterations and an E-value threshold of 0.001. The posterior label probabilities (PLPs) were calculated by decoding each homologous sequence with the trained HMM. After aligning the decoding results, we weighted each sequence using position-based sequence weights <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>.</p>
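            <p>The sketch below illustrates position-based sequence weighting in the style of Henikoff and Henikoff, followed by a weighted average of the PLPs. We assume this is the scheme of reference 40. In each alignment column, a sequence receives 1/(<it>r</it>&#183;<it>s</it>), where <it>r</it> is the number of distinct symbols in the column and <it>s</it> is the number of sequences sharing that sequence's symbol. The per-column contributions are then summed and normalized. This example makes two simplifications: gaps are treated as an ordinary symbol, and the PLPs are assumed to have already been mapped onto common alignment columns.</p>

```python
from collections import Counter


def position_based_weights(alignment):
    """Position-based weights for equal-length aligned rows, normalized to sum to 1.
    Each column contributes 1 / (r * s) to a sequence, where r is the number of
    distinct symbols in the column and s the count of that sequence's symbol."""
    weights = [0.0] * len(alignment)
    for j in range(len(alignment[0])):
        column = [row[j] for row in alignment]
        counts = Counter(column)
        r = len(counts)
        for i, symbol in enumerate(column):
            weights[i] += 1.0 / (r * counts[symbol])
    total = sum(weights)
    return [w / total for w in weights]


def profile_average(plps, weights):
    """Weighted average of per-position label probabilities.
    plps: one list per sequence of [pH, pE, pC] rows, aligned to common columns."""
    return [
        [sum(w * seq[j][c] for seq, w in zip(plps, weights)) for c in range(3)]
        for j in range(len(plps[0]))
    ]


weights = position_based_weights(["AA", "AA", "CC"])
print(weights)  # → [0.25, 0.25, 0.5]
plps = [[[1.0, 0.0, 0.0]], [[1.0, 0.0, 0.0]], [[0.0, 1.0, 0.0]]]
print(profile_average(plps, weights))  # → [[0.5, 0.5, 0.0]]
```

            <p>The divergent third sequence receives as much weight as the two identical ones combined. This is the intended effect of position-based weighting on redundant homologs.</p>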
         </sec>
         <sec>
            <st>
               <p>The second (structure-to-structure) layer</p>
            </st>
            <p>To improve performance further, we used a 3-layer perceptron with 3 input nodes, 3 hidden nodes and 3 output nodes. This network is shown in figure <figr fid="F11">11</figr>. The profile-averaged PLPs of the HMM are used directly as input to the network. This network is quite simple compared with other structure-to-structure layers published in the literature. The network was trained by gradient descent with a momentum term <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>.</p>
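            <p>A 3-3-3 network of this kind is small enough to write out in full. The sketch below illustrates it under assumptions the text does not specify:</p>
            <list style="bullet">
               <l>
                  <p>sigmoid hidden units;</p>
               </l>
               <l>
                  <p>a softmax output layer with cross-entropy loss;</p>
               </l>
               <l>
                  <p>per-example updates;</p>
               </l>
               <l>
                  <p>a learning rate of 0.1 with momentum 0.9.</p>
               </l>
            </list>
            <p>Each update adds the momentum-weighted previous step to the current gradient step, which is the momentum term referred to above. The toy training data are invented for illustration.</p>

```python
import math
import random


def _sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class TinyMLP:
    """3-input, 3-hidden, 3-output perceptron trained by gradient descent with momentum."""

    def __init__(self, n_in=3, n_hidden=3, n_out=3, seed=0):
        rng = random.Random(seed)
        # Each weight row stores its bias as the last entry.
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
        # Momentum buffers (previous updates), same shapes as the weights.
        self.V1 = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        self.V2 = [[0.0] * (n_hidden + 1) for _ in range(n_out)]

    def _forward(self, x):
        xb = list(x) + [1.0]
        hb = [_sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.W1] + [1.0]
        z = [sum(w * v for w, v in zip(row, hb)) for row in self.W2]
        zmax = max(z)
        e = [math.exp(v - zmax) for v in z]  # softmax, shifted for stability
        total = sum(e)
        return xb, hb, [v / total for v in e]

    def predict(self, x):
        return self._forward(x)[2]

    def train_step(self, x, target, lr=0.1, momentum=0.9):
        """One update on a single example; target is one-hot. Returns the loss."""
        xb, hb, y = self._forward(x)
        d_out = [yk - tk for yk, tk in zip(y, target)]  # softmax + cross-entropy gradient
        # Hidden deltas use W2 before it is updated below.
        d_hid = [hb[j] * (1.0 - hb[j]) * sum(d * self.W2[k][j] for k, d in enumerate(d_out))
                 for j in range(len(hb) - 1)]
        for W, V, delta, inp in ((self.W2, self.V2, d_out, hb), (self.W1, self.V1, d_hid, xb)):
            for k, row in enumerate(W):
                for j in range(len(row)):
                    V[k][j] = momentum * V[k][j] - lr * delta[k] * inp[j]
                    row[j] += V[k][j]
        return -sum(t * math.log(max(p, 1e-12)) for t, p in zip(target, y))


# Toy data: PLP-like inputs whose largest entry marks the true class (H, E, C).
data = [([0.7, 0.2, 0.1], [1.0, 0.0, 0.0]),
        ([0.1, 0.8, 0.1], [0.0, 1.0, 0.0]),
        ([0.2, 0.1, 0.7], [0.0, 0.0, 1.0])]
net = TinyMLP()
for epoch in range(2000):
    for x, t in data:
        net.train_step(x, t)
print([max(range(3), key=net.predict(x).__getitem__) for x, _ in data])
```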
            <fig id="F11">
               <title>
                  <p>Figure 11</p>
               </title>
               <caption>
                  <p>The structure-to-structure layer</p>
               </caption>
               <text>
                  <p><b>The structure-to-structure layer</b>. The structure-to-structure layer is composed of simple 3-layer neural networks.</p>
               </text>
               <graphic file="1471-2105-8-357-11"/>
            </fig>
            <p>To increase the prediction rate further, we used an ensemble of three independently trained HMM predictors. The three HMM structures differ because they were found by separate runs of Block-HMM. Combining structurally different HMMs in this way improves the prediction rate more than combining HMMs that share the same structure but have different parameters. The outputs of the structure-to-structure layers are summed, and the dominant label is taken as the final secondary structure prediction. The final predictor is shown in figure <figr fid="F12">12</figr>.</p>
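            <p>The combination rule is: sum the structure-to-structure outputs of the three predictors, then take the largest. The sketch shows it for per-residue outputs ordered as helix, strand, coil; the data layout and function name are illustrative assumptions. The toy example also shows why summing can differ from majority voting. At the second residue, two of the three models prefer strand, but the summed scores favor coil.</p>

```python
LABELS = ("H", "E", "C")  # helix, strand, coil


def ensemble_predict(per_model_outputs):
    """Sum per-residue [pH, pE, pC] outputs over models and take the dominant label.

    per_model_outputs: list over models of lists over residues (assumed layout).
    """
    prediction = []
    for j in range(len(per_model_outputs[0])):
        summed = [sum(model[j][c] for model in per_model_outputs) for c in range(3)]
        prediction.append(LABELS[summed.index(max(summed))])
    return "".join(prediction)


outputs = [
    [[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]],  # model 1
    [[0.4, 0.5, 0.1], [0.3, 0.4, 0.3]],  # model 2
    [[0.5, 0.4, 0.1], [0.1, 0.5, 0.4]],  # model 3
]
print(ensemble_predict(outputs))  # → HC
```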
         </sec>
      </sec>
      <sec>
         <st>
            <p>Competing interests</p>
         </st>
         <p>The authors declare that they have no competing interests.</p>
         <fig id="F12">
            <title>
               <p>Figure 12</p>
            </title>
            <caption>
               <p>Overview of protein secondary structure predictor</p>
            </caption>
            <text>
               <p><b>Overview of protein secondary structure predictor</b>. Schematic overview of predicting secondary structure with three HMMs evolved with Block-HMM.</p>
            </text>
            <graphic file="1471-2105-8-357-12"/>
         </fig>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>KJW implemented the algorithms, performed all tests, and made all images. TH constructed the protein data sets, provided advice on protein structure and contributed to the analysis of the results. APB and AK conceived of the algorithm and participated in its design and coordination. KJW, TH, APB and AK wrote the manuscript. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We would like to thank L.G.T. Joergensen for providing his HMM structure drawing tool and Dr. V.A. Simossis for providing the YASPIN code. In particular, we would like to thank Dr. Kuang Lin for kindly training YASPIN on the data set we provided. KJW was supported by a grant from the Novo Nordisk Foundation. TH is supported by a Marie Curie Intra-European Fellowship within the 6th European Community Framework Programme.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Algorithms for prediction of alpha helices and structural regions in globular proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Lim</snm>
                  <fnm>VI</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1974</pubdate>
            <volume>88</volume>
            <fpage>873</fpage>
            <lpage>894</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0022-2836(74)90405-7</pubid>
                  <pubid idtype="pmpid" link="fulltext">4427384</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Prediction of the secondary structure of proteins from their amino acid sequence</p>
            </title>
            <aug>
               <au>
                  <snm>Chou</snm>
                  <fnm>PY</fnm>
               </au>
               <au>
                  <snm>Fasman</snm>
                  <fnm>GD</fnm>
               </au>
            </aug>
            <source>Adv Enzymol</source>
            <pubdate>1978</pubdate>
            <volume>47</volume>
            <fpage>45</fpage>
            <lpage>148</lpage>
            <xrefbib>
               <pubid idtype="pmpid">364941</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Analysis and implications of simple methods for predicting the secondary structure of globular proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Garnier</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Osguthorpe</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Robson</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1978</pubdate>
            <volume>120</volume>
            <fpage>97</fpage>
            <lpage>120</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0022-2836(78)90297-8</pubid>
                  <pubid idtype="pmpid" link="fulltext">642007</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Predicting the secondary structure of globular proteins using neural network models</p>
            </title>
            <aug>
               <au>
                  <snm>Qian</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Sejnowski</snm>
                  <fnm>TJ</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1988</pubdate>
            <volume>202</volume>
            <fpage>865</fpage>
            <lpage>884</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0022-2836(88)90564-5</pubid>
                  <pubid idtype="pmpid" link="fulltext">3172241</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Predicting the secondary structure of globular proteins using neural network models</p>
            </title>
            <aug>
               <au>
                  <snm>Bohr</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Bohr</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Brunak</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Cotterill</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Lautrup</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>N&#248;rskov</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Olsen</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Petersen</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1988</pubdate>
            <volume>202</volume>
            <fpage>865</fpage>
            <lpage>884</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0022-2836(88)90564-5</pubid>
                  <pubid idtype="pmpid" link="fulltext">3172241</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Prediction of protein secondary structure at better than 70% accuracy</p>
            </title>
            <aug>
               <au>
                  <snm>Rost</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Sander</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1993</pubdate>
            <volume>232</volume>
            <fpage>584</fpage>
            <lpage>599</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1993.1413</pubid>
                  <pubid idtype="pmpid" link="fulltext">8345525</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Protein Secondary Structure Prediction Based on Position-specific Scoring Matrices</p>
            </title>
            <aug>
               <au>
                  <snm>Jones</snm>
                  <fnm>DT</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1999</pubdate>
            <volume>292</volume>
            <fpage>195</fpage>
            <lpage>202</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1999.3091</pubid>
                  <pubid idtype="pmpid" link="fulltext">10493868</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Exploiting the past and the future in protein secondary structure prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Baldi</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Brunak</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Frasconi</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Soda</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Pollastri</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>1999</pubdate>
            <volume>15</volume>
            <issue>11</issue>
            <fpage>937</fpage>
            <lpage>946</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/15.11.937</pubid>
                  <pubid idtype="pmpid" link="fulltext">10743560</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Improving the Prediction of Protein Secondary Structure in Three and Eight Classes Using Recurrent Neural Networks and Profiles</p>
            </title>
            <aug>
               <au>
                  <snm>Pollastri</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Przybylski</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Rost</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Baldi</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>2002</pubdate>
            <volume>47</volume>
            <fpage>228</fpage>
            <lpage>235</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/prot.10082</pubid>
                  <pubid idtype="pmpid" link="fulltext">11933069</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>A simple and fast secondary structure prediction method using hidden neural networks</p>
            </title>
            <aug>
               <au>
                  <snm>Lin</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Simossis</snm>
                  <fnm>VA</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>WR</fnm>
               </au>
               <au>
                  <snm>Heringa</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>2</issue>
            <fpage>152</fpage>
            <lpage>159</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bth487</pubid>
                  <pubid idtype="pmpid" link="fulltext">15377504</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>A Novel Method of Protein Secondary Structure Prediction with High Segment Overlap Measure: Support Vector Machine Approach</p>
            </title>
            <aug>
               <au>
                  <snm>Hua</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sun</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2001</pubdate>
            <volume>308</volume>
            <fpage>397</fpage>
            <lpage>407</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2001.4580</pubid>
                  <pubid idtype="pmpid" link="fulltext">11327775</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Secondary structure prediction with support vector machines</p>
            </title>
            <aug>
               <au>
                  <snm>Ward</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>McGuffin</snm>
                  <fnm>LJ</fnm>
               </au>
               <au>
                  <snm>Buxton</snm>
                  <fnm>BF</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>DT</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <issue>13</issue>
            <fpage>1650</fpage>
            <lpage>1655</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg223</pubid>
                  <pubid idtype="pmpid" link="fulltext">12967961</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>A Novel Method for Protein Secondary Structure Prediction Using Dual-Layer SVM and Profiles</p>
            </title>
            <aug>
               <au>
                  <snm>Guo</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Sun</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>2004</pubdate>
            <volume>54</volume>
            <fpage>738</fpage>
            <lpage>743</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/prot.10634</pubid>
                  <pubid idtype="pmpid" link="fulltext">14997569</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Madden</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Schaffer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Nucl Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>3389</fpage>
            <lpage>3402</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Application of Multiple Sequence Alignment Profiles to Improve Protein Secondary Structure Prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Cuff</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Barton</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>2000</pubdate>
            <volume>40</volume>
            <fpage>502</fpage>
            <lpage>511</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/1097-0134(20000815)40:3&lt;502::AID-PROT170>3.0.CO;2-Q</pubid>
                  <pubid idtype="pmpid" link="fulltext">10861942</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Simple consensus procedures are effective and sufficient in secondary structure prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Albrecht</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Tosatto</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lengauer</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Valle</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Protein Eng</source>
            <pubdate>2003</pubdate>
            <volume>16</volume>
            <issue>7</issue>
            <fpage>459</fpage>
            <lpage>462</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/protein/gzg063</pubid>
                  <pubid idtype="pmpid" link="fulltext">12915722</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Prediction of protein secondary structure by the hidden Markov model</p>
            </title>
            <aug>
               <au>
                  <snm>Asai</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Hayamizu</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Handa</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Comput Appl Biosci</source>
            <pubdate>1993</pubdate>
            <volume>9</volume>
            <fpage>141</fpage>
            <lpage>146</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8481815</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Prediction of Protein Structure Classes and Secondary Structure by Means of Hidden Markov Models</p>
            </title>
            <aug>
               <au>
                  <snm>Yoshikawa</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Ikeguchi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Nakamura</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Shimizu</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Doi</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Systems and Computers in Japan</source>
            <pubdate>1999</pubdate>
            <volume>30</volume>
            <issue>13</issue>
            <fpage>13</fpage>
            <lpage>22</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1002/(SICI)1520-684X(19991130)30:13&lt;13::AID-SCJ2>3.0.CO;2-7</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>HMMSTR: a Hidden Markov Model for Local Sequence-Structure Correlations in Proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Bystroff</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Thorsson</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Baker</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2000</pubdate>
            <volume>301</volume>
            <fpage>173</fpage>
            <lpage>190</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2000.3837</pubid>
                  <pubid idtype="pmpid" link="fulltext">10926500</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Hidden Neural Networks</p>
            </title>
            <aug>
               <au>
                  <snm>Krogh</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Riis</snm>
                  <fnm>SK</fnm>
               </au>
            </aug>
            <source>Neural Computation</source>
            <pubdate>1999</pubdate>
            <volume>11</volume>
            <fpage>541</fpage>
            <lpage>563</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1162/089976699300016764</pubid>
                  <pubid idtype="pmpid">9950743</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Training HMM Structure with Genetic Algorithms for Biological Sequence Analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Won</snm>
                  <fnm>KJ</fnm>
               </au>
               <au>
                  <snm>Pr&#252;gel-Bennett</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Krogh</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <issue>18</issue>
            <fpage>3613</fpage>
            <lpage>3627</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bth454</pubid>
                  <pubid idtype="pmpid" link="fulltext">15297297</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Evolving the Structure of Hidden Markov Models</p>
            </title>
            <aug>
               <au>
                  <snm>Won</snm>
                  <fnm>KJ</fnm>
               </au>
               <au>
                  <snm>Pr&#252;gel-Bennett</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Krogh</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>IEEE Transactions on Evolutionary Computation</source>
            <pubdate>2006</pubdate>
            <volume>10</volume>
            <fpage>39</fpage>
            <lpage>49</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1109/TEVC.2005.851271</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Bayesian Learning of Probabilistic Language Models</p>
            </title>
            <aug>
               <au>
                  <snm>Stolcke</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>PhD thesis</source>
            <publisher>University of California at Berkeley</publisher>
            <pubdate>1994</pubdate>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Stochastic motif extraction using hidden Markov model</p>
            </title>
            <aug>
               <au>
                  <snm>Fujiwara</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Asogawa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Konagaya</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>ISMB 1994</source>
            <pubdate>1994</pubdate>
            <fpage>121</fpage>
            <lpage>129</lpage>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Motif Extraction using an Improved Iterative Duplication Method for HMM Topology Learning</p>
            </title>
            <aug>
               <au>
                  <snm>Fujiwara</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Asogawa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Konagaya</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Pacific Symposium on Biocomputing '96</source>
            <pubdate>1995</pubdate>
            <fpage>713</fpage>
            <lpage>714</lpage>
         </bibl>
         <bibl id="B26">
            <title>
               <p>DNA Sequence Analysis using Hidden Markov Model and Genetic Algorithm</p>
            </title>
            <aug>
               <au>
                  <snm>Yada</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Ishikawa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Tanaka</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Asai</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Genome Informatics</source>
            <pubdate>1994</pubdate>
            <volume>5</volume>
            <fpage>178</fpage>
            <lpage>179</lpage>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Automatic extraction of motifs represented in the hidden Markov model from a number of DNA sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Yada</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Totoki</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Ishikawa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Asai</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Nakai</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>1998</pubdate>
            <volume>14</volume>
            <fpage>317</fpage>
            <lpage>325</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/14.4.317</pubid>
                  <pubid idtype="pmpid" link="fulltext">9632826</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Generation of hidden Markov model describing complex motif in DNA sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Yada</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>IPSJ Trans</source>
            <pubdate>1995</pubdate>
            <volume>40</volume>
            <fpage>750</fpage>
            <lpage>767</lpage>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Stochastic models representing DNA sequence data construction algorithms and their applications to prediction of gene structure and function</p>
            </title>
            <aug>
               <au>
                  <snm>Yada</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>PhD thesis</source>
            <publisher>University of Tokyo</publisher>
            <pubdate>1998</pubdate>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Evolving the Topology of Hidden Markov Models Using Evolutionary Algorithms</p>
            </title>
            <aug>
               <au>
                  <snm>Thomsen</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>LNCS</source>
            <pubdate>2002</pubdate>
            <volume>2439</volume>
            <fpage>861</fpage>
            <lpage>870</lpage>
         </bibl>
         <bibl id="B31">
            <title>
               <p>EVA: large-scale analysis of secondary structure prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Rost</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Eyrich</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>2001</pubdate>
            <volume>5</volume>
            <fpage>192</fpage>
            <lpage>199</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/prot.10051</pubid>
                  <pubid idtype="pmpid" link="fulltext">11835497</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Protein secondary structure prediction for a single-sequence using hidden semi-Markov models</p>
            </title>
            <aug>
               <au>
                  <snm>Aydin</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Altunbasak</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Borodovsky</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>178</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1479840</pubid>
                  <pubid idtype="pmpid" link="fulltext">16571137</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-7-178</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Analysis of an optimal hidden Markov model for secondary structure prediction</p>
            </title>
            <aug>
               <au>
                  <snm>Martin</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Gibrat</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Rodolphe</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>BMC Structural Biology</source>
            <pubdate>2006</pubdate>
            <volume>6</volume>
            <fpage>25</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1769381</pubid>
                  <pubid idtype="pmpid" link="fulltext">17166267</pubid>
                  <pubid idtype="doi">10.1186/1472-6807-6-25</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>SABmark &#8211; a benchmark for sequence alignment that covers the entire known fold space</p>
            </title>
            <aug>
               <au>
                  <snm>Van Walle</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Lasters</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Wyns</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>7</issue>
            <fpage>1267</fpage>
            <lpage>1268</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bth493</pubid>
                  <pubid idtype="pmpid" link="fulltext">15333456</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features</p>
            </title>
            <aug>
               <au>
                  <snm>Kabsch</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Sander</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Biopolymers</source>
            <pubdate>1983</pubdate>
            <volume>22</volume>
            <fpage>2577</fpage>
            <lpage>2637</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/bip.360221211</pubid>
                  <pubid idtype="pmpid">6667333</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>PDB file parser and structure class implemented in Python</p>
            </title>
            <aug>
               <au>
                  <snm>Hamelryck</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Manderick</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <fpage>2308</fpage>
            <lpage>2310</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg299</pubid>
                  <pubid idtype="pmpid" link="fulltext">14630660</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Reducing Bias and Inefficiency in the Selection Algorithm</p>
            </title>
            <aug>
               <au>
                  <snm>Baker</snm>
                  <fnm>JE</fnm>
               </au>
            </aug>
            <source>Proceedings of the Second International Conference on Genetic Algorithms</source>
            <publisher>Lawrence Erlbaum Associates (Hillsdale)</publisher>
            <pubdate>1987</pubdate>
            <fpage>14</fpage>
            <lpage>21</lpage>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Two methods for improving performance of an HMM and their application for gene finding</p>
            </title>
            <aug>
               <au>
                  <snm>Krogh</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Proceedings of the Fifth International Conference on Intelligent Systems for Molecular Biology</source>
            <pubdate>1997</pubdate>
            <fpage>179</fpage>
            <lpage>186</lpage>
         </bibl>
         <bibl id="B39">
            <title>
               <p>UniProt archive</p>
            </title>
            <aug>
               <au>
                  <snm>Leinonen</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Diez</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Fleischmann</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Lopez</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <issue>17</issue>
            <fpage>3236</fpage>
            <lpage>3237</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bth191</pubid>
                  <pubid idtype="pmpid" link="fulltext">15044231</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Position-based Sequence Weights</p>
            </title>
            <aug>
               <au>
                  <snm>Henikoff</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Henikoff</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1994</pubdate>
            <volume>243</volume>
            <fpage>574</fpage>
            <lpage>578</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0022-2836(94)90032-9</pubid>
                  <pubid idtype="pmpid" link="fulltext">7966282</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Learning representations by back-propagating errors</p>
            </title>
            <aug>
               <au>
                  <snm>Rumelhart</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Hinton</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Williams</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1986</pubdate>
            <volume>323</volume>
            <fpage>533</fpage>
            <lpage>536</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1038/323533a0</pubid>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
