<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-9-226</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Methodology article</dochead>
      <bibl>
         <title>
            <p>SCPRED: Accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Kurgan</snm>
               <fnm>Lukasz</fnm>
               <insr iid="I1"/>
               <email>lkurgan@ece.ualberta.ca</email>
            </au>
            <au id="A2">
               <snm>Cios</snm>
               <fnm>Krzysztof</fnm>
               <insr iid="I2"/>
               <email>kcios@vcu.edu</email>
            </au>
            <au id="A3">
               <snm>Chen</snm>
               <fnm>Ke</fnm>
               <insr iid="I1"/>
               <email>kchen1@ece.ualberta.ca</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Electrical and Computer Engineering, University of Alberta, ECEFR, 9701 116 Street, Edmonton, AB, T6G 2V4, Canada </p>
            </ins>
            <ins id="I2">
               <p>Department of Computer Science, Virginia Commonwealth University, 601 West Main Street, Room 204, Richmond, Virginia 23284-3068, USA </p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2008</pubdate>
         <volume>9</volume>
         <issue>1</issue>
         <fpage>226</fpage>
         <url>http://www.biomedcentral.com/1471-2105/9/226</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18452616</pubid>
               <pubid idtype="doi">10.1186/1471-2105-9-226</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>31</day>
               <month>10</month>
               <year>2007</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>01</day>
               <month>5</month>
               <year>2008</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>01</day>
               <month>5</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Kurgan et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds, and thus determining structural similarity without relying on sequence similarity would be desirable for structure prediction. The structural class is defined as the folding type of a protein or its domain. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for datasets in which the sequence identity of any pair of sequences falls in the twilight zone. We propose the SCPRED method, which improves prediction accuracy for sequences that share twilight-zone pairwise similarity with the sequences used for the prediction.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on an extensive design process that considered over 2300 index-, composition-, and physicochemical-property-based features, along with features based on the predicted secondary structure and its content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed on a dataset of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior to over a dozen recent competing methods based on support vector machine, logistic regression, and ensemble-of-classifiers predictors.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>SCPRED can accurately find similar structures for sequences that share low identity with the sequences used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that SCPRED's predictions can be successfully used as a post-processing filter to improve the performance of modern fold classification methods.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Protein structures are predicted to provide answers to key questions related to protein function, regulation, and interactions <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. Solved structures are increasingly useful for modeling and predicting the structure of unsolved protein sequences that have a closely related (similar) sequence with a known structure <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. Homology modeling, one of the most successful paradigms used to predict the structure, is based on the assumption that similar sequences share similar folding patterns <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>. Sequence alignment, which allows finding similar sequences among those with known structures <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>, usually does not perform well when no sequences with high identity are available. At the same time, structurally similar proteins that share low sequence identity with the sequences used for prediction can be found based on coarse-grained classifications such as the one provided by the Structural Classification of Proteins (SCOP) database <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>. This database implements a hierarchy of relations between known protein and protein domain structures, in which the first level is the structural class. Prediction of structural classes relies on identifying folding patterns among thousands of already categorized proteins and applying these patterns to millions of proteins with unknown structures but known amino acid (AA) sequences. There are four major structural classes: all-&#945;, all-&#946;, &#945;/&#946;, and &#945;+&#946;. The all-&#945; and all-&#946; classes represent structures that consist mainly of &#945;-helices and &#946;-strands, respectively. 
The &#945;/&#946; and &#945;+&#946; classes contain both &#945;-helices and &#946;-strands, which are mainly interspersed and segregated, respectively <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. SCOP also defines three additional classes, namely multi-domain proteins, membrane and cell surface proteins, and small proteins, as well as four supplementary categories, namely coiled coil proteins, designed proteins, low-resolution protein structures, and peptides. The proposed method targets the four main classes for two reasons: (1) about 90% of SCOP entries belong to these four classes, and (2) most of the existing structural class prediction methods also target these classes <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. At the same time, the growing number of proteins categorized into the other classes motivates the development of predictive methods for these classes as well. We note that the CATH (Class, Architecture, Topology and Homologous superfamily) database <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> defines three main structural classes: mainly-&#945;, mainly-&#946;, and mixed (the fourth class includes irregular proteins that are composed mostly of coils), which approximately correspond to the all-&#945; class, the all-&#946; class, and the combination of the &#945;/&#946; and &#945;+&#946; classes in SCOP, respectively. We address the SCOP-based classification because it further subdivides the mixed proteins and because most of the existing structural class prediction methods are also based on this definition of the structural classes. Although the structural classes in SCOP are currently assigned manually based on the known structures, several automated assignment methods have been proposed in the past. These include a method proposed by Chou <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> and another by Eisenhaber and colleagues <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> (see Table <tblr tid="T1">1</tblr>). 
We note that the first assignment method requires knowledge of the structure (to distinguish between parallel and antiparallel sheets), while the second one is based purely on the content of the two secondary structures and merges the &#945;/&#946; and &#945;+&#946; classes into a single mixed class. In contrast, the assignment performed in the SCOP database is more complex and is governed by different rules for the &#945;/&#946; and &#945;+&#946; classes. The classification of protein structures in SCOP is performed manually by experts and is based on evolutionary relationships and on the principles that govern their three-dimensional structure <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. The structural classes are defined by grouping the assigned folds, which in turn are categorized based on similarities in the spatial arrangement of the protein structure. The folds are assigned to classes on the basis of the secondary structures of which they are composed, in terms of both their content and spatial arrangement. The all-&#945; and all-&#946; classes include folds composed mostly of &#945;-helices and &#946;-sheets, respectively. The &#945;/&#946; class includes folds in which &#945;-helices and &#946;-strands are largely interspersed, while in the &#945;+&#946; class they are largely segregated <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. Therefore, assignment to the latter two classes requires knowledge of the spatial arrangement of the &#945;-helices and &#946;-strands. Since this manual procedure cannot be directly reproduced from the input sequence, or even from its corresponding secondary structure sequence, a variety of methods that predict the structural class from the protein sequence were developed to facilitate automated, high-throughput assignment. 
We note that the manual assignment of structural classes in SCOP does not use the features or model applied in the proposed method, as the SCOP assignment is based on the spatial arrangement of secondary structure segments, while our method is based on the flat secondary structure sequence.</p>
         <tbl id="T1">
            <title>
               <p>Table 1</p>
            </title>
            <caption>
               <p>Rules for assignment of structural classes based on the content of the corresponding secondary structures.</p>
            </caption>
            <tblbdy cols="5">
               <r>
                  <c ca="left">
                     <p>reference</p>
                  </c>
                  <c ca="center">
                     <p>structural class</p>
                  </c>
                  <c ca="center">
                     <p>&#945;-helix content</p>
                  </c>
                  <c ca="center">
                     <p>&#946;-strand content</p>
                  </c>
                  <c ca="left">
                     <p>additional constraints</p>
                  </c>
               </r>
               <r>
                  <c cspan="5">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>[13]</p>
                  </c>
                  <c ca="center">
                     <p>all-&#945;</p>
                  </c>
                  <c ca="center">
                     <p>&#8805; 40%</p>
                  </c>
                  <c ca="center">
                     <p>&#8804; 5%</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>all-&#946;</p>
                  </c>
                  <c ca="center">
                     <p>&#8804; 5%</p>
                  </c>
                  <c ca="center">
                     <p>&#8805; 40%</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>&#945;+&#946;</p>
                  </c>
                  <c ca="center">
                     <p>&#8805; 15%</p>
                  </c>
                  <c ca="center">
                     <p>&#8805; 15%</p>
                  </c>
                  <c ca="left">
                     <p>more than 60% antiparallel &#946;-sheets</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>&#945;/&#946;</p>
                  </c>
                  <c ca="center">
                     <p>&#8805; 15%</p>
                  </c>
                  <c ca="center">
                     <p>&#8805; 15%</p>
                  </c>
                  <c ca="left">
                     <p>more than 60% parallel &#946;-sheets</p>
                  </c>
               </r>
               <r>
                  <c cspan="5">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="left">
                     <p>[14]</p>
                  </c>
                  <c ca="center">
                     <p>all-&#945;</p>
                  </c>
                  <c ca="center">
                     <p>> 15%</p>
                  </c>
                  <c ca="center">
                     <p>&lt; 10%</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>all-&#946;</p>
                  </c>
                  <c ca="center">
                     <p>&lt; 15%</p>
                  </c>
                  <c ca="center">
                     <p>> 10%</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>mixed</p>
                  </c>
                  <c ca="center">
                     <p>> 15%</p>
                  </c>
                  <c ca="center">
                     <p>> 10%</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
            </tblbdy>
         </tbl>
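         <p>The content thresholds in Table 1 translate directly into simple rule-based classifiers. The Python sketch below is illustrative only and is not taken from [13] or [14]: the function names are hypothetical, the helix and strand contents are assumed to be given as percentages of residues, the parallel and antiparallel percentages required by the rules of [13] are assumed to be supplied from a known structure, and inputs that no rule in the table covers (including values lying exactly on a strict threshold of [14]) return None.</p>
         <p>
```python
def assign_chou(helix, strand, parallel=0.0, antiparallel=0.0):
    """Content rules attributed to [13] in Table 1.

    helix, strand: helix and strand content, in percent of residues.
    parallel, antiparallel: percent of beta-sheet residues found in parallel
    or antiparallel sheets; these require the known 3D structure.
    Returns a class label, or None when no rule applies.
    """
    if helix >= 40 and 5 >= strand:
        return "all-alpha"
    if 5 >= helix and strand >= 40:
        return "all-beta"
    if helix >= 15 and strand >= 15:
        if antiparallel > 60:
            return "alpha+beta"
        if parallel > 60:
            return "alpha/beta"
    return None


def assign_eisenhaber(helix, strand):
    """Content-only rules attributed to [14] in Table 1 (percent of residues).

    The alpha/beta and alpha+beta classes are merged into "mixed".
    Returns None for inputs no rule covers, e.g. low helix and low strand
    content, or values lying exactly on a strict threshold.
    """
    if helix > 15 and 10 > strand:
        return "all-alpha"
    if 15 > helix and strand > 10:
        return "all-beta"
    if helix > 15 and strand > 10:
        return "mixed"
    return None


# Example: a chain with 30% helix and 20% strand content.
print(assign_eisenhaber(30.0, 20.0))  # mixed
```
         </p>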
         <p>Prediction of the structural classes is performed in two steps: 1) the AA sequences are transformed into fixed-length feature vectors; 2) the feature vectors are fed into a classification algorithm to generate a prediction outcome. Numerous in-silico structural class prediction methods have been developed. The majority of them use relatively simple features such as the composition vector, auto-correlation functions based on non-bonded residue energy, polypeptide composition, pseudo AA composition <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, and complexity measure factor <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr></abbrgrp>. Several recent methods use more advanced feature vectors that either combine physicochemical properties and sequence composition, or optimize a selected type of features <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr></abbrgrp>. 
Predictions are performed using a wide range of classification algorithms such as fuzzy clustering <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>, neural networks <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>, Bayesian classification <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>, rough sets <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>, component-coupled algorithms <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp>, information discrepancy <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>, logistic regression <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr></abbrgrp>, decision trees <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B34">34</abbr></abbrgrp>, and support vector machines <abbrgrp><abbr bid="B27">27</abbr><abbr bid="B34">34</abbr><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr></abbrgrp>. More recent works explored complex classification models such as ensembles <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>, bagging <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>, and boosting <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B37">37</abbr></abbrgrp>. Unfortunately, some of these methods were tested on small datasets, often with relatively high sequence identity, which resulted in high reported prediction accuracy <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. A recent review by Chou provides further details and motivation for the development of structural class prediction methods <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. A feasible alternative to the above methods is to use the predicted secondary structure, which can be obtained with over 80% accuracy for highly similar sequences <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>, to assign the corresponding structural classes, e.g., by using one of the abovementioned assignment methods. 
The main drawback is that, in this case, the prediction would cover only three classes: all-&#945;, all-&#946;, and mixed (which combines the &#945;/&#946; and &#945;+&#946; classes).</p>
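         <p>The first step of this pipeline, mapping a sequence of arbitrary length to a fixed-length vector, can be sketched with two of the simplest encodings: the 20-dimensional amino acid composition vector and the helix and strand content of a predicted secondary structure. The Python sketch below is a hedged illustration rather than the feature extraction of any cited method; it assumes a three-state prediction written with the letters H, E and C, as produced by PSI-PRED, and the function names and toy input are hypothetical. The content values are the quantities on which content-based rules such as those of [14] operate, which is why such rules cannot separate the &#945;/&#946; and &#945;+&#946; classes.</p>
         <p>
```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_composition(seq):
    """Composition vector: fraction of each standard amino acid in seq.

    Letters outside the 20 standard residues are ignored.
    """
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / n for a in AMINO_ACIDS]


def ss_content(ss):
    """Helix and strand content (percent) of a three-state H/E/C string."""
    if not ss:
        return 0.0, 0.0
    n = len(ss)
    return 100.0 * ss.count("H") / n, 100.0 * ss.count("E") / n


def feature_vector(seq, ss):
    """Fixed-length vector (20 + 2 values) regardless of the chain length."""
    helix, strand = ss_content(ss)
    return aa_composition(seq) + [helix, strand]


# Hypothetical toy input: a sequence and a matching predicted H/E/C string.
vec = feature_vector("MKTAYIAKQR", "CHHHHHHEEC")
print(len(vec))  # 22
```
         </p>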
         <p>Developing high-quality prediction methods for sequences that share low identity with the sequences used for prediction remains a challenging task. The majority of current secondary structure prediction methods use sequence alignment that requires at least ~30% identity between the query sequence and the sequence(s) used to predict its structure <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. More than 95% of protein chains characterized by a lower pairwise identity of 20&#8211;25%, also referred to as twilight-zone similarity, have different structures <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>, which substantially reduces the accuracy of the corresponding predictions. For instance, recent research shows that the accuracy of secondary structure prediction methods trained and tested on a protein set in which any pair of sequences shares twilight-zone similarity drops to 65&#8211;68% <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>. Similarly, although structural class prediction accuracies reach over 90% for datasets in which training and test sequences share high pairwise sequence identity, they drop to 57&#8211;63% when training and testing are performed using datasets in which any pair of sequences has twilight-zone similarity <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B29">29</abbr><abbr bid="B32">32</abbr></abbrgrp>. At the same time, about 40% of the sequences whose tertiary structure was deposited in the Protein Data Bank (PDB) in 2005 share twilight-zone pairwise similarity with any sequence deposited in the PDB before 2005 <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, which motivates the development of prediction methods for these challenging sequences. Most importantly, pairs of sequences with low identity can share similar folding patterns or overall structure <abbrgrp><abbr bid="B42">42</abbr><abbr bid="B43">43</abbr></abbrgrp> and can be used to predict tertiary structure <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>. 
Research also shows that finding similar folding patterns among the proteins characterized by low sequence identity is beneficial for reconstruction of the tertiary structure <abbrgrp><abbr bid="B45">45</abbr><abbr bid="B46">46</abbr></abbrgrp>.</p>
         <p>Our motivation is twofold: the large number of protein chains of interest to biologists (which are being deposited in the PDB) that share twilight-zone pairwise identity with chains of known structure, and the potential structural similarities between these protein sequences, which can be exploited to build more accurate structure prediction methods. One way to improve predictions for sequences that share twilight-zone pairwise identity with the sequences used to perform predictions is to use a large library of reference functional sequence motifs to build a feature vector that can provide higher accuracy. Such a method, which uses 7785 features, was proposed in <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>. Our goal is to introduce a novel in-silico method that uses a compact and intuitive feature vector to provide accurate prediction of the structural classes for sequences that have twilight-zone pairwise identity with the sequences used to perform predictions, which in turn could be used to find structurally similar proteins that share low sequence similarity.</p>
         <p>The proposed method, named SCPRED, uses a custom-designed feature vector that includes 9 features and a support vector machine classifier to generate predictions. Our method exploits the fact that the structural classes are defined in terms of the secondary structure, although we note that the assignment in SCOP is based on the spatial arrangement of the secondary structure elements, while our method uses only the secondary structure sequence. We use the secondary structure predicted from the protein sequence by PSI-PRED <abbrgrp><abbr bid="B48">48</abbr><abbr bid="B49">49</abbr></abbrgrp> to develop a novel set of features that allow accurate classification of all four structural classes. These features, together with a comprehensive set of features used in prior research, are used to design, by means of feature selection, a compact and well-performing feature vector. We also demonstrate that SCPRED can be applied to improve the performance of other related prediction methods. Our tests show that coupling the proposed method as a post-processing filter with state-of-the-art fold classification methods such as PFP <abbrgrp><abbr bid="B50">50</abbr></abbrgrp> and PFRES <abbrgrp><abbr bid="B51">51</abbr></abbrgrp> improves their performance.</p>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <p>The experimental evaluation was performed using 10-fold cross-validation and jackknife tests to avoid overfitting and to ensure the statistical validity of the results <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B26">26</abbr><abbr bid="B52">52</abbr></abbrgrp>. The tests were performed on the 25PDB dataset, which includes 1673 sequences in which any pair of sequences shares twilight-zone similarity. We also use another low-identity dataset, FC699, to evaluate the benefit of using SCPRED's predictions to improve the accuracy of protein fold predictions performed with the PFP and PFRES methods. The reported results include the overall accuracy (the number of correct predictions divided by the total number of test sequences), the accuracy for each structural class (the number of correct predictions for a given class divided by the number of sequences from this class), the Matthews correlation coefficient (MCC) for each structural class, and the generalized squared correlation (GC<sup>2</sup>). The MCC values range between -1 and 1, where 0 corresponds to random prediction, and larger positive (negative) values indicate better (worse) prediction quality for a given class. Since MCC is a binary, per-class measure, we also report GC<sup>2</sup>, which is based on the &#967;<sup>2</sup> statistic and evaluates all classes jointly. The GC<sup>2</sup> values range between 0 and 1, where 0 corresponds to the worst classification (predictions independent of the actual classes) and 1 corresponds to perfect classification. MCC and GC<sup>2</sup> are described in <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>.</p>
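         <p>The evaluation measures listed above can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: it assumes a confusion matrix whose rows index the true class and whose columns index the predicted class, computes MCC for each class against all other classes, returns 0 for MCC when its denominator vanishes (a common convention), and computes GC<sup>2</sup> as &#967;<sup>2</sup>/(N(K-1)), where N is the number of test sequences, K is the number of classes, and the expected count of each cell is the product of its row and column totals divided by N. We assume this matches the definition in [53]; the helper names and the toy labels are hypothetical.</p>
         <p>
```python
import math

CLASSES = ["all-alpha", "all-beta", "alpha/beta", "alpha+beta"]


def confusion_matrix(true, pred, classes=CLASSES):
    """z[i][j] = number of chains of true class i predicted as class j."""
    index = {c: i for i, c in enumerate(classes)}
    z = [[0] * len(classes) for _ in classes]
    for t, p in zip(true, pred):
        z[index[t]][index[p]] += 1
    return z


def overall_accuracy(z):
    """Correct predictions divided by the total number of test chains."""
    n = sum(map(sum, z))
    return sum(z[i][i] for i in range(len(z))) / n


def class_accuracy(z, i):
    """Correct predictions for class i divided by the size of class i."""
    return z[i][i] / sum(z[i])


def mcc(z, i):
    """Matthews correlation coefficient of class i versus all other classes."""
    n = sum(map(sum, z))
    tp = z[i][i]
    fn = sum(z[i]) - tp
    fp = sum(row[i] for row in z) - tp
    tn = n - tp - fn - fp
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def gc2(z):
    """Generalized squared correlation: chi^2 / (N * (K - 1))."""
    k = len(z)
    n = sum(map(sum, z))
    rows = [sum(r) for r in z]
    cols = [sum(z[i][j] for i in range(k)) for j in range(k)]
    chi2 = 0.0
    for i in range(k):
        for j in range(k):
            expected = rows[i] * cols[j] / n
            if expected:  # a cell with zero expected count is itself empty
                chi2 += (z[i][j] - expected) ** 2 / expected
    return chi2 / (n * (k - 1))


true = ["all-alpha", "all-alpha", "all-beta", "all-beta"]
pred = ["all-alpha", "all-beta", "all-beta", "all-beta"]
z = confusion_matrix(true, pred)
print(overall_accuracy(z))  # 0.75
```
         </p>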
         <p>We note that current secondary structure prediction methods achieve an average accuracy close to 80%; e.g., the EVA server reports that PSI-PRED provides an average accuracy of 77.9% for 224 proteins (tested between April 2001 and September 2005) <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>. Since the average accuracy of PSI-PRED predictions on the 25PDB and FC699 datasets (77.9% and 77.5%, respectively) is consistent with this independently benchmarked accuracy, we believe that the presented results provide a reliable estimate of the future performance of the proposed method.</p>
         <sec>
            <st>
               <p>Comparison with structural class prediction methods</p>
            </st>
            <p>SCPRED was comprehensively compared with over a dozen competing structural class prediction methods, which use various feature vectors and classifiers. The comparison includes three groups of modern methods:</p>
            <p>- methods that apply optimized feature vectors <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>,</p>
            <p>- advanced multi-classifier methods including boosting <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, ensembles <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>, and bagging <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>,</p>
            <p>- methods that use the best performing SVM <abbrgrp><abbr bid="B36">36</abbr></abbrgrp> and information discrepancy based classifiers <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B24">24</abbr></abbrgrp>.</p>
            <p>Classification results for the competing methods and SCPRED are compared in Table <tblr tid="T2">2</tblr>. SCPRED, which uses only 9 features, obtained 80% accuracy in both out-of-sample tests. The second-best method, which was also designed using the 25PDB dataset (in which training and test sequences share twilight-zone identity) <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, obtained 63% accuracy. The remaining competing methods obtain accuracies between 35% and 60%. The relatively low accuracies obtained by the competing methods are due to the challenging nature of the 25PDB dataset <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. We note that some of these methods <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr><abbr bid="B26">26</abbr><abbr bid="B32">32</abbr><abbr bid="B34">34</abbr></abbrgrp> were originally tested on datasets characterized by higher sequence similarity, which resulted in higher reported accuracies. The methods that reach 60% accuracy are based on custom-designed feature vectors that include sequence composition and physicochemical properties <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. We observe that the use of simple, composition-based features results in lower accuracy. The results also show that the SVM and logistic regression classifiers perform well on this challenging problem.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Experimental comparison between SCPRED and competing structural class prediction methods.</p>
               </caption>
               <tblbdy cols="14">
                  <r>
                     <c ca="center">
                        <p>Test type</p>
                     </c>
                     <c ca="left">
                        <p>Algorithm</p>
                     </c>
                     <c ca="left">
                        <p>Feature vector (# features)</p>
                     </c>
                     <c ca="center">
                        <p>Reference</p>
                     </c>
                     <c cspan="5" ca="center">
                        <p>Accuracy</p>
                     </c>
                     <c cspan="4" ca="center">
                        <p>MCC</p>
                     </c>
                     <c ca="center">
                        <p>GC<sup>2</sup></p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="9">
                        <hr/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>all-&#945;</p>
                     </c>
                     <c ca="center">
                        <p>all-&#946;</p>
                     </c>
                     <c ca="center">
                        <p>&#945;/&#946;</p>
                     </c>
                     <c ca="center">
                        <p>&#945;+&#946;</p>
                     </c>
                     <c ca="center">
                        <p>overall</p>
                     </c>
                     <c ca="center">
                        <p>all-&#945;</p>
                     </c>
                     <c ca="center">
                        <p>all-&#946;</p>
                     </c>
                     <c ca="center">
                        <p>&#945;/&#946;</p>
                     </c>
                     <c ca="center">
                        <p>&#945;+&#946;</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c cspan="14">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>Jackknife</p>
                     </c>
                     <c ca="left">
                        <p>SVM (Gaussian kernel)</p>
                     </c>
                     <c ca="left">
                        <p>CV (20)</p>
                     </c>
                     <c ca="center">
                        <p>[36]</p>
                     </c>
                     <c ca="center">
                        <p>68.6</p>
                     </c>
                     <c ca="center">
                        <p>59.6</p>
                     </c>
                     <c ca="center">
                        <p>59.8</p>
                     </c>
                     <c ca="center">
                        <p>28.6</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>53.9</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.52</p>
                     </c>
                     <c ca="center">
                        <p>0.42</p>
                     </c>
                     <c ca="center">
                        <p>0.43</p>
                     </c>
                     <c ca="center">
                        <p>0.15</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.17</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>LogitBoost with decision tree</p>
                     </c>
                     <c ca="left">
                        <p>CV (20)</p>
                     </c>
                     <c ca="center">
                        <p>[23]</p>
                     </c>
                     <c ca="center">
                        <p>56.9</p>
                     </c>
                     <c ca="center">
                        <p>51.5</p>
                     </c>
                     <c ca="center">
                        <p>45.4</p>
                     </c>
                     <c ca="center">
                        <p>30.2</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>46.0</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.41</p>
                     </c>
                     <c ca="center">
                        <p>0.32</p>
                     </c>
                     <c ca="center">
                        <p>0.32</p>
                     </c>
                     <c ca="center">
                        <p>0.06</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.10</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Bagging with random tree</p>
                     </c>
                     <c ca="left">
                        <p>CV (20)</p>
                     </c>
                     <c ca="center">
                        <p>[34]</p>
                     </c>
                     <c ca="center">
                        <p>58.7</p>
                     </c>
                     <c ca="center">
                        <p>47.0</p>
                     </c>
                     <c ca="center">
                        <p>35.5</p>
                     </c>
                     <c ca="center">
                        <p>24.7</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>41.8</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.33</p>
                     </c>
                     <c ca="center">
                        <p>0.26</p>
                     </c>
                     <c ca="center">
                        <p>0.22</p>
                     </c>
                     <c ca="center">
                        <p>0.06</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.06</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>LogitBoost with decision stump</p>
                     </c>
                     <c ca="left">
                        <p>CV (20)</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>62.8</p>
                     </c>
                     <c ca="center">
                        <p>52.6</p>
                     </c>
                     <c ca="center">
                        <p>50.0</p>
                     </c>
                     <c ca="center">
                        <p>32.4</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>49.4</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.49</p>
                     </c>
                     <c ca="center">
                        <p>0.35</p>
                     </c>
                     <c ca="center">
                        <p>0.34</p>
                     </c>
                     <c ca="center">
                        <p>0.11</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.13</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>SVM (3<sup>rd</sup> order polyn. kernel)</p>
                     </c>
                     <c ca="left">
                        <p>CV (20)</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>61.2</p>
                     </c>
                     <c ca="center">
                        <p>53.5</p>
                     </c>
                     <c ca="center">
                        <p>57.2</p>
                     </c>
                     <c ca="center">
                        <p>27.7</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>49.5</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.46</p>
                     </c>
                     <c ca="center">
                        <p>0.35</p>
                     </c>
                     <c ca="center">
                        <p>0.39</p>
                     </c>
                     <c ca="center">
                        <p>0.11</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.13</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Multinomial logistic regression</p>
                     </c>
                     <c ca="left">
                        <p>custom dipeptides (16)</p>
                     </c>
                     <c ca="center">
                        <p>[28]</p>
                     </c>
                     <c ca="center">
                        <p>56.2</p>
                     </c>
                     <c ca="center">
                        <p>44.5</p>
                     </c>
                     <c ca="center">
                        <p>41.3</p>
                     </c>
                     <c ca="center">
                        <p>18.8</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>40.2</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.23</p>
                     </c>
                     <c ca="center">
                        <p>0.20</p>
                     </c>
                     <c ca="center">
                        <p>0.31</p>
                     </c>
                     <c ca="center">
                        <p>0.06</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.05</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Information discrepancy<sup>1</sup></p>
                     </c>
                     <c ca="left">
                        <p>dipeptides (400)</p>
                     </c>
                     <c ca="center">
                        <p>[22, 24]</p>
                     </c>
                     <c ca="center">
                        <p>59.6</p>
                     </c>
                     <c ca="center">
                        <p>54.2</p>
                     </c>
                     <c ca="center">
                        <p>47.1</p>
                     </c>
                     <c ca="center">
                        <p>23.5</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>47.0</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.46</p>
                     </c>
                     <c ca="center">
                        <p>0.40</p>
                     </c>
                     <c ca="center">
                        <p>0.24</p>
                     </c>
                     <c ca="center">
                        <p>0.04</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.12</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Information discrepancy<sup>1</sup></p>
                     </c>
                     <c ca="left">
                        <p>tripeptides (8000)</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>45.8</p>
                     </c>
                     <c ca="center">
                        <p>48.5</p>
                     </c>
                     <c ca="center">
                        <p>51.7</p>
                     </c>
                     <c ca="center">
                        <p>32.5</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>44.7</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.39</p>
                     </c>
                     <c ca="center">
                        <p>0.39</p>
                     </c>
                     <c ca="center">
                        <p>0.25</p>
                     </c>
                     <c ca="center">
                        <p>0.06</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.11</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Multinomial logistic regression</p>
                     </c>
                     <c ca="left">
                        <p>custom (34)</p>
                     </c>
                     <c ca="center">
                        <p>[27]</p>
                     </c>
                     <c ca="center">
                        <p>71.1</p>
                     </c>
                     <c ca="center">
                        <p>65.3</p>
                     </c>
                     <c ca="center">
                        <p>66.5</p>
                     </c>
                     <c ca="center">
                        <p>37.3</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>60.0</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.61</p>
                     </c>
                     <c ca="center">
                        <p>0.51</p>
                     </c>
                     <c ca="center">
                        <p>0.51</p>
                     </c>
                     <c ca="center">
                        <p>0.22</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.25</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>SVM with RBF kernel</p>
                     </c>
                     <c ca="left">
                        <p>custom (34)</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>69.7</p>
                     </c>
                     <c ca="center">
                        <p>62.1</p>
                     </c>
                     <c ca="center">
                        <p>67.1</p>
                     </c>
                     <c ca="center">
                        <p>39.3</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>59.5</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.60</p>
                     </c>
                     <c ca="center">
                        <p>0.50</p>
                     </c>
                     <c ca="center">
                        <p>0.53</p>
                     </c>
                     <c ca="center">
                        <p>0.21</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.25</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>StackingC ensemble</p>
                     </c>
                     <c ca="left">
                        <p>custom (34)</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>74.6</p>
                     </c>
                     <c ca="center">
                        <p>67.9</p>
                     </c>
                     <c ca="center">
                        <p>70.2</p>
                     </c>
                     <c ca="center">
                        <p>32.4</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>61.3</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.62</p>
                     </c>
                     <c ca="center">
                        <p>0.53</p>
                     </c>
                     <c ca="center">
                        <p>0.55</p>
                     </c>
                     <c ca="center">
                        <p>0.22</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.26</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Multinomial logistic regression</p>
                     </c>
                     <c ca="left">
                        <p>custom (66)</p>
                     </c>
                     <c ca="center">
                        <p>[26]</p>
                     </c>
                     <c ca="center">
                        <p>69.1</p>
                     </c>
                     <c ca="center">
                        <p>61.6</p>
                     </c>
                     <c ca="center">
                        <p>60.1</p>
                     </c>
                     <c ca="center">
                        <p>38.3</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>57.1</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.56</p>
                     </c>
                     <c ca="center">
                        <p>0.44</p>
                     </c>
                     <c ca="center">
                        <p>0.48</p>
                     </c>
                     <c ca="center">
                        <p>0.21</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.21</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>SVM (1<sup>st</sup> order polyn. kernel)</p>
                     </c>
                     <c ca="left">
                        <p>autocorrelation (30)</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>50.1</p>
                     </c>
                     <c ca="center">
                        <p>49.4</p>
                     </c>
                     <c ca="center">
                        <p>28.8</p>
                     </c>
                     <c ca="center">
                        <p>29.5</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>34.2</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.16</p>
                     </c>
                     <c ca="center">
                        <p>0.16</p>
                     </c>
                     <c ca="center">
                        <p>0.05</p>
                     </c>
                     <c ca="center">
                        <p>0.05</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.02</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>SVM (1<sup>st</sup> order polyn. kernel)</p>
                     </c>
                     <c ca="left">
                        <p>custom (58)</p>
                     </c>
                     <c ca="center">
                        <p>[29]</p>
                     </c>
                     <c ca="center">
                        <p>77.4</p>
                     </c>
                     <c ca="center">
                        <p>66.4</p>
                     </c>
                     <c ca="center">
                        <p>61.3</p>
                     </c>
                     <c ca="center">
                        <p>45.4</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>62.7</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.65</p>
                     </c>
                     <c ca="center">
                        <p>0.54</p>
                     </c>
                     <c ca="center">
                        <p>0.55</p>
                     </c>
                     <c ca="center">
                        <p>0.27</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.28</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Linear logistic regression</p>
                     </c>
                     <c ca="left">
                        <p>custom (58)</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>75.2</p>
                     </c>
                     <c ca="center">
                        <p>67.5</p>
                     </c>
                     <c ca="center">
                        <p>62.1</p>
                     </c>
                     <c ca="center">
                        <p>44.0</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>62.2</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.63</p>
                     </c>
                     <c ca="center">
                        <p>0.54</p>
                     </c>
                     <c ca="center">
                        <p>0.54</p>
                     </c>
                     <c ca="center">
                        <p>0.27</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.27</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>SVM (Gaussian kernel)</p>
                     </c>
                     <c ca="left">
                        <p>PSI-PRED based (13)</p>
                     </c>
                     <c ca="center">
                        <p>this paper</p>
                     </c>
                     <c ca="center">
                        <p>92.6</p>
                     </c>
                     <c ca="center">
                        <p>79.8</p>
                     </c>
                     <c ca="center">
                        <p>74.9</p>
                     </c>
                     <c ca="center">
                        <p>69.0</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>79.3</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.87</p>
                     </c>
                     <c ca="center">
                        <p>0.79</p>
                     </c>
                     <c ca="center">
                        <p>0.68</p>
                     </c>
                     <c ca="center">
                        <p>0.55</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.55</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>SVM (Gaussian kernel)</p>
                     </c>
                     <c ca="left">
                        <p>custom (8 PSI-PRED based)</p>
                     </c>
                     <c ca="center">
                        <p>this paper</p>
                     </c>
                     <c ca="center">
                        <p>92.6</p>
                     </c>
                     <c ca="center">
                        <p>80.6</p>
                     </c>
                     <c ca="center">
                        <p>73.4</p>
                     </c>
                     <c ca="center">
                        <p>68.5</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>79.1</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.87</p>
                     </c>
                     <c ca="center">
                        <p>0.79</p>
                     </c>
                     <c ca="center">
                        <p>0.67</p>
                     </c>
                     <c ca="center">
                        <p>0.54</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.54</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <b>SCPRED</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>custom (9)</p>
                     </c>
                     <c ca="center">
                        <p>this paper</p>
                     </c>
                     <c ca="center">
                        <p>92.6</p>
                     </c>
                     <c ca="center">
                        <p>80.1</p>
                     </c>
                     <c ca="center">
                        <p>74.0</p>
                     </c>
                     <c ca="center">
                        <p>71.0</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>79.7</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.87</p>
                     </c>
                     <c ca="center">
                        <p>0.79</p>
                     </c>
                     <c ca="center">
                        <p>0.69</p>
                     </c>
                     <c ca="center">
                        <p>0.57</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.55</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="14">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>10-fold cross validation</p>
                     </c>
                     <c ca="left">
                        <p>SVM (Gaussian kernel)</p>
                     </c>
                     <c ca="left">
                        <p>CV (20)</p>
                     </c>
                     <c ca="center">
                        <p>[36]</p>
                     </c>
                     <c ca="center">
                        <p>67.9</p>
                     </c>
                     <c ca="center">
                        <p>59.1</p>
                     </c>
                     <c ca="center">
                        <p>58.1</p>
                     </c>
                     <c ca="center">
                        <p>27.7</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>53.0</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.51</p>
                     </c>
                     <c ca="center">
                        <p>0.42</p>
                     </c>
                     <c ca="center">
                        <p>0.41</p>
                     </c>
                     <c ca="center">
                        <p>0.14</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.16</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>LogitBoost with decision tree</p>
                     </c>
                     <c ca="left">
                        <p>CV (20)</p>
                     </c>
                     <c ca="center">
                        <p>[23]</p>
                     </c>
                     <c ca="center">
                        <p>51.9</p>
                     </c>
                     <c ca="center">
                        <p>53.7</p>
                     </c>
                     <c ca="center">
                        <p>46.5</p>
                     </c>
                     <c ca="center">
                        <p>32.4</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>46.1</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.38</p>
                     </c>
                     <c ca="center">
                        <p>0.37</p>
                     </c>
                     <c ca="center">
                        <p>0.31</p>
                     </c>
                     <c ca="center">
                        <p>0.07</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.10</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Bagging with random tree</p>
                     </c>
                     <c ca="left">
                        <p>CV (20)</p>
                     </c>
                     <c ca="center">
                        <p>[34]</p>
                     </c>
                     <c ca="center">
                        <p>53.5</p>
                     </c>
                     <c ca="center">
                        <p>51.0</p>
                     </c>
                     <c ca="center">
                        <p>37.6</p>
                     </c>
                     <c ca="center">
                        <p>22.0</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>41.2</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.28</p>
                     </c>
                     <c ca="center">
                        <p>0.30</p>
                     </c>
                     <c ca="center">
                        <p>0.22</p>
                     </c>
                     <c ca="center">
                        <p>0.04</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.06</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>LogitBoost with decision stump</p>
                     </c>
                     <c ca="left">
                        <p>CV (20)</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>63.2</p>
                     </c>
                     <c ca="center">
                        <p>53.5</p>
                     </c>
                     <c ca="center">
                        <p>50.9</p>
                     </c>
                     <c ca="center">
                        <p>32.4</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>50.0</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.48</p>
                     </c>
                     <c ca="center">
                        <p>0.36</p>
                     </c>
                     <c ca="center">
                        <p>0.36</p>
                     </c>
                     <c ca="center">
                        <p>0.12</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.14</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>SVM (3<sup>rd </sup>order polyn. kernel)</p>
                     </c>
                     <c ca="left">
                        <p>CV (20)</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>61.4</p>
                     </c>
                     <c ca="center">
                        <p>54.0</p>
                     </c>
                     <c ca="center">
                        <p>55.2</p>
                     </c>
                     <c ca="center">
                        <p>27.4</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>49.2</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.46</p>
                     </c>
                     <c ca="center">
                        <p>0.35</p>
                     </c>
                     <c ca="center">
                        <p>0.37</p>
                     </c>
                     <c ca="center">
                        <p>0.10</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.13</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Multinomial logistic regression</p>
                     </c>
                     <c ca="left">
                        <p>custom dipeptides (16)</p>
                     </c>
                     <c ca="center">
                        <p>[28]</p>
                     </c>
                     <c ca="center">
                        <p>56.9</p>
                     </c>
                     <c ca="center">
                        <p>44.2</p>
                     </c>
                     <c ca="center">
                        <p>42.2</p>
                     </c>
                     <c ca="center">
                        <p>17.7</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>40.2</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.24</p>
                     </c>
                     <c ca="center">
                        <p>0.20</p>
                     </c>
                     <c ca="center">
                        <p>0.32</p>
                     </c>
                     <c ca="center">
                        <p>0.04</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.06</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Multinomial logistic regression</p>
                     </c>
                     <c ca="left">
                        <p>custom (34)</p>
                     </c>
                     <c ca="center">
                        <p>[27]</p>
                     </c>
                     <c ca="center">
                        <p>69.9</p>
                     </c>
                     <c ca="center">
                        <p>65.3</p>
                     </c>
                     <c ca="center">
                        <p>66.5</p>
                     </c>
                     <c ca="center">
                        <p>38.4</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>60.0</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.60</p>
                     </c>
                     <c ca="center">
                        <p>0.52</p>
                     </c>
                     <c ca="center">
                        <p>0.51</p>
                     </c>
                     <c ca="center">
                        <p>0.23</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.25</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>SVM with RBF kernel</p>
                     </c>
                     <c ca="left">
                        <p>custom (34)</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>70.2</p>
                     </c>
                     <c ca="center">
                        <p>61.6</p>
                     </c>
                     <c ca="center">
                        <p>67.6</p>
                     </c>
                     <c ca="center">
                        <p>39.6</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>59.8</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.60</p>
                     </c>
                     <c ca="center">
                        <p>0.49</p>
                     </c>
                     <c ca="center">
                        <p>0.53</p>
                     </c>
                     <c ca="center">
                        <p>0.22</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.25</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>StackingC ensemble</p>
                     </c>
                     <c ca="left">
                        <p>custom (34)</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>73.4</p>
                     </c>
                     <c ca="center">
                        <p>67.3</p>
                     </c>
                     <c ca="center">
                        <p>69.1</p>
                     </c>
                     <c ca="center">
                        <p>29.8</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>59.9</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.59</p>
                     </c>
                     <c ca="center">
                        <p>0.52</p>
                     </c>
                     <c ca="center">
                        <p>0.54</p>
                     </c>
                     <c ca="center">
                        <p>0.18</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.25</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Multinomial logistic regression</p>
                     </c>
                     <c ca="left">
                        <p>custom (66)</p>
                     </c>
                     <c ca="center">
                        <p>[26]</p>
                     </c>
                     <c ca="center">
                        <p>69.1</p>
                     </c>
                     <c ca="center">
                        <p>60.5</p>
                     </c>
                     <c ca="center">
                        <p>59.5</p>
                     </c>
                     <c ca="center">
                        <p>38.1</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>56.7</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.56</p>
                     </c>
                     <c ca="center">
                        <p>0.44</p>
                     </c>
                     <c ca="center">
                        <p>0.48</p>
                     </c>
                     <c ca="center">
                        <p>0.20</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.21</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>SVM (1<sup>st </sup>order polyn. kernel)</p>
                     </c>
                     <c ca="left">
                        <p>autocorrelation (30)</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>52.4</p>
                     </c>
                     <c ca="center">
                        <p>49.7</p>
                     </c>
                     <c ca="center">
                        <p>0.3</p>
                     </c>
                     <c ca="center">
                        <p>30.4</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>35.1</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.18</p>
                     </c>
                     <c ca="center">
                        <p>0.16</p>
                     </c>
                     <c ca="center">
                        <p>0.05</p>
                     </c>
                     <c ca="center">
                        <p>0.06</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.02</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>SVM (1<sup>st </sup>order polyn. kernel)</p>
                     </c>
                     <c ca="left">
                        <p>custom (58)</p>
                     </c>
                     <c ca="center">
                        <p>[29]</p>
                     </c>
                     <c ca="center">
                        <p>77.7</p>
                     </c>
                     <c ca="center">
                        <p>66.8</p>
                     </c>
                     <c ca="center">
                        <p>60.7</p>
                     </c>
                     <c ca="center">
                        <p>45.4</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>62.8</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.64</p>
                     </c>
                     <c ca="center">
                        <p>0.54</p>
                     </c>
                     <c ca="center">
                        <p>0.54</p>
                     </c>
                     <c ca="center">
                        <p>0.28</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.28</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Linear logistic regression</p>
                     </c>
                     <c ca="left">
                        <p>custom (58)</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>74.7</p>
                     </c>
                     <c ca="center">
                        <p>66.4</p>
                     </c>
                     <c ca="center">
                        <p>62.7</p>
                     </c>
                     <c ca="center">
                        <p>45.8</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>62.4</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.63</p>
                     </c>
                     <c ca="center">
                        <p>0.54</p>
                     </c>
                     <c ca="center">
                        <p>0.54</p>
                     </c>
                     <c ca="center">
                        <p>0.27</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.28</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>SVM (Gaussian kernel)</p>
                     </c>
                     <c ca="left">
                        <p>PSI-PRED based (13)</p>
                     </c>
                     <c ca="center">
                        <p>this paper</p>
                     </c>
                     <c ca="center">
                        <p>93.2</p>
                     </c>
                     <c ca="center">
                        <p>79.5</p>
                     </c>
                     <c ca="center">
                        <p>75.7</p>
                     </c>
                     <c ca="center">
                        <p>69.4</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>79.7</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.87</p>
                     </c>
                     <c ca="center">
                        <p>0.79</p>
                     </c>
                     <c ca="center">
                        <p>0.70</p>
                     </c>
                     <c ca="center">
                        <p>0.55</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.55</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>SVM (Gaussian kernel)</p>
                     </c>
                     <c ca="left">
                        <p>custom (8 PSI-PRED based)</p>
                     </c>
                     <c ca="center">
                        <p>this paper</p>
                     </c>
                     <c ca="center">
                        <p>92.5</p>
                     </c>
                     <c ca="center">
                        <p>80.4</p>
                     </c>
                     <c ca="center">
                        <p>73.7</p>
                     </c>
                     <c ca="center">
                        <p>68.0</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>79.0</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.87</p>
                     </c>
                     <c ca="center">
                        <p>0.79</p>
                     </c>
                     <c ca="center">
                        <p>0.67</p>
                     </c>
                     <c ca="center">
                        <p>0.54</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.54</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <b>SCPRED</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>custom (9)</p>
                     </c>
                     <c ca="center">
                        <p>this paper</p>
                     </c>
                     <c ca="center">
                        <p>92.8</p>
                     </c>
                     <c ca="center">
                        <p>80.6</p>
                     </c>
                     <c ca="center">
                        <p>74.3</p>
                     </c>
                     <c ca="center">
                        <p>71.4</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>80.1</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.87</p>
                     </c>
                     <c ca="center">
                        <p>0.79</p>
                     </c>
                     <c ca="center">
                        <p>0.70</p>
                     </c>
                     <c ca="center">
                        <p>0.57</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>0.56</b>
                        </p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p><sup>1</sup>This method was not originally tested using 10-fold cross validation, and thus we do not report these results.</p>
               </tblfn>
            </tbl>
            <p>The most accurate predictions are obtained for the all-&#945; class (over 92% accuracy), while the best results for the all-&#946; and &#945;/&#946; classes are about 81% and 76%, respectively. The best accuracy for the &#945;+&#946; class is about 71%. A similar trend is observed for the other tested methods, although their accuracies are lower. The main reason for the good performance on the all-&#945; class is that these sequences are rich in &#945;-helices, which are the easiest secondary structures to predict: a helix consists of a single contiguous segment and has a regular, repetitive structure.</p>
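            <p>The overall accuracy in Table <tblr tid="T2">2</tblr> can be read as the class-size-weighted mean of the four per-class accuracies. The minimal sketch below illustrates this relation; it assumes the 25PDB class sizes of 443 all-&#945;, 443 all-&#946;, 346 &#945;/&#946; and 441 &#945;+&#946; sequences, and the function name is hypothetical rather than part of SCPRED.</p>
```python
# Assumed 25PDB class sizes (443 + 443 + 346 + 441 = 1673 sequences).
CLASS_SIZES = {"all-alpha": 443, "all-beta": 443, "alpha/beta": 346, "alpha+beta": 441}


def overall_accuracy(per_class_accuracy):
    """Overall accuracy (%) as the class-size-weighted mean of per-class accuracies (%)."""
    if set(per_class_accuracy) != set(CLASS_SIZES):
        raise ValueError("expected one accuracy per structural class")
    total = sum(CLASS_SIZES.values())
    # Number of correctly predicted sequences implied by each per-class accuracy.
    correct = sum(CLASS_SIZES[c] * acc / 100.0 for c, acc in per_class_accuracy.items())
    return 100.0 * correct / total


# Per-class accuracies of SCPRED copied from Table 2.
scpred = {"all-alpha": 92.8, "all-beta": 80.6, "alpha/beta": 74.3, "alpha+beta": 71.4}
print(round(overall_accuracy(scpred), 1))  # 80.1
```
            <p>Because the per-class accuracies in the table are themselves rounded, a recomputed overall value may differ from the reported one by about 0.1%.</p>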
            <p>Table <tblr tid="T2">2</tblr> also shows results for the same SVM classifier as used in SCPRED when it is applied only to features based on the secondary structure predicted with PSI-PRED (the "SVM (Gaussian kernel); PSI-PRED based (13)" row in Table <tblr tid="T2">2</tblr>). In this case, the input vector for the SVM includes 13 features. SCPRED, which uses features based on both the sequence and the predicted secondary structure, relies on a smaller feature set (9 versus 13 features) and achieves slightly higher overall accuracy, with an improvement of 0.4%. Because the differences are small, they indicate that the primary source of information behind the accurate predictions is the secondary structure predicted with PSI-PRED.</p>
            <p>We also performed an experiment in which only the 8 PSI-PRED based features from the sequence representation used by SCPRED were used for prediction (the "SVM (Gaussian kernel); custom (8 PSI-PRED based)" row in Table <tblr tid="T2">2</tblr>). In this case, the overall prediction accuracy decreased by 1.1% (from 80.1% to 79.0%) when compared with SCPRED, which again confirms that the predicted secondary structure provides the bulk of the useful information for the proposed prediction method. The main difference concerns the &#945;+&#946; class (71.4% versus 68.0%), where SCPRED obtains better results due to the use of the <it>CV</it><sub><it>L</it>---<it>G </it></sub>feature (see the Analysis of the proposed feature vector section for more details).</p>
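            <p>The accuracy differences discussed above follow directly from the rows of Table <tblr tid="T2">2</tblr>. The short sketch below recomputes them, in percentage points, for each class and overall; the dictionary keys and the helper name are illustrative only.</p>
```python
# Accuracies (%) copied from Table 2, in the order:
# all-alpha, all-beta, alpha/beta, alpha+beta, overall.
ROWS = {
    "SCPRED, custom (9)": (92.8, 80.6, 74.3, 71.4, 80.1),
    "SVM, PSI-PRED based (13)": (93.2, 79.5, 75.7, 69.4, 79.7),
    "SVM, custom (8 PSI-PRED based)": (92.5, 80.4, 73.7, 68.0, 79.0),
}


def accuracy_gain(reference, other):
    """Per-column accuracy difference (percentage points) of `reference` over `other`."""
    return tuple(round(r - o, 1) for r, o in zip(ROWS[reference], ROWS[other]))


print(accuracy_gain("SCPRED, custom (9)", "SVM, PSI-PRED based (13)"))
print(accuracy_gain("SCPRED, custom (9)", "SVM, custom (8 PSI-PRED based)"))
```
            <p>The last column gives the overall gains of 0.4% and 1.1%, and the fourth column shows the 3.4% advantage of SCPRED on the &#945;+&#946; class over the 8-feature variant.</p>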
            <p>The results show that the proposed feature vector substantially improves the ability of the classifier to separate the structural classes, and that SCPRED provides better predictions than modern competing methods.</p>
         </sec>
         <sec>
            <st>
               <p>Comparison with predictions based on secondary structure predicted with PSI-PRED</p>
            </st>
            <p>Since SCPRED's predictions use the predicted secondary structure, we also compared our method with structural class assignment methods that are based on the secondary structure. We note that the assignment method by Chou <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> requires knowledge of the tertiary structure to differentiate between the &#945;/&#946; and &#945;+&#946; classes, and the method by Eisenhaber and colleagues <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> combines these two classes into a single mixed class. Therefore, the assignment was performed assuming only three structural classes: all-&#945;, all-&#946;, and mixed (the &#945;/&#946; and &#945;+&#946; classes combined). Both assignment methods were applied to the secondary structure predicted with PSI-PRED, which is also used to compute the features of the proposed SCPRED method. The corresponding results on the 25PDB dataset are compared in Table <tblr tid="T3">3</tblr>. Since the assignment methods use only the predicted secondary structure, i.e., there is no model to train, they do not require out-of-sample testing.</p>
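            <p>To compare SCPRED with the three-class assignments, its four-class predictions are scored after merging &#945;/&#946; and &#945;+&#946; into the mixed class. The sketch below shows one way to do this; the mapping, the function name and the toy labels are hypothetical, and the code does not implement the assignment methods <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp> themselves, which assign classes from the secondary structure.</p>
```python
# Collapse the four structural classes into the three classes used in Table 3.
TO_THREE_CLASSES = {
    "all-alpha": "all-alpha",
    "all-beta": "all-beta",
    "alpha/beta": "mixed",
    "alpha+beta": "mixed",
}


def three_class_accuracy(true_labels, predicted_labels):
    """Per-class and overall accuracy (%) after merging alpha/beta and alpha+beta."""
    hits, totals = {}, {}
    for t, p in zip(true_labels, predicted_labels):
        t3, p3 = TO_THREE_CLASSES[t], TO_THREE_CLASSES[p]
        totals[t3] = totals.get(t3, 0) + 1
        # A confusion between alpha/beta and alpha+beta now counts as correct.
        hits[t3] = hits.get(t3, 0) + (t3 == p3)
    per_class = {c: 100.0 * hits[c] / totals[c] for c in totals}
    overall = 100.0 * sum(hits.values()) / sum(totals.values())
    return per_class, overall


# Toy example (made-up labels, for illustration only).
true_labels = ["all-alpha", "alpha/beta", "alpha+beta", "all-beta"]
predicted_labels = ["all-alpha", "alpha+beta", "all-beta", "all-beta"]
per_class, overall = three_class_accuracy(true_labels, predicted_labels)
four_class = 100.0 * sum(t == p for t, p in zip(true_labels, predicted_labels)) / len(true_labels)
print(four_class, overall)  # 50.0 75.0
```
            <p>This merging explains why the mixed-class accuracy of SCPRED in Table <tblr tid="T3">3</tblr> is higher than its separate &#945;/&#946; and &#945;+&#946; accuracies in Table <tblr tid="T2">2</tblr>: errors between the two mixed classes are no longer counted.</p>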
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Experimental comparison between SCPRED and structural class assignment methods based on the secondary structure predicted with PSI-PRED.</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>Prediction/assignment method</p>
                     </c>
                     <c cspan="4" ca="center">
                        <p>Accuracy</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>all-&#945;</p>
                     </c>
                     <c ca="center">
                        <p>all-&#946;</p>
                     </c>
                     <c ca="center">
                        <p>mixed</p>
                     </c>
                     <c ca="center">
                        <p>overall</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[13]</p>
                     </c>
                     <c ca="center">
                        <p>78.8</p>
                     </c>
                     <c ca="center">
                        <p>30.2</p>
                     </c>
                     <c ca="center">
                        <p>66.7</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>60.3</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>[14]</p>
                     </c>
                     <c ca="center">
                        <p>91.6</p>
                     </c>
                     <c ca="center">
                        <p>73.1</p>
                     </c>
                     <c ca="center">
                        <p>86.8</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>84.5</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>SCPRED (10-fold cross validation)</p>
                     </c>
                     <c ca="center">
                        <p>92.8</p>
                     </c>
                     <c ca="center">
                        <p>80.6</p>
                     </c>
                     <c ca="center">
                        <p>89.2</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>87.9</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>SCPRED (jackknife)</p>
                     </c>
                     <c ca="center">
                        <p>92.6</p>
                     </c>
                     <c ca="center">
                        <p>80.1</p>
                     </c>
                     <c ca="center">
                        <p>88.9</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>87.6</b>
                        </p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>The results show that SCPRED provides more accurate predictions: the 15.5% error rate of the more accurate assignment method, proposed by Eisenhaber and colleagues, is reduced to 12.1% by SCPRED (10-fold cross validation), i.e., by 3.4/15.5, or about 22%. This corresponds to 260 incorrect predictions for the automated assignment, compared with only 203 for SCPRED. At the same time, SCPRED is capable of distinguishing the &#945;/&#946; and &#945;+&#946; classes, whereas the automated assignment combines these two classes.</p>
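            <p>The relative error reduction can be checked from the error counts quoted above. The sketch below assumes that 25PDB contains 1673 sequences (443 + 443 + 346 + 441); the constant and function names are illustrative.</p>
```python
N_SEQUENCES = 1673  # assumed size of 25PDB: 443 + 443 + 346 + 441


def relative_error_reduction(baseline_errors, new_errors):
    """Fraction of the baseline's errors that are removed by the new method."""
    return (baseline_errors - new_errors) / baseline_errors


# Error counts from the text: assignment by Eisenhaber and colleagues vs SCPRED.
baseline_errors, scpred_errors = 260, 203
print(f"error rates: {100 * baseline_errors / N_SEQUENCES:.1f}% vs {100 * scpred_errors / N_SEQUENCES:.1f}%")
# error rates: 15.5% vs 12.1%
print(f"relative reduction: {100 * relative_error_reduction(baseline_errors, scpred_errors):.0f}%")
# relative reduction: 22%
```
            <p>Computing the reduction from the counts (57/260) or from the rounded error rates (3.4/15.5) gives the same value of about 22%.</p>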
         </sec>
         <sec>
            <st>
               <p>Analysis of the proposed feature vector</p>
            </st>
            <p>The proposed vector uses 8 features based on the secondary structure predicted with PSI-PRED and one feature based on the collocation of Leucine and Glycine (see Materials and Methods for details). We analyzed each feature further to focus our discussion on the most significant ones. We performed predictions on the 25PDB dataset using each feature individually and using all but one feature at a time (see Table <tblr tid="T4">4</tblr>). Removing any single feature yields prediction accuracies similar to the accuracy obtained with all 9 features. The corresponding degradation of the accuracy ranges between 0.5% (when excluding the PSIPRED-<it>CMV</it><sub><it>H</it></sub><sup>1 </sup>feature) and 1.4% (when excluding the PSIPRED-<it>NAvgSeg</it><sub><it>E </it></sub>feature), which shows that the remaining features still provide good-quality predictions. When used individually, the PSIPRED-<it>NCount</it><sub><it>H</it></sub><sup>8 </sup>and PSIPRED-<it>CV</it><sub><it>E </it></sub>features provide the highest overall accuracy and are among the top two features for predicting the all-&#946; and &#945;+&#946; classes, and the all-&#945; and &#945;/&#946; classes, respectively. They also describe different secondary structures and thus complement each other.</p>
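            <p>The two ablation protocols behind Table <tblr tid="T4">4</tblr>, single-feature prediction and leave-one-feature-out prediction, can be sketched as a loop over feature subsets. In this minimal sketch, the evaluation function is a hypothetical placeholder for any procedure that returns an accuracy for a given list of features (for example, 10-fold cross validation of an SVM trained on those columns); the feature names and scores in the toy evaluator are made up.</p>
```python
def feature_ablation(features, evaluate):
    """Single-feature and leave-one-feature-out evaluation.

    `evaluate` takes a list of feature names and returns an accuracy (%).
    """
    full = evaluate(list(features))
    # Accuracy when predicting with each feature on its own.
    single = {f: evaluate([f]) for f in features}
    # Accuracy when all features except f are used.
    drop_one = {f: evaluate([g for g in features if g != f]) for f in features}
    # Loss of accuracy relative to the full feature set when f is removed.
    degradation = {f: full - acc for f, acc in drop_one.items()}
    return {"full": full, "single": single, "drop_one": drop_one, "degradation": degradation}


# Toy evaluator with made-up per-feature contributions (illustration only).
TOY_GAIN = {"feature_a": 10, "feature_b": 5, "feature_c": 1}
result = feature_ablation(list(TOY_GAIN), lambda fs: 50 + sum(TOY_GAIN[f] for f in fs))
print(result["degradation"])  # {'feature_a': 10, 'feature_b': 5, 'feature_c': 1}
```
            <p>With a real classifier, the evaluation function would be retrained for each subset, so the full procedure requires 2<it>n</it> + 1 evaluations for <it>n</it> features (19 for the 9 SCPRED features).</p>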
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>Comparison of the accuracy (%) of structural class prediction on the 25PDB dataset when using all features, each feature individually, and all features but one (excluding one feature at a time).</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c cspan="2" ca="left">
                        <p>Features</p>
                     </c>
                     <c cspan="5" ca="center">
                         <p>Accuracy (%)</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>all-&#945;</p>
                     </c>
                     <c ca="center">
                        <p>all-&#946;</p>
                     </c>
                     <c ca="center">
                        <p>&#945;/&#946;</p>
                     </c>
                     <c ca="center">
                        <p>&#945;+&#946;</p>
                     </c>
                     <c ca="center">
                        <p>overall</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c cspan="2" ca="left">
                        <p>All features included</p>
                     </c>
                     <c ca="center">
                        <p>92.8</p>
                     </c>
                     <c ca="center">
                        <p>80.6</p>
                     </c>
                     <c ca="center">
                        <p>74.3</p>
                     </c>
                     <c ca="center">
                        <p>71.4</p>
                     </c>
                     <c ca="center">
                        <p>80.1</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>using only one feature</p>
                     </c>
                     <c ca="left">
                        <p>PSIPRED-<it>NCount</it><sub><it>H</it></sub><sup>6</sup></p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>89.6</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>58.7</p>
                     </c>
                     <c ca="center">
                        <p>32.9</p>
                     </c>
                     <c ca="center">
                        <p>46.7</p>
                     </c>
                     <c ca="center">
                        <p>58.4</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <b>PSIPRED-<it>NCount</it></b>
                           <sub>
                              <it>H</it>
                           </sub>
                           <sup>8</sup>
                        </p>
                     </c>
                     <c ca="center">
                        <p>81.9</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>78.3</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>53.8</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>58.7</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>69.0</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>PSIPRED-<it>CMV</it><sub><it>H</it></sub><sup>1</sup></p>
                     </c>
                     <c ca="center">
                        <p>76.3</p>
                     </c>
                     <c ca="center">
                        <p>74.5</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>55.8</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>48.8</p>
                     </c>
                     <c ca="center">
                        <p>64.3</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>PSIPRED-<it>NAvgSeg</it><sub><it>H</it></sub></p>
                     </c>
                     <c ca="center">
                        <p>49.9</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>83.3</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.0</p>
                     </c>
                     <c ca="center">
                        <p>47.8</p>
                     </c>
                     <c ca="center">
                        <p>47.9</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>PSIPRED-<it>NCount</it><sub><it>E</it></sub><sup>5</sup></p>
                     </c>
                     <c ca="center">
                        <p>85.8</p>
                     </c>
                     <c ca="center">
                        <p>59.1</p>
                     </c>
                     <c ca="center">
                        <p>50.9</p>
                     </c>
                     <c ca="center">
                        <p>47.4</p>
                     </c>
                     <c ca="center">
                        <p>61.4</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <b>PSIPRED-<it>CV</it></b>
                           <sub>
                              <it>E</it>
                           </sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>88.9</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>71.3</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>60.7</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>51.0</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>68.4</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>PSIPRED-<it>MaxSeg</it><sub><it>E</it></sub></p>
                     </c>
                     <c ca="center">
                        <p>83.1</p>
                     </c>
                     <c ca="center">
                        <p>48.8</p>
                     </c>
                     <c ca="center">
                        <p>0.0</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>67.1</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>52.6</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>PSIPRED-<it>NAvgSeg</it><sub><it>E</it></sub></p>
                     </c>
                     <c ca="center">
                        <p>79.2</p>
                     </c>
                     <c ca="center">
                        <p>33.9</p>
                     </c>
                     <c ca="center">
                        <p>3.2</p>
                     </c>
                     <c ca="center">
                        <p>42.4</p>
                     </c>
                     <c ca="center">
                        <p>41.8</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <b>
                              <it>CV</it>
                           </b>
                           <sub><it>L</it>---<it>G</it></sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>73.8</p>
                     </c>
                     <c ca="center">
                        <p>0.0</p>
                     </c>
                     <c ca="center">
                        <p>54.3</p>
                     </c>
                     <c ca="center">
                        <p>7.7</p>
                     </c>
                     <c ca="center">
                        <p>32.8</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>excluding the listed feature</p>
                     </c>
                     <c ca="left">
                        <p>PSIPRED-<it>NCount</it><sub><it>H</it></sub><sup>6</sup></p>
                     </c>
                     <c ca="center">
                        <p>92.1</p>
                     </c>
                     <c ca="center">
                        <p>79.5</p>
                     </c>
                     <c ca="center">
                        <p>71.7</p>
                     </c>
                     <c ca="center">
                        <p>70.8</p>
                     </c>
                     <c ca="center">
                        <p>78.9</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>PSIPRED-<it>NCount</it><sub><it>H</it></sub><sup>8</sup></p>
                     </c>
                     <c ca="center">
                        <p>93.0</p>
                     </c>
                     <c ca="center">
                        <p>79.5</p>
                     </c>
                     <c ca="center">
                        <p>73.1</p>
                     </c>
                     <c ca="center">
                        <p>70.3</p>
                     </c>
                     <c ca="center">
                        <p>79.3</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>PSIPRED-<it>CMV</it><sub><it>H</it></sub><sup>1</sup></p>
                     </c>
                     <c ca="center">
                        <p>92.5</p>
                     </c>
                     <c ca="center">
                        <p>80.8</p>
                     </c>
                     <c ca="center">
                        <p>72.5</p>
                     </c>
                     <c ca="center">
                        <p>71.0</p>
                     </c>
                     <c ca="center">
                        <p>79.6</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>PSIPRED-<it>NAvgSeg</it><sub><it>H</it></sub></p>
                     </c>
                     <c ca="center">
                        <p>92.5</p>
                     </c>
                     <c ca="center">
                        <p>81.0</p>
                     </c>
                     <c ca="center">
                        <p>71.4</p>
                     </c>
                     <c ca="center">
                        <p>68.7</p>
                     </c>
                     <c ca="center">
                        <p>78.8</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>PSIPRED-<it>NCount</it><sub><it>E</it></sub><sup>5</sup></p>
                     </c>
                     <c ca="center">
                        <p>90.7</p>
                     </c>
                     <c ca="center">
                        <p>80.1</p>
                     </c>
                     <c ca="center">
                        <p>73.4</p>
                     </c>
                     <c ca="center">
                        <p>71.4</p>
                     </c>
                     <c ca="center">
                        <p>79.3</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>PSIPRED-<it>CV</it><sub><it>E</it></sub></p>
                     </c>
                     <c ca="center">
                        <p>91.9</p>
                     </c>
                     <c ca="center">
                        <p>80.6</p>
                     </c>
                     <c ca="center">
                        <p>72.5</p>
                     </c>
                     <c ca="center">
                        <p>71.4</p>
                     </c>
                     <c ca="center">
                        <p>79.5</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>PSIPRED-<it>MaxSeg</it><sub><it>E</it></sub></p>
                     </c>
                     <c ca="center">
                        <p>92.8</p>
                     </c>
                     <c ca="center">
                        <p>79.7</p>
                     </c>
                     <c ca="center">
                        <p>73.1</p>
                     </c>
                     <c ca="center">
                        <p>69.8</p>
                     </c>
                     <c ca="center">
                        <p>79.2</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>PSIPRED-<it>NAvgSeg</it><sub><it>E</it></sub></p>
                     </c>
                     <c ca="center">
                        <p>92.3</p>
                     </c>
                     <c ca="center">
                        <p>80.6</p>
                     </c>
                     <c ca="center">
                        <p>71.1</p>
                     </c>
                     <c ca="center">
                        <p>69.2</p>
                     </c>
                     <c ca="center">
                        <p>78.7</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>CV</it>
                           <sub><it>L</it>---<it>G</it></sub>
                        </p>
                     </c>
                     <c ca="center">
                        <p>92.8</p>
                     </c>
                     <c ca="center">
                        <p>80.6</p>
                     </c>
                     <c ca="center">
                        <p>73.4</p>
                     </c>
                     <c ca="center">
                        <p>68.9</p>
                     </c>
                     <c ca="center">
                        <p>79.3</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Bold font marks the two highest accuracies in each column among the predictions that use individual features, as well as the features selected for further analysis.</p>
               </tblfn>
            </tbl>
            <p>Figure <figr fid="F1">1</figr> shows scatter plots in which the x-axis corresponds to PSIPRED-<it>CV</it><sub><it>E</it></sub> and the y-axis to PSIPRED-<it>NCount</it><sub><it>H</it></sub><sup>8</sup>. The values of the two features form relatively compact clusters for each of the structural classes. These clusters also overlap only to a small degree, and thus the classifier can achieve good separation between all four structural classes. In other words, certain characteristics of the secondary structure predicted with PSI-PRED, including its composition, the count of secondary structure segments, and the average/maximal size of the segments, provide information that differentiates between structural classes. For example, most proteins in the all-&#945; class include a low number of residues that form &#946;-strands (low value of PSIPRED-<it>CV</it><sub><it>E</it></sub>) and a high number of &#945;-helix segments that are built of at least 8 AAs (high value of PSIPRED-<it>NCount</it><sub><it>H</it></sub><sup>8</sup>).</p>
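            <p>As an illustration of the two selected features, the following Python sketch computes them from a three-state PSI-PRED prediction (H for helix, E for strand, C for coil). The exact definitions and normalizations are given in Materials and Methods; here we assume, for illustration only, that PSIPRED-<it>CV</it><sub><it>E</it></sub> is the fraction of residues predicted as strand and that PSIPRED-<it>NCount</it><sub><it>H</it></sub><sup>8</sup> is the raw number of helix segments of at least 8 residues. The function names are hypothetical.</p>
            <p>
```python
import re

# Minimal sketch of two secondary-structure features computed from a
# three-state PSI-PRED prediction (H = helix, E = strand, C = coil).
# ASSUMPTIONS: CV_E is taken as the fraction of residues predicted as E, and
# NCount_H^8 as the raw number of helix segments of length 8 or more; the
# exact normalisation used by SCPRED is defined in Materials and Methods.


def composition(ss: str, state: str) -> float:
    """Fraction of residues predicted in the given state."""
    return ss.count(state) / len(ss) if ss else 0.0


def count_segments(ss: str, state: str, min_len: int) -> int:
    """Number of maximal runs of `state` that are at least `min_len` long."""
    return sum(1 for run in re.finditer(f"{state}+", ss) if len(run.group()) >= min_len)


def selected_features(ss: str) -> tuple[float, int]:
    """(CV_E, NCount_H^8) for one predicted secondary structure string."""
    return composition(ss, "E"), count_segments(ss, "H", 8)


if __name__ == "__main__":
    helical = "CC" + "H" * 12 + "CCC" + "H" * 9 + "CC" + "HHHH" + "C"
    strand_rich = "C" + "EEEEE" + "CC" + "EEEEEE" + "CCC" + "HHHHHH" + "CC" + "EEEE"
    for name, ss in (("helical", helical), ("strand-rich", strand_rich)):
        cv_e, ncount_h8 = selected_features(ss)
        print(f"{name}: CV_E={cv_e:.2f}, NCount_H^8={ncount_h8}")
```
            </p>
            <p>A helix-rich prediction yields a low strand fraction and several long helix segments, which mirrors the position of the all-&#945; cluster in Figure <figr fid="F1">1</figr>.</p>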
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Scatter plots of PSIPRED-<it>CV</it><sub><it>E </it></sub>(x-axis) and PSIPRED-<it>NCount</it><sub><it>H</it></sub><sup>8 </sup>(y-axis) features</p>
               </caption>
               <text>
                  <p><b>Scatter plots of PSIPRED-<it>CV</it><sub><it>E </it></sub>(x-axis) and PSIPRED-<it>NCount</it><sub><it>H</it></sub><sup>8 </sup>(y-axis) features</b>. The top-left plot corresponds to sequences belonging to the all-&#945; class, the top-right to the all-&#946; class, the bottom-left to the &#945;/&#946; class, and the bottom-right to the &#945;+&#946; class.</p>
               </text>
               <graphic file="1471-2105-9-226-1"/>
            </fig>
            <p>We also analyzed the <it>CV</it><sub><it>L</it>---<it>G</it></sub> feature, which counts the number of occurrences of the LxxxG motif, where x is any AA. We found that a higher number of these motifs in a sequence correlates with the &#945;/&#946; class. For each minimal count of LxxxG motifs, the number of 25PDB sequences that have at least that many motifs and belong to the &#945;/&#946; class, out of all sequences that have at least that many motifs, is as follows: 0 (346/1673), 1 (291/970), 2 (188/445), 3 (114/199), 4 (53/86), 5 (23/34), and 6 or more (11/14). This shows that a sequence that contains at least 5 LxxxG motifs belongs to the &#945;/&#946; class with a probability of about 68% (23/34). To verify whether this motif is significant for structural class prediction, we computed the expected number of motifs that have the same properties when the structural class labels are randomized, i.e., motifs that occur at least 5 times in at least 34 proteins and for which the probability of the most frequent associated class is at least 68%. After randomly scrambling the class labels 10 times (using the same proportions of class labels as in the original dataset), the expected number of such motifs equals zero, and the highest probability obtained for such motifs, averaged over the 10 runs, equals 42.4%. With the original class labels, two motifs satisfy the above conditions, LxxxG and AxxL (the latter with a probability of 69% for the &#945;/&#946; class), while their corresponding probabilities for the scrambled class labels, averaged over the 10 runs, equal 33.8% and 32.1%, respectively. Although other similar motifs that differentiate between structural classes, such as AxxL, could be found (and were considered by our method), only the LxxxG motif was found to be complementary to the remaining 8 features. 
Other motifs that could be successfully used to predict structural classes are discussed in a recent study <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. Analysis of the structures formed by the LxxxG motif reveals that many of its occurrences form either the terminal end of an &#945;-helix or a &#946;-strand that folds into a parallel &#946;-sheet. The two proteins that include the highest number of these motifs are (1) the 1ofda2 domain, which includes 9 motifs (6 form the terminal end of an &#945;-helix, 1 forms a parallel &#946;-sheet, and 2 form coils), and (2) the 1r66 protein, which includes 8 motifs (3 form the terminal end of an &#945;-helix, 2 form a parallel &#946;-sheet, and 3 form coils). This motif could therefore serve as a signature for some proteins that belong to the &#945;/&#946; class.</p>
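            <p>The motif statistics and the label-scrambling test described above can be sketched in Python as follows. The sketch assumes that overlapping LxxxG occurrences are counted and uses random.shuffle to permute the class labels, which preserves the class proportions; the search over all candidate motifs is omitted because the candidate set is not specified in this section, and all function names are hypothetical. The REPORTED table holds the 25PDB counts listed above.</p>
            <p>
```python
import random
import re
from collections import Counter

# Minimal sketch of the LxxxG analysis: count motif occurrences per sequence,
# estimate P(class | at least k motifs), and repeat after scrambling labels.
# ASSUMPTIONS: overlapping occurrences are counted (regex look-ahead), and the
# scrambling uses random.shuffle, which keeps the original class proportions.

LXXXG = re.compile(r"(?=L...G)")  # x = any amino acid


def count_motif(sequence: str, pattern: re.Pattern = LXXXG) -> int:
    """Number of (possibly overlapping) motif occurrences in a sequence."""
    return sum(1 for _ in pattern.finditer(sequence))


def class_fraction(counts: list[int], labels: list[str], k: int, cls: str) -> tuple[int, int]:
    """(#sequences with at least k motifs in class cls, #sequences with at least k motifs)."""
    selected = [lab for c, lab in zip(counts, labels) if c >= k]
    return sum(1 for lab in selected if lab == cls), len(selected)


def best_class_probability(counts: list[int], labels: list[str], k: int) -> float:
    """Fraction of the most frequent class among sequences with at least k motifs."""
    selected = Counter(lab for c, lab in zip(counts, labels) if c >= k)
    total = sum(selected.values())
    return max(selected.values()) / total if total else 0.0


def scrambled_probabilities(counts: list[int], labels: list[str], k: int,
                            runs: int = 10, seed: int = 0) -> list[float]:
    """Best-class probability after shuffling the class labels, once per run."""
    rng = random.Random(seed)
    shuffled = list(labels)
    results = []
    for _ in range(runs):
        rng.shuffle(shuffled)
        results.append(best_class_probability(counts, shuffled, k))
    return results


# Counts reported for 25PDB: minimal motif count -> (alpha/beta, all sequences).
REPORTED = {0: (346, 1673), 1: (291, 970), 2: (188, 445), 3: (114, 199),
            4: (53, 86), 5: (23, 34), 6: (11, 14)}

if __name__ == "__main__":
    for k, (ab, total) in REPORTED.items():
        print(f"at least {k} motifs: {ab}/{total} = {ab / total:.0%} alpha/beta")
```
            </p>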
         </sec>
         <sec>
            <st>
               <p>Application to fold classification</p>
            </st>
            <p>SCPRED was coupled, as a post-processing filter, with two modern fold classification methods: PFP <abbrgrp><abbr bid="B50">50</abbr></abbrgrp> and PFRES <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>. Fold classification aims at predicting the fold of a given protein sequence, where multiple fold types are defined within each structural class. This means that each predicted fold can be automatically mapped to its structural class. Among the 27 folds predicted by PFP and PFRES, 6 (globin-like, cytochrome c, DNA/RNA-binding 3-helical bundle, four-helical up-and-down bundle, 4-helical cytokines, and EF Hand-like) belong to the all-&#945; class, 9 (immunoglobulin-like beta-sandwich, cupredoxin-like, viral coat and capsid proteins, concanavalin A-like lectins/glucanases, SH3-like barrel, OB-fold, beta-Trefoil, trypsin-like serine proteases, and lipocalins) to the all-&#946; class, 9 (TIM beta/alpha-barrel, FAD/NAD(P)-binding domain, flavodoxin-like, NAD(P)-binding Rossmann-fold domains, P-loop containing nucleoside triphosphate hydrolases, thioredoxin fold, ribonuclease H-like motif, alpha/beta-Hydrolases, and periplasmic binding protein-like I) to the &#945;/&#946; class, and 2 (beta-Grasp and Ferredoxin-like) to the &#945;+&#946; class. The remaining fold concerns small proteins and was therefore excluded from our tests. The post-processing removed all predictions for which SCPRED and a given fold classification method predicted different structural classes, i.e., for which the predicted fold belonged to a structural class different from the class predicted by SCPRED. This approach is motivated by the hypothesis that if both methods provide consistent predictions (at the level of structural classes), then the confidence in the fold prediction should be higher than when the two methods provide different predictions. 
Table <tblr tid="T5">5</tblr> shows the accuracies of SCPRED, of both fold classification methods, and of the coupled methods on the entire FC699 dataset (which was originally used to test both PFP and PFRES), on the FC699 sequences that were kept (the same structural class was predicted), and on the removed sequences (different classes were predicted).</p>
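            <p>The consistency filter can be sketched in a few lines of Python. The fold-to-class map below is an abbreviated subset of the folds listed above, the Prediction record is a hypothetical data structure, and the class accuracy of a fold method is computed from the class implied by its predicted fold; this illustrates the filtering rule and is not the original implementation.</p>
            <p>
```python
from dataclasses import dataclass

# Minimal sketch of the consistency filter: a fold prediction is kept only if
# the structural class implied by the predicted fold equals the class
# predicted by SCPRED. The map is an abbreviated subset of the listed folds.

FOLD_TO_CLASS = {
    "globin-like": "all-alpha",
    "EF Hand-like": "all-alpha",
    "SH3-like barrel": "all-beta",
    "OB-fold": "all-beta",
    "TIM beta/alpha-barrel": "alpha/beta",
    "flavodoxin-like": "alpha/beta",
    "beta-Grasp": "alpha+beta",
    "ferredoxin-like": "alpha+beta",
}


@dataclass
class Prediction:
    true_fold: str       # native fold of the sequence
    predicted_fold: str  # fold predicted by PFP or PFRES
    scpred_class: str    # structural class predicted by SCPRED


def accuracy(hits: list[bool]) -> float:
    """Percentage of correct predictions (0.0 for an empty group)."""
    return 100.0 * sum(hits) / len(hits) if hits else 0.0


def filter_predictions(preds: list[Prediction]) -> dict[str, float]:
    """Split predictions into kept/removed and report accuracies and coverage."""
    kept = [p for p in preds if FOLD_TO_CLASS[p.predicted_fold] == p.scpred_class]
    removed = [p for p in preds if FOLD_TO_CLASS[p.predicted_fold] != p.scpred_class]

    def fold_acc(group: list[Prediction]) -> float:
        return accuracy([p.predicted_fold == p.true_fold for p in group])

    def class_acc(group: list[Prediction]) -> float:
        return accuracy([FOLD_TO_CLASS[p.predicted_fold] == FOLD_TO_CLASS[p.true_fold]
                         for p in group])

    return {
        "coverage": 100.0 * len(kept) / len(preds),
        "fold_all": fold_acc(preds), "class_all": class_acc(preds),
        "fold_kept": fold_acc(kept), "class_kept": class_acc(kept),
        "fold_removed": fold_acc(removed), "class_removed": class_acc(removed),
    }


if __name__ == "__main__":
    toy = [
        Prediction("globin-like", "globin-like", "all-alpha"),
        Prediction("OB-fold", "SH3-like barrel", "all-beta"),
        # removed: flavodoxin-like implies alpha/beta, SCPRED predicts alpha+beta
        Prediction("ferredoxin-like", "flavodoxin-like", "alpha+beta"),
        Prediction("TIM beta/alpha-barrel", "TIM beta/alpha-barrel", "alpha/beta"),
    ]
    print(filter_predictions(toy))
```
            </p>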
            <tbl id="T5">
               <title>
                  <p>Table 5</p>
               </title>
               <caption>
                  <p>Comparison of accuracies (%) obtained by PFRES, PFP, and the coupled PFRES+SCPRED and PFP+SCPRED methods on the FC699 dataset.</p>
               </caption>
               <tblbdy cols="8">
                  <r>
                     <c cspan="3" ca="left">
                        <p>Entire FC699 dataset</p>
                     </c>
                     <c cspan="2" ca="left">
                        <p>Only kept sequences</p>
                     </c>
                     <c cspan="2" ca="left">
                        <p>Only removed sequences</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c cspan="2" ca="center">
                        <p>PFRES</p>
                     </c>
                     <c ca="center">
                        <p>SCPRED</p>
                     </c>
                     <c cspan="4" ca="center">
                        <p>PFRES + SCPRED</p>
                     </c>
                     <c ca="center">
                        <p>Coverage (% kept)</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>fold</p>
                     </c>
                     <c ca="center">
                        <p>class</p>
                     </c>
                     <c ca="center">
                        <p>class</p>
                     </c>
                     <c ca="center">
                        <p>fold</p>
                     </c>
                     <c ca="center">
                        <p>class</p>
                     </c>
                     <c ca="center">
                        <p>fold</p>
                     </c>
                     <c ca="center">
                        <p>class</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>65.6</p>
                     </c>
                     <c ca="center">
                        <p>92.1</p>
                     </c>
                     <c ca="center">
                        <p>87.5</p>
                     </c>
                     <c ca="center">
                        <p>68.6</p>
                     </c>
                     <c ca="center">
                        <p>96.7</p>
                     </c>
                     <c ca="center">
                        <p>45.7</p>
                     </c>
                     <c ca="center">
                        <p>62.8</p>
                     </c>
                     <c ca="center">
                        <p>86.6%</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c cspan="2" ca="center">
                        <p>PFP</p>
                     </c>
                     <c ca="center">
                        <p>SCPRED</p>
                     </c>
                     <c cspan="4" ca="center">
                        <p>PFP + SCPRED</p>
                     </c>
                     <c ca="center">
                        <p>Coverage (% kept)</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>fold</p>
                     </c>
                     <c ca="center">
                        <p>class</p>
                     </c>
                     <c ca="center">
                        <p>class</p>
                     </c>
                     <c ca="center">
                        <p>fold</p>
                     </c>
                     <c ca="center">
                        <p>class</p>
                     </c>
                     <c ca="center">
                        <p>fold</p>
                     </c>
                     <c ca="center">
                        <p>class</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="center">
                        <p>30.9</p>
                     </c>
                     <c ca="center">
                        <p>65.8</p>
                     </c>
                     <c ca="center">
                        <p>87.5</p>
                     </c>
                     <c ca="center">
                        <p>47.3</p>
                     </c>
                     <c ca="center">
                        <p>97.0</p>
                     </c>
                     <c ca="center">
                        <p>3.8</p>
                     </c>
                     <c ca="center">
                        <p>14.1</p>
                     </c>
                     <c ca="center">
                        <p>62.4%</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The "Entire FC699 dataset" columns show the accuracies of the PFRES, SCPRED and PFP methods for class/fold prediction on the FC699 dataset. The "Only kept sequences" columns show the accuracies obtained by the PFRES and PFP methods for sequences for which SCPRED predicted the same structural class as PFRES and PFP, respectively. The "Only removed sequences" columns show the accuracies obtained by the PFRES and PFP methods for sequences for which SCPRED predicted a different structural class than PFRES and PFP, respectively. The "Coverage" column shows the percentage of sequences for which SCPRED and PFRES/PFP predicted the same structural class.</p>
               </tblfn>
            </tbl>
            <p>SCPRED obtains 87.5% accuracy on this dataset, in which sequences share twilight-zone pairwise similarity; this confirms the high quality of our method. The PFRES and PFP methods predict the structural class with 92.1% and 65.8% accuracy, respectively. Although PFRES obtains higher accuracy than SCPRED, it is more complex (it uses 36 features and an ensemble classifier), and its predictions are complementary to those of SCPRED. Namely, post-processing with SCPRED improved the structural class prediction accuracy by 4.6% and the fold classification accuracy by 3.1%, at the cost of removing only 13.4% of the predictions. The fold and structural class prediction accuracies of the coupled method equal 68.6% and 96.7%, respectively. The removed sequences were characterized by much lower prediction quality, i.e., 45.7% accuracy for the fold and 62.8% for the class predictions. When comparing the accuracies of the PFRES fold predictions before and after post-processing with SCPRED, the accuracies improved for 7 fold types (the biggest improvement, 33.8%, was obtained for the ferredoxin-like fold, and the second biggest, 8.6%, for the SH3-like barrel fold), deteriorated for 6 fold types (the biggest loss, 19%, was observed for the ribonuclease H-like motif fold, and the second biggest, 16.3%, for the FAD/NAD(P)-binding domain fold), and did not change for the remaining 13 fold types.</p>
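            <p>Because the kept and removed sequences partition FC699, each overall accuracy in Table <tblr tid="T5">5</tblr> should equal the coverage-weighted mean of the kept and removed accuracies, i.e., overall = c &#215; kept + (1 - c) &#215; removed, where c is the coverage. The Python sketch below checks this relation and computes the gains of the coupled methods from the transcribed Table 5 values; differences of about 0.1 percentage point are expected because the tabulated values are rounded.</p>
            <p>
```python
# Minimal sketch: check that the Table 5 accuracies (%) satisfy
# overall = coverage * kept + (1 - coverage) * removed, and compute the gains
# of the coupled methods. Values are transcribed from Table 5.

TABLE5 = {
    # method: (fold_all, class_all, fold_kept, class_kept,
    #          fold_removed, class_removed, coverage)
    "PFRES": (65.6, 92.1, 68.6, 96.7, 45.7, 62.8, 86.6),
    "PFP": (30.9, 65.8, 47.3, 97.0, 3.8, 14.1, 62.4),
}
SCPRED_CLASS_ACC = 87.5


def recombine(kept: float, removed: float, coverage: float) -> float:
    """Overall accuracy implied by the kept/removed split (coverage in %)."""
    c = coverage / 100.0
    return c * kept + (1.0 - c) * removed


if __name__ == "__main__":
    for name, (fa, ca, fk, ck, fr, cr, cov) in TABLE5.items():
        print(f"{name}: fold {fa} vs recombined {recombine(fk, fr, cov):.1f}, "
              f"class {ca} vs recombined {recombine(ck, cr, cov):.1f}")
        print(f"  fold gain {fk - fa:.1f}, class gain over {name} {ck - ca:.1f}, "
              f"class gain over SCPRED {ck - SCPRED_CLASS_ACC:.1f}, removed {100 - cov:.1f}%")
```
            </p>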
            <p>The improvements for the PFP method were more substantial. Post-processing improved the fold prediction accuracy by 16.4% and the class prediction accuracy by 9.5% (relative to the 87.5% accuracy of SCPRED alone) while removing 37.6% of the predictions. The removed sequences were characterized by poor predictions, i.e., 3.8% accuracy for the fold and 14.1% for the class predictions. Coupling of PFP with SCPRED a