<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-8-330</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Protein subcellular localization prediction based on compartment-specific features and structure conservation</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Su</snm>
               <mnm>Chia-Yu</mnm>
               <fnm>Emily</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>cysu@iis.sinica.edu.tw</email>
            </au>
            <au id="A2">
               <snm>Chiu</snm>
               <fnm>Hua-Sheng</fnm>
               <insr iid="I3"/>
               <email>huasheng@iis.sinica.edu.tw</email>
            </au>
            <au id="A3">
               <snm>Lo</snm>
               <fnm>Allan</fnm>
               <insr iid="I1"/>
               <insr iid="I4"/>
               <email>allanlo@iis.sinica.edu.tw</email>
            </au>
            <au id="A4">
               <snm>Hwang</snm>
               <fnm>Jenn-Kang</fnm>
               <insr iid="I2"/>
               <email>jkhwang@cc.nctu.edu.tw</email>
            </au>
            <au id="A5">
               <snm>Sung</snm>
               <fnm>Ting-Yi</fnm>
               <insr iid="I3"/>
               <email>tsung@iis.sinica.edu.tw</email>
            </au>
            <au id="A6" ca="yes">
               <snm>Hsu</snm>
               <fnm>Wen-Lian</fnm>
               <insr iid="I3"/>
               <email>hsu@iis.sinica.edu.tw</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei, Taiwan</p>
            </ins>
            <ins id="I2">
               <p>Institute of Bioinformatics, National Chiao Tung University, Hsinchu, Taiwan</p>
            </ins>
            <ins id="I3">
               <p>Bioinformatics Lab., Institute of Information Science, Academia Sinica, Taipei, Taiwan</p>
            </ins>
            <ins id="I4">
               <p>Department of Life Sciences, National Tsing Hua University, Hsinchu, Taiwan</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>1</issue>
         <fpage>330</fpage>
         <url>http://www.biomedcentral.com/1471-2105/8/330</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17825110</pubid>
               <pubid idtype="doi">10.1186/1471-2105-8-330</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>25</day>
               <month>4</month>
               <year>2007</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>08</day>
               <month>9</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>08</day>
               <month>9</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Su et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Protein subcellular localization is crucial for genome annotation, protein function prediction, and drug discovery. Determining subcellular localization experimentally is time-consuming, so computational approaches have become highly desirable. Extensive studies of localization prediction have led to the development of several methods, including composition-based and homology-based methods. However, the performance of these methods might be significantly degraded if homologous sequences are not detected. Moreover, methods that integrate various features could suffer from low coverage in high-throughput proteomic analyses, because there is often too little information to characterize unknown proteins.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We propose a hybrid prediction method for Gram-negative bacteria that combines a one-versus-one support vector machine (SVM) model and a structural homology approach. The SVM model comprises a number of binary classifiers that incorporate biological features derived from Gram-negative bacteria translocation pathways. In the structural homology approach, we employ secondary structure alignment to compare structural similarity and assign the known localization of the top-ranked protein as the predicted localization of a query protein. Using ten-fold cross-validation, the hybrid method achieves overall accuracies of 93.7% and 93.2% on the benchmark data sets. On the evaluation data sets, our method also attains a prediction accuracy of 84.0% and performs especially well on sequences with a low level of homology to the training data. A three-way data split procedure is also incorporated to prevent overestimation of the predictive performance. In addition, we show that the prediction accuracy should be approximately 85% for non-redundant data sets with sequence identity below 30%.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>Our results demonstrate that biological features derived from Gram-negative bacteria translocation pathways yield a significant improvement in prediction performance. These features are interpretable and can be applied in advanced analyses and experimental designs. Moreover, combining the SVM model with the structural homology approach further improves the overall accuracy, which suggests that structural conservation, in addition to sequence homology, could be a useful indicator for inferring localization. The proposed method can be used in large-scale analyses of proteomes.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>The prediction of protein subcellular localization (PSL) aims to determine the localization sites of unknown proteins in a cell. The study of PSL is important for elucidating the functions of proteins involved in various cellular processes. Despite recent technical advances, experimental determination of PSL remains time-consuming and labor-intensive. In addition, research in the post-genomic era has yielded a tremendous amount of sequence data. Given the size and complexity of these data, many researchers prefer to use prediction systems to identify and screen possible candidates for further analyses. Hence, computational approaches have become increasingly important.</p>
         <sec>
            <st>
               <p>Previous works</p>
            </st>
            <p>Extensive studies of PSL prediction have led to the development of several methods, which can be classified as follows.</p>
            <p>1. <it>Amino acid composition-based methods </it>These methods utilize machine learning techniques, including neural networks <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> and support vector machines (SVM) <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>. This category includes methods like P-CLASSIFIER <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> and CELLO <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>, which utilize <it>n</it>-peptide composition-based SVM approaches.</p>
            <p>2. <it>Methods that integrate various protein characteristics </it>Several methods, including expert systems <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>, <it>k</it>-nearest neighbor <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp>, SVM <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>, support vector data description <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, and Bayesian networks <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>, integrate various biological features that influence localization. The features that characterize a protein can be extracted from the biological literature, public databases, and related prediction systems. Both PSORTb <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp> and PSLpred <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> integrate different analytical modules and demonstrate that their hybrid approaches perform better than each individual module.</p>
            <p>3. <it>Sequence homology-based methods </it>It has been suggested that PSL is an evolutionarily conserved trait <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>. Efforts to address the relationship between evolutionary information and localization have relied heavily on exploiting sequence similarity to infer PSL. Such methods include phylogenetic profiling <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>, domain projection <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>, and a sequence homology-based method <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. Several other methods, such as PSORTb and PSLpred, also incorporate sequence homology-based components in their analyses.</p>
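            <p>To make the feature representation of the composition-based methods in category 1 more concrete, the following minimal sketch computes a normalized <it>n</it>-peptide composition vector, which gives a fixed-length input for an SVM (400 entries for dipeptides, <it>n</it> = 2). The function name <it>npeptide_composition</it>, the restriction to the 20 standard amino acids, and the skipping of windows that contain other residues are illustrative assumptions. This is not the exact encoding used by P-CLASSIFIER or CELLO.</p>
            <p>
```python
from itertools import product

# Assumption: only the 20 standard amino acids are counted.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def npeptide_composition(sequence, n=2):
    """Return the relative frequency of every n-peptide in a fixed order.

    The vector has 20**n entries (400 for dipeptides). Sliding windows that
    contain a nonstandard residue such as 'X' are skipped.
    """
    sequence = sequence.upper()
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    counts = dict.fromkeys(keys, 0)
    total = 0
    for i in range(len(sequence) - n + 1):
        window = sequence[i:i + n]
        if window in counts:
            counts[window] += 1
            total += 1
    # Normalize so the entries sum to 1 (all zeros if no valid window).
    return [counts[k] / total if total else 0.0 for k in keys]


features = npeptide_composition("MKKLLPTAAAGLLLLAAQPAMA", n=2)
print(len(features))  # 400
```
            </p>
            <p>Because the vector length depends only on <it>n</it>, sequences of any length map to comparable inputs. However, the number of entries grows as 20 to the power <it>n</it>, which is why small values of <it>n</it> are typical.</p>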
         </sec>
         <sec>
            <st>
               <p>Our contributions</p>
            </st>
            <p>The prediction of PSL presents several challenges. First, the performance of amino acid composition-based and sequence homology-based methods might be significantly degraded if homologous sequences are not detected. Second, the results of these two types of methods are generally difficult to interpret, so it is hard to determine which biological features identify a specific PSL and why they work well for prediction. If the features were biologically interpretable, the resulting knowledge could help in designing artificial proteins with desired properties. Third, methods that integrate various features could suffer from low coverage in high-throughput proteomic analyses, because there is often too little information to characterize unknown proteins. Finally, many PSL methods are developed on redundant training sets, which might lead to overestimation of the predictive performance. Their performance could therefore be significantly lower if redundant sequences were carefully removed.</p>
            <p>In this study, we propose a hybrid method to predict the PSL of proteins in Gram-negative bacteria. It combines a one-versus-one (1-v-1) SVM model called PSL101 (<ul>P</ul>rotein <ul>S</ul>ubcellular <ul>L</ul>ocalization prediction by <ul>1</ul>-<ul>O</ul>n-<ul>1</ul> classifiers) with a structural homology approach called PSLsse (<ul>P</ul>rotein <ul>S</ul>ubcellular <ul>L</ul>ocalization prediction by <ul>s</ul>econdary <ul>s</ul>tructure <ul>e</ul>lement alignment). PSL101 comprises a number of binary classifiers that incorporate compartment-specific biological features derived from Gram-negative bacteria translocation pathways. In PSLsse, we employ secondary structure alignment to compare structural similarity and assign the known localization of the top-ranked protein as the predicted localization of a query protein.</p>
            <p>Experimental results show that PSL101 achieves high prediction accuracy, which demonstrates that biological features derived from Gram-negative bacteria translocation pathways significantly enhance performance. Moreover, since the selected features are biologically interpretable, they can be readily applied to advanced analyses and experimental designs. Most notably, combining PSL101 and PSLsse further improves the overall accuracy on the PS1302 data set to 93.7%, a 2.5% improvement over the second-best method. Our analysis suggests that, in addition to sequence homology, structural homology can be an effective indicator for inferring PSL.</p>
            <p>Lastly, since sequence redundancy in the training data often leads to overestimation of prediction accuracy, we present an evaluation using non-redundant data sets. Cross-validation is also known to overestimate predictive performance when parameters are optimized repeatedly on the same test data. We therefore adopt a three-way data split procedure for evaluating the non-redundant data sets. The results suggest that these procedures prevent overestimation and that, under this stricter evaluation, the accuracy of PSL prediction should be approximately 85%. On the evaluation data sets, our hybrid method also provides accurate predictions, especially for sequences with low homology to the training set.</p>
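            <p>With five compartments, a full one-versus-one decomposition trains one binary classifier per pair of compartments, giving 5 × 4 / 2 = 10 classifiers. The sketch below combines such pairwise decisions by majority voting, a common rule for 1-v-1 SVMs. It assumes that all pairs are used and that PSL101 combines them with this voting and tie-breaking scheme. The <it>pairwise_decide</it> callback is a hypothetical stand-in for the trained binary classifiers, and the toy scores are made up.</p>
            <p>
```python
from collections import Counter
from itertools import combinations

COMPARTMENTS = ["CP", "IM", "PP", "OM", "EC"]


def one_vs_one_predict(protein, pairwise_decide, classes=COMPARTMENTS):
    """Combine one-versus-one binary decisions by majority voting.

    pairwise_decide(protein, a, b) is a hypothetical callback that returns
    the winning class (a or b) of the binary classifier for that pair.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):  # 10 pairs for 5 classes
        votes[pairwise_decide(protein, a, b)] += 1
    # The class with the most votes wins; max() returns the first maximal
    # item it encounters, so ties follow the order of `classes`.
    return max(classes, key=lambda c: votes[c])


# Toy usage: a made-up score per compartment decides every pair.
toy_scores = {"CP": 0.1, "IM": 0.7, "PP": 0.4, "OM": 0.2, "EC": 0.3}


def toy_decide(protein, a, b):
    return a if toy_scores[a] >= toy_scores[b] else b


print(one_vs_one_predict("query", toy_decide))  # IM
```
            </p>
            <p>A pairwise design lets each binary classifier use its own compartment-specific feature sets, which is the property PSL101 relies on. The voting step simply aggregates those independent decisions.</p>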
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <sec>
            <st>
               <p>Data sets</p>
            </st>
            <p>To assess our method, we use several data sets of Gram-negative bacterial proteins from previous studies <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr><abbr bid="B14">14</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>. Gram-negative bacteria have five major PSL sites: the cytoplasm (CP), inner membrane (IM), periplasm (PP), outer membrane (OM), and extracellular space (EC). Table <tblr tid="T1">1</tblr> lists the number of proteins at each localization site in each data set. The data sets are described in detail in Table 1S of the supplementary material [see Additional file <supplr sid="S1">1</supplr>].</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Number of proteins at each localization site in the data sets.</p>
               </caption>
               <tblbdy cols="8">
                  <r>
                     <c ca="left">
                        <p>Localization</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>Benchmark</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>Non-redundant</p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>Evaluation</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>PS1302</p>
                     </c>
                     <c ca="center">
                        <p>PS1444</p>
                     </c>
                     <c ca="center">
                        <p>NR755</p>
                     </c>
                     <c ca="center">
                        <p>NR828</p>
                     </c>
                     <c ca="center">
                        <p>EV90_high</p>
                     </c>
                     <c ca="center">
                        <p>EV153_low</p>
                     </c>
                     <c ca="center">
                        <p>EV243_all</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Cytoplasm (CP)</p>
                     </c>
                     <c ca="center">
                        <p>248</p>
                     </c>
                     <c ca="center">
                        <p>278</p>
                     </c>
                     <c ca="center">
                        <p>206</p>
                     </c>
                     <c ca="center">
                        <p>229</p>
                     </c>
                     <c ca="center">
                        <p>28</p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                     <c ca="center">
                        <p>124</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Inner membrane (IM)</p>
                     </c>
                     <c ca="center">
                        <p>268</p>
                     </c>
                     <c ca="center">
                        <p>309</p>
                     </c>
                     <c ca="center">
                        <p>182</p>
                     </c>
                     <c ca="center">
                        <p>205</p>
                     </c>
                     <c ca="center">
                        <p>26</p>
                     </c>
                     <c ca="center">
                        <p>26</p>
                     </c>
                     <c ca="center">
                        <p>52</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Periplasm (PP)</p>
                     </c>
                     <c ca="center">
                        <p>244</p>
                     </c>
                     <c ca="center">
                        <p>276</p>
                     </c>
                     <c ca="center">
                        <p>147</p>
                     </c>
                     <c ca="center">
                        <p>161</p>
                     </c>
                     <c ca="center">
                        <p>13</p>
                     </c>
                     <c ca="center">
                        <p>11</p>
                     </c>
                     <c ca="center">
                        <p>24</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Outer membrane (OM)</p>
                     </c>
                     <c ca="center">
                        <p>352</p>
                     </c>
                     <c ca="center">
                        <p>391</p>
                     </c>
                     <c ca="center">
                        <p>134</p>
                     </c>
                     <c ca="center">
                        <p>148</p>
                     </c>
                     <c ca="center">
                        <p>19</p>
                     </c>
                     <c ca="center">
                        <p>9</p>
                     </c>
                     <c ca="center">
                        <p>28</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Extracellular space (EC)</p>
                     </c>
                     <c ca="center">
                        <p>190</p>
                     </c>
                     <c ca="center">
                        <p>190</p>
                     </c>
                     <c ca="center">
                        <p>86</p>
                     </c>
                     <c ca="center">
                        <p>85</p>
                     </c>
                     <c ca="center">
                        <p>4</p>
                     </c>
                     <c ca="center">
                        <p>11</p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Total</p>
                     </c>
                     <c ca="center">
                        <p>1302</p>
                     </c>
                     <c ca="center">
                        <p>1444</p>
                     </c>
                     <c ca="center">
                        <p>755</p>
                     </c>
                     <c ca="center">
                        <p>828</p>
                     </c>
                     <c ca="center">
                        <p>90</p>
                     </c>
                     <c ca="center">
                        <p>153</p>
                     </c>
                     <c ca="center">
                        <p>243</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p>Protein subcellular localization prediction based on compartment-specific features and structure conservation (Supplementary Data). The supplementary material of this study.</p>
               </text>
               <file name="1471-2105-8-330-S1.pdf">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <p>1. Benchmark data sets: Derived from the first release of ePSORTdb <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>, the first data set, referred to as PS1302, consists of proteins with experimentally determined localizations. The second data set, PS1444, is an expanded version of PS1302.</p>
            <p>2. Non-redundant data sets: To assess predictive performance on non-homologous proteins, we use the redundancy filtering program CD-HIT <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> to remove sequences that share 30% or greater sequence identity from the PS1302 and PS1444 data sets. This yields the NR755 and NR828 data sets, respectively.</p>
            <p>3. Evaluation data sets: A recently created data set <abbrgrp><abbr bid="B22">22</abbr></abbrgrp> of 299 proteins is used to compare different methods. We first use ClustalW <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> to compute the pairwise sequence identity between these 299 proteins and the proteins in the known training sets (i.e., PS1302 and PS1444). We then divide the new set into two subsets using a 30% identity cutoff. Next, CD-HIT removes redundant sequences from each subset with a 30% threshold. The resulting non-redundant data sets are called EV90_high (&#8807;30%) and EV153_low (&lt;30%). The combination of both sets is referred to as the EV243_all data set.</p>
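            <p>The two filtering steps above can be sketched as follows. The <it>identity</it> argument is a hypothetical function that returns the fractional sequence identity of two sequences. In the procedure described above, these identities come from ClustalW and CD-HIT. The sketch assumes that a protein belongs to the high-homology subset when its best match in the training set reaches the cutoff. In addition, <it>greedy_nonredundant</it> is a simplified, CD-HIT-like greedy filter, not CD-HIT itself, and <it>toy_identity</it> is an ungapped comparison included only for illustration.</p>
            <p>
```python
def split_by_training_identity(proteins, training, identity, cutoff=0.30):
    """Partition proteins into high- and low-homology subsets.

    Assumption: a protein is 'high' if its best identity to any training
    protein is at least the cutoff, and 'low' otherwise.
    """
    high, low = [], []
    for p in proteins:
        best = max((identity(p, t) for t in training), default=0.0)
        (high if best >= cutoff else low).append(p)
    return high, low


def greedy_nonredundant(proteins, identity, cutoff=0.30):
    """Simplified CD-HIT-like filter: longer sequences are considered
    first, and a sequence is kept only if its identity to every
    already-kept sequence is below the cutoff.
    """
    kept = []
    for p in sorted(proteins, key=len, reverse=True):
        if all(cutoff > identity(p, k) for k in kept):
            kept.append(p)
    return kept


def toy_identity(a, b):
    # Ungapped position-by-position identity; illustration only.
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


training = ["AAAAAAAAAA"]
high, low = split_by_training_identity(
    ["AAAAAAAAAC", "CCCCCCCCCC"], training, toy_identity)
print(high, low)  # ['AAAAAAAAAC'] ['CCCCCCCCCC']
print(greedy_nonredundant(
    ["AAAAAAAAAA", "AAAAAAAAAC", "CCCCCCCCCC"], toy_identity))
```
            </p>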
         </sec>
         <sec>
            <st>
               <p>Effect of biological features derived from Gram-negative bacteria translocation pathways</p>
            </st>
            <p>Since it is impractical to try all possible feature combinations in different classifiers, we use heuristics guided by biological insights to determine a small subset of feature sets specific to each classifier. Starting from an empty subset, a sequential forward search algorithm <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> repeatedly adds the feature set that most improves the accuracy. The process terminates when adding a feature set no longer yields any improvement. The leftmost columns of Table <tblr tid="T2">2</tblr> show the performance of PSL101 under ten-fold cross-validation on the benchmark data sets. PSL101 attains overall accuracies of 92.7% and 91.6% on the PS1302 and PS1444 data sets, respectively. Notably, PSL101 achieves high accuracy and <it>MCC</it> for CP and IM proteins. This can be explained by the fact that proteins localized in CP and IM are characterized by several well-known biological features that our method incorporates.</p>
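            <p>The sequential forward search described above can be sketched as a greedy loop. The <it>evaluate</it> function is a hypothetical stand-in for training a binary classifier on the union of the selected feature sets and returning its cross-validated accuracy. The feature-set names and scores in the toy usage are made up, and ties between equally good candidates are broken by name, which is an arbitrary choice.</p>
            <p>
```python
def sequential_forward_search(candidates, evaluate):
    """Greedy forward selection over feature sets.

    evaluate(selected) is a hypothetical function returning the accuracy
    of a classifier trained on the feature sets listed in `selected`.
    """
    selected = []
    best_score = float("-inf")  # the empty starting subset has no score
    remaining = list(candidates)
    while remaining:
        # Score every one-set extension of the current subset.
        score, choice = max((evaluate(selected + [f]), f) for f in remaining)
        if best_score >= score:
            break  # no remaining feature set improves accuracy: stop
        selected.append(choice)
        remaining.remove(choice)
        best_score = score
    return selected, best_score


# Toy usage with made-up additive accuracy gains per feature set.
toy_gains = {"set_a": 0.20, "set_b": 0.15, "set_c": 0.05, "set_d": -0.05}


def toy_evaluate(selected):
    return 0.5 + sum(toy_gains[f] for f in selected)


chosen, accuracy = sequential_forward_search(toy_gains, toy_evaluate)
print(chosen, round(accuracy, 2))  # ['set_a', 'set_b', 'set_c'] 0.9
```
            </p>
            <p>Each round requires one evaluation per remaining feature set, so the search costs at most on the order of <it>k</it> squared evaluations for <it>k</it> candidate sets. An exhaustive search would instead require 2 to the power <it>k</it> evaluations.</p>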
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Comparison of PSL101 and the hybrid approaches using ten-fold cross-validation on the benchmark data sets.</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="6" ca="center">
                        <p>PS1302</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Localization</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>PSL101</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>PSLseq+PSL101</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>PSLsse+PSL101</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>CP</p>
                     </c>
                     <c ca="center">
                        <p>97.2 (94.8)</p>
                     </c>
                     <c ca="center">
                        <p>0.91 (0.89)</p>
                     </c>
                     <c ca="center">
                        <p>96.4 (94.4)</p>
                     </c>
                     <c ca="center">
                        <p>0.90 (0.89)</p>
                     </c>
                     <c ca="center">
                        <p>95.6 (94.4)</p>
                     </c>
                     <c ca="center">
                        <p>0.90 (0.90)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>IM</p>
                     </c>
                     <c ca="center">
                        <p>94.4 (92.9)</p>
                     </c>
                     <c ca="center">
                        <p>0.95 (0.94)</p>
                     </c>
                     <c ca="center">
                        <p>93.3 (91.8)</p>
                     </c>
                     <c ca="center">
                        <p>0.95 (0.93)</p>
                     </c>
                     <c ca="center">
                        <p>93.3 (91.8)</p>
                     </c>
                     <c ca="center">
                        <p>0.94 (0.93)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PP</p>
                     </c>
                     <c ca="center">
                        <p>87.7 (88.1)</p>
                     </c>
                     <c ca="center">
                        <p>0.86 (0.84)</p>
                     </c>
                     <c ca="center">
                        <p>88.9 (88.9)</p>
                     </c>
                     <c ca="center">
                        <p>0.86 (0.85)</p>
                     </c>
                     <c ca="center">
                        <p>91.4 (91.0)</p>
                     </c>
                     <c ca="center">
                        <p>0.88 (0.88)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>OM</p>
                     </c>
                     <c ca="center">
                        <p>94.3 (93.8)</p>
                     </c>
                     <c ca="center">
                        <p>0.94 (0.91)</p>
                     </c>
                     <c ca="center">
                        <p>95.5 (95.7)</p>
                     </c>
                     <c ca="center">
                        <p>0.96 (0.93)</p>
                     </c>
                     <c ca="center">
                        <p>96.3 (96.9)</p>
                     </c>
                     <c ca="center">
                        <p>0.96 (0.95)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>EC</p>
                     </c>
                     <c ca="center">
                        <p>87.9 (83.2)</p>
                     </c>
                     <c ca="center">
                        <p>0.87 (0.84)</p>
                     </c>
                     <c ca="center">
                        <p>89.5 (85.8)</p>
                     </c>
                     <c ca="center">
                        <p>0.89 (0.87)</p>
                     </c>
                     <c ca="center">
                        <p>90.0 (87.9)</p>
                     </c>
                     <c ca="center">
                        <p>0.89 (0.89)</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Overall</p>
                     </c>
                     <c ca="center">
                        <p>92.7 (91.2)</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>93.1 (91.9)</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>93.7 (92.9)</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="6" ca="center">
                        <p>PS1444</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Localization</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>PSL101</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>PSLseq+PSL101</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>PSLsse+PSL101</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>CP</p>
                     </c>
                     <c ca="center">
                        <p>96.0 (94.2)</p>
                     </c>
                     <c ca="center">
                        <p>0.91 (0.90)</p>
                     </c>
                     <c ca="center">
                        <p>94.6 (92.8)</p>
                     </c>
                     <c ca="center">
                        <p>0.89 (0.88)</p>
                     </c>
                     <c ca="center">
                        <p>95.0 (93.5)</p>
                     </c>
                     <c ca="center">
                        <p>0.91 (0.90)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>IM</p>
                     </c>
                     <c ca="center">
                        <p>94.5 (92.6)</p>
                     </c>
                     <c ca="center">
                        <p>0.95 (0.94)</p>
                     </c>
                     <c ca="center">
                        <p>93.5 (91.6)</p>
                     </c>
                     <c ca="center">
                        <p>0.94 (0.93)</p>
                     </c>
                     <c ca="center">
                        <p>93.5 (91.6)</p>
                     </c>
                     <c ca="center">
                        <p>0.94 (0.93)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PP</p>
                     </c>
                     <c ca="center">
                        <p>85.1 (88.0)</p>
                     </c>
                     <c ca="center">
                        <p>0.82 (0.83)</p>
                     </c>
                     <c ca="center">
                        <p>87.0 (88.4)</p>
                     </c>
                     <c ca="center">
                        <p>0.84 (0.83)</p>
                     </c>
                     <c ca="center">
                        <p>90.2 (91.7)</p>
                     </c>
                     <c ca="center">
                        <p>0.86 (0.87)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>OM</p>
                     </c>
                     <c ca="center">
                        <p>94.9 (93.9)</p>
                     </c>
                     <c ca="center">
                        <p>0.93 (0.91)</p>
                     </c>
                     <c ca="center">
                        <p>95.9 (95.7)</p>
                     </c>
                     <c ca="center">
                        <p>0.95 (0.93)</p>
                     </c>
                     <c ca="center">
                        <p>96.7 (96.4)</p>
                     </c>
                     <c ca="center">
                        <p>0.96 (0.95)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>EC</p>
                     </c>
                     <c ca="center">
                        <p>82.6 (83.2)</p>
                     </c>
                     <c ca="center">
                        <p>0.83 (0.85)</p>
                     </c>
                     <c ca="center">
                        <p>87.9 (86.3)</p>
                     </c>
                     <c ca="center">
                        <p>0.87 (0.88)</p>
                     </c>
                     <c ca="center">
                        <p>87.4 (87.9)</p>
                     </c>
                     <c ca="center">
                        <p>0.87 (0.89)</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Overall</p>
                     </c>
                     <c ca="center">
                        <p>91.6 (91.1)</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>92.4 (91.6)</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>93.2 (92.8)</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                   <p>&#167; Values in parentheses were obtained using a three-way data split procedure.</p>
               </tblfn>
            </tbl>
            <p>The features selected from PSL101 for the PS1302 data set using cross-validation are shown in Figure <figr fid="F1">1</figr>; the same set of features is used in the corresponding training and testing scheme for the PS1444 data set. The experimental results demonstrate that our feature selection not only yields a significant improvement in performance, but also correlates well with biological insights. For example, in Figure <figr fid="F1">1</figr>, PSL101 selects signal peptides, transmembrane <it>&#945;</it>-helices, and relative solvent accessibility (i.e., SIG, TMA, and RSA) as the optimal features to distinguish CP from IM proteins. Similarly, di-peptide composition, signal peptides, and transmembrane <it>&#946;</it>-barrels (i.e., DP, SIG, and TMB) are used to discriminate CP from OM proteins. The combination of general and compartment-specific features works well in differentiating between the two compartments handled by each classifier; accordingly, the overall accuracy of the combined predictions of all classifiers is improved. The results support our assumption that compartment-specific biological features derived from Gram-negative bacteria translocation pathways can significantly enhance the performance of PSL prediction. Moreover, the selected features are biologically interpretable and can be readily applied in further analyses.</p>
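            <p>The pairwise scheme described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes one binary classifier per pair of compartments and assumes that the pairwise decisions are combined by majority voting, since the text states only that the classifiers' predictions are combined. The CP/IM and CP/OM feature subsets follow Figure 1 as described in the text; DEFAULT_FEATURES, features_for, predict_one_vs_one, and the feature dictionary layout are hypothetical names introduced for illustration.</p>
            <p>
```python
from collections import Counter
from itertools import combinations

# Compartments of Gram-negative bacteria used in this study
SITES = ["CP", "IM", "PP", "OM", "EC"]

# Per-pair feature subsets. CP/IM and CP/OM follow the text (Figure 1);
# DEFAULT_FEATURES is a hypothetical placeholder for the remaining pairs.
PAIR_FEATURES = {
    ("CP", "IM"): ["SIG", "TMA", "RSA"],
    ("CP", "OM"): ["DP", "SIG", "TMB"],
}
DEFAULT_FEATURES = ["AA"]


def features_for(pair):
    """Return the feature names used by the classifier for this pair."""
    return PAIR_FEATURES.get(pair, DEFAULT_FEATURES)


def predict_one_vs_one(protein, pairwise_classifiers):
    """Combine pairwise decisions by majority vote (assumed combination rule).

    protein: dict of feature name -> feature vector (hypothetical layout).
    pairwise_classifiers: dict of (site_a, site_b) -> callable that receives
    only the selected features and returns site_a or site_b.
    """
    votes = Counter()
    for pair in combinations(SITES, 2):
        # Each pairwise classifier sees only its own feature subset
        selected = {name: protein[name] for name in features_for(pair)}
        votes[pairwise_classifiers[pair](selected)] += 1
    # max() keeps the first maximal site, so ties follow the order of SITES
    return max(SITES, key=lambda site: votes[site])
```
            </p>
            <p>Restricting each pairwise classifier to its own subset is what allows a compartment-specific feature such as TMB to drive the CP/OM decision without influencing unrelated pairs.</p>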
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Feature combinations derived from the PS1302 data set using cross-validation</p>
               </caption>
               <text>
                  <p><b>Feature combinations derived from the PS1302 data set using cross-validation</b>. Selected general and compartment-specific features are represented by filled circles and triangles, respectively.</p>
               </text>
               <graphic file="1471-2105-8-330-1"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Effect of sequence and structure conservation</p>
            </st>
            <p>We now explore the relationship between sequence or structural similarity and localization identity. Two homology-based approaches, referred to as PSLseq and PSLsse, are developed to infer localization from sequence alignment using ClustalW and from secondary structure alignment using SSEA <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>, respectively. Figure <figr fid="F2">2</figr> shows that when the structural similarity is greater than or equal to 80%, PSLsse performs slightly better than PSL101; otherwise, PSL101 is significantly better. Thus, we propose a hybrid approach, called PSLsse+PSL101, that combines PSLsse and PSL101. For each query protein, if the top-ranked protein aligned from the training set shares 80% or greater structural similarity with the query, the localization is predicted by PSLsse; otherwise, it is predicted by PSL101. In addition, we implement another hybrid approach, called PSLseq+PSL101, which uses a cutoff of 30% sequence identity <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> to combine PSLseq and PSL101.</p>
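            <p>The switching rule shared by both hybrid approaches can be sketched as below. This is a minimal sketch, assuming that the aligner returns the localization and similarity (in percent) of the top-ranked training protein, or None when nothing aligns; hybrid_predict and the top_hit callback are hypothetical names, and treating the cutoff as inclusive for the 30% sequence-identity case is an assumption.</p>
            <p>
```python
from typing import Callable, Optional, Tuple

# (localization of the top-ranked training protein, similarity in percent)
Hit = Tuple[str, float]

SSE_CUTOFF = 80.0  # structural similarity cutoff used by PSLsse+PSL101
SEQ_CUTOFF = 30.0  # sequence identity cutoff used by PSLseq+PSL101


def hybrid_predict(query: str,
                   top_hit: Callable[[str], Optional[Hit]],
                   fallback: Callable[[str], str],
                   cutoff: float) -> str:
    """Transfer the top hit's localization if its similarity reaches the
    cutoff; otherwise use the feature-based predictor (PSL101 in the text).

    top_hit is a hypothetical wrapper around an aligner (SSEA or ClustalW)
    that searches the training set and returns None when nothing aligns.
    """
    hit = top_hit(query)
    if hit is not None:
        site, similarity = hit
        if similarity >= cutoff:
            return site
    return fallback(query)
```
            </p>
            <p>Passing a structure aligner with SSE_CUTOFF gives the PSLsse+PSL101 behaviour; passing a sequence aligner with SEQ_CUTOFF gives PSLseq+PSL101, so the two hybrids differ only in the similarity measure and threshold.</p>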
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>The distribution of the prediction accuracy as a function of secondary structure similarity</p>
               </caption>
               <text>
                  <p><b>The distribution of the prediction accuracy as a function of secondary structure similarity</b>. The blue line and the red line indicate the distribution of the prediction accuracy as a function of secondary structure similarity for PSL101 and PSLsse using cross-validation for the PS1444 data set, respectively.</p>
               </text>
               <graphic file="1471-2105-8-330-2"/>
            </fig>
            <p>Table <tblr tid="T2">2</tblr> compares the performance of different hybrid approaches using ten-fold cross-validation on the benchmark data sets. Compared with PSL101, the two hybrid approaches, PSLseq+PSL101 and PSLsse+PSL101, improve the overall accuracy, as well as the accuracy and <it>MCC </it>of most localization sites. Most notably, the accuracy of EC proteins in both data sets is improved by 1.6%~5.3%, which suggests that homology-based approaches can compensate for the weaknesses of PSL101 and thereby enhance the prediction of EC proteins. Moreover, PSLsse+PSL101 achieves an overall accuracy of 93.7% and 93.2% on the PS1302 and PS1444 data sets, respectively, which are improvements of 0.6% and 0.8% over PSLseq+PSL101. These results show that homology approaches based on sequence and structure conservation work well in PSL prediction; in fact, structural homology is at least as effective as sequence homology here, and could therefore also serve as a useful indicator for inferring PSL.</p>
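            <p>The per-site accuracy and <it>MCC </it>values reported in the tables can be computed as sketched below. The Methods section is not part of this excerpt, so the definitions are assumptions: site accuracy is taken as the fraction of proteins truly located at a site that are predicted there, <it>MCC </it>as the standard one-versus-rest Matthews correlation coefficient, and overall accuracy as the fraction of all proteins predicted correctly. The function names are hypothetical.</p>
            <p>
```python
import math
from typing import List, Tuple


def site_metrics(true: List[str], pred: List[str], site: str) -> Tuple[float, float]:
    """Accuracy (%) and one-versus-rest MCC for a single localization site."""
    pairs = list(zip(true, pred))
    tp = sum(1 for t, p in pairs if t == site and p == site)
    fn = sum(1 for t, p in pairs if t == site and p != site)
    fp = sum(1 for t, p in pairs if t != site and p == site)
    tn = len(pairs) - tp - fn - fp
    # Site accuracy: share of proteins truly at `site` that are predicted there
    acc = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


def overall_accuracy(true: List[str], pred: List[str]) -> float:
    """Percentage of proteins whose predicted site matches the annotation."""
    return 100.0 * sum(1 for t, p in zip(true, pred) if t == p) / len(true)
```
            </p>
            <p>Because <it>MCC </it>also penalizes false positives from other sites, a site can show high accuracy but a lower <it>MCC</it>, which is why both measures are reported.</p>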
         </sec>
         <sec>
            <st>
                <p>Performance comparison of <it>n</it>-fold cross-validation and three-way data split</p>
            </st>
            <p>The performance of the three-way data split experiments is shown in parentheses in Table <tblr tid="T2">2</tblr>. The features selected from PSL101 for the PS1302 data set using the three-way data split are shown in Figure 1S in the supplementary material [see Additional file <supplr sid="S1">1</supplr>]; the same set of features is used in the corresponding training and testing scheme for the PS1444 data set. The overall accuracy of PSL101, PSLseq+PSL101, and PSLsse+PSL101 drops by 0.4%~1.5% on both the PS1302 and PS1444 data sets. In addition, under the three-way data split, the accuracy and <it>MCC </it>of each localization site are consistent across the two data sets. Moreover, the performance on the two data sets evaluated using the three-way data split is more consistent than that assessed by ten-fold cross-validation. This suggests that a three-way data split procedure can help avoid overestimating predictive performance; therefore, it should be considered in PSL prediction.</p>
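            <p>The difference between the two protocols can be illustrated with the sketch below. It is a generic three-way data split, not the exact procedure used here: the 60/20/20 proportions, the fixed random seed, and the evaluate callback are hypothetical. Features are chosen on the validation part only, and the chosen subset is scored once on a test part that played no role in the selection, which is what keeps the reported accuracy from being optimistically biased.</p>
            <p>
```python
import random
from typing import Callable, List, Sequence, Tuple


def three_way_split(items: Sequence, train_frac: float = 0.6,
                    val_frac: float = 0.2, seed: int = 0) -> Tuple[list, list, list]:
    """Shuffle items and cut them into disjoint training, validation and
    test parts; the test part receives whatever remains."""
    order = list(range(len(items)))
    random.Random(seed).shuffle(order)
    n_train = round(train_frac * len(items))
    n_val = round(val_frac * len(items))
    train = [items[i] for i in order[:n_train]]
    val = [items[i] for i in order[n_train:n_train + n_val]]
    test = [items[i] for i in order[n_train + n_val:]]
    return train, val, test


def select_and_evaluate(items: Sequence, candidate_subsets: List[List[str]],
                        evaluate: Callable[[List[str], list, list], float],
                        seed: int = 0) -> Tuple[List[str], float]:
    """Choose the feature subset that scores best on the validation part,
    then score that single choice once on the untouched test part.

    evaluate(subset, train, eval_set) is a hypothetical callback that trains
    on `train` with only `subset` and returns accuracy on `eval_set`.
    """
    train, val, test = three_way_split(items, seed=seed)
    best = max(candidate_subsets, key=lambda subset: evaluate(subset, train, val))
    return best, evaluate(best, train, test)
```
            </p>
            <p>In plain n-fold cross-validation, the folds used to choose the features are also the folds used to report accuracy, so some of the selection effort leaks into the reported score; separating the test part removes that leak.</p>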
         </sec>
         <sec>
            <st>
               <p>Comparison with other approaches using the benchmark data sets</p>
            </st>
            <p>Table <tblr tid="T3">3</tblr> compares the performance of PSLsse+PSL101, referred to as HYBRID, with other prediction methods using cross-validation on the benchmark data sets. HYBRID attains the best overall accuracy of 93.7% and 93.2% on the PS1302 and PS1444 data sets, respectively. On both sets, HYBRID improves the overall accuracy by 2.5%~3.2% over the second best approach. With respect to accuracy and <it>MCC</it>, HYBRID performs better than the other approaches at most localization sites. HYBRID achieves the best or near-best accuracy for CP, IM, and OM proteins, for which more biological features are incorporated than for the other localization sites. The high predictive performance for CP, IM, and OM proteins demonstrates that biological features derived from Gram-negative bacteria translocation pathways are effective for PSL prediction. Most notably, HYBRID outperforms the other approaches for IM proteins by 4.9%~14.6% and 0.9%~3.5% in terms of accuracy on the PS1302 and PS1444 data sets, respectively. This is a particular strength of HYBRID because IM proteins constitute key components of various cellular processes and serve as important targets for drug discovery <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. In addition, it is interesting to note that the accuracy for IM proteins improves markedly from 78.7% in PSORTb v.1.1 to 92.6% in PSORTb v.2.0, which incorporates an expanded homology module. This also lends support to our assumption that sequence and structural homology approaches could be effective indicators for inferring PSL.</p>
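            <p>The IM accuracy margins quoted above can be reproduced from the PS1302 columns of Table 3 with the short calculation below. The margins dictionary and function name are introduced only for illustration; the accuracies are copied from the table, and the range is taken over all competing methods rather than over the second best alone.</p>
            <p>
```python
# IM accuracies (%) on PS1302, copied from Table 3
im_ps1302 = {"HYBRID": 93.3, "CELLO": 88.4, "PSORTb v.1.1": 78.7,
             "PSLpred": 86.8, "P-CLASSIFIER": 87.1}


def margins(scores, method="HYBRID"):
    """Accuracy gain of `method` over every other method, rounded to 0.1."""
    return {m: round(scores[method] - s, 1) for m, s in scores.items() if m != method}


gains = margins(im_ps1302)
print(min(gains.values()), max(gains.values()))  # 4.9 14.6
```
            </p>
            <p>The smallest gain (over CELLO) and the largest gain (over PSORTb v.1.1) bound the 4.9%~14.6% range; the same calculation applies to the PS1444 columns.</p>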
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Performance comparison of different approaches using cross-validation for the benchmark data sets.</p>
               </caption>
               <tblbdy cols="11">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="10" ca="center">
                        <p>PS1302</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="10">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Localization</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>HYBRID</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>CELLO</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>PSORTb v.1.1</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>PSLpred</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>P-CLASSIFIER</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>CP</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>95.6</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>0.90</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>90.7</p>
                     </c>
                     <c ca="center">
                        <p>0.85</p>
                     </c>
                     <c ca="center">
                        <p>69.4</p>
                     </c>
                     <c ca="center">
                        <p>0.79</p>
                     </c>
                     <c ca="center">
                        <p>90.7</p>
                     </c>
                     <c ca="center">
                        <p>0.86</p>
                     </c>
                     <c ca="center">
                        <p>94.6</p>
                     </c>
                     <c ca="center">
                        <p>0.85</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>IM</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>93.3</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>0.94</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>88.4</p>
                     </c>
                     <c ca="center">
                        <p>0.92</p>
                     </c>
                     <c ca="center">
                        <p>78.7</p>
                     </c>
                     <c ca="center">
                        <p>0.85</p>
                     </c>
                     <c ca="center">
                        <p>86.8</p>
                     </c>
                     <c ca="center">
                        <p>0.88</p>
                     </c>
                     <c ca="center">
                        <p>87.1</p>
                     </c>
                     <c ca="center">
                        <p>0.92</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PP</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>91.4</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.88</p>
                     </c>
                     <c ca="center">
                        <p>86.9</p>
                     </c>
                     <c ca="center">
                        <p>0.80</p>
                     </c>
                     <c ca="center">
                        <p>57.6</p>
                     </c>
                     <c ca="center">
                        <p>0.69</p>
                     </c>
                     <c ca="center">
                        <p>90.3</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>0.90</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>85.9</p>
                     </c>
                     <c ca="center">
                        <p>0.81</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>OM</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>96.3</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>0.96</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>94.6</p>
                     </c>
                     <c ca="center">
                        <p>0.90</p>
                     </c>
                     <c ca="center">
                        <p>90.3</p>
                     </c>
                     <c ca="center">
                        <p>0.93</p>
                     </c>
                     <c ca="center">
                        <p>95.2</p>
                     </c>
                     <c ca="center">
                        <p>0.95</p>
                     </c>
                     <c ca="center">
                        <p>93.6</p>
                     </c>
                     <c ca="center">
                        <p>0.90</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>EC</p>
                     </c>
                     <c ca="center">
                        <p>90.0</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>0.89</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>78.9</p>
                     </c>
                     <c ca="center">
                        <p>0.82</p>
                     </c>
                     <c ca="center">
                        <p>70.0</p>
                     </c>
                     <c ca="center">
                        <p>0.79</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>90.6</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.84</p>
                     </c>
                     <c ca="center">
                        <p>86.0</p>
                     </c>
                     <c ca="center">
                        <p>0.89</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Overall</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>93.7</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>88.9</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>74.8</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>91.2</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>89.8</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="10" ca="center">
                        <p>PS1444</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="10">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Localization</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>HYBRID</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>CELLO II</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>PSORTb v.2.0</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>PSLpred</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>P-CLASSIFIER</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>CP</p>
                     </c>
                     <c ca="center">
                        <p>95.0</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>0.91</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>95.3</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.89</p>
                     </c>
                     <c ca="center">
                        <p>70.1</p>
                     </c>
                     <c ca="center">
                        <p>0.77</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>IM</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>93.5</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>0.94</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>90.0</p>
                     </c>
                     <c ca="center">
                        <p>0.91</p>
                     </c>
                     <c ca="center">
                        <p>92.6</p>
                     </c>
                     <c ca="center">
                        <p>0.92</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PP</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>90.2</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>0.86</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>87.7</p>
                     </c>
                     <c ca="center">
                        <p>0.82</p>
                     </c>
                     <c ca="center">
                        <p>69.2</p>
                     </c>
                     <c ca="center">
                        <p>0.78</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>OM</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>96.7</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>0.96</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>92.8</p>
                     </c>
                     <c ca="center">
                        <p>0.90</p>
                     </c>
                     <c ca="center">
                        <p>94.9</p>
                     </c>
                     <c ca="center">
                        <p>0.95</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>EC</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>87.4</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>0.87</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>79.5</p>
                     </c>
                     <c ca="center">
                        <p>0.82</p>
                     </c>
                     <c ca="center">
                        <p>78.9</p>
                     </c>
                     <c ca="center">
                        <p>0.86</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Overall</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>93.2</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>90.0</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>82.6</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                   <p>&#167; The best performance, overall and for each localization site, is underlined.</p>
               </tblfn>
            </tbl>
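<p>The per-site columns in the table above pair an accuracy with a Matthews correlation coefficient (MCC). The minimal Python sketch below shows how both can be computed from prediction counts. It assumes that Acc for a site is the percentage of that site's proteins predicted correctly, and that MCC is the standard Matthews coefficient with that site treated as the positive class and all other sites as negatives. The function names and the counts in the example are hypothetical and are not taken from the tables. One consequence is worth noting: when a method predicts none of a site's proteins correctly (TP = 0), the numerator cannot be positive, so the MCC is zero or negative. This is consistent with the 0.0% / -0.03 entry for PSORTb v.2.0 on PP proteins in the EV153_low part of Table 4.</p>

```python
import math


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient for one site, treated as one-vs-rest."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (an assumption here): report 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def site_accuracy(tp, fn):
    """Percentage of a site's proteins that are assigned to that site."""
    return 100.0 * tp / (tp + fn)


# Hypothetical counts for one site in a 153-protein set (not from the tables):
# 11 proteins belong to the site, none are recovered, 5 others are misassigned to it.
print(site_accuracy(0, 11), round(mcc(0, 5, 11, 137), 2))
```

Because MCC also accounts for false positives and true negatives, two methods with the same per-site accuracy can differ in MCC. This explains why the tables report both measures.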
         </sec>
         <sec>
            <st>
               <p>Comparison with other approaches using the evaluation data sets</p>
            </st>
             <p>The evaluation data sets were submitted to the web server of each prediction method, and the resulting predictive performance is shown in Table <tblr tid="T4">4</tblr>. CELLO II and P-CLASSIFIER achieve consistent overall accuracies, ranging from 71.9% to 77.8%, on the EV153_low and EV90_high data sets. PSLpred attains overall accuracies of 72.5% and 88.9% on the EV153_low and EV90_high sets, respectively. PSORTb v.2.0 performs very well on the EV90_high set (98.9%) but poorly on the EV153_low set (49.7%). HYBRID yields the best overall accuracy for proteins of low sequence similarity (76.5% on EV153_low, tied with CELLO II) and ranks second for highly homologous sequences (96.7% on EV90_high, behind PSORTb v.2.0). This demonstrates that, when no homologous sequences are detected, biological features derived from the translocation pathways of Gram-negative bacteria yield accurate predictions, whereas incorporating the structural homology approach further improves predictive performance for highly homologous sequences. When the two data sets are combined into the EV243_all set, HYBRID achieves an overall accuracy of 84.0%, which is 5.4 percentage points higher than that of the second-best method. This suggests that HYBRID could make PSL prediction more robust, especially when highly homologous sequences are not detected.</p>
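<p>The EV243_all figures can be cross-checked against the EV153_low and EV90_high overall accuracies in Table 4. The minimal Python sketch below assumes that the numeric suffix of each set name is its protein count and that EV243_all is simply the union of the two sets. Under those assumptions, it converts each reported accuracy back into a count of correct predictions, pools the counts, and ranks the methods. The margin between the top two methods is taken between the rounded one-decimal accuracies, as a table reports them. The names OVERALL_ACC, SIZES and pooled_accuracy are illustrative only.</p>

```python
# Overall accuracies (%) copied from Table 4.
OVERALL_ACC = {
    "HYBRID": {"EV153_low": 76.5, "EV90_high": 96.7},
    "CELLO II": {"EV153_low": 76.5, "EV90_high": 77.8},
    "PSORTb v.2.0": {"EV153_low": 49.7, "EV90_high": 98.9},
    "PSLpred": {"EV153_low": 72.5, "EV90_high": 88.9},
    "P-CLASSIFIER": {"EV153_low": 71.9, "EV90_high": 77.8},
}
# ASSUMPTION: set sizes are read off the set names (153 + 90 = 243 proteins).
SIZES = {"EV153_low": 153, "EV90_high": 90}


def pooled_accuracy(acc_by_set, sizes):
    """Turn per-set accuracies back into correct counts and pool them (%)."""
    correct = sum(round(acc_by_set[name] / 100 * n) for name, n in sizes.items())
    return 100 * correct / sum(sizes.values())


pooled = {method: pooled_accuracy(acc, SIZES) for method, acc in OVERALL_ACC.items()}
ranking = sorted(pooled.items(), key=lambda item: item[1], reverse=True)
for method, acc in ranking:
    print(f"{method:14s} {acc:5.1f}%")

# The margin is taken between one-decimal figures, as a results table would show them.
(best, best_acc), (second, second_acc) = ranking[0], ranking[1]
gap = round(best_acc, 1) - round(second_acc, 1)
print(f"{best} leads {second} by {gap:.1f} percentage points")
```

Under these assumptions, the pooled accuracy for HYBRID comes out at 84.0%, matching the value quoted in the text. The 5.4 figure is a difference in percentage points, not a relative improvement.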
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>Predictive performance of different prediction methods for the evaluation data sets.</p>
               </caption>
               <tblbdy cols="11">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="10" ca="center">
                        <p>EV153_low</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="10">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Localization</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>HYBRID</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>CELLO II</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>PSORTb v.2.0</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>PSLpred</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>P-CLASSIFIER</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>CP</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>91.7</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.67</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>91.7</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>0.70</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>63.5</p>
                     </c>
                     <c ca="center">
                        <p>-0.61</p>
                     </c>
                     <c ca="center">
                        <p>89.6</p>
                     </c>
                     <c ca="center">
                        <p>0.59</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>91.7</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.66</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>IM</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>65.4</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>0.73</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>46.2</p>
                     </c>
                     <c ca="center">
                        <p>0.64</p>
                     </c>
                     <c ca="center">
                        <p>46.2</p>
                     </c>
                     <c ca="center">
                        <p>-0.58</p>
                     </c>
                     <c ca="center">
                        <p>38.5</p>
                     </c>
                     <c ca="center">
                        <p>0.41</p>
                     </c>
                     <c ca="center">
                        <p>30.8</p>
                     </c>
                     <c ca="center">
                        <p>0.48</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PP</p>
                     </c>
                     <c ca="center">
                        <p>45.5</p>
                     </c>
                     <c ca="center">
                        <p>0.25</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>81.8</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>0.49</ul>
                        </p>
                     </c>
                     <c ca="center">
                         <p>0.0</p>
                     </c>
                     <c ca="center">
                        <p>-0.03</p>
                     </c>
                     <c ca="center">
                        <p>54.5</p>
                     </c>
                     <c ca="center">
                        <p>0.34</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>81.8</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>0.49</ul>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>OM</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>44.4</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>0.58</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>33.3</p>
                     </c>
                     <c ca="center">
                        <p>0.34</p>
                     </c>
                     <c ca="center">
                        <p>22.2</p>
                     </c>
                     <c ca="center">
                        <p>-0.46</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>44.4</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>0.58</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>22.2</p>
                     </c>
                     <c ca="center">
                        <p>0.17</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>EC</p>
                     </c>
                     <c ca="center">
                        <p>27.3</p>
                     </c>
                     <c ca="center">
                        <p>0.43</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>45.5</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.50</p>
                     </c>
                     <c ca="center">
                         <p>9.1</p>
                     </c>
                     <c ca="center">
                        <p>-0.29</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>45.5</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>0.54</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>27.3</p>
                     </c>
                     <c ca="center">
                        <p>0.33</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Overall</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>76.5</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>76.5</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>49.7</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>72.5</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>71.9</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="10" ca="center">
                        <p>EV90_high</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="10">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Localization</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>HYBRID</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>CELLO II</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>PSORTb v.2.0</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>PSLpred</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>P-CLASSIFIER</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>CP</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>100.0</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.95</p>
                     </c>
                     <c ca="center">
                        <p>92.9</p>
                     </c>
                     <c ca="center">
                        <p>0.83</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>100.0</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>1.00</ul>
                        </p>
                     </c>
                     <c ca="center">
                         <p>96.4</p>
                     </c>
                     <c ca="center">
                        <p>0.88</p>
                     </c>
                     <c ca="center">
                        <p>92.9</p>
                     </c>
                     <c ca="center">
                        <p>0.78</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>IM</p>
                     </c>
                     <c ca="center">
                        <p>96.2</p>
                     </c>
                     <c ca="center">
                        <p>0.97</p>
                     </c>
                     <c ca="center">
                        <p>73.1</p>
                     </c>
                     <c ca="center">
                        <p>0.75</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>100.0</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>1.00</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>92.3</p>
                     </c>
                     <c ca="center">
                        <p>0.92</p>
                     </c>
                     <c ca="center">
                        <p>80.8</p>
                     </c>
                     <c ca="center">
                        <p>0.84</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PP</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>100.0</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.96</p>
                     </c>
                     <c ca="center">
                        <p>61.5</p>
                     </c>
                     <c ca="center">
                        <p>0.58</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>100.0</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>1.00</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>92.3</p>
                     </c>
                     <c ca="center">
                        <p>0.83</p>
                     </c>
                     <c ca="center">
                        <p>46.2</p>
                     </c>
                     <c ca="center">
                        <p>0.46</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>OM</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>94.7</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>0.97</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>73.7</p>
                     </c>
                     <c ca="center">
                        <p>0.67</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>94.7</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>0.97</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>68.4</p>
                     </c>
                     <c ca="center">
                        <p>0.79</p>
                     </c>
                     <c ca="center">
                        <p>73.7</p>
                     </c>
                     <c ca="center">
                        <p>0.69</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>EC</p>
                     </c>
                     <c ca="center">
                        <p>75.0</p>
                     </c>
                     <c ca="center">
                        <p>0.86</p>
                     </c>
                     <c ca="center">
                        <p>75.0</p>
                     </c>
                     <c ca="center">
                        <p>0.54</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>100.0</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>1.00</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>100.0</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.81</p>
                     </c>
                     <c ca="center">
                        <p>75.0</p>
                     </c>
                     <c ca="center">
                        <p>0.54</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Overall</p>
                     </c>
                     <c ca="center">
                        <p>96.7</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>77.8</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>98.9</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>88.9</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>77.8</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="10" ca="center">
                        <p>EV243_all</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="10">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Localization</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>HYBRID</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>CELLO II</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>PSORTb v.2.0</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>PSLpred</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>P-CLASSIFIER</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>CP</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>93.5</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>0.80</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>91.9</p>
                     </c>
                     <c ca="center">
                        <p>0.77</p>
                     </c>
                     <c ca="center">
                        <p>71.8</p>
                     </c>
                     <c ca="center">
                        <p>0.73</p>
                     </c>
                     <c ca="center">
                        <p>91.1</p>
                     </c>
                     <c ca="center">
                        <p>0.72</p>
                     </c>
                     <c ca="center">
                        <p>91.9</p>
                     </c>
                     <c ca="center">
                        <p>0.73</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>IM</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>80.8</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>0.85</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>59.6</p>
                     </c>
                     <c ca="center">
                        <p>0.70</p>
                     </c>
                     <c ca="center">
                        <p>73.1</p>
                     </c>
                     <c ca="center">
                        <p>0.80</p>
                     </c>
                     <c ca="center">
                        <p>65.4</p>
                     </c>
                     <c ca="center">
                        <p>0.68</p>
                     </c>
                     <c ca="center">
                        <p>55.8</p>
                     </c>
                     <c ca="center">
                        <p>0.67</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PP</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>75.0</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.56</p>
                     </c>
                     <c ca="center">
                        <p>70.8</p>
                     </c>
                     <c ca="center">
                        <p>0.51</p>
                     </c>
                     <c ca="center">
                        <p>54.2</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>0.66</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>75.0</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.57</p>
                     </c>
                     <c ca="center">
                        <p>62.5</p>
                     </c>
                     <c ca="center">
                        <p>0.45</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>OM</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>78.6</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>0.85</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>60.7</p>
                     </c>
                     <c ca="center">
                        <p>0.58</p>
                     </c>
                     <c ca="center">
                        <p>71.4</p>
                     </c>
                     <c ca="center">
                        <p>0.83</p>
                     </c>
                     <c ca="center">
                        <p>60.7</p>
                     </c>
                     <c ca="center">
                        <p>0.73</p>
                     </c>
                     <c ca="center">
                        <p>57.1</p>
                     </c>
                     <c ca="center">
                        <p>0.53</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>EC</p>
                     </c>
                     <c ca="center">
                        <p>40.0</p>
                     </c>
                     <c ca="center">
                        <p>0.57</p>
                     </c>
                     <c ca="center">
                        <p>53.3</p>
                     </c>
                     <c ca="center">
                        <p>0.50</p>
                     </c>
                     <c ca="center">
                        <p>33.3</p>
                     </c>
                     <c ca="center">
                        <p>0.57</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>60.0</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>0.62</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>40.0</p>
                     </c>
                     <c ca="center">
                        <p>0.39</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Overall</p>
                     </c>
                     <c ca="center">
                        <p>
                           <ul>84.0</ul>
                        </p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>77.0</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>67.9</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>78.6</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>74.1</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>&#167; The best performance of overall and individual localization sites is underlined. HYBRID is trained on the PS1444 data set.</p>
               </tblfn>
            </tbl>
         </sec>
         <sec>
            <st>
               <p>Performance of non-redundant data sets</p>
            </st>
            <p>In both benchmark data sets, proteins that share more than 30% sequence identity with other proteins in the set make up approximately 42% of the entries. Such a high level of redundancy can inflate performance estimates while masking poor generalization: a predictor may fail to assign the correct PSL to sequences with low homology to the training set. For this reason, non-redundant data sets are necessary when evaluating the performance of PSL prediction.</p>
            <p>Here, we assess performance on non-redundant sequences from the Gram-negative bacteria data sets. Using the same features derived from the PS1302 set by cross-validation, we train and evaluate HYBRID on each of the two non-redundant sets with ten-fold cross-validation. The results are shown in Table <tblr tid="T5">5</tblr>. The overall accuracy on the non-redundant sets is markedly lower, by approximately 8%, than on the redundant sets, and the <it>MCC </it>for individual localization sites drops by 0.04 to 0.26. These results suggest that, on non-redundant data, PSL prediction for Gram-negative bacteria reaches an overall accuracy of approximately 85%. If highly homologous sequences are removed completely, methods that depend less on homology detection will need to be developed.</p>
            <tbl id="T5">
               <title>
                  <p>Table 5</p>
               </title>
               <caption>
                  <p>Performance of non-redundant data sets.</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>Localization</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>NR755</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>NR828</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><it>Acc </it>(%)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>MCC</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>CP</p>
                     </c>
                     <c ca="center">
                        <p>95.6</p>
                     </c>
                     <c ca="center">
                        <p>0.86</p>
                     </c>
                     <c ca="center">
                        <p>97.8</p>
                     </c>
                     <c ca="center">
                        <p>0.87</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>IM</p>
                     </c>
                     <c ca="center">
                        <p>88.5</p>
                     </c>
                     <c ca="center">
                        <p>0.88</p>
                     </c>
                     <c ca="center">
                        <p>88.8</p>
                     </c>
                     <c ca="center">
                        <p>0.90</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PP</p>
                     </c>
                     <c ca="center">
                        <p>81.0</p>
                     </c>
                     <c ca="center">
                        <p>0.76</p>
                     </c>
                     <c ca="center">
                        <p>80.7</p>
                     </c>
                     <c ca="center">
                        <p>0.76</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>OM</p>
                     </c>
                     <c ca="center">
                        <p>85.1</p>
                     </c>
                     <c ca="center">
                        <p>0.84</p>
                     </c>
                     <c ca="center">
                        <p>83.8</p>
                     </c>
                     <c ca="center">
                        <p>0.82</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>EC</p>
                     </c>
                     <c ca="center">
                        <p>64.0</p>
                     </c>
                     <c ca="center">
                        <p>0.65</p>
                     </c>
                     <c ca="center">
                        <p>57.6</p>
                     </c>
                     <c ca="center">
                        <p>0.61</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Overall</p>
                     </c>
                     <c ca="center">
                        <p>85.6</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>85.6</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
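            <p>The <it>Acc</it> and <it>MCC</it> columns of Table 5 are per-site statistics. The Python sketch below shows one common way to compute them from a list of true and predicted sites. It assumes that per-site <it>Acc</it> is sensitivity (correct predictions for a site divided by the number of proteins at that site) and that <it>MCC</it> is computed one-versus-rest. These definitions are our assumption and are not taken from the HYBRID code. The function name site_metrics is hypothetical.</p>

```python
import math

SITES = ["CP", "IM", "PP", "OM", "EC"]


def site_metrics(y_true, y_pred, sites=SITES):
    """Per-site Acc (%) and one-vs-rest MCC, plus overall accuracy (%).

    Assumes Acc for a site is its sensitivity: correctly predicted
    proteins of that site divided by all proteins of that site.
    """
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    results = {}
    for s in sites:
        tp = sum(1 for t, p in pairs if t == s and p == s)
        fn = sum(1 for t, p in pairs if t == s and p != s)
        fp = sum(1 for t, p in pairs if t != s and p == s)
        tn = n - tp - fn - fp
        acc = 100.0 * tp / (tp + fn) if tp + fn else float("nan")
        denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention
        results[s] = (acc, mcc)
    # Overall accuracy: fraction of all proteins assigned the correct site
    overall = 100.0 * sum(t == p for t, p in pairs) / n
    return results, overall
```

            <p>Because overall accuracy pools all proteins, it is dominated by the largest classes. The per-site <it>MCC</it> is more informative for small classes such as EC.</p>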
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>In this paper, we have proposed a hybrid method for predicting PSL in Gram-negative bacteria. It combines a 1-v-1 SVM model using compartment-specific biological features with a structural homology approach based on secondary structure alignment. Experimental results show that the SVM model achieves high prediction accuracy on both benchmark data sets, which supports the assumption that biological features derived from Gram-negative bacteria translocation pathways can significantly improve performance. Combining the SVM model with the structural homology approach further improves overall accuracy. This indicates that structural homology, like sequence homology, can be a useful indicator for inferring PSL. A three-way data split procedure is incorporated to prevent overfitting of the parameters and features. In addition, we evaluated the method on non-redundant Gram-negative bacteria data sets, and the results suggest that performance could be overestimated if redundant sequences are included. On the evaluation data sets, our hybrid method provides accurate predictions, especially for sequences with low similarity to the training data. The proposed method can be used in large-scale proteome analyses and is freely available for public use at <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>.</p>
         <p>Several challenges in PSL prediction remain. In this work, we consider only proteins with a single localization site. However, proteins with multiple localization sites are not rare, especially in higher-order species <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp>, and we plan to handle proteins localized to multiple compartments in future development. Better accuracy and coverage are also needed, particularly for several poorly predicted localization sites. Finally, we will extend our method to incorporate more biological features and proteins from more species, including humans.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Gram-negative bacteria translocation pathways</p>
            </st>
            <p>Proteins synthesized in the cytosol must be targeted and transported to their designated compartments through one of the translocation pathways of Gram-negative bacteria <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. Gram-negative bacteria have five major PSL sites: the CP, IM, PP, OM, and EC. Figure <figr fid="F3">3</figr> shows some of these translocation pathways. Proteins crossing the IM are targeted to the SecYEG translocase either co-translationally, via the signal recognition particle (SRP)-dependent pathway, or post-translationally, via the SecB-dependent pathway. Alternatively, proteins destined for the PP can cross the IM through the twin-arginine translocation pathway. PP proteins can be inserted into or translocated across the OM through five secretory pathways, including the Type I <abbrgrp><abbr bid="B36">36</abbr></abbrgrp> and Type II <abbrgrp><abbr bid="B37">37</abbr></abbrgrp> export systems. Whatever the mode of translocation, the process is largely substrate-specific, so a protein requires one or more signals to cross a membrane. For example, non-cytoplasmic proteins contain signal sequences that direct their translocation through the IM. Furthermore, many proteins localized to a given compartment have characteristic structures and amino acid compositions. For instance, integral IM proteins contain mainly transmembrane <it>&#945;</it>-helices whose cores are populated by hydrophobic residues. We therefore model the prediction system on the translocation pathways, identifying both the signals that influence targeting and the compartment-specific features that correlate with each localization site.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Diversity of Gram-negative bacteria translocation pathways</p>
               </caption>
               <text>
                  <p><b>Diversity of Gram-negative bacteria translocation pathways</b>. 1, 2, 3, 4, 5, and 6 represent translocation pathways from CP to IM, CP to PP, CP to EC, IM to PP, PP to OM, and PP to EC, respectively. SRP, signal recognition particle; SecB, export-specific cytoplasmic chaperone; SecA, preprotein translocase SecA subunit; SecYEG, preprotein translocase complex; lep, leader peptidase; TAT, twin-arginine translocase; Gsp complex, general secretion pathway complex; Omp85, outer membrane protein assembly factor; ABC transporter, ATP-binding cassette transporter; TolC, Type I secretion outer membrane protein. [Modified from Wickner and Schekman (2005) with permission]</p>
               </text>
               <graphic file="1471-2105-8-330-3"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>System architecture of PSL101</p>
            </st>
            <p>The system architecture of PSL101, shown in Figure <figr fid="F4">4</figr>, comprises ten binary 1-v-1 SVM <abbrgrp><abbr bid="B38">38</abbr></abbrgrp> classifiers for predicting the five localization sites of Gram-negative bacteria. Each translocation step between compartments <it>i </it>and <it>j </it>is represented by a binary classifier <it>C</it><sub><it>i</it>,<it>j </it></sub>that incorporates biological features intrinsic to the proteins in compartments <it>i </it>and <it>j</it>. All translocations in Figure <figr fid="F3">3</figr>, i.e., translocation pathways 1 to 6, can be modelled in this way with six binary classifiers. The remaining four classifiers do not correspond to biological translocation steps. They are still constructed with compartment-specific features and combined with the six classifiers above for an integrated prediction. For each query protein, each classifier returns a predicted class and its corresponding probability. The ten binary predictions are combined by majority vote to determine the predicted localization site. In the case of a tie, the site with the highest average probability is assigned as the final prediction.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>System architecture of PSL101</p>
               </caption>
               <text>
                  <p>System architecture of PSL101.</p>
               </text>
               <graphic file="1471-2105-8-330-4"/>
            </fig>
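            <p>The combination rule described above (ten pairwise classifiers, majority vote, ties broken by the highest average probability) can be sketched in Python as follows. The classifier outputs are assumed to be already computed. The sketch also assumes that a binary classifier reporting probability <it>p</it> for its winner implicitly assigns 1 &#8722; <it>p</it> to the loser, and that the tie-break averages the four probabilities a site receives from the classifiers that involve it. These choices, and the function name combine_votes, are our assumptions rather than details taken from PSL101.</p>

```python
from itertools import combinations

SITES = ["CP", "IM", "PP", "OM", "EC"]
# Translocation pathways 1-6 of Figure 3, written as unordered compartment pairs
BIOLOGICAL_PAIRS = {("CP", "IM"), ("CP", "PP"), ("CP", "EC"),
                    ("IM", "PP"), ("PP", "OM"), ("PP", "EC")}
# All ten 1-v-1 classifiers C_{i,j}; four of them have no biological pathway
PAIRS = list(combinations(SITES, 2))
NON_BIOLOGICAL_PAIRS = [p for p in PAIRS if p not in BIOLOGICAL_PAIRS]


def combine_votes(outputs):
    """Combine pairwise outputs by majority vote with a probability tie-break.

    outputs maps each pair (i, j) to (winning_site, probability_of_winner).
    """
    votes = {s: 0 for s in SITES}
    probs = {s: [] for s in SITES}
    for (i, j), (winner, p) in outputs.items():
        loser = j if winner == i else i
        votes[winner] += 1
        probs[winner].append(p)
        probs[loser].append(1.0 - p)  # assumed: the loser receives 1 - p
    top = max(votes.values())
    tied = [s for s in SITES if votes[s] == top]
    # Tie-break: highest mean probability over the classifiers involving the site
    return max(tied, key=lambda s: sum(probs[s]) / len(probs[s]))
```

            <p>With five sites, each site takes part in exactly four of the ten classifiers, so the averages in the tie-break are always taken over the same number of values.</p>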
         </sec>
         <sec>
            <st>
               <p>Feature extraction and representation of PSL101</p>
            </st>
            <p>We consider the following biological features to distinguish between proteins translocated to different compartments, and construct our classification framework to mimic the translocation process of Gram-negative bacterial secretory pathways. Since some of these features may not be readily available, we utilize several web services to predict them.</p>
            <sec>
               <st>
                  <p>General biological features</p>
               </st>
               <p>1. Amino acid (AA) composition: Protein descriptors based on <it>n</it>-peptide compositions or their variations have proved effective in PSL prediction <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. If <it>n </it>= 1, the <it>n</it>-peptide composition reduces to the amino acid composition. This is a 21-dimensional feature vector (20 amino acid types plus a symbol 'X' for others) that represents the occurrence frequency of each amino acid in a protein sequence.</p>
               <p>2. Di-peptide (DP) composition: Analogously, if <it>n </it>= 2, the di-peptide composition gives a fixed-length vector of 21 &#215; 21 = 441 di-peptide frequencies, which represent the occurrence frequency of amino acid pairs in a protein sequence.</p>
               <p>3. Relative solvent accessibility (RSA): Proteins in different compartments differ in the composition of their buried and exposed residues <abbrgrp><abbr bid="B39">39</abbr><abbr bid="B40">40</abbr></abbrgrp>. For example, CP proteins have a balance of acidic and basic surface residues, while EC proteins have a slight excess of acidic surface residues <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>. To encode the predictions of SABLE II <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>, a relative solvent accessibility prediction method, we classify each residue as buried or exposed using an RSA cutoff of 25% <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>. We then compute the amino acid compositions of the buried and exposed residues.</p>
               <p>4. Secondary structure elements encoding scheme 1 (SSE1): Transmembrane <it>&#945;</it>-helices are frequently observed in IM proteins, while transmembrane <it>&#946;</it>-barrels are found primarily in OM proteins <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>. Secondary structure elements (SSE) are therefore crucial for detecting proteins localized in the IM and OM. Based on the predictions of HYPROSP II <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>, a knowledge-based SSE prediction approach, we compute the amino acid compositions of the residues assigned to each of three SSE types <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B40">40</abbr></abbrgrp>: <it>&#945;</it>-helix, <it>&#946;</it>-strand, and loop.</p>
               <p>5. Secondary structure elements encoding scheme 2 (SSE2): SSE1 alone cannot discriminate between proteins that have similar SSE compositions but localize in different compartments. For example, the SSE compositions of OM proteins might resemble those of proteins localized in other compartments, but OM proteins are characterized by <it>&#946;</it>-strand repeats throughout their transmembrane domains. To capture such properties, we encode the predictions of HYPROSP II with three descriptors: composition, transition, and distribution. Composition describes the global composition of a given SSE type in a protein. Transition characterizes the percentage frequency with which residues of a particular SSE type are followed by residues of a different type. Distribution measures the chain length within which the first residue and 25, 50, 75 and 100% of the residues of a particular SSE type are located <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>. An example is shown in Figure 2S in the supplementary material [see Additional file <supplr sid="S1">1</supplr>].</p>
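               <p>The composition-based features in the list above can be sketched in Python as follows. These are AA, DP, and the compositions restricted to buried or exposed residues (RSA) or to each SSE type (SSE1). This is a minimal illustration, not the PSL101 code. Per-residue RSA values and SSE labels are assumed to come from an external predictor such as SABLE II or HYPROSP II. Normalizing each restricted composition by the number of residues in its class, and treating a residue at exactly the cutoff as exposed, are our assumptions. All function names are hypothetical.</p>

```python
from itertools import product

ALPHABET = "ACDEFGHIKLMNPQRSTVWY" + "X"  # 20 standard residues plus 'X' for others


def _canon(residue):
    """Map any non-standard residue to 'X'."""
    return residue if residue in ALPHABET else "X"


def aa_composition(seq):
    """21-dimensional amino acid composition (frequencies sum to 1)."""
    counts = dict.fromkeys(ALPHABET, 0)
    for r in seq:
        counts[_canon(r)] += 1
    return [counts[a] / len(seq) for a in ALPHABET]


def dipeptide_composition(seq):
    """441-dimensional (21 x 21) composition over overlapping residue pairs."""
    pairs = ["".join(p) for p in product(ALPHABET, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    for a, b in zip(seq, seq[1:]):
        counts[_canon(a) + _canon(b)] += 1
    total = max(len(seq) - 1, 1)
    return [counts[p] / total for p in pairs]


def partitioned_composition(seq, labels, classes):
    """Amino acid composition computed separately within each residue class."""
    vector = []
    for c in classes:
        subset = [r for r, lab in zip(seq, labels) if lab == c]
        vector += aa_composition(subset) if subset else [0.0] * len(ALPHABET)
    return vector


def rsa_classes(rsa_values, cutoff=0.25):
    """Label residues buried ('b') or exposed ('e'); exposed if RSA >= cutoff (assumed)."""
    return ["e" if v >= cutoff else "b" for v in rsa_values]
```

               <p>For RSA, partitioned_composition(seq, rsa_classes(rsa), "be") yields 42 values. For SSE1, the same helper can be called with a predicted secondary structure string as the labels and "HEC" as the classes (a three-state alphabet we assume for the predictor output), yielding 63 values.</p>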
            </sec>
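            <p>The SSE2 encoding (composition, transition and distribution over a predicted secondary structure string) can be sketched in Python as follows. The sketch assumes a three-state H/E/C string. It counts transitions between each unordered pair of SSE types as a percentage of the <it>N</it> &#8722; 1 adjacent residue pairs. It reads the "first, 25, 50, 75 and 100%" positions as the shortest chain prefix containing that fraction of the residues of a type, reported as a percentage of chain length. These are our interpretations of the descriptors described above, not code from PSL101, and the function name is hypothetical.</p>

```python
import math
from itertools import combinations

SSE_TYPES = "HEC"  # helix, strand, loop (assumed three-state alphabet)


def ctd_descriptors(sse):
    """Composition (3), transition (3) and distribution (15) values, in %."""
    n = len(sse)
    composition = [100.0 * sse.count(t) / n for t in SSE_TYPES]

    transition = []
    for a, b in combinations(SSE_TYPES, 2):  # H/E, H/C, E/C
        changes = sum(1 for x, y in zip(sse, sse[1:]) if {x, y} == {a, b})
        transition.append(100.0 * changes / max(n - 1, 1))

    distribution = []
    for t in SSE_TYPES:
        positions = [i + 1 for i, s in enumerate(sse) if s == t]  # 1-based
        k = len(positions)
        for frac in (0.0, 0.25, 0.50, 0.75, 1.0):
            if k == 0:
                distribution.append(0.0)
                continue
            # Shortest prefix of the chain that contains this fraction of type-t residues
            m = max(1, math.ceil(frac * k))
            distribution.append(100.0 * positions[m - 1] / n)
    return composition + transition + distribution
```

            <p>For the toy string HHHHEECC, the five helix distribution values are 12.5, 12.5, 25, 37.5 and 50% of the chain length.</p>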
            <sec>
               <st>
                  <p>Compartment-specific biological features</p>
               </st>
               <p>1. Signal peptides (SIG): Signal peptides are N-terminal peptides, typically between 15 and 40 amino acids long, which target proteins for translocation through the general secretory pathway <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. The presence of a signal peptide suggests that the protein does not reside in the CP and several prediction methods have been developed <abbrgrp><abbr bid="B47">47</abbr><abbr bid="B48">48</abbr><abbr bid="B49">49</abbr></abbrgrp>. We employ SignalP 3.0 <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>, a neural network- and hidden Markov model-based method, to predict the presence and location of signal peptide cleavage sites.</p>
               <p>2. Transmembrane <it>a</it>-helices (TMA): Integral IM proteins are characterized by <it>a</it>-helices, typically 20&#8211;25 amino acids in length, which traverse the IM. The presence of one or more transmembrane <it>a</it>-helices implies that the protein is located in the IM. We apply TMHMM 2.0 <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>, a hidden Markov model-based method, to identify potential transmembrane <it>a</it>-helices.</p>
               <p>3. Twin-arginine translocase (TAT) motifs: The twin-arginine translocase system exports proteins from the CP to the PP. The proteins translocated by twin-arginine translocase bear a unique twin-arginine motif <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>, the presence of which is a useful feature for distinguishing between PP and non-PP proteins. We use TatP 1.0 <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>, a neural network-based method, to predict the presence of twin-arginine translocase motifs.</p>
               <p>4. Transmembrane <it>&#946;</it>-barrels (TMB): Many OM proteins are characterized by <it>&#946;</it>-barrel structures, so transmembrane <it>&#946;</it>-barrels are candidate features for detecting OM proteins. We adopt TMB-Hunt <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>, a <it>k</it>-nearest neighbor method, to distinguish transmembrane <it>&#946;</it>-barrels from non-transmembrane <it>&#946;</it>-barrels.</p>
               <p>5. Non-classical protein secretion (SEC): It was long believed that an N-terminal signal peptide is required to export a protein to the extracellular space. However, recent studies have shown that several EC proteins can be secreted without a classical N-terminal signal peptide <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>. Identifying non-classical protein secretion could therefore help discriminate between CP and EC proteins. We incorporate predictions from SecretomeP 2.0 <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>, a method for predicting non-classical protein secretion.</p>
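               <p>For readers unfamiliar with how such sequence signals can be detected, the Python sketch below gives a deliberately crude, sequence-only stand-in for two of these features. It is not how TMHMM or TatP work. Transmembrane <it>&#945;</it>-helices are flagged by a Kyte-Doolittle hydropathy scan, using a 19-residue window with mean hydropathy above 1.6 (the values commonly cited from Kyte and Doolittle). TAT motifs are flagged by a search for the strict (S/T)-R-R-x-F-L-K consensus within an assumed 35-residue N-terminal region. The function names and the binary encoding are hypothetical.</p>

```python
import re

# Kyte-Doolittle hydropathy index.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Strict twin-arginine consensus (S/T)-R-R-x-F-L-K; real Tat signals often deviate from it.
TAT_CONSENSUS = re.compile(r"[ST]RR.FLK")


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """0-based start positions of windows whose mean hydropathy exceeds the threshold."""
    scores = [KD.get(aa, 0.0) for aa in seq.upper()]  # unknown residues score 0
    hits = []
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window > threshold:
            hits.append(start)
    return hits


def crude_features(seq, n_terminal=35):
    """Hypothetical binary flags: any putative TM helix, and a Tat consensus near the N-terminus."""
    return {
        "TMA": int(bool(hydrophobic_windows(seq))),
        "TAT": int(bool(TAT_CONSENSUS.search(seq.upper()[:n_terminal]))),
    }
```

               <p>In this sketch a single hydrophobic stretch of about 20 residues is enough to set the TMA flag. The hydrophobic core of a signal peptide can be mistaken for a transmembrane helix in this way, which is one reason dedicated predictors such as TMHMM and SignalP are preferred over a simple window scan.</p>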
            </sec>
         </sec>
         <sec>
            <st>
               <p>Sequence and structure conservation</p>
            </st>
            <p>Because PSL tends to be evolutionarily conserved, the known localization sites of homologous sequences can be useful indicators of the localization of an unknown protein. We apply both sequence and structural homology approaches to infer localization. For the sequence homology approach, we develop a prediction method, called PSLseq, which is based on pairwise sequence alignment with ClustalW. For the structural homology approach, we compare secondary structures; this method is referred to as PSLsse. Based on the secondary structure elements predicted by HYPROSP II, we use SSEA to perform pairwise secondary structure alignment. In both approaches, the known localization of the top-ranked aligned protein is assigned to the query protein as its predicted localization.</p>
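            <p>The label-transfer step shared by PSLseq and PSLsse can be sketched in Python as a nearest-neighbor lookup. In this sketch, a plain Needleman-Wunsch global alignment stands in for both ClustalW pairwise alignment (on amino-acid sequences) and SSEA (on H/E/C strings). It uses unit match and mismatch scores and a linear gap penalty. The scoring parameters and function names are assumptions, and ties are resolved by taking the first best-scoring reference.</p>

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score with a linear gap penalty (illustrative scoring only)."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for an empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def predict_by_homology(query, references):
    """Assign the localization of the top-ranked aligned reference to the query.

    references: list of (sequence_or_sse_string, localization) pairs.
    """
    _, best_site = max(
        references, key=lambda ref: global_alignment_score(query, ref[0])
    )
    return best_site
```

            <p>For instance, predict_by_homology("CCHHHHCC", [("CCHHHHHC", "IM"), ("CEEEEEEC", "OM")]) returns "IM", because the first reference matches the query at seven of eight SSE positions.</p>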
         </sec>
         <sec>
            <st>
               <p>Performance assessment</p>
            </st>
            <p>For comparison with other approaches, we follow the measures used in previous works <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr