<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-6-56</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Software</dochead>
      <bibl>
         <title>
            <p>TMB-Hunt: An amino acid composition based method to screen proteomes for beta-barrel transmembrane proteins</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Garrow</snm>
               <mi>G</mi>
               <fnm>Andrew</fnm>
               <insr iid="I1"/>
               <email>bmbagg@bmb.leeds.ac.uk</email>
            </au>
            <au id="A2">
               <snm>Agnew</snm>
               <fnm>Alison</fnm>
               <insr iid="I1"/>
               <email>A.M.Agnew@leeds.ac.uk</email>
            </au>
            <au id="A3" ca="yes">
               <snm>Westhead</snm>
               <mi>R</mi>
               <fnm>David</fnm>
               <insr iid="I1"/>
               <email>D.R.Westhead@leeds.ac.uk</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>School of Biochemistry and Microbiology, University of Leeds, Leeds, LS2 9JT, UK</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2005</pubdate>
         <volume>6</volume>
         <issue>1</issue>
         <fpage>56</fpage>
         <url>http://www.biomedcentral.com/1471-2105/6/56</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">15769290</pubid>
               <pubid idtype="doi">10.1186/1471-2105-6-56</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>01</day>
               <month>11</month>
               <year>2004</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>15</day>
               <month>3</month>
               <year>2005</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>15</day>
               <month>3</month>
               <year>2005</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2005</year>
         <collab>Garrow et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Beta-barrel transmembrane (bbtm) proteins are a functionally important and diverse group of proteins expressed in the outer membranes of bacteria (both gram negative and acid fast gram positive), mitochondria and chloroplasts. Despite recent publications describing reasonable levels of accuracy for discriminating between bbtm proteins and other proteins, screening of entire genomes remains troublesome as these molecules only constitute a small fraction of the sequences screened. Novel methods capable of detecting new families of bbtm proteins in diverse genomes are therefore still required.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We present TMB-Hunt, a program that uses a <it>k</it>-Nearest Neighbour (<it>k</it>-NN) algorithm to discriminate between bbtm and non-bbtm proteins on the basis of their amino acid composition. By including differentially weighted amino acids, evolutionary information and by calibrating the scoring, an accuracy of 92.5% was achieved, with 91% sensitivity and 93.8% positive predictive value (PPV), using a rigorous cross-validation procedure.</p>
               <p>A major advantage of this approach is that, because it does not rely on beta-strand detection, it does not require resolved structures, and thus larger, more representative training sets could be used. We therefore believe that this approach will be invaluable in complementing other methods based on physicochemical properties and homology. This was demonstrated by the correct reassignment of a number of proteins which other predictors failed to classify. We have used the algorithm to screen several genomes and discuss our findings.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>TMB-Hunt achieves a prediction accuracy level better than other approaches published to date. Results were significantly enhanced by use of evolutionary information and a system for calibrating <it>k</it>-NN scoring. Because the program takes a different approach from other discriminators and thus suffers from different liabilities, we believe it will make a significant contribution to the development of a consensus approach for bbtm protein detection.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <sec>
            <st>
               <p>Beta-barrel transmembrane proteins</p>
            </st>
            <p>The beta-barrel is one of only two membrane spanning structural motifs currently identified <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. It is proven with high resolution structures for many proteins expressed within the outer membranes of gram negative bacteria and is also widely expected for several proteins expressed in the outer membranes of mitochondria <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> and chloroplasts <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. In addition, the structure of a protein found spanning the outer membrane of <it>Mycobacteria </it>(an acid fast gram positive bacterium) was recently resolved revealing two consecutive membrane spanning beta-barrels <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. As with alpha-helical transmembrane (ahtm) proteins, beta-barrel transmembrane (bbtm) proteins play both functionally important and diverse roles <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>.</p>
            <p>Currently, over 92 bbtm protein structures are present in the protein databank <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, including 23 families as defined in PDB_TM <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. They are classified in the SCOP hierarchy in 3 different folds <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>: the transmembrane beta-barrels (described as not a true fold, but a gathering of beta-barrel membrane proteins), the integral outer membrane protein TolC fold and the Leukocidin (pore forming toxins) fold. The transmembrane beta-barrels comprise four SCOP superfamilies (OmpA-like, OmpT-like, OmpLA and the Porins) and include channels, enzymes and receptors. These superfamilies vary in numbers of subunits, where each subunit contributes a single barrel. The TolC fold consists of one SCOP superfamily and includes proteins involved in secretion and expression of outer membrane proteins (OMPs) <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. These proteins are trimeric with each subunit contributing four strands to a single barrel, and contain large stretches of alpha-helix, which stretch across the periplasm. Finally, the Leukocidin fold consists of heptameric pore forming toxins with each subunit contributing 2 strands to the barrel. TolC, Leukocidin and the <it>Mycobacterial </it>porin MspA (which is not yet classified within SCOP) can thus be considered "non-typical" bbtm proteins. From the diversity of bbtm proteins in different SCOP folds, it seems likely that these proteins have multiple evolutionary origins.</p>
            <p>These structures have helped reveal a number of features concerning transmembrane (TM) beta-strands and their organisation <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. TM beta-strands show an inside-outside dyad repeat motif of alternating residues facing the lipid bilayer and the inside of the barrel. Outside (lipid bilayer facing) residues are typically hydrophobic whilst inside (facing inside of barrel) residues are of intermediate polarity. TM beta-strands are often flanked by a layer of aromatic residues, believed to be involved in maintaining the protein's stability within the membrane <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. Structures have also revealed an even number of strands, with N and C termini on the same side of the membrane. Strands form an antiparallel beta-meander topology with alternating long and short loops. The number of TM beta-strands in a barrel has been shown to range from 8 to 22, with each strand spanning 6&#8211;22 (most frequently 12) residues.</p>
            <p>In contrast to ahtm proteins, which are easy to identify through TM alpha-helices composed of 20 or more hydrophobic residues <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, the short and cryptic nature of TM beta-strands makes their discrimination difficult. Prediction is complicated further with beta-strands of some globular proteins superficially resembling those of bbtm proteins.</p>
         </sec>
         <sec>
            <st>
               <p>BBTM protein discriminators</p>
            </st>
            <p>Despite these difficulties, numerous methods have recently been published for the identification of these proteins, most commonly focusing on identification of TM beta-strands. Methods include rule based approaches <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, an architecture based approach <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, Hidden Markov Models (HMMs) <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>, a neural network based method <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, a combined neural network and support vector machine <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, composition of transmembrane beta strands combined with secondary structure prediction <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> and an approach based on architecture <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> combined with isoleucine and asparagine abundance <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. Of these, the first two give no indication of discriminatory accuracy; the others report accuracies ranging from 80 to 90%.</p>
            <p>Whilst this level of accuracy may seem acceptable if analysing a particular sequence of interest, problems will occur when screening an entire genome for potential bbtm proteins, owing to the fact that a large number of sequences are being tested of which these molecules only constitute a small fraction. There is therefore a need for programs with higher accuracy and in particular higher specificity, in order to minimise the false discovery rate.</p>
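            <p>The effect of low prevalence can be illustrated with a short calculation. The sketch below is illustrative only: the function name expected_ppv and the example figures (a proteome of 4000 sequences, 3% of which are bbtm proteins, screened with 90% sensitivity and 90% specificity) are hypothetical values chosen for the example, not results from this work.</p>

```python
def expected_ppv(n_proteins, prevalence, sensitivity, specificity):
    """Fraction of predicted positives that are true positives.

    All arguments are hypothetical inputs; prevalence, sensitivity and
    specificity are given as fractions between 0 and 1.
    """
    positives = n_proteins * prevalence
    negatives = n_proteins - positives
    true_pos = positives * sensitivity
    false_pos = negatives * (1 - specificity)
    return true_pos / (true_pos + false_pos)


# Hypothetical screen: 4000 proteins, 3% bbtm, 90% sensitivity and specificity.
ppv = expected_ppv(4000, 0.03, 0.9, 0.9)
print(round(100 * ppv, 1))  # about 21.8 (108 true positives against 388 false positives)
```

            <p>Under these assumed figures, fewer than one in four predicted bbtm proteins would be genuine, which is why specificity matters most when whole proteomes are screened.</p>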
         </sec>
         <sec>
            <st>
               <p>Amino acid composition based protein classification</p>
            </st>
            <p>This paper describes TMB-Hunt, an amino acid composition based program for the identification of bbtm proteins. Amino acid composition has been analysed for bbtm proteins <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>; however, whole sequence composition has not previously been used for discrimination. Many previous studies have shown how amino acid composition can be successfully applied to protein sequence analysis, including prediction of structural class <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, discrimination of intra- and extra cellular proteins <abbrgrp><abbr bid="B24">24</abbr></abbrgrp> and distinguishing between membrane protein types <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. Amino acid composition is often used for prediction of subcellular location, as an alternative to signal detection based methods <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr></abbrgrp> which are prone to errors in automated gene prediction at the 5' end <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. The limitation of this technique, however, is that the correlation of cell location with amino acid composition is not absolute. It was suggested that composition differences are a consequence of different requirements for protein folding, stability and transportation <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B26">26</abbr></abbrgrp>. Subsequently it has been shown that amino acid composition differences correlate most strongly with surface residues <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. Thus, composition has been particularly useful in discriminating between non-transmembrane (ntm) and ahtm proteins, the latter of which contain large numbers of hydrophobic amino acids in contact with the lipid bilayer. This feature has enabled algorithms to be developed capable of distinguishing between the two classes with >97% accuracy <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>, based on identification of the TM alpha-helices.</p>
            <p>Because TMB-Hunt puts no emphasis on identification of TM beta-strands, we were not dependent on sequences with resolved structures and training sets could be much larger than those used for other predictors <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp>. As a result, bbtm proteins with structures more diverse than those used by other predictors were included, resulting in a greater degree of sensitivity. TMB-Hunt is at least as accurate as other predictors, but its major advantage is that it adopts a completely different approach to other methods and is likely therefore to be valuable in consensus approaches, which should be much more successful at hunting for new families of candidate bbtm proteins in diverse proteomes.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Implementation</p>
         </st>
         <sec>
            <st>
               <p>Training sets</p>
            </st>
            <p>Training sets for bbtm, ahtm and non-TM (ntm) proteins were gathered from a number of manually curated and published sources. The PDB accessions of 3159 ntm proteins were acquired from PDB-REPRDB via the Papia database <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>, and respective sequences were extracted.</p>
            <p>Sequences of ahtm proteins were downloaded from a test set available at the Sanger centre <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. Four datasets were available of varying quality. Dataset A comprised 37 sequences where structural information was available. Dataset B contained 23 sequences with very good biochemical characterisation from at least two complementary methods. Dataset C contained 129 sequences with some biochemical characterisation and where annotation was only reliable for part of the sequence. Dataset D contained sequences with no biochemical characterisation and only hydrophobicity or an alignment as a basis for their characterisation. Datasets A, B and C were used.</p>
            <p>Beta-barrel transmembrane protein sequences were downloaded from a number of resources including:</p>
            <p>957 from UniProt <abbrgrp><abbr bid="B34">34</abbr></abbrgrp> using a keyword search for 'Transmembrane' and 'Outer Membrane' and taxonomy filter for only bacteria</p>
            <p>134 from the transporter classification (TC) database <abbrgrp><abbr bid="B35">35</abbr></abbrgrp></p>
            <p>35 extracted from the PDB files of beta-barrel outer membrane proteins in SCOP <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>.</p>
            <p>All these datasets were manually created and rechecked to ensure no obvious spurious sequences were present. Sequences of less than 120 residues were removed from the training set. Sequences were next grouped into clusters using BLASTclust and a sequence similarity threshold of 23%. Amino acid composition profiles were produced for each group using evolutionary information, as described below. Dataset details are summarised in Table <tblr tid="T1">1</tblr>.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Sequence datasets used to generate training sets.</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Training dataset</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Sources</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Initial number of sequences</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Sequences >120 AA</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Size after redundancy removal</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>ntm</p>
                     </c>
                     <c ca="left">
                        <p>PDB-REPRDB [32]</p>
                     </c>
                     <c ca="left">
                        <p>3159</p>
                     </c>
                     <c ca="left">
                        <p>2290</p>
                     </c>
                     <c ca="left">
                        <p>1763</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>ahtm</p>
                     </c>
                     <c ca="left">
                        <p>Sanger all-alpha membrane datasets A, B and C [33]</p>
                     </c>
                     <c ca="left">
                        <p>189</p>
                     </c>
                     <c ca="left">
                        <p>166</p>
                     </c>
                     <c ca="left">
                        <p>132</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>bbtm</p>
                     </c>
                     <c ca="left">
                        <p>TC-DB [35], Uniprot [34] and PDB [5]</p>
                     </c>
                     <c ca="left">
                        <p>1126</p>
                     </c>
                     <c ca="left">
                        <p>1107</p>
                     </c>
                     <c ca="left">
                        <p>196</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Three training datasets were generated using sequences from various sources. Datasets were filtered to remove sequences of &lt;120 AA and clustered to remove redundancy.</p>
               </tblfn>
            </tbl>
            <p>The final dataset included numerous types of bbtm protein not included in the training sets of other predictors. Inclusion of such a diverse range of proteins was important as it covers a wide range of evolutionary origins and physicochemical adaptations. TolC, Alpha-hemolysin and the Mycobacterial Porin Family are bbtm proteins with resolved structures, not used by other predictors, either because of their unusual structure or because their structure was resolved after the predictor had been completed. Fimbrial, pili and flagellar associated proteins were also included, as were non-bacterial proteins e.g. the mitochondrial porin (VDAC), plastid bbtm proteins (e.g. OEP24) and chloroplast porins (Toc75).</p>
            <p>Sequences used for proteome screening were downloaded from the NCBI FTP site <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. Sequences used for annotation comparison were downloaded via SRS <abbrgrp><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr></abbrgrp> from Uniprot <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p><it>k</it>-nearest neighbour algorithm</p>
            </st>
            <p>The <it>k</it>-nearest neighbour algorithm is a simple instance-based learning method for performing general, non-parametric classification <abbrgrp><abbr bid="B39">39</abbr><abbr bid="B40">40</abbr></abbrgrp>. Each object or instance (a protein in this case) is associated with a class which can be unknown (class 0), bbtm (1), ahtm (2) or ntm (3). For query proteins of unknown class, predictions are made by using information from a training set of proteins where the class is known. The prediction is made on the basis of a set of <it>k </it>objects from the training set which are most similar (in the sense described below) to the query protein. This technique is thus a local approximation, focusing on the neighbourhood of the query instance. A major advantage of this algorithm is that it is robust to noisy data (given a large dataset), as taking the weighted average of the nearest neighbours smoothes out isolated training instances.</p>
            <p>Proteins are represented by <it>x </it>= (<it>f</it><sub><it>a </it></sub>(<it>x</it>), <it>a </it>&#8712; A; <it>c</it>(<it>x</it>)), where <it>c</it>(<it>x</it>) represents the class <it>c </it>&#8712; {0,1,2,3} as defined above, A is the set of naturally occurring amino acids and <it>f</it><sub><it>a</it></sub>(<it>x</it>) denotes the relative frequency of the amino acid <it>a</it>. The distance between two proteins <it>x</it><sub><it>i </it></sub>and <it>x</it><sub><it>j </it></sub>in this representation is measured by the standard Euclidean metric.</p>
            <p>
               <graphic file="1471-2105-6-56-i1.gif"/>
            </p>
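            <p>The representation and metric described above can be sketched as follows. This is a minimal illustration rather than the TMB-Hunt source code: the function names composition and euclidean_distance, the fixed ordering of the 20 standard amino acids and the decision to ignore non-standard residue codes are all assumptions made for the example.</p>

```python
import math
from collections import Counter

# The 20 standard amino acids (one-letter codes); this ordering is an assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the relative frequency f_a(x) of each standard amino acid."""
    # Non-standard codes (e.g. X, B, Z) are skipped; this is an assumption.
    counts = Counter(r for r in sequence.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def euclidean_distance(f_i, f_j):
    """Standard Euclidean distance between two composition vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f_i, f_j)))


# Two toy sequences with entirely different compositions.
print(euclidean_distance(composition("AAAA"), composition("GGGG")))
```

            <p>Because each vector holds relative frequencies, its elements sum to one, so proteins of different lengths can be compared directly.</p>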
            <p>Given a query protein <it>x</it><sub><it>q</it></sub>, the algorithm first finds the <it>k </it>closest instances in the training set according to this metric, and then assigns a score S(<it>x</it><sub><it>q</it></sub>, <it>c</it>) for each possible class <it>c</it>,</p>
            <p>
               <graphic file="1471-2105-6-56-i2.gif"/>
            </p>
            <p>where <it>&#948;</it>(<it>c</it><sub>1</sub>, <it>c</it><sub>2</sub>) = 1 if the classes <it>c</it><sub>1 </sub>and <it>c</it><sub>2 </sub>are equal and zero otherwise. Thus the score for each class is a sum of positive contributions from each of the nearest neighbours from that class, where each contribution is weighted by the reciprocal of the squared distance between query instance and neighbour, so that closer neighbours contribute more strongly.</p>
            <p>Since we are very often concerned with binary classification problems (e.g. distinguishing bbtm proteins from proteins in any other class), it is also useful to define a discrimination score,</p>
            <p>
               <graphic file="1471-2105-6-56-i3.gif"/>
            </p>
            <p>which is the score from one class (e.g. bbtm proteins) minus the scores from other classes.</p>
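            <p>The scoring scheme described in the text can be sketched as below. The equations themselves are given as figures, so this code follows the verbal description only (reciprocal squared distance weighting, and a discrimination score equal to the score for one class minus the scores for the others); any normalisation used in TMB-Hunt is not reproduced. The function names, the default value of <it>k </it>and the small constant guarding against zero distances are assumptions.</p>

```python
def knn_scores(query, training, k=5, classes=(1, 2, 3)):
    """Return {class: S(x_q, c)} from the k nearest training instances.

    training is a list of (composition_vector, class_label) pairs, with
    labels 1 (bbtm), 2 (ahtm) or 3 (ntm). The default k is hypothetical.
    """
    def squared_distance(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    neighbours = sorted(training, key=lambda item: squared_distance(query, item[0]))[:k]
    scores = {c: 0.0 for c in classes}
    for vector, label in neighbours:
        # Guard against division by zero for identical vectors (assumption).
        d2 = max(squared_distance(query, vector), 1e-12)
        scores[label] += 1.0 / d2
    return scores


def discrimination_score(scores, target=1):
    """Score for the target class minus the scores for all other classes."""
    return scores[target] - sum(s for c, s in scores.items() if c != target)


toy_training = [([0.0, 0.0], 1), ([1.0, 0.0], 3), ([0.0, 2.0], 3), ([5.0, 5.0], 2)]
toy_scores = knn_scores([0.0, 1.0], toy_training, k=3)
print(toy_scores, discrimination_score(toy_scores, target=1))
```

            <p>In the toy example one bbtm neighbour at squared distance 1 is outweighed by two ntm neighbours at squared distances 1 and 2, giving a negative discrimination score for the bbtm class.</p>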
         </sec>
         <sec>
            <st>
               <p>Calibration and scoring</p>
            </st>
            <p>In making predictions a standard nearest neighbour algorithm would simply predict the class of <it>x</it><sub><it>q </it></sub>to be the class <it>c </it>with the highest score <it>S</it>(<it>x</it><sub><it>q</it></sub>, <it>c</it>). However, this procedure is problematic in cases such as this, where the training set is unbalanced, containing many more ntm proteins than either of the other two classes. Statistical chance means that the <it>k</it>-nearest neighbour sets tend to contain more proteins from the dominant class, so that this class becomes the dominant prediction even in the presence of substantial evidence for membership of one of the other classes in the nearest neighbour set. One approach to this problem would be to reduce representation of the dominant class to produce a balanced training set, but this procedure involves wasting useful information. It would also be possible to down-weight information from the dominant class, but we found that a more effective approach was to use the distributions of <it>D</it>(<it>x</it>,<it>c</it>) scores in the training set proteins, divided between proteins in class <it>c</it>, and proteins in other classes from which they are to be distinguished. For clarity, in the remainder of this section we will consider <it>c </it>= 1, where the classification problem is to distinguish bbtm proteins from any others, and <it>D </it>will denote the discrimination score <it>D</it>(<it>x</it>,<it>c </it>= 1) for an arbitrary protein <it>x</it>.</p>
            <p>Empirical cumulative probability distributions for <it>D </it>in the case above are shown in Figure <figr fid="F1">1</figr>. As expected, plots showed a higher mean discrimination score for bbtm (mean = 0.078, standard deviation = 0.115) than other proteins (mean = -0.206, standard deviation = 0.171). These distributions do not deviate significantly from the normal distribution. Using these distributions it is possible to convert discrimination scores into a convenient log likelihood ratio (beta-barrel score),</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Probabilities used for development of a calibrated score</p>
               </caption>
               <text>
                  <p><b>Probabilities used for development of a calibrated score. </b>Probability (y-axis), p(D'&#8805;D), for observing a score D' greater than or equal to D (x-axis) for either bbtm (&#9632;) or ntm (&#9650;) proteins. Plots were made by calculating the frequencies of bbtm and ntm proteins identified above certain discrimination scores (using weighted amino acids, no evolutionary information and a 'leave homologues out' cross-validation).</p>
               </text>
               <graphic file="1471-2105-6-56-1"/>
            </fig>
            <p><it>R</it>(<it>D</it>) = log(<it>p</it>(<it>bbtm</it>|<it>D</it>)/<it>p</it>(<it>other</it>|<it>D</it>)),</p>
            <p>where <it>p</it>(<it>bbtm</it>|<it>D</it>) denotes the probability of a bbtm protein obtaining a score of <it>at least D</it>, and <it>p</it>(<it>other</it>|<it>D</it>) denotes the probability of a protein from the other class obtaining a score of <it>D or greater</it>. Negative values of <it>R </it>indicate a query protein more likely to come from the other class, and positive values indicate a protein more likely to come from the bbtm class.</p>
            <p>An alternative probabilistic interpretation of the <it>D </it>score is the expected number of proteins from the other class scoring <it>D </it>or greater, <it>E</it>(<it>D</it>) = <it>Np</it>(<it>other</it>|<it>D</it>), where <it>N </it>indicates the number of query sequences tested. This measure takes account of the multiple testing involved in screening large numbers of sequences in a genome, and is related to the standard Bonferroni correction. It is directly analogous to the E-values reported by the popular sequence search programs FASTA <abbrgrp><abbr bid="B41">41</abbr></abbrgrp> and BLAST <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>.</p>
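            <p>The two calibrated measures can be sketched from empirical score distributions as follows. This is a hedged illustration: the function names are hypothetical, the natural logarithm is assumed because the base is not stated, and the small floor that prevents division by zero in the tails is an addition for the example. The toy score lists are invented.</p>

```python
import math


def tail_probability(scores, d):
    """Empirical probability of observing a score of d or greater."""
    return sum(1 for s in scores if s >= d) / len(scores)


def beta_barrel_score(d, bbtm_scores, other_scores, floor=1e-6):
    """Log likelihood ratio R(D) = log(p(bbtm|D) / p(other|D)).

    The floor avoids log(0) and division by zero when no training score
    reaches d; it is an assumption, as is the use of the natural log.
    """
    p_bbtm = max(tail_probability(bbtm_scores, d), floor)
    p_other = max(tail_probability(other_scores, d), floor)
    return math.log(p_bbtm / p_other)


def expected_false_hits(d, other_scores, n_queries):
    """E(D) = N * p(other|D), the expected number of non-bbtm hits."""
    return n_queries * tail_probability(other_scores, d)


# Invented discrimination scores for illustration only.
toy_bbtm = [0.2, 0.1, 0.0, -0.1]
toy_other = [-0.3, -0.2, -0.1, 0.1]
print(beta_barrel_score(0.0, toy_bbtm, toy_other))
print(expected_false_hits(0.0, toy_other, n_queries=1000))
```

            <p>In the toy data three of four bbtm scores and one of four other scores reach <it>D </it>= 0, so <it>R </it>is log(3) and 250 false hits would be expected among 1000 queries.</p>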
         </sec>
         <sec>
            <st>
               <p>Differential dimension weightings</p>
            </st>
            <p>To account for some dimensions contributing information more valuable to classification than others, weights were applied to each of the dimensions used in calculating Euclidean distances. The modified Euclidean distance calculation was:</p>
            <p>
               <graphic file="1471-2105-6-56-i4.gif"/>
            </p>
            <p>where <it>g</it><sub><it>a </it></sub>is the weight applied to amino acid <it>a</it>.</p>
            <p>A genetic algorithm was employed to calculate the optimal weightings for each dimension. Genetic algorithms are an optimisation approach, based on Darwinian principles, which assume that given a population of individuals, environmental pressures cause natural selection thus increasing the overall fitness of the population <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>. Application of a genetic algorithm requires a population of solutions, termed chromosomes, whose fitness can be measured using an objective function. Based on fitness, the better candidates are chosen to seed the next generation through a combination of crossover and/or mutation. This will result in the evolution of successively better solutions. The process is carried out until an optimal solution or time limit is reached.</p>
            <p>The algorithm begins by constructing a random population of chromosomes (i.e. potential solutions), represented as vectors, with each element of the vector termed a gene, representing a weight for a particular dimension of the Euclidean space. Fitness for chromosomes was measured by the Matthews Correlation Coefficient (MCC) value returned from a 'leave homologues out' cross-validation analysis (see below) using a fixed set of 100 bbtm proteins and 100 ntm proteins. Once fitness for each of the chromosomes within a generation was determined, the fittest were used to create offspring through a process of crossover and mutation. Crossover involved constructing a new vector from genes chosen at random from two or more parents. Mutation involved randomly altering 1 in 8 genes.</p>
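            <p>A genetic algorithm of the kind described above can be sketched as follows. This is a simplified illustration and not the implementation used for TMB-Hunt. The weighted distance formula is given only as a figure, so the form used here (each squared difference multiplied by its weight) is an assumption, as are the function names, the population size, the number of survivors, the number of generations, the use of exactly two parents and the reading of "1 in 8" as a per-gene mutation probability. The fitness function is passed in as a callable.</p>

```python
import math
import random


def weighted_distance(f_i, f_j, weights):
    """Euclidean distance with one weight g_a per dimension (assumed form)."""
    return math.sqrt(sum(g * (a - b) ** 2 for g, a, b in zip(weights, f_i, f_j)))


def evolve_weights(fitness, n_genes=20, pop_size=20, n_keep=5, generations=30, seed=0):
    """Return the fittest weight vector found; all defaults are hypothetical."""
    rng = random.Random(seed)
    # Random initial population: each chromosome is a vector of weights in [0, 1).
    population = [[rng.random() for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: the fittest chromosomes survive and seed the next generation.
        parents = sorted(population, key=fitness, reverse=True)[:n_keep]
        children = []
        for _ in range(pop_size - n_keep):
            mother, father = rng.sample(parents, 2)
            # Crossover: each gene is copied at random from one of the two parents.
            child = [rng.choice(pair) for pair in zip(mother, father)]
            # Mutation: each gene is replaced with probability 1 in 8.
            child = [rng.random() if rng.randrange(8) == 0 else g for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


def toy_fitness(weights):
    """Stand-in objective for the example: prefers weights close to 0.5."""
    return -sum((g - 0.5) ** 2 for g in weights)


best = evolve_weights(toy_fitness)
print(round(toy_fitness(best), 4))
```

            <p>In the setting described in the text, the fitness callable would return the MCC from the cross-validation analysis in place of the toy objective. Because the fittest chromosomes are carried over unchanged, the best fitness in the population cannot decrease from one generation to the next.</p>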
         </sec>
         <sec>
            <st>
               <p>Inclusion of evolutionary information</p>
            </st>
            <p>Random noise in amino acid composition was reduced by inclusion of evolutionary information. This was done by building the feature vector from both the query sequence and a number of close homologues (as determined by a BLAST query against Uniprot/SwissProt with an E-value threshold of 0.0001, and a maximum of 25 homologues), giving an average amino acid composition vector for the sequence and its close evolutionary relatives. A weighted average composition was used, with more distant homologues contributing more to the average (since the more distant sequences contain more new information). Weights were assigned by first carrying out all-against-all alignments within the set using BLAST, then weighting sequences according to their average distance to other sequences. The weights were calculated as</p>
            <p>
               <graphic file="1471-2105-6-56-i5.gif"/>
            </p>
            <p>where <it>W</it><sub><it>k </it></sub>denotes the weight applied to sequence <it>k</it>, and <it>p</it><sub><it>k </it></sub>the average percentage difference (100 minus the percentage identity) from sequence <it>k </it>to other sequences.</p>
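            <p>The weighted averaging step can be sketched as follows. The exact weight formula is given only as a figure and is not reproduced here; the sketch assumes, purely for illustration, that each weight is the average percentage difference of a sequence divided by the sum over all sequences, which preserves the stated property that more distant homologues contribute more. The function name and the fallback to equal weights when all sequences are identical are also assumptions.</p>

```python
def weighted_average_composition(compositions, mean_pct_differences):
    """Average composition vectors, weighting more divergent sequences more.

    compositions: one composition vector per sequence (query and homologues).
    mean_pct_differences: average percentage difference p_k of each sequence
    from the others. Normalising p_k to obtain the weight is an assumed form.
    """
    n = len(compositions)
    total = sum(mean_pct_differences)
    if total == 0:
        # All sequences identical: fall back to equal weights (assumption).
        weights = [1.0 / n] * n
    else:
        weights = [p / total for p in mean_pct_differences]
    n_dims = len(compositions[0])
    return [
        sum(w * comp[d] for w, comp in zip(weights, compositions))
        for d in range(n_dims)
    ]


# Two-dimensional toy vectors; the second sequence is the more divergent one.
profile = weighted_average_composition([[1.0, 0.0], [0.0, 1.0]], [25.0, 75.0])
print(profile)
```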
         </sec>
         <sec>
            <st>
               <p>Performance</p>
            </st>
            <p>Cross-validation studies were used to assess performance. Two approaches were used: 'leave one out' and 'leave homologues out' cross-validations. The first involved removing each profile from the training set in turn and testing whether the algorithm could correctly reassign one of the sequences used to build that profile. Because profiles were built from sequences in clusters of >23% identity and were removed whole, sequences should not be correctly reassigned merely through 'self-detection' by a close homologue. However, even sequences of &lt;23% identity can be homologues and show significant similarity, e.g. over shorter fragments of the sequence, so a 'leave homologues out' cross-validation was used as a stricter alternative. This meant pre-computing the sequences similar to each query sequence (with a BLAST E-value threshold &lt;1) and leaving these out of the training set when testing. This procedure eliminates any homologue whose sequence is sufficiently similar to be detected with BLAST.</p>
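            <p>The 'leave homologues out' procedure can be sketched as follows. This is an illustration only: the table of similar sequences is assumed to have been pre-computed from BLAST hits, and the classifier is a hypothetical callable (here a trivial majority-class rule) rather than the TMB-Hunt classifier.</p>

```python
def leave_homologues_out(labels, similar, classify):
    """Predict each sequence from a training set stripped of its detectable homologues.

    labels:   dict mapping sequence id to its true class
    similar:  dict mapping sequence id to the set of ids detected as similar
              (assumed pre-computed, e.g. from BLAST hits below the E-value threshold)
    classify: hypothetical callable(query_id, training_ids) returning a class
    """
    predictions = {}
    for query in labels:
        # Exclude the query itself and everything detected as similar to it.
        excluded = similar.get(query, set()) | {query}
        training = [s for s in labels if s not in excluded]
        predictions[query] = classify(query, training)
    return predictions


def majority_class(labels):
    """Toy classifier: predict the most common class in the remaining training set."""
    def classify(query, training):
        classes = [labels[s] for s in training]
        return max(sorted(set(classes)), key=classes.count)
    return classify


toy_labels = {"p1": "bbtm", "p2": "bbtm", "p3": "bbtm", "p4": "ntm", "p5": "ntm"}
toy_similar = {"p4": {"p5"}}  # p5 is treated as a detectable homologue of p4
toy_predictions = leave_homologues_out(toy_labels, toy_similar, majority_class(toy_labels))
```

            <p>In the toy data, p4 loses its only same-class neighbour (p5) from the training set, so the majority rule assigns it to the wrong class. This is the stricter behaviour the procedure is designed to expose.</p>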
            <p>Performance was measured using sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy and MCC, which are defined in terms of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN).</p>
            <p>Sensitivity is the percentage of bbtm proteins correctly classified and is calculated as 100*TP/(TP+FN). Specificity is the percentage of non-bbtm proteins correctly classified and is calculated as 100*TN/(TN+FP). The PPV is the percentage of predicted bbtm proteins that are correct and is calculated as 100*TP/(TP+FP). The NPV is the percentage of predicted non-bbtm proteins that are correct and is calculated as 100*TN/(TN+FN). Accuracy is the percentage of all proteins that are correctly assigned and is calculated as 100*(TP+TN)/<it>t</it>, where <it>t </it>is the total number of sequences queried. However, this statistic can be misleading when the composition of the test set is biased. The Matthews Correlation Coefficient (MCC) is therefore used as an alternative measure that accounts for both under- and over-prediction.</p>
            <p>
               <graphic file="1471-2105-6-56-i6.gif"/>
            </p>
            <p>This returns a value between -1 and 1, with 1 meaning everything is correctly assigned and -1 meaning everything is incorrectly assigned. Given two prediction classes (e.g. bbtm and ntm) and a random probability of assigning queries to either, a score of 0 would be expected by random classification.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <p>TMB-Hunt uses a <it>k</it>-Nearest Neighbour (<it>k</it>-NN) algorithm to classify query instances, using the class (bbtm, ahtm or ntm) of their nearest neighbours, as defined by differences in amino acid composition. A number of steps were involved in optimisation, including selection of the numbers of neighbours used (<it>k</it>), amino acid weightings and scoring statistics. Once optimised, performance of the program was assessed and it was applied to the screening of several genomes.</p>
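         <p>A minimal sketch of <it>k</it>-NN classification on composition vectors is given below. It is illustrative only: the way the per-dimension weights enter the distance, the simple majority vote and the two-dimensional toy vectors are assumptions, and the scoring statistic actually used by TMB-Hunt is not reproduced here.</p>

```python
import math
from collections import Counter


def weighted_distance(u, v, weights):
    """Euclidean distance with a per-dimension weight on each squared difference."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, u, v)))


def knn_classify(query, training, weights, k=5):
    """Assign the majority class among the k nearest training vectors.

    training: list of (composition_vector, class_label) pairs
    """
    nearest = sorted(
        training, key=lambda item: weighted_distance(query, item[0], weights)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy two-dimensional "compositions"; real vectors would have 20 dimensions.
toy_training = [
    ([0.10, 0.90], "bbtm"), ([0.20, 0.80], "bbtm"), ([0.15, 0.85], "bbtm"),
    ([0.90, 0.10], "ntm"), ([0.80, 0.20], "ntm"), ([0.85, 0.15], "ntm"),
]
predicted = knn_classify([0.12, 0.88], toy_training, weights=[1.0, 1.0], k=3)
```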
         <sec>
            <st>
               <p><it>K</it>-values</p>
            </st>
            <p>An optimal <it>k</it>-value was chosen using a series of cross-validation tests. These were computed with a range of parameters, and accuracy consistently showed a weak peak at <it>k </it>= 5 and gradually declined thereafter. However, performance was generally insensitive to the precise value of <it>k</it>, with similar performance shown for moderate values &#8805; 5.</p>
         </sec>
         <sec>
            <st>
               <p>Differential amino acid weightings</p>
            </st>
            <p>A genetic algorithm was used to calculate optimal amino acid weightings for differentiating between bbtm and ntm proteins. The results are shown in Figure <figr fid="F2">2</figr>, alongside weights derived from average compositional differences between the classes. Amino acids contributing the most to classification include Cys, Phe, His, Met, Asn, Gln and Thr. Those contributing the least include Glu, Pro and Tyr. The greatest contributing amino acid, Phe, contributed 3.76 times more than the lowest, Pro.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Comparison between GA weightings and difference ratios</p>
               </caption>
               <text>
                  <p><b>Comparison between GA weightings and difference ratios. </b>Relationship between GA derived weights for amino acids and weights based simply on average compositional distances between classes.</p>
               </text>
               <graphic file="1471-2105-6-56-2"/>
            </fig>
            <p>Interestingly, these weights did not completely correlate with compositional differences (Figure <figr fid="F2">2</figr>). Phe had the greatest GA weighting (0.077) but a relatively small composition difference between training sets, with corresponding weight 0.042 (ranked 15<sup>th </sup>of 20), and Glu had a fairly large composition difference (ranked 7<sup>th</sup>) but a lower GA weighting (ranked 16<sup>th</sup>). However, there were some correlations, with Asn, His, Cys and Met ranked 2<sup>nd</sup>, 3<sup>rd</sup>, 4<sup>th </sup>and 5<sup>th </sup>in the GA weightings and 4<sup>th</sup>, 2<sup>nd</sup>, 1<sup>st </sup>and 6<sup>th </sup>respectively in the composition difference rankings.</p>
            <p>Weights significantly differed from those used by Liu <abbrgrp><abbr bid="B21">21</abbr></abbrgrp> who found, using a Fisher's Discrimination Ratio, that the amino acids most useful for distinguishing between beta-strands of globular and membrane proteins were Gly, Val, Ile, Asn, Leu and Cys. These differences can be attributed to the fact that Liu tried to identify differences in strand residues, whereas our method identifies differences in the composition of entire sequences.</p>
         </sec>
         <sec>
            <st>
               <p>Performance</p>
            </st>
            <p>The ability of the program to discriminate between different classes was tested using a 'leave homologues out' cross-validation (see Methods) and was defined in terms of PPV, sensitivity and accuracy. Figure <figr fid="F3">3</figr> shows how PPV, sensitivity and accuracy vary over a range of discrimination scores. Performance results are summarised in Tables <tblr tid="T2">2</tblr> and <tblr tid="T3">3</tblr>, with the optimal cut-off point (discrimination score giving the highest accuracy) used. Table <tblr tid="T2">2</tblr> summarises the performance difference between the program with various features, i.e. weighted amino acids and query sequence evolutionary information. Table <tblr tid="T3">3</tblr> describes the ability of the program to discriminate between various protein classes with two different settings. Without inclusion of query sequence evolutionary information, the program was better at discriminating between bbtm and ntm proteins than between bbtm and ahtm proteins, with accuracies of 85% and 77.5% respectively. This difference was removed by the inclusion of query sequence evolutionary information and weighted amino acids, which gave a prediction accuracy of 92.5% both for discrimination between bbtm and ntm proteins and for discrimination between bbtm and ahtm proteins.</p>
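            <p>Choosing the optimal cut-off point, i.e. the discrimination score that gives the highest accuracy, can be sketched as a simple sweep over candidate thresholds. The scores and labels below are invented for illustration, and the convention that a score at or above the cut-off predicts a bbtm protein is an assumption.</p>

```python
def best_cutoff(scores, is_bbtm):
    """Return (cutoff, accuracy) maximising accuracy when score >= cutoff predicts bbtm.

    Ties are resolved in favour of the lowest cutoff.
    """
    best = (None, -1.0)
    for cutoff in sorted(set(scores)):
        correct = sum((s >= cutoff) == truth for s, truth in zip(scores, is_bbtm))
        accuracy = 100 * correct / len(scores)
        if accuracy > best[1]:
            best = (cutoff, accuracy)
    return best


# Invented discrimination scores with their true classes (True = bbtm).
toy_scores = [-2.0, -1.0, 0.5, 1.0, 3.0, 4.0]
toy_truth = [False, False, True, False, True, True]
cutoff, accuracy = best_cutoff(toy_scores, toy_truth)
# A cutoff of 0.5 classifies 5 of the 6 toy sequences correctly.
```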
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>TMB-Hunt performance over a range of discrimination scores</p>
               </caption>
               <text>
                  <p><b>TMB-Hunt performance over a range of discrimination scores. </b>Accuracy (x), sensitivity (&#9650;) and PPV (&#9632;) of the predictor at a range of discrimination score thresholds. The above results were taken for the predictor discriminating between bbtm and non-bbtm proteins, using the 'leave homologues out' cross-validation, with weighted amino acids and evolutionary information for the query sequence. Similar patterns were found with all settings, i.e. using weighted amino acids, no evolutionary information, 'leave homologues out' cross-validation and discriminating between bbtm and ntm proteins.</p>
               </text>
               <graphic file="1471-2105-6-56-3"/>
            </fig>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Program performance using different settings.</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c ca="left">
                        <p>
                           <b>
                              <it>BBTM vs NTM</it>
                           </b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>% Sensitivity</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>% Specificity</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>% PPV</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>% NPV</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>% Accuracy</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Plain</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>83</p>
                     </c>
                     <c ca="left">
                        <p>87</p>
                     </c>
                     <c ca="left">
                        <p>86.5</p>
                     </c>
                     <c ca="left">
                        <p>83.7</p>
                     </c>
                     <c ca="left">
                        <p>85</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Weighted AAs</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>84</p>
                     </c>
                     <c ca="left">
                        <p>91</p>
                     </c>
                     <c ca="left">
                        <p>90.3</p>
                     </c>
                     <c ca="left">
                        <p>85</p>
                     </c>
                     <c ca="left">
                        <p>87.5</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Evolutionary information</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>89</p>
                     </c>
                     <c ca="left">
                        <p>94</p>
                     </c>
                     <c ca="left">
                        <p>93.7</p>
                     </c>
                     <c ca="left">
                        <p>89.5</p>
                     </c>
                     <c ca="left">
                        <p>91.5</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Evolutionary information + weighted AAs</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>91</p>
                     </c>
                     <c ca="left">
                        <p>94</p>
                     </c>
                     <c ca="left">
                        <p>93.8</p>
                     </c>
                     <c ca="left">
                        <p>91.3</p>
                     </c>
                     <c ca="left">
                        <p>92.5</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Ability of the program to discriminate between bbtm and ntm proteins, using the 'leave homologues out' cross-validation method and with a range of different features. The plain mode indicates that neither evolutionary information nor weighted amino acids were included.</p>
               </tblfn>
            </tbl>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Ability of program to differentiate between various protein classes.</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c ca="left">
                        <p>
                           <b>
                              <it>A. Plain</it>
                           </b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>% Sensitivity</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>% Specificity</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>% PPV</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>% NPV</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>% Accuracy</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>
                              <it>bbtm vs ntm</it>
                           </b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>83</p>
                     </c>
                     <c ca="left">
                        <p>87</p>
                     </c>
                     <c ca="left">
                        <p>86.5</p>
                     </c>
                     <c ca="left">
                        <p>83.7</p>
                     </c>
                     <c ca="left">
                        <p>85</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>bbtm vs ahtm</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>83</p>
                     </c>
                     <c ca="left">
                        <p>72</p>
                     </c>
                     <c ca="left">
                        <p>74.8</p>
                     </c>
                     <c ca="left">
                        <p>80.1</p>
                     </c>
                     <c ca="left">
                        <p>77.5</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>
                              <it>B. Evolutionary Information plus weighted AAs</it>
                           </b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>% Sensitivity</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>% Specificity</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>% PPV</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>% NPV</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>% Accuracy</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>bbtm vs ntm</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>91</p>
                     </c>
                     <c ca="left">
                        <p>94</p>
                     </c>
                     <c ca="left">
                        <p>93.8</p>
                     </c>
                     <c ca="left">
                        <p>91.3</p>
                     </c>
                     <c ca="left">
                        <p>92.5</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>bbtm vs ahtm</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>88</p>
                     </c>
                     <c ca="left">
                        <p>97</p>
                     </c>
                     <c ca="left">
                        <p>96.7</p>
                     </c>
                     <c ca="left">
                        <p>88.9</p>
                     </c>
                     <c ca="left">
                        <p>92.5</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>A shows the ability of the program to differentiate between various protein classes without inclusion of evolutionary information or differential amino acid weightings. B shows the improvements given the inclusion of these features. Performance was assessed using the 'leave homologues out' cross-validation.</p>
               </tblfn>
            </tbl>
            <p>Results reported so far have used cross-validations based on removing detectable homologues (BLAST E-value&lt;1) from the training set. The results have shown high accuracy discriminations. This indicates that amino acid composition can be used to identify bbtm proteins. It is not possible to know the extent of very distant homology in the training set, since this is often only apparent when 3D structures are determined. It is not clear therefore whether the good performance we observe results from the detection of distant homologues, or whether the composition signal is a characteristic of many evolutionary unrelated families of bbtm protein. It seems likely that both explanations contribute to the results, which indicate at the very least that composition is an important feature of these proteins that is preserved over long evolutionary distances and may be shared by unrelated bbtm proteins.</p>
            <p>The program was extremely fast, able to query 400 sequences in &lt;1 minute on a 2 GHz Pentium processor. When using evolutionary information, speed was limited by a BLAST query against Uniprot/SwissProt, and by 'all against all' BLAST runs to identify the similarities of homologues. However, even with evolutionary information TMB-Hunt is still faster than Prof-TMB, of a similar speed to Pred-TMBB and only marginally slower than BOMP.</p>
         </sec>
         <sec>
            <st>
               <p>Specific examples</p>
            </st>
            <p>Cross-validation results were reviewed specifically for a number of bbtm proteins that are non-typical, controversial, expressed in membranes other than the outer membrane of Gram-negative bacteria, or from Gram-negative bacteria and only recently structurally resolved. The aim of TMB-Hunt is the identification of novel families of bbtm protein. Unfortunately, a fair comparison of the abilities of various predictors to detect novel families is difficult owing to unavoidable uncertainties about training set contents and, in some cases (e.g. BOMP), a lack of user control over specificity thresholds. In an attempt to make this comparison we chose examples that, for the reasons given, should not be well represented in the training sets of other predictors. The ability of TMB-Hunt to identify novel families is illustrated by results from cross-validation tests. Table <tblr tid="T4">4</tblr> gives details of prediction results using TMB-Hunt and compares them with three other web-based bbtm protein predictors: BOMP, Prof-TMB and Pred-TMBB.</p>
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>Comparison of various predictors with specific examples.</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <b>BOMP</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Prof-TMB</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Pred-TMBB</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>TMB-Hunt: Leave Homs Out</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>TMB-Hunt: Leave One Out</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>NalP &#8211; Q8GKS5</p>
                     </c>
                     <c ca="left">
                        <p>0<sup>&#915;</sup></p>
                     </c>
                     <c ca="left">
                        <p>12.32</p>
                     </c>
                     <c ca="left">
                        <p>2.92</p>
                     </c>
                     <c ca="left">
                        <p>10.73</p>
                     </c>
                     <c ca="left">
                        <p>10.73</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>TSX &#8211; P22786</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>10.92</p>
                     </c>
                     <c ca="left">
                        <p>2.94</p>
                     </c>
                     <c ca="left">
                        <p>4.47</p>
                     </c>
                     <c ca="left">
                        <p>4.47</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>FadL &#8211; P10384</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>9.47</p>
                     </c>
                     <c ca="left">
                        <p>2.88</p>
                     </c>
                     <c ca="left">
                        <p>0.8</p>
                     </c>
                     <c ca="left">
                        <p>0.8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>BtuB &#8211; P06129</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>10.39</p>
                     </c>
                     <c ca="left">
                        <p>2.91</p>
                     </c>
                     <c ca="left">
                        <p>10.82</p>
                     </c>
                     <c ca="left">
                        <p>10.82</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Secretin &#8211; P31700</p>
                     </c>
                     <c ca="left">
                        <p>0<sup>&#915;</sup></p>
                     </c>
                     <c ca="left">
                        <p>3.73<sup>&#915;</sup></p>
                     </c>
                     <c ca="left">
                        <p>2.90</p>
                     </c>
                     <c ca="left">
                        <p>5.48</p>
                     </c>
                     <c ca="left">
                        <p>5.48</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Usher &#8211; P30130</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>10.46</p>
                     </c>
                     <c ca="left">
                        <p>2.95</p>
                     </c>
                     <c ca="left">
                        <p>10.79</p>
                     </c>
                     <c ca="left">
                        <p>10.79</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>60 kDa cysteine rich OMP &#8211; P26758</p>
                     </c>
                     <c ca="left">
                        <p>0<sup>&#915;</sup></p>
                     </c>
                     <c ca="left">
                        <p>2.42<sup>&#915;</sup></p>
                     </c>
                     <c ca="left">
                        <p>3.03<sup>&#915;</sup></p>
                     </c>
                     <c ca="left">
                        <p>-1.70<sup>&#915;</sup></p>
                     </c>
                     <c ca="left">
                        <p>-1.70<sup>&#915;</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Mycobacterial Porin &#8211; Q9RLP7</p>
                     </c>
                     <c ca="left">
                        <p>0<sup>&#915;</sup></p>
                     </c>
                     <c ca="left">
                        <p>5.65<sup>&#915;</sup></p>
                     </c>
                     <c ca="left">
                        <p>2.84</p>
                     </c>
                     <c ca="left">
                        <p>7.74</p>
                     </c>
                     <c ca="left">
                        <p>7.74</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>TolC &#8211; P02930</p>
                     </c>
                     <c ca="left">
                        <p>0<sup>&#915;</sup></p>
                     </c>
                     <c ca="left">
                        <p>1.85<sup>&#915;</sup></p>
                     </c>
                     <c ca="left">
                        <p>2.90</p>
                     </c>
                     <c ca="left">
                        <p>6.76</p>
                     </c>
                     <c ca="left">
                        <p>10.64</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Alpha hemolysin &#8211; O68404</p>
                     </c>
                     <c ca="left">
                        <p>0<sup>&#915;</sup></p>
                     </c>
                     <c ca="left">
                        <p>0.83<sup>&#915;</sup></p>
                     </c>
                     <c ca="left">
                        <p>2.88</p>
                     </c>
                     <c ca="left">
                        <p>9.89</p>
                     </c>
                     <c ca="left">
                        <p>9.89</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VDAC &#8211; Q60931</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>6.55</p>
                     </c>
                     <c ca="left">
                        <p>2.88</p>
                     </c>
                     <c ca="left">
                        <p>5.24</p>
                     </c>
                     <c ca="left">
                        <p>5.24</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Tom40 &#8211; Q18090</p>
                     </c>
                     <c ca="left">
                        <p>1</p>
                     </c>
                     <c ca="left">
                        <p>4.79<sup>&#915;</sup></p>
                     </c>
                     <c ca="left">
                        <p>2.92</p>
                     </c>
                     <c ca="left">
                        <p>-1.04<sup>&#915;</sup></p>
                     </c>
                     <c ca="left">
                        <p>-1.04<sup>&#915;</sup></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Toc75 &#8211; Q43715</p>
                     </c>
                     <c ca="left">
                        <p>0<sup>&#915;</sup></p>
                     </c>
                     <c ca="left">
                        <p>6.50</p>
                     </c>
                     <c ca="left">
                        <p>2.99<sup>&#915;</sup></p>
                     </c>
                     <c ca="left">
                        <p>-1.41<sup>&#915;</sup></p>
                     </c>
                     <c ca="left">
                        <p>1.24</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>OEP24 &#8211; O49929</p>
                     </c>
                     <c ca="left">
                        <p>0<sup>&#915;</sup></p>
                     </c>
                     <c ca="left">
                        <p>3.11<sup>&#915;</sup></p>
                     </c>
                     <c ca="left">
                        <p>2.87</p>
                     </c>
                     <c ca="left">
                        <p>1.55</p>
                     </c>
                     <c ca="left">
                        <p>1.55</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>All programs were run via their web interfaces, using default settings. Sequences classified as non-bbtm are marked using <sup>&#915;</sup>. BOMP [22] values indicate the number of bbtm proteins predicted given the number of sequences queried. Prof-TMB [17] returns a z-score statistic for which 50% of bbtm proteins get a z-score of >= 10 at an accuracy of 80% and 35% of bbtm proteins get a z-score >= 6 at an accuracy of 35%. Pred-TMBB [16] returns a score; sequences with scores >2.965 are assumed not to be bbtm proteins. Beta-barrel scores are given for TMB-Hunt. These were calculated without inclusion of evolutionary information, using 'leave homologues out' and 'leave one out' cross-validations. Beta-barrel scores >0 indicate that there is a greater probability that the sequence is from a bbtm protein.</p>
               </tblfn>
            </tbl>
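            <p>The decision rules stated in the table footnote can be sketched as follows. This is an illustrative sketch only, not code from any of the tools: the function names are hypothetical, the column order (BOMP, Prof-TMB, Pred-TMBB, TMB-Hunt 'leave homologues out', TMB-Hunt 'leave one out') is an assumption inferred from the accompanying text, and Prof-TMB is omitted because the footnote does not give a single cutoff for it.</p>

```python
# Decision rules paraphrased from the Table 4 footnote.
# All names are illustrative; none come from the tools themselves.
PRED_TMBB_CUTOFF = 2.965  # Pred-TMBB scores above this are assumed not bbtm
TMB_HUNT_CUTOFF = 0.0     # TMB-Hunt beta-barrel scores above this favour bbtm


def bomp_is_bbtm(n_predicted):
    """BOMP reports how many of the queried sequences it predicts as bbtm."""
    return n_predicted >= 1


def pred_tmbb_is_bbtm(score):
    """A Pred-TMBB score above the cutoff means 'assumed not bbtm'."""
    return not score > PRED_TMBB_CUTOFF


def tmb_hunt_is_bbtm(bb_score):
    """A positive beta-barrel score indicates a probable bbtm protein."""
    return bb_score > TMB_HUNT_CUTOFF


# Values copied from the Tom40 and Toc75 rows of Table 4, in the assumed order
# (BOMP, Pred-TMBB, TMB-Hunt leave homologues out, TMB-Hunt leave one out).
rows = {
    "Tom40": (1, 2.92, -1.04, -1.04),
    "Toc75": (0, 2.99, -1.41, 1.24),
}

calls = {
    name: {
        "BOMP": bomp_is_bbtm(bomp),
        "Pred-TMBB": pred_tmbb_is_bbtm(pred),
        "TMB-Hunt LHO": tmb_hunt_is_bbtm(lho),
        "TMB-Hunt LOO": tmb_hunt_is_bbtm(loo),
    }
    for name, (bomp, pred, lho, loo) in rows.items()
}

for name, verdicts in calls.items():
    print(name, verdicts)
```

            <p>Under these assumptions the sketch marks the same cells as non-bbtm that carry the <sup>&#915;</sup> marker in the two rows used.</p>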
            <p>Pred-TMBB and TMB-Hunt both correctly classified the non-typical bbtm proteins TolC <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> (P02930), Alpha-hemolysin <abbrgrp><abbr bid="B44">44</abbr></abbrgrp> (P09616) and the Mycobacterial Porin <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> (Q9RLP7), whilst these were classified as non-bbtm by BOMP and Prof-TMB. The secreted pore-forming toxin Alpha-hemolysin is difficult to classify because the majority of its beta-strands are non-membrane. Alpha-hemolysin is homoheptameric, with each subunit contributing 2 strands to a 14-strand TM barrel. In addition to the 2 TM strands, each subunit consists of 14 soluble strands which make up a cap and rim domain. The Mycobacterial Porin has not been included in the training sets of any currently published predictors, because its structure has only recently been resolved <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> and because, at 10 nm width, the outer membrane of Gram-positive Mycobacteria is unlike that of Gram-negative bacteria at 4 nm width <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>. TolC has been a problem in classification because each of its three subunits contributes just 4 strands to the beta-barrel and contains large stretches of alpha-helix.</p>
            <p>To confirm that the predictor was not just selecting proteins destined for the outer membranes of Gram-negative bacteria, we also tested it with a number of mitochondrial and chloroplast bbtm proteins. All the predictors tested were able to correctly classify the mitochondrial porin VDAC (Q60931), but only BOMP and Pred-TMBB classified Tom40 (Q18090) as a bbtm protein. Only Prof-TMB and TMB-Hunt (using the 'leave one out' cross-validation) classified Toc75 (Q43715) as a bbtm protein, and only Pred-TMBB and TMB-Hunt identified OEP24 (O49929).</p>
            <p>All four predictors tested correctly identified the proteins with recently resolved structures, i.e. Tsx <abbrgrp><abbr bid="B46">46</abbr></abbrgrp> (P22786), FadL <abbrgrp><abbr bid="B47">47</abbr></abbrgrp> (P10384) and BtuB <abbrgrp><abbr bid="B48">48</abbr></abbrgrp> (P06129), with the exception that BOMP misclassified NalP <abbrgrp><abbr bid="B49">49</abbr></abbrgrp> (Q8GKS5). BOMP was the only predictor tested which did not classify Secretin <abbrgrp><abbr bid="B50">50</abbr></abbrgrp> (P31700) as a bbtm protein, but all four classified the Usher protein <abbrgrp><abbr bid="B51">51</abbr></abbrgrp> (P30130) as bbtm. A 60 kDa cysteine rich outer-membrane protein <abbrgrp><abbr bid="B52">52</abbr></abbrgrp> (P26758) was the only example that was not classified as a bbtm protein by any of the predictors. However, the experimental evidence that this is a genuine bbtm protein is weak and it has been suggested that it is falsely annotated <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. It should be noted that PSORT-B 2.0 <abbrgrp><abbr bid="B53">53</abbr></abbrgrp> identified all of these examples as outer membrane proteins, including the 60 kDa cysteine rich outer-membrane protein. However, it classified these using strong homology to sequences within its training set, and thus this result does not represent its ability to predict novel families of bbtm proteins.</p>
            <p>Differences in the prediction results of these algorithms on these examples suggest that combined approaches could result in a higher overall accuracy.</p>
         </sec>
         <sec>
            <st>
               <p>Genome screening</p>
            </st>
            <p>Figure <figr fid="F4">4</figr> demonstrates typical results seen when screening a genome. It demonstrates that due to the large number of sequences queried, a number of sequences get scores with an E-value >1 but a beta barrel score indicative of a bbtm protein (i.e. >0). These sequences are said to be in the 'twilight zone' because it is impossible to classify them as either bbtm or not. To reduce the number of sequences within this zone, sequences without signal peptides were removed. Sequences were accepted if a signal peptide was predicted using SignalP 3.0 with either the Neural Network <abbrgrp><abbr bid="B54">54</abbr></abbrgrp> or HMM <abbrgrp><abbr bid="B55">55</abbr></abbrgrp> modes, so as to minimise the number of potential candidates removed. Similar filtering systems have been applied in previous bbtm protein screening attempts <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B16">16</abbr><abbr bid="B56">56</abbr></abbrgrp>. Signal peptide filtering poses certain risks owing to errors in the prediction of the 5' ends of genes <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> and imperfections in signal peptide prediction algorithms, but these risks are outweighed by the reduction of FP sequences within the twilight zone.</p>
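            <p>The filtering and zoning logic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the TMB-Hunt implementation: the record type, field names and example values are hypothetical, and the SignalP predictions are assumed to have been parsed into booleans beforehand.</p>

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical per-sequence record; field names are illustrative."""
    name: str
    signalp_nn: bool   # signal peptide predicted by the Neural Network mode
    signalp_hmm: bool  # signal peptide predicted by the HMM mode
    bb_score: float    # beta-barrel score
    e_value: float     # E-value attached to that score


def has_signal_peptide(hit):
    # Accept a sequence if EITHER mode predicts a signal peptide, so that
    # as few potential candidates as possible are removed.
    return hit.signalp_nn or hit.signalp_hmm


def zone(hit):
    if not hit.bb_score > 0:
        return "non-bbtm"
    if hit.e_value > 1:
        # Positive score but not statistically supported
        return "twilight"
    return "predicted bbtm"


def screen(hits):
    """Drop sequences without a predicted signal peptide, then zone the rest."""
    return {h.name: zone(h) for h in hits if has_signal_peptide(h)}


# Made-up records, for illustration only
example = [
    Hit("seqA", True, False, 4.2, 0.05),
    Hit("seqB", False, True, 0.6, 3.0),
    Hit("seqC", True, True, -2.1, 50.0),
    Hit("seqD", False, False, 5.0, 0.01),
]
result = screen(example)
print(result)
```

            <p>Note that the hypothetical record seqD is discarded despite its strong score, which is the risk of signal peptide filtering acknowledged above.</p>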
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Range of E-values and BB-scores from <it>E. coli </it>screening</p>
               </caption>
               <text>
                  <p><b>Range of E-values and BB-scores from <it>E. coli </it>screening. </b>Sequences with a predicted signal peptide from the proteome of <it>E. coli </it>were screened using the algorithm described. Sequences were then sorted by their E-values and plotted graphically. The graph demonstrates that in proteome screening with this tool a number of sequences will be identified with positive bb scores, but E-values >1. Sequences with these scores are described as being in the twilight zone.</p>
               </text>
               <graphic file="1471-2105-6-56-4"/>
            </fig>
            <p>A range of organisms with completed genomes were screened for bbtm proteins, including several bacteria, a protozoan, a fungus, a nematode and an angiosperm. Table <tblr tid="T5">5</tblr> shows the results for the proteomes screened. <it>Plasmodium falciparum, Saccharomyces cerevisiae</it>, <it>Caenorhabditis elegans </it>and <it>Arabidopsis thaliana </it>were screened as eukaryotic tests. To date, the only predicted eukaryotic bbtm proteins are those of the mitochondrial and chloroplast outer membranes; however, the possibility of other eukaryotic bbtm protein families should not be ignored. Three examples of where they could exist are i) organelles of endosymbiotic bacterial origin other than the mitochondria and chloroplasts e.g. the apicoplast of apicomplexan parasites including the malaria parasite <it>Plasmodium </it><abbrgrp><abbr bid="B57">57</abbr></abbrgrp>, ii) novel double membrane systems e.g. the outer membrane of parasitic schistosome worms, which contains two overlaid phospholipid bilayers <abbrgrp><abbr bid="B58">58</abbr></abbrgrp> and iii) toxins e.g. TT95, a pore forming molecule produced by the parasitic nematode <it>Trichuris </it><abbrgrp><abbr bid="B59">59</abbr></abbrgrp> which does not contain any predicted TM helices.</p>
            <tbl id="T5">
               <title>
                  <p>Table 5</p>
               </title>
               <caption>
                  <p>Proteomes screened.</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Organism</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Proteins</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>No. signal peptide</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>% proteins with signal peptide</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>No. bbtm proteins (E &lt;= 1)</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>% of proteins with signal peptide predicted bbtm (E &lt;= 1)</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>% of all proteins predicted bbtm (E &lt;= 1)</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Escherichia coli</p>
                     </c>
                     <c ca="left">
                        <p>5341</p>
                     </c>
                     <c ca="left">
                        <p>1032</p>
                     </c>
                     <c ca="left">
                        <p>19.32</p>
                     </c>
                     <c ca="left">
                        <p>87</p>
                     </c>
                     <c ca="left">
                        <p>8.43</p>
                     </c>
                     <c ca="left">
                        <p>1.63</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>E. coli <sup>&#1064;</sup></p>
                     </c>
                     <c ca="left">
                        <p>4005</p>
                     </c>
                     <c ca="left">
                        <p>782</p>
                     </c>
                     <c ca="left">
                        <p>19.52</p>
                     </c>
                     <c ca="left">
                        <p>69</p>
                     </c>
                     <c ca="left">
                        <p>8.82</p>
                     </c>
                     <c ca="left">
                        <p>1.72</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Pseudomonas aeruginosa</p>
                     </c>
                     <c ca="left">
                        <p>5567</p>
                     </c>
                     <c ca="left">
                        <p>1142</p>
                     </c>
                     <c ca="left">
                        <p>20.51</p>
                     </c>
                     <c ca="left">
                        <p>137</p>
                     </c>
                     <c ca="left">
                        <p>12</p>
                     </c>
                     <c ca="left">
                        <p>2.46</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>P. aeruginosa <sup>&#1064;</sup></p>
                     </c>
                     <c ca="left">
                        <p>5567</p>
                     </c>
                     <c ca="left">
                        <p>1412</p>
                     </c>
                     <c ca="left">
                        <p>25.36</p>
                     </c>
                     <c ca="left">
                        <p>137</p>
                     </c>
                     <c ca="left">
                        <p>9.7</p>
                     </c>
                     <c ca="left">
                        <p>2.46</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Staphylococcus aureus</p>
                     </c>
                     <c ca="left">
                        <p>2632</p>
                     </c>
                     <c ca="left">
                        <p>409</p>
                     </c>
                     <c ca="left">
                        <p>15.54</p>
                     </c>
                     <c ca="left">
                        <p>18</p>
                     </c>
                     <c ca="left">
                        <p>4.4</p>
                     </c>
                     <c ca="left">
                        <p>0.68</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Aquifex aeolicus</p>
                     </c>
                     <c ca="left">
                        <p>1560</p>
                     </c>
                     <c ca="left">
                        <p>187</p>
                     </c>
                     <c ca="left">
                        <p>11.98</p>
                     </c>
                     <c ca="left">
                        <p>16</p>
                     </c>
                     <c ca="left">
                        <p>8.55</p>
                     </c>
                     <c ca="left">
                        <p>1.02</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Chlamydia trachomatis</p>
                     </c>
                     <c ca="left">
                        <p>895</p>
                     </c>
                     <c ca="left">
                        <p>145</p>
                     </c>
                     <c ca="left">
                        <p>16.20</p>
                     </c>
                     <c ca="left">
                        <p>17</p>
                     </c>
                     <c ca="left">
                        <p>11.7</p>
                     </c>
                     <c ca="left">
                        <p>1.89</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Thermotoga maritima</p>
                     </c>
                     <c ca="left">
                        <p>1858</p>
                     </c>
                     <c ca="left">
                        <p>265</p>
                     </c>
                     <c ca="left">
                        <p>14.26</p>
                     </c>
                     <c ca="left">
                        <p>12</p>
                     </c>
                     <c ca="left">
                        <p>4.53</p>
                     </c>
                     <c ca="left">
                        <p>0.65</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Treponema pallidum</p>
                     </c>
                     <c ca="left">
                        <p>1036</p>
                     </c>
                     <c ca="left">
                        <p>203</p>
                     </c>
                     <c ca="left">
                        <p>19.59</p>
                     </c>
                     <c ca="left">
                        <p>12</p>
                     </c>
                     <c ca="left">
                        <p>5.91</p>
                     </c>
                     <c ca="left">
                        <p>1.16</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Bacteroides thetaiotaomicron</p>
                     </c>
                     <c ca="left">
                        <p>4778</p>
                     </c>
                     <c ca="left">
                        <p>1614</p>
                     </c>
                     <c ca="left">
                        <p>33.78</p>
                     </c>
                     <c ca="left">
                        <p>131</p>
                     </c>
                     <c ca="left">
                        <p>8.12</p>
                     </c>
                     <c ca="left">
                        <p>2.74</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Deinococcus radiodurans</p>
                     </c>
                     <c ca="left">
                        <p>3182</p>
                     </c>
                     <c ca="left">
                        <p>689</p>
                     </c>
                     <c ca="left">
                        <p>21.65</p>
                     </c>
                     <c ca="left">
                        <p>25</p>
                     </c>
                     <c ca="left">
                        <p>3.62</p>
                     </c>
                     <c ca="left">
                        <p>0.76</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Rhodopirellula baltica</p>
                     </c>
                     <c ca="left">
                        <p>7325</p>
                     </c>
                     <c ca="left">
                        <p>1584</p>
                     </c>
                     <c ca="left">
                        <p>20.66</p>
                     </c>
                     <c ca="left">
                        <p>49</p>
                     </c>
                     <c ca="left">
                        <p>3.09</p>
                     </c>
                     <c ca="left">
                        <p>0.67</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Plasmodium falciparum <sup>&#1064;</sup></p>
                     </c>
                     <c ca="left">
                        <p>9178</p>
                     </c>
                     <c ca="left">
                        <p>1613</p>
                     </c>
                     <c ca="left">
                        <p>17.57</p>
                     </c>
                     <c ca="left">
                        <p>3</p>
                     </c>
                     <c ca="left">
                        <p>0.18</p>
                     </c>
                     <c ca="left">
                        <p>0.03</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Arabidopsis thaliana</p>
                     </c>
                     <c ca="left">
                        <p>28860</p>
                     </c>
                     <c ca="left">
                        <p>5569</p>
                     </c>
                     <c ca="left">
                        <p>19.30</p>
                     </c>
                     <c ca="left">
                        <p>23</p>
                     </c>
                     <c ca="left">
                        <p>0.41</p>
                     </c>
                     <c ca="left">
                        <p>0.07</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Caenorhabditis elegans <sup>&#1064;</sup></p>
                     </c>
                     <c ca="left">
                        <p>22561</p>
                     </c>
                     <c ca="left">
                        <p>5778</p>
                     </c>
                     <c ca="left">
                        <p>22.60</p>
                     </c>
                     <c ca="left">
                        <p>26</p>
                     </c>
                     <c ca="left">
                        <p>0.45</p>
                     </c>
                     <c ca="left">
                        <p>0.12</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Saccharomyces cerevisiae</p>
                     </c>
                     <c ca="left">
                        <p>5866</p>
                     </c>
                     <c ca="left">
                        <p>651</p>
                     </c>
                     <c ca="left">
                        <p>11.09</p>
                     </c>
                     <c ca="left">
                        <p>4</p>
                     </c>
                     <c ca="left">
                        <p>0.61</p>
                     </c>
                     <c ca="left">
                        <p>0.07</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Several proteomes were screened, representing the major kingdoms of life. Proteomes were first filtered for sequences with signal peptides. Remaining sequences were then each queried, returning bb scores and E-value statistics. All proteomes were downloaded from the NCBI FTP site except those denoted <sup>&#1064;</sup>, downloaded from Uniprot/SwissProt for superior annotation.</p>
               </tblfn>
            </tbl>
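            <p>The three percentage columns of Table 5 follow from the three count columns. The sketch below recomputes them for two rows copied from the table; the function and variable names are illustrative, and rounding to two decimal places is an assumption about how the published figures were prepared.</p>

```python
def derived_columns(n_proteins, n_signal, n_bbtm):
    """Return the three percentage columns of Table 5 for one proteome.

    n_proteins: proteins in the proteome
    n_signal:   proteins with a predicted signal peptide
    n_bbtm:     predicted bbtm proteins passing the E-value cutoff
    """
    pct_signal = 100 * n_signal / n_proteins
    pct_bbtm_of_signal = 100 * n_bbtm / n_signal
    pct_bbtm_of_all = 100 * n_bbtm / n_proteins
    return tuple(
        round(value, 2)
        for value in (pct_signal, pct_bbtm_of_signal, pct_bbtm_of_all)
    )


# Counts copied from two rows of Table 5
table5_counts = {
    "Escherichia coli": (5341, 1032, 87),
    "Staphylococcus aureus": (2632, 409, 18),
}

derived = {
    organism: derived_columns(*counts)
    for organism, counts in table5_counts.items()
}

for organism, columns in derived.items():
    print(organism, columns)
```

            <p>For these two rows the recomputed values agree with the published columns (19.32, 8.43 and 1.63 for <it>Escherichia coli</it>; 15.54, 4.4 and 0.68 for <it>Staphylococcus aureus</it>).</p>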
            <p>Screening eukaryotic genomes for bbtm proteins is a more complex process than screening prokaryotic genomes, owing to the larger numbers of sequences queried and a wider range of targeting signals. TMB-Hunt is able to identify mitochondrial and chloroplast outer membrane bbtm proteins (Table <tblr tid="T4">4</tblr>), but these were missed during eukaryotic genome screening due to the prior removal of sequences without signal peptides. Owing to the wide range of eukaryotic protein targeting pathways, eukaryotic sequences should ideally be screened without prior filtering; however, this would result in much larger numbers of sequences within the twilight zone. An alternative would be to add to the score whenever targeting signals are detected.</p>
            <p>TMB-Hunt did not predict many bbtm proteins in eukaryotes: 3 with an E-value &lt;1 in <it>P. falciparum </it>(0.03% of all proteins screened), 4 in <it>S. cerevisiae </it>(0.07%), 23 in <it>Arabidopsis thaliana </it>(0.07%) and 26 in <it>C. elegans </it>(0.12%), with the majority of selected sequences in <it>A. thaliana </it>and <it>C. elegans </it>being closely related and described as hypothetical or putative proteins. Only 1 eukaryotic protein got an E-value &lt;0.1, a <it>P. falciparum </it>gene annotated as a serine protease with an E-value of 0.032.</p>
            <p>The mean percentage of proteins in Gram-negative bacterial proteomes with an E-value &lt;1 was 1.37%, with a range of 0.65&#8211;2.46%. The figure was highest in proteobacteria, possibly reflecting biases in the training set, with homologies to training instances enabling statistically significant scores (E-values) for many sequences. However, given that the numbers of bbtm proteins in the various bacterial phyla are not known, it may be that these results reflect true figures. Previous results <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> identified smaller numbers of bbtm proteins in some genomes e.g. <it>Aquifex aeolicus</it>, <it>Thermotoga maritima </it>and <it>Treponema pallidum</it>, although the numbers of sequences screened were not given.</p>
            <p><it>Escherichia coli O157:H7 </it>proteins downloaded from Uniprot were screened in order to compare results with high quality annotation (Figure <figr fid="F5">5</figr>). In total, 249 sequences got a positive beta barrel score when, given the number of sequences queried, 133 would be expected. Thus, assuming that the 116 sequences in excess of this expectation are genuine bbtm proteins, the proteome contains (116/4005) &#215; 100 = 2.896% bbtm proteins (a number consistent with other predictions). Of these 249 sequences, 69 had an E-value &lt;1, that is 1.72% of all proteins queried. These 69 included 15 proteins described as outer membrane and TM, 40 hypothetical or putative bbtm proteins described as probable OMPs or with homology to OMPs, 6 hypothetical proteins without homology to well annotated proteins, 4 flagellar proteins, 3 lipoproteins and 1 well known ahtm protein. The 15 proteins described as outer membrane and TM should be bbtm proteins and the 40 with homology to OMPs are probably bbtm proteins. The 4 flagellar proteins are possibly bbtm proteins, as several flagellar proteins are known bbtm proteins. The 6 hypothetical proteins without homology to well annotated proteins possibly represent novel families of bbtm protein. The 3 lipoproteins are non-bbtm proteins and the 1 ahtm protein could easily be filtered out using an ahtm protein predictor.</p>
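            <p>The arithmetic in this paragraph can be checked with a few lines. All numbers are taken from the text above; the variable names are illustrative only.</p>

```python
# Counts quoted in the text for the E. coli O157:H7 (Uniprot) screen
n_proteome = 4005          # proteins in the proteome
n_positive = 249           # sequences with a positive beta-barrel score
n_expected = 133           # positives expected given the number queried
n_significant = 69         # positives with an E-value below 1

# Positives in excess of expectation, assumed in the text to be genuine
n_excess = n_positive - n_expected
pct_excess = round(100 * n_excess / n_proteome, 3)
pct_significant = round(100 * n_significant / n_proteome, 2)

# Annotation categories of the 69 significant hits
categories = {
    "outer membrane and TM": 15,
    "probable OMP or homologous to OMPs": 40,
    "hypothetical, no well annotated homologue": 6,
    "flagellar": 4,
    "lipoprotein": 3,
    "ahtm": 1,
}
category_total = sum(categories.values())

print(n_excess, pct_excess, pct_significant, category_total)
```

            <p>The six annotation categories sum to the 69 sequences with an E-value &lt;1, so the breakdown is complete.</p>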
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Uniprot annotation of predicted <it>E. coli </it>bbtm proteins</p>
               </caption>
               <text>
                  <p><b>Uniprot annotation of predicted <it>E. coli </it>bbtm proteins. </b>Numbers of <it>E. coli </it>O157:H7 sequences with a TMB-Hunt E value &lt;= 1 with different categories of annotation in Uniprot.</p>
               </text>
               <graphic file="1471-2105-6-56-5"/>
            </fig>
            <p>TMB-Hunt proved successful in that Uniprot annotation suggests that the vast majority of the proteins it predicted as bbtm (65 of the 69, approximately 94%) are probably bbtm proteins. However, several more probable bbtm proteins were found in the twilight zone, suggesting that this algorithm alone does not detect all bbtm proteins, even in organisms well represented in the training set. In comparing results with BOMP, we found that it rejected the lipoproteins that TMB-Hunt incorrectly classified as bbtm (Q8XBQ1, Q7ABP6, Q7ABA4), whilst correctly classifying a number of proteins annotated as bbtm proteins which were within the TMB-Hunt twilight zone (e.g. Q7AGG6, Q7AY93). However, we found that BOMP also incorrectly rejected a large number of annotated bbtm proteins that we classified with an E-value &lt;1 (e.g. Q7AAR4, Q7A9N7). Similar patterns were found with Pred-TMBB and Prof-TMB. These differences are further evidence that combining algorithms could lead to a higher overall accuracy.</p>
            <p>Because composition is correlated with physicochemical environment <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>, TMB-Hunt struggles to differentiate between bbtm proteins and proteins occupying similar environments, i.e. lipoproteins and periplasmic proteins. However, TMB-Hunt gets a stronger signal from bbtm proteins as they effectively occupy 3 environments: the transmembrane region (where there is a preference for amino acids which form TM beta-strands) and the regions on either side of it, whereas lipoproteins and periplasmic proteins occupy only one side of the membrane. The liability of TMB-Hunt is thus different from that of topology based predictors, which typically report difficulties in discriminating between the beta-strands of bbtm proteins and those of some globular proteins.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>A program called TMB-Hunt has been described which identifies bbtm proteins using the amino acid composition of entire sequences. TMB-Hunt uses a novel method for calibration of results from the <it>k</it>-NN algorithm and uses evolutionary information from close homologues to build composition profiles. We suggest that these methods can be used to boost the accuracy of other <it>k</it>-NN and composition based classifiers.</p>
         <p>TMB-Hunt was found to have several advantages over existing methods. Firstly, a cross-validation analysis showed performance to be superior to that of other bbtm protein predictors. Secondly, unlike previous predictors which are dependent on TM beta-strand detection, this method does not require resolved structures and thus larger, more representative training sets could be used. Thirdly, because it adopts a novel approach, the program has different liabilities from other predictors, which we believe is its major benefit. This was demonstrated by its ability to correctly classify several proteins with which previous predictors struggled. Finally, it is extremely quick, capable of screening >400 sequences per minute. TMB-Hunt has been successfully applied to the screening of several genomes; however, numerous proteins fell into the twilight zone, where it was impossible to statistically categorise them as either bbtm or not. It is therefore intended that it will be included as part of a consensus approach, which can be used to hunt for novel families of bbtm protein.</p>
      </sec>
      <sec>
         <st>
            <p>Availability and requirements</p>
         </st>
         <p><b>Project name: </b>TMB-Hunt</p>
         <p><b>Project home page: </b>A web server is available at <url>http://www.bioinformatics.leeds.ac.uk/betaBarrel</url>.</p>
         <p><b>Operating system: </b>Linux</p>
         <p><b>Programming languages: </b>ANSI C and Perl</p>
         <p><b>Other requirements: </b>None</p>
         <p><b>Licence: </b>GPL</p>
         <p><b>Any restrictions to non-academics: </b>None</p>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>AA &#8211; Amino acid</p>
         <p>ahtm &#8211; Alpha-helical transmembrane</p>
         <p>bbtm &#8211; Beta-barrel transmembrane</p>
         <p>BLAST &#8211; Basic Local Alignment Search Tool</p>
         <p>GA &#8211; Genetic Algorithm</p>
         <p>HMM &#8211; Hidden Markov Model</p>
         <p><it>k</it>-NN &#8211; <it>k</it>-Nearest Neighbour</p>
         <p>MCC &#8211; Matthews Correlation Coefficient</p>
         <p>ntm &#8211; Non-transmembrane</p>
         <p>OMP &#8211; Outer membrane protein</p>
         <p>PDB &#8211; Protein Data Bank</p>
         <p>PPV &#8211; Positive predictive value</p>
         <p>TM &#8211; Transmembrane</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>AGG constructed the datasets, wrote and tested the programs, screened the genomes and built the website. AA suggested the project and analysed the genome screening results. DRW oversaw the construction of the programs and helped develop the methods. All authors have read and approved the final manuscript.</p>
         <suppl id="S1">
            <title>
               <p>Additional File 1</p>
            </title>
            <text>
               <p>TMB-Hunt source code and training sets</p>
            </text>
            <file name="1471-2105-6-56-S1.tar">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S2">
            <title>
               <p>Additional File 2</p>
            </title>
            <text>
               <p>E. coli O157:H7 (UniProt sequences) query results. Includes the proteomes screened, examples of results and queries, and help files.</p>
            </text>
            <file name="1471-2105-6-56-S2.xls">
               <p>Click here for file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The authors would like to thank the MRC for funding and three anonymous reviewers for their constructive criticism.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>The versatile beta-barrel membrane protein.</p>
            </title>
            <aug>
               <au>
                  <snm>Wimley</snm>
                  <fnm>WC</fnm>
               </au>
            </aug>
            <source>Curr Opin Struct Biol</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <issue>4</issue>
            <fpage>404</fpage>
            <lpage>411</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0959-440X(03)00099-X</pubid>
                  <pubid idtype="pmpid" link="fulltext">12948769</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>A 3D model of the voltage-dependent anion channel (VDAC)</p>
            </title>
            <aug>
               <au>
                  <snm>Casadio</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Jacoboni</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Messina</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>De Pinto</snm>
                  <fnm>V</fnm>
               </au>
            </aug>
            <source>FEBS Lett</source>
            <pubdate>2002</pubdate>
            <volume>520</volume>
            <issue>1-3</issue>
            <fpage>1</fpage>
            <lpage>7</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0014-5793(02)02758-8</pubid>
                  <pubid idtype="pmpid" link="fulltext">12044860</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Prediction of the plant beta-barrel proteome: a case study of the chloroplast outer envelope</p>
            </title>
            <aug>
               <au>
                  <snm>Schleiff</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Eichacker</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Eckart</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Becker</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Mirus</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Stahl</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Soll</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>2003</pubdate>
            <volume>12</volume>
            <issue>4</issue>
            <fpage>748</fpage>
            <lpage>759</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1110/ps.0237503</pubid>
                  <pubid idtype="pmpid" link="fulltext">12649433</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>The structure of a mycobacterial outer-membrane channel.</p>
            </title>
            <aug>
               <au>
                  <snm>Faller</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Niederweis</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Schulz</snm>
                  <fnm>GE</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2004</pubdate>
            <volume>303</volume>
            <issue>5661</issue>
            <fpage>1189</fpage>
            <lpage>1192</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1094114</pubid>
                  <pubid idtype="pmpid" link="fulltext">14976314</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>The Protein Data Bank: a computer-based archival file for macromolecular structures.</p>
            </title>
            <aug>
               <au>
                  <snm>Bernstein</snm>
                  <fnm>FC</fnm>
               </au>
               <au>
                  <snm>Koetzle</snm>
                  <fnm>TF</fnm>
               </au>
               <au>
                  <snm>Williams</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Meyer</snm>
                  <fnm>EFJ</fnm>
               </au>
               <au>
                  <snm>Brice</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Rodgers</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Kennard</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Shimanouchi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Tasumi</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1977</pubdate>
            <volume>112</volume>
            <issue>3</issue>
            <fpage>535</fpage>
            <lpage>542</lpage>
            <xrefbib>
               <pubid idtype="pmpid">875032</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Transmembrane proteins in the Protein Data Bank: identification and classification.</p>
            </title>
            <aug>
               <au>
                  <snm>Tusnady</snm>
                  <fnm>GE</fnm>
               </au>
               <au>
                  <snm>Dosztanyi</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Simon</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>20</volume>
            <issue>17</issue>
            <fpage>2964</fpage>
            <lpage>2972</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bth340</pubid>
                  <pubid idtype="pmpid" link="fulltext">15180935</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>SCOP: a structural classification of proteins database for the investigation of sequences and structures.</p>
            </title>
            <aug>
               <au>
                  <snm>Murzin</snm>
                  <fnm>AG</fnm>
               </au>
               <au>
                  <snm>Brenner</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Hubbard</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Chothia</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1995</pubdate>
            <volume>247</volume>
            <fpage>536</fpage>
            <lpage>540</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1995.0159</pubid>
                  <pubid idtype="pmpid" link="fulltext">7723011</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>TolC, a macromolecular periplasmic 'chunnel'</p>
            </title>
            <aug>
               <au>
                  <snm>Postle</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Vakharia</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Nat Struct Biol</source>
            <pubdate>2000</pubdate>
            <volume>7</volume>
            <issue>7</issue>
            <fpage>527</fpage>
            <lpage>530</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/76726</pubid>
                  <pubid idtype="pmpid" link="fulltext">10876231</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>beta-Barrel membrane proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Schulz</snm>
                  <fnm>GE</fnm>
               </au>
            </aug>
            <source>Curr Opin Struct Biol</source>
            <pubdate>2000</pubdate>
            <volume>10</volume>
            <issue>4</issue>
            <fpage>443</fpage>
            <lpage>447</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0959-440X(00)00120-2</pubid>
                  <pubid idtype="pmpid" link="fulltext">10981633</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>The preference of tryptophan for membrane interfaces</p>
            </title>
            <aug>
               <au>
                  <snm>Yau</snm>
                  <fnm>WM</fnm>
               </au>
               <au>
                  <snm>Wimley</snm>
                  <fnm>WC</fnm>
               </au>
               <au>
                  <snm>Gawrisch</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>SH</fnm>
               </au>
            </aug>
            <source>Biochemistry</source>
            <pubdate>1998</pubdate>
            <volume>37</volume>
            <issue>42</issue>
            <fpage>14713</fpage>
            <lpage>14718</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1021/bi980809c</pubid>
                  <pubid idtype="pmpid" link="fulltext">9778346</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Best alpha-helical transmembrane protein topology predictions are achieved using hidden Markov models and evolutionary information.</p>
            </title>
            <aug>
               <au>
                  <snm>Viklund</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Elofsson</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>2004</pubdate>
            <volume>13</volume>
            <issue>7</issue>
            <fpage>1908</fpage>
            <lpage>1917</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1110/ps.04625404</pubid>
                  <pubid idtype="pmpid" link="fulltext">15215532</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>The beta-barrel finder (BBF) program, allowing identification of outer membrane beta-barrel proteins encoded within prokaryotic genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Zhai</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Saier</snm>
                  <fnm>MHJ</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>2002</pubdate>
            <volume>11</volume>
            <issue>9</issue>
            <fpage>2196</fpage>
            <lpage>2207</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1110/ps.0209002</pubid>
                  <pubid idtype="pmpid" link="fulltext">12192075</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Toward genomic identification of beta-barrel membrane proteins: composition and architecture of known structures.</p>
            </title>
            <aug>
               <au>
                  <snm>Wimley</snm>
                  <fnm>WC</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>2002</pubdate>
            <volume>11</volume>
            <issue>2</issue>
            <fpage>301</fpage>
            <lpage>312</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1110/ps.29402</pubid>
                  <pubid idtype="pmpid" link="fulltext">11790840</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>A sequence-profile-based HMM for predicting and discriminating beta barrel membrane proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Martelli</snm>
                  <fnm>PL</fnm>
               </au>
               <au>
                  <snm>Fariselli</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Krogh</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Casadio</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <fpage>S46</fpage>
            <lpage>53</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12169530</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>A HMM-based method to predict the transmembrane regions of beta-barrel membrane proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Zhu</snm>
                  <fnm>YS</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>BH</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>YX</fnm>
               </au>
            </aug>
            <source>Comput Biol Chem</source>
            <pubdate>2003</pubdate>
            <volume>27</volume>
            <issue>1</issue>
            <fpage>69</fpage>
            <lpage>76</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0097-8485(02)00051-7</pubid>
                  <pubid idtype="pmpid" link="fulltext">12798041</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>A Hidden Markov Model method, capable of predicting and discriminating beta-barrel outer membrane proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Bagos</snm>
                  <fnm>PG</fnm>
               </au>
               <au>
                  <snm>Liakopoulos</snm>
                  <fnm>TD</fnm>
               </au>
               <au>
                  <snm>Spyropoulos</snm>
                  <fnm>IC</fnm>
               </au>
               <au>
                  <snm>Hamodrakas</snm>
                  <fnm>SJ</fnm>
               </au>
            </aug>
            <source>BMC Bioinformatics</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <issue>1</issue>
            <fpage>29</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">385222</pubid>
                  <pubid idtype="pmpid" link="fulltext">15070403</pubid>
                  <pubid idtype="doi">10.1186/1471-2105-5-29</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Predicting transmembrane beta-barrels in proteomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Bigelow</snm>
                  <fnm>HR</fnm>
               </au>
               <au>
                  <snm>Petrey</snm>
                  <fnm>DS</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Przybylski</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Rost</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <issue>8</issue>
            <fpage>2566</fpage>
            <lpage>2577</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">419468</pubid>
                  <pubid idtype="pmpid" link="fulltext">15141026</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh580</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>PRED-TMBB: a web server for predicting the topology of beta-barrel outer membrane proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Bagos</snm>
                  <fnm>PG</fnm>
               </au>
               <au>
                  <snm>Liakopoulos</snm>
                  <fnm>TD</fnm>
               </au>
               <au>
                  <snm>Spyropoulos</snm>
                  <fnm>IC</fnm>
               </au>
               <au>
                  <snm>Hamodrakas</snm>
                  <fnm>SJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>W400</fpage>
            <lpage>4</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">441555</pubid>
                  <pubid idtype="pmpid" link="fulltext">15215419</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Neural network-based prediction of transmembrane beta-strand segments in outer membrane proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Gromiha</snm>
                  <fnm>MM</fnm>
               </au>
               <au>
                  <snm>Ahmad</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Suwa</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Comput Chem</source>
            <pubdate>2004</pubdate>
            <volume>25</volume>
            <issue>5</issue>
            <fpage>762</fpage>
            <lpage>767</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/jcc.10386</pubid>
                  <pubid idtype="pmpid" link="fulltext">14978719</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Prediction of transmembrane regions of beta-barrel proteins using ANN- and SVM-based methods.</p>
            </title>
            <aug>
               <au>
                  <snm>Natt</snm>
                  <fnm>NK</fnm>
               </au>
               <au>
                  <snm>Kaur</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Raghava</snm>
                  <fnm>GP</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>2004</pubdate>
            <volume>56</volume>
            <issue>1</issue>
            <fpage>11</fpage>
            <lpage>18</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/prot.20092</pubid>
                  <pubid idtype="pmpid" link="fulltext">15162482</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Identification of beta-barrel membrane proteins based on amino acid composition properties and predicted secondary structure.</p>
            </title>
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Zhu</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Comput Biol Chem</source>
            <pubdate>2003</pubdate>
            <volume>27</volume>
            <issue>3</issue>
            <fpage>355</fpage>
            <lpage>361</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S1476-9271(02)00085-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">12927109</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>BOMP: a program to predict integral beta-barrel outer membrane proteins encoded within genomes of Gram-negative bacteria.</p>
            </title>
            <aug>
               <au>
                  <snm>Berven</snm>
                  <fnm>FS</fnm>
               </au>
               <au>
                  <snm>Flikka</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Jensen</snm>
                  <fnm>HB</fnm>
               </au>
               <au>
                  <snm>Eidhammer</snm>
                  <fnm>I</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>W394</fpage>
            <lpage>9</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">441489</pubid>
                  <pubid idtype="pmpid" link="fulltext">15215418</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>An optimization approach to predicting protein structural class from amino acid composition</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>CT</fnm>
               </au>
               <au>
                  <snm>Chou</snm>
                  <fnm>KC</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>1992</pubdate>
            <volume>1</volume>
            <issue>3</issue>
            <fpage>401</fpage>
            <lpage>408</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">1304347</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Discrimination of intracellular and extracellular proteins using amino acid composition and residue-pair frequencies</p>
            </title>
            <aug>
               <au>
                  <snm>Nakashima</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Nishikawa</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1994</pubdate>
            <volume>238</volume>
            <issue>1</issue>
            <fpage>54</fpage>
            <lpage>61</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1994.1267</pubid>
                  <pubid idtype="pmpid" link="fulltext">8145256</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Prediction of membrane protein types and subcellular locations</p>
            </title>
            <aug>
               <au>
                  <snm>Chou</snm>
                  <fnm>KC</fnm>
               </au>
               <au>
                  <snm>Elrod</snm>
                  <fnm>DW</fnm>
               </au>
            </aug>
            <source>Proteins</source>
            <pubdate>1999</pubdate>
            <volume>34</volume>
            <issue>1</issue>
            <fpage>137</fpage>
            <lpage>153</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/(SICI)1097-0134(19990101)34:1&lt;137::AID-PROT11>3.0.CO;2-O</pubid>
                  <pubid idtype="pmpid" link="fulltext">10336379</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Relation between amino acid composition and cellular location of proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Cedano</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Aloy</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Perez-Pons</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Querol</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1997</pubdate>
            <volume>266</volume>
            <issue>3</issue>
            <fpage>594</fpage>
            <lpage>600</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1996.0804</pubid>
                  <pubid idtype="pmpid" link="fulltext">9067612</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Adaptation of protein surfaces to subcellular location</p>
            </title>
            <aug>
               <au>
                  <snm>Andrade</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>O'Donoghue</snm>
                  <fnm>SI</fnm>
               </au>
               <au>
                  <snm>Rost</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1998</pubdate>
            <volume>276</volume>
            <issue>2</issue>
            <fpage>517</fpage>
            <lpage>525</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1997.1498</pubid>
                  <pubid idtype="pmpid" link="fulltext">9512720</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Prediction of protein subcellular locations by support vector machines using compositions of amino acids and amino acid pairs.</p>
            </title>
            <aug>
               <au>
                  <snm>Park</snm>
                  <fnm>KJ</fnm>
               </au>
               <au>
                  <snm>Kanehisa</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <issue>13</issue>
            <fpage>1656</fpage>
            <lpage>1663</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btg222</pubid>
                  <pubid idtype="pmpid" link="fulltext">12967962</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Improved prediction of signal peptides: SignalP 3.0</p>
            </title>
            <aug>
               <au>
                  <snm>Bendtsen</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>von Heijne</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Brunak</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2004</pubdate>
            <volume>340</volume>
            <issue>4</issue>
            <fpage>783</fpage>
            <lpage>795</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jmb.2004.05.028</pubid>
                  <pubid idtype="pmpid" link="fulltext">15223320</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>mRNA 5' region sequence incompleteness: a potential source of systematic errors in translation initiation codon assignment in human mRNAs</p>
            </title>
            <aug>
               <au>
                  <snm>Casadei</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Strippoli</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>D'Addabbo</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Canaider</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lenzi</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Vitale</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Giannone</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Frabetti</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Facchin</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Carinci</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Zannotti</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>2003</pubdate>
            <volume>321</volume>
            <fpage>185</fpage>
            <lpage>193</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/S0378-1119(03)00835-7</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Topology prediction for helical transmembrane proteins at 86% accuracy</p>
            </title>
            <aug>
               <au>
                  <snm>Rost</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Fariselli</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Casadio</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>1996</pubdate>
            <volume>5</volume>
            <issue>8</issue>
            <fpage>1704</fpage>
            <lpage>1718</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8844859</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>PDB-REPRDB: a database of representative protein chains from the Protein Data Bank (PDB)</p>
            </title>
            <aug>
               <au>
                  <snm>Noguchi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Matsuda</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Akiyama</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <issue>1</issue>
            <fpage>219</fpage>
            <lpage>220</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1093/nar/29.1.219</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>A collection of well characterised integral membrane proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Moller</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kriventseva</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <issue>12</issue>
            <fpage>1159</fpage>
            <lpage>1160</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/16.12.1159</pubid>
                  <pubid idtype="pmpid" link="fulltext">11159338</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>The Universal Protein Resource (UniProt)</p>
            </title>
            <aug>
               <au>
                  <snm>Bairoch</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Apweiler</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Wu</snm>
                  <fnm>CH</fnm>
               </au>
               <au>
                  <snm>Barker</snm>
                  <fnm>WC</fnm>
               </au>
               <au>
                  <snm>Boeckmann</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Ferro</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gasteiger</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Lopez</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Magrane</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Martin</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Natale</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>O'Donovan</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Redaschi</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Yeh</snm>
                  <fnm>LS</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33 Database Issue</volume>
            <fpage>D154</fpage>
            <lpage>D159</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">15608167</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>The transporter classification (TC) system, 2002</p>
            </title>
            <aug>
               <au>
                  <snm>Busch</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Saier</snm>
                  <fnm>MH</fnm>
               </au>
            </aug>
            <source>Crit Rev Biochem Mol Biol</source>
            <pubdate>2002</pubdate>
            <volume>37</volume>
            <issue>5</issue>
            <fpage>287</fpage>
            <lpage>337</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1080/10409230290771528</pubid>
                  <pubid idtype="pmpid">12449427</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>The NCBI FTP site,</p>
            </title>
            <url>ftp://ftp.ncbi.nlm.nih.gov/genomes/</url>
         </bibl>
         <bibl id="B37">
            <title>
               <p>The sequence retrieval system:</p>
            </title>
            <url>http://srs.ebi.ac.uk/</url>
         </bibl>
         <bibl id="B38">
            <title>
               <p>SRS--an indexing and retrieval tool for flat file data libraries</p>
            </title>
            <aug>
               <au>
                  <snm>Etzold</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Argos</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Comput Appl Biosci</source>
            <pubdate>1993</pubdate>
            <volume>9</volume>
            <fpage>49</fpage>
            <lpage>57</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8435768</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Nearest neighbor pattern classification</p>
            </title>
            <aug>
               <au>
                  <snm>Cover</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Hart</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>IEEE Trans Inform Theory</source>
            <pubdate>1967</pubdate>
            <volume>IT-13</volume>
            <issue>1</issue>
            <fpage>21</fpage>
            <lpage>27</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1109/TIT.1967.1053964</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>An algorithm for finding nearest neighbors</p>
            </title>
            <aug>
               <au>
                  <snm>Friedman</snm>
                  <fnm>JH</fnm>
               </au>
               <au>
                  <snm>Baskett</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Shustek</snm>
                  <fnm>LJ</fnm>
               </au>
            </aug>
            <source>IEEE Trans Comput</source>
            <pubdate>1975</pubdate>
            <volume>C-24</volume>
            <issue>10</issue>
            <fpage>1000</fpage>
            <lpage>1006</lpage>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Improved tools for biological sequence comparison</p>
            </title>
            <aug>
               <au>
                  <snm>Pearson</snm>
                  <fnm>WR</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>1988</pubdate>
            <volume>85</volume>
            <issue>8</issue>
            <fpage>2444</fpage>
            <lpage>2448</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">280013</pubid>
                  <pubid idtype="pmpid">3162770</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Basic local alignment search tool</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Gish</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1990</pubdate>
            <volume>215</volume>
            <fpage>403</fpage>
            <lpage>410</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1990.9999</pubid>
                  <pubid idtype="pmpid" link="fulltext">2231712</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Evolutionary computing</p>
            </title>
            <aug>
               <au>
                  <snm>Eiben</snm>
                  <fnm>AE</fnm>
               </au>
               <au>
                  <snm>Schoenauer</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Information Processing Letters</source>
            <pubdate>2002</pubdate>
            <volume>82</volume>
            <fpage>1</fpage>
            <lpage>6</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/S0020-0190(02)00204-1</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Structure of staphylococcal alpha-hemolysin, a heptameric transmembrane pore</p>
            </title>
            <aug>
               <au>
                  <snm>Song</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Hobaugh</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Shustak</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Cheley</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bayley</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Gouaux</snm>
                  <fnm>JE</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1996</pubdate>
            <volume>274</volume>
            <issue>5294</issue>
            <fpage>1859</fpage>
            <lpage>1866</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.274.5294.1859</pubid>
                  <pubid idtype="pmpid" link="fulltext">8943190</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>The envelope of mycobacteria</p>
            </title>
            <aug>
               <au>
                  <snm>Brennan</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Nikaido</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Annu Rev Biochem</source>
            <pubdate>1995</pubdate>
            <volume>64</volume>
            <fpage>29</fpage>
            <lpage>63</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.bi.64.070195.000333</pubid>
                  <pubid idtype="pmpid" link="fulltext">7574484</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Crystal structure of the bacterial nucleoside transporter Tsx</p>
            </title>
            <aug>
               <au>
                  <snm>Ye</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>van den Berg</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>EMBO J</source>
            <pubdate>2004</pubdate>
            <volume>23</volume>
            <issue>16</issue>
            <fpage>3187</fpage>
            <lpage>3195</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">514505</pubid>
                  <pubid idtype="pmpid" link="fulltext">15272310</pubid>
                  <pubid idtype="doi">10.1038/sj.emboj.7600330</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Crystal structure of the long-chain fatty acid transporter FadL</p>
            </title>
            <aug>
               <au>
                  <snm>van den Berg</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Black</snm>
                  <fnm>PN</fnm>
               </au>
               <au>
                  <snm>Clemons</snm>
                  <fnm>WM Jr</fnm>
               </au>
               <au>
                  <snm>Rapoport</snm>
                  <fnm>TA</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2004</pubdate>
            <volume>304</volume>
            <issue>5676</issue>
            <fpage>1506</fpage>
            <lpage>1509</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1097524</pubid>
                  <pubid idtype="pmpid" link="fulltext">15178802</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Substrate-induced transmembrane signaling in the cobalamin transporter BtuB</p>
            </title>
            <aug>
               <au>
                  <snm>Chimento</snm>
                  <fnm>DP</fnm>
               </au>
               <au>
                  <snm>Mohanty</snm>
                  <fnm>AK</fnm>
               </au>
               <au>
                  <snm>Kadner</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Wiener</snm>
                  <fnm>MC</fnm>
               </au>
            </aug>
            <source>Nat Struct Biol</source>
            <pubdate>2003</pubdate>
            <volume>10</volume>
            <issue>5</issue>
            <fpage>394</fpage>
            <lpage>401</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nsb914</pubid>
                  <pubid idtype="pmpid" link="fulltext">12652322</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Structure of the translocator domain of a bacterial autotransporter</p>
            </title>
            <aug>
               <au>
                  <snm>Oomen</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>van Ulsen</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>van Gelder</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Feijen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Tommassen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Gros</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>EMBO J</source>
            <pubdate>2004</pubdate>
            <volume>23</volume>
            <issue>6</issue>
            <fpage>1257</fpage>
            <lpage>1266</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">381419</pubid>
                  <pubid idtype="pmpid" link="fulltext">15014442</pubid>
                  <pubid idtype="doi">10.1038/sj.emboj.7600148</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>Secretins of Pseudomonas aeruginosa: large holes in the outer membrane</p>
            </title>
            <aug>
               <au>
                  <snm>Bitter</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Arch Microbiol</source>
            <pubdate>2003</pubdate>
            <volume>179</volume>
            <issue>5</issue>
            <fpage>307</fpage>
            <lpage>314</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12664194</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>Bacterial outer membrane ushers contain distinct targeting and assembly domains for pilus biogenesis</p>
            </title>
            <aug>
               <au>
                  <snm>Thanassi</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Stathopoulos</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Dodson</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Geiger</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Hultgren</snm>
                  <fnm>SJ</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>2002</pubdate>
            <volume>184</volume>
            <issue>22</issue>
            <fpage>6260</fpage>
            <lpage>6269</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">151958</pubid>
                  <pubid idtype="pmpid" link="fulltext">12399496</pubid>
                  <pubid idtype="doi">10.1128/JB.184.22.6260-6269.2002</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>Architecture of the cell envelope of Chlamydia psittaci 6BC</p>
            </title>
            <aug>
               <au>
                  <snm>Everett</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Hatch</snm>
                  <fnm>TP</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>1995</pubdate>
            <volume>177</volume>
            <issue>4</issue>
            <fpage>877</fpage>
            <lpage>882</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">176678</pubid>
                  <pubid idtype="pmpid" link="fulltext">7532170</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>PSORTb v.2.0: expanded prediction of bacterial protein subcellular localization and insights gained from comparative proteome analysis</p>
            </title>
            <aug>
               <au>
                  <snm>Gardy</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Laird</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Rey</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Walsh</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Ester</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Brinkman</snm>
                  <fnm>FS</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>617</fpage>
            <lpage>623</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti057</pubid>
                  <pubid idtype="pmpid" link="fulltext">15501914</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B54">
            <title>
               <p>Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites</p>
            </title>
            <aug>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Engelbrecht</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Brunak</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>von Heijne</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Protein Eng</source>
            <pubdate>1997</pubdate>
            <volume>10</volume>
            <issue>1</issue>
            <fpage>1</fpage>
            <lpage>6</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/protein/10.1.1</pubid>
                  <pubid idtype="pmpid" link="fulltext">9051728</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>Prediction of signal peptides and signal anchors by a hidden Markov model</p>
            </title>
            <aug>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Krogh</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Proc Int Conf Intell Syst Mol Biol</source>
            <pubdate>1998</pubdate>
            <volume>6</volume>
            <fpage>122</fpage>
            <lpage>130</lpage>
            <xrefbib>
               <pubid idtype="pmpid">9783217</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B56">
            <title>
               <p>Fishing new proteins in the twilight zone of genomes: the test case of outer membrane proteins in Escherichia coli K12, Escherichia coli O157:H7, and other Gram-negative bacteria</p>
            </title>
            <aug>
               <au>
                  <snm>Casadio</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Fariselli</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Finocchiaro</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Martelli</snm>
                  <fnm>PL</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>2003</pubdate>
            <volume>12</volume>
            <issue>6</issue>
            <fpage>1158</fpage>
            <lpage>1168</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1110/ps.0223603</pubid>
                  <pubid idtype="pmpid" link="fulltext">12761386</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B57">
            <title>
               <p>A plastid organelle as a drug target in apicomplexan parasites</p>
            </title>
            <aug>
               <au>
                  <snm>Fichera</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Roos</snm>
                  <fnm>DS</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1997</pubdate>
            <volume>390</volume>
            <issue>6658</issue>
            <fpage>407</fpage>
            <lpage>409</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/37132</pubid>
                  <pubid idtype="pmpid" link="fulltext">9389481</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B58">
            <title>
               <p>The ultrastructural architecture of the adult Schistosoma japonicum tegument</p>
            </title>
            <aug>
               <au>
                  <snm>Gobert</snm>
                  <fnm>GN</fnm>
               </au>
               <au>
                  <snm>Stenzel</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>McManus</snm>
                  <fnm>DP</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>MK</fnm>
               </au>
            </aug>
            <source>Int J Parasitol</source>
            <pubdate>2003</pubdate>
            <volume>33</volume>
            <issue>14</issue>
            <fpage>1561</fpage>
            <lpage>1575</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0020-7519(03)00255-8</pubid>
                  <pubid idtype="pmpid" link="fulltext">14636672</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B59">
            <title>
               <p>Isolation of a gene family that encodes the porin-like proteins from the human parasitic nematode Trichuris trichiura</p>
            </title>
            <aug>
               <au>
                  <snm>Barker</snm>
                  <fnm>GC</fnm>
               </au>
               <au>
                  <snm>Bundy</snm>
                  <fnm>DA</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>1999</pubdate>
            <volume>229</volume>
            <issue>1-2</issue>
            <fpage>131</fpage>
            <lpage>136</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0378-1119(99)00039-6</pubid>
                  <pubid idtype="pmpid">10095112</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
