<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-5-22</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Methodology article</dochead>
      <bibl>
         <title>
            <p>SIGI: score-based identification of genomic islands</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Merkl</snm>
               <fnm>Rainer</fnm>
               <insr iid="I1"/>
               <email>rmerkl@gwdg.de</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Abteilung Molekulare Genetik und Pr&#228;parative Molekularbiologie, Institut f&#252;r Mikrobiologie und Genetik, Georg-August-Universit&#228;t G&#246;ttingen and G&#246;ttingen Genomics Laboratory, Grisebachstr. 8, 37077 G&#246;ttingen, Germany</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2004</pubdate>
         <volume>5</volume>
         <issue>1</issue>
         <fpage>22</fpage>
         <url>http://www.biomedcentral.com/1471-2105/5/22</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="doi">10.1186/1471-2105-5-22</pubid>
               <pubid idtype="pmpid">15113412</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>10</day>
               <month>12</month>
               <year>2003</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>03</day>
               <month>3</month>
               <year>2004</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>03</day>
               <month>3</month>
               <year>2004</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2004</year>
         <collab>Merkl; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.</collab>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Genomic islands can be observed in many microbial genomes. These stretches of DNA have a conspicuous composition with regard to sequence or encoded functions. Genomic islands are assumed to be frequently acquired <it>via </it>horizontal gene transfer. For the analysis of genome structure and the study of horizontal gene transfer, it is necessary to reliably identify and characterize these islands.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>A scoring scheme on codon frequencies</p>
               <p>
                  <it>Score_G1G2(cdn) = log(f_G2(cdn) / f_G1(cdn))</it>
               </p>
               <p>was utilized. To analyse genes of a species <it>G1 </it>and to test their relatedness to species <it>G2</it>, scores were determined by applying the formula to log-odds derived from mean codon frequencies of the two genomes. A non-redundant set of nearly 400 codon usage tables comprising microbial species was derived; its members were used alternatively at position <it>G2</it>. Genes having at least one score value above a species-specific and dynamically determined cut-off value were analysed further. By means of cluster analysis, genes were identified that comprise clusters of statistically significant size. These clusters were predicted as genomic islands. Finally and individually for each of these genes, the taxonomical relation among those species responsible for significant scores was interpreted. The validity of the approach and its limitations were made plausible by an extensive analysis of natural genes and synthetic ones aimed at modelling the process of gene amelioration.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>The method reliably identifies genomic islands and the likely origin of alien genes.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>A microbial genome is by no means a random agglomeration of genes. In addition to operons, which cluster functionally related genes, further signals indicating structure can be detected. For example, base composition can vary strand-specifically <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, or the GC-content of a sequence may be correlated with its distance from the origin of replication <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. Codon usage can be diversified depending on effects like translational efficiency <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. Such parameters, as well as the integration of bacteriophages or megaplasmids, are responsible for structures perceptible at the genome level.</p>
         <p>In addition, genomic islands may result from horizontal gene transfer (HGT), regarded as an additional evolutionary means of biochemical or environmental adaptation <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. Microbial genomes contain a varying portion of genes presumably acquired <it>via </it>HGT <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. It has been claimed that in some genomes this portion exceeds 20% of the genomic content. To study HGT, various methods based on the analysis of codon or amino acid sequences or the construction of phylogenetic trees have been developed; for a review see, e.g., <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. Each approach has its individual drawbacks, and it might be that each method identifies a separate class of genes acquired in a different period of genome evolution <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. Because of the mechanistic implications, the pieces of DNA captured <it>via </it>HGT frequently have a considerable length. Consequently, it has to be expected that a large fraction of alien genes occurs in clusters. This assumption is supported by biological evidence: genes responsible for pathogenicity are often agglomerated in islands; see <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> and references therein. Huge clusters of genes expanding evolutionary fitness can also be found in non-pathogenic species. An example is the 611 kb symbiotic island in the genome of <it>M. loti </it><abbrgrp><abbr bid="B9">9</abbr></abbrgrp>.</p>
         <p>An exhaustive analysis of genomic islands has several aspects: it consists of the identification of clusters and the interpretation of gene function. For putatively alien genes (pA, acquired <it>via </it>HGT), their likely origin has to be predicted. The most reliable methods (if applied correctly) for the latter task rely on the construction and evaluation of phylogenetic trees. However, each such analysis requires the inference of relations within a gene family. For several reasons, such as an insufficient number of appropriate clades, it is still difficult to extend these phylogenetic studies to each gene of a complete genome. Therefore, methods have been developed that aim at identifying pA genes without the need for computing phylogenetic trees <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp>. These intrinsic methods assess (if applied to sequences) the composition at the DNA or protein level and measure the deviation from the typical case. One disadvantage of these surrogate methods <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> is that the origin of the open reading frames cannot be predicted.</p>
         <p>In the following, I introduce a novel surrogate method that has the potential to predict the putative source of a DNA sequence. It relies on the generally accepted assumption that codon usage in phylogenetically related species is similar <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>. The algorithm is integrated into the software package SIGI and is based on scores assessing codon usage in pairwise comparisons and on the taxonomic evaluation of results. It will be shown that its sensitivity in identifying genomic islands is comparable to that of the most advanced methods, such as hidden Markov models (HMM). The combination of a sensitive detector with cluster analysis, as implemented here, results in the reliable identification of islands and reduces the number of false-positive predictions, which seem to be a problem in many studies of HGT published so far <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. The validity of the predictions is made plausible by an exhaustive statistical analysis based on natural and synthetic genes. These predictions are one function of SIGI. In addition, it identifies gene clusters originating from additional signals like codon usage bias aimed at the optimisation of translational efficiency.</p>
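         <p>The per-codon score stated in the abstract, Score_G1G2(cdn) = log(f_G2(cdn) / f_G1(cdn)), can be illustrated with a minimal Python sketch. The function name codon_log_odds and the frequency values are assumptions made for illustration only; they are not taken from SIGI. How the per-codon scores are combined into a score for a whole gene is described in Methods and is not reproduced here.</p>
         <p><![CDATA[
```python
import math


def codon_log_odds(freq_g1, freq_g2):
    """Return {codon: log(f_G2(codon) / f_G1(codon))}.

    freq_g1 and freq_g2 map codons to mean relative frequencies in the
    genomes G1 and G2. Codons that are missing or have a non-positive
    frequency in either table are skipped, because the logarithm is
    undefined for them (an assumption of this sketch).
    """
    scores = {}
    for codon, f1 in freq_g1.items():
        f2 = freq_g2.get(codon)
        if f2 is not None and f1 > 0 and f2 > 0:
            scores[codon] = math.log(f2 / f1)
    return scores


# Hypothetical mean codon frequencies (illustrative values only).
f_g1 = {"GCG": 0.030, "GCA": 0.020, "AAA": 0.035}
f_g2 = {"GCG": 0.015, "GCA": 0.040, "AAA": 0.035}

scores = codon_log_odds(f_g1, f_g2)
for codon in sorted(scores):
    print(codon, round(scores[codon], 3))
```
]]></p>
         <p>In this sketch a positive value means that the codon is more frequent in <it>G2 </it>than in <it>G1</it>, a negative value means the opposite, and zero means equal frequencies.</p>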
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <p>This section is organized as follows: First, the predictive power of the new approach named MPW (see Methods) is compared to that of previously introduced methods in order to validate its ability to find compositionally atypical (CA) genes and genomic islands. Then, the performance of the algorithm in identifying the putative source of genes is studied. Finally, predictions deduced for completely sequenced genomes are presented.</p>
         <sec>
            <st>
               <p>Performance of the scoring system in identifying CA genes</p>
            </st>
            <p>A large number of methods for the identification of CA and pA genes have been introduced so far; e.g. <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>, see also <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. In order to compare the predictive power of methods and to evaluate the new approach, I used as a test set the genes annotated on chromosome two of <it>V. cholerae</it>. This chromosome contains an integron island of size 125.3 kbp, which includes genes VCA0271 to VCA0491 <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. For the analysis, two groups were formed consisting of genes VCA0010 to VCA0230 (group <it>cl</it>) and genes VCA0271 to VCA0491 (group <it>gi</it>). For each gene, codon usage contrast, &#948;* difference, dicodon difference &#8211; as defined in <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> &#8211; and <it>h</it><sub><it>MPW</it></sub>(<it>gene</it>) as described in Methods were determined. These scores were accumulated parameter-wise in pairs of histograms <it>H</it><sub><it>cl </it></sub>and <it>H</it><sub><it>gi</it></sub>. The decision strength of each parameter was assessed by incrementing a running cut-off <it>c_o</it><sub><it>r </it></sub>and reading from <it>H</it><sub><it>cl </it></sub>and <it>H</it><sub><it>gi </it></sub>the fraction of genes accumulated below <it>c_o</it><sub><it>r</it></sub>. The resulting curves are plotted in figure <figr fid="F1">1</figr>. The experiment clearly shows that the new algorithm outperforms the methods that evaluate deviation from mean frequencies of codons, dicodons or dinucleotides. In addition, the plot demonstrates that codon usage contrast (in the following abbreviated as CU) is the second-best indicator on the test set.</p>
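            <p>The running cut-off procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names fraction_below and cutoff_curve, the parameter n_steps and the score values are hypothetical, and exact counting on sorted score lists is used in place of binned histograms.</p>
            <p><![CDATA[
```python
from bisect import bisect_left


def fraction_below(sorted_scores, cutoff):
    """Fraction of scores strictly below cutoff (list must be ascending)."""
    return bisect_left(sorted_scores, cutoff) / len(sorted_scores)


def cutoff_curve(scores_cl, scores_gi, n_steps=100):
    """Sweep a running cut-off over the joint score range.

    Returns one (fraction of cl below, fraction of gi below) pair per
    cut-off value, i.e. the coordinates of one curve for one indicator.
    """
    cl = sorted(scores_cl)
    gi = sorted(scores_gi)
    lo = min(cl[0], gi[0])
    hi = max(cl[-1], gi[-1])
    # Fall back to a unit step if all scores are identical.
    step = (hi - lo) / n_steps or 1.0
    curve = []
    # One extra step past the maximum so the curve ends at (1.0, 1.0).
    for i in range(n_steps + 2):
        cutoff = lo + i * step
        curve.append((fraction_below(cl, cutoff), fraction_below(gi, cutoff)))
    return curve


# Hypothetical scores of one indicator for the two gene groups.
scores_cl = [0.1, 0.2, 0.3, 0.4]
scores_gi = [0.3, 0.5, 0.7, 0.9]

curve = cutoff_curve(scores_cl, scores_gi)
print(curve[0], curve[-1])
```
]]></p>
            <p>An indicator separates the two groups well if, for some cut-off, nearly all genes of one group lie below it while nearly all genes of the other group lie above it, so that its curve bends far away from the diagonal.</p>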
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Selectivity of four methods for the identification of compositionally atypical genes.</p>
               </caption>
               <text>
                  <p>Selectivity of four methods for the identification of compositionally atypical genes. Two sets were analysed consisting of genes VCA0010 to VCA0230 (control group) and genes VCA0271 to VCA0491 (belonging to the integron island) from chromosome two of <it>V. cholerae</it>. For each gene, the indicators codon usage contrast (CU), &#948;* difference, dicodon usage (DC) and <it>h</it><sub><it>MPW</it></sub>(<it>gene</it>) (as introduced here) were determined as described, and the values were accumulated set-wise in histograms. Any position on a curve gives on the two axes the fraction of genes below the corresponding cut-off value.</p>
               </text>
               <graphic file="1471-2105-5-22-1"/>
            </fig>
            <p>Lawrence and Ochman <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> have developed a surrogate method that combines analysis of GC-content at the first and third codon positions, of synonymous codon usage, of positional homology and of BLAST hits (in the following abbreviated as <it>CA</it><sub><it>LO</it></sub>). The results achieved for the <it>E. coli </it>K-12 genome are available at <url>ftp://ftp.pitt.edu/dept/biology/lawrence</url>. In figure <figr fid="F2">2</figr>, <it>GCB </it>scores &#8211; signalling translational efficiency <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> &#8211; are plotted vs. CU-contrast and MPW score values for all genes of this genome. Of the 569 genes annotated as CA with the MPW approach, 402 were also classified as <it>CA</it><sub><it>LO</it></sub>. This number of coincidences is four times the number expected to occur merely by chance and is much higher than the overlap of any two other methods tested so far; see <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. Therefore, it can be deduced that a significant portion of the CA genes were acquired <it>via </it>HGT and that the periods of genomic evolution addressed by the <it>CA</it><sub><it>LO </it></sub>and the MPW approach overlap to a great extent. The plot also makes clear that putatively highly expressed (PHX, see Methods) genes have to be excluded when applying surrogate methods. At least some of the genes identified as compositionally atypical with the <it>CA</it><sub><it>LO </it></sub>method were PHX. Because of their highly specific codon usage, it is unlikely that these genes were acquired <it>via </it>HGT (see figure <figr fid="F2">2</figr>).</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Plot of <it>GCB</it>-scores versus CU-contrast values for all genes of <it>E. coli </it>K-12 and the classification of compositionally atypical genes.</p>
               </caption>
               <text>
                  <p>Plot of <it>GCB</it>-scores versus CU-contrast values for all genes of <it>E. coli </it>K-12 and the classification of compositionally atypical genes. For all genes of the genomic data set, the two parameters were determined, converted to z-values and plotted as small dots. A high <it>GCB</it>-score is an indicator for adaptation to translational efficiency. Genes annotated as putatively alien according to the classification <it>CA</it><sub><it>LO </it></sub>and/or by using the MPW approach were labelled. The set <it>CA</it><sub><it>LO </it></sub>AND MPW consists of those genes identified as compositionally atypical by both methods.</p>
               </text>
               <graphic file="1471-2105-5-22-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Fraction of CA genes and their location</p>
            </st>
            <p>It is known that the number of pA genes varies significantly among microbial genomes <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. Reasons for these differences include the nature and the efficiency of the transformational system and the type of ecological niche the species occupies. In table <tblr tid="T1">1</tblr>, the fraction of CA genes is given for a number of species and compared to values published for HGT. For most of the genomes listed in <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>, the fractions of genes identified as compositionally atypical are similarly high. For the genomes of <it>Synechocystis </it>(<it>CA</it><sub><it>LO </it></sub>= 16.6% / MPW = 5.6%) and <it>A. aeolicus </it>(9.6% / 3.3%), the MPW approach identified a lower fraction of CA genes, and for <it>A. pernix </it>(3.2% / 6.1%) a higher one. For the genomes of <it>M. leprae</it>, <it>T. thermophilus, A. fulgidus, C. acetobutylicum, P. horikoshii, Halobacterium, B. burgdorferi, A. aeolicus </it>and <it>Nostoc</it>, the fraction of CA genes was below 5% and for the genomes of <it>B. melitensis, C. crescentus, M. jannaschii, T. pallidum, C. jejuni, M. thermautotrophicus, M. kandleri, P. aerophilum, C. perfringens, T. elongatus, R. conorii </it>and <it>C. muridarum</it>, it was below 3%. This was also true for the genomes of <it>D. radiodurans </it>and <it>H. pylori</it>, which contrasts with previously published findings <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. In the genomes of <it>N. meningitidis, R. prowazekii, F. nucleatum, Buchnera, M. genitalium, M. pulmonis </it>and <it>U. urealyticum </it>the fraction of CA genes was below 1%. These values correspond well with findings concerning the mosaic structure of genomes: in <it>N. meningitidis </it>only 2.2% of the genome is meningococcus-specific <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>.
In the sequences of many microbial genomes, a noticeable skew in the usage of guanosine and cytosine residues is detectable and is frequently used to identify the origin of replication. An extensive survey of genomes based on such methods <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> identified six chromosomes not presenting a significant structure: those of <it>Nostoc</it>, <it>Synechocystis</it>, <it>Buchnera</it>, <it>R. conorii</it>, <it>B. burgdorferi </it>and <it>A. aeolicus</it>. Clusters of pA genes with deviating codon usage would presumably influence the local GC-content, which is obviously not the case in the considered genomes. The two chromosomes of <it>R. solanacearum </it>(>10% of CA genes each) both have a mosaic structure <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Fraction of compositionally atypical genes in microbial genomes. The numbers in the column <it>CA</it><sub><it>LO </it></sub>are as in <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>; column MPW gives the fraction of CA genes as determined by the MPW approach described here.</p>
               </caption>
               <tblbdy cols="3">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Species</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b><it>CA</it><sub><it>LO </it></sub>[%]</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>MPW [%]</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Escherichia coli O157:H7</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>15.7</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Salmonella enterica </it>subsp. <it>enterica</it></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>14.8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Vibrio cholerae </it>chr. II</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>14.2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Salmonella typhimurium </it>LT2</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>13.3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Streptococcus pneumoniae TIGR4</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>13.0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Mesorhizobium loti</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>12.9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Mycoplasma pneumoniae</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>11.6</p>
                     </c>
                     <c ca="center">
                        <p>12.9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Ralstonia solanacearum </it>megaplasmid</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>12.9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Shigella flexneri</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>12.6</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Escherichia coli </it>K-12</p>
                     </c>
                     <c ca="center">
                        <p>12.8</p>
                     </c>
                     <c ca="center">
                        <p>12.5</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Ralstonia solanacearum </it>chromosome</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>11.9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Streptococcus agalactiae</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>10.9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Streptococcus pyogenes</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>10.9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Methanosarcina acetivorans</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>10.8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Listeria innocua</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>10.6</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Corynebacterium glutamicum</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>10.2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Yersinia pestis</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>10.2</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Bacillus subtilis</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>7.5</p>
                     </c>
                     <c ca="center">
                        <p>10.1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Xanthomonas axonopodis</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>9.7</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Xanthomonas campestris</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>9.4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Listeria monocytogenes</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>9.0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Lactococcus lactis</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>8.9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Thermoanaerobacter tengcongensis</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>8.5</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Methanosarcina mazei</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>8.1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Thermoplasma volcanium</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>7.9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Staphylococcus aureus</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>7.7</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Sulfolobus tokodaii</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>7.7</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Bacillus halodurans</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>7.4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Vibrio cholerae </it>chr. I</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>7.4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Pyrococcus abyssi</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>6.8</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Oceanobacillus iheyensis</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>6.4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Sulfolobus solfataricus</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>6.3</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Aeropyrum pernix</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>3.2</p>
                     </c>
                     <c ca="center">
                        <p>6.1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Chlamydophila pneumoniae</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>6.1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Thermotoga maritima</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>6.4</p>
                     </c>
                     <c ca="center">
                        <p>6.0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Brucella melitensis</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>5.9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Chlamydophila pneumoniae </it>AR39</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>6.1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Haemophilus influenzae </it>Rd</p>
                     </c>
                     <c ca="center">
                        <p>4.5</p>
                     </c>
                     <c ca="center">
                        <p>5.9</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Synechocystis </it>sp. PCC 6803</p>
                     </c>
                     <c ca="center">
                        <p>16.6</p>
                     </c>
                     <c ca="center">
                        <p>5.6</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Sinorhizobium meliloti</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>5.4</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Thermoplasma acidophilum</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>5.1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Picrophilus torridus</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>5.0</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <p>The genome of <it>B. subtilis </it>has been assessed using a system of hidden Markov models in order to detect heterogeneities in DNA composition <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. Table <tblr tid="T2">2</tblr> is an extended version that sums up these findings together with the locations of CA clusters identified with the MPW approach. The table demonstrates that both algorithms identify regions of deviating DNA composition with similar efficiency.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Genomic islands in the genome of <it>Bacillus subtilis</it>.</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Function</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>HMM [kb]</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>MPW [kb]</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Repeats [kb]</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Putative Source</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PHX: ribosomal proteins</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>108&#8211;155</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>P1 prophage</p>
                     </c>
                     <c ca="center">
                        <p>202&#8211;220</p>
                     </c>
                     <c ca="center">
                        <p>202&#8211;223</p>
                     </c>
                     <c ca="center">
                        <p>202&#8211;213</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Bacilli</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Surfactin</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>402&#8211;410</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Bacilli</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>P2 prophage</p>
                     </c>
                     <c ca="center">
                        <p>529&#8211;570</p>
                     </c>
                     <c ca="center">
                        <p>529-</p>
                     </c>
                     <c ca="center">
                        <p>555&#8211;567</p>
                     </c>
                     <c ca="left">
                        <p>Bacteria</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>570&#8211;600</p>
                     </c>
                     <c ca="center">
                        <p>-587</p>
                     </c>
                     <c ca="center">
                        <p>--</p>
                     </c>
                     <c ca="left">
                        <p>Bacteria, <it>Bacilli</it></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>P3 prophage</p>
                     </c>
                     <c ca="center">
                        <p>651&#8211;664</p>
                     </c>
                     <c ca="center">
                        <p>653&#8211;664</p>
                     </c>
                     <c ca="center">
                        <p>--</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Bacilli</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Site-specific recombinase</p>
                     </c>
                     <c ca="center">
                        <p>738&#8211;747</p>
                     </c>
                     <c ca="center">
                        <p>737&#8211;746</p>
                     </c>
                     <c ca="center">
                        <p>--</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Bacillus</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>yesJ-yesZ</it>, ABC transporter</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>752&#8211;782</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Bacillales</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Multidrug-efflux transporter</p>
                     </c>
                     <c ca="center">
                        <p>818&#8211;822</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>--</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>--</p>
                     </c>
                     <c ca="center">
                        <p>1124&#8211;1130</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>--</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>P4 prophage</p>
                     </c>
                     <c ca="center">
                        <p>1262&#8211;1270</p>
                     </c>
                     <c ca="center">
                        <p>1275&#8211;1280</p>
                     </c>
                     <c ca="center">
                        <p>--</p>
                     </c>
                     <c ca="left">
                        <p>Bacteria</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PBSX prophage (1320&#8211;1348)</p>
                     </c>
                     <c ca="center">
                        <p>--</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>--</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>--</p>
                     </c>
                     <c ca="center">
                        <p>1397&#8211;1399</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>1385&#8211;1424</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>--</p>
                     </c>
                     <c ca="center">
                        <p>1442&#8211;1447</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>--</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>--</p>
                     </c>
                     <c ca="center">
                        <p>1478&#8211;1482</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>--</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>P5 prophage</p>
                     </c>
                     <c ca="center">
                        <p>1879&#8211;1891</p>
                     </c>
                     <c ca="center">
                        <p>1879&#8211;1901</p>
                     </c>
                     <c ca="center">
                        <p>--</p>
                     </c>
                     <c ca="left">
                        <p>Bacteria, <it>Bacilli</it></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>--</p>
                     </c>
                     <c ca="center">
                        <p>2038&#8211;2041</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>--</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>P6 prophage</p>
                     </c>
                     <c ca="center">
                        <p>2046&#8211;2073</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>2050&#8211;2060</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>SP&#946; prophage</p>
                     </c>
                     <c ca="center">
                        <p>2151&#8211;2286</p>
                     </c>
                     <c ca="center">
                        <p>2152&#8211;2286</p>
                     </c>
                     <c ca="center">
                        <p>--</p>
                     </c>
                     <c ca="left">
                        <p>Bacteria, <it>Bacillales, Chlamydophila, Streptococcus</it></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Skin prophage</p>
                     </c>
                     <c ca="center">
                        <p>2652&#8211;2701</p>
                     </c>
                     <c ca="center">
                        <p>2652-</p>
                     </c>
                     <c ca="center">
                        <p>2654&#8211;2701</p>
                     </c>
                     <c ca="left">
                        <p>Bacteria, <it>Bacilli, Streptococcus</it></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>P7 prophage</p>
                     </c>
                     <c ca="center">
                        <p>2707&#8211;2756</p>
                     </c>
                     <c ca="center">
                        <p>-2747</p>
                     </c>
                     <c ca="center">
                        <p>2725&#8211;2735</p>
                     </c>
                     <c ca="left">
                        <p>Bacteria, <it>Bacilli</it></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Competence</p>
                     </c>
                     <c ca="center">
                        <p>3253&#8211;3257</p>
                     </c>
                     <c ca="center">
                        <p>3252&#8211;3257</p>
                     </c>
                     <c ca="center">
                        <p>--</p>
                     </c>
                     <c ca="left">
                        <p><it>Enterobacteriaceae, Bacillus cereus </it>group</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Arsenic resistance regulator</p>
                     </c>
                     <c ca="center">
                        <p>3463&#8211;3467</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>3462&#8211;3469</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>PHX: <it>eno, pgm, tpi, pgk, gap</it></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>3475&#8211;3482</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>--</p>
                     </c>
                     <c ca="center">
                        <p>--</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>3608&#8211;3634</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Cell wall synthesis</p>
                     </c>
                     <c ca="center">
                        <p>3658&#8211;3685</p>
                     </c>
                     <c ca="center">
                        <p>3658&#8211;3684</p>
                     </c>
                     <c ca="center">
                        <p>3665&#8211;3672</p>
                     </c>
                     <c ca="left">
                        <p>Bacteria, <it>Bacilli</it></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Nitrate reductase</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>3819&#8211;3831</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Bacillales</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>yxiQ-yxxG, bglS, deaD</it>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>4009&#8211;4022</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>Bacteria</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>ABC transporter</p>
                     </c>
                     <c ca="center">
                        <p>4123&#8211;4134</p>
                     </c>
                     <c ca="center">
                        <p>4122&#8211;4139</p>
                     </c>
                     <c ca="center">
                        <p>--</p>
                     </c>
                     <c ca="left">
                        <p>Bacteria</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>ABC transporter</p>
                     </c>
                     <c ca="center">
                        <p>4171&#8211;4176</p>
                     </c>
                     <c ca="center">
                        <p>4168-</p>
                     </c>
                     <c ca="center">
                        <p>4170&#8211;4176</p>
                     </c>
                     <c ca="left">
                        <p>Bacteria, gamma subdivision</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Streptothricin, tetracycline, mercury regul.</p>
                     </c>
                     <c ca="center">
                        <p>4184&#8211;4190</p>
                     </c>
                     <c ca="center">
                        <p>-4193</p>
                     </c>
                     <c ca="center">
                        <p>4189&#8211;4190</p>
                     </c>
                     <c ca="left">
                        <p>Bacteria</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Numbers give positions on the chromosome in kb. The values in the columns HMM and Repeats are taken from <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. The column "Putative Source" lists predictions generated by SIGI.</p>
               </tblfn>
            </tbl>
            <p>In a critical survey, four surrogate methods were compared <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> by studying the intersections of those sets of genes identified as pA by the various methods. It turned out that only the <it>CA</it><sub><it>LO </it></sub>approach and a method based on hidden Markov models tagged the same genes as pA more frequently than expected by chance.</p>
            <p>In <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>, the genomes of two <it>Xanthomonas </it>pathogens were compared: <it>Xanthomonas axonopodis </it>pv. <it>citri </it>(<it>Xac</it>) and <it>Xanthomonas campestris </it>pv. <it>campestris </it>(<it>Xcc</it>). To identify unique genes, the authors BLASTed each gene of one genome against all genes of the second one and analysed the hits. For the following comparison, those genes were named <it>unique </it>that had no BLAST hit with an E-value &lt; 10<sup>-20 </sup>in the second genome. These data sets were downloaded from <url>http://cancer.lbi.ic.unicamp.br/xanthomonas</url>. As the MPW approach was designed to predict clusters only, <it>unique </it>genes lying isolated were removed. The resulting data sets were compared against the MPW prediction. For <it>Xac</it>, 425 genes were identified as <it>unique</it>; the MPW approach annotated 488 genes as CA. 248 genes were both <it>unique </it>and CA. In the <it>Xcc </it>data set consisting of 4240 genes, 340 had the attribute <it>unique </it>and 454 the attribute CA; 213 genes had both attributes. If these attributes were completely unrelated, one would expect for <it>Xcc </it>the following number <it>n </it>of genes with both attributes: <it>n </it>= (454/4240 &#215; 340/4240) &#215; 4240 = 36. In both genomes, the number of CA genes labelled as <it>unique </it>is more than five times the expected value.</p>
            <p>Altogether, these findings support the notion that the MPW approach identifies to a great extent the same class of genes as hidden Markov models or the <it>CA</it><sub><it>LO </it></sub>method. An example of a summary view of SIGI's output is given in figure <figr fid="F3">3</figr>.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Summary view of SIGI's annotation for the genome of <it>S. agalactiae</it>.</p>
               </caption>
               <text>
                  <p>Summary view of SIGI's annotation for the genome of <it>S. agalactiae</it>. Each symbol labels a single gene (product). Meaning of the characters: "R" tRNA gene, "x" or "X" two levels of bias in putatively highly expressed genes, "I" integrase, "T" transposase, "H" hypothetical protein identified as CA, "G" a gene annotated with a function and identified as CA, "." a gene classified as unsuspicious.</p>
               </text>
               <graphic file="1471-2105-5-22-3"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Identification of ameliorating genes and excluding false predictions</p>
            </st>
            <p>Beginning with the acquisition of an alien gene, its codon usage will be modulated depending on selective constraints and mutational pressure affecting the recipient's genome. This process was named amelioration <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. One might argue that the codon usage of an ameliorating gene differs significantly from both the donor's and the acceptor's codon frequencies and may thus cause false predictions. In order to test the robustness of the MPW approach with respect to the amelioration process, synthetic genes consisting of random codon sequences of different lengths between 100 and 500 codons were generated. Each test set consisted of 500 sequences. Codons were selected randomly according to the frequency values as deposited in the <it>CUTG_RF </it>database (see Methods). For each test set, two species <it>REC </it>(recipient) and <it>DON </it>(donor) and a value <it>FRAC </it>(0.0 &#8804; <it>FRAC </it>&#8804; 1.0) were chosen. Codons were drawn according to the frequency tables <it>CDN</it><sub><it>REC </it></sub>or <it>CDN</it><sub><it>DON</it></sub>. <it>FRAC </it>determined how often <it>CDN</it><sub><it>DON </it></sub>was selected as the source of the codon frequencies. For the analysis described below, for each combination of a donor and an acceptor, nine data sets were generated according to the <it>FRAC</it>-values 0.0, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 1.0. These test sets served as a crude model for the amelioration process of genes originating from the donor in the recipient's genome. Test sets were generated for four species. <it>M. loti </it>(GC-content 63%), <it>E. coli </it>K-12 (GC-content 52%), <it>P. horikoshii </it>(GC-content 42%) and <it>C. acetobutylicum </it>(GC-content 31%) were used as acceptors and different genomes were selected as donors. The donors were chosen individually, according to phylogenetic relation and their similarity in codon usage with respect to the acceptor. Table <tblr tid="T3">3</tblr> lists these combinations used to generate synthetic genes.</p>
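            <p>The sampling scheme described above might be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for the study: the function names are hypothetical, the two small frequency tables are toy values rather than entries of the <it>CUTG_RF </it>database, the choice between donor and recipient table is assumed to be made independently for every codon, and a fixed sequence length per test set is assumed because the text does not specify how lengths were distributed.</p>

```python
import random


def synthetic_gene(cdn_rec, cdn_don, frac, n_codons, rng):
    """Build one random codon sequence. Each codon is drawn from the donor
    table with probability frac and from the recipient table otherwise."""
    rec_codons, rec_weights = zip(*cdn_rec.items())
    don_codons, don_weights = zip(*cdn_don.items())
    gene = []
    for _ in range(n_codons):
        if frac > rng.random():  # rng.random() lies in [0, 1)
            gene.append(rng.choices(don_codons, weights=don_weights)[0])
        else:
            gene.append(rng.choices(rec_codons, weights=rec_weights)[0])
    return "".join(gene)


def make_test_set(cdn_rec, cdn_don, frac, n_codons, rng, n_seqs=500):
    """One test set of n_seqs sequences for a given FRAC value."""
    return [synthetic_gene(cdn_rec, cdn_don, frac, n_codons, rng)
            for _ in range(n_seqs)]


# Toy frequency tables (hypothetical values, NOT taken from CUTG_RF).
CDN_REC = {"GCC": 0.6, "GCG": 0.3, "CTG": 0.1}
CDN_DON = {"GCA": 0.5, "GCT": 0.3, "TTA": 0.2}

# The nine FRAC values listed in the text.
FRAC_VALUES = [0.0, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 1.0]

rng = random.Random(0)  # fixed seed for reproducibility
test_sets = {frac: make_test_set(CDN_REC, CDN_DON, frac, 100, rng)
             for frac in FRAC_VALUES}
```

            <p>With <it>FRAC </it>= 0.0 every codon follows the recipient's table and with <it>FRAC </it>= 1.0 every codon follows the donor's table; intermediate values yield mixtures that crudely mimic partially ameliorated genes.</p>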
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Combinations of acceptor and donor species for the generation of random sequences mimicking the amelioration process.</p>
               </caption>
               <tblbdy cols="10">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Acceptor</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>GC-Content [%]</b>
                        </p>
                     </c>
                     <c cspan="8" ca="center">
                        <p>
                           <b>Donor</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <b>I</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>II</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>III</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>IV</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>V</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>VI</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>VII</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>VIII</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="10">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Mesorhizobium loti</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>63</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Pseudomonas</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Ralstonia</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Halobacterium</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Chloroflexus</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Corynebacterium</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Thermotoga</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Staphylococcus</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Fusobacterium</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>denitrificans 2.1</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>solanacearum 4.5</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>salinarum 6.4</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>aurantiacus 7.6</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>glutamicum 9.1</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>maritima 11.0</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>suis 17.8</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>nucleatum 24.6</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Escherichia coli </it>K-12</p>
                     </c>
                     <c ca="center">
                        <p>52</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Synechococcus</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Bacillus</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Methanosarcina</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Thermotoga</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Sinorhizobium</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Ralstonia</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Thermus</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Streptomyces</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>5.0</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>circulans 4.4</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>acetivorans 6.5</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>maritima 9.0</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>meliloti 9.6</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>solanacearum 13.4</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>thermophilus 14.1</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>natalensis 15.5</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Pyrococcus horikoshii</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>42</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Butyrivibrio</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Sulfolobus</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Aquifex</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Pyrobaculum</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Bacillus</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Mesorhizobium</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Myxococcus</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Cellulomonas</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>fibrisolvens 6.2</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>islandicus 4.4</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>aeolicus 5.9</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>aerophilum 7.0</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>circulans 8.2</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>loti 13.4</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>xanthus 18.4</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>fimi 22.5</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Clostridium acetobutylicum</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>31</p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Borrelia</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Methanothermus</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Anaplasma</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Neisseria</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Bacillus</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Mesorhizobium</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Myxococcus</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>Cellulomonas</it>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>
                           <it>burgdorferi 2.7</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>fervidus 3.3</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>phagocytophilum 6.3</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>meningitidis 15.7</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>caldolyticus 17.1</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>loti 21.1</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>xanthus 26.3</it>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <it>fimi 30.3</it>
                        </p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The column Acceptor gives the name of the "accepting" species. Columns I to VIII list the names of the species selected as donors and the Manhattan distance to the acceptor's codon frequency table.</p>
               </tblfn>
            </tbl>
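            <p>The Manhattan distance between two codon frequency tables, as listed in Table 3, can be illustrated with a minimal sketch. The function names, the toy sequences and the normalisation to percent are assumptions made for illustration; the scaling used for the values in Table 3 is not stated here, so the numbers produced by this sketch are not directly comparable to the table.</p>

```python
from collections import Counter


def codon_frequencies(sequence, scale=100.0):
    """Relative codon frequencies of an in-frame coding sequence.

    ASSUMPTION: frequencies are expressed in percent (scale=100.0);
    the normalisation behind Table 3 is not given in the text.
    """
    sequence = sequence.upper()
    usable = len(sequence) - len(sequence) % 3  # drop a trailing partial codon
    codons = [sequence[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        raise ValueError("sequence contains no complete codon")
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: scale * n / total for codon, n in counts.items()}


def manhattan_distance(freq_a, freq_b):
    """Sum of absolute frequency differences over all codons in either table."""
    codons = set(freq_a) | set(freq_b)
    return sum(abs(freq_a.get(c, 0.0) - freq_b.get(c, 0.0)) for c in codons)


# Toy sequences (hypothetical, far too short for a real codon usage table).
acceptor = codon_frequencies("ATGGCGGCGCTGAAA")
donor = codon_frequencies("ATGGCAGCTTTAAAA")
print(manhattan_distance(acceptor, donor))
```

            <p>In practice the frequency tables would be computed from all genes of a genome rather than from a single short sequence.</p>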
            <p>It is known that the mean GC-content at the three codon positions is correlated with the mean GC-content of the genome <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. A significant deviation of the position-specific GC-content from expectation values derived from the mean GC-content of a gene was interpreted as a signal identifying ameliorating genes <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. In order to determine a precise measure for the calculation of position-specific GC-values, 89 microbial genomic data sets were analysed and used for linear regression. The following formulas were deduced:</p>
            <p>GC<sub><it>exp </it>1 </sub>= 0.761 GC<sub><it>mean </it></sub>+ 17.9</p>
            <p>GC<sub><it>exp </it>2 </sub>= 0.481 GC<sub><it>mean </it></sub>+ 15.6</p>
            <p>GC<sub><it>exp </it>3 </sub>= 1.732 GC<sub><it>mean </it></sub>- 33.0</p>
            <p><it>GC</it><sub><it>mean </it></sub>is the mean GC-content of a gene set under study; <it>GC</it><sub><it>exp </it>1 </sub>to <it>GC</it><sub><it>exp </it>3 </sub>are the expected GC-content values of codon positions 1 to 3. To evaluate the GC composition of a gene, the expectation values <it>GC</it><sub><it>exp i </it></sub>were derived from its mean GC-content as indicated above and compared to the position-specific values <it>GC</it><sub><it>i</it></sub>. An indicator already used to signal amelioration <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> is the Manhattan distance <it>GC_dist </it>assessing codon positions 1 and 3:</p>
            <p>
               <graphic file="1471-2105-5-22-i1.gif"/>
            </p>
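            <p>The regression formulas and the distance measure can be sketched as follows. The three coefficient pairs are taken from the formulas above. The form of GC_dist used here (sum of absolute deviations at codon positions 1 and 3, converted from percent to fractions so that it is comparable to cut-off values such as 0.05 to 0.25) is an assumption, because the original formula is only available as an image; the function names are hypothetical.</p>

```python
def expected_position_gc(gc_mean):
    """Expected GC-content (in %) at codon positions 1, 2 and 3.

    Coefficients follow the regression formulas given in the text;
    gc_mean is the mean GC-content in percent.
    """
    return (
        0.761 * gc_mean + 17.9,
        0.481 * gc_mean + 15.6,
        1.732 * gc_mean - 33.0,
    )


def position_gc(sequence):
    """Observed GC-content (in %) at codon positions 1 to 3 of an in-frame gene."""
    sequence = sequence.upper()
    usable = len(sequence) - len(sequence) % 3  # ignore a trailing partial codon
    if usable == 0:
        raise ValueError("sequence contains no complete codon")
    values = []
    for offset in range(3):
        bases = sequence[offset:usable:3]
        gc = sum(1 for base in bases if base in "GC")
        values.append(100.0 * gc / len(bases))
    return tuple(values)


def gc_dist(sequence, scale=0.01):
    """Manhattan distance between expected and observed GC at positions 1 and 3.

    ASSUMPTION: absolute deviations are summed and rescaled to fractions
    (scale=0.01); the published definition may differ in its scaling.
    """
    observed = position_gc(sequence)
    gc_mean = sum(observed) / 3.0
    expected = expected_position_gc(gc_mean)
    return scale * (abs(expected[0] - observed[0]) + abs(expected[2] - observed[2]))


# Toy gene (hypothetical) and a cut-off in the spirit of AMELI.
gene = "ATGGCGCTGAAAGGCTAA"
ameli = 0.25
print(gc_dist(gene), ameli > gc_dist(gene))
```

            <p>As a plausibility check, the three slopes sum to 2.974 and the intercepts to 0.5, so the mean of the three expected values stays close to the mean GC-content that was put in.</p>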
            <p>The underlying model implies that the mean <it>GC_dist </it>value is largest in the middle of the amelioration interval. The mean deviation should be minimal for newly acquired genes or those having nearly gained the donor's composition. Thus, one might expect that genes with high <it>GC_dist </it>values generate the largest number of false predictions. In Table <tblr tid="T4">4</tblr>, which summarizes the results obtained for predicting the putative source of synthetic sequences, only entries having a <it>GC_dist </it>value below the cut-off value <it>AMELI </it>were analysed <it>via </it>the MPW approach. <it>AMELI </it>was incremented from 0.05 to 0.25. A putative source was regarded as a wrong prediction if it was neither a taxon linking the donor with the root of the taxonomic tree nor a taxon linking the acceptor with the root. The presented predictions were derived from the taxonomical relation of the <it>k </it>= 3 highest scoring species (see Methods). Consequently, in the worst case, the term "cellular organism" or the name of a superkingdom was predicted as the putative source. Predictions based on <it>k </it>= 2 pairwise scores were in many cases wrong (data not shown). Interestingly, the number of false predictions is extremely low if the acceptor's GC-content is above 40% and the Manhattan distance to the acceptor's codon usage is below 15. The results obtained for <it>FRAC </it>= 0.95 clearly indicate that the algorithm is robust as long as the codon usage is species-specific: Among 24 cases, only the combination of the acceptor <it>M. loti </it>and the donor <it>F. nucleatum </it>generated more than 20% false positive predictions.</p>
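            <p>The criterion for a wrong prediction stated above can be written down as a short sketch. The function names and the abbreviated lineages are illustrative assumptions, not the implementation used for this study. The helper false_fraction further assumes that a pair of counts is read as correct/wrong predictions, which is an interpretation made here for illustration only.</p>

```python
def is_wrong_prediction(predicted_taxon, donor_lineage, acceptor_lineage):
    """Return True if the predicted taxon lies on neither lineage.

    Each lineage lists the taxa that link a species with the root of the
    taxonomic tree (species first, root last).
    """
    on_donor_path = predicted_taxon in donor_lineage
    on_acceptor_path = predicted_taxon in acceptor_lineage
    return not (on_donor_path or on_acceptor_path)


def false_fraction(correct, wrong):
    """Fraction of wrong predictions among all analysed sequences.

    ASSUMPTION: the two counts are the numbers of correct and wrong
    predictions for one acceptor/donor combination.
    """
    total = correct + wrong
    if total == 0:
        raise ValueError("no predictions to evaluate")
    return wrong / total


# Abbreviated, illustrative lineages (intermediate ranks omitted).
acceptor_lineage = ["Escherichia coli", "Gammaproteobacteria", "Proteobacteria",
                    "Bacteria", "cellular organisms"]
donor_lineage = ["Bacillus circulans", "Bacillus", "Firmicutes",
                 "Bacteria", "cellular organisms"]

print(is_wrong_prediction("Firmicutes", donor_lineage, acceptor_lineage))
print(is_wrong_prediction("Archaea", donor_lineage, acceptor_lineage))
print(false_fraction(302, 198))
```

            <p>Under this criterion a very general prediction such as "cellular organisms" is never counted as wrong, which matches the worst case described above.</p>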
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>SIGI's performance in predicting the donor genome for synthetic genes modelling the amelioration process.</p>
               </caption>
               <tblbdy cols="10">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Acceptor</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Cut-off <it>AMELI</it></b>
                        </p>
                     </c>
                     <c cspan="8" ca="center">
                        <p>
                           <b>
                              <it>Donor</it>
                           </b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>
                           <b>I</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>II</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>III</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>IV</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>V</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>VI</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>VII</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>VIII</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="10">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Mesorhizobium loti</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.05</p>
                     </c>
                     <c ca="left">
                        <p>0.50 7/2</p>
                     </c>
                     <c ca="left">
                        <p>0.50 69/1</p>
                     </c>
                     <c ca="left">
                        <p>0.50 34/11</p>
                     </c>
                     <c ca="left">
                        <p>0.50 26/1</p>
                     </c>
                     <c ca="left">
                        <p>0.50 115/0</p>
                     </c>
                     <c ca="left">
                        <p>0.10 1/2</p>
                     </c>
                     <c ca="left">
                        <p>0.50 54/89</p>
                     </c>
                     <c ca="left">
                        <p>0.50 51/52</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>0.10</p>
                     </c>
                     <c ca="left">
                        <p>0.75 286/16</p>
                     </c>
                     <c ca="left">
                        <p>0.75 329/3</p>
                     </c>
                     <c ca="left">
                        <p>0.50 82/17</p>
                     </c>
                     <c ca="left">
                        <p>0.50 298/7</p>
                     </c>
                     <c ca="left">
                        <p>0.50 360/7</p>
                     </c>
                     <c ca="left">
                        <p>0.50 399/6</p>
                     </c>
                     <c ca="left">
                        <p>0.50 154/251</p>
                     </c>
                     <c ca="left">
                        <p>0.50 193/265</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>0.25</p>
                     </c>
                     <c ca="left">
                        <p>0.75 360/23</p>
                     </c>
                     <c ca="left">
                        <p>0.75 359/3</p>
                     </c>
                     <c ca="left">
                        <p>0.50 90/20</p>
                     </c>
                     <c ca="left">
                        <p>0.95 485/8</p>
                     </c>
                     <c ca="left">
                        <p>0.25 58/8</p>
                     </c>
                     <c ca="left">
                        <p>0.50 482/7</p>
                     </c>
                     <c ca="left">
                        <p>0.50 161/259</p>
                     </c>
                     <c ca="left">
                        <p>0.50 207/277</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>0.95 476/13</p>
                     </c>
                     <c ca="left">
                        <p>0.95 486/2</p>
                     </c>
                     <c ca="left">
                        <p>0.95 489/7</p>
                     </c>
                     <c ca="left">
                        <p>0.95 485/8</p>
                     </c>
                     <c ca="left">
                        <p>0.95 497/1</p>
                     </c>
                     <c ca="left">
                        <p>0.95 499/0</p>
                     </c>
                     <c ca="left">
                        <p>0.95 499/0</p>
                     </c>
                     <c ca="left">
                        <p>0.95 302/198</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Escherichia coli </it>K-12</p>
                     </c>
                     <c ca="center">
                        <p>0.05</p>
                     </c>
                     <c ca="left">
                        <p>1.00 32/6</p>
                     </c>
                     <c ca="left">
                        <p>0.50 38/0</p>
                     </c>
                     <c ca="left">
                        <p>0.50 69/1</p>
                     </c>
                     <c ca="left">
                        <p>0.95 10/2</p>
                     </c>
                     <c ca="left">
                        <p>0.75 73/1</p>
                     </c>
                     <c ca="left">
                        <p>0.90 88/3</p>
                     </c>
                     <c ca="left">
                        <p>0.75 45/3</p>
                     </c>
                     <c ca="left">
                        <p>0.75 133/7</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>0.10</p>
                     </c>
                     <c ca="left">
                        <p>0.95 349/39</p>
                     </c>
                     <c ca="left">
                        <p>0.75 455/1</p>
                     </c>
                     <c ca="left">
                        <p>0.75 454/8</p>
                     </c>
                     <c ca="left">
                        <p>0.50 392/6</p>
                     </c>
                     <c ca="left">
                        <p>1.00 444/4</p>
                     </c>
                     <c ca="left">
                        <p>0.90 455/6</p>
                     </c>
                     <c ca="left">
                        <p>0.50 433/21</p>
                     </c>
                     <c ca="left">
                        <p>0.75 446/27</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>0.25</p>
                     </c>
                     <c ca="left">
                        <p>1.00 378/69</p>
                     </c>
                     <c ca="left">
                        <p>0.75 484/1</p>
                     </c>
                     <c ca="left">
                        <p>0.75 490/8</p>
                     </c>
                     <c ca="left">
                        <p>0.50 471/8</p>
                     </c>
                     <c ca="left">
                        <p>1.00 496/4</p>
                     </c>
                     <c ca="left">
                        <p>0.90 492/8</p>
                     </c>
                     <c ca="left">
                        <p>0.50 473/27</p>
                     </c>
                     <c ca="left">
                        <p>0.75 469/31</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>0.95 386/44</p>
                     </c>
                     <c ca="left">
                        <p>0.95 499/0</p>
                     </c>
                     <c ca="left">
                        <p>0.95 497/2</p>
                     </c>
                     <c ca="left">
                        <p>0.95 497/2</p>
                     </c>
                     <c ca="left">
                        <p>0.95 498/2</p>
                     </c>
                     <c ca="left">
                        <p>0.95 495/5</p>
                     </c>
                     <c ca="left">
                        <p>0.95 499/1</p>
                     </c>
                     <c ca="left">
                        <p>0.95 499/1</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Pyrococcus horikoshii</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.05</p>
                     </c>
                     <c ca="left">
                        <p>0.75 134/1</p>
                     </c>
                     <c ca="left">
                        <p>0.50 1/0</p>
                     </c>
                     <c ca="left">
                        <p>0.75 5/0</p>
                     </c>
                     <c ca="left">
                        <p>0.50 8/0</p>
                     </c>
                     <c ca="left">
                        <p>0.75 98/2</p>
                     </c>
                     <c ca="left">
                        <p>0.75 58/1</p>
                     </c>
                     <c ca="left">
                        <p>0.50 23/11</p>
                     </c>
                     <c ca="left">
                        <p>0.90 303/84</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>0.10</p>
                     </c>
                     <c ca="left">
                        <p>0.75 389/1</p>
                     </c>
                     <c ca="left">
                        <p>0.95 358/2</p>
                     </c>
                     <c ca="left">
                        <p>0.75 137/0</p>
                     </c>
                     <c ca="left">
                        <p>0.50 120/1</p>
                     </c>
                     <c ca="left">
                        <p>0.75 455/8</p>
                     </c>
                     <c ca="left">
                        <p>0.75 440/5</p>
                     </c>
                     <c ca="left">
                        <p>0.50 258/153</p>
                     </c>
                     <c ca="left">
                        <p>0.90 381/110</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>0.25</p>
                     </c>
                     <c ca="left">
                        <p>0.75 411/1</p>
                     </c>
                     <c ca="left">
                        <p>0.95 466/2</p>
                     </c>
                     <c ca="left">
                        <p>0.50 23/0</p>
                     </c>
                     <c ca="left">
                        <p>0.50 177/1</p>
                     </c>
                     <c ca="left">
                        <p>0.75 484/9</p>
                     </c>
                     <c ca="left">
                        <p>0.75 494/6</p>
                     </c>
                     <c ca="left">
                        <p>0.50 324/176</p>
                     </c>
                     <c ca="left">
                        <p>0.90 388/112</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>0.95 498/0</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>0.95 428/0</p>
                     </c>
                     <c ca="left">
                        <p>0.95 500/0</p>
                     </c>
                     <c ca="left">
                        <p>0.95 494/6</p>
                     </c>
                     <c ca="left">
                        <p>0.95 496/4</p>
                     </c>
                     <c ca="left">
                        <p>0.95 476/26</p>
                     </c>
                     <c ca="left">
                        <p>0.95 460/40</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <it>Clostridium acetobutylicum</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>0.05</p>
                     </c>
                     <c ca="left">
                        <p>1.00 20/7</p>
                     </c>
                     <c ca="left">
                        <p>0.50 5/1</p>
                     </c>
                     <c ca="left">
                        <p>0.75 134/42</p>
                     </c>
                     <c ca="left">
                        <p>0.50 46/37</p>
                     </c>
                     <c ca="left">
                        <p>0.75 157/6</p>
                     </c>
                     <c ca="left">
                        <p>0.50 50/32</p>
                     </c>
                     <c ca="left">
                        <p>0.25 60/20</p>
                     </c>
                     <c ca="left">
                        <p>0.75 121/242</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>0.10</p>
                     </c>
                     <c ca="left">
                        <p>1.00 60/39</p>
                     </c>
                     <c ca="left">
                        <p>0.00 4/2</p>
                     </c>
                     <c ca="left">
                        <p>0.75 335/137</p>
                     </c>
                     <c ca="left">
                        <p>0.50 309/148</p>
                     </c>
                     <c ca="left">
                        <p>0.75 452/19</p>
                     </c>
                     <c ca="left">
                        <p>0.50 299/167</p>
                     </c>
                     <c ca="left">
                        <p>0.25 312/151</p>
                     </c>
                     <c ca="left">
                        <p>0.75 181/306</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>0.25</p>
                     </c>
                     <c ca="left">
                        <p>1.00 79/44</p>
                     </c>
                     <c ca="left">
                        <p>0.00 7/2</p>
                     </c>
                     <c ca="left">
                        <p>0.75 354/146</p>
                     </c>
                     <c ca="left">
                        <p>0.50 338/162</p>
                     </c>
                     <c ca="left">
                        <p>0.50 472/23</p>
                     </c>
                     <c ca="left">
                        <p>0.25 378/122</p>
                     </c>
                     <c ca="left">
                        <p>0.25 339/161</p>
                     </c>
                     <c ca="left">
                        <p>0.75 188/312</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="left">
                        <p>0.95 66/35</p>
                     </c>
                     <c ca="left">
                        <p>0.95 155/0</p>
                     </c>
                     <c ca="left">
                        <p>0.95 405/95</p>
                     </c>
                     <c ca="left">
                        <p>0.95 495/5</p>
                     </c>
                     <c ca="left">
                        <p>0.95 494/6</p>
                     </c>
                     <c ca="left">
                        <p>0.95 496/4</p>
                     </c>
                     <c ca="left">
                        <p>0.95 470/30</p>
                     </c>
                     <c ca="left">
                        <p>0.95 405/95</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>For each pair of donor and acceptor, the worst case is given for three values of <it>AMELI</it>. Each entry lists the fraction <it>FRAC </it>and the number of correct/incorrect predictions generated for a dataset consisting of 500 sequences. The last line gives the number of correct/incorrect predictions for a <it>FRAC </it>value of 0.95.</p>
               </tblfn>
            </tbl>
            <p>If the random sequences used for the analysis are a proper model for the amelioration process, then the following problem arises: The result suggests that the amelioration of a sequence originating from a species with very dissimilar codon usage causes misleading signals. In addition, the results presented in table <tblr tid="T4">4</tblr> do not show a correlation of the relative number of false positive predictions with the <it>GC_dist </it>value. Therefore, at least for the data set used here, the <it>GC_dist </it>value was not a suitable indicator for identifying ameliorating sequences. If ameliorating sequences of that kind were frequent in genomes, then the interpretation of codon usage or signatures as in <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> would be questionable. However, there is one argument that might resolve the dilemma: It was made plausible that the range and the frequency of HGT are constrained by selective barriers <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, and one might expect that a codon usage too dissimilar to that of the acceptor prevents the expression level necessary to guarantee the survival of a gene in the acceptor's genome.</p>
            <p>A second test for the predictive power was based on the analysis of native genes selected in 20 species representing the bacterial and archaeal superkingdoms. From the genomes of <it>A. fulgidus, M. acetivorans, T. acidophilum, P. horikoshii, T. maritima, D. radiodurans, C. glutamicum, L. lactis, S. pneumoniae, B. subtilis, E. coli, Y. pestis, H. influenzae, N. meningitidis, H. pylori, A. tumefaciens, M. loti, R. conorii, C. pneumoniae </it>and <it>M. pulmonis</it>, genes annotated by SIGI as putative native ones were extracted. Each data set consisted of more than 200 genes. During the analysis and for all data sets, each of the mentioned species was regarded as the putative acceptor, resulting in 20 &#215; 20 individual tests. For all genes, the putative source was predicted individually as described (see Methods). In 14 of the 20 data sets, the highest score identified the source correctly for more than 90% of the genes on the level of the taxonomical family irrespective of the choice of the putative acceptor. Less specific were the results for the data sets extracted from the genomes of <it>L. lactis </it>(72% correct predictions on the family level), <it>S. pneumoniae </it>(81%), <it>C. glutamicum </it>(88%), <it>B. subtilis </it>(88%), <it>Y. pestis </it>(83%) and from genes of chromosome 1 of <it>D. radiodurans </it>(59% correct predictions on the phylum level). Inferring the putative source from three high scoring pairwise comparisons as described below reduced &#8211; as expected &#8211; the specificity of the taxonomical classification. However, the number of false predictions decreased drastically: For all cases besides <it>D. radiodurans</it>, less than 5% of the sources were misclassified on the level of the taxonomic class. In the worst case, i.e. for <it>D. radiodurans </it>genes, 32% of the predictions were wrong on the phylum level. These findings suggest that codon usage in the extracted gene set of <it>D. radiodurans </it>is unspecific and has to be studied in more detail. In no case were more than 1% false classifications generated on the level of the superclass. This result indicates that codon usage in Bacteria and Archaea is quite distinct.</p>
         </sec>
         <sec>
            <st>
               <p>Predicting the origin of CA genes</p>
            </st>
            <p>It is known that genomic islands are inhomogeneous in composition and have a mosaic structure, as they are the result of a multistep process <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. For a first analysis of the genomes however, I identified genomic islands annotated relatively homogeneously with respect to the putative donor.</p>
            <p>Each <it>Salmonella </it>genome has approximately 12 fimbria operons frequently involved in virulence <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. In <it>S. typhimurium</it>, the operons <it>fim, saf, std </it>and <it>sth </it>were identified as CA, <it>bcf</it>, <it>lpf, stc, stf, sti </it>and <it>stj </it>were silent. In <it>S. enteritidis, sef, stb, std, ste </it>and <it>tcf </it>were identified as CA, <it>bcf, saf, sta, stc, stg </it>and <it>sth </it>were inconspicuous. The integrated phage genomes Gifsy-1, Gifsy-2, Fels-1 and Fels-2 were CA. In many cases, for these islands the taxa "Bacteria" or plasmids are predicted as putative source, indicating an unspecific codon usage. The bias seen in the fimbria operons might be due to the strong selective pressure imposed by the host immune system. Based on the analysis of the <it>ycdB </it>gene, it was claimed that genomic islands of low GC-content were acquired from <it>Lactococcus lactis </it><abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. SIGI gave the following annotations for the most similar sequences of <it>ycdB</it>: AAC75047 of <it>E. coli </it>K-12 predicted as CA originating from <it>Bacilli</it>, second most similar codon usage (hit) <it>Lactococcus</it>, CAD06974 of <it>S. enteritidis </it>CA (<it>Bacilli</it>, first hit <it>Lactococcus</it>), AAL23317 of <it>S. typhimurium </it>was not annotated as CA, SF2054 of <it>S. flexneri </it>CA, (<it>Bacilli</it>, first hit <it>Lactococcus</it>).</p>
            <p>In <it>Salmonella enterica</it>, several CA clusters were identified. In the following list, for some clusters the positions, gene names, coded proteins and putative sources are given: 1004 kb &#8211; 1053 kb contains genes annotated as putative bacteriophage proteins, originating according to SIGI from plasmids or <it>Enterobacteriaceae</it>; 1625 kb &#8211; 1651 kb, <it>ssa </it>genes, coding for a type III secretion system, <it>Enterobacteriaceae</it>; 2118 kb &#8211; 2135 kb, <it>rfb</it>, putative transferases, inhomogeneous codon usage; 2863 kb &#8211; 2900 kb, <it>prg </it>and <it>sip</it>, pathogenicity island 1 effector proteins, <it>spa</it>, surface presentation of antigens, <it>inv</it>, secretory proteins, <it>Enterobacteriaceae</it>; 3830 kb &#8211; 3838 kb, <it>ccm </it>heme exporter protein, <it>Proteobacteria</it>; 3930 kb &#8211; 3941 kb, <it>waa</it>, involved in the lipopolysaccharide core biosynthesis, <it>Enterobacteriaceae</it>; 4403 kb &#8211; 4543 kb, <it>topB</it>, a topoisomerase, <it>pil</it>, <it>vex </it>polysaccharide export, <it>Enterobacteriaceae</it>, plasmids.</p>
            <p>The codon usage of most CA genes identified in <it>Bacillus subtilis </it>(see table <tblr tid="T2">2</tblr>) is unspecific. If SIGI predicts a specific taxon, it is a closely related clade. A similar result was observed for the genome of <it>Escherichia coli </it>O157:H7. 728 genes were identified as CA, the codon usage table causing the highest score was in 137 cases derived from a plasmid, 349 times it was from a species belonging to the gamma subdivision. All prophages and prophage-like elements known to be integrated into the genome <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> were found plus several additional CA clusters. All known pathogenicity islands of <it>V. cholerae </it>are CA, among these were the recently identified islands on chromosome one, named "seventh pandemic islands" <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. For a detailed analysis, see the material deposited on our webserver.</p>
            <p>In the genome of the &#945;-Proteobacterium <it>Caulobacter crescentus</it>, only 2.5% of the genes are CA. Several genes clustered in the area from 621 kb &#8211; 694 kb were predicted as originating from the <it>Rhizobiaceae </it>group: CC0575 coding for a putative beta-lactamase (<it>p1 </it>= 5 &#215; 10<sup>-4</sup>, <it>p2 </it>= 9 &#215; 10<sup>-5</sup>), CC0576, whose product is an asparaginase family protein (<it>p1 </it>= 0.01, <it>p2 </it>= 9 &#215; 10<sup>-5</sup>) or CC0618, <it>cysG </it>coding for a siroheme synthase (<it>p1 </it>= 0.01, <it>p2 </it>= 9 &#215; 10<sup>-5</sup>). The indices <it>p1 </it>and <it>p2 </it>are explained in Methods. A second cluster at 2.90 kb &#8211; 2.95 kb contains the genes for the conjugal transfer protein <it>trbI </it>and several transposases. The codon signature is inhomogeneous, dominated by species of the <it>Rhizobiaceae </it>group.</p>
            <p><it>Haemophilus influenzae </it>Rd is a small, Gram-negative bacterium; the only natural host is human. For 15 genes, the &#947;-Proteobacterium <it>Shewanella </it>was predicted as putative source. 12 of these hits were clustered in the region 1572 kb &#8211; 1590 kb which belongs to an island extending from 1555 kb &#8211; 1595 kb. It contains among genes for hypothetical proteins <it>fepC </it>and genes coding for Mu proteins like <it>muA</it>. Recently, a <it>Shewanella </it>species was identified as human pathogen <abbrgrp><abbr bid="B32">32</abbr></abbrgrp> making the prediction plausible. As the GC-content of <it>H. influenzae </it>is 38%, these predictions have to be interpreted with care (see above, results for ameliorating genes).</p>
            <p>In many cases, restriction-modification enzymes were identified as CA like in <it>Nostoc</it>. A genomic island (3278 kb &#8211; 3289 kb) containing a type 1 restriction modification enzyme follows a tRNA-Ala gene and a transposase. A second enzyme of that type is located in the genomic island (4186 kb &#8211; 4220 kb) following a tRNA-Gly. These genes are predicted as originating from <it>Bacilli </it>and <it>Chlamydophila</it>.</p>
            <p>In the genome of <it>Streptococcus pyogenes</it>, 131 genes are annotated as "phage associated". However, only 50 of these genes were identified as CA. A CA island spans from 884 kb to 895 kb containing a putative methyltransferase and the <it>srt </it>system involved in lantibiotic production; the putative source is diffuse.</p>
            <p>It is known that the genome of <it>Mesorhizobium loti </it>contains a huge symbiosis island (4645 kb &#8211; 5256 kb) of size 611 kb <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. SIGI predicts most of these genes as originating from the <it>Rhizobiaceae </it>group. The hypothetical protein MLR6371 has the codon signature of the beta subdivision. Examples for additional CA clusters are: 341 kb &#8211; 362 kb coding for unknown proteins around a bacteriophage integrase, genes are similar in codon usage to plasmids and <it>Rhizobiaceae</it>; 779 kb &#8211; 827 kb codes several transferases; 843 kb &#8211; 861 kb containing genes for an adenylate cyclase, putatively originating from the <it>Rhizobiaceae </it>group (<it>p1 </it>= 0.04, <it>p2 </it>= 3 &#215; 10<sup>-5</sup>) and <it>rsp</it>, the rhizobiocin secretion system; 2592 kb &#8211; 2610 kb containing a cyclase and a glycosyltransferase gene; 3219 kb &#8211; 3234 kb with genes for a DNA invertase <it>rlgA </it>and an excisionase; 3705 kb &#8211; 3755 kb containing genes for a glycosyltransferase, a DNA polymerase, chloramphenicol-acetyltransferase, heat shock proteins, codon usage most similar to <it>Rhizobiaceae </it>group; 5714 kb &#8211; 5742 kb containing genes for elements of an ABC-transporter, methyltransferases, hydrolases; 6580 kb &#8211; 6681 kb genes for hypothetical proteins, an ABC-transporter, a DNA modification-methylase, a histidine-kinase and a site-specific recombinase, the codon usage is most similar to that of the <it>Rhizobiaceae </it>group.</p>
            <p>For a complete listing of the results, see the material deposited on the web server <url>http://www.g2l.bio.uni-goettingen.de</url>. For each genome, results are available in a tabulated version and a format readable by the gene browser ARTEMIS <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>It was argued that codon usage and atypical GC-content are no reliable indicators for the study of horizontally transferred genes <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. An analysis of positional orthologous genes in <it>E. coli </it>and <it>S. typhimurium </it>came up with a similar result <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. Interestingly, the genes referred to as being classified as false positives in <it>E. coli </it>K-12 (<it>gloB, gadB, yheB</it>) with the <it>CA</it><sub><it>LO </it></sub>approach were not classified as CA by the MPW method. The number of false positive predictions can clearly be reduced by applying a clustering method as introduced here. The risk of missing a large fraction of pA genes should be minimal, as the pieces of transferred DNA usually have a considerable length <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, although there exist exceptions like in the genome of <it>Neisseria </it><abbrgrp><abbr bid="B22">22</abbr></abbrgrp>.</p>
         <p>The assumption that surrogate methods might overlook genes acquired by horizontal transfer might be valid for more ancient events; recently acquired genes seem to be detected to a great extent by surrogate methods <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. Lawrence and Ochman estimated the age of imported genes <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. The conclusion was that most are relatively recent, i.e. acquired within the last few million years; see e.g. <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. This suggests that older imports have been purged from the genomes, presumably because these genes did not improve fitness <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. If this argument is valid, there is no need to search for huge amounts of ancient pA genes.</p>
         <p>The highly consistent findings of the HMM and the MPW approach for the <it>B. subtilis </it>genome confirm specificity and sensitivity of the MPW method. However, there might be two problems: predicting a false donor, and amelioration. The most convincing proof for the correctness of SIGI's predictions are concordances with phylogenetic studies. One example of consistent results is the analysis of the <it>ycdB </it>gene presented above. However, in many cases, genes identified as pA with other methods were not part of a CA cluster. This was the case for <it>gltB </it>and <it>ino1 </it>of <it>Thermotoga maritima </it>identified as archaeal <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>, or the events of HGT described for <it>D. radiodurans </it><abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. The analysis of synthetic genes showed that the risk of predicting a false source is high if the codon usage of the donor is extremely different. There is however biological evidence that such HGT events are rare. Therefore, most of SIGI's predictions are reliable on a statistical level. The analysis of the domain structure of aminoacyl-tRNA synthetases revealed a complex history of HGT events <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. In no genome did the MPW method annotate an aminoacyl-tRNA synthetase as CA. This might indicate the limitations of the approach, which is restricted to signals on the codon level.</p>
         <p>The GC-content decreases near the replication terminus of several microbial species. The AT richness of the terminus region could be caused by the replication machinery or the DNA repair system <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. This deviation might be the source for classifying genes incorrectly as CA. In many cases, the GC-content of pathogenicity islands is however lower than the average content &#8211; see examples in <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B31">31</abbr></abbrgrp> &#8211; and it might be that genomic islands were imported preferentially opposite of the origin of replication. In addition, not all genomic islands are AT rich: the area between 1555 kb and 1595 kb in the genome of <it>H. influenzae </it>consists of 40 genes having a GC-content that is higher and 11 genes having a GC-content lower than the mean GC-content of the genome. If GC-content is determined gene-wise, then for 45% of the genomes analysed here, more than 75% of the genes have a lower than mean GC-content, which is in agreement with <abbrgrp><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr></abbrgrp>. However, 18% of the genomes harbour in GIs more than 50% of CA genes having a higher GC-content. An extreme case is the genome of <it>S. solfataricus</it>, where 90% of CA genes have a GC-content higher than the mean value of 35%.</p>
         <p>In principle, the MPW should also identify genomic islands whose GC-content is similar to the rest of the genome as long as the codon usage is different. Even a similarity of codon usage as detected in thermophilic bacteria of different clades <abbrgrp><abbr bid="B42">42</abbr></abbrgrp> will not cause false predictions: because of the interpretation of taxonomic relation between hits, the annotation will in these cases be less specific but not false.</p>
         <p>There are several options to improve SIGI: The integration of more codon usage tables and additional indicators like those introduced in <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> or <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> may further enhance its predictive power. Applying models for the amelioration process <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B34">34</abbr></abbrgrp> may make it possible to "reameliorate" genes and to determine the source of pA genes more specifically. Finally, a statistical model for the MPW approach has to be developed.</p>
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
         <p>SIGI is able to detect genomic islands with high sensitivity. These areas are also candidates for HGT events. Studying such events, SIGI complements methods based on phylogenetic approaches. The analysis of the taxonomical relation among putative donors makes clear that a simple comparison of codon usage may create misleading predictions.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>A scoring scheme to test relatedness of codon usage</p>
            </st>
            <p>The simplest statistical model describing genes as a sequence of codons assumes that codons occur independently from each other. For this model, the Neyman-Pearson lemma assures that a function of the type</p>
            <p>
               <graphic file="1471-2105-5-22-i2.gif"/>
            </p>
            <p>is optimal to decide whether <it>gene </it>= <it>start codon</it>, <it>cdn</it><sub>1</sub>, <it>cdn</it><sub>2</sub>......<it>cdn</it><sub><it>n </it></sub>is a member of the family <it>G1 </it>characterized by codon frequencies <it>f</it><sub><it>G</it>1 </sub>(<it>cdn</it><sub><it>j</it></sub>) or belongs to family <it>G2 </it>having codon frequencies <it>f</it><sub><it>G</it>2 </sub>(<it>cdn</it><sub><it>j</it></sub>). As a result of test theory, it is known that there exists no other function with a decision strength greater than expression (I). Applying the logarithm and normalizing for gene length gives:</p>
            <p>
               <graphic file="1471-2105-5-22-i3.gif"/>
            </p>
            <p>Now <it>h</it>(<it>gene</it>) is the sum of species-specific log-odds scores <it>PW</it><sub><it>G</it>1<it>G</it>2</sub>(<it>cdn</it>) divided by the number of codons constituting the gene. Scores of that type were utilized frequently and are supported by a sound theory <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>. Recently it was shown that a similar approach is appropriate to quantify codon usage bias associated with translational efficiency <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>.</p>
            <p>The score values <it>PW</it><sub><it>G</it>1<it>G</it>2</sub>(<it>cdn</it>), which were here always deduced from codon frequencies among synonymous codons, can be used to decide whether codon usage in <it>gene </it>resembles more the prevalences of species <it>G1 </it>or species <it>G2</it>. If <it>gene </it>is from genome <it>G1 </it>and if <it>h</it>(<it>gene</it>) is >> 0 then its codon usage is more similar to <it>G2</it>. Therefore, and if <it>G2 </it>is taxonomically distinct, the gene under study must be regarded as an alien gene and genome <it>G2 </it>might be its source. In the study presented here, a putative source was predicted for genes longer than 100 codons only. This lower limit for gene length was introduced in order to reduce statistical variation due to small sample size.</p>
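            <p>The scoring scheme can be illustrated with a minimal sketch. Because expression (II) is given only as a graphic, the per-codon score is assumed here to be the natural logarithm of the ratio of the <it>G2 </it>frequency to the <it>G1 </it>frequency, which is the form implied by the statement that a positive <it>h</it>(<it>gene</it>) indicates similarity to <it>G2</it>. The function names and the toy frequencies are hypothetical and serve only as an illustration.</p>

```python
import math

def log_odds_scores(freq_g1, freq_g2):
    """Per-codon scores PW_G1G2(cdn) = ln(f_G2(cdn) / f_G1(cdn)) (assumed form)."""
    return {cdn: math.log(freq_g2[cdn] / freq_g1[cdn]) for cdn in freq_g1}

def h(codons, scores):
    """Length-normalized score: sum of per-codon scores divided by codon count."""
    return sum(scores[cdn] for cdn in codons) / len(codons)

# Toy frequencies among the two synonymous lysine codons (hypothetical values).
f_g1 = {"AAA": 0.8, "AAG": 0.2}
f_g2 = {"AAA": 0.3, "AAG": 0.7}
pw = log_odds_scores(f_g1, f_g2)

gene_like_g2 = ["AAG", "AAG", "AAG", "AAA"]  # prefers the codon favoured in G2
gene_like_g1 = ["AAA", "AAA", "AAA", "AAG"]  # prefers the codon favoured in G1

print(h(gene_like_g2, pw))  # positive: codon usage closer to G2
print(h(gene_like_g1, pw))  # negative: codon usage closer to G1
```

            <p>In a real analysis the frequency tables would cover all codons, and genes of 100 codons or fewer would be skipped, in line with the length limit stated above.</p>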
            <p>As it was one aim of the study to predict the putative source of compositionally atypical genes, it was necessary to generate a sufficiently large number of score sets covering most of the possible origins of taxonomically related species. A prerequisite for the calculation of these scores <it>PW</it><sub><it>G</it>1<it>G</it>2 </sub>are codon usage tables. Their compilation was initiated with data sets derived from completely sequenced microbial genomes, which were publicly available. Frequency values <it>f</it><sub><it>G</it>2</sub>(<it>cdn</it><sub><it>j</it></sub>) were determined from those genes not annotated as hypothetical or with a putative function. It is known that codon frequencies in putatively highly expressed (PHX) genes deviate significantly from mean values <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B44">44</abbr></abbrgrp>. For each gene, z-scores were determined for CU contrast <abbrgrp><abbr bid="B18">18</abbr></abbrgrp> and <it>GCB</it>-values <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> (compare figure <figr fid="F2">2</figr>). A gene was regarded as PHX if the combination of the two scores exceeded a predefined cut-off value. This initial set was supplemented with entries from the CUTG database <abbrgrp><abbr bid="B45">45</abbr></abbrgrp> in the version as of Aug. 2002. From this collection, only those microbial entries were accepted that contained more than 6400 codons. If more than one frequency table existed for the same taxonomic species, the data set deduced from the largest number of codons was processed further. After data collection, similarity of codon usage among species was checked by calculating a pairwise Manhattan-like distance among codon usage tables. These distance values were used to select the final set <it>CUTG_RF </it>of codon frequencies. For all elements of <it>CUTG_RF</it>, it was confirmed that the species most similar in codon usage belongs to the same taxonomic class or superclass. 
This step and the other precautions mentioned above were introduced in order to guarantee taxonomic relatedness among the entries and to eliminate codon usage tables presumably derived from a non-representative sample of a genome. The collection was supplemented with codon usage tables of plasmids. Altogether <it>CUTG_RF </it>consisted of <it>n</it><sub><it>RF </it></sub>= 371 entries used for the calculation of scores <it>PW</it><sub><it>G</it>1<it>G</it>2</sub>(<it>cdn</it>) at position <it>G2</it>. The codon frequencies <it>f</it><sub><it>G</it>1 </sub>(<it>cdn</it><sub><it>j</it></sub>) of the genome <it>G1 </it>under study were determined from those genes not annotated with the terms "hypothetical" or "putative" and which were no PHX genes.</p>
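            <p>The consistency check on the codon usage tables might be sketched as follows. The exact form of the Manhattan-like distance is not specified in the text, so a plain sum of absolute frequency differences is assumed; the table names, the taxonomic labels and the helper functions are hypothetical, and only a single taxonomic rank is compared.</p>

```python
def manhattan(table_a, table_b):
    """Sum of absolute differences between two codon frequency tables."""
    return sum(abs(table_a[cdn] - table_b[cdn]) for cdn in table_a)

def nearest_neighbour(name, tables):
    """Name of the table with the smallest distance to tables[name]."""
    others = (other for other in tables if other != name)
    return min(others, key=lambda other: manhattan(tables[name], tables[other]))

# Toy tables restricted to two synonymous codons (hypothetical values).
tables = {
    "sp_A": {"AAA": 0.80, "AAG": 0.20},
    "sp_B": {"AAA": 0.75, "AAG": 0.25},
    "sp_C": {"AAA": 0.30, "AAG": 0.70},
    "sp_D": {"AAA": 0.35, "AAG": 0.65},
}
# Hypothetical taxonomic class of each species.
tax_class = {"sp_A": "X", "sp_B": "X", "sp_C": "Y", "sp_D": "Z"}

# Keep a table only if its most similar table comes from the same class.
kept = [name for name in tables
        if tax_class[nearest_neighbour(name, tables)] == tax_class[name]]
print(kept)
```

            <p>In this toy example the tables of sp_C and sp_D resemble each other although the species are assigned to different classes, so both would be excluded from the reference set.</p>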
            <p>For the analysis of each gene of a genome <it>G1</it>, its codon usage was evaluated in a multiple pairwise test (MPW) using <it>n</it><sub><it>RF </it></sub>individual scoring schemes <it>PW</it><sub><it>G</it>1<it>G</it>2</sub>(<it>cdn</it>). The species <it>G2</it><sub><it>max </it></sub>causing the highest score <it>h</it><sub><it>MPW</it></sub>(<it>gene</it>) was considered a putative source if <it>h</it><sub><it>MPW</it></sub>(<it>gene</it>) exceeded a cut-off value. In order to quantify the statistical relevance of the prediction, two parameters <it>p1 </it>and <it>p2 </it>were introduced. <it>p1 </it>gives the fraction of genes in <it>G1 </it>that achieved a score at least as high as <it>h</it><sub><it>MPW</it></sub>(<it>gene</it>), if evaluated with the scoring scheme <it>PW</it><sub><it>G</it>1<it>G</it>2<it>max</it></sub>(<it>cdn</it>). A <it>p1 </it>value of 0.01, for example, indicates that 1% of the genes in <it>G1 </it>have a score equal to or greater than <it>h</it><sub><it>MPW</it></sub>(<it>gene</it>) if compared to the codon usage of <it>G2</it><sub><it>max</it></sub>. The second parameter <it>p2 </it>was derived from a taxonomic rating of those <it>k </it>= 2 or 3 species <it>G2</it><sub>1</sub><it>-G2</it><sub><it>k </it></sub>triggering the <it>k </it>largest score values for <it>gene</it>. The basis for the analysis was a taxonomic tree generated by using material obtained from the ftp-server of the NCBI <url>ftp://ftp.ncbi.nih.gov/pub/taxonomy/</url>. The nodes that represent species belonging to the set <it>CUTG_RF </it>were labelled with an indicator. To calculate the parameter <it>p2</it>, the positions of the leaves <it>G2</it><sub>1</sub><it>... G2</it><sub><it>k </it></sub>were used to identify the nearest node (ancestor in the taxonomy tree) <it>t</it><sub>1 </sub>that subsumes the <it>k </it>leaves. 
If <it>t</it><sub>1 </sub>is the ancestor of those <it>n</it><sub><it>t </it></sub>species belonging to <it>CUTG_RF</it>, then the probability <it>p2 </it>of picking by chance <it>k </it>species belonging to the taxonomic group <it>t</it><sub>1 </sub>can be calculated as</p>
            <p>
               <graphic file="1471-2105-5-22-i4.gif"/>
            </p>
            <p>Formula (III) was adapted accordingly if <it>n</it><sub><it>t</it>1 </sub>was smaller than <it>k</it>. The identification of taxon <it>t</it><sub>1 </sub>and the <it>p2 </it>value make it possible to determine the specificity of codon usage. If the high-scoring species are taxonomically unrelated, <it>t</it><sub>1 </sub>will be unspecific and <it>p2 </it>relatively large. A specialized codon usage will result in small <it>p2 </it>values and a more specific taxon.</p>
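            <p>The two ingredients of this rating that are fully specified in the text, the <it>p1 </it>fraction and the search for the nearest common ancestor <it>t</it><sub>1</sub>, can be sketched in Python as follows. This is a minimal illustration and not the original implementation: the function names, the child-to-parent dictionary used to represent the taxonomy, and the toy taxonomy itself are assumptions made for this example. Formula (III) is not reproduced here.</p>

```python
from bisect import bisect_left


def p1_value(h_mpw, scores_g1):
    """Fraction of genes in G1 whose score under the PW_G1G2max scheme
    is at least as high as h_mpw."""
    ordered = sorted(scores_g1)
    # bisect_left gives the number of scores strictly below h_mpw
    n_at_least = len(ordered) - bisect_left(ordered, h_mpw)
    return n_at_least / len(ordered)


def lineage(taxon, parent):
    """Path from a taxon up to the root, following child-to-parent links."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def nearest_common_ancestor(leaves, parent):
    """Nearest node t1 of the tree that subsumes all given leaves."""
    shared = set(lineage(leaves[0], parent))
    for leaf in leaves[1:]:
        shared.intersection_update(lineage(leaf, parent))
    # walk upwards from the first leaf; the first shared node is the nearest one
    for node in lineage(leaves[0], parent):
        if node in shared:
            return node
    raise ValueError("the leaves do not belong to the same tree")


def count_labelled_descendants(t1, labelled, parent):
    """Number of labelled (CUTG_RF) species that have t1 in their lineage."""
    return sum(1 for species in labelled if t1 in lineage(species, parent))


# Toy taxonomy (hypothetical, strongly simplified) as child-to-parent links.
toy_parent = {
    "E. coli": "Enterobacteriaceae",
    "S. enterica": "Enterobacteriaceae",
    "Enterobacteriaceae": "Proteobacteria",
    "V. cholerae": "Proteobacteria",
    "Proteobacteria": "Bacteria",
    "B. subtilis": "Firmicutes",
    "Firmicutes": "Bacteria",
}
toy_labelled = ["E. coli", "S. enterica", "V. cholerae", "B. subtilis"]

t1 = nearest_common_ancestor(["E. coli", "S. enterica"], toy_parent)
n_t = count_labelled_descendants(t1, toy_labelled, toy_parent)
```

            <p>In the toy data, the two top-scoring species share a narrow taxon, so the nearest common ancestor subsumes only a few labelled species. Replacing one of the leaves by an unrelated species moves the ancestor towards the root and increases the number of subsumed species, which corresponds to the unspecific case described above.</p>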
         </sec>
         <sec>
            <st>
               <p>Identification of CA clusters</p>
            </st>
            <p>The concepts introduced so far made it possible to characterize individual genes and to quantify the related scores statistically. Next, it was necessary to assess the set of all <it>h</it><sub><it>max</it></sub>(<it>gene</it>) values in order to derive a cut-off that discriminated those values <it>h</it>(<it>gene</it>) > 0 that deviated significantly from expected fluctuations. Because the focus was on identifying clusters of CA genes, a statistical approach could be used to eliminate false positive predictions. To identify genomic islands and to dynamically adapt the cut-off for each genome individually, a two-pass strategy was used. During the first pass, for each gene with number <it>i</it>, all <it>n</it><sub><it>RF </it></sub>scores were determined and <it>h</it><sub><it>MPW</it></sub>(<it>gene</it><sub><it>i</it></sub>) was identified. A text string <it>genome </it>was created according to the following instruction:</p>
            <p>
               <graphic file="1471-2105-5-22-i5.gif"/>
            </p>
            <p>For the string <it>genome</it>, the global frequency <it>f</it><sub><it>glob</it></sub><it>(S) </it>was determined, and clusters <it>SSSSS </it>indicating a run of at least five successive CA genes were located. These clusters were extended in both directions until the local frequency <it>f</it><sub><it>loc</it></sub><it>(S) </it>fell below the value 2 &#215; <it>f</it><sub><it>glob</it></sub><it>(S)</it>. The <it>h</it><sub><it>MPW</it></sub>(<it>gene</it>) values of the genes in the extended clusters and of the remaining genes were accumulated in two histograms, <it>h_cl </it>and <it>h_rem</it>. From the histogram <it>h_cl</it>, the cut-off <it>c_o</it><sub>2 </sub>for round two of the clustering process was derived as the <it>h</it><sub><it>MPW </it></sub>value exceeded by 95% of the values determined for genes in extended clusters. <it>c_o</it><sub>2 </sub>makes it possible to estimate the error of not classifying a CA gene correctly: applying <it>c_o</it><sub>2 </sub>to <it>h_rem </it>gives the number of genes that have a score above this cut-off but are not classified as CA genes. Using cut-off <it>c_o</it><sub>2</sub>, the clustering algorithm was rerun, and the genes classified as belonging to extended clusters in round two were annotated as being CA. The cut-off for round one was always set to 0.025, a value deduced from the analysis of chromosome two of <it>V. cholerae </it>(see Results).</p>
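            <p>The two-pass procedure can be sketched in Python as shown below. Several details are assumptions made only for this sketch, because the text does not specify them: non-CA genes are encoded with the symbol "-", a gene is marked <it>S </it>if its score is strictly above the cut-off, <it>f</it><sub><it>loc</it></sub><it>(S) </it>is taken as the fraction of <it>S </it>among the next five genes beyond the current cluster boundary (the hypothetical parameter window), and <it>c_o</it><sub>2 </sub>is picked with a simple nearest-rank rule. All function names and the toy scores are likewise hypothetical.</p>

```python
import re


def encode(scores, cutoff):
    """Encode each gene as 'S' (score above cut-off) or '-' (assumed symbol)."""
    return "".join("S" if h > cutoff else "-" for h in scores)


def s_fraction(text):
    """Frequency of 'S' in a stretch of the genome string."""
    return text.count("S") / len(text) if text else 0.0


def extended_clusters(genome, min_run=5, factor=2.0, window=5):
    """Locate runs of at least min_run 'S' and extend them in both directions
    while the local 'S' frequency stays at or above factor * f_glob.
    Returns half-open (start, end) index pairs."""
    threshold = factor * s_fraction(genome)
    spans = []
    for seed in re.finditer("S{%d,}" % min_run, genome):
        start, end = seed.span()
        while start > 0 and s_fraction(genome[max(0, start - window):start]) >= threshold:
            start -= 1
        while len(genome) > end and s_fraction(genome[end:end + window]) >= threshold:
            end += 1
        # merge with earlier spans that touch or overlap the new one
        while spans and spans[-1][1] >= start:
            prev_start, prev_end = spans.pop()
            start, end = min(prev_start, start), max(prev_end, end)
        spans.append((start, end))
    return spans


def two_pass(scores, first_cutoff=0.025):
    """Round one with the fixed cut-off, then round two with c_o2 derived from
    the scores of all genes inside the round-one clusters (histogram h_cl)."""
    round_one = extended_clusters(encode(scores, first_cutoff))
    h_cl = sorted(h for a, b in round_one for h in scores[a:b])
    if not h_cl:
        return None, []
    # nearest-rank approximation of the value exceeded by 95% of h_cl
    c_o2 = h_cl[int(0.05 * len(h_cl))]
    return c_o2, extended_clusters(encode(scores, c_o2))


def unclustered_above(scores, spans, cutoff):
    """Number of genes outside all clusters (h_rem) scoring above the cut-off."""
    inside = {i for a, b in spans for i in range(a, b)}
    return sum(1 for i, h in enumerate(scores) if i not in inside and h > cutoff)


# Toy genome of 100 genes (invented scores): one island and one isolated gene.
toy_scores = [0.0] * 100
toy_scores[10:15] = [0.30, 0.28, 0.35, 0.40, 0.33]
toy_scores[15:18] = [0.01, 0.20, 0.25]
toy_scores[50] = 0.10

toy_c_o2, toy_clusters = two_pass(toy_scores)
toy_missed = unclustered_above(toy_scores, toy_clusters, toy_c_o2)
```

            <p>In the toy data, the run of five high-scoring genes seeds one cluster that is extended across a single low-scoring gene, whereas the isolated high-scoring gene is not annotated as CA and is counted by unclustered_above. Other definitions of the local frequency are possible and would shift the cluster boundaries.</p>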
            <p>There were several reasons for designing the algorithm as described. The main argument for focusing on clusters was a combination of biological evidence and statistical principles that helps to increase the reliability of the prediction. First, it is known that genomic islands frequently have a size of 10 &#8211; 200 kb <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. Second, if the probability of annotating a gene as CA is <it>p(S)</it>, then the probability of a CA cluster of <it>n </it>successive genes is <it>p(S)</it><sup><it>n</it></sup>, if independence is assumed. Thus, for realistic values of <it>p(S) </it>and <it>n</it>, it is highly unlikely that such a cluster occurs merely by chance. Even for a large value such as <it>p(S) </it>= 0.3, the probability <it>p(S)</it><sup>5 </sup>for a cluster of size 5 (as assumed above) is &lt; 2.5 &#215; 10<sup>-3</sup>. A rough estimate (1 / <it>p(S)</it><sup>5</sup>) indicates that about one in 400 such clusters occurs merely by chance and is a false positive classification. This makes it possible to achieve high sensitivity in identifying individual CA genes and to deliberately adjust the cut-off level as described above. As mentioned, the calculation is based on the assumption that the classification of adjacent genes is independent of the context. Assuming independence is a simplistic model, but it is a reasonable rough approximation when compared with findings in <it>E. coli</it>: 80% of transcription units (which subsume operons) have fewer than five genes <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>.</p>
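            <p>The arithmetic behind this estimate can be checked with a few lines of Python. The function name is chosen for illustration only, and the calculation relies on the same independence assumption as the text.</p>

```python
def chance_cluster_probability(p_s, n):
    """Probability that n successive genes are all annotated as CA by chance,
    assuming that adjacent genes are classified independently."""
    return p_s ** n


# Values used in the text: p(S) = 0.3 and a cluster of five genes.
p_cluster = chance_cluster_probability(0.3, 5)  # 0.3 ** 5, i.e. about 0.00243
one_in = 1.0 / p_cluster                        # about 411, rounded to 400 in the text
```

            <p>Smaller and more realistic values of <it>p(S) </it>reduce the chance probability steeply; for <it>p(S) </it>= 0.1 the same function gives 10<sup>-5</sup>.</p>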
            <p>The factor 2.0 used in the expression 2 &#215; <it>f</it><sub><it>glob</it></sub><it>(S) </it>for the propagation of extended clusters was inferred from the analysis of the integron island on chromosome two of <it>V. cholerae </it>(see Results). The exact value of this parameter did not critically influence the identification and localization of the integron island (annotation as in <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, data not shown). In general, the algorithm used for the extension of clusters resembles principles implemented in BLAST for the identification of optimal high scoring segment pairs <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Data sets (accession numbers in brackets)</p>
            </st>
            <sec>
               <st>
                  <p>Archaea</p>
               </st>
               <p><it>Archaeoglobus fulgidus </it>(NC_000917), <it>Aeropyrum pernix </it>(NC_000854), <it>Halobacterium </it>sp. NRC-1 (NC_002607), <it>Methanothermobacter thermautotrophicus </it>(NC_000916), <it>Methanocaldococcus jannaschii </it>(NC_000909), <it>Methanosarcina acetivorans </it>(NC_003552), <it>Methanosarcina mazei </it>(AE008384), <it>Pyrococcus abyssi </it>(NC_000868), <it>Pyrococcus horikoshii </it>(NC_000961), <it>Sulfolobus solfataricus </it>(NC_002754), <it>Sulfolobus tokodaii </it>(NC_003106), <it>Thermoplasma acidophilum </it>(NC_002578), <it>Thermoplasma volcanium </it>(NC_002689).</p>
            </sec>
            <sec>
               <st>
                  <p>Bacteria</p>
               </st>
               <p><it>Agrobacterium tumefaciens </it>(AE007869, AE007870), <it>Aquifex aeolicus </it>(NC_000918), <it>Bacillus halodurans </it>(NC_002570), <it>Bacillus subtilis </it>(NC_000964), <it>Brucella melitensis </it>(NC_003317, NC_003318), <it>Borrelia burgdorferi </it>(NC_001318), <it>Buchnera </it>sp. APS (NC_002528), <it>Campylobacter jejuni </it>(NC_002163), <it>Caulobacter crescentus </it>(NC_002696), <it>Chlamydia muridarum </it>(NC_002620), <it>Chlamydia trachomatis </it>(NC_000117), <it>Chlamydophila pneumoniae </it>J138 (NC_002491), <it>Chlamydophila pneumoniae </it>AR39 (NC_002179), <it>Clostridium acetobutylicum </it>(NC_003030), <it>Clostridium perfringens </it>(NC_003366), <it>Corynebacterium glutamicum </it>(NC_003450), <it>Deinococcus radiodurans </it>(Chromosome 1, NC_001263), <it>Escherichia coli </it>K-12 (NC_000913), <it>Escherichia coli </it>O157:H7 EDL933 (NC_002655), <it>Fusobacterium nucleatum </it>(NC_003454), <it>Haemophilus influenzae </it>Rd (NC_000907), <it>Helicobacter pylori </it>26695 (NC_000915), <it>Helicobacter pylori </it>J99 (NC_000921), <it>Lactococcus lactis </it>subsp. lactis (NC_002662), <it>Listeria innocua </it>(NC_003212), <it>Listeria monocytogenes </it>(NC_003210), <it>Mesorhizobium loti </it>(NC_002678), <it>Methanopyrus kandleri </it>(NC_003551), <it>Mycobacterium tuberculosis </it>CDC1551 (NC_002755), <it>Mycobacterium tuberculosis </it>H37Rv (NC_000962), <it>Mycoplasma genitalium </it>(NC_000908), <it>Mycobacterium leprae </it>strain TN (NC_002677), <it>Mycoplasma pneumoniae </it>(NC_000912), <it>Mycoplasma pulmonis </it>(NC_002771), <it>Neisseria meningitidis </it>Z2491 (NC_003116), <it>Nostoc </it>sp.
PCC 7120 (NC_003272), <it>Oceanobacillus iheyensis </it>(NC_004193), <it>Pasteurella multocida </it>(NC_002663), <it>Pseudomonas aeruginosa </it>(NC_002516), <it>Pyrobaculum aerophilum </it>(NC_003364), <it>Ralstonia solanacearum </it>(NC_003295, NC_003296), <it>Rickettsia conorii </it>(NC_003103), <it>Rickettsia prowazekii </it>(NC_000963), <it>Salmonella enterica </it>(NC_003198), <it>Salmonella typhimurium </it>(NC_003197), <it>Shigella flexneri </it>(NC_004337), <it>Sinorhizobium meliloti </it>(NC_003047), <it>Staphylococcus aureus </it>subsp. aureus N315 (NC_002745), <it>Staphylococcus aureus </it>strain Mu50 (NC_002758), <it>Streptococcus agalactiae </it>(NC_004116), <it>Streptococcus pneumoniae </it>R6 (NC_003098), <it>Streptococcus pneumoniae </it>TIGR4 (NC_003038), <it>Streptococcus pyogenes </it>(NC_002737), <it>Synechocystis </it>sp. PCC6803 (NC_000911), <it>Thermoanaerobacter tengcongensis </it>(NC_003896), <it>Thermosynechococcus elongatus </it>(NC_004113), <it>Thermotoga maritima </it>(NC_000853), <it>Treponema pallidum </it>(NC_000919), <it>Ureaplasma urealyticum </it>(NC_002162), <it>Vibrio cholerae </it>(NC_002505, NC_002506), <it>Xanthomonas axonopodis </it>(NC_003919), <it>Xanthomonas campestris </it>(NC_003902), <it>Xylella fastidiosa </it>(NC_002488), <it>Yersinia pestis </it>(NC_003143).</p>
               <p>The data sets for <it>Thermus thermophilus </it>and <it>Picrophilus torridus </it>were preliminary data prepared at the G&#246;ttingen Genomics Laboratory.</p>
            </sec>
         </sec>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The project was carried out within the framework of the Competence Network G&#246;ttingen "Genome research on bacteria" (GenoMik) financed by the German Federal Ministry of Education and Research (BMBF). I thank S. Waack and M. Stanke for discussions concerning statistics and test theory and A. Wiezer and J. Sobkowiak for supplying me with a perfect computational infrastructure.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Asymmetric substitution patterns in the two DNA strands of bacteria</p>
            </title>
            <aug>
               <au>
                  <snm>Lobry</snm>
                  <fnm>JR</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1996</pubdate>
            <volume>13</volume>
            <fpage>660</fpage>
            <lpage>665</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8676740</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>G+C3 structuring along the genome: a common feature in Prokaryotes</p>
            </title>
            <aug>
               <au>
                  <snm>Daubin</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Perri&#232;re</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2003</pubdate>
            <volume>20</volume>
            <fpage>471</fpage>
            <lpage>483</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msg022</pubid>
                  <pubid idtype="pmpid" link="fulltext">12654929</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>The codon adaptation index - a measure of directional synonymous codon usage bias, and its potential applications</p>
            </title>
            <aug>
               <au>
                  <snm>Sharp</snm>
                  <fnm>PM</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>WH</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1987</pubdate>
            <volume>15</volume>
            <fpage>1281</fpage>
            <lpage>1295</lpage>
            <xrefbib>
               <pubid idtype="pmpid">3547335</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Phylogenetic classification and the universal tree</p>
            </title>
            <aug>
               <au>
                  <snm>Doolittle</snm>
                  <fnm>WF</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1999</pubdate>
            <volume>284</volume>
            <fpage>2124</fpage>
            <lpage>2129</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.284.5423.2124</pubid>
                  <pubid idtype="pmpid" link="fulltext">10381871</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Molecular archaeology of the Escherichia coli genome</p>
            </title>
            <aug>
               <au>
                  <snm>Lawrence</snm>
                  <fnm>JG</fnm>
               </au>
               <au>
                  <snm>Ochman</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>1998</pubdate>
            <volume>95</volume>
            <fpage>9413</fpage>
            <lpage>9417</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.95.16.9413</pubid>
                  <pubid idtype="pmpid" link="fulltext">9689094</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>On surrogate methods for detecting lateral gene transfer</p>
            </title>
            <aug>
               <au>
                  <snm>Ragan</snm>
                  <fnm>MA</fnm>
               </au>
            </aug>
            <source>FEMS Microbiol Lett</source>
            <pubdate>2001</pubdate>
            <volume>201</volume>
            <fpage>187</fpage>
            <lpage>191</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0378-1097(01)00262-2</pubid>
                  <pubid idtype="pmpid" link="fulltext">11470360</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Detection of lateral gene transfer among microbial genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Ragan</snm>
                  <fnm>MA</fnm>
               </au>
            </aug>
            <source>Curr Opin Genet Dev</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <fpage>620</fpage>
            <lpage>626</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0959-437X(00)00244-6</pubid>
                  <pubid idtype="pmpid" link="fulltext">11682304</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Pathogenicity islands and the evolution of microbes</p>
            </title>
            <aug>
               <au>
                  <snm>Hacker</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kaper</snm>
                  <fnm>JB</fnm>
               </au>
            </aug>
            <source>Annu Rev Microbiol</source>
            <pubdate>2000</pubdate>
            <volume>54</volume>
            <fpage>641</fpage>
            <lpage>679</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1146/annurev.micro.54.1.641</pubid>
                  <pubid idtype="pmpid" link="fulltext">11018140</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Complete genome structure of the nitrogen-fixing symbiotic bacterium Mesorhizobium loti</p>
            </title>
            <aug>
               <au>
                  <snm>Kaneko</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Nakamura</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Sato</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Asamizu</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Kato</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Sasamoto</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Watanabe</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Idesawa</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ishikawa</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kawashima</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Kimura</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Kishida</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Kiyokawa</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Kohara</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Matsumoto</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Matsuno</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Mochizuki</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Nakayama</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Nakazaki</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Shimpo</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Sugimoto</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Takeuchi</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Yamada</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Tabata</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>DNA Res</source>
            <pubdate>2000</pubdate>
            <volume>7</volume>
            <fpage>331</fpage>
            <lpage>338</lpage>
            <xrefbib>
               <pubid idtype="pmpid">11214968</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Capturing whole-genome characteristics in short sequences using a na&#239;ve Bayesian classifier</p>
            </title>
            <aug>
               <au>
                  <snm>Sandberg</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Winberg</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Br&#228;nden</snm>
                  <fnm>CI</fnm>
               </au>
               <au>
                  <snm>Kaske</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ernberg</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>C&#246;ster</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <fpage>1404</fpage>
            <lpage>1409</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.186401</pubid>
                  <pubid idtype="pmpid" link="fulltext">11483581</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Detection of genes with atypical nucleotide sequence in microbial genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Hooper</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Berg</snm>
                  <fnm>OG</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>2002</pubdate>
            <volume>54</volume>
            <fpage>365</fpage>
            <lpage>375</lpage>
            <xrefbib>
               <pubid idtype="pmpid">11847562</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Detecting alien genes in bacterial genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Mr&#225;zek</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Karlin</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Ann N Y Acad Sci</source>
            <pubdate>1999</pubdate>
            <volume>870</volume>
            <fpage>314</fpage>
            <lpage>329</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10415493</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Origin and evolution of organisms as deduced from 5S ribosomal RNA sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Hori</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Osawa</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1987</pubdate>
            <volume>4</volume>
            <fpage>445</fpage>
            <lpage>472</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">2452957</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Amelioration of bacterial genomes: rates of change and exchange</p>
            </title>
            <aug>
               <au>
                  <snm>Lawrence</snm>
                  <fnm>JG</fnm>
               </au>
               <au>
                  <snm>Ochman</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1997</pubdate>
            <volume>44</volume>
            <fpage>383</fpage>
            <lpage>397</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9089078</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Horizontal gene transfer: a critical view</p>
            </title>
            <aug>
               <au>
                  <snm>Kurland</snm>
                  <fnm>CG</fnm>
               </au>
               <au>
                  <snm>Canback</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Berg</snm>
                  <fnm>OG</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>2003</pubdate>
            <volume>100</volume>
            <fpage>9658</fpage>
            <lpage>9662</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.1632870100</pubid>
                  <pubid idtype="pmpid" link="fulltext">12902542</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Calibrating bacterial evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Ochman</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Elwyn</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Moran</snm>
                  <fnm>NA</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>1999</pubdate>
            <volume>96</volume>
            <fpage>12638</fpage>
            <lpage>12643</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.96.22.12638</pubid>
                  <pubid idtype="pmpid" link="fulltext">10535975</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Horizontal gene transfer in bacterial and archaeal complete genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Garcia-Vallv&#233;</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Romeu</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Palau</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2000</pubdate>
            <volume>10</volume>
            <fpage>1719</fpage>
            <lpage>1725</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.130000</pubid>
                  <pubid idtype="pmpid" link="fulltext">11076857</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Detecting anomalous gene clusters and pathogenicity islands in diverse bacterial genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Karlin</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Trends Microbiol</source>
            <pubdate>2001</pubdate>
            <volume>9</volume>
            <fpage>335</fpage>
            <lpage>343</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0966-842X(01)02079-0</pubid>
                  <pubid idtype="pmpid" link="fulltext">11435108</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae</p>
            </title>
            <aug>
               <au>
                  <snm>Heidelberg</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>WC</fnm>
               </au>
               <au>
                  <snm>Clayton</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Gwinn</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Dodson</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Haft</snm>
                  <fnm>DH</fnm>
               </au>
               <au>
                  <snm>Hickey</snm>
                  <fnm>EK</fnm>
               </au>
               <au>
                  <snm>Peterson</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Umayam</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Gill</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>KE</fnm>
               </au>
               <au>
                  <snm>Read</snm>
                  <fnm>TD</fnm>
               </au>
               <au>
                  <snm>Tettelin</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Richardson</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Ermolaeva</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Vamathevan</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Bass</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Qin</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Dragoi</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Sellers</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>McDonald</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Utterback</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Fleischmann</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Nierman</snm>
                  <fnm>WC</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2000</pubdate>
            <volume>406</volume>
            <fpage>477</fpage>
            <lpage>483</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35020000</pubid>
                  <pubid idtype="pmpid" link="fulltext">10952301</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>A survey of codon and amino acid frequency bias in microbial genomes focusing on translational efficiency</p>
            </title>
            <aug>
               <au>
                  <snm>Merkl</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>2003</pubdate>
            <volume>57</volume>
            <fpage>453</fpage>
            <lpage>466</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s00239-003-2499-1</pubid>
                  <pubid idtype="pmpid">14708578</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Lateral gene transfer and the nature of bacterial innovation</p>
            </title>
            <aug>
               <au>
                  <snm>Ochman</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Lawrence</snm>
                  <fnm>JG</fnm>
               </au>
               <au>
                  <snm>Groisman</snm>
                  <fnm>EA</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2000</pubdate>
            <volume>405</volume>
            <fpage>299</fpage>
            <lpage>304</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35012500</pubid>
                  <pubid idtype="pmpid" link="fulltext">10830951</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Comparative genomics identifies the genetic islands that distinguish Neisseria meningitidis, the agent of cerebrospinal meningitis, from other Neisseria species</p>
            </title>
            <aug>
               <au>
                  <snm>Perrin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Bonacorsi</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Carbonnelle</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Talibi</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Dessen</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Nassif</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Tinsley</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Infect Immun</source>
            <pubdate>2002</pubdate>
            <volume>70</volume>
            <fpage>7063</fpage>
            <lpage>7072</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1128/IAI.70.12.7063-7072.2002</pubid>
                  <pubid idtype="pmpid" link="fulltext">12438387</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Genome sequence of the plant pathogen Ralstonia solanacearum</p>
            </title>
            <aug>
               <au>
                  <snm>Salanoubat</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Genin</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Artiguenave</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Gouzy</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Mangenot</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Arlat</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Billault</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Brottier</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Camus</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Cattolico</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Chandler</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Choisne</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Claudel-Renard</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Cunnac</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Demange</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Gaspin</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Lavie</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Moisan</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Robert</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Saurin</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Schiex</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Siguier</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Th&#233;bault</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Whalen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Wincker</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Levy</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Weissenbach</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Boucher</snm>
                  <fnm>CA</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>415</volume>
            <fpage>497</fpage>
            <lpage>502</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/415497a</pubid>
                  <pubid idtype="pmpid" link="fulltext">11823852</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Mining Bacillus subtilis chromosome heterogeneities using hidden Markov models</p>
            </title>
            <aug>
               <au>
                  <snm>Nicolas</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bize</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Muri</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Hoebeke</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rodolphe</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Ehrlich</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Prum</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Bessi&#232;res</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>1418</fpage>
            <lpage>1426</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/30.6.1418</pubid>
                  <pubid idtype="pmpid" link="fulltext">11884641</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Comparison of the genomes of two Xanthomonas pathogens with differing host specificities</p>
            </title>
            <aug>
               <au>
                  <snm>da Silva</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Ferro</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Reinach</snm>
                  <fnm>FC</fnm>
               </au>
               <au>
                  <snm>Farah</snm>
                  <fnm>CS</fnm>
               </au>
               <au>
                  <snm>Furlan</snm>
                  <fnm>LR</fnm>
               </au>
               <au>
                  <snm>Quaggio</snm>
                  <fnm>RB</fnm>
               </au>
               <au>
                  <snm>Monteiro-Vitorello</snm>
                  <fnm>CB</fnm>
               </au>
               <au>
                  <snm>Van Sluys</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Almeida</snm>
                  <fnm>NF</fnm>
               </au>
               <au>
                  <snm>Alves</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>do Amaral</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Bertolini</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Camargo</snm>
                  <fnm>LE</fnm>
               </au>
               <au>
                  <snm>Camarotte</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Cannavan</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Cardozo</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Chambergo</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Ciapina</snm>
                  <fnm>LP</fnm>
               </au>
               <au>
                  <snm>Cicarelli</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Coutinho</snm>
                  <fnm>LL</fnm>
               </au>
               <au>
                  <snm>Cursino-Santos</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>El-Dorry</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Faria</snm>
                  <fnm>JB</fnm>
               </au>
               <au>
                  <snm>Ferreira</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Ferreira</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Ferro</snm>
                  <fnm>MI</fnm>
               </au>
               <au>
                  <snm>Formighieri</snm>
                  <fnm>EF</fnm>
               </au>
               <au>
                  <snm>Franco</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Greggio</snm>
                  <fnm>CC</fnm>
               </au>
               <au>
                  <snm>Gruber</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Katsuyama</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Kishi</snm>
                  <fnm>LT</fnm>
               </au>
               <au>
                  <snm>Leite</snm>
                  <fnm>RP</fnm>
               </au>
               <au>
                  <snm>Lemos</snm>
                  <fnm>EG</fnm>
               </au>
               <au>
                  <snm>Lemos</snm>
                  <fnm>MV</fnm>
               </au>
               <au>
                  <snm>Locali</snm>
                  <fnm>EC</fnm>
               </au>
               <au>
                  <snm>Machado</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Madeira</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Martinez-Rossi</snm>
                  <fnm>NM</fnm>
               </au>
               <au>
                  <snm>Martins</snm>
                  <fnm>EC</fnm>
               </au>
               <au>
                  <snm>Meidanis</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Menck</snm>
                  <fnm>CF</fnm>
               </au>
               <au>
                  <snm>Miyaki</snm>
                  <fnm>CY</fnm>
               </au>
               <au>
                  <snm>Moon</snm>
                  <fnm>DH</fnm>
               </au>
               <au>
                  <snm>Moreira</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Novo</snm>
                  <fnm>MT</fnm>
               </au>
               <au>
                  <snm>Okura</snm>
                  <fnm>VK</fnm>
               </au>
               <au>
                  <snm>Oliveira</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Oliveira</snm>
                  <fnm>VR</fnm>
               </au>
               <au>
                  <snm>Pereira</snm>
                  <fnm>HA</fnm>
               </au>
               <au>
                  <snm>Rossi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sena</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Silva</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>de Souza</snm>
                  <fnm>RF</fnm>
               </au>
               <au>
                  <snm>Spinola</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Takita</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Tamura</snm>
                  <fnm>RE</fnm>
               </au>
               <au>
                  <snm>Teixeira</snm>
                  <fnm>EC</fnm>
               </au>
               <au>
                  <snm>Tezza</snm>
                  <fnm>RI</fnm>
               </au>
               <au>
                  <snm>Trindade dos Santos</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Truffi</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Tsai</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>FF</fnm>
               </au>
               <au>
                  <snm>Setubal</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Kitajima</snm>
                  <fnm>JP</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>417</volume>
            <fpage>459</fpage>
            <lpage>463</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/417459a</pubid>
                  <pubid idtype="pmpid" link="fulltext">12024217</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Recent evidence for evolution of the genetic code</p>
            </title>
            <aug>
               <au>
                  <snm>Osawa</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Jukes</snm>
                  <fnm>TH</fnm>
               </au>
               <au>
                  <snm>Watanabe</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Muto</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Microbiol Rev</source>
            <pubdate>1992</pubdate>
            <volume>56</volume>
            <fpage>229</fpage>
            <lpage>264</lpage>
            <xrefbib>
               <pubid idtype="pmpid">1579111</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Quantifying the species-specificity in genomic signatures, synonymous codon choice, amino acid usage and G+C content</p>
            </title>
            <aug>
               <au>
                  <snm>Sandberg</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Br&#228;nden</snm>
                  <fnm>CI</fnm>
               </au>
               <au>
                  <snm>Ernberg</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>C&#246;ster</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>2003</pubdate>
            <volume>311</volume>
            <fpage>35</fpage>
            <lpage>42</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0378-1119(03)00581-X</pubid>
                  <pubid idtype="pmpid" link="fulltext">12853136</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Salmonella enterica serovar Typhi possesses a unique repertoire of fimbrial gene sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Townsend</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Kramer</snm>
                  <fnm>NE</fnm>
               </au>
               <au>
                  <snm>Edwards</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Baker</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hamlin</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Simmonds</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Stevens</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Maloy</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Parkhill</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Dougan</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>B&#228;umler</snm>
                  <fnm>AJ</fnm>
               </au>
            </aug>
            <source>Infect Immun</source>
            <pubdate>2001</pubdate>
            <volume>69</volume>
            <fpage>2894</fpage>
            <lpage>2901</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1128/IAI.69.5.2894-2901.2001</pubid>
                  <pubid idtype="pmpid" link="fulltext">11292704</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Comparative genomics of closely related salmonellae</p>
            </title>
            <aug>
               <au>
                  <snm>Edwards</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Olsen</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Maloy</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <source>Trends Microbiol</source>
            <pubdate>2002</pubdate>
            <volume>10</volume>
            <fpage>94</fpage>
            <lpage>99</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0966-842X(01)02293-4</pubid>
                  <pubid idtype="pmpid" link="fulltext">11827811</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Diversification of Escherichia coli genomes: are bacteriophages the major contributors?</p>
            </title>
            <aug>
               <au>
                  <snm>Ohnishi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kurokawa</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Hayashi</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Trends Microbiol</source>
            <pubdate>2001</pubdate>
            <volume>9</volume>
            <fpage>481</fpage>
            <lpage>485</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0966-842X(01)02173-4</pubid>
                  <pubid idtype="pmpid" link="fulltext">11597449</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Comparative genomic analysis of Vibrio cholerae: genes that correlate with cholera endemic and pandemic disease</p>
            </title>
            <aug>
               <au>
                  <snm>Dziejman</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Balon</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Boyd</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Fraser</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Heidelberg</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Mekalanos</snm>
                  <fnm>JJ</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>2002</pubdate>
            <volume>99</volume>
            <fpage>1556</fpage>
            <lpage>1561</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.042667999</pubid>
                  <pubid idtype="pmpid" link="fulltext">11818571</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Soft tissue infection and bacteremia caused by Shewanella putrefaciens</p>
            </title>
            <aug>
               <au>
                  <snm>Pagani</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Lang</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Vedovelli</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Moling</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Rimenti</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Prister&#224;</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Mian</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>J Clin Microbiol</source>
            <pubdate>2003</pubdate>
            <volume>41</volume>
            <fpage>2240</fpage>
            <lpage>2241</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1128/JCM.41.5.2240-2241.2003</pubid>
                  <pubid idtype="pmpid" link="fulltext">12734291</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Artemis: sequence visualization and annotation</p>
            </title>
            <aug>
               <au>
                  <snm>Rutherford</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Parkhill</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Crook</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Horsnell</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Rice</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Rajandream</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Barrell</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>944</fpage>
            <lpage>945</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/16.10.944</pubid>
                  <pubid idtype="pmpid" link="fulltext">11120685</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Limitations of compositional approach to identifying horizontally transferred genes</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>2001</pubdate>
            <volume>53</volume>
            <fpage>244</fpage>
            <lpage>250</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s002390010214</pubid>
                  <pubid idtype="pmpid" link="fulltext">11523011</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Codon bias and base composition are poor indicators of horizontally transferred genes</p>
            </title>
            <aug>
               <au>
                  <snm>Koski</snm>
                  <fnm>LB</fnm>
               </au>
               <au>
                  <snm>Morton</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Golding</snm>
                  <fnm>GB</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2001</pubdate>
            <volume>18</volume>
            <fpage>404</fpage>
            <lpage>412</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11230541</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Horizontal gene transfer and the origin of species: lessons from bacteria</p>
            </title>
            <aug>
               <au>
                  <snm>de la Cruz</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Davies</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Trends Microbiol</source>
            <pubdate>2000</pubdate>
            <volume>8</volume>
            <fpage>128</fpage>
            <lpage>133</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0966-842X(00)01703-0</pubid>
                  <pubid idtype="pmpid" link="fulltext">10707066</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Phylogenetic analyses of two "archaeal" genes in Thermotoga maritima reveal multiple transfers between Archaea and Bacteria</p>
            </title>
            <aug>
               <au>
                  <snm>Nesb&#248;</snm>
                  <fnm>CL</fnm>
               </au>
               <au>
                  <snm>L'Haridon</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Stetter</snm>
                  <fnm>KO</fnm>
               </au>
               <au>
                  <snm>Doolittle</snm>
                  <fnm>WF</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2001</pubdate>
            <volume>18</volume>
            <fpage>362</fpage>
            <lpage>375</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11230537</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Horizontal transfer of archaeal genes into the Deinococcaceae: detection by molecular and computer-based approaches</p>
            </title>
            <aug>
               <au>
                  <snm>Olendzenski</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Zhaxybayeva</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Murphey</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Shin</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Gogarten</snm>
                  <fnm>JP</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>2000</pubdate>
            <volume>51</volume>
            <fpage>587</fpage>
            <lpage>599</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11116332</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Evolution of aminoacyl-tRNA synthetases: analysis of unique domain architectures and phylogenetic trees reveals a complex history of horizontal gene transfer events</p>
            </title>
            <aug>
               <au>
                  <snm>Wolf</snm>
                  <fnm>YI</fnm>
               </au>
               <au>
                  <snm>Aravind</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Grishin</snm>
                  <fnm>NV</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>1999</pubdate>
            <volume>9</volume>
            <fpage>689</fpage>
            <lpage>710</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10447505</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Base composition bias might result from competition for metabolic resources</p>
            </title>
            <aug>
               <au>
                  <snm>Rocha</snm>
                  <fnm>EP</fnm>
               </au>
               <au>
                  <snm>Danchin</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <fpage>291</fpage>
            <lpage>294</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(02)02690-2</pubid>
                  <pubid idtype="pmpid" link="fulltext">12044357</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>The source of laterally transferred genes in bacterial genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Daubin</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Lerat</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Perri&#232;re</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <fpage>R57</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/gb-2003-4-9-r57</pubid>
                  <pubid idtype="pmpid" link="fulltext">12952536</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Synonymous codon usage is subject to selection in thermophilic bacteria</p>
            </title>
            <aug>
               <au>
                  <snm>Lynn</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Singer</snm>
                  <fnm>GA</fnm>
               </au>
               <au>
                  <snm>Hickey</snm>
                  <fnm>DA</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>4272</fpage>
            <lpage>4277</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkf546</pubid>
                  <pubid idtype="pmpid" link="fulltext">12364606</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes</p>
            </title>
            <aug>
               <au>
                  <snm>Karlin</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>1990</pubdate>
            <volume>87</volume>
            <fpage>2264</fpage>
            <lpage>2268</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">2315319</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Predicted highly expressed genes of diverse prokaryotic genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Karlin</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Mrazek</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>2000</pubdate>
            <volume>182</volume>
            <fpage>5238</fpage>
            <lpage>5250</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1128/JB.182.18.5238-5250.2000</pubid>
                  <pubid idtype="pmpid" link="fulltext">10960111</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Codon usage tabulated from the international DNA sequence databases; its status 1999</p>
            </title>
            <aug>
               <au>
                  <snm>Nakamura</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Gojobori</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Ikemura</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1999</pubdate>
            <volume>27</volume>
            <fpage>292</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/27.1.292</pubid>
                  <pubid idtype="pmpid" link="fulltext">9847205</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Operons in Escherichia coli: genomic analyses and predictions</p>
            </title>
            <aug>
               <au>
                  <snm>Salgado</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Moreno-Hagelsieb</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>TF</fnm>
               </au>
               <au>
                  <snm>Collado-Vides</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>2000</pubdate>
            <volume>97</volume>
            <fpage>6652</fpage>
            <lpage>6657</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.110147297</pubid>
                  <pubid idtype="pmpid" link="fulltext">10823905</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Basic local alignment search tool</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Gish</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1990</pubdate>
            <volume>215</volume>
            <fpage>403</fpage>
            <lpage>410</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1990.9999</pubid>
                  <pubid idtype="pmpid" link="fulltext">2231712</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
