<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art><ui>1471-2164-12-19</ui><ji>1471-2164</ji><fm>
<dochead>Research article</dochead>
<bibl>
<title>
<p>Comprehensive prediction of chromosome dimer resolution sites in bacterial genomes</p>
</title>
<aug>
<au id="A1"><snm>Kono</snm><fnm>Nobuaki</fnm><insr iid="I1"/><insr iid="I2"/><email>ciconia@sfc.keio.ac.jp</email></au>
<au ca="yes" id="A2"><snm>Arakawa</snm><fnm>Kazuharu</fnm><insr iid="I2"/><email>gaou@sfc.keio.ac.jp</email></au>
<au id="A3"><snm>Tomita</snm><fnm>Masaru</fnm><insr iid="I2"/><insr iid="I3"/><email>mt@sfc.keio.ac.jp</email></au>
</aug>
<insg>
<ins id="I1"><p>Systems Biology Program, Graduate School of Media and Governance, Keio University, Endo 5322, Fujisawa, Kanagawa 252-8520, Japan</p></ins>
<ins id="I2"><p>Institute for Advanced Biosciences, Keio University, Japan, Endo 5322, Fujisawa, Kanagawa 252-8520, Japan</p></ins>
<ins id="I3"><p>Department of Environment and Information Studies, Keio University, Endo 5322, Fujisawa, Kanagawa 252-8520, Japan</p></ins>
</insg>
<source>BMC Genomics</source>
<issn>1471-2164</issn>
<pubdate>2011</pubdate>
<volume>12</volume>
<issue>1</issue>
<fpage>19</fpage>
<url>http://www.biomedcentral.com/1471-2164/12/19</url>
<xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-12-19</pubid><pubid idtype="pmpid">21223577</pubid></pubidlist></xrefbib>
</bibl>
<history><rec><date><day>24</day><month>9</month><year>2010</year></date></rec><acc><date><day>11</day><month>1</month><year>2011</year></date></acc><pub><date><day>11</day><month>1</month><year>2011</year></date></pub></history>
<cpyrt><year>2011</year><collab>Kono et al; licensee BioMed Central Ltd.</collab><note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
<abs>
<sec>
<st>
<p>Abstract</p>
</st>
<sec>
<st>
<p>Background</p>
</st>
<p>During the replication process of bacteria with circular chromosomes, an odd number of homologous recombination events results in concatenated dimer chromosomes that cannot be partitioned into daughter cells. However, many bacteria harbor a conserved dimer resolution machinery consisting of one or two tyrosine recombinases, XerC and XerD, and their 28-bp target site, <it>dif</it>.</p>
</sec>
<sec>
<st>
<p>Results</p>
</st>
<p>To study the evolution of the <it>dif/</it>XerCD system and its relationship with replication termination, we report the comprehensive prediction of <it>dif </it>sequences <it>in silico </it>using a phylogenetic prediction approach based on iterated hidden Markov modeling. Using this method, <it>dif </it>sites were identified in 641 organisms among 16 phyla, with a 97.64% identification rate for single-chromosome strains. The <it>dif </it>sequence positions were shown to be strongly correlated with the GC skew shift-point that is induced by replicational mutation/selection pressures, but the difference in the positions of the predicted <it>dif </it>sites and the GC skew shift-points did not correlate with the degree of replicational mutation/selection pressures.</p>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>The sequence of <it>dif </it>sites is widely conserved among many bacterial phyla, and they can be computationally identified using our method. The lack of correlation between <it>dif </it>position and the degree of GC skew suggests that replication termination does not occur strictly at <it>dif </it>sites.</p>
</sec>
</sec>
</abs>
</fm><bdy>
<sec>
<st>
<p>Background</p>
</st>
<p>In bacteria, replication fork arrest is mainly repaired by homologous recombination <abbrgrp>
<abbr bid="B1">1</abbr>
</abbrgrp>. When such a recombination event occurs an odd number of times in one DNA replication event of circular chromosomes, the replicated chromosome is not properly segregated into two daughter chromosomes but instead produces a concatenated dimer <abbrgrp>
<abbr bid="B2">2</abbr>
<abbr bid="B3">3</abbr>
</abbrgrp>. Therefore, many bacteria harbor highly conserved chromosome dimer resolution (CDR) machinery to separate the dimer chromosome into two monomer daughter chromosomes.</p>
<p>In <it>Escherichia coli</it>, chromosome dimers are resolved by two tyrosine recombinases, XerC and XerD, by the addition of a crossover at a specific 28-bp sequence called the <it>dif </it>site, which is located in the replication termination region of the chromosome <abbrgrp>
<abbr bid="B4">4</abbr>
<abbr bid="B5">5</abbr>
</abbrgrp>. The <it>dif </it>sequence contains a pair of palindromic sequence motifs that correspond to the binding domains of XerC and XerD. The reaction is coordinated to the last stages of cell division by an essential cell division protein, FtsK, which functions as a septum-located DNA translocase <abbrgrp>
<abbr bid="B6">6</abbr>
<abbr bid="B7">7</abbr>
<abbr bid="B8">8</abbr>
<abbr bid="B9">9</abbr>
<abbr bid="B10">10</abbr>
</abbrgrp>. FtsK moves along the chromosome unidirectionally towards the <it>dif </it>sequence, thanks to polar and orientated sequences, the KOPS <abbrgrp>
<abbr bid="B11">11</abbr>
<abbr bid="B12">12</abbr>
<abbr bid="B13">13</abbr>
</abbrgrp>. CDR is initiated when FtsK reaches <it>dif </it>and its extreme C-terminal domain directly interacts with the C-terminal domain of XerD <abbrgrp>
<abbr bid="B14">14</abbr>
<abbr bid="B15">15</abbr>
<abbr bid="B16">16</abbr>
<abbr bid="B17">17</abbr>
<abbr bid="B18">18</abbr>
</abbrgrp>. The <it>dif</it>/XerCD chromosome dimer resolution system seems widely conserved. <it>In vivo </it>experimental evidence for its conservation has been obtained in <it>Xanthomonas campestris</it>, <it>Caulobacter crescentus </it>and <it>Vibrio cholerae </it>
<abbrgrp>
<abbr bid="B19">19</abbr>
<abbr bid="B20">20</abbr>
<abbr bid="B21">21</abbr>
</abbrgrp>. <it>In vitro </it>characterization of Xer recombinases and <it>dif </it>sites has also been carried out in <it>Haemophilus influenzae </it>and <it>Bacillus subtilis </it>
<abbrgrp>
<abbr bid="B22">22</abbr>
<abbr bid="B23">23</abbr>
</abbrgrp>. However, the importance of <it>dif</it>/XerCD for the fitness of bacteria has only been demonstrated in <it>E. coli </it>and <it>V. cholerae </it>
<abbrgrp>
<abbr bid="B20">20</abbr>
<abbr bid="B24">24</abbr>
</abbrgrp>. In some other bacteria, such as <it>Lactococci </it>and <it>Streptococci</it>, chromosome dimers are resolved by a single tyrosine recombinase that acts at a specific <it>dif </it>site <abbrgrp>
<abbr bid="B25">25</abbr>
<abbr bid="B26">26</abbr>
</abbrgrp>. In this case, dimer resolution still depends on FtsK and <it>dif </it>is still located opposite the origin of replication between oriented polar sequences <abbrgrp>
<abbr bid="B27">27</abbr>
</abbrgrp>. Several filamentous phages are known to hijack this site-specific recombination machinery of <it>dif</it>/XerCD for their integration into the host chromosome, containing pseudo-<it>dif </it>sequences within these phage genomes <abbrgrp>
<abbr bid="B28">28</abbr>
<abbr bid="B29">29</abbr>
<abbr bid="B30">30</abbr>
<abbr bid="B31">31</abbr>
<abbr bid="B32">32</abbr>
<abbr bid="B33">33</abbr>
<abbr bid="B34">34</abbr>
</abbrgrp>. However, the <it>dif </it>sequence remains intact during such recombination processes to ensure the integrity of the chromosome dimer resolution machinery <abbrgrp>
<abbr bid="B35">35</abbr>
<abbr bid="B36">36</abbr>
</abbrgrp>. The <it>dif</it>-like sequences in phages often contain a more variable central region that is longer than the canonical 6 bp <abbrgrp>
<abbr bid="B31">31</abbr>
<abbr bid="B33">33</abbr>
<abbr bid="B34">34</abbr>
</abbrgrp>, and the XerD binding arm is considerably degenerate <abbrgrp>
<abbr bid="B28">28</abbr>
</abbrgrp>.</p>
<p>Because there is only one origin of replication on bacterial circular chromosomes, replication generally terminates in a specific region of the chromosome. This is reflected in the GC skew of the two replichore arms of the chromosome, which shows a shift-point opposite the origin of replication <abbrgrp>
<abbr bid="B37">37</abbr>
</abbrgrp>. Based on the observation that <it>dif </it>sites are generally located at or near the GC skew shift-point, Hendrickson and Lawrence proposed that replication might generally terminate at <it>dif</it>, which would coordinate replication and chromosome dimer resolution <abbrgrp>
<abbr bid="B38">38</abbr>
</abbrgrp>. In <it>E. coli</it>, the replication process usually terminates at a narrow region that includes approximately 5% of the genome length and is located directly opposite the replication origin <abbrgrp>
<abbr bid="B39">39</abbr>
<abbr bid="B40">40</abbr>
<abbr bid="B41">41</abbr>
</abbrgrp>. This is partly due to the existence of the Tus/Ter replication fork trap <abbrgrp>
<abbr bid="B41">41</abbr>
</abbrgrp>. <it>dif </it>is located within the replication fork trap but termination occurs precisely at the Tus site, not at <it>dif </it>
<abbrgrp>
<abbr bid="B42">42</abbr>
</abbrgrp> and <it>dif </it>is active when displaced outside of the replication termination region if it is still within the zone where KOPS converge <abbrgrp>
<abbr bid="B24">24</abbr>
</abbrgrp>. Nevertheless, the lack of universal conservation of the Tus protein may suggest that replication terminated at <it>dif </it>sites until the relatively recent takeover by the Tus-Ter system <abbrgrp>
<abbr bid="B43">43</abbr>
</abbrgrp>. We reasoned therefore that the comprehensive identification of <it>dif </it>sites and of their location with respect to the GC skew shift-point in hundreds of complete genomes might provide clues to the evolution of the CDR machinery and its possible link with the replication termination mechanism in bacterial species.</p>
<p>Prediction of the <it>dif </it>sequences has been reported by several groups with different approaches. Hendrickson and Lawrence showed that sequence skew can be used to predict the location of <it>dif </it>sites, and they identified putative <it>dif </it>sequences in 25 bacteria based on sequence similarity <abbrgrp>
<abbr bid="B38">38</abbr>
</abbrgrp>. Le Bourgeois and colleagues reported a new type of tyrosine recombinase, named XerS, which is responsible for CDR in <it>Streptococci </it>and <it>Lactococci </it>and targets a 31-bp sequence element named <it>dif</it>
<sub>SL </sub>
<abbrgrp>
<abbr bid="B25">25</abbr>
</abbrgrp>. For comparison, they predicted <it>dif </it>sequences in 22 Firmicutes based on their similarity to that of <it>B. subtilis </it>with Megablast <abbrgrp>
<abbr bid="B44">44</abbr>
</abbrgrp> and on the fact that the <it>dif </it>sequence occurs only once per genome. Val and colleagues showed that <it>V. cholerae </it>chromosome II, many features of which are plasmid-like, harbors its own distinct <it>dif </it>sequence and therefore undergoes FtsK-dependent CDR <abbrgrp>
<abbr bid="B20">20</abbr>
</abbrgrp>. For this purpose, they predicted <it>dif </it>sequences in five &#945;-Proteobacteria and ten &#946;-Proteobacteria that harbour multiple chromosomes, and discussed a conserved FtsK-dependent CDR on multiple chromosomes based on the close relative distance of the position of <it>dif </it>sequences and the GC skew shift-points. Their prediction method is based on a HMMER <abbrgrp>
<abbr bid="B45">45</abbr>
</abbrgrp> score (&lt;1.0e-05) with a profile built from 27 aligned <it>dif </it>sequences in the largest chromosomes of &#947;-Proteobacteria species, with manual checking for 6-bp spacing between two XerC and XerD binding motifs.</p>
<p>Carnoy and Roten reported the most comprehensive predictions to date, identifying putative <it>dif </it>sequences in 204 chromosomes in 137 Proteobacteria strains, discussing the high conservation of <it>dif/</it>XerCD systems and the possible loss of <it>dif </it>sequences in endosymbionts, with suggestions for other CDR mechanisms <abbrgrp>
<abbr bid="B46">46</abbr>
</abbrgrp>. Here, the prediction was based on BLAST searches and YASS alignment <abbrgrp>
<abbr bid="B47">47</abbr>
</abbrgrp> with the <it>dif </it>sequences of <it>E. coli </it>and <it>B. subtilis</it>, and candidates were selected based on their proximity to the GC skew shift-points and a single occurrence per chromosome. Previous predictions were therefore limited to three bacterial phyla: Proteobacteria, Firmicutes and Actinobacteria.</p>
<p>To this end, we describe comprehensive predictions for <it>dif </it>sequences based on a machine learning approach, tracing the phylogenetic conservation patterns of XerCD recombinases and using an iterative hidden Markov modeling method. Furthermore, we observed the relationship between predicted <it>dif </it>sequence positions and GC skew shift-points, and investigated whether replication termination occurs at the <it>dif </it>site.</p>
</sec>
<sec>
<st>
<p>Results</p>
</st>
<sec>
<st>
<p>Overview of <it>dif </it>sequence prediction</p>
</st>
<p>We first analyzed the phylogenetic conservation patterns of XerC and XerD in bacterial species by calculating the distances of their amino acid sequences from those in the seed organisms with known <it>dif </it>sequences (experimentally confirmed: <it>E. coli </it>and <it>B. subtilis </it>and computationally predicted: <it>Frankia alni</it>). As depicted in Figure <figr fid="F1">1</figr> and Additional file <supplr sid="S1">1</supplr>, Figure S1, sequence similarity distributions were clearly distinguished by phylum. Sequences belonging to different phyla always showed ClustalW distances of &#8805;0.3, and based on this phylogenetic distribution pattern, we separately trained and predicted the <it>dif </it>sequences in each phylum using iterated HMM. By this phylogenetic prediction approach, we predicted <it>dif </it>sequences in 578 genomes out of 592 that harbor the XerCD recombinase (Additional file <supplr sid="S2">2</supplr>, Table S1 for a complete listing). The same prediction method was applied for 66 organisms with multiple chromosomes, totaling 142 chromosomes, where we could predict <it>dif </it>sequences in 63 organisms with 137 chromosomes (Table <tblr tid="T1">1</tblr>).</p>
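<p>As a minimal sketch of the phylum assignment described above, the following Python fragment averages the XerC and XerD distances to each seed organism and picks the closest seed. The function names are hypothetical; only the averaging of XerC and XerD distances and the ClustalW cutoff of 0.3 come from the text, and rejecting a seed at or beyond that cutoff is an assumption made for illustration.</p>

```python
# Illustrative sketch only: helper names are hypothetical, not the authors' code.
PHYLUM_CUTOFF = 0.3  # ClustalW distance at or above which sequences belonged to different phyla


def mean_xer_distance(xerc_dist, xerd_dist):
    """Average the XerC and XerD distances to one seed organism."""
    return (xerc_dist + xerd_dist) / 2.0


def closest_seed(distances):
    """Return (seed, mean distance) for the seed with the smallest mean XerCD distance.

    distances maps a seed name to a (XerC distance, XerD distance) pair.
    """
    means = {seed: mean_xer_distance(c, d) for seed, (c, d) in distances.items()}
    seed = min(means, key=means.get)
    return seed, means[seed]


def same_phylum_seed(distances):
    """Return the closest seed, or None if even that seed is at or beyond the cutoff (assumed rule)."""
    seed, dist = closest_seed(distances)
    return None if dist >= PHYLUM_CUTOFF else seed
```

<p>Under these assumptions, a genome whose closest seed is still at or beyond the cutoff would be handled with profiles borrowed from other phyla rather than with its own iterated HMM.</p>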
<fig id="F1"><title><p>Figure 1</p></title><caption><p>The phylogenetic distance of XerCD in each organism</p></caption><text>
   <p><b>The phylogenetic distance of XerCD in each organism</b>. The phylogenetic distances of bacterial genomes to three seed organisms, <it>Escherichia coli </it>(Proteobacteria), <it>Bacillus subtilis </it>(Firmicutes) and <it>Frankia alni </it>(Actinobacteria), were calculated as the average of phylogenetic distances of XerC and XerD. A detailed example is given in Additional file <supplr sid="S1">1</supplr>, Figure S1. A to C are scatter plots of the distances of these genomes to the seed organisms. Axes represent average distances as calculated by ClustalW. A, distances from <it>Escherichia coli </it>K-12 and <it>Bacillus subtilis </it>168; B, distances from <it>Escherichia coli </it>K-12 and <it>Frankia alni </it>ACN14a; and C, distances from <it>Bacillus subtilis </it>168 and <it>Frankia alni </it>ACN14a. Blue marks represent the genomes of Proteobacteria, green marks represent Firmicutes, yellow marks represent Actinobacteria, and gray marks represent other phyla. All phyla show strong preferences for seeds from the same phylum.</p>
</text><graphic file="1471-2164-12-19-1" hint_layout="double"/></fig>
<suppl id="S1">
<title>
<p>Additional file 1</p>
</title>
<text>
<p>
<b>AdditionalFigures.pdf</b>.</p>
</text>
<file name="1471-2164-12-19-S1.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S2">
<title>
<p>Additional file 2</p>
</title>
<text>
<p>
<b>Complete list of predicted <it>dif </it>sequences</b>.</p>
</text>
<file name="1471-2164-12-19-S2.XLS">
   <p>Click here for file</p>
</file>
</suppl>
<tbl id="T1"><title><p>Table 1</p></title><caption><p>Prediction result overview</p></caption><tblbdy cols="4">
      <r>
         <c ca="left">
            <p>
               <b>Single Chromosome</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>Organism</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>Predicted</b>
            </p>
         </c>
         <c ca="right">
            <p>
               <b>%</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Proteobacteria</p>
         </c>
         <c ca="right">
            <p>362</p>
         </c>
         <c ca="right">
            <p>357</p>
         </c>
         <c ca="right">
            <p>98.61</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Firmicutes</p>
         </c>
         <c ca="right">
            <p>100</p>
         </c>
         <c ca="right">
            <p>97</p>
         </c>
         <c ca="right">
            <p>97.00</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Actinobacteria</p>
         </c>
         <c ca="right">
            <p>66</p>
         </c>
         <c ca="right">
            <p>66</p>
         </c>
         <c ca="right">
            <p>100.00</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Bacteroidetes</p>
         </c>
         <c ca="right">
            <p>19</p>
         </c>
         <c ca="right">
            <p>19</p>
         </c>
         <c ca="right">
            <p>100.00</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Chlamydiae</p>
         </c>
         <c ca="right">
            <p>14</p>
         </c>
         <c ca="right">
            <p>14</p>
         </c>
         <c ca="right">
            <p>100.00</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Chlorobi</p>
         </c>
         <c ca="right">
            <p>11</p>
         </c>
         <c ca="right">
            <p>11</p>
         </c>
         <c ca="right">
            <p>100.00</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Acidobacteria</p>
         </c>
         <c ca="right">
            <p>3</p>
         </c>
         <c ca="right">
            <p>3</p>
         </c>
         <c ca="right">
            <p>100.00</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Verrucomicrobia</p>
         </c>
         <c ca="right">
            <p>3</p>
         </c>
         <c ca="right">
            <p>3</p>
         </c>
         <c ca="right">
            <p>100.00</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Chloroflexi</p>
         </c>
         <c ca="right">
            <p>3</p>
         </c>
         <c ca="right">
            <p>3</p>
         </c>
         <c ca="right">
            <p>100.00</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Gemmatimonadetes</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>100.00</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Nitrospirae</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>100.00</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Elusimicrobia</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>100.00</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Tenericutes</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>100.00</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Spirochaetes</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>100.00</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Cyanobacteria</p>
         </c>
         <c ca="right">
            <p>5</p>
         </c>
         <c ca="right">
            <p>0</p>
         </c>
         <c ca="right">
            <p>0.00</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Planctomycetes</p>
         </c>
         <c ca="right">
            <p>1</p>
         </c>
         <c ca="right">
            <p>0</p>
         </c>
         <c ca="right">
            <p>0.00</p>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Total</p>
         </c>
         <c ca="right">
            <p>592</p>
         </c>
         <c ca="right">
            <p>578</p>
         </c>
         <c ca="right">
            <p>97.64</p>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
         <c>
            <p/>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="center">
            <p>
               <b>Multiple Chromosomes</b>
            </p>
         </c>
         <c ca="center">
            <p>Organism (chr)</p>
         </c>
         <c ca="center">
            <p>Predicted (chr)</p>
         </c>
         <c ca="right">
            <p>% (chr %)</p>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Proteobacteria</p>
         </c>
         <c ca="left">
            <p>60 (130)</p>
         </c>
         <c ca="left">
            <p>57 (125)</p>
         </c>
         <c ca="left">
            <p>95.00 (96.15)</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Spirochaetes</p>
         </c>
         <c ca="left">
            <p>6 (12)</p>
         </c>
         <c ca="left">
            <p>6 (12)</p>
         </c>
         <c ca="left">
            <p>100.00 (100.00)</p>
         </c>
      </r>
      <r>
         <c cspan="4">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>Total</p>
         </c>
         <c ca="left">
            <p>66 (142)</p>
         </c>
         <c ca="left">
            <p>63 (137)</p>
         </c>
         <c ca="left">
            <p>95.45 (96.48)</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>chr: chromosomes</p>
   </tblfn></tbl>
<p>All of these predictions resulted in unique hits above the threshold, and their validity was further confirmed through leave-one-out cross-validation. On the other hand, predictions below the threshold (score &lt; 10 and e-value &gt; 1.0E-04) often resulted in multiple candidates with insufficient scores. When the initial prediction using the strict threshold failed, we manually checked the predicted sequences for the conservation of palindromic structure in the 7-12-bp and 17-22-bp positions, and candidates that were located close to the origin of replication were removed because the displacement of a <it>dif </it>sequence near the origin significantly reduces the growth rate <abbrgrp>
<abbr bid="B24">24</abbr>
</abbrgrp>.</p>
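<p>The manual palindrome check can be illustrated with a short Python sketch. It assumes that "palindromic structure" means that positions 17-22 of the 28-bp candidate match the reverse complement of positions 7-12 (1-based); this reading, the function names, and the use of a simple match count are assumptions for illustration, not the authors' published procedure.</p>

```python
# Illustrative sketch only: function names and the match-count criterion are assumptions.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def palindrome_matches(dif):
    """Count positions where bases 17-22 equal the reverse complement of bases 7-12.

    Positions are 1-based as in the text, so the Python slices are [6:12] and [16:22].
    A perfect inverted repeat scores 6.
    """
    left_arm = dif[6:12]
    right_arm = dif[16:22].upper()
    expected = reverse_complement(left_arm)
    return sum(1 for a, b in zip(expected, right_arm) if a == b)


# Putative F. alni dif sequence quoted in this article.
F_ALNI_DIF = "CACGCCGATAATGCACATTATGTCAAGT"
```

<p>A candidate with few matching positions would be a natural one to reject in such a manual check; any cutoff on the match count would have to be chosen by the user.</p>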
</sec>
<sec>
<st>
<p>Prediction results of each phylum</p>
</st>
<p>In Proteobacteria, fuzzy matching in 28 <it>Escherichia </it>strains based on the <it>dif </it>sequence of <it>E. coli </it>K12 for the creation of an initial seed profile hidden Markov model yielded a unique <it>dif </it>sequence in each of the 28 strains. Iterated HMM using this seed profile resulted in unique predictions over the validation threshold in 306 genomes. An additional 137 chromosomes in 69 genomes were predicted with iterated HMM separated by classes, and 10 distant genomes were predicted using an alternative seed profile created with the 3 most similar genomes. The predicted <it>dif </it>sequences totaled 482 in 414 organisms, with a prediction rate of 98.61% for single-chromosome strains and 95.00% for multiple-chromosome strains. Predictions failed in eight organisms and ten chromosomes, namely, <it>Agrobacterium tumefaciens </it>str. C58, <it>Paracoccus denitrificans </it>PD1222 chromosome I, II (&#945;-Proteobacteria), <it>Burkholderia phytofirmans </it>PsJN chromosome I, <it>Burkholderia </it>sp. 383 chromosome I, III, <it>Nitrosospira multiformis </it>ATCC 25196 (&#946;-Proteobacteria), <it>Desulfotalea psychrophila </it>LSv54 (&#948;-Proteobacteria), <it>Sulfurimonas denitrificans </it>DSM 1251 and <it>Nitratiruptor </it>sp. SB155-2 (&#949;-Proteobacteria).</p>
<p>For Firmicutes, fuzzy matching in 17 <it>Bacillus </it>strains (based on the <it>dif </it>sequence of <it>B. subtilis </it>str. 168 for the creation of the initial seed profile hidden Markov model) yielded a unique <it>dif </it>sequence in each of the 17 strains. Iterated HMM using this seed profile resulted in unique predictions over the validation threshold for 79 chromosomes in 79 genomes. The <it>dif </it>sequences were predicted in a total of 97 organisms, with a prediction rate of 97.00%. Prediction failed in three genomes, namely, <it>Clostridium perfringens </it>str. 13, <it>C. beijerinckii </it>NCIMB 8052 (Clostridia), and <it>Lactobacillus helveticus </it>DPC 4571 (Lactobacillales).</p>
<p>Although no experimentally confirmed <it>dif </it>sequence is available for Actinobacteria, that of <it>F. alni </it>is suggested to be 5'-CACGCCGATAATGCACATTATGTCAAGT-3' <abbrgrp>
<abbr bid="B38">38</abbr>
</abbrgrp>. Therefore, we used this sequence for fuzzy matching in two genomes, <it>Nocardia farcinica </it>IFM 10152 and <it>Mycobacterium avium </it>subsp. paratuberculosis K-10, whose XerCD amino acid sequences were most similar to those of <it>F. alni</it>. Iterated HMM using this seed profile resulted in successful predictions above the validation threshold in all 66 genomes.</p>
<p>In Chlorobi, an initial seed profile was created with predicted <it>dif </it>sequences in <it>Chlorobaculum parvum </it>NCIB 8327 and <it>Prosthecochloris aestuarii </it>DSM 271 that scored above the validation thresholds using the Firmicutes profile, which gave higher scores than the profiles of Proteobacteria and Actinobacteria. Likewise, the profile of Firmicutes yielded the highest scores in Chlamydiae, where the initial seed profile was created from predicted <it>dif </it>sequences in <it>Chlamydophila pneumoniae </it>CWL029 and <it>Protochlamydia amoebophila </it>UWE25, which were below the validation thresholds but contained the palindromic structure and were located within 0.01-1.48 degrees of the GC skew shift-points. Using these seed profiles, iterated HMM successfully predicted <it>dif </it>sequences in all 11 genomes in Chlorobi and 14 genomes in Chlamydiae.</p>
<p>Because the number of genomes is very small in all of the other phyla, we utilized the profiles of Proteobacteria, Firmicutes, Actinobacteria, Chlorobi, and Chlamydiae that were created thus far instead of applying iterated HMM based on specific seed profiles, and all of the following candidates were confirmed based on scores, palindromic structure, and position. In Elusimicrobia and Tenericutes, all profiles showed high HMMER scores, and predictions using the profiles of Firmicutes and Chlamydiae predicted identical <it>dif </it>sequences. Similarly, the profiles of Firmicutes, Chlamydiae, and Proteobacteria predicted identical <it>dif </it>sequences in Nitrospirae, and predictions based on the profiles of Proteobacteria and Chlorobi were identical in Gemmatimonadetes.</p>
<p>In Spirochaetes, predictions using the profiles of Firmicutes, Chlamydiae and Proteobacteria resulted in unique <it>dif </it>sequences in species with single chromosomes, and the Firmicutes profile was used for the predictions of 12 chromosomes in 6 species with multiple chromosomes, all with HMMER scores above the validation thresholds. The most suitable profiles varied among species in other phyla. In Acidobacteria, the <it>dif </it>sequence of <it>Acidobacterium capsulatum </it>ATCC 51196 was predicted by the profiles of Firmicutes, Chlamydiae, and Chlorobi, and other species were predicted using the profile of Firmicutes only. In Verrucomicrobia, the profiles of Proteobacteria, Firmicutes and Chlorobi predicted the <it>dif </it>sequence of <it>Methylacidiphilum infernorum </it>V4, and those of Proteobacteria and Firmicutes predicted the <it>dif </it>sequences of <it>Opitutus terrae </it>PB90-1 and <it>Akkermansia muciniphila </it>ATCC BAA-835. In Chloroflexi, the Chlorobi profile was suitable for <it>Dehalococcoides </it>sp. BAV1 and <it>Dehalococcoides </it>sp. CBDB1, and that of Actinobacteria was used for the <it>dif </it>sequence of <it>D. ethenogenes </it>195. <it>dif </it>sequences were predicted in 14 Bacteroidetes strains using the profile of Proteobacteria, and those in five strains were predicted using alternative profiles created with the three most similar genomes. In this way, we successfully predicted <it>dif </it>sequences in most phyla, although the prediction failed in the phyla Cyanobacteria and Planctomycetes.</p>
</sec>
<sec>
<st>
<p>Correlation of the <it>dif </it>sequence position and the GC skew shift-points</p>
</st>
<p>Using the predicted <it>dif </it>sequences, we compared their positions within the genome to the shift-points of the GC skew. Firstly, we analyzed the distributions of relative genomic distances of <it>xerC</it>, <it>xerD </it>and <it>ftsK </it>genes from the predicted <it>dif </it>sites. As a result, <it>xerC </it>genes were mostly located near the <it>dif </it>sites, <it>xerD </it>genes were near the replication origin, and <it>ftsK </it>genes were located mostly in between <it>xerC </it>and <it>xerD </it>genes (Additional file <supplr sid="S1">1</supplr>, Figure S2). The comparison of positions between predicted <it>dif </it>sites and the shift-points of the GC skew showed that the <it>dif </it>sequences predicted in the phyla Proteobacteria and Firmicutes correlated significantly with the GC skew shift-points that are highly likely to be located within the terminus region (Spearman's rank correlation coefficients: &#961; = 0.844 and 0.715, respectively; Figure <figr fid="F2">2A</figr>). The differences among these positions fell to within 0.00-1.39% of the genome for &#177;1&#963;, and outliers did not exceed 3% in distance relative to the genome size (Additional file <supplr sid="S1">1</supplr>, Figure S3). The above results confirm that chromosome replication and CDR are related, and demonstrate the accuracy of the predictions described in this work.</p>
<fig id="F2"><title><p>Figure 2</p></title><caption><p>The relationship between <it>dif </it>sites and GC skew</p></caption><text>
   <p><b>The relationship between <it>dif </it>sites and GC skew</b>. A. Correlation of the GC skew shift-point (corresponding to the replication terminus region, Y-axis) and the locations of <it>dif </it>sequences (X-axis) for genomes with predicted <it>dif </it>sequences. Genomes with no visible GC skew, as indicated by GC skew Index (GCSI) &#8804; 0.05, are omitted. Both axes are shown as the relative distance in percentage of half of the genome size (replichore size), from the position directly opposite of the replication origin. For example, 0% means that the position is directly opposite of the replication origin identified by the GC skew shift-point, and 100% means that it is at the replication origin. In other words, the higher the percentage, the closer the distance to the replication origin. Here the positions of GC skew shift-points and <it>dif </it>sites are strongly correlated in all three phyla. B. Lack of correlation between the difference in the positions of GC skew shift-points and <it>dif </it>sites (Y-axis) and the GCSI (X-axis). GCSI is a quantitative measure of the degree of GC skew, where GCSI = 0 is no observable skew, and GCSI = 1 is extremely pronounced skew. Typically GC skew is visible at GCSI &#8805; 0.1, and it is pronounced when GCSI &#8805; 0.3. Since we see no correlation in these plots, stronger replication-related mutation bias (i.e. larger GCSI) does not necessarily result in closer positions of the GC skew shift-point and the <it>dif </it>site. These results suggest that the replication termination occurs near the <it>dif </it>site, but not at the <it>dif </it>site. The number of <it>dif </it>sites is 517 in all bacteria, 438 in Proteobacteria and 97 in Firmicutes. The &#961; in this figure is Spearman's rank-correlation coefficient.</p>
</text><graphic file="1471-2164-12-19-2" hint_layout="double"/></fig>
<p>To further investigate whether replication terminates at the <it>dif </it>site, we examined how the replication-related mutation/selection pressures on the genome affect the collinearity of the <it>dif </it>sequence positions and GC skew shift-points. To do so, we plotted the distances between these two positions against the GC Skew Index (GCSI) of each genome. GCSI is an index that quantifies the degree of GC skew of a given genome, which can be used as a comparative measure of the accumulated replicational mutation/selection pressures <abbrgrp>
<abbr bid="B48">48</abbr>
</abbrgrp>. Since the strength of the GC skew is speculated to partly correlate with the growth rate of bacteria <abbrgrp>
<abbr bid="B49">49</abbr>
</abbrgrp>, a high replicational mutation/selection rate, as indicated by GCSI, implies a greater number of replication events in these organisms. Therefore, if replication terminates at or around the <it>dif </it>site, even allowing for statistical fluctuations, we can assume that an increasing number of replication events should bring the GC skew shift-point closer to the <it>dif </it>site, by the central limit theorem and the law of large numbers. Hence, genomes with higher GCSI should show a smaller relative distance between the GC skew shift-point and the <it>dif </it>site if replication terminates at the <it>dif </it>site. However, as depicted in Figure <figr fid="F2">2B</figr>, we observed no correlation between these two variables (Spearman rank correlation coefficients in Proteobacteria and Firmicutes: &#961; = -0.046 and 0.112, respectively).</p>
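<p>The rank correlation used in this comparison can be sketched as follows. This is a minimal Python illustration of Spearman's coefficient with average ranks for ties, not the Perl code used in the study; the function names and the example values are hypothetical.</p>

```python
def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start != len(order):
        end = start
        # Extend the run while the next value is tied with the current one.
        while end + 1 != len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the ranks of x and y (Spearman's rho)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values: GCSI per genome and the dif-to-shift-point distance (%).
gcsi = [0.08, 0.15, 0.22, 0.31, 0.40]
distance_percent = [0.9, 0.2, 1.1, 0.4, 0.7]
rho = spearman_rho(gcsi, distance_percent)
```

<p>A coefficient near zero, as reported for Figure 2B, would indicate that the distance does not shrink as GCSI increases.</p>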
</sec>
</sec>
<sec>
<st>
<p>Discussion</p>
</st>
<p>In this study, we first demonstrated that the conservation of XerCD genes follows phylogenetic conservation patterns that are specific to each bacterial phylum (Figure <figr fid="F1">1</figr>). Based on this principle, we comprehensively predicted the <it>dif </it>sequences in hundreds of completely sequenced genomes using a recursive strategy that iteratively models and predicts these sequences using profile hidden Markov models. As a result, we obtained unique candidate <it>dif </it>sequences in 715 chromosomes in 641 strains that were validated through multiple means, resulting in the largest collection of predicted <it>dif </it>sequences assembled to date. In comparison to previous work by Carnoy and Roten, which predicted <it>dif </it>sequences in 228 genomes, our predictions coincided with their results in 208 genomes and we added 507 genomes, including <it>Aromatoleum aromaticum </it>str. EbN1, which Carnoy and Roten reported to lack the <it>dif</it>/XerCD system. Excluding strains or chromosomes we could not predict, namely, <it>A. tumefaciens </it>str. C58, <it>Burkholderia </it>sp. 383 chromosome I, II, <it>D. psychrophila </it>LSv54, <it>N. multiformis </it>ATCC 25196, <it>P. denitrificans </it>PD1222 chromosome I, II and <it>S. denitrificans </it>DSM 1251, the predicted <it>dif </it>sequences in this study differed in 12 chromosomes in comparison to the results of Carnoy and Roten: <it>C. crescentus </it>CB15, <it>Granulibacter bethesdensis </it>CGDNIH1, <it>Pseudoalteromonas haloplanktis </it>TAC125 chromosome II, <it>Ralstonia eutropha </it>H16 chromosome II, <it>Rhodobacter sphaeroides </it>2.4.1 chromosome I, <it>R. sphaeroides </it>2.4.1 chromosome II, <it>Rickettsia bellii </it>OSU 85-389, <it>R. conorii</it>, <it>R. felis </it>URRWXCal2, <it>R. prowazekii</it>, <it>R. typhi </it>Wilmington, and <it>Shewanella </it>sp. ANA-3. For <it>R. eutropha </it>H16 chromosome II and <it>P. 
haloplanktis </it>TAC125 chromosome II, both studies predicted positions that were symmetric from the origin of replication, and although experimental work is required to confirm which candidates function <it>in vivo</it>, the palindromic structures of the XerCD binding sites are more conserved in the candidates predicted by our method. Therefore, overall, our results were identical to those of Carnoy and Roten for approximately 91% of the genomes analyzed (208/228), and 11 of the 12 mismatches resulted in candidates with more conserved XerCD binding sites, with the addition of 507 genomes among numerous phyla. Carnoy and Roten noted that some <it>Vibrio </it>species contain two <it>dif </it>sites, both located in the vicinity of the GC skew shift-points. Therefore, we further tested whether the predicted <it>dif </it>sites in multiple chromosomes are all located near the GC skew shift-points. Using 5% genomic distance as a threshold, 45 out of 54 strains with two chromosomes, including <it>Vibrio </it>species, and 6 out of 9 strains with three chromosomes showed such agreement of the positions (Additional file <supplr sid="S2">2</supplr>, Table S1).</p>
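<p>The 5% threshold test described above can be illustrated with a short Python sketch. It assumes that the distance is measured along the shorter arc of a circular chromosome and expressed as a percentage of the chromosome size; the text does not spell out either detail, and the function names are hypothetical.</p>

```python
def circular_distance_percent(pos_a, pos_b, chromosome_size):
    """Shortest arc between two loci on a circular chromosome, in % of its size."""
    linear = abs(pos_a - pos_b) % chromosome_size
    return 100 * min(linear, chromosome_size - linear) / chromosome_size


def near_shift_point(dif_pos, shift_pos, chromosome_size, threshold_percent=5.0):
    """True when the dif site lies within the threshold of the GC skew shift-point."""
    distance = circular_distance_percent(dif_pos, shift_pos, chromosome_size)
    return threshold_percent >= distance


# Hypothetical chromosome of 1,000,000 bp with a dif site 20 kb from the shift-point.
agrees = near_shift_point(dif_pos=510_000, shift_pos=490_000, chromosome_size=1_000_000)
```

<p>Taking the shorter arc matters for loci that straddle position 1 of the sequence record, where the plain coordinate difference would overstate the distance.</p>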
<p>There are four factors that may explain the advantages of our results. First, the selection of bacterial strains in the study by Carnoy and Roten was limited to genomes harboring XerCD that were identified by their similarity to those of <it>E. coli</it>, whereas we used all genomes with XerCD orthologs as identified by the KEGG Orthology database. Although there is a short delay before sequences are annotated and incorporated into the KEGG Orthology database, use of this database provides a more generic and comprehensive starting point. Second, similarity searches using software tools such as BLAST are not suitable for short sequence motifs that undergo mutation, and the difficulty of identifying <it>dif </it>sequences by sequence similarity alone has been shown for <it>C. crescentus </it>
<abbrgrp>
<abbr bid="B50">50</abbr>
</abbrgrp> and several classes of Proteobacteria <abbrgrp>
<abbr bid="B20">20</abbr>
</abbrgrp>. Third, <it>dif </it>sequences require two binding motifs of XerC and XerD to be functional <abbrgrp>
<abbr bid="B51">51</abbr>
</abbrgrp>; therefore, the conservation of palindromic structure at the 7-12-bp and 17-22-bp positions should be confirmed for each predicted candidate. Finally, the use of iterated HMM allowed <it>dif </it>sequence prediction using the profiles of closely related species for each iteration, following the phylogenetic conservation pattern of XerCD.</p>
<p>The high predictability shown in this study suggests that the <it>dif/</it>XerCD system of chromosome dimer resolution is highly conserved among bacterial species and that <it>dif </it>sequences are almost always conserved when XerCD is present within the genome. In fact, according to the KEGG Orthology database, XerC and XerD are conserved in approximately 60-70% of bacterial species, which is a higher percentage than is found for the replication termination protein Tus <abbrgrp>
<abbr bid="B52">52</abbr>
</abbrgrp> and for universal genes such as the SOS response repressor LexA <abbrgrp>
<abbr bid="B53">53</abbr>
</abbrgrp>. In light of the remarkable conservation of the <it>dif/</it>XerCD system, although it is beyond the scope of this study, exploration of alternative CDR machinery in species that lack the <it>dif/</it>XerCD machinery would be an interesting area of future research. Chromosome dimer resolution pathways are thought to exist in species that lack the <it>dif/</it>XerCD system, and several alternative pathways have been reported or proposed. Le Bourgeois <it>et al</it>. reported an unconventional CDR pathway involving only one recombinase (XerS) in <it>Streptococci </it>and <it>Lactococci</it>, along with a 31-bp <it>dif </it>sequence <abbrgrp>
<abbr bid="B25">25</abbr>
</abbrgrp>. Similarly, through computational analysis, Carnoy and Roten suggested the existence of another pathway, termed XerH, in &#949;-Proteobacteria in place of XerCD and XerS and discussed the likelihood of the existence of <it>dif </it>analogues in these species <abbrgrp>
<abbr bid="B26">26</abbr>
<abbr bid="B46">46</abbr>
</abbrgrp>. The basic strategy of iterated HMM should be applicable in predicting <it>dif </it>analogues in these species when defined seed sequences and detailed positions of recombinase binding sites are elucidated.</p>
<p>Although we limited our analysis to strains containing XerCD orthologs, our predictions failed in several species. In Proteobacteria, we could not identify <it>dif </it>sequences in five organisms and seven chromosomes, including species with single chromosomes (<it>Nitratiruptor </it>sp. SB155-2 and <it>S. denitrificans </it>DSM 1251) that are &#949;-Proteobacteria, where an alternative CDR mechanism involving XerH is suggested <abbrgrp>
<abbr bid="B46">46</abbr>
</abbrgrp>, and species with multiple chromosomes (<it>P. denitrificans </it>PD1222 chromosome I, <it>P. denitrificans </it>PD1222 chromosome II, <it>B. phytofirmans </it>PsJN chromosome I, and <it>Burkholderia </it>sp. 383 chromosome I and III). Among these, <it>B. phytofirmans </it>PsJN and <it>Burkholderia </it>sp. 383 contained <it>dif </it>sequences in other chromosomes, indicating that the <it>dif/</it>XerCD system is conserved in these strains. Similarly, in Firmicutes, we could not determine a <it>dif </it>sequence in <it>L. helveticus </it>DPC 4571, <it>C. perfringens </it>str. 13 or <it>C. beijerinckii </it>NCIMB 8052. Among these strains, <it>L. helveticus </it>DPC 4571 has an alternative CDR recombinase, XerS, in its genome, indicating that the <it>dif/</it>XerCD system may not be functional. This is an intriguing example of a possible evolutionary intermediate in which two systems co-exist, presumably resulting from a horizontal gene transfer event. Although we were unable to find a <it>dif </it>sequence corresponding to the XerS machinery, the <it>xerS </it>gene in this species is located close to the GC skew shift-point (<it>xerC</it>: 1031814-bp, <it>xerD</it>: 1055574-bp, <it>xerS</it>: 1228715-bp, and GC skew shift-point: 1225733-bp), which is indicative of its functionality as shown in previous works <abbrgrp>
<abbr bid="B25">25</abbr>
<abbr bid="B26">26</abbr>
<abbr bid="B46">46</abbr>
</abbrgrp>. <it>C. perfringens </it>str. 13 and <it>C. beijerinckii </it>exhibit highly biased GC contents (28.57% and 29.86%, respectively), and hidden Markov profiling of AT-rich <it>dif </it>sequences may have failed due to the background AT-richness of the genome. Comparative studies of <it>dif/</it>XerCD systems using close relatives of these genomes may provide evolutionary clues regarding the acquisition and loss of CDR machinery. For example, mapping the types of CDR machinery to the phylogenetic tree of &#949;-Proteobacteria obtained using 16S rRNA sequences with the dnaml program in the Phylip package shows that a XerH type of CDR machinery may have diverged at an early stage within this phylum. The XerCD type of CDR seems to be absent in the <it>Campylobacter </it>and <it>Helicobacter </it>genera, except for <it>Helicobacter hepaticus</it>, which suggests the existence of the XerH type of CDR in the common ancestor of these species (Figure <figr fid="F3">3</figr>). The <it>dif </it>candidate in <it>H. hepaticus </it>was predicted with iterated HMM only marginally above the threshold, with a score of 10.2 and an e-value of 5.5e-05. Further analysis is required to identify whether this species actually contains <it>dif/</it>XerCD or XerH-type machinery.</p>
<fig id="F3"><title><p>Figure 3</p></title><caption><p>Phylogenetic tree based on rRNA for the comparison of XerCD- and XerH-containing genomes</p></caption><text>
   <p><b>Phylogenetic tree based on rRNA for the comparison of XerCD- and XerH-containing genomes</b>. This phylogenetic tree is constructed using the maximum-likelihood method and is based on 16S rRNAs of 14 organisms in &#949;-Proteobacteria, whose <it>dif </it>sequences are predicted in this study. The outgroup is <it>Escherichia coli </it>K12.</p>
</text><graphic file="1471-2164-12-19-3" hint_layout="double"/></fig>
<p>Predictions failed in all species belonging to the phylum Cyanobacteria. Although XerCD is present in these species, the sequence similarity distance of XerCD in Cyanobacteria to those of other phyla was high (average of 0.358 &#177; 0.0159, N = 540), with a minimum distance of 0.322 to <it>Actinosynnema mirum </it>(Actinobacteria), which exceeded the 0.3 threshold shown in Figure <figr fid="F4">4</figr>. Therefore, this divergence of XerCD in Cyanobacteria from those of other phyla implies low applicability of the iterated HMM approach, which utilizes the phylogenetic conservation pattern of XerCD. One possible explanation for the prediction failure in this phylum is that the <it>dif </it>sequences and XerCD are highly divergent in Cyanobacteria, preventing their identification with sequence profiles. The replication origin in Cyanobacteria is yet to be identified, and GC skew is weak in these species, implying a low degree of replicational mutation/selection pressures, which could also be a reason for the failure of prediction in these species.</p>
<fig id="F4"><title><p>Figure 4</p></title><caption><p>Prediction strategy</p></caption><text>
   <p><b>Prediction strategy</b>. A. Example of the iterated HMM in Proteobacteria. The first seed profile hidden Markov model is created from the seed <it>dif </it>sequence of <it>Escherichia coli</it>, by searching for <it>dif </it>sequences in 28 genomes belonging to the genus <it>Escherichia </it>by means of fuzzy matching. Based on this initial profile hidden Markov model, <it>dif </it>sequences are predicted in the genomes of the closest genus to the <it>Escherichia </it>genus (in this case, <it>Shigella</it>) according to XerCD amino acid sequences. Subsequently, a new profile is created using the previous profile and the newly predicted <it>dif </it>sequences, and this new profile is used to predict <it>dif </it>sequences in the second closest genus (in this case, <it>Salmonella</it>). In this way, profile creation and <it>dif </it>sequence prediction are repeated recursively in decreasing order of similarity of XerCD to the <it>Escherichia </it>sequence, and iterated HMM is conducted in this manner for each phylum. B. Flow chart of the overall strategy.</p>
</text><graphic file="1471-2164-12-19-4" hint_layout="double"/></fig>
<p>Predicted <it>dif </it>sequences were located largely in non-coding regions (93.92%). Of the coding regions that did contain <it>dif </it>sequences, more than half were hypothetical, with no functional annotation. Furthermore, we found two <it>dif </it>sequences within phage ORFs in <it>Vibrio </it>and <it>Xanthomonas</it>. While these sequences may have been integrated with the phages through their hijacking of the host recombination machinery, they are speculated to be the functional <it>dif </it>sites because of (1) their unique occurrence within the genome opposite the replication origin, and (2) their similarity as identified by our phylogenetic modeling approach. As previously shown in Proteobacteria <abbrgrp>
<abbr bid="B46">46</abbr>
</abbrgrp>, the XerC binding site is more variable and the XerD binding site is more conserved in all phyla (Figure <figr fid="F5">5</figr>), both for genomes with single chromosomes and for those with multiple chromosomes, presumably due to the interaction between XerD and FtsK for the initiation of first strand exchange <abbrgrp>
<abbr bid="B14">14</abbr>
</abbrgrp>. The <it>dif </it>sequences in &#945;-Proteobacteria with single chromosomes showed higher variation compared to those of other classes and phyla, but this variation was correlated with variations in genomic GC content (Additional file <supplr sid="S1">1</supplr>, Figure S4). These differences in variation partly explain the failure of our prediction in extremely AT-rich genomes, such as those found in <it>C. perfringens </it>and <it>C. beijerinckii</it>.</p>
<fig id="F5"><title><p>Figure 5</p></title><caption><p>The conservation of <it>dif </it>sequences</p></caption><text>
   <p><b>The conservation of <it>dif </it>sequences</b>. This figure shows the conservation quantities at each position of <it>dif </it>sequence in each phylum or class (Proteobacteria, Firmicutes, Actinobacteria, Bacteroidetes, &#945;-Proteobacteria, &#946;-Proteobacteria, &#947;-Proteobacteria, and &#948;-Proteobacteria). The black bars represent the degree of conservation in single-chromosome genomes, and the gray bars represent that of organisms harboring multiple chromosomes. The labels "XerC domain" and "XerD domain" in these graphs represent the binding sites of these proteins. The X-axis represents the nucleotide positions in the <it>dif </it>sequence, and the Y-axis represents the nucleotide conservation quantity. Y-axis values were normalized to percentages.</p>
</text><graphic file="1471-2164-12-19-5" hint_layout="double"/></fig>
<p>Although <it>dif </it>sequences are expected to be located near the shift-point of the GC skew, we did not use this feature to predict and validate <it>dif </it>sequences with iterated HMM; therefore, using the comprehensively predicted <it>dif </it>sequences across numerous phyla, we were able to directly compare the positions of predicted <it>dif </it>sequences with those of the GC skew shift-points to analyze their relationships. As expected, these two positions are highly correlated in terms of genomic loci, confirming a previous work <abbrgrp>
<abbr bid="B38">38</abbr>
</abbrgrp>. In this respect, because GC skew is the cumulative result of replicational selection/mutations, the degree of conservation of the CDR machinery is presumably in concordance with the degree of replication selection/mutation pressures (i.e. GC skew), which is partly characterized by the difference in the replication machinery and partly characterized by the growth rate <abbrgrp>
<abbr bid="B54">54</abbr>
</abbrgrp>. On the other hand, as shown in Figure <figr fid="F2">2B</figr>, the difference between the positions of the GC skew shift-point and the <it>dif </it>site was not correlated with the strength of the GC skew, as quantified by GCSI. If replication termination occurs at the <it>dif </it>site, as proposed by Hendrickson and Lawrence <abbrgrp>
<abbr bid="B38">38</abbr>
</abbrgrp>, a stronger GC skew that is generated by a larger number of replication events and/or a higher mutation rate should statistically bring the GC skew shift-point closer to the <it>dif </it>site by the central limit theorem and law of large numbers. In fact, the overall correlation of these loci leads to the proposal that the <it>dif </it>site is the replication termination point. However, because a stronger degree of replication mutation/selection pressures does not bring these two loci closer to each other, they are not in a causal relationship. Therefore, although the <it>dif </it>sequence is located near the replication termination site for efficient CDR, the replication termination site is suggested to be at a site other than the <it>dif </it>site, as was recently shown <it>in vivo </it>
<abbrgrp>
<abbr bid="B42">42</abbr>
</abbrgrp>. In addition, the <it>dif </it>sequences of Firmicutes appear to be more widely conserved across phyla, because the Firmicutes profile was the best-suited initial profile for iterated HMM in Chlorobi, Acidobacteria, Gemmatimonadetes, Nitrospirae, Elusimicrobia, Tenericutes, and Spirochaetes, where initial seed sequences were not available, whereas those of Proteobacteria were more variable, as shown by the need to run iterated HMM by class instead of by phylum. Tus proteins, which have been shown to terminate replication <it>in vivo</it>, are more conserved in Proteobacteria and are not widely conserved in other phyla, which partly supports a possible change in the replication termination mechanism through a relatively recent takeover by the Tus-Ter system <abbrgrp>
<abbr bid="B43">43</abbr>
</abbrgrp>. On the other hand, to the best of our knowledge, Tus analogues have not been comprehensively searched in other phyla, and therefore further analysis is required in order to fully support this hypothesis.</p>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>By taking the phylogenetic iterated HMM approach and validating predicted candidates through a combination of HMMER score thresholds, conservation of palindromic structure, and cross-validation, we achieved a comprehensive identification of unique <it>dif </it>candidates in hundreds of genomes. As a result, we obtained unique candidate <it>dif </it>sequences in 715 chromosomes in 641 strains that were validated through multiple means, resulting in the largest collection of predicted <it>dif </it>sequences assembled to date. All of the predicted <it>dif </it>sequences described in this study, as well as visualizations of <it>dif </it>locations on circular genome maps, are freely available in an online database at <url>http://www.g-language.org/data/repter/</url>. The locations of <it>dif </it>sequences can be useful for studies of the regions surrounding the replication terminus and for phylogenetic studies of the replication termination and chromosome dimer resolution mechanisms, and they can serve as supporting evidence for GC skew analyses.</p>
<p>Furthermore, we compared the positions of predicted <it>dif </it>sequences with those of the GC skew shift-points to understand the relationship between the <it>dif </it>sequence and the replication terminus using GCSI. As a result, although these two positions were highly correlated in terms of genomic loci, the distance between the GC skew shift-point and the <it>dif </it>site was not correlated with the GCSI. Therefore, although the <it>dif </it>sequence is located near the replication termination site for efficient CDR, the replication termination site is suggested to be at a site other than the <it>dif </it>site.</p>
</sec>
<sec>
<st>
<p>Methods</p>
</st>
<sec>
<st>
<p>Software and sequences</p>
</st>
<p>All analyses in this study were conducted using programs written in Perl with the G-language Genome Analysis Environment, version 1.8.10 <abbrgrp>
<abbr bid="B55">55</abbr>
<abbr bid="B56">56</abbr>
<abbr bid="B57">57</abbr>
</abbrgrp>. Hidden Markov modeling and searching were conducted with HMMER, version 2.3.2 <abbrgrp>
<abbr bid="B45">45</abbr>
</abbrgrp>. The <it>dif </it>sequence is the binding site of the XerCD recombinase; therefore, we first selected 734 circular bacterial chromosomes among 658 species/strains according to their conservation of XerCD using the KEGG (Kyoto Encyclopedia of Genes and Genomes) Orthology database (KO; <abbrgrp>
<abbr bid="B58">58</abbr>
</abbrgrp>). We obtained these sequences from the NCBI FTP Repository <abbrgrp>
<abbr bid="B59">59</abbr>
</abbrgrp>. The following experimentally confirmed (<it>E. coli </it>and <it>B. subtilis</it>) or computationally predicted (<it>F. alni</it>) <it>dif </it>sequences were used as seed sequences for subsequent searches and machine learning:</p>
<p>
<it>E. coli </it>5'-GGTGCGCATAATGTATATTATGTTAAAT-3' <abbrgrp>
<abbr bid="B60">60</abbr>
</abbrgrp>
</p>
<p>
<it>B. subtilis </it>5'-ACTTCCTAGAATATATATTATGTAAACT-3' <abbrgrp>
<abbr bid="B23">23</abbr>
</abbrgrp>
</p>
<p>
<it>F. alni </it>5'-CACGCCGATAATGCACATTATGTCAAGT-3' <abbrgrp>
<abbr bid="B38">38</abbr>
</abbrgrp>
</p>
</sec>
<sec>
<st>
<p>Iterated Hidden Markov Modeling</p>
</st>
<p>XerCD conservation does not immediately imply <it>dif </it>sequence conservation <abbrgrp>
<abbr bid="B20">20</abbr>
</abbrgrp>. Therefore, to determine the phylogenetic conservation patterns of XerCD, we first aligned all XerCD amino acid sequences in the 734 genomes analyzed in this work with those in organisms with the above-mentioned <it>dif </it>sequences using ClustalW <abbrgrp>
<abbr bid="B61">61</abbr>
</abbrgrp>. The average of the XerC and XerD distances calculated from this alignment was used to infer phylogenetic conservation patterns among phyla.</p>
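<p>As an illustration of how the averaged XerC and XerD distances can define the order in which genera are visited, a minimal Python sketch follows. The distance values are hypothetical placeholders, not values from the ClustalW alignment, and the function names are introduced here for illustration only.</p>

```python
def mean_xercd_distance(xerc_distance, xerd_distance):
    """Average of the XerC and XerD alignment distances to the seed organism."""
    return (xerc_distance + xerd_distance) / 2


def iteration_order(distances):
    """Sort genera from the closest to the most distant from the seed organism.

    distances maps a genus name to a (XerC distance, XerD distance) pair.
    """
    averaged = {genus: mean_xercd_distance(c, d) for genus, (c, d) in distances.items()}
    return sorted(averaged, key=averaged.get)


# Hypothetical distances from the Escherichia seed.
example = {
    "Salmonella": (0.06, 0.04),
    "Shigella": (0.01, 0.01),
    "Vibrio": (0.25, 0.21),
}
order = iteration_order(example)
```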
<p>Based on the phylogenetic conservation patterns of XerCD, we iteratively created the hidden Markov models (HMM) for the accurate prediction of <it>dif </it>sequences, seeded with the previously described <it>dif </it>sequences (Figure <figr fid="F4">4A</figr>). Iterated HMM has been shown to build more diverse and potentially more sensitive models than regular HMM, by incorporating distantly homologous sequences while avoiding contamination of the model with non-homologous sequences <abbrgrp>
<abbr bid="B62">62</abbr>
</abbrgrp>, and thus iterative HMM has been frequently utilized in bioinformatics and computational biology <abbrgrp>
<abbr bid="B63">63</abbr>
<abbr bid="B64">64</abbr>
<abbr bid="B65">65</abbr>
<abbr bid="B66">66</abbr>
</abbrgrp>. In this work, the first profile hidden Markov model was created from the <it>dif </it>sequences identified in genomes belonging to the same genus as the genome harboring the seed sequence. For example, in Proteobacteria, the seed sequence came from <it>E. coli</it>; therefore, <it>dif </it>sequences were searched for in 28 genomes belonging to the genus <it>Escherichia </it>by means of fuzzy matching with the seed sequence of <it>E. coli </it>K12 using the Perl module String::Approx 3.26 <abbrgrp>
<abbr bid="B67">67</abbr>
</abbrgrp>. For fuzzy matching, the maximum numbers of insertions, deletions, and substitutions were previously determined to be 0-bp, 0-bp, and 8-bp, respectively <abbrgrp>
<abbr bid="B60">60</abbr>
</abbrgrp>. Likewise, initial profiles were created for Firmicutes based on 24 genomes in the genus <it>Bacillus </it>and for Actinobacteria based on two genomes in the genus <it>Frankia</it>. Based on these initial profile hidden Markov models, <it>dif </it>sequences were predicted in the genomes of the closest genus to the seed genus according to the amino acid sequences of XerCD proteins.</p>
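<p>With no insertions or deletions allowed and up to eight substitutions, the fuzzy matching step reduces to a Hamming-distance scan. The Python sketch below is a stand-in for the String::Approx search, not a reproduction of it; it scans only the forward strand, which is a simplifying assumption, and the function names are hypothetical.</p>

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(1 for x, y in zip(a, b) if x != y)


def fuzzy_find(genome, seed, max_substitutions=8):
    """Return (start, mismatches) for every window within the substitution limit."""
    width = len(seed)
    hits = []
    for start in range(len(genome) - width + 1):
        mismatches = hamming(genome[start:start + width], seed)
        if max_substitutions >= mismatches:
            hits.append((start, mismatches))
    return hits


# E. coli seed dif sequence as given in the Methods.
ECOLI_DIF = "GGTGCGCATAATGTATATTATGTTAAAT"

# Toy target: the seed with two substitutions, flanked by filler bases.
variant = "T" + ECOLI_DIF[1:5] + "A" + ECOLI_DIF[6:]
hits = fuzzy_find("AAAA" + variant + "CCCC", ECOLI_DIF)
```

<p>In practice, a window-by-window scan of a whole genome in pure Python is slow; the sketch is meant only to make the matching criterion explicit.</p>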
<p>In the case of Proteobacteria, an initial profile was created using genomes belonging to the genus <it>Escherichia</it>, and this profile was used to predict <it>dif </it>sequences in the genus <it>Shigella</it>. Subsequently, a new profile was created using the previous profile and the newly predicted <it>dif </it>sequences, and this new profile was used to predict the second nearest genus (in the case of Proteobacteria, <it>Salmonella</it>). In this way, profile creation and <it>dif </it>sequence prediction were iterated in decreasing order of similarity of XerCD from the seed sequences; thus, iterated HMM was conducted for each phylum. Because no <it>dif </it>seed sequences were available for phyla other than the three described above, the three profile hidden Markov models obtained by iterated HMM in Proteobacteria, Firmicutes, and Actinobacteria were used as the initial profiles. At each iterated HMM, predicted candidates were validated according to the following criteria: 1) HMMER score &#8805;10 and E-value &lt; 1.0e-04, 2) leave-one-out cross-validation using the new profiles, and 3) conservation of the palindromic structure. For cross-validation, each time a new profile was created in the iterated HMM, we tested the validity of the training set by leaving out one of the <it>dif </it>sequences from the accumulated set of <it>dif </it>sequences and checking that the prediction of the left-out sequence by training with all of the other <it>dif </it>sequences is always above the threshold for all <it>dif </it>sequences collected up to that iteration. For the palindromic structure, positions 7-12-bp and 17-22-bp of <it>dif </it>sequences, corresponding to the binding sites of XerC and XerD, were checked for complementarities. For example, the palindromic structure of <it>E. 
coli dif </it>sequences in bracket notation is "--(--- ((((((-()-)))))) ---)--", and the conservation threshold is set to more than four pairs of complementarities within the 7-12-bp and 17-22-bp positions of the predicted <it>dif </it>sequences.</p>
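<p>The score and palindrome criteria described above can be sketched in Python as follows. The pairing of position 7 with 22, 8 with 21, and so on is inferred from the bracket notation, and "more than four pairs" is read here as at least five; both are assumptions, and the function names are hypothetical.</p>

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_pairs(dif, left=(7, 12), right=(17, 22)):
    """Count complementary pairs between the two arms (1-based, inclusive ranges)."""
    left_arm = dif[left[0] - 1:left[1]]
    right_arm = dif[right[0] - 1:right[1]]
    # The first base of the left arm pairs with the last base of the right arm.
    return sum(1 for a, b in zip(left_arm, reversed(right_arm)) if COMPLEMENT.get(a) == b)


def passes_palindrome_check(dif, pairs_must_exceed=4):
    """Criterion 3: more than four complementary pairs between the arms."""
    return complementary_pairs(dif) > pairs_must_exceed


def passes_score_check(score, e_value, min_score=10.0, max_e_value=1.0e-04):
    """Criterion 1: HMMER score of at least 10 and E-value below 1.0e-04."""
    return score >= min_score and max_e_value > e_value


ECOLI_DIF = "GGTGCGCATAATGTATATTATGTTAAAT"
pairs = complementary_pairs(ECOLI_DIF)
```

<p>For the <it>E. coli </it>seed, all six positions of the two arms pair, which agrees with the six nested brackets in the notation above.</p>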
<p>Although iterated HMM is based on phyla, this taxonomic unit is sometimes too diverse to accurately follow phylogeny with recursive means. Therefore, prediction was separately conducted in classes instead of phyla for 60 strains, harboring 130 chromosomes for classes &#945;-, &#946;- and &#947;-Proteobacteria. Similarly, a species is sometimes so phylogenetically distant from the seed organism that a profile hidden Markov model from another phylum is more suitable than the profile of its own phylum. When iterated HMM fails in such cases, an alternative seed profile is created using the <it>dif </it>sequences from the top three genomes with the closest XerCD sequences, as determined by alignment using ClustalW (Figure <figr fid="F4">4B</figr>).</p>
<p>The shift-points of the GC skew, where GC skew is calculated as (C - G)/(C + G), were computed using the find_ori_ter function of the G-language GAE, based on the cumulative GC skew <abbrgrp>
<abbr bid="B68">68</abbr>
</abbrgrp> at 1-bp resolution. Although GC skew is widely observed in bacterial species, a number of genomes do not exhibit notable compositional bias <abbrgrp>
<abbr bid="B48">48</abbr>
<abbr bid="B69">69</abbr>
</abbrgrp>. To determine the presence of genomic nucleotide compositional bias, the GC skew Index (GCSI) was calculated for all genomes, and GCSI &#8805; 0.05 was used as the threshold <abbrgrp>
<abbr bid="B48">48</abbr>
<abbr bid="B70">70</abbr>
</abbrgrp>. GCSI quantifies the degree of GC skew using the compositional distance between the leading and lagging strands and the spectral amplitude of the 1-Hz signal of the GC skew graph, obtained by fast Fourier transform. In this study, the replication origin is defined based on the cumulative GC skew at 1-bp resolution using the G-language GAE <abbrgrp>
<abbr bid="B55">55</abbr>
</abbrgrp>.</p>
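<p>The idea of locating shift-points from the cumulative GC skew at 1-bp resolution can be sketched in Python as below. This is not the find_ori_ter implementation of the G-language GAE: it treats the sequence as linear, ignores ambiguous bases, and simply reports the positions of the minimum and maximum of the cumulative (C - G) walk. Which extreme corresponds to the origin and which to the terminus depends on the sign convention and strand, so that assignment is left to the reader.</p>

```python
from itertools import accumulate


def cumulative_gc_skew(sequence):
    """Running sum of +1 for C and -1 for G; other bases contribute 0."""
    steps = [1 if base == "C" else -1 if base == "G" else 0 for base in sequence.upper()]
    return list(accumulate(steps))


def skew_extremes(sequence):
    """Return 0-based indices of the minimum and maximum of the cumulative skew."""
    walk = cumulative_gc_skew(sequence)
    min_index = min(range(len(walk)), key=walk.__getitem__)
    max_index = max(range(len(walk)), key=walk.__getitem__)
    return min_index, max_index


# Toy sequence: a G-rich half followed by a C-rich half.
toy_extremes = skew_extremes("GGGGCCCC")
```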
</sec>
<sec>
<st>
<p>Calculation of the conservation quantity of <it>dif </it>sequences</p>
</st>
<p>The conservation quantity shown in Figure <figr fid="F5">5</figr> was calculated based on the nucleotide variance at each position of the <it>dif </it>sequences. First, we calculated the position-specific base composition of all <it>dif </it>sequences in a group (phylum or class). Subsequently, the variance at that position was calculated from the base composition. For example, when a group with 100 <it>dif </it>sequences has a base composition at the nth position of (A, T, G, C = 100, 0, 0, 0) or (A, T, G, C = 25, 25, 25, 25), the variance is 2500 or 0, respectively. Hence, if the position-specific base composition is biased toward any one base, its high variance indicates a high degree of conservation. These values are normalized to percentages for comparison with other groups in Figure <figr fid="F5">5</figr>. In the case of multiple chromosomes, since these conservation quantities were calculated in each strain, the average value was used for normalization.</p>
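<p>The worked example above is consistent with the sample variance (n - 1 denominator) of the four base counts, which gives 2500 for (100, 0, 0, 0) and 0 for (25, 25, 25, 25). The Python sketch below follows that reading. The percentage normalization, which divides by the variance of a fully conserved position, is an assumption, since the text does not state how the normalization was done; the function names are hypothetical.</p>

```python
def base_counts(sequences, position):
    """Counts of A, T, G and C at a 0-based position across aligned sequences."""
    column = [seq[position] for seq in sequences]
    return [column.count(base) for base in "ATGC"]


def conservation_quantity(counts):
    """Sample variance of the four base counts at one position."""
    n = len(counts)
    mean = sum(counts) / n
    return sum((c - mean) ** 2 for c in counts) / (n - 1)


def conservation_percent(counts):
    """Variance as a percentage of the value for a fully conserved position (assumed)."""
    total = sum(counts)
    fully_conserved = conservation_quantity([total, 0, 0, 0])
    return 100 * conservation_quantity(counts) / fully_conserved


conserved = conservation_quantity([100, 0, 0, 0])
uniform = conservation_quantity([25, 25, 25, 25])
```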
</sec>
</sec>
<sec>
<st>
<p>Authors' contributions</p>
</st>
<p>NK carried out the analysis, and NK and KA wrote the manuscript. MT supervised the work, and all authors read and approved the final manuscript.</p>
</sec>
</bdy><bm>
<ack>
<sec>
<st>
<p>Acknowledgements</p>
</st>
<p>The authors would like to thank the anonymous reviewers for thoughtful comments and suggestions. This research was supported by a Grant-in-Aid for JSPS Fellows and funds from the Yamagata Prefectural Government and Tsuruoka City.</p>
</sec>
</ack>
<refgrp><bibl id="B1"><title><p>Multiple pathways process stalled replication forks</p></title><aug><au><snm>Michel</snm><fnm>B</fnm></au><au><snm>Grompone</snm><fnm>G</fnm></au><au><snm>Flor&#232;s</snm><fnm>MJ</fnm></au><au><snm>Bidnenko</snm><fnm>V</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>2004</pubdate><volume>101</volume><fpage>12783</fpage><lpage>12788</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.0401586101</pubid><pubid idtype="pmcid">516472</pubid><pubid idtype="pmpid">15328417</pubid></pubidlist></xrefbib></bibl><bibl id="B2"><title><p>Genetic recombination and the cell cycle: What we have learned from chromosome dimers</p></title><aug><au><snm>Lesterlin</snm><fnm>C</fnm></au><au><snm>Barre</snm><fnm>FX</fnm></au><au><snm>Cornet</snm><fnm>F</fnm></au></aug><source>Mol Microbiol</source><pubdate>2004</pubdate><volume>54</volume><fpage>1151</fpage><lpage>1160</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1111/j.1365-2958.2004.04356.x</pubid><pubid idtype="pmpid" link="fulltext">15554958</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>Bacterial chromosome dynamics</p></title><aug><au><snm>Sherratt</snm><fnm>D</fnm></au></aug><source>Science</source><pubdate>2003</pubdate><volume>301</volume><fpage>780</fpage><lpage>785</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1084780</pubid><pubid idtype="pmpid" link="fulltext">12907786</pubid></pubidlist></xrefbib></bibl><bibl id="B4"><title><p>Two related recombinases are required for site-specific recombination at dif and cer in E. 
coli K12</p></title><aug><au><snm>Blakely</snm><fnm>G</fnm></au><au><snm>May</snm><fnm>G</fnm></au><au><snm>McCulloch</snm><fnm>R</fnm></au><au><snm>Arciszewska</snm><fnm>LK</fnm></au><au><snm>Burke</snm><fnm>M</fnm></au><au><snm>Lovett</snm><fnm>ST</fnm></au><au><snm>Sherratt</snm><fnm>DJ</fnm></au></aug><source>Cell</source><pubdate>1993</pubdate><volume>75</volume><fpage>351</fpage><lpage>361</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/0092-8674(93)80076-Q</pubid><pubid idtype="pmpid" link="fulltext">8402918</pubid></pubidlist></xrefbib></bibl><bibl id="B5"><title><p>Site-specific recombination promoted by a short DNA segment of plasmid R1 and by a homologous segment in the terminus region of the Escherichia coli chromosome</p></title><aug><au><snm>Clerget</snm><fnm>M</fnm></au></aug><source>New Biol</source><pubdate>1991</pubdate><volume>3</volume><fpage>780</fpage><lpage>788</lpage><xrefbib><pubid idtype="pmpid">1931823</pubid></xrefbib></bibl><bibl id="B6"><title><p>The cytoplasmic domain of FtsK protein is required for resolution of chromosome dimers</p></title><aug><au><snm>Steiner</snm><fnm>W</fnm></au><au><snm>Liu</snm><fnm>G</fnm></au><au><snm>Donachie</snm><fnm>WD</fnm></au><au><snm>Kuempel</snm><fnm>P</fnm></au></aug><source>Mol Microbiol</source><pubdate>1999</pubdate><volume>31</volume><issue>2</issue><fpage>579</fpage><lpage>583</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1046/j.1365-2958.1999.01198.x</pubid><pubid idtype="pmpid" link="fulltext">10027974</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>FtsK functions in the processing of a Holliday junction intermediate during bacterial chromosome segregation</p></title><aug><au><snm>Barre</snm><fnm>FX</fnm></au><au><snm>Aroyo</snm><fnm>M</fnm></au><au><snm>Colloms</snm><fnm>SD</fnm></au><au><snm>Helfrich</snm><fnm>A</fnm></au><au><snm>Cornet</snm><fnm>F</fnm></au><au><snm>Sherratt</snm><fnm>DJ</fnm></au></aug><source>Genes 
Dev</source><pubdate>2000</pubdate><volume>14</volume><issue>23</issue><fpage>2976</fpage><lpage>2988</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gad.188700</pubid><pubid idtype="pmcid">317095</pubid><pubid idtype="pmpid">11114887</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><title><p>FtsK activities in Xer recombination, DNA mobilization and cell division involve overlapping and separate domains of the protein</p></title><aug><au><snm>Bigot</snm><fnm>S</fnm></au><au><snm>Corre</snm><fnm>J</fnm></au><au><snm>Louarn</snm><fnm>JM</fnm></au><au><snm>Cornet</snm><fnm>F</fnm></au><au><snm>Barre</snm><fnm>FX</fnm></au></aug><source>Mol Microbiol</source><pubdate>2004</pubdate><volume>54</volume><issue>4</issue><fpage>876</fpage><lpage>886</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1111/j.1365-2958.2004.04335.x</pubid><pubid idtype="pmpid" link="fulltext">15522074</pubid></pubidlist></xrefbib></bibl><bibl id="B9"><title><p>Delayed activation of Xer recombination at dif by FtsK during septum assembly in Escherichia coli</p></title><aug><au><snm>Kennedy</snm><fnm>SP</fnm></au><au><snm>Chevalier</snm><fnm>F</fnm></au><au><snm>Barre</snm><fnm>FX</fnm></au></aug><source>Mol Microbiol</source><pubdate>2008</pubdate><volume>68</volume><issue>4</issue><fpage>1018</fpage><lpage>1028</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1111/j.1365-2958.2008.06212.x</pubid><pubid idtype="pmpid" link="fulltext">18363794</pubid></pubidlist></xrefbib></bibl><bibl id="B10"><title><p>Fully efficient chromosome dimer resolution in Escherichia coli cells lacking the integral membrane domain of FtsK</p></title><aug><au><snm>Dubarry</snm><fnm>N</fnm></au><au><snm>Barre</snm><fnm>FX</fnm></au></aug><source>EMBO J</source><pubdate>2010</pubdate><volume>29</volume><fpage>597</fpage><lpage>605</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/emboj.2009.381</pubid><pubid idtype="pmpid" link="fulltext">20033058</pubid></pubidlist></xrefbib></bibl><bibl 
id="B11"><title><p>Fast, DNA-sequence independent translocation by FtsK in a single-molecule experiment</p></title><aug><au><snm>Saleh</snm><fnm>OA</fnm></au><au><snm>P&#233;rals</snm><fnm>C</fnm></au><au><snm>Barre</snm><fnm>FX</fnm></au><au><snm>Allemand</snm><fnm>JF</fnm></au></aug><source>EMBO J</source><pubdate>2004</pubdate><volume>23</volume><issue>12</issue><fpage>2430</fpage><lpage>2439</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/sj.emboj.7600242</pubid><pubid idtype="pmcid">423284</pubid><pubid idtype="pmpid">15167891</pubid></pubidlist></xrefbib></bibl><bibl id="B12"><title><p>KOPS: DNA motifs that control E. coli chromosome segregation by orienting the FtsK translocase</p></title><aug><au><snm>Bigot</snm><fnm>S</fnm></au><au><snm>Saleh</snm><fnm>OA</fnm></au><au><snm>Lesterlin</snm><fnm>C</fnm></au><au><snm>Pages</snm><fnm>C</fnm></au><au><snm>El Karoui</snm><fnm>M</fnm></au><au><snm>Dennis</snm><fnm>C</fnm></au><au><snm>Grigoriev</snm><fnm>M</fnm></au><au><snm>Allemand</snm><fnm>JF</fnm></au><au><snm>Barre</snm><fnm>FX</fnm></au><au><snm>Cornet</snm><fnm>F</fnm></au></aug><source>EMBO J</source><pubdate>2005</pubdate><volume>24</volume><issue>21</issue><fpage>3770</fpage><lpage>3780</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/sj.emboj.7600835</pubid><pubid idtype="pmcid">1276719</pubid><pubid idtype="pmpid">16211009</pubid></pubidlist></xrefbib></bibl><bibl id="B13"><title><p>Oriented loading of FtsK on KOPS</p></title><aug><au><snm>Bigot</snm><fnm>S</fnm></au><au><snm>Saleh</snm><fnm>OA</fnm></au><au><snm>Cornet</snm><fnm>F</fnm></au><au><snm>Allemand</snm><fnm>JF</fnm></au><au><snm>Barre</snm><fnm>FX</fnm></au></aug><source>Nat Struct Mol Biol</source><pubdate>2006</pubdate><volume>13</volume><issue>11</issue><fpage>1026</fpage><lpage>1028</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nsmb1159</pubid><pubid idtype="pmpid" link="fulltext">17041597</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><title><p>FtsK is a DNA 
motor protein that activates chromosome dimer resolution by switching the catalytic state of the XerC and XerD recombinases</p></title><aug><au><snm>Aussel</snm><fnm>L</fnm></au><au><snm>Barre</snm><fnm>FX</fnm></au><au><snm>Aroyo</snm><fnm>M</fnm></au><au><snm>Stasiak</snm><fnm>A</fnm></au><au><snm>Stasiak</snm><fnm>AZ</fnm></au><au><snm>Sherratt</snm><fnm>D</fnm></au></aug><source>Cell</source><pubdate>2002</pubdate><volume>108</volume><fpage>195</fpage><lpage>205</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S0092-8674(02)00624-4</pubid><pubid idtype="pmpid" link="fulltext">11832210</pubid></pubidlist></xrefbib></bibl><bibl id="B15"><title><p>Dissection of a functional interaction between the DNA translocase, FtsK, and the XerD recombinase</p></title><aug><au><snm>Yates</snm><fnm>J</fnm></au><au><snm>Zhekov</snm><fnm>I</fnm></au><au><snm>Baker</snm><fnm>R</fnm></au><au><snm>Eklund</snm><fnm>B</fnm></au><au><snm>Sherratt</snm><fnm>DJ</fnm></au><au><snm>Arciszewska</snm><fnm>LK</fnm></au></aug><source>Mol Microbiol</source><pubdate>2006</pubdate><volume>59</volume><fpage>1754</fpage><lpage>1766</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1111/j.1365-2958.2005.05033.x</pubid><pubid idtype="pmcid">1413583</pubid><pubid idtype="pmpid">16553881</pubid></pubidlist></xrefbib></bibl><bibl id="B16"><title><p>Species specificity in the activation of Xer recombination at dif by FtsK</p></title><aug><au><snm>Yates</snm><fnm>J</fnm></au><au><snm>Aroyo</snm><fnm>M</fnm></au><au><snm>Sherratt</snm><fnm>DJ</fnm></au><au><snm>Barre</snm><fnm>FX</fnm></au></aug><source>Mol Microbiol</source><pubdate>2003</pubdate><volume>49</volume><issue>1</issue><fpage>241</fpage><lpage>249</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1046/j.1365-2958.2003.03574.x</pubid><pubid idtype="pmpid" link="fulltext">12823825</pubid></pubidlist></xrefbib></bibl><bibl id="B17"><title><p>Asymmetric activation of Xer site-specific recombination by 
FtsK</p></title><aug><au><snm>Massey</snm><fnm>TH</fnm></au><au><snm>Aussel</snm><fnm>L</fnm></au><au><snm>Barre</snm><fnm>FX</fnm></au><au><snm>Sherratt</snm><fnm>DJ</fnm></au></aug><source>EMBO Rep</source><pubdate>2004</pubdate><volume>5</volume><issue>4</issue><fpage>399</fpage><lpage>404</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/sj.embor.7400116</pubid><pubid idtype="pmcid">1299027</pubid><pubid idtype="pmpid">15031713</pubid></pubidlist></xrefbib></bibl><bibl id="B18"><title><p>Asymmetric DNA requirements in Xer recombination activation by FtsK</p></title><aug><au><snm>Bonn&#233;</snm><fnm>L</fnm></au><au><snm>Bigot</snm><fnm>S</fnm></au><au><snm>Chevalier</snm><fnm>F</fnm></au><au><snm>Allemand</snm><fnm>JF</fnm></au><au><snm>Barre</snm><fnm>FX</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2009</pubdate><volume>37</volume><issue>7</issue><fpage>2371</fpage><lpage>2380</lpage><xrefbib><pubidlist><pubid idtype="pmcid">2673442</pubid><pubid idtype="pmpid">19246541</pubid></pubidlist></xrefbib></bibl><bibl id="B19"><title><p>oriC region and replication termination site, dif, of the Xanthomonas campestris pv. 
campestris 17 chromosome</p></title><aug><au><snm>Yen</snm><fnm>MR</fnm></au><au><snm>Lin</snm><fnm>NT</fnm></au><au><snm>Hung</snm><fnm>CH</fnm></au><au><snm>Choy</snm><fnm>KT</fnm></au><au><snm>Weng</snm><fnm>SF</fnm></au><au><snm>Tseng</snm><fnm>YH</fnm></au></aug><source>Appl Environ Microbiol</source><pubdate>2002</pubdate><volume>68</volume><fpage>2924</fpage><lpage>2933</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/AEM.68.6.2924-2933.2002</pubid><pubid idtype="pmcid">123971</pubid><pubid idtype="pmpid">12039751</pubid></pubidlist></xrefbib></bibl><bibl id="B20"><title><p>FtsK-dependent dimer resolution on multiple chromosomes in the pathogen Vibrio cholerae</p></title><aug><au><snm>Val</snm><fnm>ME</fnm></au><au><snm>Kennedy</snm><fnm>SP</fnm></au><au><snm>El Karoui</snm><fnm>M</fnm></au><au><snm>Bonn&#233;</snm><fnm>L</fnm></au><au><snm>Chevalier</snm><fnm>F</fnm></au><au><snm>Barre</snm><fnm>FX</fnm></au></aug><source>PLoS Genet</source><pubdate>2008</pubdate><volume>4</volume><fpage>e1000201</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pgen.1000201</pubid><pubid idtype="pmcid">2533119</pubid><pubid idtype="pmpid">18818731</pubid></pubidlist></xrefbib></bibl><bibl id="B21"><title><p>The bifunctional FtsK protein mediates chromosome partitioning and cell division in Caulobacter</p></title><aug><au><snm>Wang</snm><fnm>SC</fnm></au><au><snm>West</snm><fnm>L</fnm></au><au><snm>Shapiro</snm><fnm>L</fnm></au></aug><source>J Bacteriol</source><pubdate>2006</pubdate><volume>188</volume><issue>4</issue><fpage>1497</fpage><lpage>1508</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/JB.188.4.1497-1508.2006</pubid><pubid idtype="pmcid">1367234</pubid><pubid idtype="pmpid">16452433</pubid></pubidlist></xrefbib></bibl><bibl id="B22"><title><p>Site-specific recombination at dif by Haemophilus influenzae 
XerC</p></title><aug><au><snm>Neilson</snm><fnm>L</fnm></au><au><snm>Blakely</snm><fnm>G</fnm></au><au><snm>Sherratt</snm><fnm>DJ</fnm></au></aug><source>Mol Microbiol</source><pubdate>1999</pubdate><volume>31</volume><fpage>915</fpage><lpage>926</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1046/j.1365-2958.1999.01231.x</pubid><pubid idtype="pmpid" link="fulltext">10048034</pubid></pubidlist></xrefbib></bibl><bibl id="B23"><title><p>Identification and characterization of the dif Site from Bacillus subtilis</p></title><aug><au><snm>Sciochetti</snm><fnm>SA</fnm></au><au><snm>Piggot</snm><fnm>PJ</fnm></au><au><snm>Blakely</snm><fnm>GW</fnm></au></aug><source>J Bacteriol</source><pubdate>2001</pubdate><volume>183</volume><fpage>1058</fpage><lpage>1068</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/JB.183.3.1058-1068.2001</pubid><pubid idtype="pmcid">94974</pubid><pubid idtype="pmpid">11208805</pubid></pubidlist></xrefbib></bibl><bibl id="B24"><title><p>Restriction of the activity of the recombination site dif to a small zone of the Escherichia coli chromosome</p></title><aug><au><snm>Cornet</snm><fnm>F</fnm></au><au><snm>Louarn</snm><fnm>J</fnm></au><au><snm>Patte</snm><fnm>J</fnm></au><au><snm>Louarn</snm><fnm>JM</fnm></au></aug><source>Genes Dev</source><pubdate>1996</pubdate><volume>10</volume><fpage>1152</fpage><lpage>1161</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gad.10.9.1152</pubid><pubid idtype="pmpid" link="fulltext">8654930</pubid></pubidlist></xrefbib></bibl><bibl id="B25"><title><p>The unconventional Xer recombination machinery of Streptococci/Lactococci</p></title><aug><au><snm>Le 
Bourgeois</snm><fnm>P</fnm></au><au><snm>Bugarel</snm><fnm>M</fnm></au><au><snm>Campo</snm><fnm>N</fnm></au><au><snm>Daveran-Mingot</snm><fnm>ML</fnm></au><au><snm>Labont&#233;</snm><fnm>J</fnm></au><au><snm>Lanfranchi</snm><fnm>D</fnm></au><au><snm>Lautier</snm><fnm>T</fnm></au><au><snm>Pag&#232;s</snm><fnm>C</fnm></au><au><snm>Ritzenthaler</snm><fnm>P</fnm></au></aug><source>PLoS Genet</source><pubdate>2007</pubdate><volume>3</volume><fpage>e117</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pgen.0030117</pubid><pubid idtype="pmcid">1914069</pubid><pubid idtype="pmpid">17630835</pubid></pubidlist></xrefbib></bibl><bibl id="B26"><title><p>Are two better than one? Analysis of an FtsK/Xer recombination system that uses a single recombinase</p></title><aug><au><snm>Nolivos</snm><fnm>S</fnm></au><au><snm>Pages</snm><fnm>C</fnm></au><au><snm>Rousseau</snm><fnm>P</fnm></au><au><snm>Le Bourgeois</snm><fnm>P</fnm></au><au><snm>Cornet</snm><fnm>F</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2010</pubdate><volume>38</volume><issue>19</issue><fpage>6477</fpage><lpage>6489</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkq507</pubid><pubid idtype="pmcid">2965235</pubid><pubid idtype="pmpid">20542912</pubid></pubidlist></xrefbib></bibl><bibl id="B27"><title><p>Chromosomal constraints in Gram-positive bacteria revealed by artificial inversions</p></title><aug><au><snm>Campo</snm><fnm>N</fnm></au><au><snm>Dias</snm><fnm>MJ</fnm></au><au><snm>Daveran-Mingot</snm><fnm>ML</fnm></au><au><snm>Ritzenthaler</snm><fnm>P</fnm></au><au><snm>Le Bourgeois</snm><fnm>P</fnm></au></aug><source>Mol Microbiol</source><pubdate>2004</pubdate><volume>51</volume><fpage>511</fpage><lpage>522</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1046/j.1365-2958.2003.03847.x</pubid><pubid idtype="pmpid" link="fulltext">14756790</pubid></pubidlist></xrefbib></bibl><bibl id="B28"><title><p>Plasmids carrying cloned fragments of RF DNA from the filamentous phage (phi)Lf can be integrated 
into the host chromosome via site-specific integration and homologous recombination</p></title><aug><au><snm>Lin</snm><fnm>NT</fnm></au><au><snm>Chang</snm><fnm>RY</fnm></au><au><snm>Lee</snm><fnm>SJ</fnm></au><au><snm>Tseng</snm><fnm>YH</fnm></au></aug><source>Mol Genet Genomics</source><pubdate>2001</pubdate><volume>266</volume><issue>3</issue><fpage>425</fpage><lpage>435</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1007/s004380100532</pubid><pubid idtype="pmpid" link="fulltext">11713672</pubid></pubidlist></xrefbib></bibl><bibl id="B29"><title><p>Filamentous phage integration requires the host recombinases XerC and XerD</p></title><aug><au><snm>Huber</snm><fnm>KE</fnm></au><au><snm>Waldor</snm><fnm>MK</fnm></au></aug><source>Nature</source><pubdate>2002</pubdate><volume>417</volume><issue>6889</issue><fpage>656</fpage><lpage>659</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature00782</pubid><pubid idtype="pmpid" link="fulltext">12050668</pubid></pubidlist></xrefbib></bibl><bibl id="B30"><title><p>VGJ phi, a novel filamentous phage of Vibrio cholerae, integrates into the same chromosomal site as CTX phi</p></title><aug><au><snm>Campos</snm><fnm>J</fnm></au><au><snm>Mart&#237;nez</snm><fnm>E</fnm></au><au><snm>Suzarte</snm><fnm>E</fnm></au><au><snm>Rodr&#237;guez</snm><fnm>BL</fnm></au><au><snm>Marrero</snm><fnm>K</fnm></au><au><snm>Silva</snm><fnm>Y</fnm></au><au><snm>Led&#243;n</snm><fnm>T</fnm></au><au><snm>del Sol</snm><fnm>R</fnm></au><au><snm>Fando</snm><fnm>R</fnm></au></aug><source>J Bacteriol</source><pubdate>2003</pubdate><volume>185</volume><issue>19</issue><fpage>5685</fpage><lpage>5696</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/JB.185.19.5685-5696.2003</pubid><pubid idtype="pmcid">193952</pubid><pubid idtype="pmpid">13129939</pubid></pubidlist></xrefbib></bibl><bibl id="B31"><title><p>The single-stranded genome of phage CTX is the form used for integration into the genome of Vibrio 
cholerae</p></title><aug><au><snm>Val</snm><fnm>ME</fnm></au><au><snm>Bouvier</snm><fnm>M</fnm></au><au><snm>Campos</snm><fnm>J</fnm></au><au><snm>Sherratt</snm><fnm>D</fnm></au><au><snm>Cornet</snm><fnm>F</fnm></au><au><snm>Mazel</snm><fnm>D</fnm></au><au><snm>Barre</snm><fnm>FX</fnm></au></aug><source>Mol Cell</source><pubdate>2005</pubdate><volume>19</volume><issue>4</issue><fpage>559</fpage><lpage>566</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.molcel.2005.07.002</pubid><pubid idtype="pmpid" link="fulltext">16109379</pubid></pubidlist></xrefbib></bibl><bibl id="B32"><title><p>A horizontally acquired filamentous phage contributes to the pathogenicity of the plague bacillus</p></title><aug><au><snm>Derbise</snm><fnm>A</fnm></au><au><snm>Chenal-Francisque</snm><fnm>V</fnm></au><au><snm>Pouillot</snm><fnm>F</fnm></au><au><snm>Fayolle</snm><fnm>C</fnm></au><au><snm>Pr&#233;vost</snm><fnm>MC</fnm></au><au><snm>M&#233;digue</snm><fnm>C</fnm></au><au><snm>Hinnebusch</snm><fnm>BJ</fnm></au><au><snm>Carniel</snm><fnm>E</fnm></au></aug><source>Mol Microbiol</source><pubdate>2007</pubdate><volume>63</volume><issue>4</issue><fpage>1145</fpage><lpage>1157</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1111/j.1365-2958.2006.05570.x</pubid><pubid idtype="pmpid" link="fulltext">17238929</pubid></pubidlist></xrefbib></bibl><bibl id="B33"><title><p>VEJ{phi}, a novel filamentous phage of Vibrio cholerae able to transduce the cholera toxin genes</p></title><aug><au><snm>Campos</snm><fnm>J</fnm></au><au><snm>Mart&#237;nez</snm><fnm>E</fnm></au><au><snm>Izquierdo</snm><fnm>Y</fnm></au><au><snm>Fando</snm><fnm>R</fnm></au></aug><source>Microbiology</source><pubdate>2010</pubdate><volume>156</volume><issue>Pt 1</issue><fpage>108</fpage><lpage>115</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1099/mic.0.032235-0</pubid><pubid idtype="pmpid" link="fulltext">19833774</pubid></pubidlist></xrefbib></bibl><bibl id="B34"><title><p>Molecular keys of the tropism of 
integration of the cholera toxin phage</p></title><aug><au><snm>Das</snm><fnm>B</fnm></au><au><snm>Bischerour</snm><fnm>J</fnm></au><au><snm>Val</snm><fnm>ME</fnm></au><au><snm>Barre</snm><fnm>FX</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>2010</pubdate><volume>107</volume><issue>9</issue><fpage>4377</fpage><lpage>4382</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.0910212107</pubid><pubid idtype="pmcid">2840090</pubid><pubid idtype="pmpid">20133778</pubid></pubidlist></xrefbib></bibl><bibl id="B35"><title><p>Smarter than the average phage</p></title><aug><au><snm>Blakely</snm><fnm>GW</fnm></au></aug><source>Mol Microbiol</source><pubdate>2004</pubdate><volume>54</volume><issue>4</issue><fpage>851</fpage><lpage>854</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1111/j.1365-2958.2004.04330.x</pubid><pubid idtype="pmpid" link="fulltext">15522071</pubid></pubidlist></xrefbib></bibl><bibl id="B36"><title><p>Characterization of XerC- and XerD-dependent CTX phage integration in Vibrio cholerae</p></title><aug><au><snm>McLeod</snm><fnm>SM</fnm></au><au><snm>Waldor</snm><fnm>MK</fnm></au></aug><source>Mol Microbiol</source><pubdate>2004</pubdate><volume>54</volume><issue>4</issue><fpage>935</fpage><lpage>947</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1111/j.1365-2958.2004.04309.x</pubid><pubid idtype="pmpid" link="fulltext">15522078</pubid></pubidlist></xrefbib></bibl><bibl id="B37"><title><p>Asymmetric substitution patterns in the two DNA strands of bacteria</p></title><aug><au><snm>Lobry</snm><fnm>JR</fnm></au></aug><source>Mol Biol Evol</source><pubdate>1996</pubdate><volume>13</volume><fpage>660</fpage><lpage>665</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">8676740</pubid></xrefbib></bibl><bibl id="B38"><title><p>Mutational bias suggests that replication termination occurs near the dif site, not at Ter 
sites</p></title><aug><au><snm>Hendrickson</snm><fnm>H</fnm></au><au><snm>Lawrence</snm><fnm>JG</fnm></au></aug><source>Mol Microbiol</source><pubdate>2007</pubdate><volume>64</volume><fpage>42</fpage><lpage>56</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1111/j.1365-2958.2007.05596.x</pubid><pubid idtype="pmpid" link="fulltext">17376071</pubid></pubidlist></xrefbib></bibl><bibl id="B39"><title><p>Evidence for a fixed termination site of chromosome replication in Escherichia coli K12</p></title><aug><au><snm>Louarn</snm><fnm>J</fnm></au><au><snm>Patte</snm><fnm>J</fnm></au><au><snm>Louarn</snm><fnm>JM</fnm></au></aug><source>J Mol Biol</source><pubdate>1977</pubdate><volume>115</volume><fpage>295</fpage><lpage>314</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/0022-2836(77)90156-5</pubid><pubid idtype="pmpid" link="fulltext">338909</pubid></pubidlist></xrefbib></bibl><bibl id="B40"><title><p>Inhibition of replication forks exiting the terminus region of the Escherichia coli chromosome occurs at two loci separated by 5 min</p></title><aug><au><snm>de Massy</snm><fnm>B</fnm></au><au><snm>B&#233;jar</snm><fnm>S</fnm></au><au><snm>Louarn</snm><fnm>J</fnm></au><au><snm>Louarn</snm><fnm>JM</fnm></au><au><snm>Bouch&#233;</snm><fnm>JP</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>1987</pubdate><volume>84</volume><fpage>1759</fpage><lpage>1763</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.84.7.1759</pubid><pubid idtype="pmcid">304520</pubid><pubid idtype="pmpid">3550797</pubid></pubidlist></xrefbib></bibl><bibl id="B41"><title><p>The terminus region of the Escherichia coli chromosome contains two separate loci that exhibit polar inhibition of replication</p></title><aug><au><snm>Hill</snm><fnm>TM</fnm></au><au><snm>Henson</snm><fnm>JM</fnm></au><au><snm>Kuempel</snm><fnm>PL</fnm></au></aug><source>Proc Natl Acad Sci 
USA</source><pubdate>1987</pubdate><volume>84</volume><fpage>1754</fpage><lpage>1758</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.84.7.1754</pubid><pubid idtype="pmcid">304519</pubid><pubid idtype="pmpid">3550796</pubid></pubidlist></xrefbib></bibl><bibl id="B42"><title><p>Termination structures in the Escherichia coli chromosome replication fork trap</p></title><aug><au><snm>Duggin</snm><fnm>IG</fnm></au><au><snm>Bell</snm><fnm>SD</fnm></au></aug><source>J Mol Biol</source><pubdate>2009</pubdate><volume>387</volume><fpage>532</fpage><lpage>539</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.jmb.2009.02.027</pubid><pubid idtype="pmpid" link="fulltext">19233209</pubid></pubidlist></xrefbib></bibl><bibl id="B43"><title><p>The replication fork trap and termination of chromosome replication</p></title><aug><au><snm>Duggin</snm><fnm>IG</fnm></au><au><snm>Wake</snm><fnm>RG</fnm></au><au><snm>Bell</snm><fnm>SD</fnm></au><au><snm>Hill</snm><fnm>TM</fnm></au></aug><source>Mol Microbiol</source><pubdate>2008</pubdate><volume>70</volume><fpage>1323</fpage><lpage>1333</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1111/j.1365-2958.2008.06500.x</pubid><pubid idtype="pmpid" link="fulltext">19019156</pubid></pubidlist></xrefbib></bibl><bibl id="B44"><title><p>A greedy algorithm for aligning DNA sequences</p></title><aug><au><snm>Zhang</snm><fnm>Z</fnm></au><au><snm>Schwartz</snm><fnm>S</fnm></au><au><snm>Wagner</snm><fnm>L</fnm></au><au><snm>Miller</snm><fnm>W</fnm></au></aug><source>J Comput Biol</source><pubdate>2000</pubdate><volume>7</volume><fpage>203</fpage><lpage>214</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1089/10665270050081478</pubid><pubid idtype="pmpid" link="fulltext">10890397</pubid></pubidlist></xrefbib></bibl><bibl id="B45"><title><p>Profile hidden Markov 
models</p></title><aug><au><snm>Eddy</snm><fnm>SR</fnm></au></aug><source>Bioinformatics</source><pubdate>1998</pubdate><volume>14</volume><fpage>755</fpage><lpage>763</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/14.9.755</pubid><pubid idtype="pmpid" link="fulltext">9918945</pubid></pubidlist></xrefbib></bibl><bibl id="B46"><title><p>The dif/Xer recombination systems in proteobacteria</p></title><aug><au><snm>Carnoy</snm><fnm>C</fnm></au><au><snm>Roten</snm><fnm>CA</fnm></au></aug><source>PLoS One</source><pubdate>2009</pubdate><volume>4</volume><fpage>e6531</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pone.0006531</pubid><pubid idtype="pmcid">2731167</pubid><pubid idtype="pmpid">19727445</pubid></pubidlist></xrefbib></bibl><bibl id="B47"><title><p>YASS: enhancing the sensitivity of DNA similarity search</p></title><aug><au><snm>Noe</snm><fnm>L</fnm></au><au><snm>Kucherov</snm><fnm>G</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2005</pubdate><issue>33 Web</issue><fpage>W540</fpage><lpage>543</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gki478</pubid><pubid idtype="pmcid">1160238</pubid><pubid idtype="pmpid">15980530</pubid></pubidlist></xrefbib></bibl><bibl id="B48"><title><p>The GC Skew Index: a measure of genomic compositional asymmetry and the degree of replicational selection</p></title><aug><au><snm>Arakawa</snm><fnm>K</fnm></au><au><snm>Tomita</snm><fnm>M</fnm></au></aug><source>Evol Bioinform Online</source><pubdate>2007</pubdate><volume>3</volume><fpage>159</fpage><lpage>168</lpage><xrefbib><pubidlist><pubid idtype="pmcid">2684130</pubid><pubid idtype="pmpid">19461976</pubid></pubidlist></xrefbib></bibl><bibl id="B49"><title><p>Origin of replication in circular prokaryotic 
chromosomes</p></title><aug><au><snm>Worning</snm><fnm>P</fnm></au><au><snm>Jensen</snm><fnm>LJ</fnm></au><au><snm>Hallin</snm><fnm>PF</fnm></au><au><snm>Staerfeldt</snm><fnm>HH</fnm></au><au><snm>Ussery</snm><fnm>DW</fnm></au></aug><source>Environ Microbiol</source><pubdate>2006</pubdate><volume>8</volume><issue>2</issue><fpage>353</fpage><lpage>361</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1111/j.1462-2920.2005.00917.x</pubid><pubid idtype="pmpid" link="fulltext">16423021</pubid></pubidlist></xrefbib></bibl><bibl id="B50"><title><p>Analysis of the terminus region of the Caulobacter crescentus chromosome and identification of the dif site</p></title><aug><au><snm>Jensen</snm><fnm>RB</fnm></au></aug><source>J Bacteriol</source><pubdate>2006</pubdate><volume>188</volume><fpage>6016</fpage><lpage>6019</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/JB.00330-06</pubid><pubid idtype="pmcid">1540080</pubid><pubid idtype="pmpid">16885470</pubid></pubidlist></xrefbib></bibl><bibl id="B51"><title><p>Recombinase binding specificity at the chromosome dimer resolution site dif of Escherichia coli</p></title><aug><au><snm>Hayes</snm><fnm>F</fnm></au><au><snm>Sherratt</snm><fnm>DJ</fnm></au></aug><source>J Mol Biol</source><pubdate>1997</pubdate><volume>266</volume><issue>3</issue><fpage>525</fpage><lpage>537</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1006/jmbi.1996.0828</pubid><pubid idtype="pmpid" link="fulltext">9067608</pubid></pubidlist></xrefbib></bibl><bibl id="B52"><title><p>Evidence of a ter specific binding protein essential for the termination reaction of DNA replication in Escherichia coli</p></title><aug><au><snm>Kobayashi</snm><fnm>T</fnm></au><au><snm>Hidaka</snm><fnm>M</fnm></au><au><snm>Horiuchi</snm><fnm>T</fnm></au></aug><source>EMBO J</source><pubdate>1989</pubdate><volume>8</volume><fpage>2435</fpage><lpage>2441</lpage><xrefbib><pubidlist><pubid idtype="pmcid">401190</pubid><pubid 
idtype="pmpid">2551684</pubid></pubidlist></xrefbib></bibl><bibl id="B53"><title><p>Characterization of DinR, the Bacillus subtilis SOS repressor</p></title><aug><au><snm>Winterling</snm><fnm>KW</fnm></au><au><snm>Levine</snm><fnm>AS</fnm></au><au><snm>Yasbin</snm><fnm>RE</fnm></au><au><snm>Woodgate</snm><fnm>R</fnm></au></aug><source>J Bacteriol</source><pubdate>1997</pubdate><volume>179</volume><issue>5</issue><fpage>1698</fpage><lpage>1703</lpage><xrefbib><pubidlist><pubid idtype="pmcid">178884</pubid><pubid idtype="pmpid">9045831</pubid></pubidlist></xrefbib></bibl><bibl id="B54"><title><p>Is there a role for replication fork asymmetry in the distribution of genes in bacterial genomes?</p></title><aug><au><snm>Rocha</snm><fnm>EP</fnm></au></aug><source>Trends Microbiol</source><pubdate>2002</pubdate><volume>10</volume><fpage>393</fpage><lpage>395</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S0966-842X(02)02420-4</pubid><pubid idtype="pmpid" link="fulltext">12217498</pubid></pubidlist></xrefbib></bibl><bibl id="B55"><title><p>G-language Genome Analysis Environment: a workbench for nucleotide sequence data mining</p></title><aug><au><snm>Arakawa</snm><fnm>K</fnm></au><au><snm>Mori</snm><fnm>K</fnm></au><au><snm>Ikeda</snm><fnm>K</fnm></au><au><snm>Matsuzaki</snm><fnm>T</fnm></au><au><snm>Kobayashi</snm><fnm>Y</fnm></au><au><snm>Tomita</snm><fnm>M</fnm></au></aug><source>Bioinformatics</source><pubdate>2003</pubdate><volume>19</volume><fpage>305</fpage><lpage>306</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/19.2.305</pubid><pubid idtype="pmpid" link="fulltext">12538262</pubid></pubidlist></xrefbib></bibl><bibl id="B56"><title><p>G-language System as a platform for large-scale analysis of high-throughput omics data</p></title><aug><au><snm>Arakawa</snm><fnm>K</fnm></au><au><snm>Tomita</snm><fnm>M</fnm></au></aug><source>Journal of Pesticide 
Science</source><pubdate>2006</pubdate><volume>31</volume><fpage>282</fpage><lpage>288</lpage><xrefbib><pubid idtype="doi">10.1584/jpestics.31.282</pubid></xrefbib></bibl><bibl id="B57"><title><p>Computational Genome Analysis Using The G-language System</p></title><aug><au><snm>Arakawa</snm><fnm>K</fnm></au><au><snm>Suzuki</snm><fnm>H</fnm></au><au><snm>Tomita</snm><fnm>M</fnm></au></aug><source>Genes, Genomes and Genomics</source><pubdate>2008</pubdate><volume>2</volume><fpage>1</fpage><lpage>13</lpage></bibl><bibl id="B58"><title><p>KEGG for representation and analysis of molecular networks involving diseases and drugs</p></title><aug><au><snm>Kanehisa</snm><fnm>M</fnm></au><au><snm>Goto</snm><fnm>S</fnm></au><au><snm>Furumichi</snm><fnm>M</fnm></au><au><snm>Tanabe</snm><fnm>M</fnm></au><au><snm>Hirakawa</snm><fnm>M</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2009</pubdate><issue>38 Database</issue><fpage>D335</fpage><lpage>360</lpage></bibl><bibl id="B59"><title><p>NCBI FTP Repository</p></title><pubdate>2009</pubdate><url>http://www.ncbi.nlm.nih.gov/Ftp</url></bibl><bibl id="B60"><title><p>Interactions of the site-specific recombinases XerC and XerD with the recombination site dif</p></title><aug><au><snm>Blakely</snm><fnm>GW</fnm></au><au><snm>Sherratt</snm><fnm>DJ</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>1994</pubdate><volume>22</volume><fpage>5613</fpage><lpage>5620</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/22.25.5613</pubid><pubid idtype="pmcid">310124</pubid><pubid idtype="pmpid">7838714</pubid></pubidlist></xrefbib></bibl><bibl id="B61"><title><p>CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice</p></title><aug><au><snm>Thompson</snm><fnm>JD</fnm></au><au><snm>Higgins</snm><fnm>DG</fnm></au><au><snm>Gibson</snm><fnm>TJ</fnm></au></aug><source>Nucleic Acids 
Res</source><pubdate>1994</pubdate><volume>22</volume><fpage>4673</fpage><lpage>4680</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/22.22.4673</pubid><pubid idtype="pmcid">308517</pubid><pubid idtype="pmpid">7984417</pubid></pubidlist></xrefbib></bibl><bibl id="B62"><title><p>Hidden Markov model speed heuristic and iterative HMM search procedure</p></title><aug><au><snm>Johnson</snm><fnm>LS</fnm></au><au><snm>Eddy</snm><fnm>SR</fnm></au><au><snm>Portugaly</snm><fnm>E</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2010</pubdate><volume>11</volume><fpage>431</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-11-431</pubid><pubid idtype="pmcid">2931519</pubid><pubid idtype="pmpid">20718988</pubid></pubidlist></xrefbib></bibl><bibl id="B63"><title><p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</p></title><aug><au><snm>Altschul</snm><fnm>SF</fnm></au><au><snm>Madden</snm><fnm>TL</fnm></au><au><snm>Schaffer</snm><fnm>AA</fnm></au><au><snm>Zhang</snm><fnm>J</fnm></au><au><snm>Zhang</snm><fnm>Z</fnm></au><au><snm>Miller</snm><fnm>W</fnm></au><au><snm>Lipman</snm><fnm>DJ</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>1997</pubdate><volume>25</volume><issue>17</issue><fpage>3389</fpage><lpage>3402</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/25.17.3389</pubid><pubid idtype="pmcid">146917</pubid><pubid idtype="pmpid">9254694</pubid></pubidlist></xrefbib></bibl><bibl id="B64"><title><p>Hidden Markov models for detecting remote protein homologies</p></title><aug><au><snm>Karplus</snm><fnm>K</fnm></au><au><snm>Barrett</snm><fnm>C</fnm></au><au><snm>Hughey</snm><fnm>R</fnm></au></aug><source>Bioinformatics</source><pubdate>1998</pubdate><volume>14</volume><issue>10</issue><fpage>846</fpage><lpage>856</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/14.10.846</pubid><pubid idtype="pmpid" link="fulltext">9927713</pubid></pubidlist></xrefbib></bibl><bibl 
id="B65"><title><p>Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements</p></title><aug><au><snm>Schaffer</snm><fnm>AA</fnm></au><au><snm>Aravind</snm><fnm>L</fnm></au><au><snm>Madden</snm><fnm>TL</fnm></au><au><snm>Shavirin</snm><fnm>S</fnm></au><au><snm>Spouge</snm><fnm>JL</fnm></au><au><snm>Wolf</snm><fnm>YI</fnm></au><au><snm>Koonin</snm><fnm>EV</fnm></au><au><snm>Altschul</snm><fnm>SF</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2001</pubdate><volume>29</volume><fpage>2994</fpage><lpage>3005</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/29.14.2994</pubid><pubid idtype="pmcid">55814</pubid><pubid idtype="pmpid">11452024</pubid></pubidlist></xrefbib></bibl><bibl id="B66"><title><p>Application of Protein Structure Alignments to Iterated Hidden Markov Model Protocols for Structure Prediction</p></title><aug><au><snm>Scheeff</snm><fnm>ED</fnm></au><au><snm>Bourne</snm><fnm>PE</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2006</pubdate><volume>7</volume><fpage>410</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-7-410</pubid><pubid idtype="pmcid">1622756</pubid><pubid idtype="pmpid">16970830</pubid></pubidlist></xrefbib></bibl><bibl id="B67"><title><p>Perl module String::Approx 3.26</p></title><url>http://search.cpan.org/~jhi/String-Approx-3.26/Approx.pm</url></bibl><bibl id="B68"><title><p>Analyzing genomes with cumulative skew diagrams</p></title><aug><au><snm>Grigoriev</snm><fnm>A</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>1998</pubdate><volume>26</volume><fpage>2286</fpage><lpage>2290</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/26.10.2286</pubid><pubid idtype="pmcid">147580</pubid><pubid idtype="pmpid">9580676</pubid></pubidlist></xrefbib></bibl><bibl id="B69"><title><p>Noise-reduction filtering for accurate detection of replication termini in bacterial 
genomes</p></title><aug><au><snm>Arakawa</snm><fnm>K</fnm></au><au><snm>Saito</snm><fnm>R</fnm></au><au><snm>Tomita</snm><fnm>M</fnm></au></aug><source>FEBS Lett</source><pubdate>2007</pubdate><volume>581</volume><fpage>253</fpage><lpage>258</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.febslet.2006.12.021</pubid><pubid idtype="pmpid" link="fulltext">17188685</pubid></pubidlist></xrefbib></bibl><bibl id="B70"><title><p>Quantitative analysis of replication-related mutation and selection pressures in bacterial chromosomes and plasmids using generalised GC skew index</p></title><aug><au><snm>Arakawa</snm><fnm>K</fnm></au><au><snm>Suzuki</snm><fnm>H</fnm></au><au><snm>Tomita</snm><fnm>M</fnm></au></aug><source>BMC Genomics</source><pubdate>2009</pubdate><volume>10</volume><fpage>640</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2164-10-640</pubid><pubid idtype="pmcid">2804667</pubid><pubid idtype="pmpid">20042086</pubid></pubidlist></xrefbib></bibl></refgrp>
</bm></art>
