<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-8-187</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Software</dochead>
      <bibl>
         <title>
            <p>ZPS: visualization of recent adaptive evolution of proteins</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Chattopadhyay</snm>
               <fnm>Sujay</fnm>
               <insr iid="I1"/>
               <email>sujayc@u.washington.edu</email>
            </au>
            <au id="A2">
               <snm>Dykhuizen</snm>
               <mi>E</mi>
               <fnm>Daniel</fnm>
               <insr iid="I2"/>
               <email>dedykh01@gwise.louisville.edu</email>
            </au>
            <au id="A3">
               <snm>Sokurenko</snm>
               <mi>V</mi>
               <fnm>Evgeni</fnm>
               <insr iid="I1"/>
               <email>evs@u.washington.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Microbiology, University of Washington, Seattle, WA 98195 USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Biology, University of Louisville, Louisville, KY 40292 USA</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>1</issue>
         <fpage>187</fpage>
         <url>http://www.biomedcentral.com/1471-2105/8/187</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17555597</pubid>
               <pubid idtype="doi">10.1186/1471-2105-8-187</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>16</day>
               <month>2</month>
               <year>2007</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>07</day>
               <month>6</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>07</day>
               <month>6</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Chattopadhyay et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Detection of adaptive amino acid changes in proteins under recent short-term selection is of great interest for researchers studying microevolutionary processes in microbial pathogens or any other biological species. However, independent occurrence of such point mutations within genetically diverse haplotypes makes it difficult to detect the selection footprint by using traditional molecular evolutionary analyses. The recently developed Zonal Phylogeny (ZP) has been shown to be a useful analytic tool for identifying the footprints of short-term positive selection. ZP separates protein-encoding genes into evolutionarily long-term (with silent diversity) and short-term (without silent diversity) categories, or zones, followed by statistical analysis to detect signs of positive selection in the short-term zone. However, successful broad application of ZP for analysis of large haplotype datasets requires automation of the relatively labor-intensive computational process.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Here we present Zonal Phylogeny Software (ZPS), an application that describes the distribution of single nucleotide polymorphisms (SNPs) of synonymous (silent) and non-synonymous (replacement) nature along branches of the DNA tree for any given protein-coding gene locus. Based on this information, ZPS separates the protein variant haplotypes with silent variability (Primary zone) from those that have recently evolved from the Primary zone variants by amino acid changes (External zone). Further comparative analysis of mutational hot-spot frequencies and haplotype diversity between the two zones allows determination of whether the External zone haplotypes emerged under positive selection.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>As a visualization tool, ZPS depicts the protein tree in a DNA tree, indicating the most parsimonious numbers of synonymous and non-synonymous changes along the branches of a maximum-likelihood based DNA tree, along with information on homoplasy, reversion and structural mutation hot-spots. Through zonal differentiation, ZPS allows detection of recent adaptive evolution via selection of advantageous structural mutations, even when the advantage conferred by such mutations is relatively short-term (as in the case of "source-sink" evolutionary dynamics, which may represent a major mode of virulence evolution in microbes).</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Amino acid replacements in proteins may be advantageous in the course of an organism's adaptation to changing conditions in an established habitat or upon its spread into a novel habitat <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. Such recently acquired mutations may occur independently in genetically distinct allelic backgrounds, in small numbers per allele and in different protein regions. This makes it difficult to detect the signals of adaptive SNPs using traditional molecular evolutionary analyses, such as the <it>K</it><sub><it>a</it></sub><it>/K</it><sub><it>s </it></sub>(<it>D</it><sub><it>N</it></sub><it>/D</it><sub><it>S</it></sub>) ratio <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, Tajima's <it>D </it><abbrgrp><abbr bid="B4">4</abbr></abbrgrp> or Fu &amp; Li's <it>D* </it><abbrgrp><abbr bid="B5">5</abbr></abbrgrp> statistics, primarily because of an overwhelming level of pre-existing neutral SNPs (both synonymous and non-synonymous) in the loci under selection <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. Additionally, the adaptive mutations may provide only a short-term advantage to the organisms. This occurs in the course of so-called 'source-sink' dynamics of evolution, where species populations are continuously spreading from established, evolutionarily stable reservoir habitats (sources) into novel, evolutionarily untested habitats (sinks) that are commonly transient in nature <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. In these cases, mutational adaptation to a sink habitat may become a liability upon the collapse of that habitat, because of the functional trade-offs that these mutations generally demonstrate in the reservoir source habitat. The source-sink dynamic is characteristic, for example, of pathogenicity-adaptive (pathoadaptive) evolution of microbial pathogens <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B8">8</abbr></abbrgrp>.</p>
         <p>We have recently developed Zonal Phylogeny (ZP) analysis to detect adaptive amino acid changes in proteins under selection during short-term habitat adaptation <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. Along each branch in a DNA tree, we indicate the numbers of synonymous and non-synonymous mutations. Then, the synonymous-only branches are collapsed and the DNA tree is converted to a protein tree in which each node corresponds to an evolutionarily unique structural variant. This minimizes the effect on the protein tree of nucleotide homoplasy and reversion events that obscure phylogenetic relationships of protein variants. ZP then separates structural variants of the protein into two categories, or zones: variants encoded by multiple haplotypes (i.e., haplotypes differing from each other only by synonymous SNPs) are assigned to the Primary zone, while each variant encoded by a single unique haplotype is assigned to the External zone. Accumulation of synonymous substitutions in genes that encode proteins from the Primary zone indicates their circulation over extended evolutionary time, thereby suggesting evolutionary stability of these protein variants. In contrast, the External zone variants are likely to have evolved relatively recently, because synonymous variation has yet to accumulate within the encoding genes.</p>
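         <p>The zone assignment rule described above can be illustrated with a minimal Python sketch. It is not the ZPS implementation: it skips the tree entirely and simply groups in-frame, gap-free haplotypes by their translated protein, assuming the standard genetic code. The function names <it>translate </it>and <it>assign_zones </it>are hypothetical and are introduced only for this illustration.</p>
         <p><![CDATA[
```python
from collections import defaultdict

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate an in-frame, gap-free coding sequence into a protein string."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def assign_zones(haplotypes):
    """Map each haplotype name to 'P' (Primary) or 'E' (External).

    haplotypes: dict of name -> DNA sequence, with duplicate sequences
    already removed. A protein variant encoded by several distinct
    haplotypes (synonymous differences only) goes to the Primary zone;
    a variant encoded by a single haplotype goes to the External zone.
    """
    by_protein = defaultdict(list)
    for name, dna in haplotypes.items():
        by_protein[translate(dna)].append(name)

    zones = {}
    for names in by_protein.values():
        zone = "P" if len(names) > 1 else "E"
        for name in names:
            zones[name] = zone
    return zones


example = {
    "hapA": "ATGGCTGAT",  # M A D
    "hapB": "ATGGCCGAT",  # M A D (synonymous GCT -> GCC)
    "hapC": "ATGGCTGAA",  # M A E (non-synonymous GAT -> GAA)
}
zones = assign_zones(example)
```
]]></p>
         <p>In this toy dataset, hapA and hapB encode the same protein and fall into the Primary zone, whereas hapC is the only haplotype encoding its variant and falls into the External zone. Unlike this sketch, ZP works on the tree itself and so can also account for homoplasy and reversion along branches.</p>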
         <p>The External zone variants are likely to be under positive rather than neutral or purifying selection (i.e. with mutations being adaptive rather than neutral or slightly deleterious) when: (i) their number is higher than expected relative to the frequency of Primary zone variants <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>; (ii) the amino acid replacements more commonly occur at the same positions (structural hot-spots) <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>; (iii) silent SNPs along the connecting branches are relatively rare <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, and (iv) haplotype diversity (based on size and frequency of haplotypes) of the External zone is significantly higher than in neutrally-evolving genes <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. Such statistical comparisons of the two zones show the unambiguous signature of positive selection in, for example, <it>fimH </it>and <it>papG-II </it>(encoding the adhesins of mannose- and digalactose-specific fimbriae of uropathogenic strains of <it>Escherichia coli</it>, respectively), but not in genes from the same strains that are involved in either fimbrial biogenesis or housekeeping functions <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B9">9</abbr></abbrgrp>.</p>
         <p>Here, we present Zonal Phylogeny Software (ZPS) that computerizes ZP. ZPS uses DNA tree topology and haplotype alignment of a gene under analysis to recreate the DNA-based phylogeny, to demarcate the number of synonymous (or silent) and non-synonymous (or structural) changes along each branch, to separate haplotype nodes into Primary and External zones, and then to provide zone-wise information on amino acid substitutions, structural hot-spots and haplotype diversity.</p>
      </sec>
      <sec>
         <st>
            <p>Implementation</p>
         </st>
         <p>The ZPS program presented here can be downloaded as zps.pl [see Additional file <supplr sid="S1">1</supplr>] and run from the command prompt in a Windows environment. The aim is, on the one hand, to provide a visualization tool that gives insight into a gene phylogeny based on the distribution of synonymous vs. non-synonymous SNPs and, on the other hand, to incorporate quantitative statistical measures of recent adaptive evolution based on ZP analysis <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>.</p>
         <suppl id="S1">
            <title>
               <p>Additional File 1</p>
            </title>
            <text>
               <p>The Perl program code for ZPS.</p>
            </text>
            <file name="1471-2105-8-187-S1.pl">
               <p>Click here for file</p>
            </file>
         </suppl>
         <sec>
            <st>
               <p>Inputs</p>
            </st>
            <p>Two input files are used: (i) a DNA alignment in FASTA format (e.g., &lt;<it>filename</it>&gt;<it>.fasta</it>) [see Additional files <supplr sid="S2">2</supplr> and <supplr sid="S3">3</supplr>] produced with DNA alignment software, such as ClustalX <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>; and (ii) a maximum-likelihood DNA tree topology (e.g., &lt;<it>filename</it>&gt;<it>.ml.tre</it>) [see Additional files <supplr sid="S4">4</supplr> and <supplr sid="S5">5</supplr>] generated by PAUP* <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. Haplotype names should contain only alphanumeric characters (i.e. only decimal digits and letters). To allow for haplotype size/frequency-based analysis, duplicate haplotypes must be removed from the input files, and the user should mark each haplotype that has multiple representatives in the dataset by appending <it>n</it>&lt;no. of representatives&gt; to its name. For example, if the <it>seqA</it>, <it>seqB </it>and <it>seqC </it>haplotypes are identical, the user should use <it>seqAn3 </it>(or <it>seqBn3 </it>or <it>seqCn3</it>) as input. If a haplotype has a single representative, the user can use the name as it is and the program will treat it as '<it>n1</it>'.</p>
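            <p>The naming convention can be made concrete with a short Python sketch that splits a label such as <it>seqAn3 </it>into a haplotype name and a representative count. This is only one reading of the convention, not the parsing code in zps.pl: the function name <it>parse_haplotype_name </it>and the regular expression are assumptions made for illustration.</p>
            <p><![CDATA[
```python
import re

# Assumed reading of the convention: an optional trailing "n" followed by
# digits gives the number of representatives; everything before it is the name.
_COUNT_SUFFIX = re.compile(r"^([A-Za-z0-9]+?)n([0-9]+)$")


def parse_haplotype_name(label):
    """Return (name, representatives) for an input haplotype label."""
    if not re.fullmatch(r"[A-Za-z0-9]+", label):
        raise ValueError("haplotype names must be alphanumeric: %r" % label)
    match = _COUNT_SUFFIX.match(label)
    if match:
        return match.group(1), int(match.group(2))
    # No count suffix: a single representative ('n1').
    return label, 1


parsed = [parse_haplotype_name(label) for label in ("seqAn3", "seqE")]
```
]]></p>
            <p>Under this reading, a name that happens to end in 'n' plus digits (for example, a hypothetical label such as strain12) would be interpreted as carrying a count, so it seems prudent to avoid such names in the input files.</p>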
            <suppl id="S2">
               <title>
                  <p>Additional File 2</p>
               </title>
               <text>
                  <p>The FASTA alignment input files of <it>fumC </it>and <it>fimH </it>genes respectively</p>
               </text>
               <file name="1471-2105-8-187-S2.fast">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S3">
               <title>
                  <p>Additional File 3</p>
               </title>
               <text>
                  <p>The FASTA alignment input files of <it>fumC </it>and <it>fimH </it>genes respectively</p>
               </text>
               <file name="1471-2105-8-187-S3.fast">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S4">
               <title>
                  <p>Additional File 4</p>
               </title>
               <text>
                  <p>The PAUP*-output tree files of <it>fumC </it>and <it>fimH </it>genes respectively as other inputs for ZPS.</p>
               </text>
               <file name="1471-2105-8-187-S4.tre">
                  <p>Click here for file</p>
               </file>
            </suppl>
            <suppl id="S5">
               <title>
                  <p>Additional File 5</p>
               </title>
               <text>
                  <p>The PAUP*-output tree files of <it>fumC </it>and <it>fimH </it>genes respectively as other inputs for ZPS.</p>
               </text>
               <file name="1471-2105-8-187-S5.tre">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>Outputs</p>
            </st>
            <p>There is one tree output &#8211; "zp_tree.dnd" &#8211; in which each node name (for example, 'E4-seqA-n3-2S/1N-A77D' or 'P3-seqE-n8-5S/0N') gives (i) the zone to which the haplotype is assigned, either External ('E') or Primary ('P'), with intermediate hypothetical (unresolved) nodes marked as 'H'; (ii) an arbitrary number assigned to the protein variant encoded by the haplotype (e.g. 'E4' or 'P3'); (iii) the original name of the representative haplotype and the user-defined number of haplotypes that are identical to it in the dataset (e.g. 'seqA-n3' or 'seqE-n8'), with ZPS automatically adding '-n1' to haplotypes with a single representative; (iv) the number of synonymous (S)/non-synonymous (N) SNPs along the connecting branch (e.g. '2S/1N' or '5S/0N'), and (v) the amino acid changes caused by the non-synonymous SNPs (e.g. 'A77D'). The ZPS output tree can be viewed with tree-viewing software, such as TreeView <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> or HyperTree <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. The latter application also supports color coding to visually distinguish different types of haplotypes and branches. With HyperTree in mind, ZPS generates an additional color-code file for the output tree file, which color-codes the Primary and the External zone representatives. Two colors are used: blue for all the Primary zone haplotypes that exhibit same-protein silent variability and red for all the External zone representatives. To view "zp_tree.dnd" in color in HyperTree, the user needs to 'import colors' and select the "color-zp_tree.txt" file.</p>
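            <p>For downstream scripting, node names of the form shown above can be split back into their fields. The following Python sketch handles only the two example labels given in the text; the exact format of 'H' nodes and of labels carrying several amino acid changes is not specified here, so the pattern is an assumption, and <it>ZpsNode </it>and <it>parse_node_label </it>are hypothetical names.</p>
            <p><![CDATA[
```python
import re
from typing import NamedTuple


class ZpsNode(NamedTuple):
    zone: str            # 'E', 'P' or 'H'
    variant: int         # arbitrary protein variant number
    haplotype: str       # representative haplotype name
    copies: int          # number of identical haplotypes in the dataset
    synonymous: int      # synonymous SNPs on the connecting branch
    nonsynonymous: int   # non-synonymous SNPs on the connecting branch
    changes: tuple       # amino acid changes, e.g. ('A77D',)


# Pattern inferred from the example labels only (an assumption).
_NODE = re.compile(
    r"^([EPH])([0-9]+)-([A-Za-z0-9]+)-n([0-9]+)-([0-9]+)S/([0-9]+)N(?:-(.*))?$"
)


def parse_node_label(label):
    """Split a zp_tree.dnd node name into its documented components."""
    match = _NODE.match(label)
    if match is None:
        raise ValueError("unrecognized node label: %r" % label)
    zone, variant, name, copies, syn, nonsyn, tail = match.groups()
    changes = tuple(re.findall(r"[A-Z][0-9]+[A-Z]", tail or ""))
    return ZpsNode(zone, int(variant), name, int(copies),
                   int(syn), int(nonsyn), changes)


external = parse_node_label("E4-seqA-n3-2S/1N-A77D")
primary = parse_node_label("P3-seqE-n8-5S/0N")
```
]]></p>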
            <p>There are two analytical outputs: "pairwise-variation.txt" and "analysis-results.txt". The former file details the positions and specific changes along each branch in the tree, while the latter presents (i) the Primary and External zone representatives; (ii) haplotype ratio (as a ratio of the number of External zone haplotypes to the total number of haplotypes in the dataset); (iii) position-wise structural mutation information, both overall and zone-based structural hot-spot frequency (as a ratio of the number of hot-spot structural mutations to the total number of structural mutations), and (iv) calculations of &#945; and Simpson's diversity statistics <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>.</p>
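            <p>The ratios listed above are simple to reproduce by hand. The Python sketch below computes the haplotype ratio and the hot-spot frequency as defined in this section, and Simpson's index in its classic form (the sum of squared haplotype frequencies). The exact estimators used by ZPS, including the &#945; index, follow <abbrgrp><abbr bid="B9">9</abbr></abbrgrp> and may differ in detail, so the function names and the Simpson formula here are assumptions for illustration.</p>
            <p><![CDATA[
```python
def haplotype_ratio(external_haplotypes, primary_haplotypes):
    """External zone haplotypes divided by all haplotypes in the dataset."""
    total = external_haplotypes + primary_haplotypes
    if total == 0:
        raise ValueError("no haplotypes")
    return external_haplotypes / total


def hotspot_frequency(hotspot_mutations, total_mutations):
    """Hot-spot structural mutations divided by all structural mutations."""
    if total_mutations == 0:
        return 0.0
    return hotspot_mutations / total_mutations


def simpson_index(haplotype_sizes):
    """Classic Simpson's index: sum of squared haplotype frequencies.

    haplotype_sizes: number of strains carrying each haplotype in a zone.
    Lower values mean higher diversity.
    """
    total = sum(haplotype_sizes)
    if total == 0:
        raise ValueError("empty zone")
    return sum((size / total) ** 2 for size in haplotype_sizes)


# fimH counts as reported in Table 1: 29 External and 14 Primary haplotypes,
# and 19 hot-spot mutations out of 36 External zone structural mutations.
fimh_ratio = haplotype_ratio(29, 14)
fimh_hotspot = hotspot_frequency(19, 36)
```
]]></p>
            <p>With the <it>fimH </it>counts from Table <tblr tid="T1">1</tblr>, the hot-spot frequency evaluates to 19/36, which rounds to the tabulated 0.53.</p>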
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results and Discussion</p>
         </st>
         <p>ZPS has been extensively tested with different genes from <it>Escherichia coli </it>of diverse origin <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B9">9</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>, <it>Burkholderia cenocepacia </it><abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, <it>Vibrio vulnificus </it>and hepatitis C virus genotype 1 [unpublished data].</p>
         <p>Figure <figr fid="F1">1</figr> shows the color-coded outputs (using HyperTree) of the ZPS tree for two genes of <it>E. coli </it>&#8211; <it>fumC </it>and <it>fimH </it>&#8211; which encode the housekeeping enzyme fumarase C and the mannose-specific surface adhesin FimH, respectively. Even at first glance, one can see a relatively poorly developed External zone in <it>fumC</it>, which suggests the presence of strong purifying selection (as expected for a housekeeping gene). In contrast, a massive External zone is evident in <it>fimH</it>, which indicates relatively extensive recent evolution via amino acid changes.</p>
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>Comparative view of ZPS-generated trees for <it>fumC </it>and <it>fimH </it>genes of <it>E. coli </it>[9]</p>
            </caption>
            <text>
               <p>Comparative view of ZPS-generated trees for <it>fumC </it>and <it>fimH </it>genes of <it>E. coli </it>[9].</p>
            </text>
            <graphic file="1471-2105-8-187-1"/>
         </fig>
         <p>The "analysis-results.txt" output includes the calculations needed to compare the patterns of evolution of different genes quantitatively, as shown in Table <tblr tid="T1">1</tblr>. The External zone frequencies of strains, haplotypes and structural hot-spots are significantly higher in <it>fimH </it>than in <it>fumC</it>. The diversity measures (Simpson's index, &#955;, and the &#945; index value) show that the Primary zone &#955; and &#945; values for the two genes are comparable (<it>p </it>> 0.50), suggesting the presence of long-circulating stable structural variants in the populations of both FumC and FimH. The haplotype diversity of the Primary zone of <it>fimH </it>or <it>fumC </it>is significantly lower than the haplotype diversity of the <it>fimH </it>External zone, but not of the <it>fumC </it>External zone. In <it>fimH</it>, the low diversity of the Primary zone compared to the corresponding External zone could be hypothesized to be due to selective sweeps or bottleneck effects. However, the increased diversity of the <it>fimH </it>External zone can only be explained by positive selection, as we found its diversity to be significantly higher than the diversity of both zones of <it>fumC </it>and, also, of the Primary and External zones of three other genes from the same strains &#8211; another housekeeping gene, <it>adk</it>, and the type 1 fimbrial biogenesis genes <it>fimI </it>and <it>fimC </it><abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. At the same time, relatively high diversity was shown for the External zone of the <it>papG-II </it>gene, which encodes another, di-galactose-specific <it>E. coli </it>adhesin, indicating that adhesin genes could be prone to accumulation of adaptive amino acid changes under short-term positive selection <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>.</p>
         <tbl id="T1">
            <title>
               <p>Table 1</p>
            </title>
            <caption>
               <p>Comparison of ZPS statistics for two genes: <it>fumC</it>, expected to be under strong purifying selection against structural variation as a housekeeping gene, and <it>fimH</it>, evolving under strong positive selection through SNPs as shown for genes encoding surface adhesins of pathogenic bacteria. The sample includes identical datasets of 75 strains for the two genes [9]. The <it>p</it>-values for the diversity measures are based on differential zonal haplotype diversity [9], while the other significance values are derived using 2 &#215; 2 &#967; <sup>2 </sup>statistic. P and E denote Primary and External zones respectively</p>
            </caption>
            <tblbdy cols="5">
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>zone</p>
                  </c>
                  <c ca="center">
                     <p>
                        <it>fumC</it>
                     </p>
                  </c>
                  <c ca="center">
                     <p>
                        <it>fimH</it>
                     </p>
                  </c>
                  <c ca="center">
                     <p>
                        <it>p-values</it>
                     </p>
                  </c>
               </r>
               <r>
                  <c cspan="5">
                     <hr/>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p>no. of strains</p>
                  </c>
                  <c ca="center">
                     <p>P</p>
                  </c>
                  <c ca="center">
                     <p>69</p>
                  </c>
                  <c ca="center">
                     <p>27</p>
                  </c>
                  <c ca="center">
                     <p>&lt; 0.0001</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>E</p>
                  </c>
                  <c ca="center">
                     <p>6</p>
                  </c>
                  <c ca="center">
                     <p>48</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p>no. of haplotypes</p>
                  </c>
                  <c ca="center">
                     <p>P</p>
                  </c>
                  <c ca="center">
                     <p>20</p>
                  </c>
                  <c ca="center">
                     <p>14</p>
                  </c>
                  <c ca="center">
                     <p>&lt; 0.0001</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>E</p>
                  </c>
                  <c ca="center">
                     <p>3</p>
                  </c>
                  <c ca="center">
                     <p>29</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p>zone-wise structural hot-spot frequency (no. of hot-spots/total no. of mutations)</p>
                  </c>
                  <c ca="center">
                     <p>P</p>
                  </c>
                  <c ca="center">
                     <p>0.00 (0/1)</p>
                  </c>
                  <c ca="center">
                     <p>0.00 (0/3)</p>
                  </c>
                  <c ca="center">
                     <p>1.00</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>E</p>
                  </c>
                  <c ca="center">
                     <p>0.00 (0/3)</p>
                  </c>
                  <c ca="center">
                     <p>0.53 (19/36)</p>
                  </c>
                  <c ca="center">
                     <p>0.039</p>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p>Simpson's index (&#955;)</p>
                  </c>
                  <c ca="center">
                     <p>P</p>
                  </c>
                  <c ca="center">
                     <p>0.11 &#177; 0.01</p>
                  </c>
                  <c ca="center">
                     <p>0.12 &#177; 0.03</p>
                  </c>
                  <c ca="center">
                     <p>0.002</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>E</p>
                  </c>
                  <c ca="center">
                     <p>0.39 &#177; 0.10</p>
                  </c>
                  <c ca="center">
                     <p>0.07 &#177; 0.01</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
               <r>
                  <c ca="center">
                     <p>&#945; index</p>
                  </c>
                  <c ca="center">
                     <p>P</p>
                  </c>
                  <c ca="center">
                     <p>9.45 &#177; 1.80</p>
                  </c>
                  <c ca="center">
                     <p>11.71 &#177; 3.88</p>
                  </c>
                  <c ca="center">
                     <p>0.005</p>
                  </c>
               </r>
               <r>
                  <c>
                     <p/>
                  </c>
                  <c ca="center">
                     <p>E</p>
                  </c>
                  <c ca="center">
                     <p>2.39 &#177; 1.66</p>
                  </c>
                  <c ca="center">
                     <p>31.00 &#177; 8.25</p>
                  </c>
                  <c>
                     <p/>
                  </c>
               </r>
            </tblbdy>
         </tbl>
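         <p>The 2 &#215; 2 &#967;<sup>2 </sup>comparison mentioned in the caption of Table <tblr tid="T1">1</tblr> can be sketched in Python as follows. The sketch uses the uncorrected Pearson statistic with one degree of freedom; whether the published values include a continuity correction is not stated, so this choice is an assumption, as is the function name <it>chi_square_2x2</it>.</p>
         <p><![CDATA[
```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    statistic = n * (a * d - b * c) ** 2 / denominator
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Number of strains from Table 1: rows are zones (P, E), columns are genes
# (fumC, fimH).
statistic, p_value = chi_square_2x2(69, 27, 6, 48)
```
]]></p>
         <p>For the strain counts in the table, the statistic is about 51 and the corresponding <it>p</it>-value is far below 0.0001, in line with the tabulated significance level.</p>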
         <p>It is noteworthy that an advantage of ZP analysis of the haplotype diversity is that it considers both haplotype richness (i.e. total number of unique haplotypes) as well as frequency distribution (evenness) of these haplotypes in a zone. The latter feature of the diversity index incorporates the idea of relative fitness of a particular haplotype through the extent of its predominance in the sample set (provided the set is large enough, and relatively random).</p>
         <p>To compare the performance of ZPS with other commonly used methods for detecting signals of positive selection, we analyzed our datasets for <it>fumC </it>and <it>fimH </it>with the codeml program implemented in the PAML package <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>. For each gene, we initially used two different models: a one-ratio null model of neutral evolution (&#969; &lt; 1) and a one-ratio selection model of adaptive evolution (&#969; > 1). For <it>fumC </it>there is no difference (<it>p </it>= 1) between the log likelihood values of the neutral (<it>lnL </it>= -1082.13) and selection (<it>lnL </it>= -1082.13) models. For <it>fimH</it>, the neutral (<it>lnL </it>= -2245.44) and selection (<it>lnL </it>= -2243.58) log likelihood values are also not statistically different (<it>p </it>= 0.16), though unlike <it>fumC</it>, the <it>p </it>value shows a possible trend toward selection. Thus, based on the entire tree, codeml was unable to detect unambiguously the presence of positive selection in <it>fimH</it>, demonstrating the higher sensitivity of ZPS in this type of analysis. We then used a branch-specific selection model and assigned &#969; > 1 to clades containing multiple External zone nodes. For some of these clades on the <it>fimH </it>tree, the log likelihood values for the selection model either differed significantly from the neutral model value (<it>p </it>&lt; 0.0001) or differed considerably, suggesting a distinct direction of selection (<it>p </it>&lt; 0.11). No such difference was detected for the <it>fumC </it>clade that contained two External zone nodes (<it>p </it>= 0.84). Thus, clade-specific codeml analysis confirmed the presence of positive selection for non-synonymous mutations in <it>fimH</it>, but not in <it>fumC</it>. However, unlike codeml, ZPS does not require any preliminary knowledge about the clade composition to detect the selection. At the same time, ZPS can be used in combination with codeml to help single out the clades or branches of a gene tree that were derived under positive selection.</p>
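         <p>The whole-tree model comparison can be reproduced from the reported log likelihood values with a likelihood-ratio test. The Python sketch below assumes that twice the log likelihood difference is compared against a &#967;<sup>2 </sup>distribution with two degrees of freedom; this is an assumption, chosen because it reproduces the reported <it>p </it>= 0.16 for <it>fimH</it>, and the function name <it>likelihood_ratio_test </it>is hypothetical.</p>
         <p><![CDATA[
```python
import math


def likelihood_ratio_test(lnl_null, lnl_alternative):
    """Likelihood-ratio test against chi-square with 2 degrees of freedom.

    Returns (statistic, p_value). For 2 df the chi-square survival
    function has the closed form exp(-x / 2).
    """
    statistic = max(0.0, 2.0 * (lnl_alternative - lnl_null))
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


# Log likelihood values reported for the neutral and selection models.
fumc_stat, fumc_p = likelihood_ratio_test(-1082.13, -1082.13)
fimh_stat, fimh_p = likelihood_ratio_test(-2245.44, -2243.58)
```
]]></p>
         <p>Under this assumption the <it>fumC </it>comparison gives <it>p </it>= 1 and the <it>fimH </it>comparison gives <it>p </it>of about 0.16, matching the values quoted above.</p>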
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
         <p>Synonymous mutations are generally considered to be selectively neutral and to accumulate randomly at a constant rate for a given gene. ZPS utilizes DNA trees to differentiate haplotypes that have evolved with accumulation of silent variations from those derived only through amino acid replacements, enabling visualization of adaptive structural variations that have recently emerged under positive selection. Information about the presence of mutational hot-spots and comparative zonal statistics on the size and frequency of various haplotypes provides insights into the adaptive evolution of genomic loci in any organism, from virus to human.</p>
      </sec>
      <sec>
         <st>
            <p>Availability and requirements</p>
         </st>
         <p><b>Project name: </b>Zonal Phylogeny Software (ZPS)</p>
         <p>
            <b>Project home page: </b>
            <url>http://faculty.washington.edu/sujayc/zps.shtml</url>
         </p>
         <p><b>Operating systems: </b>Windows</p>
         <p><b>Programming language: </b>Perl</p>
         <p><b>Other requirements: </b>ClustalX, PAUP* and any tree-viewing software, e.g. TreeView or HyperTree</p>
         <p><b>License: </b>GPL <url>https://sourceforge.net/projects/zps/</url></p>
      </sec>
      <sec>
         <st>
            <p>Abbreviations</p>
         </st>
         <p>ZP &#8211; Zonal Phylogeny</p>
         <p>ZPS &#8211; Zonal Phylogeny Software</p>
         <p>SNPs &#8211; Single Nucleotide Polymorphisms</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>SC designed the software, implemented it and drafted the manuscript. DED contributed to the idea of the zonal phylogeny and helped to draft the manuscript. EVS conceptualized the zonal phylogeny, designed the software and wrote the manuscript. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The authors would like to thank Scott J. Weissman for critical reading and discussion of the manuscript. Research was supported by grants from the National Institutes of Health.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Ecology and speciation</p>
            </title>
            <aug>
               <au>
                  <snm>Orr</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>TB</fnm>
               </au>
            </aug>
            <source>Trends Ecol Evol</source>
            <pubdate>1998</pubdate>
            <volume>13</volume>
            <fpage>502</fpage>
            <lpage>506</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1016/S0169-5347(98)01511-0</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>A map of recent positive selection in the human genome</p>
            </title>
            <aug>
               <au>
                  <snm>Voight</snm>
                  <fnm>BF</fnm>
               </au>
               <au>
                  <snm>Kudaravalli</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wen</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Pritchard</snm>
                  <fnm>JK</fnm>
               </au>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2006</pubdate>
            <volume>4</volume>
            <issue>3</issue>
            <fpage>e72</fpage>
            <lpage/>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1371/journal.pbio.0040446</pubid>
                  <pubid idtype="pmpid" link="fulltext">16494531</pubid>
                  <pubid idtype="pmcid">1382018</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions</p>
            </title>
            <aug>
               <au>
                  <snm>Nei</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Gojobori</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1986</pubdate>
            <volume>3</volume>
            <fpage>418</fpage>
            <lpage>426</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">3444411</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Statistical method for testing the neutral mutation hypothesis by DNA polymorphism</p>
            </title>
            <aug>
               <au>
                  <snm>Tajima</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1989</pubdate>
            <volume>123</volume>
            <fpage>585</fpage>
            <lpage>595</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1203831</pubid>
                  <pubid idtype="pmpid" link="fulltext">2513255</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Statistical tests of neutrality of mutations</p>
            </title>
            <aug>
               <au>
                  <snm>Fu</snm>
                  <fnm>YX</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>WH</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1993</pubdate>
            <volume>133</volume>
            <fpage>693</fpage>
            <lpage>709</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1205353</pubid>
                  <pubid idtype="pmpid" link="fulltext">8454210</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Selection footprint in the FimH adhesin shows pathoadaptive niche differentiation in <it>Escherichia coli</it></p>
            </title>
            <aug>
               <au>
                  <snm>Sokurenko</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Feldgarden</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Trintchina</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Weissman</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Avagyan</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Chattopadhyay</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Johnson</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Dykhuizen</snm>
                  <fnm>DE</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2004</pubdate>
            <volume>21</volume>
            <fpage>1373</fpage>
            <lpage>1383</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/molbev/msh136</pubid>
                  <pubid idtype="pmpid" link="fulltext">15044596</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Sources, sinks, and population regulation</p>
            </title>
            <aug>
               <au>
                  <snm>Pulliam</snm>
                  <fnm>HR</fnm>
               </au>
            </aug>
            <source>Am Nat</source>
            <pubdate>1988</pubdate>
            <volume>132</volume>
            <fpage>652</fpage>
            <lpage>661</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1086/284880</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Source-sink dynamics of virulence evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Sokurenko</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Gomulkiewicz</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Dykhuizen</snm>
                  <fnm>DE</fnm>
               </au>
            </aug>
            <source>Nat Rev Microbiol</source>
            <pubdate>2006</pubdate>
            <volume>4</volume>
            <fpage>548</fpage>
            <lpage>555</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrmicro1446</pubid>
                  <pubid idtype="pmpid" link="fulltext">16778839</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Haplotype diversity in "source-sink" dynamics of <it>Escherichia coli</it> urovirulence</p>
            </title>
            <aug>
               <au>
                  <snm>Chattopadhyay</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Feldgarden</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Weissman</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Dykhuizen</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>van Belle</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Sokurenko</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>2007</pubdate>
            <volume>64</volume>
            <fpage>204</fpage>
            <lpage>214</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s00239-006-0063-5</pubid>
                  <pubid idtype="pmpid" link="fulltext">17177088</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools</p>
            </title>
            <aug>
               <au>
                  <snm>Thompson</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Gibson</snm>
                  <fnm>TJ</fnm>
               </au>
               <au>
                  <snm>Plewniak</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Jeanmougin</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>4876</fpage>
            <lpage>4882</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">147148</pubid>
                  <pubid idtype="pmpid" link="fulltext">9396791</pubid>
                  <pubid idtype="doi">10.1093/nar/25.24.4876</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <aug>
               <au>
                  <snm>Swofford</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <source>PAUP*: Phylogenetic Analysis Using Parsimony and Other Methods (software)</source>
            <publisher>Sunderland, MA: Sinauer Associates</publisher>
            <pubdate>2000</pubdate>
         </bibl>
         <bibl id="B12">
            <title>
               <p>TREEVIEW: An application to display phylogenetic trees on personal computers</p>
            </title>
            <aug>
               <au>
                  <snm>Page</snm>
                  <fnm>RDM</fnm>
               </au>
            </aug>
            <source>Comput Appl Biosci</source>
            <pubdate>1996</pubdate>
            <volume>12</volume>
            <fpage>357</fpage>
            <lpage>358</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8902363</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Visualizing large hierarchical clusters in hyperbolic space</p>
            </title>
            <aug>
               <au>
                  <snm>Bingham</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sudarsanam</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>660</fpage>
            <lpage>661</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/16.7.660</pubid>
                  <pubid idtype="pmpid" link="fulltext">11038340</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Clonal analysis reveals high rate of structural mutations in fimbrial adhesins of extraintestinal pathogenic <it>Escherichia coli</it></p>
            </title>
            <aug>
               <au>
                  <snm>Weissman</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Chattopadhyay</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Aprikian</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Obata-Yasuoka</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Yarova-Yarovaya</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Stapleton</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ba-Thein</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Dykhuizen</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Johnson</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Sokurenko</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Mol Microbiol</source>
            <pubdate>2006</pubdate>
            <volume>59</volume>
            <fpage>975</fpage>
            <lpage>988</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1380272</pubid>
                  <pubid idtype="pmpid" link="fulltext">16420365</pubid>
                  <pubid idtype="doi">10.1111/j.1365-2958.2005.04985.x</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Selection for functional diversity drives accumulation of point mutations in Dr adhesins of <it>Escherichia coli</it></p>
            </title>
            <aug>
               <au>
                  <snm>Korotkova</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Chattopadhyay</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Tabata</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Beskhlebnaya</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Vigdorovich</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Kaiser</snm>
                  <fnm>BK</fnm>
               </au>
               <au>
                  <snm>Strong</snm>
                  <fnm>RK</fnm>
               </au>
               <au>
                  <snm>Dykhuizen</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Sokurenko</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Moseley</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>Mol Microbiol</source>
            <pubdate>2007</pubdate>
            <volume>64</volume>
            <fpage>180</fpage>
            <lpage>194</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1365-2958.2007.05648.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">17376081</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Conservation of a novel protein associated with an antibiotic efflux operon in <it>Burkholderia cenocepacia</it></p>
            </title>
            <aug>
               <au>
                  <snm>Nair</snm>
                  <fnm>BM</fnm>
               </au>
               <au>
                  <snm>Joachimiak</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Chattopadhyay</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Montono</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Burns</snm>
                  <fnm>JL</fnm>
               </au>
            </aug>
            <source>FEMS Microbiol Lett</source>
            <pubdate>2005</pubdate>
            <volume>245</volume>
            <fpage>337</fpage>
            <lpage>344</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.femsle.2005.03.027</pubid>
                  <pubid idtype="pmpid" link="fulltext">15837391</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1998</pubdate>
            <volume>15</volume>
            <fpage>568</fpage>
            <lpage>573</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9580986</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2002</pubdate>
            <volume>19</volume>
            <fpage>908</fpage>
            <lpage>917</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12032247</pubid>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>

