<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2004-5-3-r20</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Method</dochead>
      <bibl rating="0">
         <title>
            <p>Visualization of the phylogenetic content of five genomes using dekapentagonal maps</p>
         </title>
         <aug>
            <au id="A1" ca="no" ce="no" pa="no" da="no">
               <snm>Zhaxybayeva</snm>
               <fnm>Olga</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A2" ca="no" ce="no" pa="no" da="no">
               <snm>Hamel</snm>
               <fnm>Lutz</fnm>
               <insr iid="I2"/>
            </au>
            <au id="A3" ca="no" ce="no" pa="no" da="no">
               <snm>Raymond</snm>
               <fnm>Jason</fnm>
               <insr iid="I3"/>
            </au>
            <au id="A4" ca="yes" ce="no" pa="no" da="no">
               <snm>Gogarten</snm>
               <mnm>Peter</mnm>
               <fnm>J</fnm>
               <insr iid="I1"/>
               <email>gogarten@uconn.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269-3125, USA</p>
            </ins>
            <ins id="I2">
               <p>Department of Computer Science and Statistics, University of Rhode Island, Kingston, RI 02881, USA</p>
            </ins>
            <ins id="I3">
               <p>Department of Chemistry and Biochemistry, Arizona State University, Tempe, AZ 85287-1604, USA</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2004</pubdate>
         <volume>5</volume>
         <issue>3</issue>
         <fpage>R20</fpage>
         <url>http://genomebiology.com/2004/5/3/R20</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">15003123</pubid>
               <pubid idtype="doi" link="fulltext">10.1186/gb-2004-5-3-r20</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>4</day>
               <month>11</month>
               <year>2003</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>18</day>
               <month>12</month>
               <year>2003</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>13</day>
               <month>1</month>
               <year>2004</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>16</day>
               <month>2</month>
               <year>2004</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2004</year>
         <collab>Zhaxybayeva et al.; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.</collab>
      </cpyrt>
      <shorttitle>
         <p>Visualization of the phylogenetic content of five genomes using dekapentagonal maps</p>
      </shorttitle>
      <shortabs>
         <p>Dekapentagonal maps depict phylogenetic information for orthologous genes present in five genomes, and provide a pre-screen for putatively horizontally transferred genes.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <p>The methods presented here summarize phylogenetic relationships of genomes in visually appealing and informative figures. Dekapentagonal maps depict phylogenetic information for orthologous genes present in five genomes, and provide a pre-screen for putatively horizontally transferred genes. If the majority of individual gene phylogenies are unresolved, bipartition histograms provide a means of uncovering and analyzing the plurality consensus. Analyses of genomes representing five photosynthetic bacterial phyla and of the prokaryotic contributions to the eukaryotic cell illustrate the utility of the methods.</p>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010008">Evolution</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Transfer of genetic information between divergent organisms has turned the tree of life into a net or web <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, and genomes into mosaics. Different parts of genomes have different histories; therefore representing the history of genome evolution as a single tree appears inconsistent with the data. Nevertheless, the assumption of a tree-like process still underlies many approaches. Recently, we developed a tool that provides an assessment and graphic illustration of the mosaic nature of microbial genomes <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. The tool is based on maximum likelihood (ML) mapping developed by Korbinian Strimmer and Arndt von Haeseler <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. They utilized Bayesian posterior probabilities to assess the phylogenetic information contained in an alignment of four homologous sequences. With four sequences there are only three possible tree topologies, and thus the three posterior probabilities corresponding to these three trees must sum to one. Utilizing a barycentric coordinate system, the resulting probability vector is represented as a point in an equilateral triangle, where the distances of the point to the three sides represent the three probabilities. Strimmer and von Haeseler applied this approach to depict the phylogenetic information content present in a multiple sequence alignment. We adapted this approach to represent the phylogenetic information content present in four completely sequenced genomes (for details and methodology see <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>; for an extension that improves taxon sampling and uses bootstrap support values see <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>). Unfortunately, this approach is limited to the analysis of only four genomes at a time. In many instances, it is interesting to compare more than four genomes simultaneously (for example <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>). 
The number of possible unrooted, bifurcating tree topologies for <it>N</it> taxa is (2<it>N</it> - 5)!/ [2<sup><it>N</it>-3</sup>(<it>N</it> - 3)!] <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, and therefore rises dramatically as <it>N</it> increases. There are 15 possible unrooted tree topologies for five taxa, 105 for six taxa, and so on. Creating a visually appealing graphic representation of so many alternatives poses a difficult challenge.</p>
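         <p>The growth in the number of topologies can be checked with a few lines of code. The following is a minimal sketch, assuming unrooted, strictly bifurcating trees with labeled tips; the function name <it>unrooted_topologies</it> is hypothetical and is not part of the tools described here.</p>

```python
from math import factorial


def unrooted_topologies(n_taxa: int) -> int:
    """Count unrooted, strictly bifurcating topologies for n_taxa labeled taxa.

    Uses (2N - 5)! / [2^(N - 3) * (N - 3)!], which is defined for N >= 3.
    """
    if not n_taxa >= 3:
        raise ValueError("at least three taxa are needed for an unrooted tree")
    numerator = factorial(2 * n_taxa - 5)
    denominator = 2 ** (n_taxa - 3) * factorial(n_taxa - 3)
    return numerator // denominator


# Print the count for four to eight taxa.
for n in range(4, 9):
    print(n, unrooted_topologies(n))
```

         <p>For four taxa the function returns the three topologies used in ML mapping, and for five taxa it returns the 15 topologies that are placed at the vertices of the dekapentagon.</p>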
         <p>Here we report a new mapping approach to visualize data from the analyses of five genomes. The utility of this approach is illustrated by applying it to the evolution of photosynthetic bacteria and by dissecting the eukaryotic genome with respect to different prokaryotic contributions. Where the majority of the individual gene phylogenies are unresolved, a histogram giving the frequency of well-supported bipartitions provides a useful complement to the support-value maps.</p>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <p>Using the same dataflow as described in <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, we detect sets of orthologous proteins for five genomes (quintets of orthologous proteins, or QuintOPs), and for each QuintOP we obtain posterior probabilities for each of the 15 possible tree topologies. By analogy with barycentric coordinates in ML mapping, the tree topologies are placed at the vertices of a dekapentagon (that is, a polygon with 15 vertices corresponding to the 15 possible unrooted tree topologies), and each probability vector for a dataset corresponds to a point inside the dekapentagon: the point is defined as the center of gravity of the dekapentagon when the posterior probabilities are treated as weights attached to its vertices. Once the assignment of topologies to the vertices of the polygon is given, each probability vector maps unambiguously to a point inside the polygon (see Figure <figr fid="F1">1</figr>). However, the position of a probability vector depends crucially on the arrangement of topologies at the polygon vertices. We consider an arrangement of topologies optimal if, for a given genome quintet, the probability vectors for all sets of orthologous genes map as closely to the periphery as possible. The optimal dekapentagonal map is only one of many possible projections of the 15-dimensional support-value vectors to two-dimensional space. The tree space containing all possible five-taxon trees cannot be embedded into three-dimensional space <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. The projection of tree space represented in the dekapentagonal maps highlights the ambiguities of phylogenetic reconstruction and repeated patterns of inconsistency; thus the major evolutionary histories represented by different parts of the genomes are most easily recognized.</p>
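         <p>The center-of-gravity mapping can be illustrated as follows. This is a sketch only: it assumes a regular 15-gon inscribed in a unit circle with vertex <it>k</it> at angle 2 pi <it>k</it>/15, and a 0-based topology index, neither of which is taken from the coordinate conventions given in Materials and methods. The function names are hypothetical.</p>

```python
import math


def polygon_vertices(n_vertices=15, radius=1.0):
    """Vertices of a regular polygon centred on the origin (assumed layout)."""
    step = 2 * math.pi / n_vertices
    return [(radius * math.cos(k * step), radius * math.sin(k * step))
            for k in range(n_vertices)]


def map_support_vector(weights, arrangement=None):
    """Return the centre of gravity of the polygon for one support vector.

    weights[t]     -- support for topology t (0-based, hypothetical numbering)
    arrangement[v] -- index of the topology placed at vertex v
    """
    n = len(weights)
    if arrangement is None:
        arrangement = list(range(n))
    total = float(sum(weights))
    if total == 0:
        raise ValueError("support vector must have a non-zero sum")
    vertices = polygon_vertices(n)
    x = sum(weights[t] * vertices[v][0] for v, t in enumerate(arrangement)) / total
    y = sum(weights[t] * vertices[v][1] for v, t in enumerate(arrangement)) / total
    return x, y


# Full support for a single topology lands on that topology's vertex.
resolved = [0.0] * 15
resolved[3] = 1.0
print(map_support_vector(resolved))

# Two different vectors that both map to the centre of the polygon.
uniform = [1.0 / 15] * 15
three_way = [0.0] * 15
for t in (0, 5, 10):
    three_way[t] = 1.0 / 3
print(map_support_vector(uniform), map_support_vector(three_way))
```

         <p>The last two vectors differ completely in the topologies they support, yet both fall on the center of the map, which is why a point near the center can only be read as an unresolved dataset.</p>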
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>Schematic presentation of calculating and plotting probability vectors into a dekapentagon</p>
            </caption>
            <text>
                <p>Schematic presentation of calculating and plotting probability vectors into a dekapentagon. Posterior probabilities associated with each vertex are represented as weights attached to the vertices. Points <it>M</it> indicate the locations of the centers of gravity of the vertices listed in the index associated with each point <it>M</it>. See Materials and methods for details of the calculation of the coordinates.</p>
            </text>
            <graphic file="gb-2004-5-3-r20-1"/>
         </fig>
         <p>It is worth noting that while every probability vector maps to a unique place in the optimized dekapentagonal map, the reverse is not true. A single point inside the dekapentagonal map corresponds to infinitely many probability vectors. For example, a point in the center indicates only that the probabilities for topologies on opposing sides of the dekapentagon cancel each other out; it does not indicate the identities of these topologies. Also, some points might be located close to one vertex only because the probability vector equally supports the topologies located on the two neighboring vertices. However, these points are 'misplaced' only because the corresponding datasets do not strongly favor one topology over another; that is, these vectors represent unresolved relationships.</p>
         <p>We use a genetic algorithm to find the optimal arrangement of the topologies at the polygon vertices. The optimality criterion is to minimize the sum, taken over all mapped probability vectors, of the shortest distance from each mapped point to the polygon's circumference. We found that the algorithm quickly converges towards solutions that are related to one another by rotation; that is, the neighborhood relations between the different topologies are the same. As our genetic optimization algorithm is a stochastic process, we measure its success on the basis of the probability of convergence. Our confidence that the algorithm did indeed find an optimal solution rises with the probability that on subsequent runs the algorithm reproduces the same solution and that any other solutions found are inferior to the one deemed optimal. We consistently obtained a convergence rate in the range of 66% to 100%: of 50 independent runs, 33 in one case and all 50 in the other converged on the same arrangement, while the remaining 17 arrangements in the former case were suboptimal. This suggests that our genetic optimization algorithm does indeed converge on the optimal arrangement.</p>
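         <p>The optimality criterion can be written down compactly. The sketch below makes several assumptions that are not taken from the text: the 'circumference' is read as the boundary of a regular 15-gon with unit circumradius, topologies are indexed from 0, and a simple swap-based stochastic search stands in for the genetic algorithm, whose operators and parameters are not described here. All names are hypothetical.</p>

```python
import math
import random

N = 15
APOTHEM = math.cos(math.pi / N)  # centre-to-edge distance for unit circumradius
VERTICES = [(math.cos(2 * math.pi * k / N), math.sin(2 * math.pi * k / N))
            for k in range(N)]
# Outward unit normal of the edge joining vertex k and vertex k + 1.
EDGE_NORMALS = [(math.cos(2 * math.pi * (k + 0.5) / N),
                 math.sin(2 * math.pi * (k + 0.5) / N)) for k in range(N)]


def map_point(weights, arrangement):
    """Centre of gravity when arrangement[v] is the topology at vertex v."""
    total = float(sum(weights))
    x = sum(weights[t] * VERTICES[v][0] for v, t in enumerate(arrangement)) / total
    y = sum(weights[t] * VERTICES[v][1] for v, t in enumerate(arrangement)) / total
    return x, y


def distance_to_boundary(point):
    """Shortest distance from an interior point to the polygon boundary."""
    x, y = point
    return min(APOTHEM - (x * nx + y * ny) for nx, ny in EDGE_NORMALS)


def objective(vectors, arrangement):
    """Sum of shortest distances to the boundary; smaller is better."""
    return sum(distance_to_boundary(map_point(w, arrangement)) for w in vectors)


def swap_search(vectors, steps=2000, seed=0):
    """Stand-in optimiser: accept random vertex swaps that lower the objective."""
    rng = random.Random(seed)
    best = list(range(N))
    rng.shuffle(best)
    best_score = objective(vectors, best)
    for _ in range(steps):
        i, j = rng.sample(range(N), 2)
        candidate = best[:]
        candidate[i], candidate[j] = candidate[j], candidate[i]
        score = objective(vectors, candidate)
        if best_score > score:
            best, best_score = candidate, score
    return best, best_score


def split_vector(a, b):
    """Toy dataset whose support is shared equally by topologies a and b."""
    w = [0.0] * N
    w[a] = w[b] = 0.5
    return w


toy = [split_vector(0, 1), split_vector(1, 2), split_vector(2, 3)]
arrangement, score = swap_search(toy, steps=200)
print(arrangement, round(score, 4))
```

         <p>In the toy data each dataset is torn between two topologies, so the objective is lowest when those topologies sit on neighboring vertices. Rotating an arrangement leaves the objective unchanged, which is consistent with the observation that solutions are related to one another by rotation.</p>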
         <p>Comparative studies have shown that bootstrap values are more conservative measures of support than Bayesian posterior probabilities <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B4">4</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>, and therefore they provide a more realistic assessment of the support that the different topologies receive. Also, simulation studies have shown that increasing the size of a dataset by adding homologous sequences improves the accuracy of the reconstruction <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> (see <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> and <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> for recent discussion). Therefore, in addition to plotting posterior probabilities, we also calculated and mapped bootstrap support values for each QuintOP from extended datasets - that is, the datasets containing additional homologous sequences (see <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> for details on the calculation of bootstrap support values from extended datasets).</p>
         <p>We applied both probability mapping according to <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> and bootstrap support-value mapping to two different genome quintets. The first is the case of five bacterial genomes representing the five phyla that contain organisms with chlorophyll-based photosynthesis. The other is an interdomain genome quintet consisting of representatives of all three domains of life.</p>
         <sec>
            <st>
               <p>Analysis of five photosynthetic bacterial genomes</p>
            </st>
            <p>For the genome quintet of photosynthetic organisms that we initially analyzed in <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, both the posterior probability map and the bootstrap support map show that a plurality of datasets support three tree topologies: numbers 5, 10, and 15 (see Figures <figr fid="F2">2</figr> and <figr fid="F3">3</figr>). The extended datasets (Figure <figr fid="F3">3</figr>) provide a more realistic illustration of the reliability of the individual analyses than the map based on the ML-mapping approach (Figure <figr fid="F2">2</figr>). While the plurality consensus is still discernible in Figure <figr fid="F3">3</figr>, many datasets do not map close to any of the vertices, suggesting that these sets of orthologous proteins cannot discriminate between at least some of the possible phylogenies. One might be tempted to conclude that not much phylogenetic information survived and that the apparent conflicts <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> were due only to a lack of resolution <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. However, each five-taxon tree has two internal branches, that is, two bipartitions that contain phylogenetic information. The smallest quantum of phylogenetic information is the individual bipartition, not the resolved tree topology. In the five-taxon case a bipartition can be viewed as a partially unresolved tree in which two taxa are grouped together, while the relationship among the other three taxa remains unresolved. An analysis of the possible bipartitions is a better way than dekapentagonal maps to gauge the extent of surviving phylogenetic information and the conflict between the individual datasets. We summarize the support for the 10 possible bipartitions in the form of a histogram (Figure <figr fid="F4">4</figr>). The bipartition corresponding to the plurality consensus signal for trees 5, 10 and 15 is labeled as bipartition A. This bipartition has plurality support. 
Xiong <it>et al.</it> <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> reported that enzymes involved in (bacterio)chlorophyll biosynthesis support the topology that is labeled as topology 13 in the dekapentagonal map. Topology 13 corresponds to the two bipartitions labeled as E and G in Figure <figr fid="F4">4</figr>. In our set of 188 QuintOPs, only a few members of the chlorophyll biosynthesis pathway are present: <it>bchB/chlB</it>, <it>bchL/chlL</it> and <it>chlM</it>. The other members of the chlorophyll biosynthesis pathway were not picked up because of the strict requirements imposed on the QuintOP assembly, that is, the requirement that the open reading frames (ORFs) that form a QuintOP mutually pick up each other in all five genomes as top-scoring BLAST hits. Some members of the chlorophyll biosynthetic pathway are not assembled into QuintOPs because multiple paralogous genes are present in some of those genomes (especially in the <it>Chlorobium</it> and <it>Chloroflexus</it> genomes), and these prevent proper QuintOPs from being formed. We manually compiled the extended datasets for the <it>bchH/chlH</it>, <it>bchI/chlI</it>, <it>bchD/chlD</it> and <it>bchN/chlN</it> genes and calculated the bootstrap support values for bipartitions A, E and G with different phylogenetic methods (Figure <figr fid="F5">5</figr>). In all cases the members of the photosynthetic pathway do not support the plurality bipartition, but significantly support the bipartitions reported by Xiong <it>et al.</it> <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. This suggests that the genes from the chlorophyll biosynthetic pathway have a phylogenetic history different from the apparent plurality consensus.</p>
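            <p>The relationship between the 15 trees and the 10 bipartitions can be made concrete with a short enumeration. The sketch below is illustrative only: it represents each five-taxon tree by the two pairs of taxa that its internal branches group together, it does not reproduce the tree numbers (1 to 15) or bipartition letters (A to J) used in the figures, and it treats two bipartitions as conflicting when their grouped pairs share exactly one taxon. The function names and the toy support values are hypothetical.</p>

```python
from itertools import combinations

TAXA = ("Synechocystis", "Chloroflexus", "Chlorobium",
        "Rhodobacter", "Heliobacillus")


def five_taxon_trees(taxa):
    """Each unrooted five-taxon tree is a set of two disjoint taxon pairs."""
    pairs = [frozenset(p) for p in combinations(taxa, 2)]
    return [frozenset((first, second))
            for first, second in combinations(pairs, 2)
            if first.isdisjoint(second)]


def bipartition_support(tree_support):
    """Sum the support of every tree that contains a given bipartition."""
    support = {}
    for tree, value in tree_support.items():
        for pair in tree:
            support[pair] = support.get(pair, 0.0) + value
    return support


def conflict(pair, support):
    """Sum of support for all bipartitions that are incompatible with pair."""
    return sum(value for other, value in support.items()
               if other != pair and not other.isdisjoint(pair))


trees = five_taxon_trees(TAXA)
print(len(trees), "trees")

# Toy example: one dataset whose support is split between two trees
# that share the Chlorobium + Rhodobacter bipartition.
shared = frozenset(("Chlorobium", "Rhodobacter"))
with_shared = [tree for tree in trees if shared in tree]
toy_support = {with_shared[0]: 0.6, with_shared[1]: 0.4}
summary = bipartition_support(toy_support)
print(sorted(shared), summary[shared], conflict(shared, summary))
```

            <p>In the toy example neither tree is well supported on its own, yet the shared bipartition receives the combined support of both trees. Every bipartition occurs in exactly three of the 15 trees, which matches the statement that a single bipartition corresponds to the signal for three topologies.</p>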
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Posterior probability map for the analyses of five photosynthetic genomes: <it>Synechocystis</it> sp., <it>Chloroflexus aurantiacus</it>, <it>Chlorobium tepidum</it>, <it>Rhodobacter capsulatus</it> and <it>Heliobacillus mobilis</it></p>
               </caption>
               <text>
                  <p>Posterior probability map for the analyses of five photosynthetic genomes: <it>Synechocystis</it> sp., <it>Chloroflexus aurantiacus</it>, <it>Chlorobium tepidum</it>, <it>Rhodobacter capsulatus</it> and <it>Heliobacillus mobilis</it>. Each QuintOP is represented by a point inside the dekapentagon (there are a total of 188 points for 188 sets of orthologs common to the five genomes <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>). The dekapentagon is divided into zones of proximity to topologies: points that fall into one of the 15 zones that correspond to the 15 tree topologies favor either that topology most or several neighboring topologies, and points that fall into the single central zone represent unresolved relationships. The tree topology number (1 to 15) is given first, followed by the number of points per zone in parentheses. Tree topology numbers correspond to the abbreviations described in <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Abbreviations: Ca, <it>Chloroflexus aurantiacus</it>; Ct, <it>Chlorobium tepidum</it>; H, <it>Heliobacillus mobilis</it>; R, <it>Rhodobacter capsulatus</it>.</p>
               </text>
               <graphic file="gb-2004-5-3-r20-2"/>
            </fig>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Bootstrap support map from extended QuintOPs of five photosynthetic genomes</p>
               </caption>
               <text>
                  <p>Bootstrap support map from extended QuintOPs of five photosynthetic genomes. For notations see legend to Figure <figr fid="F2">2</figr>.</p>
               </text>
               <graphic file="gb-2004-5-3-r20-3"/>
            </fig>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Modified Lento-plot for a genome quintet with five photosynthetic bacteria</p>
               </caption>
               <text>
                   <p>Modified Lento-plot for a genome quintet with five photosynthetic bacteria. We summarized the results for 15 trees into 10 possible bipartitions. Each bipartition is labeled on the modified Lento-plot <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> by the two taxa that group together (the other three taxa are in an unresolved trifurcation), and by a letter A through J. For each bipartition, the bar above the <it>x</it>-axis gives the number of datasets that support the bipartition and the bar below the <it>x</it>-axis indicates the number of datasets that conflict with this bipartition. This conflict value is calculated as the sum of support for all conflicting bipartitions. The levels of support are color coded. Every bipartition is supported by at least several datasets with significant support. The plurality bipartition (grouping <it>Chlorobium</it> with <it>Rhodobacter</it>) is supported by 32 datasets with bootstrap support of 70% or better. However, even more datasets support its conflicting bipartitions, and therefore appear in conflict with the plurality topology. Abbreviations as in Figure <figr fid="F2">2</figr>.</p>
               </text>
               <graphic file="gb-2004-5-3-r20-4"/>
            </fig>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Summary of phylogenetic analyses of photosynthetic genes with different tree-reconstruction methods</p>
               </caption>
               <text>
                  <p>Summary of phylogenetic analyses of photosynthetic genes with different tree-reconstruction methods. For each gene (indicated in the first column) sequences from the genome quintet were supplemented with homologous sequences from other photosynthetic bacteria (see Materials and methods for details). Support is shown for the plurality consensus bipartition (compare Figure <figr fid="F4">4</figr>), and for the two bipartitions that correspond to the tree for photosynthetic genes reported in <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. Support values for the different methods of phylogenetic reconstruction are color coded.</p>
               </text>
               <graphic file="gb-2004-5-3-r20-5"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Contributions to a eukaryotic genome during its evolution</p>
            </st>
            <p>Genes in eukaryotes are proposed to represent different contributions from different organisms (Figure <figr fid="F6">6</figr>). If appropriate representatives of the bacterial and archaeal domains are chosen, the genes that were acquired from different putative contributors to the eukaryotic lineage can be differentiated through different tree topologies. Here we attempt to partition a eukaryotic genome with respect to the different contributions. We selected the genome quintet containing one well-annotated eukaryote (<it>Saccharomyces cerevisiae</it>), two archaea representing two archaeal kingdoms (the euryarchaeote <it>Archaeoglobus fulgidus</it> and the crenarchaeote <it>Sulfolobus solfataricus</it>), and two bacteria (the alpha-proteobacterium <it>Rhodobacter capsulatus</it> and the Gram-positive bacterium <it>Bacillus subtilis</it>).</p>
            <fig id="F6">
               <title>
                  <p>Figure 6</p>
               </title>
               <caption>
                  <p>Schematic diagram of established and proposed contributions to the eukaryotic genome</p>
               </caption>
               <text>
                   <p>Schematic diagram of established and proposed contributions to the eukaryotic genome. The eukaryotic genome is proposed to contain genes from many different sources. The nucleocytoplasm was proposed to have evolved from an archaeal-like ancestor <abbrgrp><abbr bid="B47">47</abbr><abbr bid="B48">48</abbr></abbrgrp>. This archaeal ancestor was either an organism that branched off before the most recent common ancestor of today's archaea (as in the traditional rRNA-based tree of life that contains a monophyletic archaeal clade <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>), or it might have been more specifically related to the crenarchaeota (as in the eocyte proposal <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, which results in the archaea being a paraphyletic grouping). Other well-corroborated contributions are the mitochondria and chloroplasts <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>, which evolved from bacterial endosymbionts, and which contributed many genes to the nuclear genome <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>. Additional contributions were proposed to have originated from now-extinct organisms <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr></abbrgrp>, such as the 'chronocyte', and through many single-gene transfers from many different sources that might have been ingested as food by early eukaryotes <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>.</p>
               </text>
               <graphic file="gb-2004-5-3-r20-6"/>
            </fig>
            <p>For the interdomain genome quintet (Figures <figr fid="F7">7</figr>, <figr fid="F8">8</figr>) most support-value vectors map close to four vertices: topology 11 (corresponding to the traditional ribosomal RNA tree as described by <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>), topology 12 (supporting the eocyte hypothesis, <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>), topology 9 (predicted for the genes of mitochondrial origin) and topology 4 (eukaryotic homolog with other bacteria). Notably, there are some datasets that support other topologies (see Table <tblr tid="T1">1</tblr>): the large subunit of carbamoyl-phosphate synthase supports topology 2, which groups a euryarchaeote within the Bacteria, and ribosomal protein S3 homologs support topology 15, which groups yeast with <it>Archaeoglobus</it>. The large subunit of carbamoyl-phosphate synthase contains an internal duplication (<abbrgrp><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>, and A. Lazcano, personal communication) and its phylogeny was described as being consistent with an interdomain horizontal gene transfer from the bacteria to the ancestor of the euryarchaeota <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp>. Topologies 4 and 9 might represent different prokaryotic contributions to the yeast genome, transfers between the two prokaryotic domains, or a single bacterial contribution to the eukaryotic cell <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>. In all those datasets that group the eukaryotic homolog with bacterial sequences we were not able to detect any consistent phylogenetic signature. This finding is in agreement with the 'you are what you eat' hypothesis <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>, but it also could be due to limited phylogenetic information surviving in the individual datasets.</p>
            <fig id="F7">
               <title>
                  <p>Figure 7</p>
               </title>
               <caption>
                  <p>Posterior probability map of QuintOPs from an interdomain genome quintet</p>
               </caption>
               <text>
                  <p>Posterior probability map of QuintOPs from an interdomain genome quintet. The quintet consists of genomes of yeast <it>Saccharomyces cerevisiae</it> (Y, red), the alpha-proteobacterium <it>Rhodobacter capsulatus</it> (R, green), the Gram-positive bacterium <it>Bacillus subtilis</it> (B, green), the euryarchaeote <it>Archaeoglobus fulgidus</it> (A, blue) and the crenarchaeote <it>Sulfolobus solfataricus</it> (S, blue). There are 53 QuintOPs in this genome quintet. For notations see legend for Figure <figr fid="F2">2</figr>.</p>
               </text>
               <graphic file="gb-2004-5-3-r20-7"/>
            </fig>
            <fig id="F8">
               <title>
                  <p>Figure 8</p>
               </title>
               <caption>
                  <p>Bootstrap support map from extended QuintOPs for an interdomain quintet</p>
               </caption>
               <text>
                  <p>Bootstrap support map from extended QuintOPs for an interdomain quintet. For notations see legends for Figures <figr fid="F2">2</figr> and <figr fid="F7">7</figr>.</p>
               </text>
               <graphic file="gb-2004-5-3-r20-8"/>
            </fig>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>List of QuintOPs that support the indicated tree topology with bootstrap support above 65%</p>
               </caption>
               <tblbdy cols="3">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Function</b>
                        </p>
                     </c>
                     <c ca="left">
                        <p>
                           <b>Supporting hypothesis</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Bootstrap support</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Undecaprenyl diphosphate synthase homologs</p>
                     </c>
                     <c ca="left">
                        <p>Tree 11 <abbrgrp><abbr bid="B15">15</abbr></abbrgrp></p>
                     </c>
                     <c ca="center">
                        <p>81</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Seryl-tRNA synthetase</p>
                     </c>
                     <c ca="left">
                        <p>Tree 11 <abbrgrp><abbr bid="B15">15</abbr></abbrgrp></p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Arginyl-tRNA synthetase homologs</p>
                     </c>
                     <c ca="left">
                        <p>Tree 11 <abbrgrp><abbr bid="B15">15</abbr></abbrgrp></p>
                     </c>
                     <c ca="center">
                        <p>67</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Succinyl-CoA synthetase, beta subunit</p>
                     </c>
                     <c ca="left">
                        <p>Tree 11 <abbrgrp><abbr bid="B15">15</abbr></abbrgrp></p>
                     </c>
                     <c ca="center">
                        <p>96</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Signal recognition particle, subunit SRP54</p>
                     </c>
                     <c ca="left">
                        <p>Tree 11 <abbrgrp><abbr bid="B15">15</abbr></abbrgrp></p>
                     </c>
                     <c ca="center">
                        <p>68</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Nicotinate-nucleotide pyrophosphorylase</p>
                     </c>
                     <c ca="left">
                        <p>Tree 11 <abbrgrp><abbr bid="B15">15</abbr></abbrgrp></p>
                     </c>
                     <c ca="center">
                        <p>77</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Anthranilate phosphoribosyltransferase</p>
                     </c>
                     <c ca="left">
                        <p>Tree 11 <abbrgrp><abbr bid="B15">15</abbr></abbrgrp></p>
                     </c>
                     <c ca="center">
                        <p>70</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Glu-tRNA amidotransferase, subunit A homologs</p>
                     </c>
                     <c ca="left">
                        <p>Tree 11 <abbrgrp><abbr bid="B15">15</abbr></abbrgrp></p>
                     </c>
                     <c ca="center">
                        <p>69</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Phenylalanyl-tRNA synthetase alpha subunit</p>
                     </c>
                     <c ca="left">
                        <p>Tree 11 <abbrgrp><abbr bid="B15">15</abbr></abbrgrp></p>
                     </c>
                     <c ca="center">
                        <p>99</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Adenylosuccinate lyase</p>
                     </c>
                     <c ca="left">
                        <p>Tree 11 <abbrgrp><abbr bid="B15">15</abbr></abbrgrp></p>
                     </c>
                     <c ca="center">
                        <p>75</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Aspartate aminotransferase</p>
                     </c>
                     <c ca="left">
                        <p>Tree 12 <abbrgrp><abbr bid="B16">16</abbr></abbrgrp></p>
                     </c>
                     <c ca="center">
                        <p>92</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Carbamoyl-phosphate synthase, small subunit</p>
                     </c>
                     <c ca="left">
                        <p>Tree 12 <abbrgrp><abbr bid="B16">16</abbr></abbrgrp></p>
                     </c>
                     <c ca="center">
                        <p>83</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Ketol-acid reductoisomerase homologs</p>
                     </c>
                     <c ca="left">
                        <p>Tree 12 <abbrgrp><abbr bid="B16">16</abbr></abbrgrp></p>
                     </c>
                     <c ca="center">
                        <p>72</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Dihydroxy-acid dehydratase</p>
                     </c>
                     <c ca="left">
                        <p>Tree 12 <abbrgrp><abbr bid="B16">16</abbr></abbrgrp></p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Homoserine dehydrogenase</p>
                     </c>
                     <c ca="left">
                        <p>Tree 12 <abbrgrp><abbr bid="B16">16</abbr></abbrgrp></p>
                     </c>
                     <c ca="center">
                        <p>82</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Histidinol-phosphate aminotransferase</p>
                     </c>
                     <c ca="left">
                        <p>Tree 12 <abbrgrp><abbr bid="B16">16</abbr></abbrgrp></p>
                     </c>
                     <c ca="center">
                        <p>66</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>NH<sub>3</sub>-dependent NAD+ synthetase</p>
                     </c>
                     <c ca="left">
                        <p>Tree 9: genes of mitochondrial origin</p>
                     </c>
                     <c ca="center">
                        <p>67</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Argininosuccinate synthetase</p>
                     </c>
                     <c ca="left">
                        <p>Tree 9: genes of mitochondrial origin</p>
                     </c>
                     <c ca="center">
                        <p>95</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Carbamoyl-phosphate synthase large subunit</p>
                     </c>
                     <c ca="left">
                        <p>Tree 2 <abbrgrp><abbr bid="B20">20</abbr></abbrgrp></p>
                     </c>
                     <c ca="center">
                        <p>91</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Phosphoglycerate kinase</p>
                     </c>
                     <c ca="left">
                        <p>Tree 4</p>
                     </c>
                     <c ca="center">
                        <p>79</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Hypothetical protein</p>
                     </c>
                     <c ca="left">
                        <p>Tree 4</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Translation initiation factor eIF-2B homologs</p>
                     </c>
                     <c ca="left">
                        <p>Tree 4</p>
                     </c>
                     <c ca="center">
                        <p>67</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Argininosuccinate lyase</p>
                     </c>
                     <c ca="left">
                        <p>Tree 4</p>
                     </c>
                     <c ca="center">
                        <p>75</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Glutamate synthase</p>
                     </c>
                     <c ca="left">
                        <p>Tree 4</p>
                     </c>
                     <c ca="center">
                        <p>80</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Phosphoribosylformylglycinamidine synthase</p>
                     </c>
                     <c ca="left">
                        <p>Tree 15</p>
                     </c>
                     <c ca="center">
                        <p>73</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Ribosomal protein S3 homologs</p>
                     </c>
                     <c ca="left">
                        <p>Tree 15</p>
                     </c>
                     <c ca="center">
                        <p>71</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Tree numbers correspond to the designations used in Figures <figr fid="F7">7</figr> and <figr fid="F8">8</figr>.</p>
               </tblfn>
            </tbl>
            <p>The dekapentagonal maps depicted in Figures <figr fid="F7">7</figr> and <figr fid="F8">8</figr> emphasize the mosaicism of the eukaryotic genome of yeast, and delineate different contributions to the yeast genome that have occurred over the course of evolution. The map reveals that individual datasets support different, in some instances conflicting, hypotheses proposed to explain the origin of eukaryotes. While the resulting maps illustrate the mosaic nature of the eukaryotic genome, their discriminatory power regarding different proposed contributions is limited. For example, the datasets that support the traditional topology (number 11) are equally compatible with genes that were contributed to the eukaryotic cell via the chronocyte <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. Because our approach only considers unrooted trees, the two scenarios result in identical topologies, with only the branch lengths differing under the two scenarios, that is, the genes contributed by the chronocyte are expected to have the eukaryotic genes on very long branches <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. Another shortcoming is that the map includes only two bacterial taxa. Without inspecting the phylogenies inferred from the extended datasets (see above) it is impossible to decide if many genes were contributed from a single bacterium, as assumed in hypotheses proposed in <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr><abbr bid="B28">28</abbr></abbrgrp>, or were acquired through many independent transfers <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
         <p>Dekapentagonal mapping provides a useful extension to the earlier developed ML-, posterior probability, and bootstrap support-values mapping for four genomes described in <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> and <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. For the analyses of four genomes the mapping of the support values to the two-dimensional space is unique; for analyses of five genomes we had to select one out of the many possible projections of the 15-dimensional support-value vectors to two-dimensional space. We used an optimality criterion to perform a heuristic search for a map that would emphasize genome mosaicism and frequently unresolved bifurcations. Support-value mapping using an optimized barycentric coordinate system allows us to dissect genomes into parts that have different evolutionary histories, and to focus attention on genes that contain atypical phylogenetic information.</p>
         <p>If most of the individual molecular phylogenies are unresolved, analysis of individual bipartitions provides a means to assess a plurality phylogenetic signal. The modified Lento plot <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> applied to extended datasets provides both the bipartitions supported by the plurality of genes, and the number of genes that significantly disagree with these bipartitions.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Genome quintets</p>
            </st>
            <p>The first genome quintet consists of five photosynthetic bacteria from five bacterial phyla: <it>Rhodobacter capsulatus, Chlorobium tepidum, Chloroflexus aurantiacus, Heliobacillus mobilis</it> and <it>Synechocystis</it> sp. PCC 6803.</p>
            <p>The second genome quintet consists of genomes representing all three domains of life: the yeast genome of <it>Saccharomyces cerevisiae</it>, the alpha-proteobacterium <it>Rhodobacter capsulatus</it>, the Gram-positive bacterium <it>Bacillus subtilis</it>, the euryarchaeote <it>Archaeoglobus fulgidus</it> and the crenarchaeote <it>Sulfolobus solfataricus</it>.</p>
            <p>The <it>Rhodobacter capsulatus</it> and <it>Heliobacillus mobilis</it> genome data were obtained from Integrated Genomics <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. Genome sequence for <it>Chlorobium tepidum</it> was downloaded from The Institute for Genomic Research (TIGR) <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. The <it>Rhodopseudomonas palustris</it> genome was downloaded from the DOE Joint Genome Institute <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. Other genomes for the genome quintets were downloaded from the National Center for Biotechnology Information (NCBI) <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Assembly of quintets of orthologous proteins (QuintOPs)</p>
            </st>
            <p>Detection of QuintOPs was analogous to detection of quartets of orthologous proteins <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. In brief, for each genome in a genome quintet, BLAST <abbrgrp><abbr bid="B34">34</abbr></abbrgrp> searches of every ORF in that genome against the other four genomes were performed using the <it>blastp</it> program. The E-value cutoff for the BLAST searches was set to 10<sup>-4</sup>. We defined QuintOPs as those sets of genes that mutually pick each other as the top-scoring hit in all pairwise genome BLAST comparisons. The amino-acid sequences for each QuintOP were retrieved and the datasets were aligned with ClustalW <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. Maximum likelihood values for the 15 possible tree topologies for each QuintOP were calculated using TREE-PUZZLE version 5.1 <abbrgrp><abbr bid="B36">36</abbr></abbrgrp> under the auto-detected substitution model. Posterior probability vectors were calculated from the ML values.</p>
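            <p>The mutual top-scoring-hit criterion can be illustrated with a minimal Python sketch. The layout of the <it>best_hit</it> dictionary and the function name <it>find_quintops</it> are assumptions made for this illustration and are not taken from the scripts used in this study. Parsing of the BLAST output and the E-value filter are assumed to have been applied beforehand.</p>

```python
from itertools import permutations


def find_quintops(best_hit, genomes):
    """Return gene sets (one gene per genome) that are mutual top-scoring hits.

    best_hit maps (query_genome, query_gene, target_genome) to the
    top-scoring gene in target_genome. Hits that fail the E-value cutoff
    are simply absent. This data layout is hypothetical.
    """
    first = genomes[0]
    seeds = sorted({gene for (g, gene, _target) in best_hit if g == first})
    quintops = []
    for seed in seeds:
        # Build a candidate set from the seed's top hit in every other genome.
        members = {first: seed}
        for target in genomes[1:]:
            hit = best_hit.get((first, seed, target))
            if hit is None:
                break
            members[target] = hit
        if len(members) != len(genomes):
            continue
        # Keep the set only if every ordered pair of genomes agrees.
        if all(best_hit.get((a, members[a], b)) == members[b]
               for a, b in permutations(genomes, 2)):
            quintops.append(members)
    return quintops
```

            <p>In this sketch a candidate set is discarded as soon as a single one of the 20 ordered genome pairs fails to return the expected gene, which mirrors the requirement that the genes pick each other in all pairwise comparisons.</p>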
         </sec>
         <sec>
            <st>
               <p>Assembly of extended datasets for the QuintOPs</p>
            </st>
            <p>For each sequence in a QuintOP we detected the top-scoring BLAST <abbrgrp><abbr bid="B34">34</abbr></abbrgrp> hit with an E-value below 10<sup>-8</sup> in each of 60 completely sequenced archaeal and bacterial reference genomes (<it>Aeropyrum pernix, Archaeoglobus fulgidus, Anabaena</it> sp., <it>Aquifex aeolicus, Agrobacterium tumefaciens, Borrelia burgdorferi, Bradyrhizobium japonicum, Bifidobacterium longum, Bacillus subtilis, Brucella suis, Buchnera</it> sp., <it>Clostridium acetobutylicum, Caulobacter crescentus, Corynebacterium glutamicum, Campylobacter jejuni, Chlamydophila pneumoniae, Deinococcus radiodurans, Escherichia coli</it> K12, <it>Fusobacterium nucleatum, Halobacterium</it> sp., <it>Haemophilus influenzae, Helicobacter pylori, Leptospira interrogans, Lactococcus lactis, Listeria monocytogenes, Lactobacillus plantarum, Mycoplasma genitalium, Methanococcus jannaschii, Methanopyrus kandleri, Mesorhizobium loti, Methanosarcina mazei, Methanobacterium thermoautotrophicum, Mycobacterium tuberculosis, Neisseria meningitidis, Oceanobacillus iheyensis, Pseudomonas aeruginosa, Pyrobaculum aerophilum, Pyrococcus horikoshii, Pasteurella multocida, Rickettsia conorii, Ralstonia solanacearum, Staphylococcus aureus, Streptomyces coelicolor, Sinorhizobium meliloti, Shewanella oneidensis, Sulfolobus solfataricus, Salmonella typhi, Synechocystis</it> sp., <it>Thermoplasma acidophilum, Thermosynechococcus elongatus, Thermotoga maritima, Treponema pallidum, Thermoanaerobacter tengcongensis, Tropheryma whipplei, Ureaplasma urealyticum, Vibrio cholerae, Wigglesworthia brevipalpis, Xanthomonas campestris, Xylella fastidiosa, Yersinia pestis</it>). These genomes were downloaded from the NCBI <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. The resulting sequences were added to the QuintOP dataset and duplicated sequences were eliminated.
The datasets were aligned with ClustalW <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>, and 100 bootstrap samples were generated using the SEQBOOT program from the PHYLIP package version 3.6a2.1 <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. The distances were generated using TREE-PUZZLE version 5.1 <abbrgrp><abbr bid="B36">36</abbr></abbrgrp> under the auto-detected substitution model. Neighbor-joining trees were calculated from these distances using NEIGHBOR from the PHYLIP package version 3.6a2.1 <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. The resulting trees were parsed with respect to which of the 15 five-taxon subtrees they contain.</p>
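            <p>The 15 five-taxon subtrees referred to above are the 15 possible unrooted bifurcating topologies for five taxa. As an illustration only, they can be enumerated with the following Python sketch, which describes each topology by its two pairs of sister taxa; the fifth taxon attaches to the central branch. The function name is hypothetical, and the order in which the topologies are generated is unrelated to the tree numbers used in the figures.</p>

```python
from itertools import combinations


def five_taxon_topologies(taxa):
    """Enumerate all unrooted bifurcating topologies for five taxa.

    Each topology is a frozenset holding its two cherries (pairs of
    sister taxa); the remaining taxon joins the central branch.
    """
    if len(set(taxa)) != 5:
        raise ValueError("exactly five distinct taxa are required")
    topologies = set()
    for pair1 in combinations(taxa, 2):
        rest = [t for t in taxa if t not in pair1]
        for pair2 in combinations(rest, 2):
            # The set of two cherries is unordered, so each topology
            # is generated twice and stored once.
            topologies.add(frozenset([frozenset(pair1), frozenset(pair2)]))
    return topologies
```

            <p>Choosing the first pair in 10 ways and the second pair in 3 ways gives 30 ordered choices, that is, 15 distinct topologies once the order of the two pairs is ignored.</p>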
         </sec>
         <sec>
            <st>
               <p>Calculation of posterior probability vector locations for individual QuintOPs</p>
            </st>
            <p>The dekapentagon was placed into the Cartesian coordinate system with its center coinciding with the origin of the coordinate system. The coordinates (<it>x</it><sub><it>i</it></sub>, <it>y</it><sub><it>i</it></sub>) of a vertex <it>i</it> are then <it>x</it><sub><it>i</it></sub> = <it>R</it>*cos(<it>i</it>*360/15), <it>y</it><sub><it>i</it></sub> = <it>R</it>*sin(<it>i</it>*360/15), where the angles are given in degrees, <it>R</it> is the distance from the origin to the vertex (equal for all vertices because the origin lies at the center of the dekapentagon), and 1 &#8804; <it>i</it> &#8804; 15. For each pair of vertices <it>i</it> and <it>j</it> the coordinates of the center of gravity M<sub><it>ij</it></sub> (<it>x</it><sub><it>M</it></sub>, <it>y</it><sub><it>M</it></sub>) are calculated according to the law of the lever: <it>x</it><sub><it>M</it></sub> = <it>x</it><sub><it>i</it></sub> + (<it>x</it><sub><it>j</it></sub> - <it>x</it><sub><it>i</it></sub>)*<it>p</it><sub><it>j</it></sub>/(<it>p</it><sub><it>i</it></sub> + <it>p</it><sub><it>j</it></sub>), <it>y</it><sub><it>M</it></sub> = <it>y</it><sub><it>i</it></sub> + (<it>y</it><sub><it>j</it></sub> - <it>y</it><sub><it>i</it></sub>)*<it>p</it><sub><it>j</it></sub>/(<it>p</it><sub><it>i</it></sub> + <it>p</it><sub><it>j</it></sub>), where <it>p</it><sub><it>i</it></sub> and <it>p</it><sub><it>j</it></sub> are the posterior probabilities assigned to vertices <it>i</it> and <it>j</it>. The process is repeated for all pairs of vertices, and then iteratively for all 'intermediate' centers of gravity, each carrying the combined mass of the points it replaces, until only one pair of coordinates remains. This pair gives the center of gravity of the dekapentagon, which is the location of the probability vector. The resulting coordinates of the dekapentagon's center of gravity do not depend on the order in which the masses are combined.</p>
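            <p>The calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in this study: the function names are hypothetical, and the masses are combined successively, one vertex at a time, which is permissible because the result does not depend on the order of combination.</p>

```python
import math


def vertex_coordinates(n_vertices=15, radius=1.0):
    """Vertices i = 1..n of a regular polygon centered on the origin."""
    return [
        (radius * math.cos(math.radians(i * 360.0 / n_vertices)),
         radius * math.sin(math.radians(i * 360.0 / n_vertices)))
        for i in range(1, n_vertices + 1)
    ]


def combine(point_a, mass_a, point_b, mass_b):
    """Law of the lever: merge two weighted points into one weighted point."""
    total = mass_a + mass_b
    if total == 0:
        # Two massless points: the position is irrelevant, keep the first.
        return point_a, 0.0
    x = point_a[0] + (point_b[0] - point_a[0]) * mass_b / total
    y = point_a[1] + (point_b[1] - point_a[1]) * mass_b / total
    return (x, y), total


def probability_vector_location(probabilities, radius=1.0):
    """Center of gravity of the polygon with probabilities as vertex masses."""
    vertices = vertex_coordinates(len(probabilities), radius)
    point, mass = vertices[0], probabilities[0]
    for vertex, p in zip(vertices[1:], probabilities[1:]):
        point, mass = combine(point, mass, vertex, p)
    return point
```

            <p>A vector with all probability on one topology maps onto the corresponding vertex, whereas a vector with equal probabilities for all 15 topologies maps onto the center of the dekapentagon. The same location is obtained as the probability-weighted mean of the vertex coordinates.</p>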
         </sec>
         <sec>
            <st>
               <p>Finding of optimal arrangement and testing it for reproducibility</p>
            </st>
            <p>There are (15 - 1)!/2 = 14!/2 &#8776; 4*10<sup>10</sup> possible arrangements of topologies on the dekapentagon's vertices (only free circular permutations <abbrgrp><abbr bid="B38">38</abbr></abbrgrp> are counted, that is, arrangements that become equivalent by rotating the dekapentagon or flipping it over are considered the same arrangement). An arrangement was considered optimal when the topologies were arranged at the polygon vertices in a way that maximizes the sum of the distances of all barycentric points from the center of the polygon. There are too many arrangements of topologies around the dekapentagon to search for the optimal arrangement exhaustively. Therefore, we used a heuristic search for optimal solutions based on a hybrid genetic algorithm <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. Each tree topology was assigned a numerical identifier (1 through 15), and the arrangements of topologies around the dekapentagon's vertices were encoded as arrays of the tree topology identifiers, where each position in the array represents a position on the polygon circumference. The genetic algorithm applies mutation and cross-over operations to each successive generation of arrangements until the optimal solution is obtained <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>. Each generation consisted of a population of 300 individuals. To preserve diversity among the individuals as much as possible and to prevent premature convergence of the algorithm, the population was divided into 10 demes (subpopulations), each with 30 individuals, with controlled migration between demes.</p>
            <p>We hybridized the genetic algorithm by equipping it with a local search heuristic, in addition to the global search strategy based on the genetic operators, to better explore the space of possible arrangements. A manuscript reporting details on the algorithm for finding the optimal arrangements is in preparation (L.H., O.Z. and J.P.G., unpublished work). The program calculating the optimal arrangement of topologies is available on request.</p>
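            <p>The size of the search space and the optimality criterion can be made concrete with a short Python sketch. It is an illustration under stated assumptions only: the function names are hypothetical, and the simple hill climbing over pairwise swaps shown here is a stand-in for a local search step, not the hybrid genetic algorithm used in this study.</p>

```python
import math


def count_arrangements(n=15):
    """Distinct circular arrangements up to rotation and reflection."""
    return math.factorial(n - 1) // 2


def score(arrangement, vectors, radius=1.0):
    """Sum of the distances of all barycentric points from the center.

    arrangement[k] is the topology identifier (1..n) placed on vertex k + 1.
    vectors holds one support-value vector per dataset, indexed by
    topology identifier minus one; each vector must have a positive sum.
    """
    n = len(arrangement)
    angles = [math.radians((k + 1) * 360.0 / n) for k in range(n)]
    total = 0.0
    for probs in vectors:
        mass = sum(probs)
        x = sum(probs[t - 1] * radius * math.cos(a)
                for t, a in zip(arrangement, angles)) / mass
        y = sum(probs[t - 1] * radius * math.sin(a)
                for t, a in zip(arrangement, angles)) / mass
        total += math.hypot(x, y)
    return total


def swap_local_search(arrangement, vectors, max_rounds=100):
    """Hill climbing over pairwise swaps (illustrative stand-in only)."""
    best = list(arrangement)
    best_score = score(best, vectors)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(best)):
            for j in range(i + 1, len(best)):
                candidate = list(best)
                candidate[i], candidate[j] = candidate[j], candidate[i]
                s = score(candidate, vectors)
                if s > best_score + 1e-12:
                    best, best_score, improved = candidate, s, True
        if not improved:
            break
    return best, best_score
```

            <p>In this sketch <it>count_arrangements</it> returns 14!/2 = 43,589,145,600, in agreement with the estimate given above. Hill climbing of this kind can stop in a local optimum, which is one reason to combine a local step with a population-based global search.</p>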
            <p>To test the reproducibility, the search for the optimal arrangement was repeated independently 50 times with different starting seeds.</p>
         </sec>
         <sec>
            <st>
               <p>Plotting</p>
            </st>
            <p>The resulting posterior probability and bootstrap support vectors were plotted into dekapentagonal maps using GNUPLOT version 3.7 <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Analyses of genes from the chlorophyll biosynthesis pathway</p>
            </st>
            <p>Sequences from the genome quintet were supplemented with homologous sequences from other photosynthetic bacteria to improve taxon sampling, aligned with ClustalW <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>, and phylogenetic trees were reconstructed. For distance and parsimony analyses, 100 bootstrap samples were generated with SEQBOOT <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. Distances were calculated in TREE-PUZZLE v. 5.1 <abbrgrp><abbr bid="B36">36</abbr></abbrgrp> with among-site rate variation taken into account. Neighbor-joining trees were calculated with NEIGHBOR <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>, Fitch-Margoliash trees with FITCH <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>, protein parsimony trees with PROTPARS <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. MrBayes version 3.0B4 <abbrgrp><abbr bid="B42">42</abbr></abbrgrp> analyses were run three times independently for 500,000 generations per run (100,000 of which were burned in), under the JTT substitution model <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>, and with an exponential prior set for branch length.</p>
         </sec>
         <sec>
            <st>
               <p>Software packages used</p>
            </st>
            <p>Scripts for data manipulation were written in Perl and used many of the SEALS package subroutines <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>. Tree-parsing programs were written in Java utilizing PAL library classes <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>. The genetic algorithm was written in C++ and is based on the genetic algorithm library GALIB version 2.4.5 <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Additional data files</p>
         </st>
         <p>Additional data file <supplr sid="s1">1</supplr> contains accession numbers for the datasets in two genome quintets analyzed in this article.</p>
         <suppl id="s1">
            <title>
               <p>Additional data file 1</p>
            </title>
            <caption>
               <p>Accession numbers for the datasets in two genome quintets analyzed</p>
            </caption>
            <text>
               <p>Accession numbers for the datasets in two genome quintets analyzed</p>
            </text>
            <file name="gb-2004-5-3-r20-s1.pdf">
               <p>Click here for additional data file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank Korbinian Strimmer for useful comments on the manuscript. This work was supported through the NASA Astrobiology Institute at Arizona State University, the NASA Exobiology Program, and in part through the NSF Microbial Genetics Program.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1" rating="0">
            <title>
               <p>The early evolution of cellular life.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gogarten</snm>
                  <fnm>JP</fnm>
               </au>
            </aug>
            <source>Trends Ecol Evol</source>
            <pubdate>1995</pubdate>
            <volume>10</volume>
            <fpage>147</fpage>
            <lpage>151</lpage>
            <xrefbib>
               <pubid idtype="doi" link="fulltext">10.1016/S0169-5347(00)89024-2</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B2" rating="0">
            <title>
               <p>Bootstrap, Bayesian probability and maximum likelihood mapping: Exploring new tools for comparative genome analyses.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Zhaxybayeva</snm>
                  <fnm>O</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gogarten</snm>
                  <fnm>JP</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>4</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid" link="fulltext">100357</pubid>
                  <pubid idtype="pmpid" link="fulltext">11918828</pubid>
                  <pubid idtype="doi" link="fulltext">10.1186/1471-2164-3-4</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3" rating="0">
            <title>
               <p>Likelihood-mapping: a simple method to visualize phylogenetic content of a sequence alignment.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Strimmer</snm>
                  <fnm>K</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>von Haeseler</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1997</pubdate>
            <volume>94</volume>
            <fpage>6815</fpage>
            <lpage>6819</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid" link="fulltext">21241</pubid>
                  <pubid idtype="pmpid" link="fulltext">9192648</pubid>
                  <pubid idtype="doi" link="fulltext">10.1073/pnas.94.13.6815</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4" rating="0">
            <title>
               <p>An improved probability mapping approach to assess genome mosaicism.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Zhaxybayeva</snm>
                  <fnm>O</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gogarten</snm>
                  <fnm>JP</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <fpage>37</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid" link="fulltext">222983</pubid>
                  <pubid idtype="pmpid" link="fulltext">12974984</pubid>
                  <pubid idtype="doi" link="fulltext">10.1186/1471-2164-4-37</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5" rating="0">
            <title>
               <p>Whole-genome analysis of photosynthetic prokaryotes.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Raymond</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Zhaxybayeva</snm>
                  <fnm>O</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gogarten</snm>
                  <fnm>JP</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gerdes</snm>
                  <fnm>SY</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Blankenship</snm>
                  <fnm>RE</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2002</pubdate>
            <volume>298</volume>
            <fpage>1616</fpage>
            <lpage>1620</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1126/science.1075558</pubid>
                  <pubid idtype="pmpid" link="fulltext">12446909</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6" rating="0">
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Li</snm>
                  <fnm>W-H</fnm>
               </au>
            </aug>
            <source>Molecular Evolution</source>
            <publisher>Sunderland, MA: Sinauer Associates</publisher>
            <pubdate>1997</pubdate>
         </bibl>
         <bibl id="B7" rating="0">
            <title>
               <p>Geometry of the space of phylogenetic trees.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Billera</snm>
                  <fnm>LJ</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Holmes</snm>
                  <fnm>SP</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Vogtmann</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Adv Appl Math</source>
            <pubdate>2001</pubdate>
            <volume>27</volume>
            <fpage>733</fpage>
            <lpage>767</lpage>
            <xrefbib>
               <pubid idtype="doi" link="fulltext">10.1006/aama.2001.0759</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B8" rating="0">
            <title>
               <p>Comparison of Bayesian and maximum likelihood bootstrap measures of phylogenetic reliability.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Douady</snm>
                  <fnm>CJ</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Delsuc</snm>
                  <fnm>F</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Boucher</snm>
                  <fnm>Y</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Doolittle</snm>
                  <fnm>WF</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Douzery</snm>
                  <fnm>EJ</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2003</pubdate>
            <volume>20</volume>
            <fpage>248</fpage>
            <lpage>254</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1093/molbev/msg042</pubid>
                  <pubid idtype="pmpid" link="fulltext">12598692</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9" rating="0">
            <title>
               <p>Bayes or bootstrap? A simulation study comparing the performance of Bayesian Markov chain Monte Carlo sampling and bootstrapping in assessing phylogenetic confidence.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Alfaro</snm>
                  <fnm>ME</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Zoller</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Lutzoni</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2003</pubdate>
            <volume>20</volume>
            <fpage>255</fpage>
            <lpage>266</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1093/molbev/msg028</pubid>
                  <pubid idtype="pmpid" link="fulltext">12598693</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10" rating="0">
            <title>
               <p>Is it better to add taxa or characters to a difficult phylogenetic problem?</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Graybeal</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Syst Biol</source>
            <pubdate>1998</pubdate>
            <volume>47</volume>
            <fpage>9</fpage>
            <lpage>17</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1080/106351598260996</pubid>
                  <pubid idtype="pmpid" link="fulltext">12064243</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11" rating="0">
            <title>
               <p>Is sparse taxon sampling a problem for phylogenetic inference?</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hillis</snm>
                  <fnm>DM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Pollock</snm>
                  <fnm>DD</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>McGuire</snm>
                  <fnm>JA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Zwickl</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Syst Biol</source>
            <pubdate>2003</pubdate>
            <volume>52</volume>
            <fpage>124</fpage>
            <lpage>126</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1080/10635150309356</pubid>
                  <pubid idtype="pmpid" link="fulltext">12554446</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12" rating="0">
            <title>
               <p>Taxon sampling, bioinformatics, and phylogenomics.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Rosenberg</snm>
                  <fnm>MS</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kumar</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Syst Biol</source>
            <pubdate>2003</pubdate>
            <volume>52</volume>
            <fpage>119</fpage>
            <lpage>124</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1080/10635150309344</pubid>
                  <pubid idtype="pmpid" link="fulltext">12554445</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13" rating="0">
            <title>
               <p>Phylogenetics and the cohesion of bacterial genomes.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Daubin</snm>
                  <fnm>V</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Moran</snm>
                  <fnm>NA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Ochman</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2003</pubdate>
            <volume>301</volume>
            <fpage>829</fpage>
            <lpage>832</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1126/science.1086568</pubid>
                  <pubid idtype="pmpid" link="fulltext">12907801</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14" rating="0">
            <title>
               <p>Molecular evidence for the early evolution of photosynthesis.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Xiong</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Fischer</snm>
                  <fnm>WM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Inoue</snm>
                  <fnm>K</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Nakahara</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Bauer</snm>
                  <fnm>CE</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2000</pubdate>
            <volume>289</volume>
            <fpage>1724</fpage>
            <lpage>1730</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1126/science.289.5485.1724</pubid>
                  <pubid idtype="pmpid" link="fulltext">10976061</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15" rating="0">
            <title>
               <p>Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Woese</snm>
                  <fnm>CR</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kandler</snm>
                  <fnm>O</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Wheelis</snm>
                  <fnm>ML</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1990</pubdate>
            <volume>87</volume>
            <fpage>4576</fpage>
            <lpage>4579</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid" link="fulltext">54159</pubid>
                  <pubid idtype="pmpid" link="fulltext">2112744</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16" rating="0">
            <title>
               <p>Origin of the eukaryotic nucleus determined by rate-invariant analysis of rRNA sequences.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Lake</snm>
                  <fnm>JA</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1988</pubdate>
            <volume>331</volume>
            <fpage>184</fpage>
            <lpage>186</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1038/331184a0</pubid>
                  <pubid idtype="pmpid" link="fulltext">3340165</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17" rating="0">
            <title>
               <p>Phylogenetic analysis of carbamoylphosphate synthetase genes: complex evolutionary history includes an internal duplication within a gene which can root the tree of life.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Lawson</snm>
                  <fnm>FS</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Charlebois</snm>
                  <fnm>RL</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Dillon</snm>
                  <fnm>JA</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1996</pubdate>
            <volume>13</volume>
            <fpage>970</fpage>
            <lpage>977</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8752005</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B18" rating="0">
            <title>
               <p>Molecular studies on an ancient gene encoding for carbamoyl-phosphate synthetase.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Schofield</snm>
                  <fnm>JP</fnm>
               </au>
            </aug>
            <source>Clin Sci (Lond)</source>
            <pubdate>1993</pubdate>
            <volume>84</volume>
            <fpage>119</fpage>
            <lpage>128</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8382576</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B19" rating="0">
            <title>
               <p>Evolutionary relationships of the carbamoylphosphate synthetase genes.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>van den Hoff</snm>
                  <fnm>MJ</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Jonker</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Beintema</snm>
                  <fnm>JJ</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Lamers</snm>
                  <fnm>WH</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1995</pubdate>
            <volume>41</volume>
            <fpage>813</fpage>
            <lpage>832</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8587126</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B20" rating="0">
            <title>
               <p>Deciphering the molecular record for the early evolution of life: Gene duplication and horizontal gene transfer.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Olendzenski</snm>
                  <fnm>L</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gogarten</snm>
                  <fnm>JP</fnm>
               </au>
            </aug>
            <source>In: Thermophiles: The Keys to Molecular Evolution and the Origin of Life?</source>
            <publisher>Philadelphia: Taylor &amp; Francis</publisher>
            <editor>Wiegel J, Adams MWW</editor>
            <pubdate>1998</pubdate>
            <fpage>165</fpage>
            <lpage>176</lpage>
         </bibl>
         <bibl id="B21" rating="0">
            <title>
               <p>Horizontal gene transfer and fusing lines of descent: the archaebacteria - a chimera?</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Olendzenski</snm>
                  <fnm>L</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hilario</snm>
                  <fnm>E</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gogarten</snm>
                  <fnm>JP</fnm>
               </au>
            </aug>
            <source>In: Horizontal Gene Transfer</source>
            <publisher>London: Chapman and Hall</publisher>
            <editor>Syvanen M, Kado C</editor>
            <edition>1</edition>
            <pubdate>1998</pubdate>
            <fpage>349</fpage>
            <lpage>362</lpage>
         </bibl>
         <bibl id="B22" rating="0">
            <title>
               <p>Updating carbamoylphosphate synthase (CPS) phylogenies: occurrence and phylogenetic identity of archaeal CPS genes.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Cammarano</snm>
                  <fnm>P</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gribaldo</snm>
                  <fnm>S</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Johann</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>2002</pubdate>
            <volume>55</volume>
            <fpage>153</fpage>
            <lpage>160</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s00239-002-2312-6</pubid>
                  <pubid idtype="pmpid" link="fulltext">12107592</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23" rating="0">
            <title>
               <p>A model of the early evolution of organisms: the arisal of the three domains of life from the common ancestor.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Zillig</snm>
                  <fnm>W</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Palm</snm>
                  <fnm>P</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Klenk</snm>
                  <fnm>H-P</fnm>
               </au>
            </aug>
            <source>In: The Origin and Evolution of the Cell</source>
            <publisher>Singapore: World Scientific Publishing</publisher>
            <editor>Hartman H, Matsuno K</editor>
            <pubdate>1992</pubdate>
            <fpage>163</fpage>
            <lpage>182</lpage>
         </bibl>
         <bibl id="B24" rating="0">
            <title>
               <p>Evolution of HSP70 gene and its implications regarding relationships between archaebacteria, eubacteria, and eukaryotes.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gupta</snm>
                  <fnm>RS</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Golding</snm>
                  <fnm>GB</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1993</pubdate>
            <volume>37</volume>
            <fpage>573</fpage>
            <lpage>582</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/BF00182743</pubid>
                  <pubid idtype="pmpid" link="fulltext">8114110</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25" rating="0">
            <title>
               <p>You are what you eat: a gene transfer ratchet could account for bacterial genes in eukaryotic nuclear genomes.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Doolittle</snm>
                  <fnm>WF</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>1998</pubdate>
            <volume>14</volume>
            <fpage>307</fpage>
            <lpage>311</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1016/S0168-9525(98)01494-2</pubid>
                  <pubid idtype="pmpid" link="fulltext">9724962</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26" rating="0">
            <title>
               <p>The origin of the eukaryotic cell.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hartman</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Speculations Sci Technol</source>
            <pubdate>1984</pubdate>
            <volume>7</volume>
            <fpage>77</fpage>
            <lpage>81</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11541973</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B27" rating="0">
            <title>
               <p>Early evolution and the origin of eukaryotes.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Sogin</snm>
                  <fnm>ML</fnm>
               </au>
            </aug>
            <source>Curr Opin Genet Dev</source>
            <pubdate>1991</pubdate>
            <volume>1</volume>
            <fpage>457</fpage>
            <lpage>463</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0959-437X(05)80192-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">1822277</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28" rating="0">
            <title>
               <p>Was the nucleus the first endosymbiont?</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Lake</snm>
                  <fnm>JA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Rivera</snm>
                  <fnm>MC</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1994</pubdate>
            <volume>91</volume>
            <fpage>2880</fpage>
            <lpage>2881</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid" link="fulltext">43475</pubid>
                  <pubid idtype="pmpid" link="fulltext">8159671</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29" rating="0">
            <title>
               <p>Use of spectral analysis to test hypotheses on the origin of pinnipeds.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Lento</snm>
                  <fnm>GM</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Hickson</snm>
                  <fnm>RE</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Chambers</snm>
                  <fnm>GK</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Penny</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1995</pubdate>
            <volume>12</volume>
            <fpage>28</fpage>
            <lpage>52</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">7877495</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B30" rating="0">
            <title>
               <p>Integrated Genomics</p>
            </title>
            <url>http://www.integratedgenomics.com</url>
         </bibl>
         <bibl id="B31" rating="0">
            <title>
               <p>The Institute for Genomic Research</p>
            </title>
            <url>http://www.tigr.org</url>
         </bibl>
         <bibl id="B32" rating="0">
            <title>
               <p>DOE Joint Genome Institute</p>
            </title>
            <url>http://www.jgi.doe.gov/JGI_microbial/html/index.html</url>
         </bibl>
         <bibl id="B33" rating="0">
            <title>
               <p>National Center for Biotechnology Information</p>
            </title>
            <url>http://www.ncbi.nlm.nih.gov</url>
         </bibl>
         <bibl id="B34" rating="0">
            <title>
               <p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Madden</snm>
                  <fnm>TL</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Schaffer</snm>
                  <fnm>AA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>3389</fpage>
            <lpage>3402</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid" link="fulltext">146917</pubid>
                  <pubid idtype="pmpid" link="fulltext">9254694</pubid>
                  <pubid idtype="doi" link="fulltext">10.1093/nar/25.17.3389</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35" rating="0">
            <title>
               <p>CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Thompson</snm>
                  <fnm>JD</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gibson</snm>
                  <fnm>TJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1994</pubdate>
            <volume>22</volume>
            <fpage>4673</fpage>
            <lpage>4680</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">308517</pubid>
                  <pubid idtype="pmpid" link="fulltext">7984417</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36" rating="0">
            <title>
               <p>TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Schmidt</snm>
                  <fnm>HA</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Strimmer</snm>
                  <fnm>K</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Vingron</snm>
                  <fnm>M</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>von Haeseler</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <fpage>502</fpage>
            <lpage>504</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1093/bioinformatics/18.3.502</pubid>
                  <pubid idtype="pmpid" link="fulltext">11934758</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37" rating="0">
            <title>
               <p>PHYLIP (Phylogeny Inference Package) version 3.6.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Felsenstein</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Distributed by the author: Department of Genetics, University of Washington, Seattle</source>
            <pubdate>1993</pubdate>
         </bibl>
         <bibl id="B38" rating="0">
            <title>
               <p>MathWorld: circular permutations</p>
            </title>
            <url>http://mathworld.wolfram.com/CircularPermutation.html</url>
         </bibl>
         <bibl id="B39" rating="0">
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Goldberg</snm>
                  <fnm>DE</fnm>
               </au>
            </aug>
            <source>Genetic Algorithms in Search, Optimization and Machine Learning</source>
            <publisher>Boston, MA: Addison-Wesley</publisher>
            <pubdate>1989</pubdate>
         </bibl>
         <bibl id="B40" rating="0">
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Holland</snm>
                  <fnm>JH</fnm>
               </au>
            </aug>
            <source>Adaptation in Natural and Artificial Systems</source>
            <publisher>Ann Arbor: University of Michigan Press</publisher>
            <pubdate>1975</pubdate>
         </bibl>
         <bibl id="B41" rating="0">
            <title>
               <p>GNUPLOT central</p>
            </title>
            <url>http://www.gnuplot.info</url>
         </bibl>
         <bibl id="B42" rating="0">
            <title>
               <p>MRBAYES: Bayesian inference of phylogenetic trees.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Huelsenbeck</snm>
                  <fnm>JP</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Ronquist</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <fpage>754</fpage>
            <lpage>755</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1093/bioinformatics/17.8.754</pubid>
                  <pubid idtype="pmpid" link="fulltext">11524383</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43" rating="0">
            <title>
               <p>The rapid generation of mutation data matrices from protein sequences.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Jones</snm>
                  <fnm>DT</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Taylor</snm>
                  <fnm>WR</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Thornton</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Comput Appl Biosci</source>
            <pubdate>1992</pubdate>
            <volume>8</volume>
            <fpage>275</fpage>
            <lpage>282</lpage>
            <xrefbib>
               <pubid idtype="pmpid">1633570</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B44" rating="0">
            <title>
               <p>SEALS: a system for easy analysis of lots of sequences.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Walker</snm>
                  <fnm>DR</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Proc Int Conf Intell Syst Mol Biol</source>
            <pubdate>1997</pubdate>
            <volume>5</volume>
            <fpage>333</fpage>
            <lpage>339</lpage>
            <xrefbib>
               <pubid idtype="pmpid">9322058</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B45" rating="0">
            <title>
               <p>PAL: an object-oriented programming library for molecular evolution and phylogenetics.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Drummond</snm>
                  <fnm>A</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Strimmer</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <fpage>662</fpage>
            <lpage>663</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi" link="fulltext">10.1093/bioinformatics/17.7.662</pubid>
                  <pubid idtype="pmpid" link="fulltext">11448888</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46" rating="0">
            <title>
               <p>GAlib: A C++ library of genetic algorithm components.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Wall</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <url>http://lancet.mit.edu/ga</url>
         </bibl>
         <bibl id="B47" rating="0">
            <title>
               <p>The bioenergetics of the last common ancestor and the origin of the eukaryotic endomembrane systems.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Gogarten</snm>
                  <fnm>JP</fnm>
               </au>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Kibak</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>In: The Origin and Evolution of the Cell</source>
            <publisher>Singapore: World Scientific Publishing</publisher>
            <editor>Hartman H, Matsuno K</editor>
            <pubdate>1992</pubdate>
            <fpage>131</fpage>
            <lpage>154</lpage>
         </bibl>
         <bibl id="B48" rating="0">
            <title>
               <p>Origin of the cytoskeleton.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Cavalier-Smith</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>In: The Origin and Evolution of the Cell</source>
            <publisher>Singapore: World Scientific Publishing</publisher>
            <editor>Hartman H, Matsuno K</editor>
            <pubdate>1992</pubdate>
            <fpage>79</fpage>
            <lpage>106</lpage>
         </bibl>
         <bibl id="B49" rating="0">
            <title>
               <p>On the origin of mitosing cells.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Sagan</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>J Theor Biol</source>
            <pubdate>1967</pubdate>
            <volume>14</volume>
            <issue>3</issue>
            <fpage>255</fpage>
            <lpage>274</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11541392</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B50" rating="0">
            <title>
               <p>Gene transfer from organelles to the nucleus: Frequent and in big chunks.</p>
            </title>
            <aug>
               <au ca="no" ce="no" pa="no" da="no">
                  <snm>Martin</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2003</pubdate>
            <volume>100</volume>
            <fpage>8612</fpage>
            <lpage>8614</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid" link="fulltext">166356</pubid>
                  <pubid idtype="pmpid" link="fulltext">12861078</pubid>
                  <pubid idtype="doi" link="fulltext">10.1073/pnas.1633606100</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
