<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2003-4-3-r19</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>The rhomboids: a nearly ubiquitous family of intramembrane serine proteases that probably evolved by multiple ancient horizontal gene transfers</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Koonin</snm>
               <mi>V</mi>
               <fnm>Eugene</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A2">
               <snm>Makarova</snm>
               <mi>S</mi>
               <fnm>Kira</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A3">
               <snm>Rogozin</snm>
               <mi>B</mi>
               <fnm>Igor</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A4">
               <snm>Davidovic</snm>
               <fnm>Laetitia</fnm>
               <insr iid="I2"/>
            </au>
            <au id="A5">
               <snm>Letellier</snm>
               <fnm>Marie-Claude</fnm>
               <insr iid="I2"/>
            </au>
            <au id="A6" ca="yes">
               <snm>Pellegrini</snm>
               <fnm>Luca</fnm>
               <insr iid="I2"/>
               <email>Luca.Pellegrini@crulrg.ulaval.ca</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA</p>
            </ins>
            <ins id="I2">
               <p>Centre de Recherche Universit&#233; Laval Robert Giffard, Universit&#233; Laval, Chemin de la Canardiere, G1J 2G3 Quebec, Canada</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2003</pubdate>
         <volume>4</volume>
         <issue>3</issue>
         <fpage>R19</fpage>
         <url>http://genomebiology.com/2003/4/3/R19</url>
         <note>A previous version of this manuscript was made available before peer review at <url>http://genomebiology.com/2002/3/11/preprint/0010/</url></note>
         <xrefbib>
            <pubidlist>
               <pubid idtype="doi">10.1186/gb-2003-4-3-r19</pubid>
               <pubid idtype="pmpid">12620104</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>30</day>
               <month>9</month>
               <year>2002</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>20</day>
               <month>12</month>
               <year>2002</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>3</day>
               <month>2</month>
               <year>2003</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>28</day>
               <month>2</month>
               <year>2003</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2003</year>
         <collab>Koonin et al.; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.</collab>
      </cpyrt>
      <shorttitle>
         <p>The rhomboids: a nearly ubiquitous family of intramembrane serine proteases that probably evolved by multiple ancient horizontal gene transfers</p>
      </shorttitle>
      <shortabs>
         <p>Although the near-universal presence of the rhomboid family in bacteria, archaea and eukaryotes appears to suggest that this protein is part of the heritage of the last universal common ancestor, phylogenetic tree analysis indicates a likely bacterial origin with subsequent dissemination by horizontal gene transfer.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>The rhomboid family of polytopic membrane proteins shows a level of evolutionary conservation unique among membrane proteins. They are present in nearly all the sequenced genomes of archaea, bacteria and eukaryotes, with the exception of several species with small genomes. On the basis of experimental studies with the developmental regulator rhomboid from <it>Drosophila </it>and the AarA protein from the bacterium <it>Providencia stuartii</it>, the rhomboids are thought to be intramembrane serine proteases whose signaling function is conserved in eukaryotes and prokaryotes.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Phylogenetic tree analysis carried out using several independent methods for tree construction and the corresponding statistical tests suggests that, despite its broad distribution in all three superkingdoms, the rhomboid family was not present in the last universal common ancestor of extant life forms. Instead, we propose that rhomboids evolved in bacteria and have been acquired by archaea and eukaryotes through several independent horizontal gene transfers. In eukaryotes, two distinct, ancient acquisitions apparently gave rise to the two major subfamilies, typified by rhomboid and PARL (presenilins-associated rhomboid-like protein), respectively. Subsequent evolution of the rhomboid family in eukaryotes proceeded by multiple duplications and functional diversification through the addition of extra transmembrane helices and other domains in different orientations relative to the conserved core that harbors the protease activity.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>Although the near-universal presence of the rhomboid family in bacteria, archaea and eukaryotes appears to suggest that this protein is part of the heritage of the last universal common ancestor, phylogenetic tree analysis indicates a likely bacterial origin with subsequent dissemination by horizontal gene transfer. This emphasizes the importance of explicit phylogenetic analysis for the reconstruction of ancestral life forms. A hypothetical scenario for the origin of intracellular membrane proteases from membrane transporters is proposed.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010008">Evolution</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Polytopic transmembrane proteins are, in general, not particularly strongly conserved during evolution. Inspection of the database of Clusters of Orthologous Groups of proteins (COGs) <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> revealed only one family of such proteins that is represented in most of the sequenced bacterial, archaeal and eukaryotic genomes. The prototype of this family is the rhomboid (RHO) protein from <it>Drosophila melanogaster</it>, a developmental regulator involved in epidermal growth factor (EGF)-dependent signaling pathways <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. Not only were homologs of rhomboid detected in prokaryotes and eukaryotes, but the pattern of sequence conservation in this family appeared uncharacteristic of nonenzymatic membrane proteins, such as transporters <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>. Specifically, several polar amino-acid residues are conserved in nearly all members of the rhomboid family, suggesting the possibility of an enzymatic activity. As three of these conserved residues were histidines, it has been hypothesized that rhomboid-family proteins could function as metal-dependent membrane proteases <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>. Recently, however, it has been shown that RHO cleaves a transmembrane helix (TMH) in the membrane-bound precursor of the TGF&#945;-like growth factor Spitz, enabling the released Spitz to activate the EGF receptor, and that a conserved serine and a conserved histidine in RHO are essential for this cleavage <abbrgrp><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>. Thus, it appears that rhomboid-family proteins are a distinct group of intramembrane serine proteases. 
Altogether, the genome of <it>Drosophila </it>encodes seven RHO paralogs (now designated RHO1-7, with the original rhomboid becoming RHO-1), at least three of which are involved in distinct EGF-dependent pathways, apparently through proteolytic activation of diverse ligands of the EGF receptor <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>.</p>
         <p>The newly discovered intramembrane proteolytic activity of RHO places the rhomboid family within the framework of regulated intramembrane proteolysis (RIP), a new paradigm of signal transduction, which appears to be prominent in all forms of life <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp>. Under RIP, signaling proteins undergo site-specific proteolysis within a TMH, resulting in the release of active fragments, which are the actual effectors in signal transduction cascades. Until recently, the only characterized cases of RIP in eukaryotes involved presenilin-1, an aspartyl protease, which cleaves a transmembrane helix in type-1 membrane proteins such as amyloid &#946;-precursor protein (A&#946;PP), Notch and Ire1 <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, and the metalloprotease S2P, which cleaves a TMH in a type-2 transmembrane protein, the sterol-dependent transcription factor SREBP <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. Notably, S2P has highly conserved bacterial homologs, and the protease domain of presenilins also might be homologous to bacterial and archaeal type IV prepilin peptidases, although, in this case, the sequence similarity is low <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>.</p>
         <p>In the case of the rhomboid family, the existence of homologs of RHO in most prokaryotes is particularly remarkable because animal RHO proteins are involved in signaling pathways that are not found outside metazoa, which seems to make functional conservation in prokaryotes a remote possibility. The only prokaryotic protein of the rhomboid family that has been characterized experimentally in considerable detail is AarA from the bacterium <it>Providencia stuartii </it><abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr></abbrgrp>. This protein is involved in the export of a quorum-sensing peptide, a function that, in physiological terms, resembles that of RHO, although the signaling molecules, other than RHO and AarA, are obviously unrelated <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. In a striking recent development, two independent research groups have shown that several bacterial rhomboid-family proteins, including AarA, can cleave the EGF receptor ligands (Spitz, Keren and Gurken) that are normally cleaved by RHO paralogs <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp>. The cleavage depended on the conserved serine and histidine residues <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> and, moreover, transgenic flies that expressed AarA developed a phenotype indistinguishable from that induced by overexpression of RHO, whereas RHO could substitute for AarA in <it>Providencia stuartii </it><abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. These unexpected findings demonstrated the conservation of a RIP mechanism producing extracellular signals in eukaryotes and prokaryotes. Eukaryotic rhomboid family proteins seem to show considerable functional variability; in particular, cross-talk might exist between different RIP pathways. 
A distinct representative of the rhomboid family has been shown to physically interact with presenilins 1 and 2, and was accordingly named presenilins-associated rhomboid-like protein (PARL) <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. The yeast ortholog of PARL has been suggested to participate in the processing of cytochrome <it>c </it>peroxidase precursor during its import into the mitochondrion <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>.</p>
         <p>The near ubiquity of the rhomboid family among bacteria, archaea and eukaryotes, along with the remarkable functional conservation, suggests that a signaling mechanism mediated by rhomboids might have functioned already in the last common ancestor of all extant life forms, with subsequent loss in several lineages. To address this possibility, we performed a detailed phylogenetic analysis of the rhomboid family.</p>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <sec>
            <st>
               <p>Sequence and structural features and phyletic distribution of the rhomboid family</p>
            </st>
            <p>Although the sequence similarity between eukaryotic and prokaryotic rhomboid family proteins is relatively low (around 10-15% identity in the conserved region), the entire superfamily could be retrieved from the protein sequence databases within three iterations of the PSI-BLAST program with a high statistical significance and without any false positives. The conserved core of the rhomboid family consists of six TMHs (Figure <figr fid="F1">1</figr>). The predicted catalytic serine is located in TMH4, whereas the predicted catalytic histidine is in TMH6; TMH2 contains two additional histidines and an asparagine, which are conserved in the great majority of the rhomboid-family proteins (Figure <figr fid="F1">1</figr>). The roles of these conserved residues are not known, but, given the remarkable evolutionary conservation, it seems likely that they also contribute to catalysis; indeed, it has been shown that the conserved asparagine is required for the cleavage of Spitz by RHO <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Multiple alignment of the conserved core of the rhomboid family proteins</p>
               </caption>
               <text>
                  <p>Multiple alignment of the conserved core of the rhomboid family proteins. The alignment includes the majority of the detected rhomboid family proteins; some closely related sequences were omitted. Only the six conserved (predicted) transmembrane helices (TMH) and short surrounding regions are shown. The boundaries of the predicted TMH are indicated by gray shading and overline, and the helices are numbered 1-6. The numbers of amino-acid residues in the omitted terminal and internal regions are indicated. The consensus shows amino-acid residues present in at least 90% of the aligned sequences; h stands for hydrophobic residues (A, C, I, L, V, M, F, Y, W in the single-letter amino-acid code) and s for small residues (G, A, S, D, N, V). The proposed catalytic serine (TMH4) and histidine (TMH6) as well as conserved residues in TMH2 with possible ancillary roles in catalysis are highlighted in color. The proteins are identified with the gene identification (GI) number from the nonredundant database and an abbreviated species name. Bacterial species are color-coded green, eukaryotic species blue and archaeal species yellow.
Species name abbreviations: Aerpe, <it>Aeropyrum pernix</it>; Agrtu, <it>Agrobacterium tumefaciens</it>; Anoga, <it>Anopheles gambiae</it>; Arath, <it>Arabidopsis thaliana</it>; Arcfu, <it>Archaeoglobus fulgidus</it>; Bacsu, <it>Bacillus subtilis</it>; Brume, <it>Brucella melitensis</it>; Caeel, <it>Caenorhabditis elegans</it>; Caucr, <it>Caulobacter crescentus</it>; Chlte, <it>Chlorobium tepidum</it>; Cloac, <it>Clostridium acetobutylicum</it>; Corgl, <it>Corynebacterium glutamicum</it>; Deira, <it>Deinococcus radiodurans</it>; Dicdi, <it>Dictyostelium discoideum</it>; Drome, <it>Drosophila melanogaster</it>; Escco, <it>Escherichia coli</it>; Haein, <it>Haemophilus influenzae</it>; Halsp, <it>Halobacterium </it>sp.; Homsa, <it>Homo sapiens</it>; Lacla, <it>Lactococcus lactis</it>; Lisin, <it>Listeria innocua</it>; Metja, <it>Methanococcus jannaschii</it>; Metka, <it>Methanopyrus kandleri</it>; Metma, <it>Methanosarcina mazei</it>; Meslo, <it>Mesorhizobium loti</it>; Mycle, <it>Mycobacterium leprae</it>; Myctu, <it>Mycobacterium tuberculosis</it>; Neucr, <it>Neurospora crassa</it>; Nossp, <it>Nostoc </it>sp.; Prost, <it>Providencia stuartii</it>; Pyrab, <it>Pyrococcus abyssi</it>; Pyrae, <it>Pyrobaculum aerophilum</it>; Ralso, <it>Ralstonia solanacearum</it>; Sacce, <it>Saccharomyces cerevisiae</it>; Schpo, <it>Schizosaccharomyces pombe</it>; Sinme, <it>Sinorhizobium meliloti</it>; Strco, <it>Streptomyces coelicolor</it>; Strpn, <it>Streptococcus pneumoniae</it>; Sulso, <it>Sulfolobus solfataricus</it>; Sulto, <it>Sulfolobus tokodaii</it>; Synsp, <it>Synechocystis </it>sp.; Theac, <it>Thermoplasma acidophilum</it>; Thema, <it>Thermotoga maritima</it>; Thete, <it>Thermus thermophilus</it>; Vibch, <it>Vibrio cholerae</it>; Xanca, <it>Xanthomonas campestris</it>; Xylfa, <it>Xylella fastidiosa</it>.</p>
               </text>
               <graphic file="gb-2003-4-3-r19-1"/>
            </fig>
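            <p>The consensus rule stated in the legend of Figure <figr fid="F1">1</figr> (a residue or residue class present in at least 90% of the aligned sequences) can be illustrated with a minimal Python sketch. Several details here are assumptions that the legend does not specify: the order of the checks (identical residue first, then h, then s), the treatment of gaps as non-matching, and the use of '.' for columns without a consensus. The toy sequences are invented for illustration and are not taken from the alignment.</p>
            <p>
```python
from collections import Counter

# Residue classes exactly as listed in the Figure 1 legend.
HYDROPHOBIC = set("ACILVMFYW")
SMALL = set("GASDNV")


def consensus_symbol(column, threshold=0.9):
    """Return the consensus symbol for one alignment column.

    Assumed order of checks: a single residue, then 'h' (hydrophobic),
    then 's' (small); '.' means no consensus. Gaps ('-') never match.
    """
    n = len(column)
    residue, count = Counter(column).most_common(1)[0]
    if residue != "-" and count / n >= threshold:
        return residue
    if sum(aa in HYDROPHOBIC for aa in column) / n >= threshold:
        return "h"
    if sum(aa in SMALL for aa in column) / n >= threshold:
        return "s"
    return "."


def consensus(alignment, threshold=0.9):
    """Build a consensus string from equal-length aligned sequences."""
    columns = zip(*alignment)  # transpose rows (sequences) into columns
    return "".join(consensus_symbol(col, threshold) for col in columns)


# Hypothetical four-column alignment of ten sequences (not real data).
toy_alignment = [
    "GLSK", "GIGE", "GVDW", "GLNP", "GASQ",
    "GFGH", "GMSR", "GLDT", "GIGC", "GVNY",
]
print(consensus(toy_alignment))  # prints Ghs.
```
            </p>
            <p>In this toy example the first column is an invariant glycine, the second contains only hydrophobic residues, the third only small residues, and the fourth has no class above the 90% threshold.</p>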
            <p>When examining the multiple alignment of the rhomboid superfamily proteins, we noticed that several eukaryotic members appear to be inactivated proteases, as indicated by the loss of the predicted catalytic serine or histidine (Figure <figr fid="F1">1</figr>, and data not shown); these inactivated forms could be regulators of active rhomboid proteases. Several other proteins lack one or more of the conserved residues in TMH2; it remains unclear whether or not these are active proteases.</p>
            <p>Bacterial and archaeal members of the rhomboid superfamily contain six TMH, whereas the eukaryotic members typically have an additional seventh TMH, which may be attached to the core either from the amino terminus or from the carboxyl terminus as discussed below.</p>
            <p>The phyletic distribution pattern of the rhomboid family shows that this intramembrane protease is extremely common in all three superkingdoms of life, but is not necessarily essential for cell function. Rhomboids are missing in the microsporidian <it>Encephalitozoon cuniculi</it>, a eukaryotic intracellular parasite with a highly degraded genome, the archaea <it>Methanothermobacter thermoautotrophicus </it>and <it>Thermoplasma volcanium</it>, and several bacterial species, primarily parasites with small genomes but also species with moderately sized genomes, such as <it>Xylella fastidiosa </it>(see COG0705 at <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>). In two instances, a representative of the rhomboid family is present in only one of a pair of relatively close genomes (present in <it>T. acidophilum </it>but missing in <it>T. volcanium</it>; present in the spirochete <it>Treponema pallidum </it>but missing in the related bacterium <it>Borrelia burgdorferi</it>), which suggests relatively recent, repeated losses of this gene. Most of the prokaryotic species have a single gene coding for a rhomboid-family protein, although some have two or three paralogs (see COG0705 <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>); in contrast, eukaryotes show expansion of the rhomboid family, with seven members in <it>Drosophila</it>, and as many as 13 in <it>Arabidopsis</it>.</p>
         </sec>
         <sec>
            <st>
               <p>Phylogeny and evolutionary history of the rhomboid family</p>
            </st>
            <p>The multiple alignment of the 6-TMH core of the rhomboid family (Figure <figr fid="F1">1</figr>) was employed to construct a phylogenetic tree using the least-squares algorithm with subsequent optimization using the maximum likelihood (ML) method (see Materials and methods). Only the conserved regions including the TMH and short adjacent stretches shown in Figure <figr fid="F1">1</figr> were used as the input for tree building, whereas the poorly conserved intervening regions were omitted to avoid noise from potentially misaligned residues (except for the Bayesian analysis, which used the complete alignment; see Materials and methods). The alignment used for phylogenetic reconstructions included 87 sequences and 149 aligned sites. The phylogenetic tree of the rhomboid family presents a complex and unexpected picture (Figure <figr fid="F2">2</figr>). Neither the eukaryotic nor the archaeal subsets of the family appear to form monophyletic clades. Instead, the eukaryotic rhomboids are split between two major subfamilies, which are positioned in the midst of different prokaryotic branches (Figure <figr fid="F2">2</figr>). The first subfamily, which includes six of the seven <it>Drosophila </it>rhomboids, clusters with a distinct prokaryotic assemblage, consisting primarily of Gram-positive bacteria as well as a subset of archaea; this clade is strongly supported by bootstrap analysis (Figure <figr fid="F2">2</figr>). The proteins in this group of eukaryotic rhomboids, which we designated the RHO subfamily, typically have an extra TMH added carboxy-terminally to the 6-TMH core; some of these proteins also contain EF-hand calcium-binding domains amino-terminally of the core (Figure <figr fid="F2">2</figr>).</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Phylogenetic tree of the rhomboid family</p>
               </caption>
               <text>
                  <p>Phylogenetic tree of the rhomboid family. The sequences and their regions used to construct the tree are exactly those shown in Figure <figr fid="F1">1</figr>. The color coding and abbreviations are as in Figure <figr fid="F1">1</figr>. The two major eukaryotic subfamilies are denoted as RHO and PARL (see text) and four clusters containing unexpected (from a phylogenetic viewpoint) sets of species are denoted 1-4. The clades that were investigated in the KH test are denoted A through D. Although the tree is shown in a pseudorooted form for convenience, this is an unrooted tree. Internal nodes with at least 70% RELL bootstrap support are denoted by black circles and nodes with 50-70% support by blue circles. The posterior probabilities reported by the MRBAYES program are indicated for some key internal branches. Domain architectures are connected to the respective proteins by brackets or lines. The domain key is shown at the bottom of the figure.</p>
               </text>
               <graphic file="gb-2003-4-3-r19-2"/>
            </fig>
            <p>The second eukaryotic subfamily, which we designated the PARL subfamily, after PARL, the human ortholog of <it>Drosophila </it>RHO7 <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, resides within a large, heterogeneous prokaryotic cluster (Figure <figr fid="F2">2</figr>). Within this subfamily, PARL and its orthologs from other animals and from fungi have a distinct domain architecture, with an extra TMH added to the amino terminus of the core, whereas the rest have only the core (a carboxy-terminal TMH and a ubiquitin-associated domain are appended in one <it>Arabidopsis </it>protein; Figure <figr fid="F2">2</figr>). Thus, the existence of two distinct subfamilies of eukaryotic rhomboids is supported by features of domain architectures that appear to comprise shared derived characters. Within these two major eukaryotic subfamilies, evolution apparently proceeded by both ancient and more recent duplications. Several lineage-specific expansions of paralogs <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> are noticeable in insects, mammals and plants (Figure <figr fid="F2">2</figr>).</p>
            <p>Archaeal rhomboids are scattered over the phylogenetic tree, with two major clusters and, in addition, three isolated proteins joining different bacterial branches (Figure <figr fid="F2">2</figr>). There is no indication of an affinity between any of the archaeal and eukaryotic rhomboids. Although many of the bacterial rhomboids form phylogenetically coherent clusters corresponding to the established bacterial lineages, there are also several clusters that have an odd composition, such as the grouping of proteobacterial and Gram-positive species; some of these clusters are well supported by bootstrap (see clusters 1-4 in Figure <figr fid="F2">2</figr>).</p>
            <p>Unexpected tree topologies often emerge due to artifacts of phylogenetic analysis methods. This concern is particularly serious for highly divergent families of membrane proteins, such as the rhomboids, in which parallel amino-acid substitutions are likely. Therefore we investigated the phylogeny of the rhomboid family in greater detail using several independent phylogenetic methods and the corresponding statistical tests. First, we assessed the robustness of the topology of the tree shown in Figure <figr fid="F2">2</figr> using the Kishino-Hasegawa (KH) test whereby the clade of interest is forced into various positions on the tree and the likelihoods of the resulting topologies are estimated. Specifically, the KH test was used to evaluate two alternative topologies, in which the RHO and PARL subfamilies formed a clade, and two topologies, in which the RHO subfamily formed a clade with archaeal rhomboids (Figure <figr fid="F2">2</figr> and Table <tblr tid="T1">1</tblr>). Each of these alternative topologies had a significantly lower likelihood than the original topology shown in Figure <figr fid="F2">2</figr> (see Table <tblr tid="T1">1</tblr>).</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Log-likelihood analysis of possible placements of selected branches of maximum likelihood trees for the proteins analyzed</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="left">
                        <p>Tree*</p>
                     </c>
                     <c ca="center">
                        <p>Diff lnL<sup>&#8224;</sup></p>
                     </c>
                     <c ca="center">
                        <p>SE<sup>&#8225;</sup></p>
                     </c>
                     <c ca="center">
                        <p>RELL-BP<sup>&#167;</sup></p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Original tree</p>
                     </c>
                     <c ca="center">
                        <p>0.0</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>0.9702</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>A &#8594; B</p>
                     </c>
                     <c ca="center">
                        <p>-18.9</p>
                     </c>
                     <c ca="center">
                        <p>10.2</p>
                     </c>
                     <c ca="center">
                        <p>0.0264</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>B &#8594; A</p>
                     </c>
                     <c ca="center">
                        <p>-46.6</p>
                     </c>
                     <c ca="center">
                        <p>14.6</p>
                     </c>
                     <c ca="center">
                        <p>0.0003</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>A &#8594; C</p>
                     </c>
                     <c ca="center">
                        <p>-30.3</p>
                     </c>
                     <c ca="center">
                        <p>12.8</p>
                     </c>
                     <c ca="center">
                        <p>0.0031</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>A &#8594; D</p>
                     </c>
                     <c ca="center">
                        <p>-47.9</p>
                     </c>
                     <c ca="center">
                        <p>15.6</p>
                     </c>
                     <c ca="center">
                        <p>0.0000</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>*A-D, clades that were subjected to local rearrangements in the tree as indicated in Figure <figr fid="F2">2</figr> and discussed in the text. <sup>&#8224;</sup>Difference of the log-likelihoods relative to the best tree. <sup>&#8225;</sup>Standard error of Diff lnL. <sup>&#167;</sup>Bootstrap probability of the given tree calculated using the RELL method (resampling of estimated log-likelihoods).</p>
               </tblfn>
            </tbl>
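            <p>The three quantities reported in Table <tblr tid="T1">1</tblr> can be illustrated with a minimal Python sketch that starts from per-site log-likelihoods of two competing trees. It is not the software used for the analysis: the function names are hypothetical, the standard error uses the usual sample-variance estimate of the summed per-site differences, and the input values are synthetic (149 sites, echoing the alignment length given in the text, but with made-up numbers).</p>
            <p>
```python
import math
import random


def kh_difference(lnl_best, lnl_alt):
    """Return (Diff lnL, SE) for an alternative tree versus the best tree.

    Diff lnL is the sum of per-site differences (alternative minus best);
    SE is the estimated standard error of that sum.
    """
    diffs = [alt - best for best, alt in zip(lnl_best, lnl_alt)]
    n = len(diffs)
    total = sum(diffs)
    mean = total / n
    variance = n / (n - 1) * sum((d - mean) ** 2 for d in diffs)
    return total, math.sqrt(variance)


def rell_bootstrap(site_lnl, replicates=1000, seed=0):
    """Estimate RELL bootstrap probabilities for a set of trees.

    site_lnl maps a tree label to its list of per-site log-likelihoods.
    Sites are resampled with replacement and the already estimated
    per-site values are reused, so no tree is re-optimized.
    """
    rng = random.Random(seed)
    labels = list(site_lnl)
    n_sites = len(site_lnl[labels[0]])
    wins = dict.fromkeys(labels, 0)
    for _ in range(replicates):
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = {t: sum(site_lnl[t][i] for i in sample) for t in labels}
        wins[max(labels, key=totals.get)] += 1
    return {t: wins[t] / replicates for t in labels}


# Synthetic per-site log-likelihoods (illustrative only, not real data).
rng = random.Random(1)
best = [-2.0 + rng.gauss(0.0, 0.5) for _ in range(149)]
alt = [x - 0.1 + rng.gauss(0.0, 0.5) for x in best]

diff, se = kh_difference(best, alt)
support = rell_bootstrap({"best": best, "alternative": alt})
print(round(diff, 1), round(se, 1), support)
```
            </p>
            <p>In this sketch, a strongly negative Diff lnL relative to its SE and a RELL bootstrap probability near zero would both point against the alternative placement, which is how the rows of Table <tblr tid="T1">1</tblr> are read.</p>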
            <p>In addition, a tree of the rhomboid family was constructed using the Bayesian inference method, which has recently become a practical alternative to the more traditional methods of phylogenetic analysis <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr></abbrgrp>. The tree produced using the MRBAYES package <abbrgrp><abbr bid="B26">26</abbr></abbrgrp> showed the same major clades as the tree in Figure <figr fid="F2">2</figr> (data not shown); moreover, clustering of the RHO and PARL subfamilies of eukaryotic rhomboids with the respective prokaryotic clades was supported by high posterior probabilities (Figure <figr fid="F2">2</figr>).</p>
            <p>We also attempted to construct a phylogenetic tree of the rhomboid family by using the maximum parsimony method <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. The resulting tree contained the same major clades as the trees constructed using ML and MRBAYES; however, the number of parsimony-informative sites was insufficient to obtain high bootstrap support with this approach (data not shown).</p>
            <p>We also tested alternative phylogenies using neighbor-joining search with constraint trees <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. The alternative phylogenies reflected two distinct hypotheses: first, clustering of the RHO and PARL subfamilies of eukaryotic rhomboids with the prokaryotic rhomboid families as suggested by the tree topology in Figure <figr fid="F2">2</figr>; and second, monophyly of the eukaryotic rhomboids (Figure <figr fid="F3">3</figr>). The phylogenies corresponding to these alternative hypotheses were compared to the best phylogeny using three statistical tests (Table <tblr tid="T2">2</tblr>). The hypothesis 1 tree was not significantly different from the best tree under any of these tests whereas the hypothesis 2 tree was significantly (<it>p </it>&lt; 0.05) worse than the best tree according to each of the tests (Table <tblr tid="T2">2</tblr>).</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Hypothesis-specific constraint trees for the rhomboid family</p>
               </caption>
               <text>
                  <p>Hypothesis-specific constraint trees for the rhomboid family. <b>(a) </b>Hypothesis 1, polyphyletic origin of eukaryotic rhomboids from prokaryotic progenitors. The RHO and PARL subfamilies are denoted; the remaining clusters include prokaryotic rhomboids designated as in Figure <figr fid="F2">2</figr> (with 'a' added to the GI number). Within each cluster, the branches were collapsed into a multifurcation. <b>(b) </b>Hypothesis 2, monophyletic origin of eukaryotic rhomboids. All eukaryotic and prokaryotic sequences were collapsed into the two respective clusters. The trees are unrooted, although shown in a pseudorooted form.</p>
               </text>
               <graphic file="gb-2003-4-3-r19-3"/>
            </fig>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Statistical comparisons of the best neighbor-joining tree with the hypothesis 1 and hypothesis 2 trees</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c cspan="6" ca="left">
                        <p>
									Kishino-Hasegawa test
								</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Tree</p>
                     </c>
                     <c ca="center">
                        <p>Length</p>
                     </c>
                     <c ca="center">
                        <p>Length difference</p>
                     </c>
                     <c ca="center">
                        <p>SD (difference)</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>t</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><it>p</it>*</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Best</p>
                     </c>
                     <c ca="center">
                        <p>4951</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Hypothesis 1</p>
                     </c>
                     <c ca="center">
                        <p>4966</p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                     <c ca="center">
                        <p>11.9</p>
                     </c>
                     <c ca="center">
                        <p>1.26</p>
                     </c>
                     <c ca="center">
                        <p>0.211</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Hypothesis 2</p>
                     </c>
                     <c ca="center">
                        <p>4974</p>
                     </c>
                     <c ca="center">
                        <p>23</p>
                     </c>
                     <c ca="center">
                        <p>10.8</p>
                     </c>
                     <c ca="center">
                        <p>2.12</p>
                     </c>
                     <c ca="center">
                        <p>0.036</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c cspan="6" ca="left">
                        <p>
									Templeton (Wilcoxon signed-ranks) test
								</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Tree</p>
                     </c>
                     <c ca="center">
                        <p>Length</p>
                     </c>
                     <c ca="center">
                        <p>Rank sums</p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>N</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <it>z</it>
                        </p>
                     </c>
                     <c ca="center">
                        <p><it>p</it>*</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Best</p>
                     </c>
                     <c ca="center">
                        <p>4951</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Hypothesis 1</p>
                     </c>
                     <c ca="center">
                        <p>4966</p>
                     </c>
                     <c ca="center">
                        <p>1418.0</p>
                     </c>
                     <c ca="center">
                        <p>69</p>
                     </c>
                     <c ca="center">
                        <p>-1.33</p>
                     </c>
                     <c ca="center">
                        <p>0.185</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>-997.0</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Hypothesis 2</p>
                     </c>
                     <c ca="center">
                        <p>4974</p>
                     </c>
                     <c ca="center">
                        <p>1244.5</p>
                     </c>
                     <c ca="center">
                        <p>62</p>
                     </c>
                     <c ca="center">
                        <p>-1.97</p>
                     </c>
                     <c ca="center">
                        <p>0.048</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>-708.5</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c cspan="6" ca="left">
                        <p>
									Winning-sites (sign) test
								</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Tree</p>
                     </c>
                     <c ca="center">
                        <p>Length</p>
                     </c>
                     <c ca="center">
                        <p>Counts</p>
                     </c>
                     <c ca="center">
                        <p><it>p</it>*</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Best</p>
                     </c>
                     <c ca="center">
                        <p>4951</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Hypothesis 1</p>
                     </c>
                     <c ca="center">
                        <p>4966</p>
                     </c>
                     <c ca="center">
                        <p>36</p>
                     </c>
                     <c ca="center">
                        <p>0.810</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>-33</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Hypothesis 2</p>
                     </c>
                     <c ca="center">
                        <p>4974</p>
                     </c>
                     <c ca="center">
                        <p>40</p>
                     </c>
                     <c ca="center">
                        <p>0.031</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>-22</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>*Probability of getting a more extreme test statistic under the null hypothesis of no difference between the two trees (two-tailed test).</p>
               </tblfn>
            </tbl>
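            <p>The winning-sites and Kishino-Hasegawa entries of Table 2 can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming that the winning-sites test reduces to an exact two-tailed binomial sign test on the two site counts and that the Kishino-Hasegawa <it>t</it> value is the length difference divided by its standard deviation. The function names are hypothetical and are not part of PAUP*; small deviations from the tabulated values are expected because the table entries are rounded.</p>

```python
from math import comb


def sign_test_p(wins_a, wins_b):
    """Exact two-tailed sign test p-value.

    Assumes each informative site favours either tree with probability 0.5
    under the null hypothesis of no difference between the two trees.
    """
    n = wins_a + wins_b
    k = max(wins_a, wins_b)
    # Probability of k or more wins out of n under a fair-coin null.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Double the tail for a two-tailed test, capped at 1.
    return min(1.0, 2 * upper_tail)


def kh_t(length_difference, sd_difference):
    """Ratio of the tree-length difference to its standard deviation."""
    return length_difference / sd_difference


if __name__ == "__main__":
    # Site counts and length differences as listed in Table 2.
    print("Hypothesis 1 sign test p:", round(sign_test_p(36, 33), 3))
    print("Hypothesis 2 sign test p:", round(sign_test_p(40, 22), 3))
    print("Hypothesis 1 KH t:", round(kh_t(15, 11.9), 2))
    print("Hypothesis 2 KH t:", round(kh_t(23, 10.8), 2))
```

            <p>The sketch uses only the summary counts from Table 2, so it checks the internal consistency of the reported values rather than repeating the site-by-site analysis.</p>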
            <p>The concordance of the results obtained with several independent methods of phylogenetic tree construction, together with statistical tests aimed specifically at the alternative hypothesis of a monophyletic origin of eukaryotic rhomboids, strongly supports the major aspects of the tree topology in Figure <figr fid="F2">2</figr> and, in particular, the polyphyly of eukaryotic rhomboids.</p>
            <p>The phylogenetic tree of the rhomboid family shown in Figure <figr fid="F2">2</figr> and supported by the additional tests described above follows neither the 'standard model' scenario <abbrgrp><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr></abbrgrp>, with its major split between the archaeo-eukaryotic and bacterial lineages, nor the 'mitochondrial' scenario, which postulates acquisition of a gene by eukaryotes from the pro-mitochondrial endosymbiont. Nor can this tree be explained by postulating a small number of lineage-specific gene losses. The most parsimonious interpretation of the rhomboid family tree seems to be that the evolutionary history of this family has been replete with horizontal gene transfer (HGT) and lineage-specific gene loss events. In particular, in spite of the presence of rhomboids in the majority of modern life forms from all three primary superkingdoms, phylogenetic analysis suggests that this family was not inherited from the last universal common ancestor (LUCA). Instead, the tree topology seems to indicate that this family emerged in some bacterial lineage, was subsequently disseminated widely by HGT, and was then lost in some lineages. Both archaea and eukaryotes seem to have acquired rhomboids on several independent occasions. Specifically, at least two HGT events seem to have contributed to the origin of eukaryotic rhomboids, one yielding the RHO subfamily and the other the PARL subfamily, with a possible additional HGT in plants (Figures <figr fid="F2">2</figr>,<figr fid="F3">3</figr>).</p>
            <p>Given the broad phyletic representation of the two subfamilies of eukaryotic rhomboids, both RHO and PARL must have been acquired through HGT at an early stage of eukaryotic evolution, definitely before the divergence of the major crown-group lineages. This early epoch in eukaryotic evolution is thought to have been dominated by HGT from multiple bacterial symbionts <abbrgrp><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr></abbrgrp>.</p>
            <p>An alternative to this multiple-HGT scenario is that LUCA already had multiple, paralogous rhomboids, which evolved by a series of ancient gene duplications, and the odd topology of the phylogenetic tree is due primarily to differential loss of these ancient paralogs. Although this cannot be ruled out formally, this hypothesis implies the existence of an elaborate signaling system in LUCA and, accordingly, suggests that LUCA was a complex organism, which might have had as many genes as modern bacteria. Theoretical analysis of evolutionary scenarios constructed on the basis of the phyletic patterns of COGs by applying the parsimony principle shows that the complexity of the inferred gene set of LUCA critically depends on the relative rates of gene loss and HGT at the early stages of evolution <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. A complex LUCA with around 2,000 genes is predicted only when one assumes that the rate of gene loss is an order of magnitude greater than the rate of HGT. However, explicit reconstruction of the gene set of LUCA under the assumption of equal rates of gene loss and HGT leads to a hypothetical genome that consists of only around 600 genes but appears to be 'compatible with life', that is, it includes genes responsible for most, if not all, essential cellular functions <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. We currently believe that this is the most realistic, albeit inevitably imprecise, reconstruction of LUCA's gene set. With respect to the rhomboid family and other families whose phylogenetic trees show similar patterns, this makes the multiple-HGT interpretation the scenario of choice. Further theoretical, comparative-genomic and experimental analyses aimed at determining relative rates of gene loss and HGT will help in a more objective assessment of the validity of this argument.</p>
            <p>The multiple-HGT interpretation of the evolutionary history of the rhomboid family, while supported by the above argument, seems, at least at first glance, distinctly counter-intuitive, given that this family is nearly ubiquitous among extant life forms. Indeed, when attempts are made to construct parsimonious evolutionary scenarios on the basis of phyletic patterns alone <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr></abbrgrp>, such a widespread family is inevitably assigned to LUCA. It should be realized, however, that these approaches are inherently probabilistic, and extensive HGT can mislead them <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. For the rhomboid family, the multiple-HGT mode of evolution seems to be particularly plausible. It seems likely that the ultimate ancestor of the rhomboid family evolved from a nonenzymatic integral membrane protein, probably a transporter that might have been involved in an early, primitive form of export of signaling peptides in bacteria. The protease active center might have evolved in such a transporter by chance emergence of the suitable catalytic amino acids within two or three of the TMHs (Figure <figr fid="F4">4</figr>). This would have enabled the transition from simple transport to the RIP mode of controlled export of signaling molecules. Emergence of RIP could have conferred a major selective advantage on the respective bacteria and might have resulted in an evolutionary sweep whereby the gene carrying this trait was repeatedly fixed, rather than eliminated, after HGT. As for the evolution of the sequence itself, the requirement to conserve the protease activity apparently 'locked' the rhomboid family in a regime of relatively slow evolution, which ensures significant sequence similarity between all family members (Figure <figr fid="F1">1</figr>). The scenario of origin from non-catalytic transporters might potentially apply to other integral membrane enzymes, including intramembrane proteases involved in RIP, such as presenilins and their homologs <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp> and the archaeo-eukaryotic signal peptide peptidase <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>.</p>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>A hypothetical scenario for the origin and dissemination of the rhomboid family proteases</p>
               </caption>
               <text>
                  <p>A hypothetical scenario for the origin and dissemination of the rhomboid family proteases. The figure schematically shows the proposed three stages of evolution of the rhomboid family. In <b>(a)</b>, the progenitor of the rhomboid family functions as a transporter for a regulatory peptide in some bacterial lineage. In <b>(b)</b>, the catalytic site of the intramembrane protease evolves, allowing the switch to RIP as the mechanism of the regulatory peptide release. In <b>(c)</b>, the emergence of RIP is followed by a burst of HGT. R, regulatory peptide. The transmembrane helices of rhomboid are designated as in Figure <figr fid="F1">1</figr>; their topology in the membrane is based on that proposed in <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. The catalytic histidine and serine are shown and connected by a dotted line to indicate the proposed charge-relay system of the protease; possible ancillary catalytic residues are not shown.</p>
               </text>
               <graphic file="gb-2003-4-3-r19-4"/>
            </fig>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
         <p>The rhomboid family might be the most widespread and conserved group of integral membrane proteins. In and of itself, this would suggest that this family is part of the gene repertoire of LUCA. However, phylogenetic analysis suggests a different scenario, one of emergence in a bacterial lineage with subsequent multiple, independent HGT events and gene losses. Although caution is due in the evolutionary interpretation of phylogenetic trees for large families, particularly when membrane proteins with a relatively small number of conserved positions, such as the rhomboids, are involved, the multiple-HGT scenario is supported by several methods of tree analysis and statistical tests.</p>
         <p>Eukaryotes probably acquired their two major rhomboid subfamilies, RHO and PARL, as the result of two independent, early HGT events. These events, which might have introduced RIP as a means of intercellular communication, could have been pivotal in the evolution of eukaryotic multicellularity along the lines discussed previously with regard to the apparent bacterial origin of key components of eukaryotic programmed cell death machinery <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. Subsequent evolution of rhomboids in eukaryotes proceeded by lineage-specific expansion of paralogs <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> followed by diversification through the addition of an extra TMH in different positions relative to the catalytic core, some limited domain accretion (see Figure <figr fid="F2">2</figr>) and sequence divergence.</p>
         <p>Phylogenetic analysis of the rhomboid family described here carries a general message for studies aimed at the reconstruction of ancestral life forms, particularly LUCA. Although most of the (nearly) ubiquitous protein families probably do derive from LUCA, explicit phylogenetic analysis is required to ascertain this in each case.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <p>The nonredundant (NR) protein sequence database at the National Center for Biotechnology Information (NIH, Bethesda) was searched iteratively using the PSI-BLAST program with multiple starting queries <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. PSI-BLAST was normally run with an expectation (E) value of 0.01 as the cut-off for inclusion of sequences into the position-specific scoring matrix. Multiple alignments of protein sequences were constructed using the ClustalW program <abbrgrp><abbr bid="B38">38</abbr></abbrgrp> and manually adjusted on the basis of the examination of PSI-BLAST search outputs and the superposition of the predicted TMHs, which were identified using the programs TMpred <abbrgrp><abbr bid="B39">39</abbr></abbrgrp> and TMAP <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>.</p>
         <p>Phylogenetic trees were built using the least-squares method <abbrgrp><abbr bid="B41">41</abbr></abbrgrp> implemented in the FITCH program of the PHYLIP package <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>, with subsequent local rearrangement using the ProtML program of the MOLPHY package to obtain the maximum-likelihood tree <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>. The reliability of the tree topology was assessed using the RELL (resampling of estimated log-likelihoods) bootstrap method of MOLPHY, with 10,000 replications <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>. Alternative placements of selected clades in maximum-likelihood trees were compared by using the rearrangement optimization method (Kishino-Hasegawa test) as implemented in the ProtML program <abbrgrp><abbr bid="B43">43</abbr><abbr bid="B44">44</abbr><abbr bid="B45">45</abbr></abbrgrp>. Maximum parsimony trees were constructed using the heuristic search option of PAUP* <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. In addition, trees were constructed by Bayesian inference using the Markov chain Monte Carlo method as implemented in the MRBAYES package <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B26">26</abbr></abbrgrp>. The complete alignment information, including columns with gaps, was used for the MRBAYES analysis.</p>
         <p>Constraint trees for phylogenetic hypothesis testing were generated using the TreeView program <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>. Constraint trees were imported into PAUP* <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> and subjected to a neighbor-joining search to generate the phylogenies corresponding to the alternative hypotheses. These phylogenies were compared using the Kishino-Hasegawa (KH) <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>, Templeton (Wilcoxon signed-ranks) <abbrgrp><abbr bid="B47">47</abbr></abbrgrp> and winning-sites (sign) <abbrgrp><abbr bid="B48">48</abbr></abbrgrp> tests implemented in PAUP*.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>L.P. is supported by a grant from the Natural Sciences and Engineering Research Council of Canada.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>The COG database: new developments in phylogenetic classification of proteins from complete genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Tatusov</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Natale</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Garkavtsev</snm>
                  <fnm>IV</fnm>
               </au>
               <au>
                  <snm>Tatusova</snm>
                  <fnm>TA</fnm>
               </au>
               <au>
                  <snm>Shankavaram</snm>
                  <fnm>UT</fnm>
               </au>
               <au>
                  <snm>Rao</snm>
                  <fnm>BS</fnm>
               </au>
               <au>
                  <snm>Kiryutin</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Galperin</snm>
                  <fnm>MY</fnm>
               </au>
               <au>
                  <snm>Fedorova</snm>
                  <fnm>ND</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <fpage>22</fpage>
            <lpage>28</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">29819</pubid>
                  <pubid idtype="pmpid" link="fulltext">11125040</pubid>
                  <pubid idtype="doi">10.1093/nar/29.1.22</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>The <it>Drosophila </it>rhomboid gene mediates the localized formation of wing veins and interacts genetically with components of the EGF-R signaling pathway.</p>
            </title>
            <aug>
               <au>
                  <snm>Sturtevant</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Roark</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bier</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Genes Dev</source>
            <pubdate>1993</pubdate>
            <volume>7</volume>
            <fpage>961</fpage>
            <lpage>973</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8504935</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>The <it>Drosophila </it>rhomboid protein is concentrated in patches at the apical cell surface.</p>
            </title>
            <aug>
               <au>
                  <snm>Sturtevant</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Roark</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>O'Neill</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Biehs</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Colley</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Bier</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Dev Biol</source>
            <pubdate>1996</pubdate>
            <volume>174</volume>
            <fpage>298</fpage>
            <lpage>309</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/dbio.1996.0075</pubid>
                  <pubid idtype="pmpid" link="fulltext">8631502</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>rhomboid and Star interact synergistically to promote EGFR/MAPK signaling during <it>Drosophila </it>wing vein development.</p>
            </title>
            <aug>
               <au>
                  <snm>Guichard</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Biehs</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Sturtevant</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Wickline</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Chacko</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Howard</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Bier</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Development</source>
            <pubdate>1999</pubdate>
            <volume>126</volume>
            <fpage>2663</fpage>
            <lpage>2676</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10331978</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Sequence analysis of eukaryotic developmental proteins: ancient and novel domains.</p>
            </title>
            <aug>
               <au>
                  <snm>Mushegian</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1996</pubdate>
            <volume>144</volume>
            <fpage>817</fpage>
            <lpage>828</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8889542</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>PAMP and PARL, two novel putative metalloproteases interacting with the COOH-terminus of Presenilin-1 and -2.</p>
            </title>
            <aug>
               <au>
                  <snm>Pellegrini</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Passer</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Canelles</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Lefterov</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Ganjei</snm>
                  <fnm>JK</fnm>
               </au>
               <au>
                  <snm>Fowlkes</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>D'Adamio</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>J Alzheimers Dis</source>
            <pubdate>2001</pubdate>
            <volume>3</volume>
            <fpage>181</fpage>
            <lpage>190</lpage>
            <xrefbib>
               <pubid idtype="pmpid">12214059</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p><it>Drosophila</it> rhomboid-1 defines a family of putative intramembrane serine proteases.</p>
            </title>
            <aug>
               <au>
                  <snm>Urban</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Freeman</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2001</pubdate>
            <volume>107</volume>
            <fpage>173</fpage>
            <lpage>182</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">11672525</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>EGF receptor signalling: roles of star and rhomboid revealed.</p>
            </title>
            <aug>
               <au>
                  <snm>Kl&#228;mbt</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Curr Biol</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>R21</fpage>
            <lpage>R23</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0960-9822(01)00642-X</pubid>
                  <pubid idtype="pmpid" link="fulltext">11790319</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>brother of rhomboid, a rhomboid-related gene expressed during early <it>Drosophila </it>oogenesis, promotes EGF-R/MAPK signaling.</p>
            </title>
            <aug>
               <au>
                  <snm>Guichard</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Roark</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ronshaugen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bier</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Dev Biol</source>
            <pubdate>2000</pubdate>
            <volume>226</volume>
            <fpage>255</fpage>
            <lpage>266</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/dbio.2000.9851</pubid>
                  <pubid idtype="pmpid" link="fulltext">11023685</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>A family of rhomboid-like genes: <it>Drosophila </it>rhomboid-1 and roughoid/rhomboid-3 cooperate to activate EGF receptor signaling.</p>
            </title>
            <aug>
               <au>
                  <snm>Wasserman</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Urban</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Freeman</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genes Dev</source>
            <pubdate>2000</pubdate>
            <volume>14</volume>
            <fpage>1651</fpage>
            <lpage>1663</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10887159</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Regulated intramembrane proteolysis: a control mechanism conserved from bacteria to humans.</p>
            </title>
            <aug>
               <au>
                  <snm>Brown</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Ye</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Rawson</snm>
                  <fnm>RB</fnm>
               </au>
               <au>
                  <snm>Goldstein</snm>
                  <fnm>JL</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2000</pubdate>
            <volume>100</volume>
            <fpage>391</fpage>
            <lpage>398</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10693756</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Intramembrane proteolysis controls diverse signalling pathways throughout evolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Urban</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Freeman</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Curr Opin Genet Dev</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>512</fpage>
            <lpage>518</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0959-437X(02)00334-9</pubid>
                  <pubid idtype="pmpid" link="fulltext">12200155</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Two transmembrane aspartates in presenilin-1 required for presenilin endoproteolysis and gamma-secretase activity.</p>
            </title>
            <aug>
               <au>
                  <snm>Wolfe</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Xia</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Ostaszewski</snm>
                  <fnm>BL</fnm>
               </au>
               <au>
                  <snm>Diehl</snm>
                  <fnm>TS</fnm>
               </au>
               <au>
                  <snm>Kimberly</snm>
                  <fnm>WT</fnm>
               </au>
               <au>
                  <snm>Selkoe</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1999</pubdate>
            <volume>398</volume>
            <fpage>513</fpage>
            <lpage>517</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/19077</pubid>
                  <pubid idtype="pmpid" link="fulltext">10206644</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Glycine 384 is required for presenilin-1 function and is conserved in bacterial polytopic aspartyl proteases.</p>
            </title>
            <aug>
               <au>
                  <snm>Steiner</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Kostka</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Romig</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Basset</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Pesold</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Hardy</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Capell</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Meyn</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Grim</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Baumeister</snm>
                  <fnm>R</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nat Cell Biol</source>
            <pubdate>2000</pubdate>
            <volume>2</volume>
            <fpage>848</fpage>
            <lpage>851</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35041097</pubid>
                  <pubid idtype="pmpid" link="fulltext">11056541</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Computational analysis of human disease-associated genes and their protein products.</p>
            </title>
            <aug>
               <au>
                  <snm>Sreekumar</snm>
                  <fnm>KR</fnm>
               </au>
               <au>
                  <snm>Aravind</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>Curr Opin Genet Dev</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <fpage>247</fpage>
            <lpage>257</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0959-437X(00)00186-6</pubid>
                  <pubid idtype="pmpid" link="fulltext">11377959</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Characterization of <it>aarA</it>, a pleiotrophic negative regulator of the 2'-N-acetyltransferase in <it>Providencia stuartii</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Rather</snm>
                  <fnm>PN</fnm>
               </au>
               <au>
                  <snm>Orosz</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>1994</pubdate>
            <volume>176</volume>
            <fpage>5140</fpage>
            <lpage>5144</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">196357</pubid>
                  <pubid idtype="pmpid">8051030</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p><it>Providencia stuartii </it>genes activated by cell-to-cell signaling and identification of a gene required for production or activity of an extracellular factor.</p>
            </title>
            <aug>
               <au>
                  <snm>Rather</snm>
                  <fnm>PN</fnm>
               </au>
               <au>
                  <snm>Ding</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Baca-DeLancey</snm>
                  <fnm>RR</fnm>
               </au>
               <au>
                  <snm>Siddiqui</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>1999</pubdate>
            <volume>181</volume>
            <fpage>7185</fpage>
            <lpage>7191</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">103678</pubid>
                  <pubid idtype="pmpid" link="fulltext">10572119</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p><it>Providencia </it>may help find a function for a novel, widespread protein family.</p>
            </title>
            <aug>
               <au>
                  <snm>Gallio</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kylsten</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Curr Biol</source>
            <pubdate>2000</pubdate>
            <volume>10</volume>
            <fpage>R693</fpage>
            <lpage>R694</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0960-9822(00)00722-3</pubid>
                  <pubid idtype="pmpid" link="fulltext">11050401</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Conservation of intramembrane proteolytic activity and substrate specificity in prokaryotic and eukaryotic rhomboids.</p>
            </title>
            <aug>
               <au>
                  <snm>Urban</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Schlieper</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Freeman</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Curr Biol</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>1507</fpage>
            <lpage>1512</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0960-9822(02)01092-8</pubid>
                  <pubid idtype="pmpid" link="fulltext">12225666</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>A conserved mechanism for extracellular signaling in eukaryotes and prokaryotes.</p>
            </title>
            <aug>
               <au>
                  <snm>Gallio</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sturgill</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Rather</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Kylsten</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2002</pubdate>
            <volume>99</volume>
            <fpage>12208</fpage>
            <lpage>12213</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.192138799</pubid>
                  <pubid idtype="pmpid" link="fulltext">12221285</pubid>
                  <pubid idtype="pmcid">129423</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>A novel two-step mechanism for removal of a mitochondrial signal sequence involves the mAAA complex and the putative rhomboid protease Pcp1.</p>
            </title>
            <aug>
               <au>
                  <snm>Esser</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Tursun</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Ingenhoven</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Michaelis</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Pratje</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2002</pubdate>
            <volume>323</volume>
            <fpage>835</fpage>
            <lpage>843</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0022-2836(02)01000-8</pubid>
                  <pubid idtype="pmpid" link="fulltext">12417197</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>COGs: phylogenetic classification of proteins encoded in complete genomes</p>
            </title>
            <url>http://www.ncbi.nlm.nih.gov/COG</url>
         </bibl>
         <bibl id="B23">
            <title>
               <p>The role of lineage-specific gene family expansion in the evolution of eukaryotes.</p>
            </title>
            <aug>
               <au>
                  <snm>Lespinet</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Wolf</snm>
                  <fnm>YI</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Aravind</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>1048</fpage>
            <lpage>1059</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">186617</pubid>
                  <pubid idtype="pmpid" link="fulltext">12097341</pubid>
                  <pubid idtype="doi">10.1101/gr.174302</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Bayesian inference of phylogeny and its impact on evolutionary biology.</p>
            </title>
            <aug>
               <au>
                  <snm>Huelsenbeck</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Ronquist</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Nielsen</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Bollback</snm>
                  <fnm>JP</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2001</pubdate>
            <volume>294</volume>
            <fpage>2310</fpage>
            <lpage>2314</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1065889</pubid>
                  <pubid idtype="pmpid" link="fulltext">11743192</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Potential applications and pitfalls of Bayesian inference of phylogeny.</p>
            </title>
            <aug>
               <au>
                  <snm>Huelsenbeck</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Larget</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>RE</fnm>
               </au>
               <au>
                  <snm>Ronquist</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Syst Biol</source>
            <pubdate>2002</pubdate>
            <volume>51</volume>
            <fpage>673</fpage>
            <lpage>688</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1080/10635150290102366</pubid>
                  <pubid idtype="pmpid" link="fulltext">12396583</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>MRBAYES: Bayesian inference of phylogenetic trees.</p>
            </title>
            <aug>
               <au>
                  <snm>Huelsenbeck</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Ronquist</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <fpage>754</fpage>
            <lpage>755</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/17.8.754</pubid>
                  <pubid idtype="pmpid" link="fulltext">11524383</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>PAUP*: Phylogenetic Analysis Using Parsimony (*and other methods)</p>
            </title>
            <aug>
               <au>
                  <snm>Swofford</snm>
                  <fnm>DL</fnm>
               </au>
            </aug>
            <publisher>Sunderland, MA: Sinauer</publisher>
            <pubdate>1998</pubdate>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Archaea and the prokaryote-to-eukaryote transition.</p>
            </title>
            <aug>
               <au>
                  <snm>Brown</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Doolittle</snm>
                  <fnm>WF</fnm>
               </au>
            </aug>
            <source>Microbiol Mol Biol Rev</source>
            <pubdate>1997</pubdate>
            <volume>61</volume>
            <fpage>456</fpage>
            <lpage>502</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">232621</pubid>
                  <pubid idtype="pmpid" link="fulltext">9409149</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya.</p>
            </title>
            <aug>
               <au>
                  <snm>Woese</snm>
                  <fnm>CR</fnm>
               </au>
               <au>
                  <snm>Kandler</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Wheelis</snm>
                  <fnm>ML</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1990</pubdate>
            <volume>87</volume>
            <fpage>4576</fpage>
            <lpage>4579</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">54159</pubid>
                  <pubid idtype="pmpid" link="fulltext">2112744</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>You are what you eat: a gene transfer ratchet could account for bacterial genes in eukaryotic nuclear genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Doolittle</snm>
                  <fnm>WF</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>1998</pubdate>
            <volume>14</volume>
            <fpage>307</fpage>
            <lpage>311</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(98)01494-2</pubid>
                  <pubid idtype="pmpid" link="fulltext">9724962</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Sequence - Evolution - Function. Computational Approaches in Comparative Genomics</p>
            </title>
            <aug>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Galperin</snm>
                  <fnm>MY</fnm>
               </au>
            </aug>
            <publisher>Boston: Kluwer</publisher>
            <pubdate>2002</pubdate>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes.</p>
            </title>
            <aug>
               <au>
                  <snm>Mirkin</snm>
                  <fnm>BG</fnm>
               </au>
               <au>
                  <snm>Fenner</snm>
                  <fnm>TI</fnm>
               </au>
               <au>
                  <snm>Galperin</snm>
                  <fnm>MY</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
            </aug>
            <source>BMC Evol Biol</source>
            <pubdate>2003</pubdate>
            <volume>3</volume>
            <fpage>2</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">149225</pubid>
                  <pubid idtype="pmpid" link="fulltext">12515582</pubid>
                  <pubid idtype="doi">10.1186/1471-2148-3-2</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Genomes in flux: the evolution of archaeal and proteobacterial gene content.</p>
            </title>
            <aug>
               <au>
                  <snm>Snel</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Huynen</snm>
                  <fnm>MA</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>17</fpage>
            <lpage>25</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.176501</pubid>
                  <pubid idtype="pmpid" link="fulltext">11779827</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Prokaryotic evolution in light of gene transfer.</p>
            </title>
            <aug>
               <au>
                  <snm>Gogarten</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Doolittle</snm>
                  <fnm>WF</fnm>
               </au>
               <au>
                  <snm>Lawrence</snm>
                  <fnm>JG</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2002</pubdate>
            <volume>19</volume>
            <fpage>2226</fpage>
            <lpage>2238</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12446813</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Identification of signal peptide peptidase, a presenilin-type aspartic protease.</p>
            </title>
            <aug>
               <au>
                  <snm>Weihofen</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Binns</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Lemberg</snm>
                  <fnm>MK</fnm>
               </au>
               <au>
                  <snm>Ashman</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Martoglio</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2002</pubdate>
            <volume>296</volume>
            <fpage>2215</fpage>
            <lpage>2218</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1070925</pubid>
                  <pubid idtype="pmpid" link="fulltext">12077416</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Origin and evolution of eukaryotic apoptosis: the bacterial connection.</p>
            </title>
            <aug>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Aravind</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Cell Death Differ</source>
            <pubdate>2002</pubdate>
            <volume>9</volume>
            <fpage>394</fpage>
            <lpage>404</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/sj.cdd.4400991</pubid>
                  <pubid idtype="pmpid" link="fulltext">11965492</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Madden</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Sch&#228;ffer</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>3389</fpage>
            <lpage>3402</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
                  <pubid idtype="pmpid" link="fulltext">9254694</pubid>
                  <pubid idtype="pmcid">146917</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.</p>
            </title>
            <aug>
               <au>
                  <snm>Thompson</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
               <au>
                  <snm>Gibson</snm>
                  <fnm>TJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1994</pubdate>
            <volume>22</volume>
            <fpage>4673</fpage>
            <lpage>4680</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7984417</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>TMbase - A database of membrane spanning protein segments.</p>
            </title>
            <aug>
               <au>
                  <snm>Hofmann</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Stoffel</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Biol Chem Hoppe-Seyler</source>
            <pubdate>1993</pubdate>
            <volume>374</volume>
            <fpage>166</fpage>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Prediction of membrane protein topology utilizing multiple sequence alignments.</p>
            </title>
            <aug>
               <au>
                  <snm>Persson</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Argos</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>J Protein Chem</source>
            <pubdate>1997</pubdate>
            <volume>16</volume>
            <fpage>453</fpage>
            <lpage>457</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1023/A:1026353225758</pubid>
                  <pubid idtype="pmpid">9246628</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>Construction of phylogenetic trees.</p>
            </title>
            <aug>
               <au>
                  <snm>Fitch</snm>
                  <fnm>WM</fnm>
               </au>
               <au>
                  <snm>Margoliash</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1967</pubdate>
            <volume>155</volume>
            <fpage>279</fpage>
            <lpage>284</lpage>
            <xrefbib>
               <pubid idtype="pmpid">5334057</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Inferring phylogenies from protein sequences by parsimony, distance, and likelihood methods.</p>
            </title>
            <aug>
               <au>
                  <snm>Felsenstein</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Methods Enzymol</source>
            <pubdate>1996</pubdate>
            <volume>266</volume>
            <fpage>418</fpage>
            <lpage>427</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8743697</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>MOLPHY: Programs for Molecular Phylogenetics</p>
            </title>
            <aug>
               <au>
                  <snm>Adachi</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hasegawa</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <publisher>Tokyo: Institute of Statistical Mathematics</publisher>
            <pubdate>1992</pubdate>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Maximum likelihood inference of protein phylogeny and the origin of chloroplasts.</p>
            </title>
            <aug>
               <au>
                  <snm>Kishino</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Miyata</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Hasegawa</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1990</pubdate>
            <volume>31</volume>
            <fpage>151</fpage>
            <lpage>160</lpage>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea.</p>
            </title>
            <aug>
               <au>
                  <snm>Kishino</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Hasegawa</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1989</pubdate>
            <volume>29</volume>
            <fpage>170</fpage>
            <lpage>179</lpage>
            <xrefbib>
               <pubid idtype="pmpid">2509717</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>TreeView: an application to display phylogenetic trees on personal computers.</p>
            </title>
            <aug>
               <au>
                  <snm>Page</snm>
                  <fnm>RD</fnm>
               </au>
            </aug>
            <source>Comput Appl Biosci</source>
            <pubdate>1996</pubdate>
            <volume>12</volume>
            <fpage>357</fpage>
            <lpage>358</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8902363</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Phylogenetic inference from restriction endonuclease cleavage site maps with particular reference to the evolution of humans and the apes.</p>
            </title>
            <aug>
               <au>
                  <snm>Templeton</snm>
                  <fnm>AR</fnm>
               </au>
            </aug>
            <source>Evolution</source>
            <pubdate>1983</pubdate>
            <volume>37</volume>
            <fpage>221</fpage>
            <lpage>244</lpage>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Ancient origin of lactalbumin from lysozyme: analysis of DNA and amino acid sequences.</p>
            </title>
            <aug>
               <au>
                  <snm>Prager</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Wilson</snm>
                  <fnm>AC</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1988</pubdate>
            <volume>27</volume>
            <fpage>326</fpage>
            <lpage>335</lpage>
            <xrefbib>
               <pubid idtype="pmpid">3146643</pubid>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
