<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
	<ui>gb-2004-5-2-r7</ui>
	<ji>GBJ</ji>
	<fm>
		<dochead>Research</dochead>
		<bibl>
			<title>
				<p>A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes</p>
			</title>
			<aug>
				<au id="A1" ca="yes">
					<snm>Koonin</snm>
					<mi>V</mi>
					<fnm>Eugene</fnm>
					<insr iid="I1"/>
					<email>koonin@ncbi.nlm.nih.gov</email>
				</au>
				<au id="A2">
					<snm>Fedorova</snm>
					<mi>D</mi>
					<fnm>Natalie</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A3">
					<snm>Jackson</snm>
					<mi>D</mi>
					<fnm>John</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A4">
					<snm>Jacobs</snm>
					<mi>R</mi>
					<fnm>Aviva</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A5">
					<snm>Krylov</snm>
					<mi>M</mi>
					<fnm>Dmitri</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A6">
					<snm>Makarova</snm>
					<mi>S</mi>
					<fnm>Kira</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A7">
					<snm>Mazumder</snm>
					<fnm>Raja</fnm>
					<insr iid="I1"/>
					<insr iid="I2"/>
				</au>
				<au id="A8">
					<snm>Mekhedov</snm>
					<mi>L</mi>
					<fnm>Sergei</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A9">
					<snm>Nikolskaya</snm>
					<mi>N</mi>
					<fnm>Anastasia</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A10">
					<snm>Rao</snm>
					<mnm>Sridhar</mnm>
					<fnm>B</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A11">
					<snm>Rogozin</snm>
					<mi>B</mi>
					<fnm>Igor</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A12">
					<snm>Smirnov</snm>
					<fnm>Sergei</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A13">
					<snm>Sorokin</snm>
					<mi>V</mi>
					<fnm>Alexander</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A14">
					<snm>Sverdlov</snm>
					<mi>V</mi>
					<fnm>Alexander</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A15">
					<snm>Vasudevan</snm>
					<fnm>Sona</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A16">
					<snm>Wolf</snm>
					<mi>I</mi>
					<fnm>Yuri</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A17">
					<snm>Yin</snm>
					<mi>J</mi>
					<fnm>Jodie</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A18">
					<snm>Natale</snm>
					<mi>A</mi>
					<fnm>Darren</fnm>
					<insr iid="I1"/>
					<insr iid="I2"/>
				</au>
			</aug>
			<insg>
				<ins id="I1">
					<p>National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA</p>
				</ins>
				<ins id="I2">
					<p>Current address: Protein Information Resource, Georgetown University Medical Center, 3900 Reservoir Road, NW, Washington, DC 20007, USA</p>
				</ins>
			</insg>
			<source>Genome Biology</source>
			<issn>1465-6906</issn>
			<pubdate>2004</pubdate>
			<volume>5</volume>
			<issue>2</issue>
			<fpage>R7</fpage>
			<url>http://genomebiology.com/2004/5/2/R7</url>
			<xrefbib>
				<pubid idtype="pmpid">14759257</pubid>
			</xrefbib>
		</bibl>
		<history>
			<rec>
				<date>
					<day>23</day>
					<month>10</month>
					<year>2003</year>
				</date>
			</rec>
			<revrec>
				<date>
					<day>1</day>
					<month>12</month>
					<year>2003</year>
				</date>
			</revrec>
			<acc>
				<date>
					<day>4</day>
					<month>12</month>
					<year>2003</year>
				</date>
			</acc>
			<pub>
				<date>
					<day>15</day>
					<month>1</month>
					<year>2004</year>
				</date>
			</pub>
		</history>
		<cpyrt>
			<year>2004</year>
			<collab>Koonin et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.</collab>
		</cpyrt>
		<shorttitle>
			<p>A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes</p>
		</shorttitle>
		<shortabs>
			<p>We examined functional and evolutionary patterns in the recently constructed set of 5,873 clusters of predicted orthologs from seven eukaryotic genomes. The analysis reveals a conserved core of largely essential eukaryotic genes as well as major diversification and innovation associated with evolution of eukaryotic genomes.</p>
		</shortabs>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<sec>
					<st>
						<p>Background</p>
					</st>
					<p>Sequencing the genomes of multiple, taxonomically diverse eukaryotes enables in-depth comparative-genomic analysis, which is expected to help in reconstructing ancestral eukaryotic genomes and major events in eukaryotic evolution, and in making functional predictions for currently uncharacterized conserved genes.</p>
				</sec>
				<sec>
					<st>
						<p>Results</p>
					</st>
					<p>We examined functional and evolutionary patterns in the recently constructed set of 5,873 clusters of predicted orthologs (eukaryotic orthologous groups or KOGs) from seven eukaryotic genomes: <it>Caenorhabditis elegans</it>, <it>Drosophila melanogaster</it>, <it>Homo sapiens</it>, <it>Arabidopsis thaliana</it>, <it>Saccharomyces cerevisiae</it>, <it>Schizosaccharomyces pombe </it>and <it>Encephalitozoon cuniculi</it>. Conservation of KOGs through the phyletic range of eukaryotes strongly correlates with their functions and with the effect of gene knockout on the organism's viability. The approximately 40% of KOGs that are represented in six or seven species are enriched in proteins responsible for housekeeping functions, particularly translation and RNA processing. These conserved KOGs are often essential for survival and might approximate the minimal set of essential eukaryotic genes. The 131 single-member, pan-eukaryotic KOGs we identified were examined in detail. For around 20 that remained uncharacterized, functions were predicted by in-depth sequence analysis and examination of genomic context. Nearly all these proteins are subunits of known or predicted multiprotein complexes, in agreement with the balance hypothesis of evolution of gene copy number. Other KOGs show a variety of phyletic patterns, which points to major contributions of lineage-specific gene loss and the 'invention' of genes new to eukaryotic evolution. Examination of the sets of KOGs lost in individual lineages reveals co-elimination of functionally connected genes. Parsimonious scenarios of eukaryotic genome evolution and gene sets for ancestral eukaryotic forms were reconstructed. The gene set of the last common ancestor of the crown group consists of 3,413 KOGs and largely includes proteins involved in genome replication and expression, and central metabolism. 
Only 44% of the KOGs, mostly from the reconstructed gene set of the last common ancestor of the crown group, have detectable homologs in prokaryotes; the remainder apparently evolved via duplication with divergence and invention of new genes.</p>
				</sec>
				<sec>
					<st>
						<p>Conclusions</p>
					</st>
					<p>The KOG analysis reveals a conserved core of largely essential eukaryotic genes as well as major diversification and innovation associated with evolution of eukaryotic genomes. The results provide quantitative support for major trends of eukaryotic evolution noticed previously at the qualitative level and a basis for detailed reconstruction of evolution of eukaryotic genomes and biology of ancestral forms.</p>
				</sec>
			</sec>
		</abs>
	</fm>
	<meta>
		<classifications>
			<classification type="BMC" subtype="man_spc_id" id="30010008">Evolution</classification>
			<classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
			<classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
			<classification type="BMC" subtype="man_spc_id" id="30010015">Model organisms</classification>
		</classifications>
	</meta>
	<bdy>
		<sec>
			<st>
				<p>Background</p>
			</st>
			<p>Comparative analysis of genomes from distant species provides new insights into gene functions, genome evolution and phylogeny. In particular, the comparative genomics of prokaryotes has revealed previously underappreciated major trends in genome evolution, namely, extensive lineage-specific gene loss and horizontal gene transfer (HGT) <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. To efficiently extract functional and evolutionary information from multiple genomes, rational classification of genes based on homologous relationships is indispensable. The two principal classes of homologs are orthologs and paralogs <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>. Orthologs are defined as homologous genes that evolved via vertical descent from a single ancestral gene in the last common ancestor of the compared species. Paralogs are homologous genes that, at some stage of evolution, arose by duplication of an ancestral gene. Orthology and paralogy are intimately linked because, if a duplication (or a series of duplications) occurs after the speciation event that separated the compared species, orthology becomes a relationship between sets of paralogs, rather than individual genes (in which case, such genes are called co-orthologs).</p>
			<p>Correct identification of orthologs and paralogs is of central importance for both the functional and evolutionary aspects of comparative genomics <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp>. Orthologs typically occupy the same functional niche in different organisms; in contrast, paralogs evolve to functional diversification as they diverge after the duplication <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>. Therefore, robustness of genome annotation depends on accurate identification of orthologs. A clear demarcation of orthologs and paralogs is also required for constructing evolutionary scenarios, which include, along with vertical inheritance, lineage-specific gene loss and HGT <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B7">7</abbr></abbrgrp>.</p>
			<p>In principle, orthologs, including co-orthologs, should be identified by means of phylogenetic analysis of entire families of homologous proteins, which is expected to define orthologous protein sets as clades <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr></abbrgrp>. However, for genome-wide protein sets, such analysis remains extremely labor-intensive and error-prone. Accordingly, procedures have been developed for identifying sets of likely orthologs without explicit reference to phylogenetic analysis. These procedures are based on the notion of a genome-specific best hit (BeT), that is, the protein from a target genome that is most similar (typically in terms of similarity scores computed using BLAST or another sequence-comparison method) to a given protein from the query genome <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>. The assumption central to this approach is that orthologs have a greater similarity to each other than to any other protein from the respective genomes. When multiple genomes are analyzed, pairs of probable orthologs detected on the basis of BeTs are combined into orthologous clusters represented in all or a subset of the analyzed genomes <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B22">22</abbr></abbrgrp>. This approach, amended with additional procedures for detecting co-orthologous protein sets and for treating multidomain proteins, was implemented in the database of Clusters of Orthologous Groups (COGs) of proteins <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>. The current COG set includes approximately 70% of the proteins encoded in 69 genomes of prokaryotes and unicellular eukaryotes <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. 
The COGs have been used for functional annotation of new genomes <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr></abbrgrp>, target selection in structural genomics <abbrgrp><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr><abbr bid="B32">32</abbr></abbrgrp>, identification of potential drug targets <abbrgrp><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp> and genome-wide evolutionary studies <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B13">13</abbr><abbr bid="B35">35</abbr><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr><abbr bid="B38">38</abbr></abbrgrp>. Sonnhammer and co-workers independently developed a similar methodology for identification of co-orthologous protein sets from pairwise genome comparisons and applied it to the sequenced eukaryotic genomes <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>.</p>
			<p>A central notion introduced in the context of the COG analysis is that of a phyletic pattern, that is, the pattern of representation (presence-absence) of analyzed species in each COG <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B20">20</abbr></abbrgrp>. Similar concepts have been independently developed and applied by others <abbrgrp><abbr bid="B40">40</abbr><abbr bid="B41">41</abbr></abbrgrp>. The COGs show a remarkable scatter of phyletic patterns, with only a small minority represented in all sequenced genomes. A recent quantitative study showed that parsimonious evolutionary scenarios for most COGs involve multiple events of gene loss and HGT <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. Both similarity and complementarity among the phyletic patterns of COGs, in conjunction with other information, such as conservation of gene order, have been successfully employed to predict gene functions <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B42">42</abbr><abbr bid="B43">43</abbr></abbrgrp>. The comparison of phyletic patterns has been formalized in set-theoretical algorithms and systematically applied to the computational and experimental analysis of bacterial flagellar systems, which demonstrated the considerable robustness of this approach <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>.</p>
			<p>We recently extended the system of orthologous protein clusters to complex, multicellular eukaryotes <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. Here, we examine the phyletic patterns of KOGs in connection with known and predicted protein functions. In-depth analysis of some of these KOGs resulted in prediction of previously uncharacterized, but apparently essential, conserved eukaryotic protein functions. We also reconstruct the parsimonious scenario of evolution of the crown-group eukaryotes by assigning the loss of genes (KOGs) and emergence of new genes to the branches of the phylogenetic tree and explicitly delineate the minimal gene sets for various ancestral forms. To our knowledge, this is the first systematic, genome-wide examination of the sets of orthologous genes in eukaryotes.</p>
		</sec>
		<sec>
			<st>
				<p>Results and discussion</p>
			</st>
			<sec>
				<st>
					<p>KOGs for seven sequenced eukaryotic genomes: functional and evolutionary implications of phyletic patterns</p>
				</st>
				<p>Eukaryotic KOGs were constructed on the basis of the comparison of proteins encoded in the genomes of three animals (<it>Homo sapiens </it><abbrgrp><abbr bid="B45">45</abbr></abbrgrp>, the fruit fly <it>Drosophila melanogaster </it><abbrgrp><abbr bid="B46">46</abbr></abbrgrp> and the nematode <it>Caenorhabditis elegans </it><abbrgrp><abbr bid="B47">47</abbr></abbrgrp>), the green plant <it>Arabidopsis thaliana </it>(thale cress) <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>, two fungi (budding yeast <it>Saccharomyces cerevisiae </it><abbrgrp><abbr bid="B49">49</abbr></abbrgrp> and fission yeast <it>Schizosaccharomyces pombe </it><abbrgrp><abbr bid="B50">50</abbr></abbrgrp>) and the microsporidian <it>Encephalitozoon cuniculi </it><abbrgrp><abbr bid="B51">51</abbr></abbrgrp>. The procedure for KOG construction was a modification of the one previously used for COGs <abbrgrp><abbr bid="B20">20</abbr><abbr bid="B24">24</abbr></abbrgrp> and is described in greater detail elsewhere (<abbrgrp><abbr bid="B25">25</abbr></abbrgrp>; see also Materials and methods). An important difference stems from the fact that complex eukaryotes encode many more multidomain proteins than prokaryotes and, furthermore, orthologous eukaryotic proteins often differ in domain composition, with additional domains accrued in more complex forms <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B45">45</abbr></abbrgrp>. Accordingly, and unlike the original COG construction procedure, probable orthologs with different domain architectures were assigned to one KOG and were not split if they shared a common core of domains. In addition to the KOGs, each of which included proteins from at least three species, clusters of putative orthologs from two species (TWOGs) and lineage-specific expansions (LSEs) of paralogs from each of the analyzed genomes were identified (<abbrgrp><abbr bid="B25">25</abbr><abbr bid="B52">52</abbr></abbrgrp>; see also Materials and methods). 
In most of the analyses discussed below, KOGs and TWOGs are treated together, unless otherwise specified.</p>
				<p>Figure <figr fid="F1">1</figr> shows the assignment of the proteins from each of the analyzed eukaryotes to KOGs with different numbers of species, TWOGs and LSEs. The fraction of proteins assigned to KOGs tends to decrease with increasing genome size, from 81% for <it>S. pombe </it>to 51% for the largest, the human genome. (For reasons that remain unclear, but might be related to its intracellular parasitic lifestyle, <it>E. cuniculi </it>has a relatively small fraction of conserved proteins that belong to KOGs: approximately 60%.) The contribution of LSEs shows the opposite trend, being the greatest in the largest genomes, that is, human and <it>Arabidopsis</it>, and minimal in the microsporidian (Figure <figr fid="F1">1</figr>). A notable difference was observed between eukaryotes in terms of their representation in KOGs found in different numbers of species. While the three unicellular organisms are represented mainly in the highly conserved seven- or six-species KOGs, a much larger fraction of the gene set in animals and <it>Arabidopsis </it>is accounted for by LSEs, and by KOGs found in three or four genomes. These include animal-specific genes and genes that are shared by plants and animals but not by fungi and the microsporidian (Figure <figr fid="F1">1</figr>). The large number of KOGs in the latter group (700 KOGs represented in <it>Arabidopsis </it>and at least two animal species) is notable and probably results from massive, lineage-specific loss of genes during eukaryotic evolution (see below).</p>
				<fig id="F1">
					<title>
						<p>Figure 1</p>
					</title>
					<caption>
						<p>Assignment of proteins from each of the seven analyzed eukaryotic genomes to KOGs with different numbers of species and to LSEs</p>
					</caption>
					<text>
						<p>Assignment of proteins from each of the seven analyzed eukaryotic genomes to KOGs with different numbers of species and to LSEs. 0, Proteins without detectable homologs (singletons); 1, LSEs. Species abbreviations: Ath, <it>Arabidopsis thaliana</it>; Cel, <it>Caenorhabditis elegans</it>; Dme, <it>Drosophila melanogaster</it>; Ecu, <it>Encephalitozoon cuniculi</it>; Hsa, <it>Homo sapiens</it>; Sce, <it>Saccharomyces cerevisiae</it>; Spo, <it>Schizosaccharomyces pombe</it>.</p>
					</text>
					<graphic file="gb-2004-5-2-r7-1"/>
				</fig>
				<p>The phyletic patterns of KOGs reveal both the existence of a conserved eukaryotic gene core and substantial diversity. The 'pan-eukaryotic' genes, which are represented in each of the seven analyzed genomes, account for around 20% of the KOGs, and approximately the same number of KOGs include all species except for the microsporidian, an intracellular parasite with a highly degraded genome <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>. Among the remaining KOGs, a large group includes representatives of the three analyzed animal species (worm, fly and humans) but a substantial fraction (approximately 30%) are KOGs with unexpected patterns, for example, one animal, one plant and one fungal species (see <abbrgrp><abbr bid="B53">53</abbr></abbrgrp> and examples in Table <tblr tid="T1">1</tblr>).</p>
				<tbl id="T1" hint_layout="double">
					<title>
						<p>Table 1</p>
					</title>
					<caption>
						<p>KOGs and TWOGs with unexpected phyletic patterns (examples)</p>
					</caption>
					<tblbdy cols="5">
						<r>
							<c ca="left">
								<p>KOG/TWOG number</p>
							</c>
							<c ca="left">
								<p>Phyletic pattern*</p>
							</c>
							<c ca="left">
								<p>(Predicted) structure and function</p>
							</c>
							<c ca="left">
								<p>Prokaryotic homologs</p>
							</c>
							<c ca="left">
								<p>Comments</p>
							</c>
						</r>
						<r>
							<c cspan="5">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>TWOG0892</p>
							</c>
							<c ca="left">
								<p>---H--E</p>
							</c>
							<c ca="left">
								<p>Discoidin domain protein, potential regulator of proteasome activity</p>
							</c>
							<c ca="left">
								<p>Detected in a few phylogenetically scattered bacteria, no COG so far <abbrgrp><abbr bid="B69">69</abbr></abbrgrp></p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>TWOG0263</p>
							</c>
							<c ca="left">
								<p>A-----E</p>
							</c>
							<c ca="left">
								<p>ATP/ADP translocase</p>
							</c>
							<c ca="left">
								<p>ATP/ADP translocases of chlamydia, rickettsia, <it>Xylella fastidiosa</it></p>
							</c>
							<c ca="left">
								<p>ATP/ADP translocase is a hallmark of intracellular parasites and symbionts, which allows them to scavenge ATP from the host cell; chloroplast protein in plants. Could have been acquired by plants and microsporidia via independent HGT from bacteria <abbrgrp><abbr bid="B58">58</abbr></abbrgrp>.</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>TWOG0689</p>
							</c>
							<c ca="left">
								<p>---HY--</p>
							</c>
							<c ca="left">
								<p>Uncharacterized protein essential for propionate metabolism</p>
							</c>
							<c ca="left">
								<p>PrpD protein of several bacteria and archaea (COG2079)</p>
							</c>
							<c ca="left">
								<p>The yeast and human (and the orthologs from other vertebrates) proteins show the greatest similarity to different subsets of bacterial orthologs, which might suggest independent HGT events.</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>TWOG0871</p>
							</c>
							<c ca="left">
								<p>---H-P-</p>
							</c>
							<c ca="left">
								<p>Uncharacterized conserved protein, probably enzyme</p>
							</c>
							<c ca="left">
								<p>COG4336, sporadic representation in several bacterial lineages</p>
							</c>
							<c ca="left">
								<p>The human (and mouse) protein has an additional domain conserved in the archaeon <it>Pyrococcus</it>. Human and <it>S. pombe </it>proteins are most similar to different subsets of bacterial homologs, which suggests the possibility of independent HGT events.</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>TWOG0788</p>
							</c>
							<c ca="left">
								<p>A----P-</p>
							</c>
							<c ca="left">
								<p>Urease</p>
							</c>
							<c ca="left">
								<p>Ureases of many bacterial species</p>
							</c>
							<c ca="left">
								<p>Highly conserved enzyme present in plants and many fungi but not <it>S. cerevisiae</it>. Plant and fungal ureases have a common domain architecture distinct from that of bacterial orthologs, which suggests monophyletic origin. Might have evolved via early HGT from bacteria (proto-mitochondria?) with subsequent loss in animals and some fungi.</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>4751</p>
							</c>
							<c ca="left">
								<p>A--H--E</p>
							</c>
							<c ca="left">
								<p>Recombination repair protein BRCA2, contains varying number of BRCA2 repeats</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="left">
								<p>Although sequence conservation is limited to the BRC repeats <abbrgrp><abbr bid="B101">101</abbr></abbrgrp>, the number of which varies substantially, the statistical significance of the observed sequence similarity and the absence of other homologs suggest that the proteins in this KOG are true orthologs. Apparent orthologs of BRCA2 are also detectable in other species from the taxa represented in the KOGs (mosquito <it>Anopheles gambiae</it>, fungus <it>Ustilago maydis</it>) <abbrgrp><abbr bid="B102">102</abbr></abbrgrp> and in early-branching eukaryotes (<it>Leishmania</it>, <it>Trypanosoma</it>; E.V.K., unpublished work), suggesting that evolution of BRCA2 involved multiple gene losses</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>4597</p>
							</c>
							<c ca="left">
								<p>A--H--E</p>
							</c>
							<c ca="left">
								<p>TATA-binding protein 1-interacting protein</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="left">
								<p>Probable multiple gene losses</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>4486</p>
							</c>
							<c ca="left">
								<p>A--H--E</p>
							</c>
							<c ca="left">
								<p>3-methyl-adenine DNA glycosylase</p>
							</c>
							<c ca="left">
								<p>Orthologs in many bacteria (COG2094)</p>
							</c>
							<c ca="left">
								<p>The plant protein and those from mammals and microsporidia show the greatest similarity to different subsets of bacterial orthologs. Evolution might have included a combination of gene loss and independent HGT events</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1594</p>
							</c>
							<c ca="left">
								<p>A-D-Y--</p>
							</c>
							<c ca="left">
								<p>Predicted epimerase related to aldose 1-epimerase</p>
							</c>
							<c ca="left">
								<p>Bacterial orthologs, primarily proteobacteria (COG0676)</p>
							</c>
							<c ca="left">
								<p>Eukaryotic proteins are more closely related to each other than to bacterial orthologs, indicating monophyletic origin. Function remains unknown; might be involved in a distinct and still uncharacterized pathway of polysaccharide biosynthesis. LSE in <it>Arabidopsis </it>(seven paralogs).</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>4141</p>
							</c>
							<c ca="left">
								<p>---HYPE</p>
							</c>
							<c ca="left">
								<p>Rad52/22, protein involved in double-strand break repair</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="left">
								<p>Probable gene loss in plants, insects and nematodes</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>4528</p>
							</c>
							<c ca="left">
								<p>-CDH--E</p>
							</c>
							<c ca="left">
								<p>Uncharacterized predicted enzyme, possibly a polynucleotide kinase (structure of the ortholog from the bacterium <it>Thermotoga maritima </it>has been determined - pdb code 1j5u)</p>
							</c>
							<c ca="left">
								<p>Conserved in all archaea and several bacteria (COG1371)</p>
							</c>
							<c ca="left">
								<p>Context analysis of archaeal and bacterial genomes suggests functional interaction between proteins of KOG4528 and KOG3833, RNA 3'-terminal phosphate cyclase (KOG4398, COG0430), and tRNA/rRNA cytosine C5-methylase (KOG1299/COG0144) (<abbrgrp><abbr bid="B103">103</abbr></abbrgrp> and E.V.K., unpublished observations). Taken together, the observations appear to implicate KOG4528 and KOG3833 in a still uncharacterized pathway of rRNA and/or tRNA processing and modification. Conservation of these proteins in archaea and early-branching eukaryotes suggests lineage-specific gene loss in plants and fungi.</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3833</p>
							</c>
							<c ca="left">
								<p>-CDH--E</p>
							</c>
							<c ca="left">
								<p>Uncharacterized predicted enzyme, possibly a polynucleotide phosphatase</p>
							</c>
							<c ca="left">
								<p>Conserved in all archaea and several bacteria (COG1690)</p>
							</c>
							<c ca="left">
								<p>See comment for KOG4528</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>*Abbreviations: A, thale cress <it>A. thaliana</it>; C, nematode <it>C. elegans</it>; D, fruit fly <it>D. melanogaster</it>; E, microsporidian <it>Encephalitozoon cuniculi</it>; H, <it>Homo sapiens</it>; Y, budding yeast <it>S. cerevisiae</it>; P, fission yeast <it>S. pombe</it>; a letter indicates the presence of the respective species in the given KOG and a dash indicates its absence.</p>
					</tblfn>
				</tbl>
				<p>During the manual curation of the KOG set, the KOGs with unexpected patterns were scrutinized in an effort to detect potential highly diverged members from one or more of the analyzed genomes. Some of these unexpected patterns might indicate that a gene is still missing in the analyzed set of protein sequences from one or more of the species included; reports of newly discovered genes have appeared since the release of the initial reports on genome sequences of complex eukaryotes, for example, as a result of massive sequencing of human cDNAs <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>, exhaustive annotation of the <it>Drosophila </it>genome <abbrgrp><abbr bid="B55">55</abbr></abbrgrp> and comparative analysis of closely related yeast genomes <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>. The unexpected phyletic patterns seem, however, largely to reflect the extensive, lineage-specific gene loss that is characteristic of eukaryotic evolution <abbrgrp><abbr bid="B57">57</abbr></abbrgrp>; on many occasions, this scenario is supported by the presence of orthologs in other eukaryotic lineages and/or in prokaryotes (Table <tblr tid="T1">1</tblr>). However, interesting exceptions to the multiple loss explanation might exist, as exemplified by the ATP/ADP-translocase, which is present in <it>Arabidopsis </it>and <it>Encephalitozoon </it>and could have evolved via independent HGT from intracellular bacterial parasites (<abbrgrp><abbr bid="B58">58</abbr></abbrgrp> and Table <tblr tid="T1">1</tblr>).</p>
				<tbl id="T2">
					<title>
						<p>Table 2</p>
					</title>
					<caption>
						<p>KOGs represented by exactly one ortholog in each of the seven analyzed eukaryotic genomes (examples)</p>
					</caption>
					<tblbdy cols="8">
						<r>
							<c ca="left">
								<p>KOG number</p>
							</c>
							<c ca="left">
								<p>(Predicted) function</p>
							</c>
							<c ca="left">
								<p>Multiprotein complex</p>
							</c>
							<c ca="center">
								<p>Functional class*</p>
							</c>
							<c ca="left">
								<p>Prokaryotic homologs</p>
							</c>
							<c cspan="2" ca="center">
								<p>Fitness class<sup>&#8224;</sup></p>
							</c>
							<c ca="left">
								<p>Comments</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c cspan="2">
								<hr/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>Yeast<sup>&#8225;</sup></p>
							</c>
							<c ca="center">
								<p>Worm<sup>&#167;</sup></p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c cspan="8">
								<hr/>
							</c>
						</r>
						<r>
							<c cspan="8" ca="left">
								<p>
									<b>Genes experimentally or computationally characterized previously</b>
								</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0392</p>
							</c>
							<c ca="left">
								<p>SNF2 family DNA-dependent ATPase</p>
							</c>
							<c ca="left">
								<p>TBP-DNA complex</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>Many bacteria and archaea (COG0553)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="left">
								<p>Involved in regulation of transcription from POL II promoters <abbrgrp><abbr bid="B104">104</abbr></abbrgrp></p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0121</p>
							</c>
							<c ca="left">
								<p>Nuclear cap-binding protein complex, subunit CBP20 (RRM-domain-containing RNA-binding protein)</p>
							</c>
							<c ca="left">
								<p>Cap-binding complex</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Several bacteria (COG0724)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c ca="left">
								<p>RRM-domain proteins show scattered presence in bacteria and might have been horizontally transferred from eukaryotes</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0213</p>
							</c>
							<c ca="left">
								<p>U2-snRNP associated splicing factor 3b, subunit 1</p>
							</c>
							<c ca="left">
								<p>Spliceosome</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0227</p>
							</c>
							<c ca="left">
								<p>snRNA-associated protein, splicing factor 3a, subunit b (Prp11p)</p>
							</c>
							<c ca="left">
								<p>Spliceosome</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2268</p>
							</c>
							<c ca="left">
								<p>Predicted nucleic-acid-binding protein kinase of the RIO1 family; 40S ribosomal subunit biogenesis/18S rRNA processing</p>
							</c>
							<c ca="left">
								<p>Pre-40S subunit</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Orthologs in most archaea but not in bacteria (COG0478)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c ca="left">
								<p>One of the very small number of protein kinases that show a clear-cut orthologous relationship between all eukaryotes and most archaea, and, apparently, the only one containing a helix-turn-helix nucleic-acid-binding domain <abbrgrp><abbr bid="B105">105</abbr></abbrgrp>. Associated with yeast pre-40S subunit and required for its maturation <abbrgrp><abbr bid="B106">106</abbr></abbrgrp>.</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3031</p>
							</c>
							<c ca="left">
								<p>Protein required for 60S ribosomal subunit biogenesis <abbrgrp><abbr bid="B107">107</abbr></abbrgrp>; contains the IMP4 domain, which is involved in rRNA processing <abbrgrp><abbr bid="B108">108</abbr></abbrgrp>; paralog of KOG3095 and KOG3292, which are also represented in all analyzed genomes</p>
							</c>
							<c ca="left">
								<p>Processosome</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Distantly related to COG2136, represented by orthologs in most archaea, but not in bacteria (KSM, unpublished)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c ca="left">
								<p>The COG2136 proteins appear to be subunits of the predicted archaeal exosome <abbrgrp><abbr bid="B109">109</abbr></abbrgrp>. Apparently, this gene has undergone at least two ancient duplications in eukaryotes</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3045</p>
							</c>
							<c ca="left">
								<p>Predicted RNA methylase involved in rRNA processing</p>
							</c>
							<c ca="left">
								<p>Processosome?</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Distantly related to numerous Rossmann-fold methylases but prokaryotic orthologs could not be confidently identified</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="left">
								<p>This protein (Rrp8p in yeast) has been shown to participate in the processing of rRNA and sequence analysis reveals the presence of a Rossmann-fold methylase domain <abbrgrp><abbr bid="B110">110</abbr></abbrgrp>. Therefore Rrp8p probably methylates either snoRNA or rRNA itself.</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3064</p>
							</c>
							<c ca="left">
								<p>RNA-binding nuclear protein containing a distinct C4 Zn-finger; implicated in the biogenesis of 60S ribosomal subunits <abbrgrp><abbr bid="B111">111</abbr></abbrgrp></p>
							</c>
							<c ca="left">
								<p>Processosome</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="left">
								<p>Initially identified in yeast as the MAK16 protein required for dsRNA virus reproduction <abbrgrp><abbr bid="B112">112</abbr></abbrgrp></p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0291, 0302, 0306, 0310, 0319, 0650, 1272</p>
							</c>
							<c ca="left">
								<p>WD40-repeat proteins, subunits of rRNA processing complexes <abbrgrp><abbr bid="B69">69</abbr><abbr bid="B70">70</abbr></abbrgrp></p>
							</c>
							<c ca="left">
								<p>Processosome</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>WD40-repeat proteins are present in several bacterial lineages and are particularly abundant in cyanobacteria but are missing in most archaea; none of them appear to be obvious orthologs of these proteins (COG2319)</p>
							</c>
							<c ca="center">
								<p>all 0</p>
							</c>
							<c ca="center">
								<p>X,X,1,X,1,1,1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0284</p>
							</c>
							<c ca="left">
								<p>Polyadenylation factor I complex, subunit PFS2, WD40-repeat protein</p>
							</c>
							<c ca="left">
								<p>Polyadenylation complex</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Same as above (COG2319)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0337</p>
							</c>
							<c ca="left">
								<p>RNA helicase involved in 28S rRNA processing</p>
							</c>
							<c ca="left">
								<p>Processosome</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Most archaea and bacteria (COG0513)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0343</p>
							</c>
							<c ca="left">
								<p>RNA helicase involved in 28S rRNA processing</p>
							</c>
							<c ca="left">
								<p>Processosome</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Most archaea and bacteria (COG0513)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1069</p>
							</c>
							<c ca="left">
								<p>3'-5' exoribonuclease (RNAse PH), exosome subunit Rrp46</p>
							</c>
							<c ca="left">
								<p>Exosome</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Most bacteria and archaea (COG0689)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1070</p>
							</c>
							<c ca="left">
								<p>Exosome subunit Rrp5 (RNA-binding S1 domain fused to TPR repeats)</p>
							</c>
							<c ca="left">
								<p>Exosome</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Most bacteria (COG0539, COG0457)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1135</p>
							</c>
							<c ca="left">
								<p>mRNA cleavage and polyadenylation complex subunit CFT2 (CPSF)</p>
							</c>
							<c ca="left">
								<p>Cleavage and polyadenylation complex</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Most archaea and some bacteria (COG1236)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1914</p>
							</c>
							<c ca="left">
								<p>mRNA cleavage and polyadenylation factor I complex, subunit RNA14</p>
							</c>
							<c ca="left">
								<p>Cleavage and polyadenylation complex</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1975</p>
							</c>
							<c ca="left">
								<p>RNA (guanine-7-) methyltransferase (capping enzyme subunit)</p>
							</c>
							<c ca="left">
								<p>Capping enzyme</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Numerous methyltransferases (COG0500) but no ortholog</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2051</p>
							</c>
							<c ca="left">
								<p>Nonsense-mediated mRNA decay complex, subunit 2</p>
							</c>
							<c ca="left">
								<p>NMD complex</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2554</p>
							</c>
							<c ca="left">
								<p>Pseudouridylate synthase</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Most archaea and bacteria (COG0101)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2613</p>
							</c>
							<c ca="left">
								<p>Upf1p-interacting protein, NMD complex subunit Nmd3p</p>
							</c>
							<c ca="left">
								<p>NMD complex</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>All archaea, no bacteria (COG1499)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2771</p>
							</c>
							<c ca="left">
								<p>tRNA-specific adenosine-34 deaminase subunit Tad3p</p>
							</c>
							<c ca="left">
								<p>Heterodimeric RNA-specific deaminase</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Most bacteria and some archaea (COG0590)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2780</p>
							</c>
							<c ca="left">
								<p>Protein involved in ribosomal large subunit assembly (RPF1), contains IMP4 domain</p>
							</c>
							<c ca="left">
								<p>Processosome</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Most archaea, no bacteria (COG2136)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2781</p>
							</c>
							<c ca="left">
								<p>Subunit of the small (ribosomal) subunit (SSU) processosome (snoRNP), IMP4</p>
							</c>
							<c ca="left">
								<p>Processosome</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Most archaea, no bacteria (COG2136)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2874</p>
							</c>
							<c ca="left">
								<p>Protein involved in rRNA processing and ribosomal assembly</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>All archaea, no bacteria (COG1094)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="left">
								<p>Predicted RNA-binding protein containing KH domain</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3013</p>
							</c>
							<c ca="left">
								<p>Exosome subunit Rrp4</p>
							</c>
							<c ca="left">
								<p>Exosome</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Most archaea, no bacteria (COG1097)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3031</p>
							</c>
							<c ca="left">
								<p>Protein involved in large ribosome subunit assembly and 28S rRNA processing (Rrf2)</p>
							</c>
							<c ca="left">
								<p>Processosome</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c ca="left">
								<p>Contains the BRIX domain</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3322</p>
							</c>
							<c ca="left">
								<p>RNAse P/MRP subunit, involved in processing of pre-tRNAs and the 5.8S rRNA</p>
							</c>
							<c ca="left">
								<p>RNAse P/MRP holoenzyme</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3448</p>
							</c>
							<c ca="left">
								<p>Predicted snRNP core protein</p>
							</c>
							<c ca="left">
								<p>Spliceosome</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>All archaea, no bacteria (COG1958)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3482</p>
							</c>
							<c ca="left">
								<p>Small nuclear ribonucleoprotein (snRNP) SMF subunit</p>
							</c>
							<c ca="left">
								<p>Spliceosome</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>All archaea, no bacteria (COG1958)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2463</p>
							</c>
							<c ca="left">
								<p>Predicted RNA-binding protein, consisting of a PIN domain and a Zn-ribbon. Involved in 26S proteasome assembly</p>
							</c>
							<c ca="left">
								<p>26S proteasome, pre-40S subunit</p>
							</c>
							<c ca="center">
								<p>A,O</p>
							</c>
							<c ca="left">
								<p>Represented by orthologs in all archaea but no bacteria (COG1349)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c ca="left">
								<p>PIN domain has been detected in exosome subunits and is thought to have RNA-binding properties or even nuclease activity <abbrgrp><abbr bid="B113">113</abbr><abbr bid="B114">114</abbr></abbrgrp>. The demonstration of the role of this protein (Nob1p) in proteasome assembly <abbrgrp><abbr bid="B115">115</abbr></abbrgrp>, 40S ribosome subunit assembly, and the processing of 18S rRNA 3'-end <abbrgrp><abbr bid="B116">116</abbr></abbrgrp> supports the connection between degradation of RNA and proteins that seems to have been established already in archaea <abbrgrp><abbr bid="B109">109</abbr></abbrgrp>.</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3273</p>
							</c>
							<c ca="left">
								<p>Predicted RNA-binding protein containing KH domain, interacts with Nob1p</p>
							</c>
							<c ca="left">
								<p>26S proteasome, pre-40S subunit</p>
							</c>
							<c ca="center">
								<p>A,O</p>
							</c>
							<c ca="left">
								<p>Orthologs in all archaea but no bacteria (COG1094)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="left">
								<p>This is the second predicted RNA-binding protein involved in proteasome assembly <abbrgrp><abbr bid="B115">115</abbr></abbrgrp>, which emphasizes the aforementioned link between RNA and protein processing</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1831</p>
							</c>
							<c ca="left">
								<p>Deadenylating 3'-5' exonuclease, negative regulator of PolII transcription</p>
							</c>
							<c ca="left">
								<p>CCR4-NOT core complex</p>
							</c>
							<c ca="center">
								<p>AK</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1159</p>
							</c>
							<c ca="left">
								<p>NADP-dependent flavoprotein reductase, probably sulfite reductase subunit</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>CL</p>
							</c>
							<c ca="left">
								<p>Many bacteria (COG0369)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c ca="left">
								<p>Genetic evidence of a role in DNA replication <abbrgrp><abbr bid="B117">117</abbr></abbrgrp></p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1800</p>
							</c>
							<c ca="left">
								<p>Ferredoxin/adrenodoxin reductase</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>C</p>
							</c>
							<c ca="left">
								<p>Most bacteria and some archaea (COG0493)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1173</p>
							</c>
							<c ca="left">
								<p>Anaphase-promoting complex (APC), Cdc16 subunit (TPR-repeat protein)</p>
							</c>
							<c ca="left">
								<p>APC</p>
							</c>
							<c ca="center">
								<p>D</p>
							</c>
							<c ca="left">
								<p>Most archaea and bacteria have TPR-repeat proteins (COG0457) but no orthologs of Cdc16</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3437</p>
							</c>
							<c ca="left">
								<p>Anaphase-promoting complex (APC), subunit 10</p>
							</c>
							<c ca="left">
								<p>APC</p>
							</c>
							<c ca="center">
								<p>D</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1358</p>
							</c>
							<c ca="left">
								<p>Serine palmitoyltransferase</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>I</p>
							</c>
							<c ca="left">
								<p>Most bacteria and some archaea (COG0156)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1511</p>
							</c>
							<c ca="left">
								<p>Mevalonate kinase</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>I</p>
							</c>
							<c ca="left">
								<p>Most archaea and some bacteria (COG1577)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3059</p>
							</c>
							<c ca="left">
								<p>N-acetylglucosaminyltransferase complex, subunit PIG-C/GPI2, involved in phosphatidylinositol biosynthesis</p>
							</c>
							<c ca="left">
								<p>N-acetylglucosaminyltransferase complex</p>
							</c>
							<c ca="center">
								<p>I</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0467</p>
							</c>
							<c ca="left">
								<p>Translation elongation factor 2 paralog (GTPase)</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>J</p>
							</c>
							<c ca="left">
								<p>All (COG0480)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c ca="left">
								<p>Involved in 60S ribosomal subunit maturation <abbrgrp><abbr bid="B118">118</abbr></abbrgrp></p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1147</p>
							</c>
							<c ca="left">
								<p>Glutamyl-tRNA synthetase</p>
							</c>
							<c ca="left">
								<p>Multispecificity aminoacyl-tRNA synthetase complex</p>
							</c>
							<c ca="center">
								<p>J</p>
							</c>
							<c ca="left">
								<p>All (COG0008)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2784</p>
							</c>
							<c ca="left">
								<p>Phenylalanyl-tRNA synthetase, beta subunit</p>
							</c>
							<c ca="left">
								<p>Heterodimeric phenylalanyl-tRNA synthetase</p>
							</c>
							<c ca="center">
								<p>J</p>
							</c>
							<c ca="left">
								<p>All (COG0016)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3123</p>
							</c>
							<c ca="left">
								<p>Diphthamide synthase (methyltransferase)</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>J</p>
							</c>
							<c ca="left">
								<p>All archaea, no bacteria (COG1798)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0261</p>
							</c>
							<c ca="left">
								<p>RNA polymerase III, largest subunit</p>
							</c>
							<c ca="left">
								<p>RNAPIII holoenzyme</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>All (COG0086)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0262</p>
							</c>
							<c ca="left">
								<p>RNA polymerase I, largest subunit</p>
							</c>
							<c ca="left">
								<p>RNAPI holoenzyme</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>All (COG0086)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0215</p>
							</c>
							<c ca="left">
								<p>RNA polymerase III, second largest subunit</p>
							</c>
							<c ca="left">
								<p>RNAPIII holoenzyme</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>All (COG0085)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0216</p>
							</c>
							<c ca="left">
								<p>RNA polymerase I, second largest subunit</p>
							</c>
							<c ca="left">
								<p>RNAPI holoenzyme</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>All (COG0085)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1063</p>
							</c>
							<c ca="left">
								<p>RNA polymerase II elongator complex, subunit ELP2, WD repeat protein</p>
							</c>
							<c ca="left">
								<p>RNA polymerase II elongator complex</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>WD40-repeat proteins are present in several bacterial lineages and are particularly abundant in cyanobacteria but are missing in most archaea; none of them appear to be obvious orthologs of this protein (COG2319)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1131</p>
							</c>
							<c ca="left">
								<p>RNA polymerase II transcription initiation/nucleotide excision repair factor TFIIH, 5'-3' helicase subunit RAD3</p>
							</c>
							<c ca="left">
								<p>RNAPII holoenzyme</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>Most archaea and bacteria (COG1199)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1920</p>
							</c>
							<c ca="left">
								<p>RNA polymerase II Elongator subunit</p>
							</c>
							<c ca="left">
								<p>RNAP II elongator complex</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1932</p>
							</c>
							<c ca="left">
								<p>TBP-associated factor (Taf2p)</p>
							</c>
							<c ca="left">
								<p>TFIID complex</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2009</p>
							</c>
							<c ca="left">
								<p>Transcription initiation factor TFIIIB, Bdp1 subunit (Myb domain)</p>
							</c>
							<c ca="left">
								<p>TFIIIB</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2076</p>
							</c>
							<c ca="left">
								<p>RNA polymerase III transcription factor TFIIIC, TPR-repeat-containing protein</p>
							</c>
							<c ca="left">
								<p>TFIIIC</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>Most archaea and bacteria have TPR-repeat proteins (COG0457) but no orthologs of TFIIIC</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2487</p>
							</c>
							<c ca="left">
								<p>RNA polymerase II transcription initiation/nucleotide excision repair factor TFIIH, subunit TFB4</p>
							</c>
							<c ca="left">
								<p>TFIIH</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2691</p>
							</c>
							<c ca="left">
								<p>RNA polymerase II subunit 9</p>
							</c>
							<c ca="left">
								<p>RNAP II holoenzyme</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>Most archaea, no bacteria (COG1594)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2807</p>
							</c>
							<c ca="left">
								<p>RNA polymerase II transcription initiation/nucleotide excision repair factor TFIIH, SSL1 subunit</p>
							</c>
							<c ca="left">
								<p>TFIIH</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>No orthologs although von Willebrand A domains are present in a variety of prokaryotic proteins</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="left">
								<p>Consists of a von Willebrand A domain most closely related to those in the proteasome subunit RPN10 <abbrgrp><abbr bid="B119">119</abbr></abbrgrp> and a Zn-finger domain</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2907</p>
							</c>
							<c ca="left">
								<p>RNA polymerase I transcription factor TFIIS, subunit A12.2/RPA12</p>
							</c>
							<c ca="left">
								<p>TFIIS</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>All archaea, no bacteria (COG1594)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3169</p>
							</c>
							<c ca="left">
								<p>RNA polymerase II transcriptional regulation mediator</p>
							</c>
							<c ca="left">
								<p>Mediator complex <abbrgrp><abbr bid="B120">120</abbr></abbrgrp></p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3233</p>
							</c>
							<c ca="left">
								<p>RNA polymerase III subunit C34</p>
							</c>
							<c ca="left">
								<p>RNAP III holoenzyme</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3297</p>
							</c>
							<c ca="left">
								<p>RNA polymerase III subunit C25</p>
							</c>
							<c ca="left">
								<p>RNAP III holoenzyme</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>All archaea, no bacteria (COG1095)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3438</p>
							</c>
							<c ca="left">
								<p>Subunit common to RNA polymerases I (A) and III (C); Rpc19p</p>
							</c>
							<c ca="left">
								<p>RNAP I and III holoenzymes</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3471</p>
							</c>
							<c ca="left">
								<p>RNA polymerase II transcription initiation/nucleotide excision repair factor TFIIH, subunit TFB2</p>
							</c>
							<c ca="left">
								<p>TFIIH</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3490</p>
							</c>
							<c ca="left">
								<p>Transcription elongation factor SPT4, Zn-ribbon protein</p>
							</c>
							<c ca="left">
								<p>Chromatin-associated transcription complexes</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3497</p>
							</c>
							<c ca="left">
								<p>RNA polymerase II subunit; Rpb10p</p>
							</c>
							<c ca="left">
								<p>RNAP II holoenzyme</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>All archaea, no bacteria (COG1644)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3901</p>
							</c>
							<c ca="left">
								<p>Transcription initiation factor IID subunit (Taf13p)</p>
							</c>
							<c ca="left">
								<p>TFIID</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3949</p>
							</c>
							<c ca="left">
								<p>RNA polymerase II elongator complex, subunit ELP4</p>
							</c>
							<c ca="left">
								<p>RNAP II elongator complex</p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>4086</p>
							</c>
							<c ca="left">
								<p>SOH1 protein potentially involved in Pol II transcription regulation and repair</p>
							</c>
							<c ca="left">
								<p>SMCC complex <abbrgrp><abbr bid="B121">121</abbr></abbrgrp></p>
							</c>
							<c ca="center">
								<p>K</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1532</p>
							</c>
							<c ca="left">
								<p>Predicted GTPase of the XAB1 family <abbrgrp><abbr bid="B122">122</abbr></abbrgrp></p>
							</c>
							<c ca="left">
								<p>TBP-free TAF(II) complex</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="left">
								<p>All archaea and several bacteria (COG1100)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="left">
								<p>XP-A-binding protein in humans, thus implicated in repair (<abbrgrp><abbr bid="B122">122</abbr></abbrgrp> and references therein).</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1533</p>
							</c>
							<c ca="left">
								<p>Predicted GTPase of the XAB1 family (paralog of KOG1757) <abbrgrp><abbr bid="B122">122</abbr></abbrgrp></p>
							</c>
							<c ca="left">
								<p>TBP-free TAF(II) complex?</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="left">
								<p>All archaea and several bacteria (COG1100)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c ca="left">
								<p>Might have a function in repair given the paralogous relationship with KOG1757.</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1625</p>
							</c>
							<c ca="left">
								<p>DNA polymerase &#945; processivity subunit, inactivated phosphatase</p>
							</c>
							<c ca="left">
								<p>DNA polymerase &#945; holoenzyme</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="left">
								<p>Small subunit of archaeal DNA polymerase II (COG1311)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="left">
								<p>The small, regulatory subunit of DNA polymerase &#945; also forms a pan-eukaryotic KOG3044, which is a paralog of KOG0861 (the only recent duplication in KOG3044 is seen in vertebrates). In contrast, another paralog, the small subunit of DNA polymerase &#949;, is represented in animals, fungi and the early-branching protozoan <it>Plasmodium</it>, but not in plants or Microsporidia. Thus, the history of this polymerase subunit apparently involved inactivation of the phosphatase (or nuclease) inherited from archaea, with subsequent duplications at early stages of eukaryotic evolution <abbrgrp><abbr bid="B123">123</abbr></abbrgrp></p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0479</p>
							</c>
							<c ca="left">
								<p>DNA replication licensing factor MCM3</p>
							</c>
							<c ca="left">
								<p>Pre-replication complex</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="left">
								<p>All archaea, no bacteria (COG1241)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0481</p>
							</c>
							<c ca="left">
								<p>DNA replication licensing factor MCM5</p>
							</c>
							<c ca="left">
								<p>Pre-replication complex</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="left">
								<p>All archaea, no bacteria (COG1241)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0482</p>
							</c>
							<c ca="left">
								<p>DNA replication licensing factor MCM7</p>
							</c>
							<c ca="left">
								<p>Pre-replication complex</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="left">
								<p>All archaea, no bacteria (COG1241)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0964</p>
							</c>
							<c ca="left">
								<p>Structural maintenance of chromosome protein 3 (cohesin subunit SMC3)</p>
							</c>
							<c ca="left">
								<p>Sister chromatid cohesion complex</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="left">
								<p>Many archaea and bacteria (COG1196)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0979</p>
							</c>
							<c ca="left">
								<p>Structural maintenance of chromosome protein 5 (cohesin subunit SMC5)</p>
							</c>
							<c ca="left">
								<p>Sister chromatid cohesion complex</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="left">
								<p>Many archaea and bacteria (COG1196)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1942</p>
							</c>
							<c ca="left">
								<p>TBP-interacting protein TIP49 (DNA helicase)</p>
							</c>
							<c ca="left">
								<p>Chromatin remodeling complex</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="left">
								<p>Most archaea, no bacteria (COG1224)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1979</p>
							</c>
							<c ca="left">
								<p>DNA mismatch repair ATPase, MLH1</p>
							</c>
							<c ca="left">
								<p>Mismatch repair complex</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="left">
								<p>Most bacteria and some archaea (COG0323)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2267</p>
							</c>
							<c ca="left">
								<p>DNA primase, large subunit</p>
							</c>
							<c ca="left">
								<p>DNA polymerase &#945;:primase complex</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="left">
								<p>All archaea, no bacteria (COG2219)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2299</p>
							</c>
							<c ca="left">
								<p>Ribonuclease HI</p>
							</c>
							<c ca="left">
								<p>Replisome</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="left">
								<p>All archaea, most bacteria (COG0164)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2310</p>
							</c>
							<c ca="left">
								<p>DNA repair exonuclease MRE11</p>
							</c>
							<c ca="left">
								<p>MRN complex involved in double-strand break repair</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="left">
								<p>All archaea, most bacteria (COG0420)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2929</p>
							</c>
							<c ca="left">
								<p>Origin recognition complex, subunit 2 (ORC2)</p>
							</c>
							<c ca="left">
								<p>ORC</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0179</p>
							</c>
							<c ca="left">
								<p>20S proteasome, regulatory subunit beta type PSMB1/PRE7 (paralog of KOG0185)</p>
							</c>
							<c ca="left">
								<p>20S proteasome</p>
							</c>
							<c ca="center">
								<p>O</p>
							</c>
							<c ca="left">
								<p>All archaea but only actinomycetes among bacteria (COG0638)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0185</p>
							</c>
							<c ca="left">
								<p>20S proteasome, regulatory subunit beta type PSMB4/PRE4 (paralog of KOG0179)</p>
							</c>
							<c ca="left">
								<p>20S proteasome</p>
							</c>
							<c ca="center">
								<p>O</p>
							</c>
							<c ca="left">
								<p>All archaea but only actinomycetes among bacteria (COG0638)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2708</p>
							</c>
							<c ca="left">
								<p>Predicted metalloprotease with chaperone activity (RNAse H/HSP70 fold) <abbrgrp><abbr bid="B124">124</abbr></abbrgrp></p>
							</c>
							<c ca="left">
								<p>Putative complex involved in translation regulation <abbrgrp><abbr bid="B125">125</abbr></abbrgrp></p>
							</c>
							<c ca="center">
								<p>O</p>
							</c>
							<c ca="left">
								<p>Represented by orthologs in all archaea and bacteria (COG0533)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c ca="left">
								<p>One of the few remaining uncharacterized proteins that are universally conserved in all cellular life forms. The only experimentally demonstrated activity is that of sialoglycoprotease, but fusion with a distinct protein kinase in several archaea and analysis of gene neighborhood suggest a fundamental role in signal transduction, possibly translation regulation <abbrgrp><abbr bid="B125">125</abbr></abbrgrp></p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0301</p>
							</c>
							<c ca="left">
								<p>Protein required for normal rates of ubiquitin-dependent proteolysis, contains WD40 repeats</p>
							</c>
							<c ca="left">
								<p>Proteasome?</p>
							</c>
							<c ca="center">
								<p>O</p>
							</c>
							<c ca="left">
								<p>Same as above (COG2319)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0358</p>
							</c>
							<c ca="left">
								<p>Chaperonin complex component, TCP-1 delta subunit (CCT4)</p>
							</c>
							<c ca="left">
								<p>TCP-1</p>
							</c>
							<c ca="center">
								<p>O</p>
							</c>
							<c ca="left">
								<p>All archaea and nearly all bacteria (COG0459)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0363</p>
							</c>
							<c ca="left">
								<p>Chaperonin complex component, TCP-1 beta subunit (CCT2)</p>
							</c>
							<c ca="left">
								<p>TCP-1</p>
							</c>
							<c ca="center">
								<p>O</p>
							</c>
							<c ca="left">
								<p>All archaea and nearly all bacteria (COG0459)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0687</p>
							</c>
							<c ca="left">
								<p>26S proteasome regulatory complex, subunit RPN7/PSMD6</p>
							</c>
							<c ca="left">
								<p>26S proteasome</p>
							</c>
							<c ca="center">
								<p>O</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1299</p>
							</c>
							<c ca="left">
								<p>Vacuolar sorting protein VPS45/Stt10 (Sec1 family)</p>
							</c>
							<c ca="left">
								<p>t-SNARE complex</p>
							</c>
							<c ca="center">
								<p>O</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c ca="left">
								<p>Involved in t-SNARE complex assembly <abbrgrp><abbr bid="B126">126</abbr></abbrgrp></p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1349</p>
							</c>
							<c ca="left">
								<p>GPI-anchor transamidase complex, GPI8 subunit</p>
							</c>
							<c ca="left">
								<p>GPI-anchor transamidase complex</p>
							</c>
							<c ca="center">
								<p>O</p>
							</c>
							<c ca="left">
								<p>Distantly related proteases in some bacteria (no COG)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1943</p>
							</c>
							<c ca="left">
								<p>Beta-tubulin folding cofactor D, involved in chromosome segregation</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>O</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2015</p>
							</c>
							<c ca="left">
								<p>NEDD8-activating complex, UBA3 subunit</p>
							</c>
							<c ca="left">
								<p>NEDD8-activating complex</p>
							</c>
							<c ca="center">
								<p>O</p>
							</c>
							<c ca="left">
								<p>Most bacteria and some archaea (COG0476)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2126</p>
							</c>
							<c ca="left">
								<p>Phosphoethanolamine <it>N</it>-methyltransferase involved in GPI-anchor biosynthesis</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>O</p>
							</c>
							<c ca="left">
								<p>Several bacteria and archaea (COG1524)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2884</p>
							</c>
							<c ca="left">
								<p>26S proteasome regulatory complex, subunit RPN10/PSMD4</p>
							</c>
							<c ca="left">
								<p>26S proteasome regulatory complex</p>
							</c>
							<c ca="center">
								<p>O</p>
							</c>
							<c ca="left">
								<p>No orthologs although von Willebrand A domains are present in a variety of prokaryotic proteins</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="left">
								<p>Contains von Willebrand A domain</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2908</p>
							</c>
							<c ca="left">
								<p>26S proteasome regulatory complex, subunit RPN9/PSMD13</p>
							</c>
							<c ca="left">
								<p>26S proteasome regulatory complex</p>
							</c>
							<c ca="center">
								<p>O</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="left">
								<p>Contains PINT domain</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0209</p>
							</c>
							<c ca="left">
								<p>Endoplasmic reticulum membrane P-type ATPase</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>P</p>
							</c>
							<c ca="left">
								<p>Many bacteria and some archaea (COG0474)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3379</p>
							</c>
							<c ca="left">
								<p>Uncharacterized member of the histidine triad superfamily of nucleotide hydrolases</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>R</p>
							</c>
							<c ca="left">
								<p>Most archaea and bacteria (COG0537)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c ca="left">
								<p>Only biochemical function predicted.</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2635</p>
							</c>
							<c ca="left">
								<p>Coatomer (COPI) complex delta subunit</p>
							</c>
							<c ca="left">
								<p>COPI complex</p>
							</c>
							<c ca="center">
								<p>U</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2927</p>
							</c>
							<c ca="left">
								<p>Membrane component of ER protein translocation apparatus (Sec62)</p>
							</c>
							<c ca="left">
								<p>Sec complex</p>
							</c>
							<c ca="center">
								<p>U</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2978</p>
							</c>
							<c ca="left">
								<p>Dolichol-phosphate mannosyltransferase</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>U</p>
							</c>
							<c ca="left">
								<p>All archaea, most bacteria (COG0463)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3198</p>
							</c>
							<c ca="left">
								<p>Signal recognition particle, subunit Srp19</p>
							</c>
							<c ca="left">
								<p>Signal recognition particle</p>
							</c>
							<c ca="center">
								<p>U</p>
							</c>
							<c ca="left">
								<p>All archaea, no bacteria (COG1400)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3315</p>
							</c>
							<c ca="left">
								<p>Subunit of the targeting complex (TRAPP) involved in ER to Golgi trafficking</p>
							</c>
							<c ca="left">
								<p>TRAPP</p>
							</c>
							<c ca="center">
								<p>U</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3369</p>
							</c>
							<c ca="left">
								<p>Subunit of the targeting complex (TRAPP) involved in ER to Golgi trafficking</p>
							</c>
							<c ca="left">
								<p>TRAPP</p>
							</c>
							<c ca="center">
								<p>U</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1992</p>
							</c>
							<c ca="left">
								<p>Nuclear export receptor CSE1/CAS (importin beta)</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>YU</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c cspan="2" ca="left">
								<p>
									<b>New functional predictions</b>
								</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2316</p>
							</c>
							<c ca="left">
								<p>PP-loop family ATP pyrophosphatase domain, which in fungi, plants and insects is fused to a duplicated translation inhibitor domain. The fusion, along with the phyletic pattern of the PP-ATPase domain, suggests an essential function in translation regulation</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Orthologs of the PP-loop domain are present in all archaea (COG2102) but not in bacteria. Orthologs of the translation inhibitor domain are found in most bacteria and several archaea (COG0251)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c ca="left">
								<p>PP-loop ATPases have been previously implicated in base thiolation in various RNAs <abbrgrp><abbr bid="B127">127</abbr></abbrgrp> and proteins in this K/COG might have a similar function, which is likely to be conserved in eukaryotes and archaea. However, the fusion with the translation inhibitor domain, which has been reported to have endoribonuclease activity <abbrgrp><abbr bid="B128">128</abbr></abbrgrp>, is a eukaryote-specific feature</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2523</p>
							</c>
							<c ca="left">
								<p>Predicted RNA-binding protein containing a PUA domain, probable role in RNA modification <abbrgrp><abbr bid="B129">129</abbr></abbrgrp></p>
							</c>
							<c ca="left">
								<p>Putative novel RNA modification complex</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Orthologs present in all archaea (COG2016) but not in bacteria</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c ca="left">
								<p>Several of the archaeal orthologs of this protein form fusions with a PP-loop ATPase domain implicated in base thiolation <abbrgrp><abbr bid="B127">127</abbr></abbrgrp>. Thus, the proteins of this KOG might interact with those of KOG2840 (pan-eukaryotic, duplications in <it>Arabidopsis </it>and worm) or KOG2594 (missing in humans and microsporidia) to form a novel enzymatic complex involved in RNA modification</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0270, 0271, 1539</p>
							</c>
							<c ca="left">
								<p>WD40-repeat proteins</p>
							</c>
							<c ca="left">
								<p>Processosome</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>WD40-repeat proteins are present in several bacterial lineages and are particularly abundant in cyanobacteria but are missing in most archaea; none of them appear to be obvious orthologs of this protein (COG2319)</p>
							</c>
							<c ca="center">
								<p>all 0</p>
							</c>
							<c ca="center">
								<p>X,1,X</p>
							</c>
							<c ca="left">
								<p>By analogy with other conserved WD40-repeat proteins, predicted to be subunits of rRNA processing/ribosome assembly complexes</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2321</p>
							</c>
							<c ca="left">
								<p>Nucleolar protein, contains WD40 repeats</p>
							</c>
							<c ca="left">
								<p>rRNA processosome?</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>WD40-repeat proteins are present in several bacterial lineages and are particularly abundant in cyanobacteria but are missing in most archaea; none of them appear to be obvious orthologs of this protein (COG2319)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="left">
								<p>Probable subunit of an rRNA-processing complex</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1763</p>
							</c>
							<c ca="left">
								<p>Uncharacterized conserved protein containing a CCCH Zn-finger; possible role in RNA processing or splicing</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="left">
								<p>CCCH fingers have been shown to bind 3' untranslated regions in various mRNAs <abbrgrp><abbr bid="B130">130</abbr><abbr bid="B131">131</abbr></abbrgrp></p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2837</p>
							</c>
							<c ca="left">
								<p>Protein containing a U1-type, RNA-binding C2H2 Zn-finger. Probable role in RNA splicing/processing</p>
							</c>
							<c ca="left">
								<p>Spliceosome?</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="left">
								<p>U1-type fingers are essential for the assembly of U1 RNP <abbrgrp><abbr bid="B132">132</abbr></abbrgrp></p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3073</p>
							</c>
							<c ca="left">
								<p>Predicted RNA-binding protein containing a PIN domain and involved in 18S rRNA processing</p>
							</c>
							<c ca="left">
								<p>Pre-40S subunit</p>
							</c>
							<c ca="center">
								<p>A</p>
							</c>
							<c ca="left">
								<p>Most archaea, no bacteria (COG1412)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="left">
								<p>Interacts with Nop14p and is required for 40S subunit biogenesis and 18S rRNA maturation (11694595). The presence of the PIN domain suggests RNA-binding and, possibly, RNase activity</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3154</p>
							</c>
							<c ca="left">
								<p>Uncharacterized protein with potential function in translation or ribosomal biogenesis</p>
							</c>
							<c ca="left">
								<p>Pre-40S subunit?</p>
							</c>
							<c ca="center">
								<p>A?</p>
							</c>
							<c ca="left">
								<p>Most archaea, no bacteria (COG2042)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c ca="left">
								<p>The general functional prediction stems from the observation that the gene for this protein forms a predicted conserved operon with the gene for ribosomal protein L40E in several archaeal genomes</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3214</p>
							</c>
							<c ca="left">
								<p>Small protein containing a Zn-ribbon, possibly RNA-binding; potential role in RNA processing or transcription regulation</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>A?</p>
							</c>
							<c ca="left">
								<p>Conserved in Crenarchaeota (COG4888)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3800</p>
							</c>
							<c ca="left">
								<p>Predicted E3 ubiquitin ligase containing RING finger, subunit of transcription/repair factor TFIIH and CDK-activating kinase assembly factor</p>
							</c>
							<c ca="left">
								<p>TFIIH</p>
							</c>
							<c ca="center">
								<p>KO</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3176</p>
							</c>
							<c ca="left">
								<p>Predicted &#945;-helical protein, possibly involved in replication/repair; paralog of KOG3636</p>
							</c>
							<c ca="left">
								<p>A novel complex with PCNA involved in replication?</p>
							</c>
							<c ca="center">
								<p>L?</p>
							</c>
							<c ca="left">
								<p>Conserved in most (possibly all) archaea but not in bacteria (COG1711)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c ca="left">
								<p>A function in DNA replication/repair and/or transcription is suggested by the analysis of the genome context of archaeal orthologs which form an evolutionarily conserved association with the genes for replication sliding clamp (PCNA ortholog) (K.S.M. and E.V.K., unpublished work)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3303</p>
							</c>
							<c ca="left">
								<p>Predicted &#945;-helical protein, possibly involved in replication/repair and/or transcription; paralog of KOG3508</p>
							</c>
							<c ca="left">
								<p>A novel complex with PCNA involved in replication?</p>
							</c>
							<c ca="center">
								<p>L?</p>
							</c>
							<c ca="left">
								<p>Conserved in most (possibly all) archaea but not in bacteria (COG1711)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="left">
								<p>A function in DNA replication/repair and/or transcription is suggested by the analysis of the genome context of archaeal orthologs which form an evolutionarily conserved association with the genes for replication sliding clamp (PCNA ortholog) (K.S.M. and E.V.K., unpublished work)</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>0396</p>
							</c>
							<c ca="left">
								<p>Predicted E3 ubiquitin ligase</p>
							</c>
							<c ca="left">
								<p>Ub ligase</p>
							</c>
							<c ca="center">
								<p>O</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="left">
								<p>The proteins in this KOG contain a modified RING domain, which, like the U-box domain <abbrgrp><abbr bid="B133">133</abbr></abbrgrp> that has been shown to function as E3 <abbrgrp><abbr bid="B134">134</abbr></abbrgrp>, might not be capable of metal binding</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>1443</p>
							</c>
							<c ca="left">
								<p>Multitransmembrane protein, predicted drug/metabolite transporter</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>R</p>
							</c>
							<c ca="left">
								<p>Most archaea and bacteria (COG0697)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2647</p>
							</c>
							<c ca="left">
								<p>Multitransmembrane protein, potential transporter</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>R</p>
							</c>
							<c ca="left">
								<p>Most bacteria and some archaea (COG0628)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2488</p>
							</c>
							<c ca="left">
								<p>Predicted N-acetyltransferase</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>R</p>
							</c>
							<c ca="left">
								<p>Most archaea and bacteria (COG0454)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>X</p>
							</c>
							<c ca="left">
								<p>Putative role in ribosomal maturation?</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3347</p>
							</c>
							<c ca="left">
								<p>Predicted nucleotide kinase; nuclear protein (Fap7p)</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>R</p>
							</c>
							<c ca="left">
								<p>Conserved in all archaea but not in bacteria (COG1936)</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="left">
								<p>Involved in oxidative stress response in yeast <abbrgrp><abbr bid="B135">135</abbr></abbrgrp></p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3974</p>
							</c>
							<c ca="left">
								<p>Predicted sugar kinase</p>
							</c>
							<c ca="left">
								<p>Putative novel complex with KOG2585 proteins</p>
							</c>
							<c ca="center">
								<p>R</p>
							</c>
							<c ca="left">
								<p>All archaea and most bacteria (COG0063)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="left">
								<p>Based on fusions seen in prokaryotes, predicted to interact functionally and, possibly, physically with uncharacterized proteins of KOG2585 (represented in all eukaryotes but includes paralogs in some species)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c cspan="2" ca="left">
								<p>
									<b>No functional prediction</b>
								</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>2318</p>
							</c>
							<c ca="left">
								<p>Uncharacterized conserved protein</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>S</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>3237</p>
							</c>
							<c ca="left">
								<p>Uncharacterized conserved protein containing coiled-coil domain</p>
							</c>
							<c ca="left">
								<p>?</p>
							</c>
							<c ca="center">
								<p>S</p>
							</c>
							<c ca="left">
								<p>None</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="left">
								<p>Coiled-coil domains are often involved in complex assembly; this could be an uncharacterized component of the chromatin or the spliceosome</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>*Abbreviations for the functional categories are as in Figure <figr fid="F3">3</figr>. <sup>&#8224;</sup>0, essential gene (lethal knockout); 1, non-essential gene (non-lethal knockout); X indicates that no data is available for the given gene. <sup>&#8225;</sup>Data from <abbrgrp><abbr bid="B85">85</abbr></abbrgrp>. <sup>&#167;</sup>Data from <abbrgrp><abbr bid="B86">86</abbr></abbrgrp>.</p>
					</tblfn>
				</tbl>
				<p>Common phyletic patterns of genes that otherwise were not suspected to be functionally linked might suggest the existence of such connections and prompt additional analysis leading to concrete functional predictions <abbrgrp><abbr bid="B42">42</abbr><abbr bid="B59">59</abbr><abbr bid="B60">60</abbr><abbr bid="B61">61</abbr></abbrgrp>. The pair of KOG3833 and KOG4528 is a case in point that has not been described previously. The initial observation that these KOGs share the same unusual pattern of presence-absence in eukaryotes, and have similar phyletic patterns in prokaryotes, with a ubiquitous presence in archaea, prompted a more detailed examination of the multiple alignments of the respective proteins and the conservation of the (predicted) operon organization in archaea and bacteria (Table <tblr tid="T2">2</tblr> and data not shown). The combination of clues from these analyses suggests that the two proteins interact in a still uncharacterized pathway of RNA processing, which also includes RNA 3'-phosphate cyclase (KOG3980) <abbrgrp><abbr bid="B62">62</abbr></abbrgrp> and cytosine-C5-methylase (NOL1/NOP2 in eukaryotes; KOG1122). The proteins in KOG3833 and KOG4528 are likely to represent novel enzyme families, possibly a kinase-phosphatase pair (E.V.K. and L. Aravind, unpublished data). Notably, these predicted new enzymes are present in animals and <it>E. cuniculi </it>but not in <it>Arabidopsis </it>or yeasts. In contrast, KOG3980 is present in all analyzed eukaryotic genomes except for <it>Arabidopsis</it>, whereas KOG1122 is pan-eukaryotic. These differences in the phyletic patterns of the components of the predicted pathway are concordant with the patterns in eukaryotes in that.</p>
				<p>Figure <figr fid="F3">3</figr> shows the distribution of known and predicted functions of eukaryotic proteins among 20 functional categories for the entire set of KOGs and, separately, for KOGs represented in six or seven species and the animal-specific KOGs. Compared to the functional breakdown of prokaryotic COGs <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>, the prevalence of signal transduction is notable among eukaryotes. This feature is particularly prominent in animal-specific KOGs, whereas the highly conserved set is comparatively enriched in proteins that are involved in translation, transcription, chaperone-like functions, cell cycle control and chromatin dynamics (Figure <figr fid="F3">3</figr>). The large number of KOGs for which only general functional prediction was feasible, and those whose functions remain unknown, even among the subset that is represented in six or seven eukaryotic species, emphasizes that our current understanding of eukaryotic biology is seriously lacking, even with respect to the functions of highly conserved genes.</p>
				<fig id="F2">
					<title>
						<p>Figure 2</p>
					</title>
					<caption>
						<p>Distribution of the KOGs by the number of paralogs in each of the analyzed eukaryotic genomes</p>
					</caption>
					<text>
						<p>Distribution of the KOGs by the number of paralogs in each of the analyzed eukaryotic genomes. The species abbreviations are as in Figure <figr fid="F1">1</figr>.</p>
					</text>
					<graphic file="gb-2004-5-2-r7-2"/>
				</fig>
				<p>The distribution of KOGs by the number of paralogs in each genome is shown in Figure <figr fid="F2">2</figr>. The preponderance of lineage-specific duplication of conserved genes, that is, intra-KOG LSEs, in multicellular eukaryotes is obvious. Cases in which a single gene in yeast or, particularly, <it>Encephalitozoon</it>, has two or more co-orthologs in animals and/or plants are most common in KOGs, whereas the reverse situation is rare. These observations support the notion of the major contribution of LSE to the evolution of eukaryotic complexity <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>. However, 131 KOGs are represented by a single ortholog in all genomes compared (Table <tblr tid="T2">2</tblr>) and a substantial number of KOGs have one member from a majority of the genomes (data not shown). Recent theoretical modeling of the evolution of paralogous families has suggested that, in general, ancient protein families tend to have multiple paralogs <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B63">63</abbr></abbrgrp>. Therefore, whenever a KOG has a single member in all or most species, this should be attributed to selection against duplication of this particular gene. A prominent cause of such selection could be the involvement of the respective gene products in essential multisubunit complexes, such that imbalance between subunits leads to deleterious effects <abbrgrp><abbr bid="B64">64</abbr></abbrgrp>.</p>
				<fig id="F3">
					<title>
						<p>Figure 3</p>
					</title>
					<caption>
						<p>Functional breakdown of the KOGs</p>
					</caption>
					<text>
						<p>Functional breakdown of the KOGs. Designations of functional categories: A, RNA processing and modification; B, chromatin structure and dynamics; C, energy production and conversion; D, cell-cycle control and mitosis; E, amino acid metabolism and transport; F, nucleotide metabolism and transport; G, carbohydrate metabolism and transport; H, coenzyme metabolism; I, lipid metabolism; J, translation; K, transcription; L, replication and repair; M, membrane and cell wall structure and biogenesis; O, post-translational modification, protein turnover, chaperone functions; P, inorganic ion transport and metabolism; Q, secondary metabolites biosynthesis, transport and catabolism; T, signal transduction; U, intracellular trafficking and secretion; Y, nuclear structure; Z, cytoskeleton; R, general functional prediction only (typically, prediction of biochemical activity); S, function unknown. This breakdown is only for KOGs that included at least three species.</p>
					</text>
					<graphic file="gb-2004-5-2-r7-3"/>
				</fig>
			</sec>
			<sec>
				<st>
					<p>Known and new functions of single-member, pan-eukaryotic KOGs</p>
				</st>
				<p>We examined in greater detail the 131 KOGs that are represented by a single gene in each of the seven genomes (Table <tblr tid="T2">2</tblr>). As can be envisaged from their presence in diverse eukaryotic taxa, including the 'minimal' genome of <it>Encephalitozoon</it>, and as shown by comparison with the knockout phenotype data (Table <tblr tid="T2">2</tblr> and see below), these pan-eukaryotic KOGs are of particular biological importance. For the great majority of these KOGs (113 of the 131), the function has been experimentally determined or confidently predicted to a varying degree of detail using computational methods (Table <tblr tid="T2">2</tblr>). However, around 20 KOGs from this set remained uncharacterized at the time of this analysis and, for all but two of these, substantial functional inferences could be drawn through a combination of sequence-profile analysis, structure prediction and genomic-context analysis of prokaryotic homologs (Table <tblr tid="T2">2</tblr>). Some of these predicted new functions are variations on well-known themes, such as two predicted PP-loop ATPases, which are probably involved in novel, essential RNA modifications (KOGs 2522 and 2316) or two predicted E3 components of ubiquitin ligases (KOGs 0396 and 3800). Other predicted functions appear to be completely new, such as those of the proteins in KOG3176 and KOG3303, which are likely to be essential components of eukaryotic replication and/or repair systems. Each of these uncharacterized but ubiquitous and largely essential eukaryotic genes is an attractive target for experimental studies.</p>
				<p>Examination of the experimentally characterized and predicted functions of pan-eukaryotic, single-member KOGs leads to interesting conclusions. Nearly all the functionally characterized KOGs in this set consist of proteins that are subunits of known multiprotein complexes (Table <tblr tid="T2">2</tblr>). The most prominent of these are the complexes involved in rRNA processing and ribosome assembly, such as the recently discovered rRNA processosome and the pre-40S subunit, as well as the spliceosome, and various complexes involved in transcription (Table <tblr tid="T2">2</tblr>). Accordingly, this set of KOGs is markedly enriched for proteins involved in various forms of RNA processing, assembly of ribonucleoprotein (RNP) particles and transcription. In addition, KOGs in the single-member pan-eukaryotic set include subunits of molecular complexes that are not directly related to RNA processing, such as the proteasome, the TCP-1 chaperonin complex <abbrgrp><abbr bid="B65">65</abbr></abbrgrp> and the TRAPP complex involved in protein trafficking <abbrgrp><abbr bid="B66">66</abbr></abbrgrp>. Altogether, more than 80% of the yeast proteins in the pan-eukaryotic, single-member KOGs belong to known macromolecular complexes included in the MIPS database <abbrgrp><abbr bid="B67">67</abbr></abbrgrp>, as compared to around 64% for all yeast proteins in the KOGs, which is a moderate but statistically highly significant excess (data not shown). This preponderance of multiprotein complex formation among the single-member pan-eukaryotic KOGs is fully compatible with the balance hypothesis <abbrgrp><abbr bid="B64">64</abbr></abbrgrp>.</p>
				<p>The most unexpected observation regarding the single-member, pan-eukaryotic KOGs is probably that in 14 of these proteins, the only detectable domain was the WD40 repeat (Table <tblr tid="T2">2</tblr>). This is particularly notable because WD40-repeat proteins, which are extremely abundant in eukaryotes and are present in several prokaryotic lineages as well <abbrgrp><abbr bid="B68">68</abbr></abbrgrp>, are not generally known to form well-defined, one-to-one orthologous relationships. The WD40 proteins in the pan-eukaryotic KOGs listed in Table <tblr tid="T2">2</tblr> are exceptions, which is probably due to their unique and essential roles in the assembly of RNA-processing complexes. It has recently been demonstrated that, in <it>S. cerevisiae</it>, seven of these proteins are subunits of the 18S rRNA processosome, or at least are involved in ribosomal assembly <abbrgrp><abbr bid="B69">69</abbr><abbr bid="B70">70</abbr></abbrgrp>. Taking these results together with the unusual phyletic pattern, it seems possible to predict with considerable confidence that those WD40 proteins in the 131-KOG set that remain uncharacterized belong to the same or similar RNA-processing complexes (Table <tblr tid="T2">2</tblr>).</p>
				<p>With some notable exceptions, such as the WD40 proteins, the KOGs in the single-member, pan-eukaryotic set show remarkable patterns of evolutionary conservation: they are either (nearly) ubiquitous in the three kingdoms of life, for example, RNA polymerase subunits, or are universally conserved in eukaryotes and archaea but missing in bacteria, such as most of the proteins implicated in RNA processing (Table <tblr tid="T2">2</tblr>). Thus, it appears that elaborate molecular machines central to the functioning of the eukaryotic cell have evolved, largely from ancestral archaeo-eukaryotic components, at the onset of eukaryotic evolution, and both loss and duplication of the respective genes have been strongly selected against throughout the rest of eukaryotic evolution.</p>
			</sec>
			<sec>
				<st>
					<p>Variation of evolutionary rates among KOGs</p>
				</st>
				<p>Genome-wide analysis of protein evolutionary rates shows a broad range of variation <abbrgrp><abbr bid="B71">71</abbr></abbrgrp>. Here, we investigate the variation of evolutionary rates among the ubiquitous KOGs represented in all seven analyzed genomes and the connection between the evolutionary rate and protein function in the KOG set. The characteristic evolutionary rate of each KOG that included one or more members from <it>Arabidopsis </it>was determined by measuring the mean evolutionary distance from <it>Arabidopsis </it>(the outgroup in the phylogenetic tree; see below) to the other species. Even among the KOGs that include all seven species and, accordingly, appear to represent the conserved core of eukaryotic genes, the evolutionary rates differ by a factor of 20 between the fastest- and the slowest-evolving KOGs. Excluding 5% of the KOGs from each tail of the distribution still leaves almost a fourfold difference in evolutionary rates (Figure <figr fid="F4">4a</figr>).</p>
				<fig id="F4">
					<title>
						<p>Figure 4</p>
					</title>
					<caption>
						<p>Variation of amino-acid substitution rates among KOGs</p>
					</caption>
					<text>
						<p>Variation of amino-acid substitution rates among KOGs. <b>(a) </b>Probability-density function for the distribution of evolutionary rates among the set of KOGs including all seven analyzed eukaryotic species. <b>(b) </b>Distribution functions for the evolutionary rates in different functional categories of KOGs. The designations of functional categories are as in Figure <figr fid="F3">3</figr>.</p>
					</text>
					<graphic file="gb-2004-5-2-r7-4"/>
				</fig>
				<p>We then compared the distributions of evolutionary rates for different functional categories of KOGs (Tables <tblr tid="T3">3</tblr>,<tblr tid="T4">4</tblr> and Figure <figr fid="F4">4b</figr>). Although all the distributions substantially overlapped, there was a statistically highly significant difference between the evolutionary rates for proteins with different functions (Tables <tblr tid="T3">3</tblr>,<tblr tid="T4">4</tblr> and Figure <figr fid="F4">4b</figr>). The slowest-evolving proteins are those involved in translation and RNA processing, the fastest-evolving ones are involved in cellular trafficking and transport, whereas components of replication and transcription systems have intermediate evolutionary rates (Tables <tblr tid="T3">3</tblr>,<tblr tid="T4">4</tblr> and Figure <figr fid="F4">4b</figr>).</p>
				<tbl id="T3" hint_layout="single">
					<title>
						<p>Table 3</p>
					</title>
					<caption>
						<p>Evolutionary rates in KOGs with different functions: evolutionary rates for different functional categories of KOGs*</p>
					</caption>
					<tblbdy cols="4">
						<r>
							<c ca="left">
								<p>Functional category</p>
							</c>
							<c ca="center">
								<p>Number of KOGs</p>
							</c>
							<c ca="center">
								<p>Mean rate, substitutions per site</p>
							</c>
							<c ca="center">
								<p>Standard deviation</p>
							</c>
						</r>
						<r>
							<c cspan="4">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>J</p>
							</c>
							<c ca="center">
								<p>227</p>
							</c>
							<c ca="center">
								<p>0.98</p>
							</c>
							<c ca="center">
								<p>0.37</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>H</p>
							</c>
							<c ca="center">
								<p>62</p>
							</c>
							<c ca="center">
								<p>0.98</p>
							</c>
							<c ca="center">
								<p>0.30</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>A</p>
							</c>
							<c ca="center">
								<p>167</p>
							</c>
							<c ca="center">
								<p>1.01</p>
							</c>
							<c ca="center">
								<p>0.36</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>C</p>
							</c>
							<c ca="center">
								<p>140</p>
							</c>
							<c ca="center">
								<p>1.01</p>
							</c>
							<c ca="center">
								<p>0.43</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>O</p>
							</c>
							<c ca="center">
								<p>307</p>
							</c>
							<c ca="center">
								<p>1.01</p>
							</c>
							<c ca="center">
								<p>0.40</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>F</p>
							</c>
							<c ca="center">
								<p>50</p>
							</c>
							<c ca="center">
								<p>1.05</p>
							</c>
							<c ca="center">
								<p>0.34</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>E</p>
							</c>
							<c ca="center">
								<p>130</p>
							</c>
							<c ca="center">
								<p>1.07</p>
							</c>
							<c ca="center">
								<p>0.38</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>L</p>
							</c>
							<c ca="center">
								<p>139</p>
							</c>
							<c ca="center">
								<p>1.11</p>
							</c>
							<c ca="center">
								<p>0.38</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>B</p>
							</c>
							<c ca="center">
								<p>56</p>
							</c>
							<c ca="center">
								<p>1.13</p>
							</c>
							<c ca="center">
								<p>0.33</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Z</p>
							</c>
							<c ca="center">
								<p>64</p>
							</c>
							<c ca="center">
								<p>1.13</p>
							</c>
							<c ca="center">
								<p>0.46</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>K</p>
							</c>
							<c ca="center">
								<p>209</p>
							</c>
							<c ca="center">
								<p>1.15</p>
							</c>
							<c ca="center">
								<p>0.42</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>G</p>
							</c>
							<c ca="center">
								<p>115</p>
							</c>
							<c ca="center">
								<p>1.16</p>
							</c>
							<c ca="center">
								<p>0.43</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>I</p>
							</c>
							<c ca="center">
								<p>110</p>
							</c>
							<c ca="center">
								<p>1.16</p>
							</c>
							<c ca="center">
								<p>0.32</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>T</p>
							</c>
							<c ca="center">
								<p>200</p>
							</c>
							<c ca="center">
								<p>1.18</p>
							</c>
							<c ca="center">
								<p>0.39</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>D</p>
							</c>
							<c ca="center">
								<p>111</p>
							</c>
							<c ca="center">
								<p>1.19</p>
							</c>
							<c ca="center">
								<p>0.40</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>R</p>
							</c>
							<c ca="center">
								<p>415</p>
							</c>
							<c ca="center">
								<p>1.23</p>
							</c>
							<c ca="center">
								<p>0.42</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>M</p>
							</c>
							<c ca="center">
								<p>33</p>
							</c>
							<c ca="center">
								<p>1.26</p>
							</c>
							<c ca="center">
								<p>0.47</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>U</p>
							</c>
							<c ca="center">
								<p>196</p>
							</c>
							<c ca="center">
								<p>1.27</p>
							</c>
							<c ca="center">
								<p>0.42</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Q</p>
							</c>
							<c ca="center">
								<p>30</p>
							</c>
							<c ca="center">
								<p>1.27</p>
							</c>
							<c ca="center">
								<p>0.37</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>P</p>
							</c>
							<c ca="center">
								<p>69</p>
							</c>
							<c ca="center">
								<p>1.28</p>
							</c>
							<c ca="center">
								<p>0.45</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>N</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>1.30</p>
							</c>
							<c ca="center">
								<p>0.78</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>S</p>
							</c>
							<c ca="center">
								<p>348</p>
							</c>
							<c ca="center">
								<p>1.40</p>
							</c>
							<c ca="center">
								<p>0.41</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>All</p>
							</c>
							<c ca="center">
								<p>3203</p>
							</c>
							<c ca="center">
								<p>1.16</p>
							</c>
							<c ca="center">
								<p>0.42</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>*Only the KOGs that included at least one member from <it>Arabidopsis </it>were analyzed; the evolutionary rates are the average distances between the <it>Arabidopsis </it>representative in the given KOG and the proteins from other species (see Materials and methods for details). The functional categories are designated as in Figure <figr fid="F5">5</figr>.</p>
					</tblfn>
				</tbl>
				<tbl id="T4" hint_layout="single">
					<title>
						<p>Table 4</p>
					</title>
					<caption>
						<p>Statistical significance of differences in evolutionary rates between selected functional categories of KOGs (t-test)</p>
					</caption>
					<tblbdy cols="5">
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>J</p>
							</c>
							<c ca="center">
								<p>L</p>
							</c>
							<c ca="center">
								<p>U</p>
							</c>
							<c ca="center">
								<p>S</p>
							</c>
						</r>
						<r>
							<c cspan="5">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>J</p>
							</c>
							<c ca="center">
								<p>-</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>L</p>
							</c>
							<c ca="center">
								<p>3 &#215; 10<sup>-3</sup></p>
							</c>
							<c ca="center">
								<p>-</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>U</p>
							</c>
							<c ca="center">
								<p>1 &#215; 10<sup>-12</sup></p>
							</c>
							<c ca="center">
								<p>3 &#215; 10<sup>-4</sup></p>
							</c>
							<c ca="center">
								<p>-</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>S</p>
							</c>
							<c ca="center">
								<p>7 &#215; 10<sup>-33</sup></p>
							</c>
							<c ca="center">
								<p>5 &#215; 10<sup>-13</sup></p>
							</c>
							<c ca="center">
								<p>2 &#215; 10<sup>-4</sup></p>
							</c>
							<c ca="center">
								<p>-</p>
							</c>
						</r>
					</tblbdy>
				</tbl>
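				<p>As a rough illustration of the kind of comparison summarized in Table 4, the sketch below computes a t statistic from summary values alone. It assumes that the three numeric columns of the preceding table are the number of KOGs, the mean evolutionary rate and the standard deviation, and it uses Welch's unequal-variance form of the test; both are assumptions made for this example, as the variant of the t-test actually used is not stated here. The function name <b>welch_t</b> is hypothetical.</p>

```python
from math import sqrt


def welch_t(n1, mean1, sd1, n2, mean2, sd2):
    """Welch's t statistic for two groups given as size, mean and standard deviation."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean2 - mean1) / standard_error


# Rows L and S of the preceding table, read as
# (number of KOGs, mean rate, standard deviation); this reading is an assumption.
category_l = (139, 1.11, 0.38)
category_s = (348, 1.40, 0.41)

t_value = welch_t(*category_l, *category_s)
print(round(t_value, 2))
```

				<p>Converting the statistic to a P value additionally requires the t distribution with the appropriate degrees of freedom, which is left out of this sketch.</p>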
			</sec>
			<sec>
				<st>
					<p>A parsimonious scenario of gene loss and emergence in eukaryotic evolution and reconstruction of ancestral eukaryotic gene sets</p>
				</st>
				<p>Given a particular species tree topology, evolutionary parsimony analysis can be used to construct a parsimonious scenario of evolution, that is, a mapping of different types of evolutionary events onto the branches of the tree. With prokaryotes, the problem is confounded by the major contributions of both lineage-specific gene loss and HGT to genome evolution, with the relative likelihoods of these events remaining uncertain <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B7">7</abbr></abbrgrp>. In contrast, substantial HGT between major lineages of eukaryotes can apparently be safely disregarded, which allows an unambiguous most parsimonious scenario that includes only gene loss and emergence of new genes as elementary events.</p>
				<p>Some crucial aspects of the phylogenetic tree of the eukaryotic crown group remain a matter of contention. The consensus of many phylogenetic analyses appears to point to an animal-fungal clade and clustering of microsporidia with the fungi. However, a major uncertainty remains with respect to the topology of the animal tree: the majority of studies on protein phylogenies support a coelomate (chordate-arthropod) clade <abbrgrp><abbr bid="B72">72</abbr><abbr bid="B73">73</abbr><abbr bid="B74">74</abbr></abbrgrp>, whereas rRNA phylogeny and some protein family trees point to the so-called ecdysozoan (arthropod-nematode) clade <abbrgrp><abbr bid="B75">75</abbr><abbr bid="B76">76</abbr><abbr bid="B77">77</abbr><abbr bid="B78">78</abbr></abbrgrp>. We treated the phyletic pattern of each KOG as a string of binary characters (1 for the presence of the given species and 0 for its absence in the given KOG) and constructed the parsimonious scenarios of gene loss and emergence during evolution of the eukaryotic crown group for both the coelomate and the ecdysozoan topologies of the phylogenetic tree. For the purpose of this reconstruction, the Dollo parsimony approach was adopted <abbrgrp><abbr bid="B79">79</abbr></abbrgrp>. Under this approach, gene loss is considered irreversible; thus, a gene (a KOG member) can be lost independently in several evolutionary lineages but cannot be regained. This assumption is justified by the implausibility of HGT between eukaryotes (the Dollo approach is not valid for reconstruction of prokaryotic ancestors).</p>
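				<p>The Dollo reconstruction described above can be sketched for a single phyletic pattern as follows. This is a minimal illustration and not the software used in this study: the nested-pair tree encoding, the function names and the example pattern are hypothetical, and the label At for Arabidopsis is assumed. One gain is placed at the last common ancestor of all species that carry the KOG, and a loss is counted for every clade below that point in which the KOG is absent.</p>

```python
# Toy species trees as nested pairs; "At" for Arabidopsis is an assumed label.
FUNGI = (("Sc", "Sp"), "Ec")
COELOMATE = ("At", (FUNGI, ("Ce", ("Dm", "Hs"))))
ECDYSOZOAN = ("At", (FUNGI, (("Ce", "Dm"), "Hs")))


def leaves(node):
    """Return the set of species found below a node."""
    if isinstance(node, str):
        return {node}
    return leaves(node[0]) | leaves(node[1])


def dollo_events(tree, present):
    """Place a single gain at the last common ancestor of all carriers,
    then list every clade that must have lost the gene afterwards."""
    present = set(present)
    if present.isdisjoint(leaves(tree)):
        raise ValueError("no species in the tree carries this KOG")
    # Descend while only one child clade contains carriers of the KOG.
    node = tree
    while not isinstance(node, str):
        carriers = [child for child in node if not leaves(child).isdisjoint(present)]
        if len(carriers) == 2:
            break
        node = carriers[0]
    losses = []

    def walk(current):
        # A child clade without any carrier corresponds to one loss event.
        if isinstance(current, str):
            return
        for child in current:
            if leaves(child).isdisjoint(present):
                losses.append(sorted(leaves(child)))
            else:
                walk(child)

    walk(node)
    return sorted(leaves(node)), losses


pattern = {"At", "Sc", "Hs"}  # hypothetical KOG absent from Sp, Ec, Ce and Dm
for name, tree in (("coelomate", COELOMATE), ("ecdysozoan", ECDYSOZOAN)):
    gain_clade, losses = dollo_events(tree, pattern)
    print(name, len(losses), losses)
```

				<p>For this hypothetical pattern, the coelomate tree requires separate losses in Ce and Dm, whereas the ecdysozoan tree accounts for both absences with a single loss on the branch leading to their common ancestor.</p>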
				<p>In the resulting parsimonious scenarios, each branch was associated with both gene loss and emergence of new genes, with the exception of the plant branch and the branch leading to the common ancestor of fungi and animals, to which gene losses could not be assigned with the current set of genomes (Figure <figr fid="F5">5a,b</figr>). There is little doubt that, once genomes of early-branching eukaryotes are included, gene loss associated with these branches will become apparent. The principal features of the reconstructed scenarios include massive gene loss in the fungal clade, with additional elimination of numerous genes in the microsporidian; emergence of a large set of new genes at the onset of the animal clade; and subsequent substantial gene loss in each of the animal lineages, particularly in the nematodes and arthropods (Figure <figr fid="F5">5a,b</figr>). The estimated number of genes lost in <it>S. cerevisiae </it>after its divergence from the common ancestor with the other yeast species, <it>S. pombe</it>, closely agreed with a previous estimate produced by a different approach <abbrgrp><abbr bid="B57">57</abbr></abbrgrp>. The switch from the coelomate topology of the animal sub-tree to the ecdysozoan topology resulted in relatively small changes in the distribution of gains and losses: the most notable difference was the greater number of genes lost in the nematode lineage and the smaller number of genes lost in the insect lineage under the ecdysozoan scenario compared to the coelomate scenario (Figure <figr fid="F5">5a,b</figr>).</p>
				<fig id="F5">
					<title>
						<p>Figure 5</p>
					</title>
					<caption>
						<p>Parsimonious scenarios of loss and emergence of genes (KOGs) in eukaryotic evolution</p>
					</caption>
					<text>
						<p>Parsimonious scenarios of loss and emergence of genes (KOGs) in eukaryotic evolution. <b>(a) </b>The coelomate topology of the phylogenetic tree of the eukaryotic crown group. <b>(b) </b>The ecdysozoan topology of the phylogenetic tree of the eukaryotic crown group. The numbers in boxes indicate the inferred number of KOGs in the respective ancestral forms. The numbers next to branches indicate the number of gene gains (emergence of KOGs) (numerator) and gene (KOG) losses (denominator) associated with the respective branches; a dash indicates that the number of losses for a given branch could not be determined. Proteins from each genome that did not belong to KOGs as well as LSEs were counted as gains on the terminal branches. The species abbreviations are as in Figure <figr fid="F1">1</figr>.</p>
					</text>
					<graphic file="gb-2004-5-2-r7-5"/>
				</fig>
				<p>The parsimony analysis described above involves explicit reconstruction of the gene sets of ancestral eukaryotic genomes. Under the Dollo parsimony model, which was used for this analysis, an ancestral gene (KOG) set is the union of the sets of KOGs that the respective outgroup shares with each of the remaining species. Thus, the gene set for the common ancestor of the crown group includes all the KOGs in which <it>Arabidopsis </it>co-occurs with any of the other analyzed species. Similarly, the reconstructed gene set for the common ancestor of fungi and animals consists of all KOGs in which at least one fungal species co-occurs with at least one animal species. These are conservative reconstructions of ancestral gene sets because, as already indicated, gene losses in the lineages branching off the deepest bifurcation could not be detected. Under this conservative approach, 3,413 genes (KOGs) were assigned to the last common ancestor of the crown group (Figure <figr fid="F5">5a,b</figr>). More realistically, it appears likely that a certain number of ancestral genes have been lost in all, or all but one, of the analyzed lineages during subsequent evolution, such that the gene set of the eukaryotic crown group ancestor might have been close in size to those of modern yeasts. In terms of functional composition, the reconstructed core gene set of the crown-group ancestor more closely resembled the highly conserved KOGs than the animal-specific KOGs (Figure <figr fid="F3">3</figr>), being enriched in housekeeping functions such as translation, transcription and RNA processing (data not shown).</p>
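				<p>The two co-occurrence rules stated above can be written down directly. The sketch below applies them literally to a handful of made-up phyletic patterns; the KOG names, the pattern contents and the label At for Arabidopsis are hypothetical, and the microsporidian Ec is grouped with the fungi here as an assumption that follows the tree topology discussed earlier.</p>

```python
OUTGROUP = "At"  # assumed label for Arabidopsis
FUNGI = {"Sc", "Sp", "Ec"}  # Ec grouped with fungi (assumption)
ANIMALS = {"Ce", "Dm", "Hs"}

# Hypothetical phyletic patterns: KOG name mapped to the species it contains.
PATTERNS = {
    "KOG_a": {"At", "Sc", "Hs"},
    "KOG_b": {"Sc", "Sp", "Dm"},
    "KOG_c": {"Ce", "Dm", "Hs"},
    "KOG_d": {"At", "Ce", "Dm"},
}


def crown_ancestor(patterns):
    """KOGs in which the outgroup co-occurs with any other species."""
    return {
        kog
        for kog, species in patterns.items()
        if OUTGROUP in species and species - {OUTGROUP}
    }


def fungi_animal_ancestor(patterns):
    """KOGs in which a fungal species co-occurs with an animal species."""
    return {
        kog
        for kog, species in patterns.items()
        if not species.isdisjoint(FUNGI) and not species.isdisjoint(ANIMALS)
    }


print(sorted(crown_ancestor(PATTERNS)))
print(sorted(fungi_animal_ancestor(PATTERNS)))
```

				<p>In this toy data set, the animal-only KOG is assigned to neither ancestor, which reflects the conservative nature of the reconstruction: a gene that was present in an ancestor but lost from every lineage on one side of the bifurcation leaves no trace in the patterns.</p>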
				<p>The functional profiles of the gene sets that were lost in different lineages showed substantial differences (Table <tblr tid="T5">5</tblr>). For example, in the lineage leading to the common ancestor of the animals, the greatest loss among genes assigned to functional categories was seen in amino acid and coenzyme metabolism; in contrast, in the fly and the nematode, more substantial degradation was observed among transcription factors and proteins with chaperone-like functions. Genes for proteins involved in RNA processing and translation are, in general, not heavily affected by loss except in the highly degraded parasite <it>E. cuniculi</it>. On many occasions, the switch from the coelomate to the ecdysozoan topology replaces two independent, parallel losses in the insect and nematode clades with a single loss at the base of the ecdysozoan branch, although, on the whole, trees based on gene content support the coelomate topology <abbrgrp><abbr bid="B74">74</abbr></abbrgrp>. In particular, the ecdysozoan topology, unlike the coelomate topology, implies early loss of several genes involved in translation, transcription and repair (Table <tblr tid="T6">6</tblr>). Notably, a large fraction of the genes lost in each lineage has only a general functional prediction or no prediction at all (Table <tblr tid="T5">5</tblr>). This emphasizes the paucity of our current understanding of lineage-specific gene sets.</p>
				<tbl id="T5" hint_layout="double">
					<title>
						<p>Table 5</p>
					</title>
					<caption>
						<p>Functional profiles of genes lost in different eukaryotic lineages</p>
					</caption>
					<tblbdy cols="11">
						<r>
							<c ca="left">
								<p>Functional category</p>
							</c>
							<c cspan="10" ca="center">
								<p>Lost genes (KOGs)</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c cspan="10">
								<hr/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>Hs*</p>
							</c>
							<c ca="center">
								<p>Dm*</p>
							</c>
							<c ca="center">
								<p>Coelomates/ Ecdysozoa</p>
							</c>
							<c ca="center">
								<p>Ce*</p>
							</c>
							<c ca="center">
								<p>Animals</p>
							</c>
							<c ca="center">
								<p>Sc</p>
							</c>
							<c ca="center">
								<p>Sp</p>
							</c>
							<c ca="center">
								<p>Yeasts</p>
							</c>
							<c ca="center">
								<p>Ec</p>
							</c>
							<c ca="center">
								<p>Fungi-Ec</p>
							</c>
						</r>
						<r>
							<c cspan="11">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Total</p>
							</c>
							<c ca="center">
								<p>162/114</p>
							</c>
							<c ca="center">
								<p>520/369</p>
							</c>
							<c ca="center">
								<p>37/188</p>
							</c>
							<c ca="center">
								<p>541/751</p>
							</c>
							<c ca="center">
								<p>193</p>
							</c>
							<c ca="center">
								<p>299</p>
							</c>
							<c ca="center">
								<p>202</p>
							</c>
							<c ca="center">
								<p>55</p>
							</c>
							<c ca="center">
								<p>1,969</p>
							</c>
							<c ca="center">
								<p>802</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>RNA processing and modification</p>
							</c>
							<c ca="center">
								<p>2/3</p>
							</c>
							<c ca="center">
								<p>9/8</p>
							</c>
							<c ca="center">
								<p>1/2</p>
							</c>
							<c ca="center">
								<p>10/11</p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>15</p>
							</c>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>88</p>
							</c>
							<c ca="center">
								<p>32</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Translation</p>
							</c>
							<c ca="center">
								<p>3/3</p>
							</c>
							<c ca="center">
								<p>16/11</p>
							</c>
							<c ca="center">
								<p>0/5</p>
							</c>
							<c ca="center">
								<p>13/10</p>
							</c>
							<c ca="center">
								<p>9</p>
							</c>
							<c ca="center">
								<p>9</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>122</p>
							</c>
							<c ca="center">
								<p>10</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Transcription</p>
							</c>
							<c ca="center">
								<p>5/2</p>
							</c>
							<c ca="center">
								<p>16/12</p>
							</c>
							<c ca="center">
								<p>0/4</p>
							</c>
							<c ca="center">
								<p>29/33</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>16</p>
							</c>
							<c ca="center">
								<p>9</p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>83</p>
							</c>
							<c ca="center">
								<p>40</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Replication and repair</p>
							</c>
							<c ca="center">
								<p>4/5</p>
							</c>
							<c ca="center">
								<p>28/14</p>
							</c>
							<c ca="center">
								<p>1/15</p>
							</c>
							<c ca="center">
								<p>29/14</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>9</p>
							</c>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>60</p>
							</c>
							<c ca="center">
								<p>16</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Chromatin structure and dynamics</p>
							</c>
							<c ca="center">
								<p>1/1</p>
							</c>
							<c ca="center">
								<p>8/6</p>
							</c>
							<c ca="center">
								<p>0/2</p>
							</c>
							<c ca="center">
								<p>8/6</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>29</p>
							</c>
							<c ca="center">
								<p>11</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Energy production and conversion</p>
							</c>
							<c ca="center">
								<p>7/10</p>
							</c>
							<c ca="center">
								<p>9/10</p>
							</c>
							<c ca="center">
								<p>5/4</p>
							</c>
							<c ca="center">
								<p>12/10</p>
							</c>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>13</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>110</p>
							</c>
							<c ca="center">
								<p>37</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Cell cycle control and mitosis</p>
							</c>
							<c ca="center">
								<p>3/3</p>
							</c>
							<c ca="center">
								<p>11/6</p>
							</c>
							<c ca="center">
								<p>0/5</p>
							</c>
							<c ca="center">
								<p>15/11</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>12</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>61</p>
							</c>
							<c ca="center">
								<p>16</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Amino acid metabolism and transport</p>
							</c>
							<c ca="center">
								<p>5/6</p>
							</c>
							<c ca="center">
								<p>16/9</p>
							</c>
							<c ca="center">
								<p>1/8</p>
							</c>
							<c ca="center">
								<p>15/7</p>
							</c>
							<c ca="center">
								<p>38</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>9</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>110</p>
							</c>
							<c ca="center">
								<p>18</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Nucleotide metabolism and transport</p>
							</c>
							<c ca="center">
								<p>3/3</p>
							</c>
							<c ca="center">
								<p>6/3</p>
							</c>
							<c ca="center">
								<p>0/3</p>
							</c>
							<c ca="center">
								<p>8/5</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>38</p>
							</c>
							<c ca="center">
								<p>9</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Carbohydrate metabolism and transport</p>
							</c>
							<c ca="center">
								<p>3/3</p>
							</c>
							<c ca="center">
								<p>13/10</p>
							</c>
							<c ca="center">
								<p>1/4</p>
							</c>
							<c ca="center">
								<p>18/14</p>
							</c>
							<c ca="center">
								<p>8</p>
							</c>
							<c ca="center">
								<p>10</p>
							</c>
							<c ca="center">
								<p>16</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>70</p>
							</c>
							<c ca="center">
								<p>41</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Coenzyme metabolism</p>
							</c>
							<c ca="center">
								<p>0/2</p>
							</c>
							<c ca="center">
								<p>5/5</p>
							</c>
							<c ca="center">
								<p>2/2</p>
							</c>
							<c ca="center">
								<p>14/12</p>
							</c>
							<c ca="center">
								<p>11</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>51</p>
							</c>
							<c ca="center">
								<p>12</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Lipid metabolism</p>
							</c>
							<c ca="center">
								<p>1/5</p>
							</c>
							<c ca="center">
								<p>27/19</p>
							</c>
							<c ca="center">
								<p>4/12</p>
							</c>
							<c ca="center">
								<p>18/6</p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>9</p>
							</c>
							<c ca="center">
								<p>19</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>74</p>
							</c>
							<c ca="center">
								<p>33</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Membrane and cell wall structure and biogenesis</p>
							</c>
							<c ca="center">
								<p>5/4</p>
							</c>
							<c ca="center">
								<p>10/10</p>
							</c>
							<c ca="center">
								<p>2/2</p>
							</c>
							<c ca="center">
								<p>9/11</p>
							</c>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>37</p>
							</c>
							<c ca="center">
								<p>15</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Post-translational modification, protein turnover, chaperone functions</p>
							</c>
							<c ca="center">
								<p>3/5</p>
							</c>
							<c ca="center">
								<p>22/15</p>
							</c>
							<c ca="center">
								<p>2/9</p>
							</c>
							<c ca="center">
								<p>44/40</p>
							</c>
							<c ca="center">
								<p>8</p>
							</c>
							<c ca="center">
								<p>29</p>
							</c>
							<c ca="center">
								<p>21</p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>167</p>
							</c>
							<c ca="center">
								<p>69</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Inorganic ion transport and metabolism</p>
							</c>
							<c ca="center">
								<p>2/4</p>
							</c>
							<c ca="center">
								<p>8/8</p>
							</c>
							<c ca="center">
								<p>2/2</p>
							</c>
							<c ca="center">
								<p>8/7</p>
							</c>
							<c ca="center">
								<p>9</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>50</p>
							</c>
							<c ca="center">
								<p>14</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Secondary metabolites biosynthesis, transport and catabolism</p>
							</c>
							<c ca="center">
								<p>1/2</p>
							</c>
							<c ca="center">
								<p>6/5</p>
							</c>
							<c ca="center">
								<p>1/2</p>
							</c>
							<c ca="center">
								<p>5/3</p>
							</c>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>23</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Signal transduction</p>
							</c>
							<c ca="center">
								<p>5/3</p>
							</c>
							<c ca="center">
								<p>32/22</p>
							</c>
							<c ca="center">
								<p>0/10</p>
							</c>
							<c ca="center">
								<p>30/37</p>
							</c>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>16</p>
							</c>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>110</p>
							</c>
							<c ca="center">
								<p>52</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Intracellular trafficking and secretion</p>
							</c>
							<c ca="center">
								<p>4/3</p>
							</c>
							<c ca="center">
								<p>10/8</p>
							</c>
							<c ca="center">
								<p>0/2</p>
							</c>
							<c ca="center">
								<p>14/14</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>11</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>116</p>
							</c>
							<c ca="center">
								<p>22</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Nuclear structure</p>
							</c>
							<c ca="center">
								<p>0/0</p>
							</c>
							<c ca="center">
								<p>3/3</p>
							</c>
							<c ca="center">
								<p>0/0</p>
							</c>
							<c ca="center">
								<p>5/6</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>16</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Cytoskeleton</p>
							</c>
							<c ca="center">
								<p>0/0</p>
							</c>
							<c ca="center">
								<p>2/2</p>
							</c>
							<c ca="center">
								<p>0/0</p>
							</c>
							<c ca="center">
								<p>6/8</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>9</p>
							</c>
							<c ca="center">
								<p>0</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>44</p>
							</c>
							<c ca="center">
								<p>6</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>General functional prediction only (typically, prediction of biochemical activity)</p>
							</c>
							<c ca="center">
								<p>14/13</p>
							</c>
							<c ca="center">
								<p>79/55</p>
							</c>
							<c ca="center">
								<p>5/29</p>
							</c>
							<c ca="center">
								<p>88/72</p>
							</c>
							<c ca="center">
								<p>30</p>
							</c>
							<c ca="center">
								<p>55</p>
							</c>
							<c ca="center">
								<p>24</p>
							</c>
							<c ca="center">
								<p>11</p>
							</c>
							<c ca="center">
								<p>241</p>
							</c>
							<c ca="center">
								<p>134</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Function unknown</p>
							</c>
							<c ca="center">
								<p>91/34</p>
							</c>
							<c ca="center">
								<p>184/128</p>
							</c>
							<c ca="center">
								<p>10/66</p>
							</c>
							<c ca="center">
								<p>143/414</p>
							</c>
							<c ca="center">
								<p>37</p>
							</c>
							<c ca="center">
								<p>75</p>
							</c>
							<c ca="center">
								<p>33</p>
							</c>
							<c ca="center">
								<p>10</p>
							</c>
							<c ca="center">
								<p>269</p>
							</c>
							<c ca="center">
								<p>205</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>*For each of the animals, the numerator indicates the number of genes lost under the coelomate topology of the species tree and the denominator indicates the number of genes lost under the ecdysozoan topology of the tree.</p>
					</tblfn>
				</tbl>
				<tbl id="T6" hint_layout="double">
					<title>
						<p>Table 6</p>
					</title>
					<caption>
						<p>Groups of functionally linked genes co-eliminated during evolution of different eukaryotic lineages</p>
					</caption>
					<tblbdy cols="8">
						<r>
							<c ca="left">
								<p>Functional group/ complex</p>
							</c>
							<c cspan="7" ca="center">
								<p>Lost KOGs*</p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c cspan="7">
								<hr/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>Hs</p>
							</c>
							<c ca="center">
								<p>Dm</p>
							</c>
							<c ca="center">
								<p>Ce</p>
							</c>
							<c ca="center">
								<p>Coelomates/ Ecdysozoa</p>
							</c>
							<c ca="center">
								<p>Animals</p>
							</c>
							<c ca="center">
								<p>Yeasts</p>
							</c>
							<c ca="center">
								<p>Fungi-Ec</p>
							</c>
						</r>
						<r>
							<c cspan="8">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Mitochondrial ribosomal proteins</p>
							</c>
							<c ca="center">
								<p>3331, 3435/ 3331, 3435</p>
							</c>
							<c ca="center">
								<p>3505, 4600, 4612/ None</p>
							</c>
							<c ca="center">
								<p>3505, 4122, 4600, 4612/ 4122</p>
							</c>
							<c ca="center">
								<p>None/ 3505, 4600, 4612</p>
							</c>
							<c ca="center">
								<p>0899, 0938, 1740, 3254, 3278, 4844</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>0408, 1686, 1708, 4707</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Spliceosome, including putative associated proteins</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>1847, 1960/ 1847</p>
							</c>
							<c ca="center">
								<p>1902, 1960, 2991, 3414</p>
							</c>
							<c ca="center">
								<p>None/ 1960</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>0105, 0107, 0117, 1365, 1588, 1676, 1847, 1996, 2191, 2242, 2548, 2991, 4207, 4211</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Exosome</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>1004, 1613</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Replication origin-recognition complex</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>2228, 2538, 4557</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>4557</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Mismatch repair system</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>0218, 0220, 0221, 1977</p>
							</c>
							<c ca="center">
								<p>0218, 1977, 4120</p>
							</c>
							<c ca="center">
								<p>None/ 0218, 1977</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Ubiquitin system/ proteasome-signalosome components</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>0170, 0428, 1814, 4116, 4185, 4412</p>
							</c>
							<c ca="center">
								<p>0168, 0170, 0320, 0421, 0423, 1364, 1571, 1645, 1871, 1873, 1887, 2561, 2932, 3061, 3250, 3268, 4146, 4159, 4275, 4412, 4413, 4414, 4692, 4761</p>
							</c>
							<c ca="center">
								<p>None/ 0170, 4412</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>0823, 1645, 1734</p>
							</c>
							<c ca="center">
								<p>0311, 0423, 0427, 0827, 0895, 1100, 1139, 1464, 1571, 1812, 1887, 2561, 2932, 3011, 3050, 3268, 4185, 4248, 4265, 4275, 4413, 4414, 4427, 4642, 4692, 4761</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>NADH-ubiquinone oxidoreductase/ NADH dehydrogenase</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>2865, 2870, 3256, 3300, 3365, 3382, 3389, 3426, 3446, 3456, 3458, 3466, 3468, 4009, 4662, 4668, 4669, 4770, 4845</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>*For each of the animals, the numerator indicates the KOGs lost under the coelomate topology of the species tree, and the denominator indicates KOGs lost under the ecdysozoan topology.</p>
					</tblfn>
				</tbl>
				<p>As noticed previously during the analysis of the genes lost in <it>S. cerevisiae </it>after its divergence from the common ancestor with <it>S. pombe</it>, functionally connected genes tend to be co-eliminated during evolution <abbrgrp><abbr bid="B57">57</abbr></abbrgrp>. The present study generalizes this conclusion as many functionally coherent groups of co-eliminated KOGs become apparent (Table <tblr tid="T5">5</tblr>). Importantly, different branches of the same complex systems tend to be eliminated in parallel in different lineages, for example, largely non-overlapping sets of genes for proteins of the ubiquitin-proteasome-signalosome systems are lost in the fungal-microsporidial lineage and in the nematodes (Table <tblr tid="T6">6</tblr>). It seems likely that elimination of these genes reflects independent trends for simplification of regulatory processes in these lineages.</p>
				<p>An interesting trend seen in these data is the deterioration of the mitochondrial ribosome, which occurred in several eukaryotic lineages and appears to have been partly parallel (as it occurred independently in fungi-microsporidia and in animals) and partly consecutive: early loss in the ancestral animal line was followed by elimination of additional genes for ribosomal proteins in individual lineages (Table <tblr tid="T6">6</tblr>). <it>C. elegans </it>has one of the shortest mitochondrial rRNAs and might have a 'minimal' mitochondrial ribosome <abbrgrp><abbr bid="B80">80</abbr></abbrgrp>; the present analysis details the stages leading to this ultimate degradation of the mitochondrial ribosome.</p>
				<p>An exhaustive analysis of the patterns of gene loss is beyond the scope of this work. It seems clear, however, that examination of co-eliminated gene groups has the potential to improve both our understanding of eukaryotic evolution and functional predictions.</p>
			</sec>
			<sec>
				<st>
					<p>Evolutionary relationships between eukaryotic and prokaryotic orthologous gene sets</p>
				</st>
				<p>The prokaryotic COGs and eukaryotic KOGs were identified in separate genome comparisons, although an overlap existed because both sets included the unicellular eukaryotes, namely two yeasts and the microsporidian. To identify the prokaryotic counterparts of the KOGs, the sequences of the eukaryotic proteins included in the KOGs were compared using the RPS-BLAST program to the position-specific scoring matrices (PSSMs) constructed for all prokaryotic COGs (<abbrgrp><abbr bid="B81">81</abbr></abbrgrp>; see Materials and methods for details). The results were checked manually and also by comparing the assignment of proteins from unicellular eukaryotes to each of the orthologous gene sets. Altogether, probable orthologous relationships were established between 2,456 eukaryotic KOGs and TWOGs (44% of the total) and 1,516 prokaryotic COGs. A more detailed breakdown of the relationships between eukaryotic and prokaryotic orthologous gene clusters could reveal important evolutionary trends. Figure <figr fid="F6">6a</figr> compares the occurrence of prokaryotic counterparts for the entire set of eukaryotic KOGs and its subsets conserved at different levels. Clearly, the reconstructed gene set of the common ancestor of the crown group and, particularly, the pan-eukaryotic KOGs are enriched in ancient KOGs (those with prokaryotic counterparts) as compared to the full KOG collection. In contrast, among KOGs that are inferred to have evolved in individual lineages within the crown group, a significantly lower fraction has detectable prokaryotic counterparts (Figure <figr fid="F6">6a</figr>).</p>
				<fig id="F6">
					<title>
						<p>Figure 6</p>
					</title>
					<caption>
						<p>Correspondence between eukaryotic and prokaryotic orthologous gene sets</p>
					</caption>
					<text>
						<p>Correspondence between eukaryotic and prokaryotic orthologous gene sets. <b>(a) </b>Representation of prokaryotic counterparts in different subsets of KOGs. CGA, crown group ancestor; non-CGA, KOGs not represented in the crown group ancestor; MSP, metazoa-specific KOGs. <b>(b) </b>Evidence of ancient duplications of eukaryotic genes revealed by the KOGs against COGs comparison. The connections between KOGs and COGs detected by using RPS-BLAST (see text) were analyzed by single linkage clustering.</p>
					</text>
					<graphic file="gb-2004-5-2-r7-6"/>
				</fig>
				<p>Early evolution of eukaryotes is known to have involved duplication of ancient genes inherited from prokaryotes <abbrgrp><abbr bid="B82">82</abbr></abbrgrp>, and this was apparent in the KOGs against COGs comparison. Although one-to-one relationships were predominant, in around 30% of cases, two or more eukaryotic KOGs corresponded to the same prokaryotic COG (Figure <figr fid="F6">6b</figr>). This indicates extensive duplication of ancestral genes at early stages of eukaryotic evolution; moreover, a substantial fraction of these genes have undergone repeated duplications, resulting in a one-to-many relationship between prokaryotic and eukaryotic orthologs (Figure <figr fid="F6">6b</figr>).</p>
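				<p>The single-linkage clustering of KOG-COG connections mentioned for Figure 6b can be illustrated with a minimal sketch. The Python below is not the authors' code: it assumes that the RPS-BLAST connections are already available as (KOG, COG) pairs, and the identifiers (KOG_A, COG_1 and so on) and function names are hypothetical placeholders. Each connected component of the bipartite hit graph is treated as one cluster and labeled by the numbers of KOGs and COGs it contains.</p>
				<p><![CDATA[
```python
from collections import defaultdict


def single_linkage_clusters(links):
    """Group KOGs and COGs into connected components of the hit graph."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Tag each node with its kind so a KOG and a COG can never collide.
    for kog, cog in links:
        union(("KOG", kog), ("COG", cog))

    clusters = defaultdict(lambda: {"KOG": set(), "COG": set()})
    for node in list(parent):
        kind, name = node
        clusters[find(node)][kind].add(name)
    return list(clusters.values())


def relationship(cluster):
    """Label a cluster by how many KOGs and COGs it joins."""
    n_kog, n_cog = len(cluster["KOG"]), len(cluster["COG"])
    if n_kog == 1 and n_cog == 1:
        return "one-to-one"
    if n_cog == 1:
        return "many KOGs to one COG"
    if n_kog == 1:
        return "one KOG to many COGs"
    return "many-to-many"


# Hypothetical connections, for illustration only.
links = [
    ("KOG_A", "COG_1"),
    ("KOG_B", "COG_2"),
    ("KOG_C", "COG_2"),
    ("KOG_D", "COG_3"),
    ("KOG_D", "COG_4"),
    ("KOG_E", "COG_4"),
]
labels = sorted(relationship(c) for c in single_linkage_clusters(links))
print(labels)
```
]]></p>
				<p>In this toy input, KOG_B and KOG_C share COG_2, which is the pattern interpreted in the text as duplication of an ancestral gene early in eukaryotic evolution.</p>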
				<p>An in-depth analysis of the relationships between eukaryotic and prokaryotic orthologous gene clusters should include an attempt to decipher their evolutionary history, that is, classification of the C/KOGs represented both in eukaryotes and prokaryotes into: those that have been inherited from the last universal common ancestor; the archaeo-eukaryotic subset; and those that are shared because of HGT between bacteria and eukaryotes at various stages of eukaryotic evolution. This analysis is beyond the scope of the present work. Perhaps the principal message to stress here is that, using a fairly sensitive sequence comparison method, prokaryotic homologs could be detected for only some 44% of the eukaryotic KOGs, and this fraction increased to around 54% for those genes that could be traced to the last common ancestor of the crown group (Figure <figr fid="F6">6a</figr>). This observation emphasizes the major amount of innovation that accompanied the emergence and early evolution of eukaryotes; even those KOGs for which prokaryotic counterparts will be eventually identified through more sensitive sequence and structure comparison apparently experienced rapid evolution during the prokaryote-eukaryote transition.</p>
			</sec>
			<sec>
				<st>
					<p>Phyletic patterns of KOGs and dispensability of yeast and worm genes</p>
				</st>
				<p>There are 860 KOGs with at least one representative from each of the seven analyzed genomes. In accord with the 'knockout rate' hypothesis <abbrgrp><abbr bid="B83">83</abbr></abbrgrp>, which has been largely supported by recent, genome-wide analysis of gene conservation <abbrgrp><abbr bid="B38">38</abbr><abbr bid="B84">84</abbr></abbrgrp>, it could be expected that these highly conserved genes were essential for the survival of eukaryotic organisms. This appears particularly plausible given the near-minimal eukaryotic gene complement of the microsporidian. The prediction was put to the test using the recently published functional profile of the yeast <it>S. cerevisiae </it>genome, which includes the data on the growth rates of homozygous deletion strains for 96% of the open reading frames (ORFs) in the yeast genome <abbrgrp><abbr bid="B85">85</abbr></abbrgrp>. Growth rates have been previously interpreted as a measure of fitness <abbrgrp><abbr bid="B84">84</abbr></abbrgrp>.</p>
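				<p>The notion of a phyletic pattern used in this section can be made concrete with a short sketch. The Python below is an illustration under stated assumptions rather than the procedure used in this work: the species abbreviations follow Table 7, whereas the KOG names, protein identifiers and function names are hypothetical. A KOG counts as pan-eukaryotic when every analyzed genome contributes at least one member, and as single-member when every genome contributes exactly one.</p>
				<p><![CDATA[
```python
# Species abbreviations follow Table 7; the order is an arbitrary choice.
SPECIES = ("Hsa", "Dme", "Ath", "Cel", "Sce", "Spo", "Ecu")


def phyletic_pattern(members):
    """Return a string of 1/0 flags marking which species are present."""
    return "".join("1" if members.get(sp) else "0" for sp in SPECIES)


def is_pan_eukaryotic(members):
    """True if every analyzed genome has at least one member."""
    return all(members.get(sp) for sp in SPECIES)


def is_single_member(members):
    """True if every analyzed genome has exactly one member."""
    return all(len(members.get(sp, ())) == 1 for sp in SPECIES)


# Hypothetical KOGs with made-up protein identifiers.
example_kogs = {
    "KOG_X": {sp: [sp + "_p1"] for sp in SPECIES},
    "KOG_Y": {"Hsa": ["Hsa_p2", "Hsa_p3"], "Dme": ["Dme_p2"], "Cel": ["Cel_p2"]},
}

for name, members in example_kogs.items():
    print(name, phyletic_pattern(members),
          is_pan_eukaryotic(members), is_single_member(members))
```
]]></p>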
				<p>When the phyletic patterns of the KOGs were superimposed on the data on gene dispensability (with essential genes operationally defined as those whose deletion had a lethal effect in a rich medium) <abbrgrp><abbr bid="B85">85</abbr></abbrgrp>, it was found that 45% of the essential genes were conserved in all seven species and 25% were represented in six species (typically with the exception of <it>E. cuniculi</it>); 15% of the essential yeast genes had no orthologs in the other analyzed genomes (Figure <figr fid="F7">7a</figr>). In striking contrast, among non-essential genes, only 16.5% were represented in all compared genomes and 28.5% had no detectable orthologs (Figure <figr fid="F7">7a</figr>). The reciprocal comparison is equally illustrative: essential genes made up 18.5% of the entire set of yeast genes but 35% of the genes (KOGs) represented in all seven species. This translates into a statistically highly significant dependence between a gene's (in)dispensability and conservation over long evolutionary distances. The probability of the set of highly conserved genes being so enriched for essential genes as a result of chance was estimated at &lt;&lt;10<sup>-10</sup>. Notably, an even greater enrichment for essential genes was seen among the KOGs that were represented by one, and only one, ortholog in each of the seven analyzed genomes: of the 131 such KOGs, 98 (75%) included an essential yeast gene (Table <tblr tid="T2">2</tblr>). Such preponderance of essential genes could be expected because, in this set of KOGs, the indispensability of the respective function could not have been masked by the presence of paralogs.</p>
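				<p>The forward and reciprocal percentages quoted above can be related by simple arithmetic. The sketch below is only a rough consistency check: it takes the rounded percentages from the text as probabilities, and the variable names are illustrative. Exact agreement with the reported 35% is not expected, because the inputs are rounded and, as an assumption on our part, the forward figures may be counted per gene whereas the reciprocal figure refers to KOGs.</p>
				<p><![CDATA[
```python
# Rounded values quoted in the text, expressed as fractions.
p_essential = 0.185                  # essential genes among all yeast genes
p_all7_given_essential = 0.45        # essential genes found in all seven species
p_all7_given_nonessential = 0.165    # non-essential genes found in all seven species

# Total fraction of yeast genes represented in all seven species.
p_all7 = (p_essential * p_all7_given_essential
          + (1.0 - p_essential) * p_all7_given_nonessential)

# Fraction of essential genes among those represented in all seven species.
p_essential_given_all7 = p_essential * p_all7_given_essential / p_all7

# Single-member pan-eukaryotic KOGs that include an essential yeast gene.
single_member_fraction = 98 / 131

print(round(p_essential_given_all7, 3), round(single_member_fraction, 3))
```
]]></p>
				<p>With these inputs the estimate is about 38%, in the same range as the reported 35% and roughly twice the 18.5% background, whereas the single-member set reaches about 75%.</p>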
				<fig id="F7">
					<title>
						<p>Figure 7</p>
					</title>
					<caption>
						<p>Gene dispensability in yeast and worm and phyletic patterns of the respective KOGs</p>
					</caption>
					<text>
						<p>Gene dispensability in yeast and worm and phyletic patterns of the respective KOGs. <b>(a) </b>Distribution of essential and non-essential genes among different size classes of KOGs and LSEs in yeast <it>Saccharomyces cerevisiae</it>. <b>(b) </b>Distribution of essential and non-essential genes among different size classes of KOGs and LSEs in the nematode <it>C. elegans</it>. The number of species in the KOGs and LSEs is color-coded as indicated to the right of each plot.</p>
					</text>
					<graphic file="gb-2004-5-2-r7-7"/>
				</fig>
				<p>For an additional set of around 15% non-essential yeast genes, knockout results in a measurable retardation of growth <abbrgrp><abbr bid="B85">85</abbr></abbrgrp>. Unexpectedly and in contrast to the result obtained with the essential genes, we failed to observe a correlation between the magnitude of a gene's knockout effect on yeast growth and the phyletic pattern (data not shown). This suggests that the magnitude of the growth defect measured in yeast does not necessarily predict the fitness effect of losing the orthologous gene in distant species.</p>
				<p>In <it>C. elegans</it>, much as in yeast, essentiality of genes appears to correlate with strong evolutionary conservation, as already noticed in the recent genome-wide study on inhibition of worm gene expression by RNA interference (RNAi) <abbrgrp><abbr bid="B86">86</abbr></abbrgrp>. We compared this dataset, which covers around 86% of <it>C. elegans </it>genes, to the phyletic patterns of the respective KOGs. Of the essential worm genes, 38% were conserved in all seven compared species and 19% were conserved in six species (Figure <figr fid="F7">7b</figr>). In contrast, only 6% of the non-essential <it>C. elegans </it>genes were represented in seven species and 7% were conserved in six species (Figure <figr fid="F7">7b</figr>). Thus, there seems to be a strong and robust connection between a gene's essentiality and its tendency to be conserved in evolution over a wide span of taxa; this connection was established using two independent datasets from biologically extremely different model organisms.</p>
			</sec>
			<sec>
				<st>
					<p>Domain accretion in orthologous sets of eukaryotic proteins</p>
				</st>
				<p>As noticed previously, the complexity of domain architecture of proteins in some orthologous sets increases with increasing organismic complexity; this phenomenon has been dubbed domain accretion <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. With the KOG set in hand, we sought to assess the extent of accretion quantitatively by using the data on the presence of domains from the CDD (conserved domain alignments database) collection in each of the KOG members. The results summarized in Table <tblr tid="T7">7</tblr> show a relatively small but statistically significant excess of domains in proteins from multicellular organisms compared to the orthologs from unicellular organisms. Furthermore, among the multicellular eukaryotes, human proteins have the greatest complexity of domain architectures, followed by <it>Drosophila </it>and <it>Arabidopsis </it>(Table <tblr tid="T7">7</tblr>), in agreement with preliminary results reported previously. Among the unicellular eukaryotes, <it>Encephalitozoon </it>had by far the least complex domain architectures (Table <tblr tid="T7">7</tblr>), which reflects the general genome reduction in this intracellular parasite.</p>
				<tbl id="T7" hint_layout="double">
					<title>
						<p>Table 7</p>
					</title>
					<caption>
						<p>Domain accretion in complex eukaryotes</p>
					</caption>
					<tblbdy cols="8">
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>Hsa</p>
							</c>
							<c ca="center">
								<p>Dme</p>
							</c>
							<c ca="center">
								<p>Ath</p>
							</c>
							<c ca="center">
								<p>Cel</p>
							</c>
							<c ca="center">
								<p>Sce</p>
							</c>
							<c ca="center">
								<p>Spo</p>
							</c>
							<c ca="center">
								<p>Ecu</p>
							</c>
						</r>
						<r>
							<c cspan="8">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Hsa</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>&lt;1 &#215; 10<sup>-10</sup></p>
							</c>
							<c ca="center">
								<p>&lt;1 &#215; 10<sup>-10</sup></p>
							</c>
							<c ca="center">
								<p>&lt;1 &#215; 10<sup>-10</sup></p>
							</c>
							<c ca="center">
								<p>&lt;1 &#215; 10<sup>-10</sup></p>
							</c>
							<c ca="center">
								<p>&lt;1 &#215; 10<sup>-10</sup></p>
							</c>
							<c ca="center">
								<p>&lt;1 &#215; 10<sup>-10</sup></p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>470</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Dme</p>
							</c>
							<c ca="center">
								<p>3214</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>2 &#215; 10<sup>-1</sup></p>
							</c>
							<c ca="center">
								<p>&lt;1 &#215; 10<sup>-10</sup></p>
							</c>
							<c ca="center">
								<p>&lt;1 &#215; 10<sup>-10</sup></p>
							</c>
							<c ca="center">
								<p>&lt;1 &#215; 10<sup>-10</sup></p>
							</c>
							<c ca="center">
								<p>&lt;1 &#215; 10<sup>-10</sup></p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>805</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>327</p>
							</c>
							<c ca="center">
								<p>354</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Ath</p>
							</c>
							<c ca="center">
								<p>2224</p>
							</c>
							<c ca="center">
								<p>2085</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>3 &#215; 10<sup>-1</sup></p>
							</c>
							<c ca="center">
								<p>&lt;1 &#215; 10<sup>-10</sup></p>
							</c>
							<c ca="center">
								<p>&lt;1 &#215; 10<sup>-10</sup></p>
							</c>
							<c ca="center">
								<p>&lt;1 &#215; 10<sup>-10</sup></p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>530</p>
							</c>
							<c ca="center">
								<p>403</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>347</p>
							</c>
							<c ca="center">
								<p>428</p>
							</c>
							<c ca="center">
								<p>334</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Cel</p>
							</c>
							<c ca="center">
								<p>2986</p>
							</c>
							<c ca="center">
								<p>2962</p>
							</c>
							<c ca="center">
								<p>2052</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>1 &#215; 10<sup>-8</sup></p>
							</c>
							<c ca="center">
								<p>&lt;1 &#215; 10<sup>-10</sup></p>
							</c>
							<c ca="center">
								<p>&lt;1 &#215; 10<sup>-10</sup></p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>880</p>
							</c>
							<c ca="center">
								<p>650</p>
							</c>
							<c ca="center">
								<p>376</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>149</p>
							</c>
							<c ca="center">
								<p>161</p>
							</c>
							<c ca="center">
								<p>183</p>
							</c>
							<c ca="center">
								<p>197</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Sce</p>
							</c>
							<c ca="center">
								<p>1789</p>
							</c>
							<c ca="center">
								<p>1704</p>
							</c>
							<c ca="center">
								<p>1769</p>
							</c>
							<c ca="center">
								<p>1715</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>1 &#215; 10<sup>-2</sup></p>
							</c>
							<c ca="center">
								<p>&lt;1 &#215; 10<sup>-10</sup></p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>504</p>
							</c>
							<c ca="center">
								<p>411</p>
							</c>
							<c ca="center">
								<p>374</p>
							</c>
							<c ca="center">
								<p>336</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>100</p>
							</c>
							<c ca="center">
								<p>123</p>
							</c>
							<c ca="center">
								<p>135</p>
							</c>
							<c ca="center">
								<p>150</p>
							</c>
							<c ca="center">
								<p>158</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Spo</p>
							</c>
							<c ca="center">
								<p>1880</p>
							</c>
							<c ca="center">
								<p>1807</p>
							</c>
							<c ca="center">
								<p>1886</p>
							</c>
							<c ca="center">
								<p>1808</p>
							</c>
							<c ca="center">
								<p>2360</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>&lt;1 &#215; 10<sup>-10</sup></p>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>549</p>
							</c>
							<c ca="center">
								<p>426</p>
							</c>
							<c ca="center">
								<p>388</p>
							</c>
							<c ca="center">
								<p>359</p>
							</c>
							<c ca="center">
								<p>216</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>10</p>
							</c>
							<c ca="center">
								<p>17</p>
							</c>
							<c ca="center">
								<p>12</p>
							</c>
							<c ca="center">
								<p>14</p>
							</c>
							<c ca="center">
								<p>13</p>
							</c>
							<c ca="center">
								<p>19</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Ecu</p>
							</c>
							<c ca="center">
								<p>700</p>
							</c>
							<c ca="center">
								<p>738</p>
							</c>
							<c ca="center">
								<p>739</p>
							</c>
							<c ca="center">
								<p>748</p>
							</c>
							<c ca="center">
								<p>816</p>
							</c>
							<c ca="center">
								<p>835</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c ca="center">
								<p>332</p>
							</c>
							<c ca="center">
								<p>254</p>
							</c>
							<c ca="center">
								<p>235</p>
							</c>
							<c ca="center">
								<p>244</p>
							</c>
							<c ca="center">
								<p>158</p>
							</c>
							<c ca="center">
								<p>140</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>For a given pair of species the numbers in each cell below the diagonal represent, from top to bottom: the number of KOGs in which the average number of detected domains from the CDD collection (cut-off E = 10<sup>-3</sup>) in the proteins from the species to the left is greater than that for the species to the right; the number of KOGs with equal average number of domains; the number of KOGs in which the average number of domains is greater for the species to the right (for example, <it>D. melanogaster </it>has a greater number of detected domains than <it>H. sapiens </it>in 470 KOGs, the same number in 3,214 KOGs, and a smaller number in 805 KOGs). The numbers above the diagonal are the statistical significance of the difference, P(&#967;<sup>2</sup>).</p>
					</tblfn>
				</tbl>
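				<p>The significance values above the diagonal of Table 7 can be approximated from the counts below the diagonal. The source does not state how P(&#967;<sup>2</sup>) was computed, so the following Python sketch is one possible reading and not the authors' stated method: it assumes that the two unequal counts are tested against an even split, giving the statistic (a - b)<sup>2</sup>/(a + b), and that the P value is taken from a &#967;<sup>2</sup> distribution with two degrees of freedom, whose upper tail is exactly exp(-x/2). The function and variable names are illustrative.</p>
				<p><![CDATA[
```python
import math


def chi2_p_two_df(n_greater, n_smaller):
    """Chi-square statistic for an even split and its upper-tail P for 2 d.f."""
    diff = n_greater - n_smaller
    stat = diff * diff / (n_greater + n_smaller)
    # For 2 degrees of freedom the chi-square survival function is exp(-x/2).
    return stat, math.exp(-stat / 2.0)


# Unequal counts (top and bottom numbers of a cell) taken from Table 7.
pairs = {
    "Hsa/Dme": (470, 805),
    "Dme/Ath": (354, 403),
    "Ath/Cel": (334, 376),
    "Cel/Sce": (197, 336),
    "Sce/Spo": (158, 216),
}

results = {name: chi2_p_two_df(a, b) for name, (a, b) in pairs.items()}
for name, (stat, p) in results.items():
    print(f"{name}: chi2 = {stat:.2f}, P = {p:.1e}")
```
]]></p>
				<p>Under these assumptions the sketch gives P values of roughly 0.2, 0.3, 10<sup>-8</sup> and 10<sup>-2</sup> for the Dme/Ath, Ath/Cel, Cel/Sce and Sce/Spo pairs, respectively, close to the corresponding entries of Table 7, and a value far below 10<sup>-10</sup> for Hsa/Dme. The number of KOGs with equal domain counts does not enter this particular calculation.</p>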
			</sec>
		</sec>
		<sec>
			<st>
				<p>Conclusions</p>
			</st>
			<p>The present analysis of KOGs provides quantitative backing for many trends in the evolution of eukaryotic genomes that previously have been noticed on the general, qualitative level. The important quantities reported here include the size of the conserved core of eukaryotic genes, the conservative reconstructions of ancestral gene sets, the numbers of genes that appear to have been lost and gained in individual eukaryotic lineages, and the extent of correlation between gene dispensability and evolutionary conservation, which is reflected in phyletic patterns. In addition, we evaluated the range of variation of evolutionary rates of genes in different functional categories and obtained statistical support for the important evolutionary phenomenon of domain accretion. Furthermore, we observed that only a minority of eukaryotic KOGs have readily detectable prokaryotic counterparts, which emphasizes the extent of innovation linked to the origin of eukaryotes and subsequent major transitions in eukaryotic evolution, such as the origin of multicellularity and the origin of animals.</p>
			<p>The case study of the KOGs that are represented by just one member in all eukaryotic genomes compared shows the potential of KOGs for functional prediction by inferring the probable functions for almost all KOGs in this set that had remained uncharacterized. This analysis also revealed unexpected facets of evolution of widespread and essential eukaryotic proteins, such as the counterintuitive preponderance of WD40-repeat proteins among the single-member pan-eukaryotic KOGs.</p>
			<p>The current KOG set includes proteins from seven genomes whose sequences were available as of 1 July, 2002. The genomes of the mouse <abbrgrp><abbr bid="B87">87</abbr></abbrgrp>, the fugu fish <abbrgrp><abbr bid="B88">88</abbr></abbrgrp>, the <it>Anopheles </it>mosquito <abbrgrp><abbr bid="B89">89</abbr></abbrgrp>, the urochordate <it>Ciona intestinalis </it><abbrgrp><abbr bid="B90">90</abbr></abbrgrp> and the malarial parasite <it>Plasmodium falciparum </it><abbrgrp><abbr bid="B91">91</abbr></abbrgrp> have become available since then but were not included, partly because of problems with protein annotation for some of these genomes, and partly due to the time-consuming and labor-intensive nature of KOG analysis. Inclusion of these and other newly sequenced genomes should proceed at a faster rate once the system itself is established, and will enable further, deeper studies into the functional and evolutionary patterns of eukaryotic life.</p>
		</sec>
		<sec>
			<st>
				<p>Materials and methods</p>
			</st>
			<sec>
				<st>
					<p>Construction and annotation of KOGs</p>
				</st>
				<p>A more detailed description of the procedures used to construct the KOGs is presented elsewhere <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. The protein sets for all eukaryotic species, with the exception of <it>C. elegans </it>and <it>H. sapiens</it>, were from the genome division of the National Center for Biotechnology Information (NCBI). The protein sequences for <it>C. elegans </it>were from the WormPep67 database and the human sequences were from NCBI build 30. Briefly, the KOG construction protocol included: first, the detection and masking of common, repetitive domains using the RPS-BLAST program and the PSSMs for the respective domains from the CDD collection <abbrgrp><abbr bid="B81">81</abbr></abbrgrp>; second, all-against-all comparison of protein sequences from the analyzed genomes using the BLASTP program <abbrgrp><abbr bid="B92">92</abbr></abbrgrp>, with masking of low sequence complexity regions using the SEG program <abbrgrp><abbr bid="B93">93</abbr></abbrgrp>; third, identification of triangles of mutually consistent BeTs and merging of triangles with a common side to form preliminary KOGs; fourth, addition of members of co-orthologous sets missed at the previous steps using the COGNITOR procedure <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>; fifth, manual examination of each candidate KOG, aimed at eliminating false positives incorporated into the KOGs by the automatic procedure and including false negatives that were missed originally; sixth, assignment of proteins containing promiscuous domains masked at the first step to Fuzzy Orthologous Groups (FOGs), named after the respective domains (when a sequence assigned to a KOG contained one or more masked domains, the sequences of these domains were restored); and finally, examination of the largest preliminary KOGs, which included numerous proteins from all or several genomes, by using phylogenetic trees, cluster analysis with the BLASTCLUST program <abbrgrp><abbr bid="B94">94</abbr></abbrgrp>, comparison of domain architectures, and visual inspection of alignments. As a result, some of these preliminary KOGs were split into two or more smaller final KOGs.</p>
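				<p>The triangle-based clustering in the third step can be illustrated with a short Python sketch. It is not the code used to build the KOGs: the input layout (a table of best hits keyed by query protein and target genome), the function names, the toy identifiers and the reading of 'mutually consistent' as reciprocal best hits are all simplifying assumptions made for illustration.</p>
				<p>
```python
from itertools import combinations


def find_triangles(genome_of, best_hit):
    """Return triangles of reciprocal best hits from three different genomes.

    genome_of: protein -> genome label (hypothetical input layout)
    best_hit:  (query protein, target genome) -> best-hit protein in that genome
    """
    def symmetric(a, b):
        # Assumption: two proteins are consistent if each is the other's best hit.
        return (best_hit.get((a, genome_of[b])) == b
                and best_hit.get((b, genome_of[a])) == a)

    triangles = set()
    for a, b, c in combinations(sorted(genome_of), 3):
        if len({genome_of[a], genome_of[b], genome_of[c]}) != 3:
            continue  # a triangle must span three genomes
        if symmetric(a, b) and symmetric(b, c) and symmetric(a, c):
            triangles.add(frozenset((a, b, c)))
    return triangles


def merge_triangles(triangles):
    """Merge triangles that share a side (two proteins) into preliminary groups."""
    triangles = list(triangles)
    parent = list(range(len(triangles)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_with_side = {}
    for idx, tri in enumerate(triangles):
        for side in combinations(sorted(tri), 2):
            if side in first_with_side:
                parent[find(idx)] = find(first_with_side[side])
            else:
                first_with_side[side] = idx

    groups = {}
    for idx, tri in enumerate(triangles):
        groups.setdefault(find(idx), set()).update(tri)
    return sorted(sorted(group) for group in groups.values())


# Toy example: four proteins from four genomes; c1 and d1 are not reciprocal.
genome_of = {"a1": "A", "b1": "B", "c1": "C", "d1": "D"}
best_hit = {}
for p, q in [("a1", "b1"), ("a1", "c1"), ("b1", "c1"), ("a1", "d1"), ("b1", "d1")]:
    best_hit[(p, genome_of[q])] = q
    best_hit[(q, genome_of[p])] = p
print(merge_triangles(find_triangles(genome_of, best_hit)))
```
				</p>
				<p>In this sketch, two triangles are merged whenever they share two proteins, so a group grows along common sides; the later steps of the protocol (COGNITOR, manual curation, splitting of large groups) are not modelled.</p>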
				<p>Annotation of KOGs included critical assessment of the annotations available through GenBank, other public databases and the primary literature and additional, in-depth sequence analysis aimed at detection of previously unnoticed homologous relationships. The annotated functions of KOGs were classified into 23 categories (see legend to Figure <figr fid="F3">3</figr>), which were adapted from the functional classification previously used for COGs <abbrgrp><abbr bid="B24">24</abbr></abbrgrp> by including several specific eukaryotic categories.</p>
			</sec>
			<sec>
				<st>
					<p>Other sequence analysis procedures</p>
				</st>
				<p>During KOG annotation, proteins that were annotated as 'hypothetical' or 'unknown', or otherwise had a vague or suspect annotation, were subjected to additional sequence analysis, which included iterative sequence similarity searches with the PSI-BLAST program <abbrgrp><abbr bid="B92">92</abbr></abbrgrp>, RPS-BLAST searches for conserved domains <abbrgrp><abbr bid="B80">80</abbr></abbrgrp>, and additional domain architecture analysis using the SMART system <abbrgrp><abbr bid="B95">95</abbr></abbrgrp>. To estimate sequence evolution rates, multiple alignments of KOGs were constructed using the MAP program <abbrgrp><abbr bid="B96">96</abbr></abbrgrp> and the pairwise evolutionary distances were calculated with the maximum likelihood method under the PAM model by using the PROTDIST program of the PHYLIP package <abbrgrp><abbr bid="B97">97</abbr></abbrgrp>. When a KOG included more than one member from a given species, the paralog with the greatest average similarity to proteins from other organisms was selected to represent that species in the KOG. Since <it>A. thaliana </it>is the most likely outgroup species for the analyzed set of eukaryotes, distances from the <it>Arabidopsis </it>representative to proteins from all other species were averaged to estimate the characteristic evolutionary distance for the given KOG. Data from KOGs with excessive variability of the distances between <it>A. thaliana </it>and other species (standard deviation to mean ratio &gt; 0.5) were discarded. As the divergence times for all KOGs are presumed to be the same (and equal to the time elapsed since the last common ancestor for the eukaryotic crown group), the mean evolutionary distance in a KOG is a measure of the KOG's evolutionary rate.</p>
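				<p>The rate estimate described above reduces to a small calculation per KOG, sketched below in Python. The function names and input layout are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was applied.</p>
				<p>
```python
from statistics import mean, pstdev


def pick_representative(similarity):
    """Choose the paralog with the greatest average similarity to other species.

    similarity: paralog -> list of similarity scores to proteins from
    other organisms (hypothetical input layout).
    """
    return max(similarity, key=lambda paralog: mean(similarity[paralog]))


def kog_rate(distances, max_cv=0.5):
    """Return the mean outgroup distance for one KOG, or None if discarded.

    distances: evolutionary distances from the A. thaliana representative
    to the representative of each other species in the KOG.
    max_cv: cut-off for the standard deviation to mean ratio (0.5 in the text).
    """
    avg = mean(distances)
    if avg == 0:
        return None  # the ratio is undefined; treated here as uninformative
    # Assumption: population standard deviation.
    if pstdev(distances) / avg > max_cv:
        return None  # excessive variability, KOG discarded
    return avg


print(kog_rate([1.0, 1.2, 0.8]))   # low variability, mean distance is kept
print(kog_rate([0.1, 1.0, 3.0]))   # high variability, KOG is discarded
```
				</p>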
				<p>The parsimonious evolutionary scenario, which included gene losses and emergence of KOGs mapped to the branches of the eukaryotic phylogenetic tree, was constructed by using the DOLLOP program of the PHYLIP package <abbrgrp><abbr bid="B97">97</abbr></abbrgrp>; this program is based on the Dollo parsimony method, which assumes irreversibility of character loss <abbrgrp><abbr bid="B79">79</abbr></abbrgrp>.</p>
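				<p>For a single KOG on a fixed species tree, the Dollo assumption (one gain, any number of losses) can be sketched as follows. This is an illustration in Python, not the DOLLOP implementation: the nested-tuple tree encoding, the function names and the species labels are hypothetical.</p>
				<p>
```python
def leaves(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    out = set()
    for child in tree:
        out |= leaves(child)
    return out


def dollo(tree, present):
    """Place one gain and the implied losses for a presence/absence pattern.

    tree:    nested tuples with species labels (strings) as leaves
    present: species in which the KOG is represented
    Returns (leaves under the gain node, list of lost clades).
    """
    present = set(present)
    if not present:
        return None, []
    # Descend to the smallest subtree that still contains every present leaf;
    # under Dollo parsimony the single gain is placed at its root.
    node = tree
    while not isinstance(node, str):
        holders = [c for c in node if present.issubset(leaves(c))]
        if not holders:
            break
        node = holders[0]
    # Below the gain, every maximal clade without the KOG counts as one loss.
    losses = []

    def walk(sub):
        sub_leaves = leaves(sub)
        if sub_leaves.isdisjoint(present):
            losses.append(sorted(sub_leaves))
        elif not isinstance(sub, str):
            for child in sub:
                walk(child)

    walk(node)
    return sorted(leaves(node)), losses


# Toy tree with an outgroup E and two sister clades (A, B) and (C, D).
toy_tree = ("E", (("A", "B"), ("C", "D")))
print(dollo(toy_tree, {"A", "C", "D"}))
print(dollo(toy_tree, {"E", "A"}))
```
				</p>
				<p>Applied to every KOG in turn, a procedure of this kind yields, for each branch, the numbers of KOGs gained and lost, which is the type of scenario described above.</p>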
				<p>For the analysis of domain accretion, conserved domains from the NCBI CDD database were detected in the eukaryotic proteins that belonged to the KOGs by using the RPS-BLAST program <abbrgrp><abbr bid="B81">81</abbr></abbrgrp> with an E-value cut-off of 0.001. Domains with biased amino acid sequence composition, which tend to produce a high false-positive rate in RPS-BLAST searches, were excluded from the analysis.</p>
				<p>The eukaryotic KOG set is accessible at <abbrgrp><abbr bid="B98">98</abbr></abbrgrp> and via ftp at <abbrgrp><abbr bid="B99">99</abbr></abbrgrp>. The reconstructed ancestral gene sets are available at <abbrgrp><abbr bid="B100">100</abbr></abbrgrp>.</p>
			</sec>
		</sec>
	</bdy>
	<bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st>
				<p>We thank Roman Tatusov for his major contribution to the construction of the KOGs, Igor Garkavtsev for his participation in the initial stages of the KOG project, and L. Aravind and Wei Yang for useful discussions and sharing their unpublished observations.</p>
			</sec>
		</ack>
		<refgrp>
			<bibl id="B1">
				<title>
					<p>Lateral genomics.</p>
				</title>
				<aug>
					<au>
						<snm>Doolittle</snm>
						<fnm>WF</fnm>
					</au>
				</aug>
				<source>Trends Cell Biol</source>
				<pubdate>1999</pubdate>
				<volume>9</volume>
				<fpage>M5</fpage>
				<lpage>M8</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0962-8924(99)01664-5</pubid>
						<pubid idtype="pmpid" link="fulltext">10611671</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B2">
				<title>
					<p>Phylogenetic classification and the universal tree.</p>
				</title>
				<aug>
					<au>
						<snm>Doolittle</snm>
						<fnm>WF</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>1999</pubdate>
				<volume>284</volume>
				<fpage>2124</fpage>
				<lpage>2129</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.284.5423.2124</pubid>
						<pubid idtype="pmpid" link="fulltext">10381871</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B3">
				<title>
					<p>The impact of comparative genomics on our understanding of evolution.</p>
				</title>
				<aug>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
					<au>
						<snm>Aravind</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Kondrashov</snm>
						<fnm>AS</fnm>
					</au>
				</aug>
				<source>Cell</source>
				<pubdate>2000</pubdate>
				<volume>101</volume>
				<fpage>573</fpage>
				<lpage>576</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">10892642</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B4">
				<title>
					<p>Horizontal gene transfer in prokaryotes: quantification and classification.</p>
				</title>
				<aug>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
					<au>
						<snm>Makarova</snm>
						<fnm>KS</fnm>
					</au>
					<au>
						<snm>Aravind</snm>
						<fnm>L</fnm>
					</au>
				</aug>
				<source>Annu Rev Microbiol</source>
				<pubdate>2001</pubdate>
				<volume>55</volume>
				<fpage>709</fpage>
				<lpage>742</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1146/annurev.micro.55.1.709</pubid>
						<pubid idtype="pmpid" link="fulltext">11544372</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B5">
				<title>
					<p>Genomes in flux: the evolution of archaeal and proteobacterial gene content.</p>
				</title>
				<aug>
					<au>
						<snm>Snel</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Bork</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Huynen</snm>
						<fnm>MA</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2002</pubdate>
				<volume>12</volume>
				<fpage>17</fpage>
				<lpage>25</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1101/gr.176501</pubid>
						<pubid idtype="pmpid" link="fulltext">11779827</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B6">
				<title>
					<p>Prokaryotic evolution in light of gene transfer.</p>
				</title>
				<aug>
					<au>
						<snm>Gogarten</snm>
						<fnm>JP</fnm>
					</au>
					<au>
						<snm>Doolittle</snm>
						<fnm>WF</fnm>
					</au>
					<au>
						<snm>Lawrence</snm>
						<fnm>JG</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>2002</pubdate>
				<volume>19</volume>
				<fpage>2226</fpage>
				<lpage>2238</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">12446813</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B7">
				<title>
					<p>Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes.</p>
				</title>
				<aug>
					<au>
						<snm>Mirkin</snm>
						<fnm>BG</fnm>
					</au>
					<au>
						<snm>Fenner</snm>
						<fnm>TI</fnm>
					</au>
					<au>
						<snm>Galperin</snm>
						<fnm>MY</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
				</aug>
				<source>BMC Evol Biol</source>
				<pubdate>2003</pubdate>
				<volume>3</volume>
				<fpage>2</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">149225</pubid>
						<pubid idtype="pmpid" link="fulltext">12515582</pubid>
						<pubid idtype="doi">10.1186/1471-2148-3-2</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B8">
				<title>
					<p>Distinguishing homologous from analogous proteins.</p>
				</title>
				<aug>
					<au>
						<snm>Fitch</snm>
						<fnm>WM</fnm>
					</au>
				</aug>
				<source>Syst Zool</source>
				<pubdate>1970</pubdate>
				<volume>19</volume>
				<fpage>99</fpage>
				<lpage>106</lpage>
				<xrefbib>
					<pubid idtype="pmpid">5449325</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B9">
				<title>
					<p>Homology a personal view on some of the problems.</p>
				</title>
				<aug>
					<au>
						<snm>Fitch</snm>
						<fnm>WM</fnm>
					</au>
				</aug>
				<source>Trends Genet</source>
				<pubdate>2000</pubdate>
				<volume>16</volume>
				<fpage>227</fpage>
				<lpage>231</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0168-9525(00)02005-9</pubid>
						<pubid idtype="pmpid" link="fulltext">10782117</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B10">
				<title>
					<p>Gene families: the taxonomy of protein paralogs and chimeras.</p>
				</title>
				<aug>
					<au>
						<snm>Henikoff</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Greene</snm>
						<fnm>EA</fnm>
					</au>
					<au>
						<snm>Pietrokovski</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Bork</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Attwood</snm>
						<fnm>TK</fnm>
					</au>
					<au>
						<snm>Hood</snm>
						<fnm>L</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>1997</pubdate>
				<volume>278</volume>
				<fpage>609</fpage>
				<lpage>614</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.278.5338.609</pubid>
						<pubid idtype="pmpid" link="fulltext">9381171</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B11">
				<title>
					<p>Orthology, paralogy and proposed classification for paralog subtypes.</p>
				</title>
				<aug>
					<au>
						<snm>Sonnhammer</snm>
						<fnm>EL</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
				</aug>
				<source>Trends Genet</source>
				<pubdate>2002</pubdate>
				<volume>18</volume>
				<fpage>619</fpage>
				<lpage>620</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0168-9525(02)02793-2</pubid>
						<pubid idtype="pmpid" link="fulltext">12446146</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B12">
				<title>
					<p>Assessing annotation transfer for genomics: quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores.</p>
				</title>
				<aug>
					<au>
						<snm>Wilson</snm>
						<fnm>CA</fnm>
					</au>
					<au>
						<snm>Kreychman</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Gerstein</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>2000</pubdate>
				<volume>297</volume>
				<fpage>233</fpage>
				<lpage>249</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1006/jmbi.2000.3550</pubid>
						<pubid idtype="pmpid" link="fulltext">10704319</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B13">
				<aug>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
					<au>
						<snm>Galperin</snm>
						<fnm>MY</fnm>
					</au>
				</aug>
				<source>Sequence-Evolution-Function. Computational Approaches in Comparative Genomics</source>
				<publisher>New York: Kluwer Academic Publishers</publisher>
				<pubdate>2002</pubdate>
			</bibl>
			<bibl id="B14">
				<title>
					<p>Chemical paleogenetics. Molecular "restoration studies" of extinct forms of life.</p>
				</title>
				<aug>
					<au>
						<snm>Pauling</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Zuckerkandl</snm>
						<fnm>E</fnm>
					</au>
				</aug>
				<source>Acta Chem Scand</source>
				<pubdate>1963</pubdate>
				<volume>17</volume>
				<fpage>S9</fpage>
				<lpage>S16</lpage>
			</bibl>
			<bibl id="B15">
				<aug>
					<au>
						<snm>Ohno</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Evolution by Gene Duplication</source>
				<publisher>Berlin-Heidelberg-New York: Springer-Verlag</publisher>
				<pubdate>1970</pubdate>
			</bibl>
			<bibl id="B16">
				<title>
					<p>The probability of duplicate gene preservation by subfunctionalization.</p>
				</title>
				<aug>
					<au>
						<snm>Lynch</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Force</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Genetics</source>
				<pubdate>2000</pubdate>
				<volume>154</volume>
				<fpage>459</fpage>
				<lpage>473</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">10629003</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B17">
				<title>
					<p>A phylogenomic approach to microbial evolution.</p>
				</title>
				<aug>
					<au>
						<snm>Sicheritz-Ponten</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Andersson</snm>
						<fnm>SG</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2001</pubdate>
				<volume>29</volume>
				<fpage>545</fpage>
				<lpage>552</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">29656</pubid>
						<pubid idtype="pmpid" link="fulltext">11139625</pubid>
						<pubid idtype="doi">10.1093/nar/29.2.545</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B18">
				<title>
					<p>RIO: Analyzing proteomes by automated phylogenomics using resampled inference of orthologs.</p>
				</title>
				<aug>
					<au>
						<snm>Zmasek</snm>
						<fnm>CM</fnm>
					</au>
					<au>
						<snm>Eddy</snm>
						<fnm>SR</fnm>
					</au>
				</aug>
				<source>BMC Bioinformatics</source>
				<pubdate>2002</pubdate>
				<volume>3</volume>
				<fpage>14</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">116988</pubid>
						<pubid idtype="pmpid" link="fulltext">12028595</pubid>
						<pubid idtype="doi">10.1186/1471-2105-3-14</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B19">
				<title>
					<p>Automated ortholog inference from phylogenetic trees and calculation of orthology reliability.</p>
				</title>
				<aug>
					<au>
						<snm>Storm</snm>
						<fnm>CE</fnm>
					</au>
					<au>
						<snm>Sonnhammer</snm>
						<fnm>EL</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2002</pubdate>
				<volume>18</volume>
				<fpage>92</fpage>
				<lpage>99</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/18.1.92</pubid>
						<pubid idtype="pmpid" link="fulltext">11836216</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B20">
				<title>
					<p>A genomic perspective on protein families.</p>
				</title>
				<aug>
					<au>
						<snm>Tatusov</snm>
						<fnm>RL</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
					<au>
						<snm>Lipman</snm>
						<fnm>DJ</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>1997</pubdate>
				<volume>278</volume>
				<fpage>631</fpage>
				<lpage>637</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.278.5338.631</pubid>
						<pubid idtype="pmpid" link="fulltext">9381173</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B21">
				<title>
					<p>Measuring genome evolution.</p>
				</title>
				<aug>
					<au>
						<snm>Huynen</snm>
						<fnm>MA</fnm>
					</au>
					<au>
						<snm>Bork</snm>
						<fnm>P</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>1998</pubdate>
				<volume>95</volume>
				<fpage>5849</fpage>
				<lpage>5856</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">34486</pubid>
						<pubid idtype="pmpid" link="fulltext">9600883</pubid>
						<pubid idtype="doi">10.1073/pnas.95.11.5849</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B22">
				<title>
					<p>Gene content phylogeny of herpesviruses.</p>
				</title>
				<aug>
					<au>
						<snm>Montague</snm>
						<fnm>MG</fnm>
					</au>
					<au>
						<snm>Hutchison</snm>
						<fnm>CA</fnm>
						<suf>3rd</suf>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2000</pubdate>
				<volume>97</volume>
				<fpage>5334</fpage>
				<lpage>5339</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">25829</pubid>
						<pubid idtype="pmpid" link="fulltext">10805793</pubid>
						<pubid idtype="doi">10.1073/pnas.97.10.5334</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B23">
				<title>
					<p>The COG database: a tool for genome-scale analysis of protein functions and evolution.</p>
				</title>
				<aug>
					<au>
						<snm>Tatusov</snm>
						<fnm>RL</fnm>
					</au>
					<au>
						<snm>Galperin</snm>
						<fnm>MY</fnm>
					</au>
					<au>
						<snm>Natale</snm>
						<fnm>DA</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2000</pubdate>
				<volume>28</volume>
				<fpage>33</fpage>
				<lpage>36</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">102395</pubid>
						<pubid idtype="pmpid" link="fulltext">10592175</pubid>
						<pubid idtype="doi">10.1093/nar/28.1.33</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B24">
				<title>
					<p>The COG database: new developments in phylogenetic classification of proteins from complete genomes.</p>
				</title>
				<aug>
					<au>
						<snm>Tatusov</snm>
						<fnm>RL</fnm>
					</au>
					<au>
						<snm>Natale</snm>
						<fnm>DA</fnm>
					</au>
					<au>
						<snm>Garkavtsev</snm>
						<fnm>IV</fnm>
					</au>
					<au>
						<snm>Tatusova</snm>
						<fnm>TA</fnm>
					</au>
					<au>
						<snm>Shankavaram</snm>
						<fnm>UT</fnm>
					</au>
					<au>
						<snm>Rao</snm>
						<fnm>BS</fnm>
					</au>
					<au>
						<snm>Kiryutin</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Galperin</snm>
						<fnm>MY</fnm>
					</au>
					<au>
						<snm>Fedorova</snm>
						<fnm>ND</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2001</pubdate>
				<volume>29</volume>
				<fpage>22</fpage>
				<lpage>28</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">29819</pubid>
						<pubid idtype="pmpid" link="fulltext">11125040</pubid>
						<pubid idtype="doi">10.1093/nar/29.1.22</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B25">
				<title>
					<p>The COG database: an updated version includes eukaryotes.</p>
				</title>
				<aug>
					<au>
						<snm>Tatusov</snm>
						<fnm>RL</fnm>
					</au>
					<au>
						<snm>Fedorova</snm>
						<fnm>ND</fnm>
					</au>
					<au>
						<snm>Jackson</snm>
						<fnm>JD</fnm>
					</au>
					<au>
						<snm>Jacobs</snm>
						<fnm>AR</fnm>
					</au>
					<au>
						<snm>Kiryutin</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
					<au>
						<snm>Krylov</snm>
						<fnm>DM</fnm>
					</au>
					<au>
						<snm>Mazumder</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Mekhedov</snm>
						<fnm>SL</fnm>
					</au>
					<au>
						<snm>Nikolskaya</snm>
						<fnm>AN</fnm>
					</au>
					<etal/>
				</aug>
				<source>BMC Bioinformatics</source>
				<pubdate>2003</pubdate>
				<volume>4</volume>
				<fpage>41</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">222959</pubid>
						<pubid idtype="pmpid" link="fulltext">12969510</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B26">
				<title>
					<p>Towards understanding the first genome sequence of a crenarchaeon by genome annotation using clusters of orthologous groups of proteins (COGs).</p>
				</title>
				<aug>
					<au>
						<snm>Natale</snm>
						<fnm>DA</fnm>
					</au>
					<au>
						<snm>Shankavaram</snm>
						<fnm>UT</fnm>
					</au>
					<au>
						<snm>Galperin</snm>
						<fnm>MY</fnm>
					</au>
					<au>
						<snm>Wolf</snm>
						<fnm>YI</fnm>
					</au>
					<au>
						<snm>Aravind</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
				</aug>
				<source>Genome Biol</source>
				<pubdate>2000</pubdate>
				<volume>1</volume>
				<fpage>research0009.1</fpage>
				<lpage>0009.19</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">15027</pubid>
						<pubid idtype="pmpid" link="fulltext">11178258</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B27">
				<title>
					<p>Genome sequence and comparative analysis of the solvent-producing bacterium <it>Clostridium acetobutylicum</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Nolling</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Breton</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Omelchenko</snm>
						<fnm>MV</fnm>
					</au>
					<au>
						<snm>Makarova</snm>
						<fnm>KS</fnm>
					</au>
					<au>
						<snm>Zeng</snm>
						<fnm>Q</fnm>
					</au>
					<au>
						<snm>Gibson</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Lee</snm>
						<fnm>HM</fnm>
					</au>
					<au>
						<snm>Dubois</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Qiu</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Hitti</snm>
						<fnm>J</fnm>
					</au>
					<etal/>
				</aug>
				<source>J Bacteriol</source>
				<pubdate>2001</pubdate>
				<volume>183</volume>
				<fpage>4823</fpage>
				<lpage>4838</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">99537</pubid>
						<pubid idtype="pmpid" link="fulltext">11466286</pubid>
						<pubid idtype="doi">10.1128/JB.183.16.4823-4838.2001</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B28">
				<title>
					<p>Complete genome sequence of <it>Salmonella enterica </it>serovar Typhimurium LT2.</p>
				</title>
				<aug>
					<au>
						<snm>McClelland</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Sanderson</snm>
						<fnm>KE</fnm>
					</au>
					<au>
						<snm>Spieth</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Clifton</snm>
						<fnm>SW</fnm>
					</au>
					<au>
						<snm>Latreille</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Courtney</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Porwollik</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Ali</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Dante</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Du</snm>
						<fnm>F</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nature</source>
				<pubdate>2001</pubdate>
				<volume>413</volume>
				<fpage>852</fpage>
				<lpage>856</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/35101614</pubid>
						<pubid idtype="pmpid" link="fulltext">11677609</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B29">
				<title>
					<p>The complete genome of hyperthermophile <it>Methanopyrus kandleri </it>AV19 and monophyly of archaeal methanogens.</p>
				</title>
				<aug>
					<au>
						<snm>Slesarev</snm>
						<fnm>AI</fnm>
					</au>
					<au>
						<snm>Mezhevaya</snm>
						<fnm>KV</fnm>
					</au>
					<au>
						<snm>Makarova</snm>
						<fnm>KS</fnm>
					</au>
					<au>
						<snm>Polushin</snm>
						<fnm>NN</fnm>
					</au>
					<au>
						<snm>Shcherbinina</snm>
						<fnm>OV</fnm>
					</au>
					<au>
						<snm>Shakhova</snm>
						<fnm>VV</fnm>
					</au>
					<au>
						<snm>Belova</snm>
						<fnm>GI</fnm>
					</au>
					<au>
						<snm>Aravind</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Natale</snm>
						<fnm>DA</fnm>
					</au>
					<au>
						<snm>Rogozin</snm>
						<fnm>IB</fnm>
					</au>
					<etal/>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2002</pubdate>
				<volume>99</volume>
				<fpage>4644</fpage>
				<lpage>4649</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">123701</pubid>
						<pubid idtype="pmpid" link="fulltext">11930014</pubid>
						<pubid idtype="doi">10.1073/pnas.032671499</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B30">
				<title>
					<p>A phylogenetic approach to target selection for structural genomics: solution structure of YciH.</p>
				</title>
				<aug>
					<au>
						<snm>Cort</snm>
						<fnm>JR</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
					<au>
						<snm>Bash</snm>
						<fnm>PA</fnm>
					</au>
					<au>
						<snm>Kennedy</snm>
						<fnm>MA</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>1999</pubdate>
				<volume>27</volume>
				<fpage>4018</fpage>
				<lpage>4027</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">148669</pubid>
						<pubid idtype="pmpid" link="fulltext">10497266</pubid>
						<pubid idtype="doi">10.1093/nar/27.20.4018</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B31">
				<title>
					<p>Target selection for structural genomics.</p>
				</title>
				<aug>
					<au>
						<snm>Brenner</snm>
						<fnm>SE</fnm>
					</au>
				</aug>
				<source>Nat Struct Biol</source>
				<pubdate>2000</pubdate>
				<volume>7</volume>
				<issue>Suppl</issue>
				<fpage>967</fpage>
				<lpage>969</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">11104002</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B32">
				<title>
					<p>Integrative database analysis in structural genomics.</p>
				</title>
				<aug>
					<au>
						<snm>Gerstein</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Nat Struct Biol</source>
				<pubdate>2000</pubdate>
				<volume>7</volume>
				<issue>Suppl</issue>
				<fpage>960</fpage>
				<lpage>963</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">11104000</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B33">
				<title>
					<p>Searching for drug targets in microbial genomes.</p>
				</title>
				<aug>
					<au>
						<snm>Galperin</snm>
						<fnm>MY</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
				</aug>
				<source>Curr Opin Biotechnol</source>
				<pubdate>1999</pubdate>
				<volume>10</volume>
				<fpage>571</fpage>
				<lpage>578</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0958-1669(99)00035-X</pubid>
						<pubid idtype="pmpid" link="fulltext">10600691</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B34">
				<title>
					<p>The role of genomics in antibacterial target discovery.</p>
				</title>
				<aug>
					<au>
						<snm>Buysse</snm>
						<fnm>JM</fnm>
					</au>
				</aug>
				<source>Curr Med Chem</source>
				<pubdate>2001</pubdate>
				<volume>8</volume>
				<fpage>1713</fpage>
				<lpage>1726</lpage>
				<xrefbib>
					<pubid idtype="pmpid">11562290</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B35">
				<title>
					<p>Constant relative rate of protein evolution and detection of functional diversification among bacterial, archaeal and eukaryotic proteins.</p>
				</title>
				<aug>
					<au>
						<snm>Jordan</snm>
						<fnm>IK</fnm>
					</au>
					<au>
						<snm>Kondrashov</snm>
						<fnm>FA</fnm>
					</au>
					<au>
						<snm>Rogozin</snm>
						<fnm>IB</fnm>
					</au>
					<au>
						<snm>Tatusov</snm>
						<fnm>RL</fnm>
					</au>
					<au>
						<snm>Wolf</snm>
						<fnm>YI</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
				</aug>
				<source>Genome Biol</source>
				<pubdate>2001</pubdate>
				<volume>2</volume>
				<fpage>research0053.1</fpage>
				<lpage>0053.9</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">64838</pubid>
						<pubid idtype="pmpid" link="fulltext">11790256</pubid>
						<pubid idtype="doi">10.1186/gb-2001-2-12-research0053</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B36">
				<title>
					<p>Genes linked by fusion events are generally of the same functional category: a systematic analysis of 30 microbial genomes.</p>
				</title>
				<aug>
					<au>
						<snm>Yanai</snm>
						<fnm>I</fnm>
					</au>
					<au>
						<snm>Derti</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>DeLisi</snm>
						<fnm>C</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2001</pubdate>
				<volume>98</volume>
				<fpage>7940</fpage>
				<lpage>7945</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">35447</pubid>
						<pubid idtype="pmpid" link="fulltext">11438739</pubid>
						<pubid idtype="doi">10.1073/pnas.141236298</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B37">
				<title>
					<p>Genome evolution at the genus level: comparison of three complete genomes of hyperthermophilic archaea.</p>
				</title>
				<aug>
					<au>
						<snm>Lecompte</snm>
						<fnm>O</fnm>
					</au>
					<au>
						<snm>Ripp</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Puzos-Barbe</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Duprat</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Heilig</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Dietrich</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Thierry</snm>
						<fnm>JC</fnm>
					</au>
					<au>
						<snm>Poch</snm>
						<fnm>O</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2001</pubdate>
				<volume>11</volume>
				<fpage>981</fpage>
				<lpage>993</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1101/gr.GR1653R</pubid>
						<pubid idtype="pmpid" link="fulltext">11381026</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B38">
				<title>
					<p>Essential genes are more evolutionarily conserved than are nonessential genes in bacteria.</p>
				</title>
				<aug>
					<au>
						<snm>Jordan</snm>
						<fnm>IK</fnm>
					</au>
					<au>
						<snm>Rogozin</snm>
						<fnm>IB</fnm>
					</au>
					<au>
						<snm>Wolf</snm>
						<fnm>YI</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2002</pubdate>
				<volume>12</volume>
				<fpage>962</fpage>
				<lpage>968</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1101/gr.87702</pubid>
						<pubid idtype="pmpid" link="fulltext">12045149</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B39">
				<title>
					<p>Automatic clustering of orthologs and in-paralogs from pairwise species comparisons.</p>
				</title>
				<aug>
					<au>
						<snm>Remm</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Storm</snm>
						<fnm>CE</fnm>
					</au>
					<au>
						<snm>Sonnhammer</snm>
						<fnm>EL</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>2001</pubdate>
				<volume>314</volume>
				<fpage>1041</fpage>
				<lpage>1052</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1006/jmbi.2000.5197</pubid>
						<pubid idtype="pmpid" link="fulltext">11743721</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B40">
				<title>
					<p>Microbial genescapes: phyletic and functional patterns of ORF distribution among prokaryotes.</p>
				</title>
				<aug>
					<au>
						<snm>Gaasterland</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Ragan</snm>
						<fnm>MA</fnm>
					</au>
				</aug>
				<source>Microb Comp Genomics</source>
				<pubdate>1998</pubdate>
				<volume>3</volume>
				<fpage>199</fpage>
				<lpage>217</lpage>
				<xrefbib>
					<pubid idtype="pmpid">10027190</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B41">
				<title>
					<p>Assigning protein functions by comparative genome analysis: protein phylogenetic profiles.</p>
				</title>
				<aug>
					<au>
						<snm>Pellegrini</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Marcotte</snm>
						<fnm>EM</fnm>
					</au>
					<au>
						<snm>Thompson</snm>
						<fnm>MJ</fnm>
					</au>
					<au>
						<snm>Eisenberg</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Yeates</snm>
						<fnm>TO</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>1999</pubdate>
				<volume>96</volume>
				<fpage>4285</fpage>
				<lpage>4288</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">16324</pubid>
						<pubid idtype="pmpid" link="fulltext">10200254</pubid>
						<pubid idtype="doi">10.1073/pnas.96.8.4285</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B42">
				<title>
					<p>Who's your neighbor? New computational approaches for functional genomics.</p>
				</title>
				<aug>
					<au>
						<snm>Galperin</snm>
						<fnm>MY</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
				</aug>
				<source>Nat Biotechnol</source>
				<pubdate>2000</pubdate>
				<volume>18</volume>
				<fpage>609</fpage>
				<lpage>613</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/76443</pubid>
						<pubid idtype="pmpid" link="fulltext">10835597</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B43">
				<title>
					<p>An alternative flavin-dependent mechanism for thymidylate synthesis.</p>
				</title>
				<aug>
					<au>
						<snm>Myllykallio</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Lipowski</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Leduc</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Filee</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Forterre</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Liebl</snm>
						<fnm>U</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>2002</pubdate>
				<volume>297</volume>
				<fpage>105</fpage>
				<lpage>107</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.1072113</pubid>
						<pubid idtype="pmpid" link="fulltext">12029065</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B44">
				<title>
					<p>Trait-to-Gene. A computational method for predicting the function of uncharacterized genes.</p>
				</title>
				<aug>
					<au>
						<snm>Levesque</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Shasha</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Kim</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Surette</snm>
						<fnm>MG</fnm>
					</au>
					<au>
						<snm>Benfey</snm>
						<fnm>PN</fnm>
					</au>
				</aug>
				<source>Curr Biol</source>
				<pubdate>2003</pubdate>
				<volume>13</volume>
				<fpage>129</fpage>
				<lpage>133</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0960-9822(03)00009-5</pubid>
						<pubid idtype="pmpid" link="fulltext">12546786</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B45">
				<title>
					<p>Initial sequencing and analysis of the human genome.</p>
				</title>
				<aug>
					<au>
						<snm>Lander</snm>
						<fnm>ES</fnm>
					</au>
					<au>
						<snm>Linton</snm>
						<fnm>LM</fnm>
					</au>
					<au>
						<snm>Birren</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Nusbaum</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Zody</snm>
						<fnm>MC</fnm>
					</au>
					<au>
						<snm>Baldwin</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Devon</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Dewar</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Doyle</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>FitzHugh</snm>
						<fnm>W</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nature</source>
				<pubdate>2001</pubdate>
				<volume>409</volume>
				<fpage>860</fpage>
				<lpage>921</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/35057062</pubid>
						<pubid idtype="pmpid" link="fulltext">11237011</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B46">
				<title>
					<p>The genome sequence of <it>Drosophila melanogaster</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Adams</snm>
						<fnm>MD</fnm>
					</au>
					<au>
						<snm>Celniker</snm>
						<fnm>SE</fnm>
					</au>
					<au>
						<snm>Holt</snm>
						<fnm>RA</fnm>
					</au>
					<au>
						<snm>Evans</snm>
						<fnm>CA</fnm>
					</au>
					<au>
						<snm>Gocayne</snm>
						<fnm>JD</fnm>
					</au>
					<au>
						<snm>Amanatides</snm>
						<fnm>PG</fnm>
					</au>
					<au>
						<snm>Scherer</snm>
						<fnm>SE</fnm>
					</au>
					<au>
						<snm>Li</snm>
						<fnm>PW</fnm>
					</au>
					<au>
						<snm>Hoskins</snm>
						<fnm>RA</fnm>
					</au>
					<au>
						<snm>Galle</snm>
						<fnm>RF</fnm>
					</au>
					<etal/>
				</aug>
				<source>Science</source>
				<pubdate>2000</pubdate>
				<volume>287</volume>
				<fpage>2185</fpage>
				<lpage>2195</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.287.5461.2185</pubid>
						<pubid idtype="pmpid" link="fulltext">10731132</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B47">
				<title>
					<p>Genome sequence of the nematode <it>C. elegans</it>: a platform for investigating biology.</p>
				</title>
				<aug>
					<au>
						<cnm>The <it>C. elegans</it> Sequencing Consortium</cnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>1998</pubdate>
				<volume>282</volume>
				<fpage>2012</fpage>
				<lpage>2018</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.282.5396.2012</pubid>
						<pubid idtype="pmpid" link="fulltext">9851916</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B48">
				<title>
					<p>Analysis of the genome sequence of the flowering plant <it>Arabidopsis thaliana</it>.</p>
				</title>
				<aug>
					<au>
						<cnm><it>Arabidopsis</it> Genome Initiative</cnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>2000</pubdate>
				<volume>408</volume>
				<fpage>796</fpage>
				<lpage>815</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/35048692</pubid>
						<pubid idtype="pmpid" link="fulltext">11130711</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B49">
				<title>
					<p>Life with 6000 genes.</p>
				</title>
				<aug>
					<au>
						<snm>Goffeau</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Barrell</snm>
						<fnm>BG</fnm>
					</au>
					<au>
						<snm>Bussey</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Davis</snm>
						<fnm>RW</fnm>
					</au>
					<au>
						<snm>Dujon</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Feldmann</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Galibert</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Hoheisel</snm>
						<fnm>JD</fnm>
					</au>
					<au>
						<snm>Jacq</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Johnston</snm>
						<fnm>M</fnm>
					</au>
					<etal/>
				</aug>
				<source>Science</source>
				<pubdate>1996</pubdate>
				<volume>274</volume>
				<fpage>563</fpage>
				<lpage>567</lpage>
				<xrefbib>
					<pubid idtype="doi">10.1126/science.274.5287.546</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B50">
				<title>
					<p>The genome sequence of <it>Schizosaccharomyces pombe</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Wood</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Gwilliam</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Rajandream</snm>
						<fnm>MA</fnm>
					</au>
					<au>
						<snm>Lyne</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Lyne</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Stewart</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Sgouros</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Peat</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Hayles</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Baker</snm>
						<fnm>S</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nature</source>
				<pubdate>2002</pubdate>
				<volume>415</volume>
				<fpage>871</fpage>
				<lpage>880</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nature724</pubid>
						<pubid idtype="pmpid" link="fulltext">11859360</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B51">
				<title>
					<p>Genome sequence and gene compaction of the eukaryote parasite <it>Encephalitozoon cuniculi</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Katinka</snm>
						<fnm>MD</fnm>
					</au>
					<au>
						<snm>Duprat</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Cornillot</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Metenier</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Thomarat</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Prensier</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Barbe</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Peyretaillade</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Brottier</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Wincker</snm>
						<fnm>P</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nature</source>
				<pubdate>2001</pubdate>
				<volume>414</volume>
				<fpage>450</fpage>
				<lpage>453</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/35106579</pubid>
						<pubid idtype="pmpid" link="fulltext">11719806</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B52">
				<title>
					<p>The role of lineage-specific gene family expansion in the evolution of eukaryotes.</p>
				</title>
				<aug>
					<au>
						<snm>Lespinet</snm>
						<fnm>O</fnm>
					</au>
					<au>
						<snm>Wolf</snm>
						<fnm>YI</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
					<au>
						<snm>Aravind</snm>
						<fnm>L</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2002</pubdate>
				<volume>12</volume>
				<fpage>1048</fpage>
				<lpage>1059</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1101/gr.174302</pubid>
						<pubid idtype="pmpid" link="fulltext">12097341</pubid>
						<pubid idtype="pmcid">186617</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B53">
				<title>
					<p>Clusters of orthologous groups for eukaryotic complete genomes</p>
				</title>
				<url>http://www.ncbi.nlm.nih.gov/COG/new/shokog.cgi</url>
			</bibl>
			<bibl id="B54">
				<title>
					<p>HUNT: launch of a full-length cDNA database from the Helix Research Institute.</p>
				</title>
				<aug>
					<au>
						<snm>Yudate</snm>
						<fnm>HT</fnm>
					</au>
					<au>
						<snm>Suwa</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Irie</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Matsui</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Nishikawa</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Nakamura</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Yamaguchi</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Peng</snm>
						<fnm>ZZ</fnm>
					</au>
					<au>
						<snm>Yamamoto</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Nagai</snm>
						<fnm>K</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2001</pubdate>
				<volume>29</volume>
				<fpage>185</fpage>
				<lpage>188</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">29790</pubid>
						<pubid idtype="pmpid" link="fulltext">11125086</pubid>
						<pubid idtype="doi">10.1093/nar/29.1.185</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B55">
				<title>
					<p>Annotation of the <it>Drosophila melanogaster</it> euchromatic genome: a systematic review.</p>
				</title>
				<aug>
					<au>
						<snm>Misra</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Crosby</snm>
						<fnm>MA</fnm>
					</au>
					<au>
						<snm>Mungall</snm>
						<fnm>CJ</fnm>
					</au>
					<au>
						<snm>Matthews</snm>
						<fnm>BB</fnm>
					</au>
					<au>
						<snm>Campbell</snm>
						<fnm>KS</fnm>
					</au>
					<au>
						<snm>Hradecky</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Huang</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Kaminker</snm>
						<fnm>JS</fnm>
					</au>
					<au>
						<snm>Millburn</snm>
						<fnm>GH</fnm>
					</au>
					<au>
						<snm>Prochnik</snm>
						<fnm>SE</fnm>
					</au>
					<etal/>
				</aug>
				<source>Genome Biol</source>
				<pubdate>2002</pubdate>
				<volume>3</volume>
				<fpage>research0083.1</fpage>
				<lpage>0083.22</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">151185</pubid>
						<pubid idtype="pmpid" link="fulltext">12537572</pubid>
						<pubid idtype="doi">10.1186/gb-2002-3-12-research0083</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B56">
				<title>
					<p>Sequencing and comparison of yeast species to identify genes and regulatory elements.</p>
				</title>
				<aug>
					<au>
						<snm>Kellis</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Patterson</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Endrizzi</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Birren</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Lander</snm>
						<fnm>ES</fnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>2003</pubdate>
				<volume>423</volume>
				<fpage>241</fpage>
				<lpage>254</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nature01644</pubid>
						<pubid idtype="pmpid" link="fulltext">12748633</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B57">
				<title>
					<p>Lineage-specific loss and divergence of functionally linked genes in eukaryotes.</p>
				</title>
				<aug>
					<au>
						<snm>Aravind</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Watanabe</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Lipman</snm>
						<fnm>DJ</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2000</pubdate>
				<volume>97</volume>
				<fpage>11319</fpage>
				<lpage>11324</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">17198</pubid>
						<pubid idtype="pmpid" link="fulltext">11016957</pubid>
						<pubid idtype="doi">10.1073/pnas.200346997</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B58">
				<title>
					<p>Rickettsiae and Chlamydiae: evidence of horizontal gene transfer and gene exchange.</p>
				</title>
				<aug>
					<au>
						<snm>Wolf</snm>
						<fnm>YI</fnm>
					</au>
					<au>
						<snm>Aravind</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
				</aug>
				<source>Trends Genet</source>
				<pubdate>1999</pubdate>
				<volume>15</volume>
				<fpage>173</fpage>
				<lpage>175</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0168-9525(99)01704-7</pubid>
						<pubid idtype="pmpid" link="fulltext">10322483</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B59">
				<title>
					<p>A combined algorithm for genome-wide prediction of protein function.</p>
				</title>
				<aug>
					<au>
						<snm>Marcotte</snm>
						<fnm>EM</fnm>
					</au>
					<au>
						<snm>Pellegrini</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Thompson</snm>
						<fnm>MJ</fnm>
					</au>
					<au>
						<snm>Yeates</snm>
						<fnm>TO</fnm>
					</au>
					<au>
						<snm>Eisenberg</snm>
						<fnm>D</fnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>1999</pubdate>
				<volume>402</volume>
				<fpage>83</fpage>
				<lpage>86</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/47048</pubid>
						<pubid idtype="pmpid" link="fulltext">10573421</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B60">
				<title>
					<p>Gene and context: integrative approaches to genome analysis.</p>
				</title>
				<aug>
					<au>
						<snm>Huynen</snm>
						<fnm>MJ</fnm>
					</au>
					<au>
						<snm>Snel</snm>
						<fnm>B</fnm>
					</au>
				</aug>
				<source>Adv Protein Chem</source>
				<pubdate>2000</pubdate>
				<volume>54</volume>
				<fpage>345</fpage>
				<lpage>379</lpage>
				<xrefbib>
					<pubid idtype="pmpid">10829232</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B61">
				<title>
					<p>Guilt by association: contextual information in genome analysis.</p>
				</title>
				<aug>
					<au>
						<snm>Aravind</snm>
						<fnm>L</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2000</pubdate>
				<volume>10</volume>
				<fpage>1074</fpage>
				<lpage>1077</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1101/gr.10.8.1074</pubid>
						<pubid idtype="pmpid" link="fulltext">10958625</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B62">
				<title>
					<p>Rcl1p, the yeast protein similar to the RNA 3'-phosphate cyclase, associates with U3 snoRNP and is required for 18S rRNA biogenesis.</p>
				</title>
				<aug>
					<au>
						<snm>Billy</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Wegierski</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Nasr</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Filipowicz</snm>
						<fnm>W</fnm>
					</au>
				</aug>
				<source>EMBO J</source>
				<pubdate>2000</pubdate>
				<volume>19</volume>
				<fpage>2115</fpage>
				<lpage>2126</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/emboj/19.9.2115</pubid>
						<pubid idtype="pmpid" link="fulltext">10790377</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B63">
				<title>
					<p>Birth and death of protein domains: A simple model of evolution explains power law behavior.</p>
				</title>
				<aug>
					<au>
						<snm>Karev</snm>
						<fnm>GP</fnm>
					</au>
					<au>
						<snm>Wolf</snm>
						<fnm>YI</fnm>
					</au>
					<au>
						<snm>Rzhetsky</snm>
						<fnm>AY</fnm>
					</au>
					<au>
						<snm>Berezovskaya</snm>
						<fnm>FS</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
				</aug>
				<source>BMC Evol Biol</source>
				<pubdate>2002</pubdate>
				<volume>2</volume>
				<fpage>18</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">137606</pubid>
						<pubid idtype="pmpid" link="fulltext">12379152</pubid>
						<pubid idtype="doi">10.1186/1471-2148-2-18</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B64">
				<title>
					<p>Dosage sensitivity and the evolution of gene families in yeast.</p>
				</title>
				<aug>
					<au>
						<snm>Papp</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Pal</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Hurst</snm>
						<fnm>LD</fnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>2003</pubdate>
				<volume>424</volume>
				<fpage>194</fpage>
				<lpage>197</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nature01771</pubid>
						<pubid idtype="pmpid" link="fulltext">12853957</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B65">
				<title>
					<p>The chaperonin containing t-complex polypeptide 1 (TCP-1). Multisubunit machinery assisting in protein folding and assembly in the eukaryotic cytosol.</p>
				</title>
				<aug>
					<au>
						<snm>Kubota</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Hynes</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Willison</snm>
						<fnm>K</fnm>
					</au>
				</aug>
				<source>Eur J Biochem</source>
				<pubdate>1995</pubdate>
				<volume>230</volume>
				<fpage>3</fpage>
				<lpage>16</lpage>
				<xrefbib>
					<pubid idtype="pmpid">7601114</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B66">
				<title>
					<p>The TRAPP complex is a nucleotide exchanger for Ypt1 and Ypt31/32.</p>
				</title>
				<aug>
					<au>
						<snm>Jones</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Newman</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Liu</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Segev</snm>
						<fnm>N</fnm>
					</au>
				</aug>
				<source>Mol Biol Cell</source>
				<pubdate>2000</pubdate>
				<volume>11</volume>
				<fpage>4403</fpage>
				<lpage>4411</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">15082</pubid>
						<pubid idtype="pmpid" link="fulltext">11102533</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B67">
				<title>
					<p>MIPS: a database for genomes and protein sequences.</p>
				</title>
				<aug>
					<au>
						<snm>Mewes</snm>
						<fnm>HW</fnm>
					</au>
					<au>
						<snm>Frishman</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Guldener</snm>
						<fnm>U</fnm>
					</au>
					<au>
						<snm>Mannhaupt</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Mayer</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Mokrejs</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Morgenstern</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Munsterkotter</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Rudd</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Weil</snm>
						<fnm>B</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2002</pubdate>
				<volume>30</volume>
				<fpage>31</fpage>
				<lpage>34</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">99165</pubid>
						<pubid idtype="pmpid" link="fulltext">11752246</pubid>
						<pubid idtype="doi">10.1093/nar/30.1.31</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B68">
				<title>
					<p>Eukaryotic signalling domain homologues in archaea and bacteria. Ancient ancestry and horizontal gene transfer.</p>
				</title>
				<aug>
					<au>
						<snm>Ponting</snm>
						<fnm>CP</fnm>
					</au>
					<au>
						<snm>Aravind</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Schultz</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Bork</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>1999</pubdate>
				<volume>289</volume>
				<fpage>729</fpage>
				<lpage>745</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1006/jmbi.1999.2827</pubid>
						<pubid idtype="pmpid" link="fulltext">10369758</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B69">
				<title>
					<p>ERB1, the yeast homolog of mammalian Bop1, is an essential gene required for maturation of the 25S and 5.8S ribosomal RNAs.</p>
				</title>
				<aug>
					<au>
						<snm>Pestov</snm>
						<fnm>DG</fnm>
					</au>
					<au>
						<snm>Stockelman</snm>
						<fnm>MG</fnm>
					</au>
					<au>
						<snm>Strezoska</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Lau</snm>
						<fnm>LF</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2001</pubdate>
				<volume>29</volume>
				<fpage>3621</fpage>
				<lpage>3630</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">55883</pubid>
						<pubid idtype="pmpid" link="fulltext">11522832</pubid>
						<pubid idtype="doi">10.1093/nar/29.17.3621</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B70">
				<title>
					<p>A large nucleolar U3 ribonucleoprotein required for 18S ribosomal RNA biogenesis.</p>
				</title>
				<aug>
					<au>
						<snm>Dragon</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Gallagher</snm>
						<fnm>JE</fnm>
					</au>
					<au>
						<snm>Compagnone-Post</snm>
						<fnm>PA</fnm>
					</au>
					<au>
						<snm>Mitchell</snm>
						<fnm>BM</fnm>
					</au>
					<au>
						<snm>Porwancher</snm>
						<fnm>KA</fnm>
					</au>
					<au>
						<snm>Wehner</snm>
						<fnm>KA</fnm>
					</au>
					<au>
						<snm>Wormsley</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Settlage</snm>
						<fnm>RE</fnm>
					</au>
					<au>
						<snm>Shabanowitz</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Osheim</snm>
						<fnm>Y</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nature</source>
				<pubdate>2002</pubdate>
				<volume>417</volume>
				<fpage>967</fpage>
				<lpage>970</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nature00769</pubid>
						<pubid idtype="pmpid" link="fulltext">12068309</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B71">
				<title>
					<p>From complete genomes to measures of substitution rate variability within and between proteins.</p>
				</title>
				<aug>
					<au>
						<snm>Grishin</snm>
						<fnm>NV</fnm>
					</au>
					<au>
						<snm>Wolf</snm>
						<fnm>YI</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2000</pubdate>
				<volume>10</volume>
				<fpage>991</fpage>
				<lpage>1000</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1101/gr.10.7.991</pubid>
						<pubid idtype="pmpid" link="fulltext">10899148</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B72">
				<title>
					<p>The origin and evolution of model organisms.</p>
				</title>
				<aug>
					<au>
						<snm>Hedges</snm>
						<fnm>SB</fnm>
					</au>
				</aug>
				<source>Nat Rev Genet</source>
				<pubdate>2002</pubdate>
				<volume>3</volume>
				<fpage>838</fpage>
				<lpage>849</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nrg929</pubid>
						<pubid idtype="pmpid" link="fulltext">12415314</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B73">
				<title>
					<p>The evolutionary position of nematodes.</p>
				</title>
				<aug>
					<au>
						<snm>Blair</snm>
						<fnm>JE</fnm>
					</au>
					<au>
						<snm>Ikeo</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Gojobori</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Hedges</snm>
						<fnm>SB</fnm>
					</au>
				</aug>
				<source>BMC Evol Biol</source>
				<pubdate>2002</pubdate>
				<volume>2</volume>
				<fpage>7</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">102755</pubid>
						<pubid idtype="pmpid" link="fulltext">11985779</pubid>
						<pubid idtype="doi">10.1186/1471-2148-2-7</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B74">
				<title>
					<p>Coelomata and not Ecdysozoa: evidence from genome-wide phylogenetic analysis.</p>
				</title>
				<aug>
					<au>
						<snm>Wolf</snm>
						<fnm>YI</fnm>
					</au>
					<au>
						<snm>Rogozin</snm>
						<fnm>IB</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2004</pubdate>
				<volume>14</volume>
				<fpage>29</fpage>
				<lpage>36</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">14707168</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B75">
				<title>
					<p>Evidence for a clade of nematodes, arthropods and other moulting animals.</p>
				</title>
				<aug>
					<au>
						<snm>Aguinaldo</snm>
						<fnm>AM</fnm>
					</au>
					<au>
						<snm>Turbeville</snm>
						<fnm>JM</fnm>
					</au>
					<au>
						<snm>Linford</snm>
						<fnm>LS</fnm>
					</au>
					<au>
						<snm>Rivera</snm>
						<fnm>MC</fnm>
					</au>
					<au>
						<snm>Garey</snm>
						<fnm>JR</fnm>
					</au>
					<au>
						<snm>Raff</snm>
						<fnm>RA</fnm>
					</au>
					<au>
						<snm>Lake</snm>
						<fnm>JA</fnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>1997</pubdate>
				<volume>387</volume>
				<fpage>489</fpage>
				<lpage>493</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/387489a0</pubid>
						<pubid idtype="pmpid">9168109</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B76">
				<title>
					<p>Hox genes in brachiopods and priapulids and protostome evolution.</p>
				</title>
				<aug>
					<au>
						<snm>de Rosa</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Grenier</snm>
						<fnm>JK</fnm>
					</au>
					<au>
						<snm>Andreeva</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Cook</snm>
						<fnm>CE</fnm>
					</au>
					<au>
						<snm>Adoutte</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Akam</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Carroll</snm>
						<fnm>SB</fnm>
					</au>
					<au>
						<snm>Balavoine</snm>
						<fnm>G</fnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>1999</pubdate>
				<volume>399</volume>
				<fpage>772</fpage>
				<lpage>776</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/21631</pubid>
						<pubid idtype="pmpid" link="fulltext">10391241</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B77">
				<title>
					<p>Testing the new animal phylogeny: first use of combined large-subunit and small-subunit rRNA gene sequences to classify the protostomes.</p>
				</title>
				<aug>
					<au>
						<snm>Mallatt</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Winchell</snm>
						<fnm>CJ</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>2002</pubdate>
				<volume>19</volume>
				<fpage>289</fpage>
				<lpage>301</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">11861888</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B78">
				<title>
					<p>Animal phylogeny and the ancestry of bilaterians: inferences from morphology and 18S rDNA gene sequences.</p>
				</title>
				<aug>
					<au>
						<snm>Peterson</snm>
						<fnm>KJ</fnm>
					</au>
					<au>
						<snm>Eernisse</snm>
						<fnm>DJ</fnm>
					</au>
				</aug>
				<source>Evol Dev</source>
				<pubdate>2001</pubdate>
				<volume>3</volume>
				<fpage>170</fpage>
				<lpage>205</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1046/j.1525-142x.2001.003003170.x</pubid>
						<pubid idtype="pmpid">11440251</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B79">
				<title>
					<p>Phylogenetic analysis under Dollo's Law.</p>
				</title>
				<aug>
					<au>
						<snm>Farris</snm>
						<fnm>JS</fnm>
					</au>
				</aug>
				<source>Syst Zool</source>
				<pubdate>1977</pubdate>
				<volume>26</volume>
				<fpage>77</fpage>
				<lpage>88</lpage>
			</bibl>
			<bibl id="B80">
				<title>
					<p>Modeling a minimal ribosome based on comparative sequence analysis.</p>
				</title>
				<aug>
					<au>
						<snm>Mears</snm>
						<fnm>JA</fnm>
					</au>
					<au>
						<snm>Cannone</snm>
						<fnm>JJ</fnm>
					</au>
					<au>
						<snm>Stagg</snm>
						<fnm>SM</fnm>
					</au>
					<au>
						<snm>Gutell</snm>
						<fnm>RR</fnm>
					</au>
					<au>
						<snm>Agrawal</snm>
						<fnm>RK</fnm>
					</au>
					<au>
						<snm>Harvey</snm>
						<fnm>SC</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>2002</pubdate>
				<volume>321</volume>
				<fpage>215</fpage>
				<lpage>234</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0022-2836(02)00568-5</pubid>
						<pubid idtype="pmpid" link="fulltext">12144780</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B81">
				<title>
					<p>CDD: a curated Entrez database of conserved domain alignments.</p>
				</title>
				<aug>
					<au>
						<snm>Marchler-Bauer</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Anderson</snm>
						<fnm>JB</fnm>
					</au>
					<au>
						<snm>DeWeese-Scott</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Fedorova</snm>
						<fnm>ND</fnm>
					</au>
					<au>
						<snm>Geer</snm>
						<fnm>LY</fnm>
					</au>
					<au>
						<snm>He</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Hurwitz</snm>
						<fnm>DI</fnm>
					</au>
					<au>
						<snm>Jackson</snm>
						<fnm>JD</fnm>
					</au>
					<au>
						<snm>Jacobs</snm>
						<fnm>AR</fnm>
					</au>
					<au>
						<snm>Lanczycki</snm>
						<fnm>CJ</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2003</pubdate>
				<volume>31</volume>
				<fpage>383</fpage>
				<lpage>387</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">165534</pubid>
						<pubid idtype="pmpid" link="fulltext">12520028</pubid>
						<pubid idtype="doi">10.1093/nar/gkg087</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B82">
				<title>
					<p>Archaea and the prokaryote-to-eukaryote transition.</p>
				</title>
				<aug>
					<au>
						<snm>Brown</snm>
						<fnm>JR</fnm>
					</au>
					<au>
						<snm>Doolittle</snm>
						<fnm>WF</fnm>
					</au>
				</aug>
				<source>Microbiol Mol Biol Rev</source>
				<pubdate>1997</pubdate>
				<volume>61</volume>
				<fpage>456</fpage>
				<lpage>502</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">232621</pubid>
						<pubid idtype="pmpid" link="fulltext">9409149</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B83">
				<title>
					<p>Biochemical evolution.</p>
				</title>
				<aug>
					<au>
						<snm>Wilson</snm>
						<fnm>AC</fnm>
					</au>
					<au>
						<snm>Carlson</snm>
						<fnm>SS</fnm>
					</au>
					<au>
						<snm>White</snm>
						<fnm>TJ</fnm>
					</au>
				</aug>
				<source>Annu Rev Biochem</source>
				<pubdate>1977</pubdate>
				<volume>46</volume>
				<fpage>573</fpage>
				<lpage>639</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1146/annurev.bi.46.070177.003041</pubid>
						<pubid idtype="pmpid" link="fulltext">409339</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B84">
				<title>
					<p>Protein dispensability and rate of evolution.</p>
				</title>
				<aug>
					<au>
						<snm>Hirsh</snm>
						<fnm>AE</fnm>
					</au>
					<au>
						<snm>Fraser</snm>
						<fnm>HB</fnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>2001</pubdate>
				<volume>411</volume>
				<fpage>1046</fpage>
				<lpage>1049</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/35082561</pubid>
						<pubid idtype="pmpid" link="fulltext">11429604</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B85">
				<title>
					<p>Functional profiling of the <it>Saccharomyces cerevisiae </it>genome.</p>
				</title>
				<aug>
					<au>
						<snm>Giaever</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Chu</snm>
						<fnm>AM</fnm>
					</au>
					<au>
						<snm>Ni</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Connelly</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Riles</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Veronneau</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Dow</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Lucau-Danila</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Anderson</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Andre</snm>
						<fnm>B</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nature</source>
				<pubdate>2002</pubdate>
				<volume>418</volume>
				<fpage>387</fpage>
				<lpage>391</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nature00935</pubid>
						<pubid idtype="pmpid" link="fulltext">12140549</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B86">
				<title>
					<p>Systematic functional analysis of the <it>Caenorhabditis elegans </it>genome using RNAi.</p>
				</title>
				<aug>
					<au>
						<snm>Kamath</snm>
						<fnm>RS</fnm>
					</au>
					<au>
						<snm>Fraser</snm>
						<fnm>AG</fnm>
					</au>
					<au>
						<snm>Dong</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Poulin</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Durbin</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Gotta</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Kanapin</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Le Bot</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Moreno</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Sohrmann</snm>
						<fnm>M</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nature</source>
				<pubdate>2003</pubdate>
				<volume>421</volume>
				<fpage>231</fpage>
				<lpage>237</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nature01278</pubid>
						<pubid idtype="pmpid" link="fulltext">12529635</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B87">
				<title>
					<p>Initial sequencing and comparative analysis of the mouse genome.</p>
				</title>
				<aug>
					<au>
						<snm>Waterston</snm>
						<fnm>RH</fnm>
					</au>
					<au>
						<snm>Lindblad-Toh</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Birney</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Rogers</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Abril</snm>
						<fnm>JF</fnm>
					</au>
					<au>
						<snm>Agarwal</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Agarwala</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Ainscough</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Alexandersson</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>An</snm>
						<fnm>P</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nature</source>
				<pubdate>2002</pubdate>
				<volume>420</volume>
				<fpage>520</fpage>
				<lpage>562</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nature01262</pubid>
						<pubid idtype="pmpid" link="fulltext">12466850</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B88">
				<title>
					<p>Whole-genome shotgun assembly and analysis of the genome of <it>Fugu rubripes</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Aparicio</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Chapman</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Stupka</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Putnam</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Chia</snm>
						<fnm>JM</fnm>
					</au>
					<au>
						<snm>Dehal</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Christoffels</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Rash</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Hoon</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Smit</snm>
						<fnm>A</fnm>
					</au>
					<etal/>
				</aug>
				<source>Science</source>
				<pubdate>2002</pubdate>
				<volume>297</volume>
				<fpage>1301</fpage>
				<lpage>1310</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.1072104</pubid>
						<pubid idtype="pmpid" link="fulltext">12142439</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B89">
				<title>
					<p>The genome sequence of the malaria mosquito <it>Anopheles gambiae</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Holt</snm>
						<fnm>RA</fnm>
					</au>
					<au>
						<snm>Subramanian</snm>
						<fnm>GM</fnm>
					</au>
					<au>
						<snm>Halpern</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Sutton</snm>
						<fnm>GG</fnm>
					</au>
					<au>
						<snm>Charlab</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Nusskern</snm>
						<fnm>DR</fnm>
					</au>
					<au>
						<snm>Wincker</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Clark</snm>
						<fnm>AG</fnm>
					</au>
					<au>
						<snm>Ribeiro</snm>
						<fnm>JM</fnm>
					</au>
					<etal/>
				</aug>
				<source>Science</source>
				<pubdate>2002</pubdate>
				<volume>298</volume>
				<fpage>129</fpage>
				<lpage>149</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.1076181</pubid>
						<pubid idtype="pmpid" link="fulltext">12364791</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B90">
				<title>
					<p>The draft genome of <it>Ciona intestinalis</it>: insights into chordate and vertebrate origins.</p>
				</title>
				<aug>
					<au>
						<snm>Dehal</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Satou</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Campbell</snm>
						<fnm>RK</fnm>
					</au>
					<au>
						<snm>Chapman</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Degnan</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>De Tomaso</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Davidson</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Di Gregorio</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Gelpke</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Goodstein</snm>
						<fnm>DM</fnm>
					</au>
					<etal/>
				</aug>
				<source>Science</source>
				<pubdate>2002</pubdate>
				<volume>298</volume>
				<fpage>2157</fpage>
				<lpage>2167</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.1080049</pubid>
						<pubid idtype="pmpid" link="fulltext">12481130</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B91">
				<title>
					<p>Genome sequence of the human malaria parasite <it>Plasmodium falciparum</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Gardner</snm>
						<fnm>MJ</fnm>
					</au>
					<au>
						<snm>Hall</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Fung</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>White</snm>
						<fnm>O</fnm>
					</au>
					<au>
						<snm>Berriman</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Hyman</snm>
						<fnm>RW</fnm>
					</au>
					<au>
						<snm>Carlton</snm>
						<fnm>JM</fnm>
					</au>
					<au>
						<snm>Pain</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Nelson</snm>
						<fnm>KE</fnm>
					</au>
					<au>
						<snm>Bowman</snm>
						<fnm>S</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nature</source>
				<pubdate>2002</pubdate>
				<volume>419</volume>
				<fpage>498</fpage>
				<lpage>511</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nature01097</pubid>
						<pubid idtype="pmpid" link="fulltext">12368864</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B92">
				<title>
					<p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.</p>
				</title>
				<aug>
					<au>
						<snm>Altschul</snm>
						<fnm>SF</fnm>
					</au>
					<au>
						<snm>Madden</snm>
						<fnm>TL</fnm>
					</au>
					<au>
						<snm>Schaffer</snm>
						<fnm>AA</fnm>
					</au>
					<au>
						<snm>Zhang</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Zhang</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Miller</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Lipman</snm>
						<fnm>DJ</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>1997</pubdate>
				<volume>25</volume>
				<fpage>3389</fpage>
				<lpage>3402</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">146917</pubid>
						<pubid idtype="pmpid" link="fulltext">9254694</pubid>
						<pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B93">
				<title>
					<p>Analysis of compositionally biased regions in sequence databases.</p>
				</title>
				<aug>
					<au>
						<snm>Wootton</snm>
						<fnm>JC</fnm>
					</au>
					<au>
						<snm>Federhen</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Methods Enzymol</source>
				<pubdate>1996</pubdate>
				<volume>266</volume>
				<fpage>554</fpage>
				<lpage>571</lpage>
				<xrefbib>
					<pubid idtype="pmpid">8743706</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B94">
				<title>
					<p>NCBI BLAST server</p>
				</title>
				<url>ftp://ftp.ncbi.nih.gov/blast</url>
			</bibl>
			<bibl id="B95">
				<title>
					<p>SMART, a simple modular architecture research tool: identification of signaling domains.</p>
				</title>
				<aug>
					<au>
						<snm>Schultz</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Milpetz</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Bork</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Ponting</snm>
						<fnm>CP</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>1998</pubdate>
				<volume>95</volume>
				<fpage>5857</fpage>
				<lpage>5864</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">34487</pubid>
						<pubid idtype="pmpid" link="fulltext">9600884</pubid>
						<pubid idtype="doi">10.1073/pnas.95.11.5857</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B96">
				<title>
					<p>On global sequence alignment.</p>
				</title>
				<aug>
					<au>
						<snm>Huang</snm>
						<fnm>X</fnm>
					</au>
				</aug>
				<source>Comput Appl Biosci</source>
				<pubdate>1994</pubdate>
				<volume>10</volume>
				<fpage>227</fpage>
				<lpage>235</lpage>
				<xrefbib>
					<pubid idtype="pmpid">7922677</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B97">
				<title>
					<p>Inferring phylogenies from protein sequences by parsimony, distance, and likelihood methods.</p>
				</title>
				<aug>
					<au>
						<snm>Felsenstein</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Methods Enzymol</source>
				<pubdate>1996</pubdate>
				<volume>266</volume>
				<fpage>418</fpage>
				<lpage>427</lpage>
				<xrefbib>
					<pubid idtype="pmpid">8743697</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B98">
				<title>
					<p>Clusters of orthologous groups for eukaryotic complete genomes</p>
				</title>
				<url>http://www.ncbi.nlm.nih.gov/COG/new/shokog.cgi</url>
			</bibl>
			<bibl id="B99">
				<title>
					<p>The Eukaryotic Clusters of Orthologous Groups of proteins (KOGs): download</p>
				</title>
				<url>ftp://ftp.ncbi.nih.gov/pub/COG/KOG</url>
			</bibl>
			<bibl id="B100">
				<title>
					<p>Reconstructed KOG sets for eukaryotic ancestral forms</p>
				</title>
				<url>ftp://ftp.ncbi.nih.gov/pub/koonin/Ancestors/</url>
			</bibl>
			<bibl id="B101">
				<title>
					<p>The BRC repeats in BRCA2 are critical for RAD51 binding and resistance to methyl methanesulfonate treatment.</p>
				</title>
				<aug>
					<au>
						<snm>Chen</snm>
						<fnm>PL</fnm>
					</au>
					<au>
						<snm>Chen</snm>
						<fnm>CF</fnm>
					</au>
					<au>
						<snm>Chen</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Xiao</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Sharp</snm>
						<fnm>ZD</fnm>
					</au>
					<au>
						<snm>Lee</snm>
						<fnm>WH</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>1998</pubdate>
				<volume>95</volume>
				<fpage>5287</fpage>
				<lpage>5292</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">20253</pubid>
						<pubid idtype="pmpid" link="fulltext">9560268</pubid>
						<pubid idtype="doi">10.1073/pnas.95.9.5287</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B102">
				<title>
					<p>BRCA2 homolog required for proficiency in DNA repair, recombination, and genome stability in <it>Ustilago maydis</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Kojic</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Kostrub</snm>
						<fnm>CF</fnm>
					</au>
					<au>
						<snm>Buchman</snm>
						<fnm>AR</fnm>
					</au>
					<au>
						<snm>Holloman</snm>
						<fnm>WK</fnm>
					</au>
				</aug>
				<source>Mol Cell</source>
				<pubdate>2002</pubdate>
				<volume>10</volume>
				<fpage>683</fpage>
				<lpage>691</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">12408834</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B103">
				<title>
					<p>Characterization of the <it>Escherichia coli </it>RNA 3'-terminal phosphate cyclase and its sigma54-regulated operon.</p>
				</title>
				<aug>
					<au>
						<snm>Genschik</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Drabikowski</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Filipowicz</snm>
						<fnm>W</fnm>
					</au>
				</aug>
				<source>J Biol Chem</source>
				<pubdate>1998</pubdate>
				<volume>273</volume>
				<fpage>25516</fpage>
				<lpage>25526</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1074/jbc.273.39.25516</pubid>
						<pubid idtype="pmpid" link="fulltext">9738023</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B104">
				<title>
					<p>Mot1 activates and represses transcription by direct, ATPase-dependent mechanisms.</p>
				</title>
				<aug>
					<au>
						<snm>Dasgupta</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Darst</snm>
						<fnm>RP</fnm>
					</au>
					<au>
						<snm>Martin</snm>
						<fnm>KJ</fnm>
					</au>
					<au>
						<snm>Afshari</snm>
						<fnm>CA</fnm>
					</au>
					<au>
						<snm>Auble</snm>
						<fnm>DT</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2002</pubdate>
				<volume>99</volume>
				<fpage>2666</fpage>
				<lpage>2671</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">122405</pubid>
						<pubid idtype="pmpid" link="fulltext">11880621</pubid>
						<pubid idtype="doi">10.1073/pnas.052397899</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B105">
				<title>
					<p>Novel families of putative protein kinases in bacteria and archaea: evolution of the "eukaryotic" protein kinase superfamily.</p>
				</title>
				<aug>
					<au>
						<snm>Leonard</snm>
						<fnm>CJ</fnm>
					</au>
					<au>
						<snm>Aravind</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>1998</pubdate>
				<volume>8</volume>
				<fpage>1038</fpage>
				<lpage>1047</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">9799791</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B106">
				<title>
					<p>Late cytoplasmic maturation of the small ribosomal subunit requires RIO proteins in <it>Saccharomyces cerevisiae</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Vanrobays</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Gelugne</snm>
						<fnm>JP</fnm>
					</au>
					<au>
						<snm>Gleizes</snm>
						<fnm>PE</fnm>
					</au>
					<au>
						<snm>Caizergues-Ferrer</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Mol Cell Biol</source>
				<pubdate>2003</pubdate>
				<volume>23</volume>
				<fpage>2083</fpage>
				<lpage>2095</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">149469</pubid>
						<pubid idtype="pmpid" link="fulltext">12612080</pubid>
						<pubid idtype="doi">10.1128/MCB.23.6.2083-2095.2003</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B107">
				<title>
					<p>Functional genomic analysis of cell division in <it>C. elegans </it>using RNAi of genes on chromosome III.</p>
				</title>
				<aug>
					<au>
						<snm>Gonczy</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Echeverri</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Oegema</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Coulson</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Jones</snm>
						<fnm>SJ</fnm>
					</au>
					<au>
						<snm>Copley</snm>
						<fnm>RR</fnm>
					</au>
					<au>
						<snm>Duperon</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Oegema</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Brehm</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Cassin</snm>
						<fnm>E</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nature</source>
				<pubdate>2000</pubdate>
				<volume>408</volume>
				<fpage>331</fpage>
				<lpage>336</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/35042526</pubid>
						<pubid idtype="pmpid" link="fulltext">11099034</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B108">
				<title>
					<p>Imp3p and Imp4p, two specific components of the U3 small nucleolar ribonucleoprotein that are essential for pre-18S rRNA processing.</p>
				</title>
				<aug>
					<au>
						<snm>Lee</snm>
						<fnm>SJ</fnm>
					</au>
					<au>
						<snm>Baserga</snm>
						<fnm>SJ</fnm>
					</au>
				</aug>
				<source>Mol Cell Biol</source>
				<pubdate>1999</pubdate>
				<volume>19</volume>
				<fpage>5441</fpage>
				<lpage>5452</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">84386</pubid>
						<pubid idtype="pmpid" link="fulltext">10409734</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B109">
				<title>
					<p>Prediction of the archaeal exosome and its connections with the proteasome and the translation and transcription machineries by a comparative-genomic approach.</p>
				</title>
				<aug>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
					<au>
						<snm>Wolf</snm>
						<fnm>YI</fnm>
					</au>
					<au>
						<snm>Aravind</snm>
						<fnm>L</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2001</pubdate>
				<volume>11</volume>
				<fpage>240</fpage>
				<lpage>252</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1101/gr.162001</pubid>
						<pubid idtype="pmpid" link="fulltext">11157787</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B110">
				<title>
					<p>Rrp8p is a yeast nucleolar protein functionally linked to Gar1p and involved in pre-rRNA cleavage at site A2.</p>
				</title>
				<aug>
					<au>
						<snm>Bousquet-Antonelli</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Vanrobays</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Gelugne</snm>
						<fnm>JP</fnm>
					</au>
					<au>
						<snm>Caizergues-Ferrer</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Henry</snm>
						<fnm>Y</fnm>
					</au>
				</aug>
				<source>RNA</source>
				<pubdate>2000</pubdate>
				<volume>6</volume>
				<fpage>826</fpage>
				<lpage>843</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1017/S1355838200992288</pubid>
						<pubid idtype="pmpid" link="fulltext">10864042</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B111">
				<title>
					<p>Yeast virus propagation depends critically on free 60S ribosomal subunit concentration.</p>
				</title>
				<aug>
					<au>
						<snm>Ohtake</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Wickner</snm>
						<fnm>RB</fnm>
					</au>
				</aug>
				<source>Mol Cell Biol</source>
				<pubdate>1995</pubdate>
				<volume>15</volume>
				<fpage>2772</fpage>
				<lpage>2781</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">230508</pubid>
						<pubid idtype="pmpid" link="fulltext">7739558</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B112">
				<title>
					<p>Mak mutants of yeast: mapping and characterization.</p>
				</title>
				<aug>
					<au>
						<snm>Wickner</snm>
						<fnm>RB</fnm>
					</au>
					<au>
						<snm>Leibowitz</snm>
						<fnm>MJ</fnm>
					</au>
				</aug>
				<source>J Bacteriol</source>
				<pubdate>1979</pubdate>
				<volume>140</volume>
				<fpage>154</fpage>
				<lpage>160</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">216791</pubid>
						<pubid idtype="pmpid">387719</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B113">
				<title>
					<p>Comparative genomics of the Archaea (Euryarchaeota): evolution of conserved protein families, the stable core, and the variable shell.</p>
				</title>
				<aug>
					<au>
						<snm>Makarova</snm>
						<fnm>KS</fnm>
					</au>
					<au>
						<snm>Aravind</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Galperin</snm>
						<fnm>MY</fnm>
					</au>
					<au>
						<snm>Grishin</snm>
						<fnm>NV</fnm>
					</au>
					<au>
						<snm>Tatusov</snm>
						<fnm>RL</fnm>
					</au>
					<au>
						<snm>Wolf</snm>
						<fnm>YI</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>1999</pubdate>
				<volume>9</volume>
				<fpage>608</fpage>
				<lpage>628</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">10413400</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B114">
				<title>
					<p>PIN domains in nonsense-mediated mRNA decay and RNAi.</p>
				</title>
				<aug>
					<au>
						<snm>Clissold</snm>
						<fnm>PM</fnm>
					</au>
					<au>
						<snm>Ponting</snm>
						<fnm>CP</fnm>
					</au>
				</aug>
				<source>Curr Biol</source>
				<pubdate>2000</pubdate>
				<volume>10</volume>
				<fpage>R888</fpage>
				<lpage>R890</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0960-9822(00)00858-7</pubid>
						<pubid idtype="pmpid">11137022</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B115">
				<title>
					<p>Nob1p is required for biogenesis of the 26S proteasome and degraded upon its maturation in <it>Saccharomyces cerevisiae</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Tone</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Toh-e</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Genes Dev</source>
				<pubdate>2002</pubdate>
				<volume>16</volume>
				<fpage>3142</fpage>
				<lpage>3157</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1101/gad.1025602</pubid>
						<pubid idtype="pmpid" link="fulltext">12502737</pubid>
						<pubid idtype="pmcid">187499</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B116">
				<title>
					<p>Nob1p is required for cleavage of the 3' end of 18S rRNA.</p>
				</title>
				<aug>
					<au>
						<snm>Fatica</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Oeffinger</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Dlakic</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Tollervey</snm>
						<fnm>D</fnm>
					</au>
				</aug>
				<source>Mol Cell Biol</source>
				<pubdate>2003</pubdate>
				<volume>23</volume>
				<fpage>1798</fpage>
				<lpage>1807</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">151717</pubid>
						<pubid idtype="pmpid" link="fulltext">12588997</pubid>
						<pubid idtype="doi">10.1128/MCB.23.5.1798-1807.2003</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B117">
				<title>
					<p>Characterization of mutations that are synthetic lethal with pol3-13, a mutated allele of DNA polymerase delta in <it>Saccharomyces cerevisiae</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Chanet</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Heude</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Curr Genet</source>
				<pubdate>2003</pubdate>
				<volume>43</volume>
				<fpage>337</fpage>
				<lpage>350</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1007/s00294-003-0407-2</pubid>
						<pubid idtype="pmpid" link="fulltext">12759774</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B118">
				<title>
					<p>Ria1p (Ynl163c), a protein similar to elongation factors 2, is involved in the biogenesis of the 60S subunit of the ribosome in <it>Saccharomyces cerevisiae</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Becam</snm>
						<fnm>AM</fnm>
					</au>
					<au>
						<snm>Nasr</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Racki</snm>
						<fnm>WJ</fnm>
					</au>
					<au>
						<snm>Zagulski</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Herbert</snm>
						<fnm>CJ</fnm>
					</au>
				</aug>
				<source>Mol Genet Genomics</source>
				<pubdate>2001</pubdate>
				<volume>266</volume>
				<fpage>454</fpage>
				<lpage>462</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1007/s004380100548</pubid>
						<pubid idtype="pmpid" link="fulltext">11713675</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B119">
				<title>
					<p>Distribution and evolution of von Willebrand/integrin A domains: widely dispersed domains with roles in cell adhesion and elsewhere.</p>
				</title>
				<aug>
					<au>
						<snm>Whittaker</snm>
						<fnm>CA</fnm>
					</au>
					<au>
						<snm>Hynes</snm>
						<fnm>RO</fnm>
					</au>
				</aug>
				<source>Mol Biol Cell</source>
				<pubdate>2002</pubdate>
				<volume>13</volume>
				<fpage>3369</fpage>
				<lpage>3387</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">129952</pubid>
						<pubid idtype="pmpid" link="fulltext">12388743</pubid>
						<pubid idtype="doi">10.1091/mbc.E02-05-0259</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B120">
				<title>
					<p>Mediator of transcriptional regulation.</p>
				</title>
				<aug>
					<au>
						<snm>Myers</snm>
						<fnm>LC</fnm>
					</au>
					<au>
						<snm>Kornberg</snm>
						<fnm>RD</fnm>
					</au>
				</aug>
				<source>Annu Rev Biochem</source>
				<pubdate>2000</pubdate>
				<volume>69</volume>
				<fpage>729</fpage>
				<lpage>749</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1146/annurev.biochem.69.1.729</pubid>
						<pubid idtype="pmpid" link="fulltext">10966474</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B121">
				<title>
					<p>A novel human SRB/MED-containing cofactor complex, SMCC, involved in transcription regulation.</p>
				</title>
				<aug>
					<au>
						<snm>Gu</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Malik</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Ito</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Yuan</snm>
						<fnm>CX</fnm>
					</au>
					<au>
						<snm>Fondell</snm>
						<fnm>JD</fnm>
					</au>
					<au>
						<snm>Zhang</snm>
						<fnm>X</fnm>
					</au>
					<au>
						<snm>Martinez</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Qin</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Roeder</snm>
						<fnm>RG</fnm>
					</au>
				</aug>
				<source>Mol Cell</source>
				<pubdate>1999</pubdate>
				<volume>3</volume>
				<fpage>97</fpage>
				<lpage>108</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">10024883</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B122">
				<title>
					<p>Classification and evolution of P-loop GTPases and related ATPases.</p>
				</title>
				<aug>
					<au>
						<snm>Leipe</snm>
						<fnm>DD</fnm>
					</au>
					<au>
						<snm>Wolf</snm>
						<fnm>YI</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
					<au>
						<snm>Aravind</snm>
						<fnm>L</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>2002</pubdate>
				<volume>317</volume>
				<fpage>41</fpage>
				<lpage>72</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1006/jmbi.2001.5378</pubid>
						<pubid idtype="pmpid" link="fulltext">11916378</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B123">
				<title>
					<p>Phosphoesterase domains associated with DNA polymerases of diverse origins.</p>
				</title>
				<aug>
					<au>
						<snm>Aravind</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>1998</pubdate>
				<volume>26</volume>
				<fpage>3746</fpage>
				<lpage>3752</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">147763</pubid>
						<pubid idtype="pmpid" link="fulltext">9685491</pubid>
						<pubid idtype="doi">10.1093/nar/26.16.3746</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B124">
				<title>
					<p>Gleaning non-trivial structural, functional and evolutionary information about proteins by iterative database searches.</p>
				</title>
				<aug>
					<au>
						<snm>Aravind</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>1999</pubdate>
				<volume>287</volume>
				<fpage>1023</fpage>
				<lpage>1040</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1006/jmbi.1999.2653</pubid>
						<pubid idtype="pmpid" link="fulltext">10222208</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B125">
				<title>
					<p>Genome alignment, evolution of prokaryotic genome organization and prediction of gene function using genomic context.</p>
				</title>
				<aug>
					<au>
						<snm>Wolf</snm>
						<fnm>YI</fnm>
					</au>
					<au>
						<snm>Rogozin</snm>
						<fnm>IB</fnm>
					</au>
					<au>
						<snm>Kondrashov</snm>
						<fnm>AS</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2001</pubdate>
				<volume>11</volume>
				<fpage>356</fpage>
				<lpage>372</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1101/gr.GR-1619R</pubid>
						<pubid idtype="pmpid" link="fulltext">11230160</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B126">
				<title>
					<p>Vps45p stabilizes the syntaxin homologue Tlg2p and positively regulates SNARE complex formation.</p>
				</title>
				<aug>
					<au>
						<snm>Bryant</snm>
						<fnm>NJ</fnm>
					</au>
					<au>
						<snm>James</snm>
						<fnm>DE</fnm>
					</au>
				</aug>
				<source>EMBO J</source>
				<pubdate>2001</pubdate>
				<volume>20</volume>
				<fpage>3380</fpage>
				<lpage>3388</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">125511</pubid>
						<pubid idtype="pmpid" link="fulltext">11432826</pubid>
						<pubid idtype="doi">10.1093/emboj/20.13.3380</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B127">
				<title>
					<p>Comparative genomics and evolution of proteins involved in RNA metabolism.</p>
				</title>
				<aug>
					<au>
						<snm>Anantharaman</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
					<au>
						<snm>Aravind</snm>
						<fnm>L</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2002</pubdate>
				<volume>30</volume>
				<fpage>1427</fpage>
				<lpage>1464</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">101826</pubid>
						<pubid idtype="pmpid" link="fulltext">11917006</pubid>
						<pubid idtype="doi">10.1093/nar/30.7.1427</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B128">
				<title>
					<p>Ribonuclease activity of rat liver perchloric acid-soluble protein, a potent inhibitor of protein synthesis.</p>
				</title>
				<aug>
					<au>
						<snm>Morishita</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Kawagoshi</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Sawasaki</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Madin</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Ogasawara</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Oka</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Endo</snm>
						<fnm>Y</fnm>
					</au>
				</aug>
				<source>J Biol Chem</source>
				<pubdate>1999</pubdate>
				<volume>274</volume>
				<fpage>20688</fpage>
				<lpage>20692</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1074/jbc.274.29.20688</pubid>
						<pubid idtype="pmpid" link="fulltext">10400702</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B129">
				<title>
					<p>Novel predicted RNA-binding domains associated with the translation machinery.</p>
				</title>
				<aug>
					<au>
						<snm>Aravind</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
				</aug>
				<source>J Mol Evol</source>
				<pubdate>1999</pubdate>
				<volume>48</volume>
				<fpage>291</fpage>
				<lpage>302</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">10093218</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B130">
				<title>
					<p>Cleavage of RNA hairpins mediated by a developmentally regulated CCCH zinc finger protein.</p>
				</title>
				<aug>
					<au>
						<snm>Bai</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Tolias</snm>
						<fnm>PP</fnm>
					</au>
				</aug>
				<source>Mol Cell Biol</source>
				<pubdate>1996</pubdate>
				<volume>16</volume>
				<fpage>6661</fpage>
				<lpage>6667</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">231668</pubid>
						<pubid idtype="pmpid" link="fulltext">8943320</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B131">
				<title>
					<p>Two RNA binding proteins, HEN4 and HUA1, act in the processing of <it>AGAMOUS</it> pre-mRNA in <it>Arabidopsis thaliana</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Cheng</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Kato</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Wang</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Li</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Chen</snm>
						<fnm>X</fnm>
					</au>
				</aug>
				<source>Dev Cell</source>
				<pubdate>2003</pubdate>
				<volume>4</volume>
				<fpage>53</fpage>
				<lpage>66</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">12530963</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B132">
				<title>
					<p>Zinc finger-like structure in U1-specific protein C is essential for specific binding to U1 snRNP.</p>
				</title>
				<aug>
					<au>
						<snm>Nelissen</snm>
						<fnm>RL</fnm>
					</au>
					<au>
						<snm>Heinrichs</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Habets</snm>
						<fnm>WJ</fnm>
					</au>
					<au>
						<snm>Simons</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Luhrmann</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>van Venrooij</snm>
						<fnm>WJ</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>1991</pubdate>
				<volume>19</volume>
				<fpage>449</fpage>
				<lpage>454</lpage>
				<xrefbib>
					<pubid idtype="pmpid">1826349</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B133">
				<title>
					<p>The U box is a modified RING finger - a common domain in ubiquitination.</p>
				</title>
				<aug>
					<au>
						<snm>Aravind</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
				</aug>
				<source>Curr Biol</source>
				<pubdate>2000</pubdate>
				<volume>10</volume>
				<fpage>R132</fpage>
				<lpage>R134</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0960-9822(00)00398-5</pubid>
						<pubid idtype="pmpid" link="fulltext">10704423</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B134">
				<title>
					<p>Protein quality control: U-box-containing E3 ubiquitin ligases join the fold.</p>
				</title>
				<aug>
					<au>
						<snm>Cyr</snm>
						<fnm>DM</fnm>
					</au>
					<au>
						<snm>Hohfeld</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Patterson</snm>
						<fnm>C</fnm>
					</au>
				</aug>
				<source>Trends Biochem Sci</source>
				<pubdate>2002</pubdate>
				<volume>27</volume>
				<fpage>368</fpage>
				<lpage>375</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0968-0004(02)02125-4</pubid>
						<pubid idtype="pmpid" link="fulltext">12114026</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B135">
				<title>
					<p>The essential protein fap7 is involved in the oxidative stress response of <it>Saccharomyces cerevisiae</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Juhnke</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Charizanis</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Latifi</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Krems</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Entian</snm>
						<fnm>KD</fnm>
					</au>
				</aug>
				<source>Mol Microbiol</source>
				<pubdate>2000</pubdate>
				<volume>35</volume>
				<fpage>936</fpage>
				<lpage>948</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1046/j.1365-2958.2000.01768.x</pubid>
						<pubid idtype="pmpid" link="fulltext">10692169</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
		</refgrp>
	</bm>
</art>
