<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
	<ui>gb-2004-5-4-r27</ui>
	<ji>GBJ</ji>
	<fm>
		<dochead>Research</dochead>
		<bibl>
			<title>
				<p>Comparative genomics of gene-family size in closely related bacteria</p>
			</title>
			<aug>
				<au id="A1">
					<snm>Pushker</snm>
					<fnm>Ravindra</fnm>
					<insr iid="I1"/>
				</au>
				<au id="A2" ca="yes">
					<snm>Mira</snm>
					<fnm>Alex</fnm>
					<insr iid="I1"/>
					<email>alex.mira@umh.es</email>
				</au>
				<au id="A3">
					<snm>Rodr&#237;guez-Valera</snm>
					<fnm>Francisco</fnm>
					<insr iid="I1"/>
				</au>
			</aug>
			<insg>
				<ins id="I1">
					<p>Evolutionary Genomics Group, Universidad Miguel Hern&#225;ndez, Campus de San Juan, Apartado 18, 03550 San Juan de Alicante, Alicante, Spain</p>
				</ins>
			</insg>
			<source>Genome Biology</source>
			<issn>1465-6906</issn>
			<pubdate>2004</pubdate>
			<volume>5</volume>
			<issue>4</issue>
			<fpage>R27</fpage>
			<url>http://genomebiology.com/2004/5/4/R27</url>
			<xrefbib>
				<pubid idtype="pmpid">15059260</pubid>
			</xrefbib>
		</bibl>
		<history>
			<rec>
				<date>
					<day>12</day>
					<month>12</month>
					<year>2003</year>
				</date>
			</rec>
			<revrec>
				<date>
					<day>23</day>
					<month>1</month>
					<year>2004</year>
				</date>
			</revrec>
			<acc>
				<date>
					<day>6</day>
					<month>2</month>
					<year>2004</year>
				</date>
			</acc>
			<pub>
				<date>
					<day>18</day>
					<month>3</month>
					<year>2004</year>
				</date>
			</pub>
		</history>
		<cpyrt>
			<year>2004</year>
			<collab>Pushker et al.; licensee BioMed Central Ltd. This is an Open 
Access article: verbatim copying and redistribution of this article are 
permitted in all media for any purpose, provided this notice is preserved 
along with the article's original URL.</collab>
		</cpyrt>
		<shorttitle>
			<p>Comparative genomics of gene-family size in closely related bacteria</p>
		</shorttitle>
		<shortabs>
			<p>The size of a given gene family is remarkably similar in strains of the same species and in closely related species, suggesting that homologous gene families are vertically transmitted and depend little on horizontal gene transfer.</p>
		</shortabs>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<sec>
					<st>
						<p>Background</p>
					</st>
					<p>The wealth of genomic data in bacteria is helping microbiologists understand the factors involved in gene innovation. Among these, the expansion and reduction of gene families appear to have a fundamental role, but the factors influencing gene family size are unclear.</p>
				</sec>
				<sec>
					<st>
						<p>Results</p>
					</st>
					<p>The relative content of paralogous genes in bacterial genomes increases with genome size, largely due to the expansion of gene family size in large genomes. Bacteria undergoing genome reduction display a parallel process of redundancy elimination, by which gene families are reduced to one or a few members. Gene family size is also influenced by sequence divergence and physiological function. Large gene families show wider sequence divergence, suggesting they are probably older, and certain functions (such as metabolite transport mechanisms) are overrepresented in large families. The size of a given gene family is remarkably similar in strains of the same species and in closely related species, suggesting that homologous gene families are vertically transmitted and depend little on horizontal gene transfer (HGT).</p>
				</sec>
				<sec>
					<st>
						<p>Conclusions</p>
					</st>
					<p>The remarkable preservation of copy numbers in widely different ecotypes indicates a functional role for the different copies rather than simply a back-up role. When different genera are compared, the increase in phylogenetic distance and/or ecological specialization disrupts this preservation, albeit in a gradual manner and maintaining an overall similarity, which also supports this view. HGT can have an important role, however, in nonhomologous gene families, as exemplified by a comparison between saprophytic and enterohemorrhagic strains of <it>Escherichia coli</it>.</p>
				</sec>
			</sec>
		</abs>
	</fm>
	<meta>
		<classifications>
			<classification type="BMC" subtype="man_spc_id" id="30010008">Evolution</classification>
			<classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
			<classification type="BMC" subtype="man_spc_id" id="30010014">Microbiology and parasitology</classification>
		</classifications>
	</meta>
	<bdy>
		<sec>
			<st>
				<p>Background</p>
			</st>
			<p>One of the unexpected revelations of prokaryotic genomes has been the existence of significant gene redundancy. The existence of multiple gene copies in eukaryotes has been known for a long time and is considered an important element in their molecular evolution <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. In pre-genomic times, however, bacteria were considered to be streamlined cells that carried very little, if any, redundant information in their genomes. It therefore came as a surprise when the genome of <it>Escherichia coli </it>K12 showed that nearly 30% of the coding sequences could be grouped into gene families that were similar enough to be assigned similar functions <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. They were described as 'paralog' gene families, with the implicit assumption that their similarity reflected similar evolutionary descent, but actual or potential functional divergence. Since then, the presence of gene families typically containing between two and 30 copies has been described for nearly every prokaryotic genome sequenced. The number of paralogous genes and families appears to correlate well with an increase in genome size <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>. The relative contribution of these genes in each genome seems to be independent of phylogenetic affiliation and, for a limited dataset, appears to depend on genome size <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>.</p>
			<p>These gene families of diverse size and degree of similarity remain an important and little explored feature of prokaryotes. In eukaryotic genomes they are generally taken as the result of gene duplication. This would either supply the required gene dosage or the raw material for adaptation by mutation and selection acting on one of the copies that diverges in properties or function <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B8">8</abbr></abbrgrp>. In <it>E. coli</it>, a model organism in which traditional genetics and physiology have already allowed the unequivocal identification of more than half of the coding genes, the role of paralog families (whatever their origin) seems much more operational than in eukaryotes <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. For example, the different members of a gene family contribute the proper gene dosage or, most often, provide different specificities for similar chemical reactions or for other processes such as transport of different molecules. Regarding origin, duplication is not necessarily the only source for new members of a gene family in prokaryotes. The gene pools are known to vary enormously from one strain to another <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>, and horizontal gene transfer (HGT) acts as a powerful source of innovation <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. Therefore, HGT could provide gene families with members already divergent in sequence and function <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. In prokaryotes, gene families could be the result of incomplete xenologous gene replacement by which a gene from another genome gets incorporated into a gene family with which it shares some sequence similarity. This process would provide additional physiological plasticity, and studies on the DNA composition of paralogous genes suggest that its contribution might be substantial <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. 
The divergence of some of the members of the gene families or their DNA composition could be taken as evidence for an HGT origin <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. The extent to which each of these genomic forces (gene duplication and HGT) contributes to genome expansion and variability is currently unclear <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>.</p>
			<p>To address these issues we have compared the size of gene families across bacterial taxa. To try to shed light on the evolutionary origin of these initially redundant genes we have studied the distribution of gene family size among completed genomes of strains within the same bacterial species and over larger taxonomic distances. If the different family members were acquired by HGT their numbers will vary widely among different strains, as already detected for single genes in adaptive islands <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> or for whole families predicted to have been transferred as a whole <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. On the other hand, if the family numbers are similar in different strains, vertical descent or a very old HGT will be a more likely origin. We have also determined the contribution of paralogous families to genome size for all 127 available eubacterial genomes, updating earlier work on a more limited dataset <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. We have also tried to identify other factors affecting the number of members in a family, besides genome size, particularly sequence divergence, gene function and species lifestyle.</p>
		</sec>
		<sec>
			<st>
				<p>Results and discussion</p>
			</st>
			<sec>
				<st>
					<p>Gene family size in bacterial genomes</p>
				</st>
				<p>Previous work on a smaller set of sequenced genomes had determined that large genomes contain more paralogs and more gene families than smaller genomes <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. Jordan and collaborators also found a correlation between the fraction of the genome occupied by gene families and the genome size; that is, larger genomes had a larger proportion of redundant genes. However, at the time of that analysis, the sequences of genomes larger than 5 million base pairs (5 Mbp) were not available. Now, the inclusion of genomes nearly twice as large confirms both trends (Figure <figr fid="F1">1</figr>): for example, nearly 50% of the genome is occupied by paralogous genes in <it>Streptomyces coelicolor</it>. A closer look at these data shows that larger genomes have larger gene families, as the average family size also increases with genome size (Figure <figr fid="F1">1</figr>, inset). Thus, the higher percentage of paralogs in large genomes is partly due to the expansion of existing gene families, together with a larger number of new families. The large-genomed species at one end of the distribution, such as <it>Streptomyces</it>, have gene families of up to 85 members, whereas the largest gene families in middle-sized genomes such as those of <it>E. coli </it>or <it>Salmonella </it>have more moderate numbers (40-45). This is reminiscent of the situation in eukaryotes, where the number of gene families increases with the number of genes in the genome at a lower rate than in prokaryotes <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, indicating that gene families have many more members in the larger eukaryotic genomes. Also consistent with this trend, some reconstructions of prokaryotic genome evolution based on gene content conclude that gene duplication has a critical role in the expansion of genome size <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>.</p>
				<fig id="F1">
					<title>
						<p>Figure 1</p>
					</title>
					<caption>
						<p>Percentage of genes belonging to paralogous families plotted versus genome size in 127 eubacterial genomes</p>
					</caption>
					<text>
						<p>Percentage of genes belonging to paralogous families plotted versus genome size in 127 eubacterial genomes. Inset shows the average gene family size versus genome size for the same genomes, except <it>Shigella flexneri</it>, <it>Bordetella pertussis</it>, <it>B. parapertussis </it>and <it>B. bronchiseptica</it>, which contain a high number of IS elements. Some genomes with atypical values are identified: <it>Mpn</it>, <it>Mycoplasma pneumoniae</it>; <it>Mpt</it>, <it>Mycoplasma penetrans</it>; <it>Mga</it>, <it>Mycoplasma gallisepticum</it>; <it>Mlp</it>, <it>Mycobacterium leprae</it>; <it>Pir</it>, <it>Pirellula </it>sp.</p>
					</text>
					<graphic file="gb-2004-5-4-r27-1"/>
				</fig>
				<p>Exceptions to the linear correlation in this graph are interesting to consider. On the one hand, <it>Pirellula </it>(marked as <it>Pir </it>in Figure <figr fid="F1">1</figr>) has an enormous genome with a surprisingly low relative number of paralogs. This is due to an overrepresentation of small gene families and the absence of large ones (the largest gene family contains 57 members; see Additional data file 1). <it>Pirellula </it>is a marine bacterium and the reason for the reduced gene family size might be the homogeneity of the marine environment, in contrast to other large-genomed bacteria included in the graph which have the ability to survive in many different niches or in much more heterogeneous habitats, such as soil. In agreement with this, <it>Pirellula </it>has a greatly reduced number of transcriptional regulators, which again might reflect a relatively constant environment <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. At the other end of the distribution, exceptions occur for three species that have small genomes with a larger-than-expected percentage of paralogs. All these species are mycoplasmas, and the high percentage of paralogs is due to a few gene families that are greatly expanded, including more than 25 members. In <it>Mycoplasma penetrans</it>, for example, these families include surface-exposed lipoproteins involved in antigenic variation <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, which are critical to the success of microbes exposed to the immune system of their hosts. On the other hand, the small genomes of other pathogenic bacteria correspond to intracellular parasites that do not need to evade the immune system <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>, and these species show the smallest proportion of paralogs. 
Finally, the largest gene families that we detected were those involving mobile genetic elements such as the IS elements of <it>Shigella flexneri</it>, where families surpassed 100 members (not included in the inset of Figure <figr fid="F1">1</figr>).</p>
				<p>The data in Figure <figr fid="F1">1</figr> cannot be viewed as a continuum, because small genomes are not ancestral to bigger ones. Instead, small genomes have been shown to be the result of reductive evolution, a process by which a larger-sized ancestor changes niche and undergoes a dramatic loss of DNA <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>. Both small and large genome fragments can be eliminated but the outcome of this process for gene families has not been documented. We have compared the number of members per gene family in two genomes that are undergoing rapid reductive evolution - <it>Shigella flexneri </it>2a and <it>Mycobacterium leprae </it>TN - with larger-genomed close relatives (Figure <figr fid="F2">2</figr>). <it>Shigella </it>is a close relative of <it>E. coli </it>that has specialized in living as a human pathogen <abbrgrp><abbr bid="B24">24</abbr><abbr bid="B25">25</abbr></abbrgrp>. As a result of the expansion of the human population from Neolithic times a number of more generalistic or opportunistic pathogens found a new niche; <it>Salmonella typhi </it>might be a similar example <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. In both cases there is a clear tendency to genome reduction accompanied by expansion of IS families (314 and 46 IS elements, respectively).</p>
				<fig id="F2">
					<title>
						<p>Figure 2</p>
					</title>
					<caption>
						<p>Gene family sizes in genomes undergoing reductive evolution compared to a phylogenetically related larger sequenced genome</p>
					</caption>
					<text>
						<p>Gene family sizes in genomes undergoing reductive evolution compared to a phylogenetically related larger sequenced genome. <b>(a) </b><it>Mycobacterium leprae </it>(reductive) vs <it>Mycobacterium tuberculosis </it>H37Rv; <b>(b) </b><it>Shigella flexneri </it>(reductive) vs <it>Escherichia coli </it>K12. Orthologous genes in the genome pairs (identified by amino-acid sequence similarity) are displayed in arbitrary order and plotted against the number of homologs in their own genome (that is, paralogs). Only protein-coding genes are included. IS elements from <it>S. flexneri </it>2a are excluded.</p>
					</text>
					<graphic file="gb-2004-5-4-r27-2"/>
				</fig>
				<p>In <it>Shigella </it>there is a clear reduction in gene family copy number (Figure <figr fid="F2">2</figr>), which seems to be greater than would be expected from the random location of IS elements, suggesting that they might insert preferentially in gene family members. Something similar is found in the case of <it>M. leprae </it>(Figure <figr fid="F2">2</figr>), although in this case the main mechanism for gene inactivation is the generation of pseudogenes by mutation <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. <it>M. leprae </it>is closely related to <it>M. tuberculosis</it>, with which it shares many homologous sequences. However, most gene families have been simplified in the short time period in which the leprosy bacillus has adopted its mainly intracellular lifestyle. This also illustrates the fact that, as described above, an early step in genome reduction allowed by intracellular parasitism or a narrower range of hosts is the shrinkage of gene families. It shows that the smaller percentage of paralogs in reduced genomes is probably due to simplification of existing gene families. A similar pattern was found in the small-genomed intracellular species <it>Rickettsia </it>and <it>Buchnera </it>when compared with free-living species of the same taxonomic group (see Additional data file 2). Thus, both genome expansion and reduction can be partly explained by the parallel growth or simplification, respectively, of gene families.</p>
				<fig id="F3">
					<title>
						<p>Figure 3</p>
					</title>
					<caption>
						<p>The number of members in <it>E. coli </it>K12 gene families plotted versus mean sequence identity of pairwise comparisons among the members of each family</p>
					</caption>
					<text>
						<p>The number of members in <it>E. coli </it>K12 gene families plotted versus mean sequence identity of pairwise comparisons among the members of each family.</p>
					</text>
					<graphic file="gb-2004-5-4-r27-3"/>
				</fig>
				<p>Another feature we could detect in the evolution of gene families was that large families were more divergent (Figure <figr fid="F3">3</figr>). This could partly be a side-effect of the higher variability of a larger sample size, or could result from misidentification of family members at low sequence identity levels. However, given the observed similarity of functions in these large families (<abbrgrp><abbr bid="B4">4</abbr><abbr bid="B28">28</abbr></abbrgrp> and R.P., A.M. and F.R-V., unpublished results), a substantial proportion must be true paralogous genes. Thus, this relationship can be interpreted as older (more divergent) families containing more members. Smaller families range from those with very similar members to those in which the members are very different. The latter probably represent either old families in which new members have not evolved because new duplications do not confer a selective advantage, or more recent incomplete xenologous replacements.</p>
			</sec>
			<sec>
				<st>
					<p>Gene family size in intraspecific and interspecific comparisons</p>
				</st>
				<p>The sequencing of several strains of a single species is now common in bacterial genomics. One of the most remarkable findings has been the different gene pools carried by strains that are highly similar if only their housekeeping genes are compared. For example, different virotypes of <it>E. coli </it>were shown to contain very different gene complements, with large pools of genes characteristic of each virotype <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. Obvious candidates to vary would be multigene families. Thus, the comparison of the numbers of members within a single species might shed light on their origin. If the members of a gene family are frequently acquired by HGT from outside, the numbers should be expected to vary broadly in different lineages of the species (as a result of different acquisitions). On the other hand, if the numbers are similar, that would indicate that the families were already present in the common ancestor and represent a relatively stable feature of the genome.</p>
				<p>We selected distinct prokaryotic taxa in which three or more strains have been fully sequenced (<it>Escherichia coli</it>, <it>Streptococcus pyogenes</it>, <it>Staphylococcus aureus </it>and <it>Chlamydophila pneumoniae</it>) and for each taxon established a list of homologous genes common to all strains. The gene family to which each homolog belonged was determined for each strain, and the number of family members compared for equivalent families (Figure <figr fid="F4">4</figr>). In all four species considered, the different strains showed a remarkably similar pattern of gene family size distributions: large gene families in one strain were also expanded in the others; small families were small, regardless of strain or virotype. Caution has to be exercised when examining these plots, as a gene can be a member of more than one gene family. However, although some of the gene families in Figure <figr fid="F4">4</figr> are redundant, the parallel size pattern of gene families across strains is remarkably clear and seems to reflect a stable feature of the genome. Thus, the majority of gene families were most likely to have been formed by ancestral gene duplications or ancient gene transfers common to all strains. In addition, the preservation of gene family size in different strains strongly suggests that most family members have a high value for survival; redundant copies would otherwise be quickly eliminated.</p>
				<fig id="F4">
					<title>
						<p>Figure 4</p>
					</title>
					<caption>
						<p>Gene family sizes for homologous genes in groups of strains belonging to the same species, represented as in Figure <figr fid="F2">2</figr></p>
					</caption>
					<text>
						<p>Gene family sizes for homologous genes in groups of strains belonging to the same species, represented as in Figure <figr fid="F2">2</figr>. <b>(a) </b><it>Chlamydophila pneumoniae </it>strains; <b>(b) </b><it>Streptococcus pyogenes </it>strains; <b>(c) </b><it>Escherichia coli </it>strains; <b>(d) </b><it>Staphylococcus aureus </it>strains. Strain denomination and graph code displayed in the top right-hand corner. Only protein-coding genes are included. Zero on the <it>y</it>-axis indicates single-copy genes; 1 indicates a gene family formed of two members.</p>
					</text>
					<graphic file="gb-2004-5-4-r27-4"/>
				</fig>
				<p>We have obviously not excluded the possibility that nonhomologous gene families add to the differences among the compared genomes. For example, in a pairwise comparison between <it>E. coli </it>K12 and <it>E. coli </it>O157:H7, 186 genes belonging to paralog families were unique to K12 and 788 to O157:H7, versus 403 singletons (single-copy genes not belonging to families) unique to K12 and 883 to O157:H7. Thus, K12 keeps the same standard proportion of 30% paralogs for the differential gene pool. In O157:H7, on the other hand, paralogs account for 47% of the set of unique genes. The interpretation might be that the large islands that characterize the genome of the enterohemorrhagic virotype tend to carry a bigger proportion of families than the rest of the genome. Thus, it is possible that in some strains, HGT may contribute to expand and generate gene families that do not appear as homologs in closely related genomes. For example, 146 genes belonging to families of 10 or more members were detected in the O157:H7 differential pool, including three whole families of 14, 17 and 20 members with a G+C content of 57, 54 and 53%, respectively (the average G+C content in <it>E. coli </it>O157:H7 is 50.6%). The largest differential family in K12 had 11 members, which were not present in the enterohemorrhagic strain, and had a G+C content of 54.1% (the average G+C content of <it>E. coli </it>K12 is 50.5%).</p>
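				<p>The proportions quoted above follow directly from the counts of strain-specific genes. As a minimal sketch of that arithmetic (the function name <it>paralog_share</it> is hypothetical and is not part of the original analysis), the calculation can be reproduced as follows:</p>
				<p>
```python
def paralog_share(paralogs, singletons):
    """Percentage of strain-specific genes that belong to paralog families."""
    return 100.0 * paralogs / (paralogs + singletons)


# Counts of unique genes reported in the text for the K12 vs O157:H7 comparison.
k12 = paralog_share(186, 403)
o157 = paralog_share(788, 883)
print(f"K12: {k12:.1f}%  O157:H7: {o157:.1f}%")
```
				</p>
				<p>With the counts given in the text this yields about 32% for K12, in line with the roughly 30% quoted, and about 47% for O157:H7.</p>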
				<p>To investigate whether the conservation in the size of homologous families was maintained across more divergent genomes, gene family plots were performed between species. A representative case for a Gram-negative (<it>Pseudomonas</it>) and a Gram-positive (<it>Bacillus</it>) comparison is illustrated in Figure <figr fid="F5">5</figr>. The preservation of family size was still remarkable, although, in the case of <it>Pseudomonas</it>, the number of orthologous genes is considerably smaller. The overall pattern of family sizes is preserved across these species. The two <it>Bacillus </it>species considered have the same genome size and one species contains larger numbers in some families but fewer in others (Figure <figr fid="F5">5b</figr>). The same trend was found in comparisons between species of <it>Staphylococcus</it>, <it>Streptococcus</it>, <it>Salmonella </it>and <it>Mycoplasma </it>(data not shown). It is also interesting to analyze the variation detected. Part of it can be attributed to differences in genome size. <it>Pseudomonas syringae </it>is approximately 200 kb larger than its other sequenced partners, which have mostly smaller gene families. However, part of the variability is also due to intrinsic differences between the species. For example, <it>P. syringae </it>contains some large gene families involved in invasion of the plant host and in pathogenesis <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. One way to examine whether this variation can underlie the phenotypic/ecological characteristics of a given species is to visualize the size difference of each paralog group for some representative cases.</p>
				<fig id="F5">
					<title>
						<p>Figure 5</p>
					</title>
					<caption>
						<p>Gene family sizes for homologous protein-coding genes in different species of the same genus</p>
					</caption>
					<text>
						<p>Gene family sizes for homologous protein-coding genes in different species of the same genus. <b>(a) </b><it>Pseudomonas </it>spp; <b>(b) </b><it>Bacillus </it>spp. <b>(c) </b>Difference in the size of equivalent gene families between <it>E. coli </it>K12 and <it>S. typhimurium </it>LT2. Positive values indicate larger families in <it>E. coli</it>; negative values indicate larger families in <it>S. typhimurium</it>. The <it>potG </it>gene family is indicated.</p>
					</text>
					<graphic file="gb-2004-5-4-r27-5"/>
				</fig>
				<p>Figure <figr fid="F5">5c</figr> shows the difference in gene family size in the interspecific comparison of <it>E. coli </it>K12 and <it>S. typhimurium </it>LT2. Both strains have similarly sized genomes (<it>S. typhimurium </it>is 218 kb larger) and a relatively high level of homology (3,026 orthologous genes). Of these, there are 572 homologs belonging to families that differ in size between the two genomes, and 435 belonging to families having the same number of members in both species. The rest are single-copy genes in both genomes. Forty-eight families were significantly larger (two or more extra copies) in <it>E. coli</it>, while 53 were larger in <it>Salmonella</it>. These differences can be taken as an example of the evolution of gene families in two diverging groups. Although the natural history of these model bacteria is not as well known as might be expected, it is generally believed that both <it>Salmonella </it>and <it>Escherichia </it>are mostly saprophytic facultative anaerobes that inhabit the intestine of vertebrates. The divergence between these two microbes arose after the origin of mammals around 120 million years ago. <it>E. coli </it>specialized as a commensal and an opportunistic pathogen of mammals, as witnessed, for example, by its ability to degrade lactose. On the other hand, <it>Salmonella </it>remains as a commensal in reptiles, with some serotypes colonizing mammals, but as a pathogen rather than a commensal and after developing strategies for intracellular invasion of the host <abbrgrp><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr></abbrgrp>. Accepting this scenario, the fact that many gene families (and the number of members of each family) are preserved reflects a significant involvement in the saprophytic intestinal lifestyle, preserved over many millions of years. 
On the other hand, significant differences are starting to arise between the two species, perhaps reflecting their specialization in different hosts and lifestyles <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. A dramatic example is the <it>potG </it>gene family, which has 13 more members in <it>S. typhimurium </it>than in <it>E. coli </it>(Figure <figr fid="F5">5c</figr>). This is an ATP-binding component of spermidine/putrescine transport and for some reason its amplification has been selected in this species. Proteins involved in the transport of spermidine and putrescine have been shown to be involved in attachment to host cells and virulence <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>. Therefore, the size of this gene family might reflect the more pathogenic lifestyle of <it>Salmonella</it>.</p>
			</sec>
			<sec>
				<st>
					<p>Functional classification of gene families</p>
				</st>
				<p>Do certain functions predispose genes to form families? Do single genes that do not form families belong to a different category? To address these questions, extended gene families were identified, where a gene was not allowed to belong to more than one family. Thus, if gene A matched gene B, and gene B matched C, but A did not match C, all three were considered part of the same family, as it is likely that they are all evolutionarily derived from each other <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. This method of transitive assembly of paralogs has been confirmed to include, in most cases, genes with related functions <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. We found that, for all 127 sequenced species, singletons (genes in a single copy in a given genome) were massively overrepresented by genes with an unknown or hypothetical function. When only genes with a known or predicted function were included, these single genes without paralogs appeared equally distributed among the different functional categories. However, when genes belonging to families, especially large ones, were considered, a significant fraction had particular functions, such as transport of metabolites (data not shown). These data are, however, probably unrealistic because they represent the distribution of genes in sequenced genomes only, and certain species are overrepresented. In addition, larger genomes will also weigh more in this comparison than small genomes, as will species with several sequenced strains. We did, however, find relatively uniform results for individual genomes. Figure <figr fid="F6">6</figr> shows such a distribution for two species, one Gram-negative (<it>E. coli </it>K12) and the other Gram-positive (<it>B. subtilis</it>). For <it>E. 
coli</it>, in which a large proportion of genes has been allocated a function, families with more than five members contain fewer unknown or hypothetical genes than do smaller families, and the distribution of functions among categories is unequal, with certain categories being overrepresented. Among these, genes involved in transport of different metabolites predominate (39% of the total), followed by those with transcription and replication/repair functions. In genes that do not belong to a family, however, most functional categories are equally represented and a large proportion of these singletons have an unknown function. The overrepresentation of unknown or hypothetical open reading frames (ORFs) could, in part, be due to many of these singletons not being real genes, as supported by their shorter length when compared to genes belonging to families. In the gamma-proteobacteria, for example, average singleton length is 127 nucleotides less than in genes belonging to families. It is also interesting to note that the phylogenetic distribution of these unknown singletons is not different from that of unknown paralogs (see Additional data file 4). In conclusion, some functions do appear to be more prone to develop families, although the functions overrepresented in a particular species may depend on its lifestyle.</p>
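				<p>The transitive assembly of extended families described above amounts to single-linkage clustering of the pairwise match graph. The following Python sketch illustrates the idea under stated assumptions: the matches are taken to be already available as pairs of gene identifiers, and the function name extended_families and the gene labels are hypothetical, not taken from the original analysis.</p>

```python
from collections import defaultdict


def extended_families(genes, matches):
    """Group genes into extended families by transitive closure of matches.

    genes   -- iterable of gene identifiers (hypothetical labels)
    matches -- iterable of (gene_a, gene_b) pairs that passed the similarity
               cut-off; both members must also appear in `genes`
    Returns a list of sets; a singleton is a family of size one.
    """
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the family representative, shortening the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two families

    families = defaultdict(set)
    for g in parent:
        families[find(g)].add(g)
    return list(families.values())


# A matches B and B matches C, but A does not match C directly:
# all three are still placed in one family, and D remains a singleton.
fams = extended_families(["A", "B", "C", "D"], [("A", "B"), ("B", "C")])
```

				<p>Because each gene has exactly one representative, no gene can belong to more than one family, which is the constraint stated for extended families.</p>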
				<fig id="F6">
					<title>
						<p>Figure 6</p>
					</title>
					<caption>
						<p>Proportions of assigned functions among genes belonging to families and singletons in <it>B. subtilis </it>and <it>E. coli </it>K12</p>
					</caption>
					<text>
						<p>Proportions of assigned functions among genes belonging to families and singletons in <it>B. subtilis </it>and <it>E. coli </it>K12. Gene functions were assigned according to the Clusters of Orthologous Groups (COGs) classification <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>. Extended gene families are considered, in which a gene belongs to a single family only (see Materials and methods).</p>
					</text>
					<graphic file="gb-2004-5-4-r27-6"/>
				</fig>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Conclusions</p>
			</st>
			<p>In eukaryotic genomes, a cornerstone of gene creation is extension of paralogous families by gene duplication <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. This is reflected in the slow increase of new gene families with genome size, which does correlate with an increase in the size of the families <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. The importance of DNA duplication in eukaryotes is probably also favored by the limitations of HGT in this group <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. Despite the pervasiveness of HGT in prokaryotes, the increase in gene families with genome size is also robust (Figure <figr fid="F1">1</figr>). One obvious factor contributing to this situation might be that the pool of essential genes required for basic cell biology represents a larger percentage of a smaller genome, restricting the contribution of redundant genes, which have related functions and are thus more expendable. However, this does not explain the high level of correlation maintained at the larger end of the range.</p>
			<p>Of course, with the number of genomes currently available there is a certain representation bias, with a large input from human pathogens. Among these, small genomes often correspond to intracellular forms that are protected from the immune system of the host. Variability of antigen specificity is one paradigmatic case that justifies gene families in extracellular pathogens of vertebrates, for example the PPE genes of <it>Mycobacterium </it><abbrgrp><abbr bid="B28">28</abbr></abbrgrp> and the Pap adhesins in <it>E. coli </it><abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. The exceptional case of the mycoplasmas points in this direction, as they possess small genomes but are extracellular mucosa-associated pathogens, and hence subject to the host immune system <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. At the other end of the genome size range there are many more free-living, saprophytic or opportunistic pathogens, a lifestyle that requires a highly versatile gene complement in order to survive, for example, both inside and outside a host. Again, the one exception is a single large-genome species from a relatively stable environment (<it>Pirellula</it>, which lives in the open ocean). Here, the ability to carry out many different physiological activities is probably more advantageous than the ability to adapt the same activity to a wider range of conditions. Thus, as with other aspects of biology, the genomic properties of bacteria appear to be greatly conditioned by their specialist or generalist lifestyle.</p>
			<p>The comparison of gene family size among strains from a single species shows a remarkable level of conservation, even when genome sizes are very different. This conservation indicates that gene family size is probably an ancestral feature rather than reflecting the acquisition of paralogs by HGT. This is consistent with evolutionary models based on bacterial gene content, which concluded that most protein gene families are transmitted by vertical inheritance <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. The conservation that is detected even among more distantly related taxa strengthens this view, as in mostly free-living and very niche-diversified species such as <it>Pseudomonas</it>, there is a remarkable degree of conservation. This might reflect involvement of the gene families in more fundamental (less environment-dependent) processes of cell biology.</p>
			<p>Genomic evolution simulations concluded that the amount of gene duplication is independent of HGT levels <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. On the basis of these simulations, an upper limit of 20% was estimated for paralogs of xenologous origin. Assuming that the extra members of a gene family from our paralog plots represent an upper limit of HGT for established families, we calculate that gene transfer accounts for a maximum of 11% of a given family in <it>E. coli </it>(Figure <figr fid="F4">4c</figr>). However, this does not take into account families that are unique to a given strain and that may have a xenologous origin. The fact that these families are not included in the paralog plots (which display only homolog pairs between strains) suggests that they can represent transfers to a given strain. Thus, the paralog plots present a picture of stability and limited xenologous genes for already established families, but this is not inconsistent with the transfer of families that appear to be unique to a given strain or species. It could, theoretically, be more probable that gene families expand by horizontal transfers than by gene duplication <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. This way, xenologous genes would already confer a functionally distinct role and would avoid the neutrality period in which redundant gene copies coexist and can be eliminated <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. The results shown here suggest that the overrepresentation of duplications among transferred genes found by Hooper and Berg <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> might be a feature of these specific families but not of more ancient, homologous ones.</p>
		</sec>
		<sec>
			<st>
				<p>Materials and methods</p>
			</st>
			<p>The protein sequences of the 127 completely sequenced eubacterial genomes at the time this paper was submitted for publication were retrieved from the Genome division, Entrez retrieval system of the National Center for Biotechnology Information (NCBI; <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>). Table <tblr tid="T1">1</tblr> shows a list with all the genomes used, with their genome size and accession numbers. To detect potentially homologous genes we started by carrying out an all-against-all BLASTP <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> search of every protein sequence in one genome against every protein sequence in all the other genomes. We then recorded the best reciprocal hit for each protein sequence with an E-value lower than 10<sup>-5 </sup>and sequence identity higher than 50% over more than 60% of the length. To validate the results, we performed some representative comparisons by studying the distribution of the ratio of bit score to the maximal bit score <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>. This method would separate probable homology from random similarity. We obtained almost identical results, with only a reduced set of the respective homologous genes being different in the two lists. For example, out of 3,026 homolog pairs between <it>E. coli </it>K12 and <it>S. typhimurium </it>detected by the reciprocal hit method, only one pair was found to differ with the bit score method. In addition, only three genes were detected with the reciprocal best-hit method that were not selected as homologs using the bit score method (using a cut-off value of 0.4). Finally, the bit-score ratio method identified 165 additional homologs that were not selected using reciprocal best-hits because they did not satisfy the length and/or sequence-identity requirements. Therefore, the list of homologous genes obtained by reciprocal best-hits was used for all the analyses.</p>
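			<p>The reciprocal best-hit criterion described above can be illustrated with a short Python sketch. It is not the original pipeline: the Hit record, the function names and the toy values are hypothetical, the "best" hit is assumed to be the one with the highest bit score, and the 60% length requirement is assumed to refer to the query protein, which the text does not specify.</p>

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    evalue: float
    identity: float   # percent identity of the alignment
    aln_len: int      # alignment length in residues
    query_len: int    # length of the query protein
    bitscore: float


def passes(hit, max_evalue=1e-5, min_identity=50.0, min_coverage=0.6):
    """Apply the thresholds quoted in the text (query-based coverage is an assumption)."""
    coverage = hit.aln_len / hit.query_len
    return (max_evalue > hit.evalue
            and hit.identity > min_identity
            and coverage > min_coverage)


def best_hits(hits):
    """Map each query to its highest-scoring subject among hits that pass the filter."""
    best = {}
    for h in hits:
        if passes(h) and (h.query not in best or h.bitscore > best[h.query].bitscore):
            best[h.query] = h
    return {q: h.subject for q, h in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_in_A, gene_in_B) pairs that are each other's best hit."""
    fwd, rev = best_hits(a_vs_b), best_hits(b_vs_a)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


# Toy data: a1 and b1 are mutual best hits; a2 fails the identity threshold.
a_vs_b = [
    Hit("a1", "b1", 1e-50, 90.0, 95, 100, 200.0),
    Hit("a1", "b2", 1e-20, 60.0, 80, 100, 90.0),
    Hit("a2", "b2", 1e-30, 45.0, 90, 100, 120.0),
]
b_vs_a = [
    Hit("b1", "a1", 1e-50, 90.0, 95, 100, 200.0),
    Hit("b2", "a1", 1e-20, 60.0, 80, 100, 90.0),
]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

			<p>In this sketch the thresholds are applied before the best hit is chosen, so a protein whose top-scoring match fails the identity or length requirement can still pair with a lower-scoring match that passes; applying the filter after selection would be the stricter alternative.</p>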
			<tbl id="T1">
				<title>
					<p>Table 1</p>
				</title>
				<caption>
					<p>Species used in the current work and their accession numbers</p>
				</caption>
				<tblbdy cols="3">
					<r>
						<c ca="left">
							<p>Species</p>
						</c>
						<c ca="left">
							<p>Accession number</p>
						</c>
						<c ca="left">
							<p>Genome size (bp)</p>
						</c>
					</r>
					<r>
						<c cspan="3">
							<hr/>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Agrobacterium tumefaciens </it>str. C58 (Cereon)</p>
						</c>
						<c ca="left">
							<p>NC_003062</p>
						</c>
						<c ca="left">
							<p>2,841,581</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Agrobacterium tumefaciens </it>str. C58 (U. Washington)</p>
						</c>
						<c ca="left">
							<p>NC_003304</p>
						</c>
						<c ca="left">
							<p>2,841,490</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Aquifex aeolicus </it>VF5</p>
						</c>
						<c ca="left">
							<p>NC_000918</p>
						</c>
						<c ca="left">
							<p>1,551,335</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Bacillus anthracis </it>str. Ames</p>
						</c>
						<c ca="left">
							<p>NC_003997</p>
						</c>
						<c ca="left">
							<p>5,227,293</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Bacillus cereus </it>ATCC 14579</p>
						</c>
						<c ca="left">
							<p>NC_004722</p>
						</c>
						<c ca="left">
							<p>5,411,809</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>
								<it>Bacillus halodurans</it>
							</p>
						</c>
						<c ca="left">
							<p>NC_002570</p>
						</c>
						<c ca="left">
							<p>4,202,353</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Bacillus subtilis </it>subsp. <it>subtilis </it>str. 168</p>
						</c>
						<c ca="left">
							<p>NC_000964</p>
						</c>
						<c ca="left">
							<p>4,214,814</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Bacteroides thetaiotaomicron </it>VPI-5482</p>
						</c>
						<c ca="left">
							<p>NC_004663</p>
						</c>
						<c ca="left">
							<p>6,260,361</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Bifidobacterium longum </it>NCC2705</p>
						</c>
						<c ca="left">
							<p>NC_004307</p>
						</c>
						<c ca="left">
							<p>2,256,646</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>
								<it>Bordetella bronchiseptica</it>
							</p>
						</c>
						<c ca="left">
							<p>NC_002927</p>
						</c>
						<c ca="left">
							<p>5,339,179</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>
								<it>Bordetella parapertussis</it>
							</p>
						</c>
						<c ca="left">
							<p>NC_002928</p>
						</c>
						<c ca="left">
							<p>4,773,551</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>
								<it>Bordetella pertussis</it>
							</p>
						</c>
						<c ca="left">
							<p>NC_002929</p>
						</c>
						<c ca="left">
							<p>4,086,189</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Borrelia burgdorferi </it>B31</p>
						</c>
						<c ca="left">
							<p>NC_001318</p>
						</c>
						<c ca="left">
							<p>910,724</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Bradyrhizobium japonicum </it>USDA 110</p>
						</c>
						<c ca="left">
							<p>NC_004463</p>
						</c>
						<c ca="left">
							<p>9,105,828</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Brucella melitensis </it>16M</p>
						</c>
						<c ca="left">
							<p>NC_003317</p>
						</c>
						<c ca="left">
							<p>2,117,144</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Brucella suis </it>1330</p>
						</c>
						<c ca="left">
							<p>NC_004310</p>
						</c>
						<c ca="left">
							<p>2,107,792</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Buchnera aphidicola </it>str. APS <it>(Acyrthosiphon pisum)</it></p>
						</c>
						<c ca="left">
							<p>NC_002528</p>
						</c>
						<c ca="left">
							<p>640,681</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Buchnera aphidicola </it>str. Bp <it>(Baizongia pistaciae)</it></p>
						</c>
						<c ca="left">
							<p>NC_004545</p>
						</c>
						<c ca="left">
							<p>615,980</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Buchnera aphidicola </it>str. Sg <it>(Schizaphis graminum)</it></p>
						</c>
						<c ca="left">
							<p>NC_004061</p>
						</c>
						<c ca="left">
							<p>641,454</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Campylobacter jejuni </it>subsp. <it>jejuni </it>NCTC 11168</p>
						</c>
						<c ca="left">
							<p>NC_002163</p>
						</c>
						<c ca="left">
							<p>1,641,481</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>
								<it>Candidatus Blochmannia floridanus</it>
							</p>
						</c>
						<c ca="left">
							<p>NC_005061</p>
						</c>
						<c ca="left">
							<p>705,557</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Caulobacter crescentus </it>CB15</p>
						</c>
						<c ca="left">
							<p>NC_002696</p>
						</c>
						<c ca="left">
							<p>4,016,947</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>
								<it>Chlamydia muridarum</it>
							</p>
						</c>
						<c ca="left">
							<p>NC_002620</p>
						</c>
						<c ca="left">
							<p>1,072,950</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>
								<it>Chlamydia trachomatis</it>
							</p>
						</c>
						<c ca="left">
							<p>NC_000117</p>
						</c>
						<c ca="left">
							<p>1,042,519</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Chlamydophila caviae </it>GPIC</p>
						</c>
						<c ca="left">
							<p>NC_003361</p>
						</c>
						<c ca="left">
							<p>1,173,390</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Chlamydophila pneumoniae </it>AR39</p>
						</c>
						<c ca="left">
							<p>NC_002179</p>
						</c>
						<c ca="left">
							<p>1,229,858</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Chlamydophila pneumoniae </it>CWL029</p>
						</c>
						<c ca="left">
							<p>NC_000922</p>
						</c>
						<c ca="left">
							<p>1,230,230</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Chlamydophila pneumoniae </it>J138</p>
						</c>
						<c ca="left">
							<p>NC_002491</p>
						</c>
						<c ca="left">
							<p>1,226,565</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Chlamydophila pneumoniae </it>TW-183</p>
						</c>
						<c ca="left">
							<p>NC_005043</p>
						</c>
						<c ca="left">
							<p>1,225,935</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Chlorobium tepidum </it>TLS</p>
						</c>
						<c ca="left">
							<p>NC_002932</p>
						</c>
						<c ca="left">
							<p>2,154,946</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Chromobacterium violaceum </it>ATCC 12472</p>
						</c>
						<c ca="left">
							<p>NC_005085</p>
						</c>
						<c ca="left">
							<p>4,751,080</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>
								<it>Clostridium acetobutylicum</it>
							</p>
						</c>
						<c ca="left">
							<p>NC_003030</p>
						</c>
						<c ca="left">
							<p>3,940,880</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Clostridium perfringens </it>str. 13</p>
						</c>
						<c ca="left">
							<p>NC_003366</p>
						</c>
						<c ca="left">
							<p>3,031,430</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Clostridium tetani </it>E88</p>
						</c>
						<c ca="left">
							<p>NC_004557</p>
						</c>
						<c ca="left">
							<p>2,799,251</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>
								<it>Corynebacterium diphtheriae</it>
							</p>
						</c>
						<c ca="left">
							<p>NC_002935</p>
						</c>
						<c ca="left">
							<p>2,488,635</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Corynebacterium efficiens </it>YS-314</p>
						</c>
						<c ca="left">
							<p>NC_004369</p>
						</c>
						<c ca="left">
							<p>3,147,090</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Corynebacterium glutamicum </it>ATCC 13032</p>
						</c>
						<c ca="left">
							<p>NC_003450</p>
						</c>
						<c ca="left">
							<p>3,309,401</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Coxiella burnetii </it>RSA 493</p>
						</c>
						<c ca="left">
							<p>NC_002971</p>
						</c>
						<c ca="left">
							<p>1,995,275</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>
								<it>Deinococcus radiodurans</it>
							</p>
						</c>
						<c ca="left">
							<p>NC_001263</p>
						</c>
						<c ca="left">
							<p>2,648,638</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Enterococcus faecalis </it>V583</p>
						</c>
						<c ca="left">
							<p>NC_004668</p>
						</c>
						<c ca="left">
							<p>3,218,031</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Escherichia coli </it>CFT073</p>
						</c>
						<c ca="left">
							<p>NC_004431</p>
						</c>
						<c ca="left">
							<p>5,231,428</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Escherichia coli </it>K12</p>
						</c>
						<c ca="left">
							<p>NC_000913</p>
						</c>
						<c ca="left">
							<p>4,639,221</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Escherichia coli </it>O157:H7</p>
						</c>
						<c ca="left">
							<p>NC_002695</p>
						</c>
						<c ca="left">
							<p>5,498,450</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Escherichia coli </it>O157:H7 EDL933</p>
						</c>
						<c ca="left">
							<p>NC_002655</p>
						</c>
						<c ca="left">
							<p>5,528,445</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Fusobacterium nucleatum </it>subsp. <it>nucleatum </it>ATCC 25586</p>
						</c>
						<c ca="left">
							<p>NC_003454</p>
						</c>
						<c ca="left">
							<p>2,174,500</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>
								<it>Gloeobacter violaceus</it>
							</p>
						</c>
						<c ca="left">
							<p>NC_005125</p>
						</c>
						<c ca="left">
							<p>4,659,019</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Haemophilus ducreyi </it>35000HP</p>
						</c>
						<c ca="left">
							<p>NC_002940</p>
						</c>
						<c ca="left">
							<p>1,698,955</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Haemophilus influenzae </it>Rd</p>
						</c>
						<c ca="left">
							<p>NC_000907</p>
						</c>
						<c ca="left">
							<p>1,830,138</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Helicobacter hepaticus </it>ATCC 51449</p>
						</c>
						<c ca="left">
							<p>NC_004917</p>
						</c>
						<c ca="left">
							<p>1,799,146</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Helicobacter pylori </it>26695</p>
						</c>
						<c ca="left">
							<p>NC_000915</p>
						</c>
						<c ca="left">
							<p>1,667,867</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Helicobacter pylori </it>J99</p>
						</c>
						<c ca="left">
							<p>NC_000921</p>
						</c>
						<c ca="left">
							<p>1,643,831</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Lactobacillus plantarum </it>WCFS1</p>
						</c>
						<c ca="left">
							<p>NC_004567</p>
						</c>
						<c ca="left">
							<p>3,308,274</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Lactococcus lactis </it>subsp. <it>lactis</it></p>
						</c>
						<c ca="left">
							<p>NC_002662</p>
						</c>
						<c ca="left">
							<p>2,365,589</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Leptospira interrogans </it>serovar <it>lai </it>str. 56601</p>
						</c>
						<c ca="left">
							<p>NC_004342</p>
						</c>
						<c ca="left">
							<p>4,332,241</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>
								<it>Listeria innocua</it>
							</p>
						</c>
						<c ca="left">
							<p>NC_003212</p>
						</c>
						<c ca="left">
							<p>3,011,208</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Listeria monocytogenes </it>EGD-e</p>
						</c>
						<c ca="left">
							<p>NC_003210</p>
						</c>
						<c ca="left">
							<p>2,944,528</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>
								<it>Mesorhizobium loti</it>
							</p>
						</c>
						<c ca="left">
							<p>NC_002678</p>
						</c>
						<c ca="left">
							<p>7,036,074</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Mycobacterium bovis </it>subsp. <it>bovis </it>AF2122/97</p>
						</c>
						<c ca="left">
							<p>NC_002945</p>
						</c>
						<c ca="left">
							<p>4,345,492</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>
								<it>Mycobacterium leprae</it>
							</p>
						</c>
						<c ca="left">
							<p>NC_002677</p>
						</c>
						<c ca="left">
							<p>3,268,203</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Mycobacterium tuberculosis </it>CDC1551</p>
						</c>
						<c ca="left">
							<p>NC_002755</p>
						</c>
						<c ca="left">
							<p>4,403,836</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Mycobacterium tuberculosis </it>H37Rv</p>
						</c>
						<c ca="left">
							<p>NC_000962</p>
						</c>
						<c ca="left">
							<p>4,411,529</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Mycoplasma gallisepticum </it>R</p>
						</c>
						<c ca="left">
							<p>NC_004829</p>
						</c>
						<c ca="left">
							<p>996,422</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>
								<it>Mycoplasma genitalium</it>
							</p>
						</c>
						<c ca="left">
							<p>NC_000908</p>
						</c>
						<c ca="left">
							<p>580,074</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>
								<it>Mycoplasma penetrans</it>
							</p>
						</c>
						<c ca="left">
							<p>NC_004432</p>
						</c>
						<c ca="left">
							<p>1,358,633</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>
								<it>Mycoplasma pneumoniae</it>
							</p>
						</c>
						<c ca="left">
							<p>NC_000912</p>
						</c>
						<c ca="left">
							<p>816,394</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>
								<it>Mycoplasma pulmonis</it>
							</p>
						</c>
						<c ca="left">
							<p>NC_002771</p>
						</c>
						<c ca="left">
							<p>963,879</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Neisseria meningitidis </it>MC58</p>
						</c>
						<c ca="left">
							<p>NC_003112</p>
						</c>
						<c ca="left">
							<p>2,272,351</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Neisseria meningitidis </it>Z2491</p>
						</c>
						<c ca="left">
							<p>NC_003116</p>
						</c>
						<c ca="left">
							<p>2,184,406</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Nitrosomonas europaea </it>ATCC 19718</p>
						</c>
						<c ca="left">
							<p>NC_004757</p>
						</c>
						<c ca="left">
							<p>2,812,094</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Nostoc </it>sp. PCC 7120</p>
						</c>
						<c ca="left">
							<p>NC_003272</p>
						</c>
						<c ca="left">
							<p>6,413,771</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Oceanobacillus iheyensis </it>HTE831</p>
						</c>
						<c ca="left">
							<p>NC_004193</p>
						</c>
						<c ca="left">
							<p>3,630,528</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>
								<it>Pasteurella multocida</it>
							</p>
						</c>
						<c ca="left">
							<p>NC_002663</p>
						</c>
						<c ca="left">
							<p>2,257,487</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Photorhabdus luminescens </it>subsp. <it>laumondii </it>TTO1</p>
						</c>
						<c ca="left">
							<p>NC_005126</p>
						</c>
						<c ca="left">
							<p>5,688,987</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Pirellula </it>sp.</p>
						</c>
						<c ca="left">
							<p>NC_005027</p>
						</c>
						<c ca="left">
							<p>7,145,576</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Porphyromonas gingivalis </it>W83</p>
						</c>
						<c ca="left">
							<p>NC_002950</p>
						</c>
						<c ca="left">
							<p>2,343,476</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Prochlorococcus marinus </it>str. MIT 9313</p>
						</c>
						<c ca="left">
							<p>NC_005071</p>
						</c>
						<c ca="left">
							<p>2,410,873</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Prochlorococcus marinus </it>subsp. <it>marinus </it>str. CCMP1375</p>
						</c>
						<c ca="left">
							<p>NC_005042</p>
						</c>
						<c ca="left">
							<p>1,751,080</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Prochlorococcus marinus </it>subsp. <it>pastoris </it>str. CCMP1378</p>
						</c>
						<c ca="left">
							<p>NC_005072</p>
						</c>
						<c ca="left">
							<p>1,657,990</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Pseudomonas aeruginosa </it>PAO1</p>
						</c>
						<c ca="left">
							<p>NC_002516</p>
						</c>
						<c ca="left">
							<p>6,264,403</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Pseudomonas putida </it>KT2440</p>
						</c>
						<c ca="left">
							<p>NC_002947</p>
						</c>
						<c ca="left">
							<p>6,181,863</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Pseudomonas syringae </it>pv. <it>tomato </it>str. DC3000</p>
						</c>
						<c ca="left">
							<p>NC_004578</p>
						</c>
						<c ca="left">
							<p>6,397,126</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>
								<it>Ralstonia solanacearum</it>
							</p>
						</c>
						<c ca="left">
							<p>NC_003295</p>
						</c>
						<c ca="left">
							<p>3,716,413</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>
								<it>Rickettsia conorii</it>
							</p>
						</c>
						<c ca="left">
							<p>NC_003103</p>
						</c>
						<c ca="left">
							<p>1,268,755</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>
								<it>Rickettsia prowazekii</it>
							</p>
						</c>
						<c ca="left">
							<p>NC_000963</p>
						</c>
						<c ca="left">
							<p>1,111,523</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Salmonella enterica </it>subsp. <it>enterica </it>serovar Typhi</p>
						</c>
						<c ca="left">
							<p>NC_003198</p>
						</c>
						<c ca="left">
							<p>4,809,037</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Salmonella enterica </it>subsp. <it>enterica </it>serovar Typhi Ty2</p>
						</c>
						<c ca="left">
							<p>NC_004631</p>
						</c>
						<c ca="left">
							<p>4,791,961</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Salmonella typhimurium </it>LT2</p>
						</c>
						<c ca="left">
							<p>NC_003197</p>
						</c>
						<c ca="left">
							<p>4,857,432</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Shewanella oneidensis </it>MR-1</p>
						</c>
						<c ca="left">
							<p>NC_004347</p>
						</c>
						<c ca="left">
							<p>4,969,803</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Shigella flexneri </it>2a str. 2457T</p>
						</c>
						<c ca="left">
							<p>NC_004741</p>
						</c>
						<c ca="left">
							<p>4,599,354</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Shigella flexneri </it>2a str. 301</p>
						</c>
						<c ca="left">
							<p>NC_004337</p>
						</c>
						<c ca="left">
							<p>4,607,203</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>
								<it>Sinorhizobium meliloti</it>
							</p>
						</c>
						<c ca="left">
							<p>NC_003047</p>
						</c>
						<c ca="left">
							<p>3,654,135</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Staphylococcus aureus </it>subsp. <it>aureus </it>MW2</p>
						</c>
						<c ca="left">
							<p>NC_003923</p>
						</c>
						<c ca="left">
							<p>2,820,462</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Staphylococcus aureus </it>subsp. <it>aureus </it>Mu50</p>
						</c>
						<c ca="left">
							<p>NC_002758</p>
						</c>
						<c ca="left">
							<p>2,878,040</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Staphylococcus aureus </it>subsp. <it>aureus </it>N315</p>
						</c>
						<c ca="left">
							<p>NC_002745</p>
						</c>
						<c ca="left">
							<p>2,814,816</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Staphylococcus epidermidis </it>ATCC 12228</p>
						</c>
						<c ca="left">
							<p>NC_004461</p>
						</c>
						<c ca="left">
							<p>2,499,279</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Streptococcus agalactiae </it>2603V/R</p>
						</c>
						<c ca="left">
							<p>NC_004116</p>
						</c>
						<c ca="left">
							<p>2,160,267</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Streptococcus agalactiae </it>NEM316</p>
						</c>
						<c ca="left">
							<p>NC_004368</p>
						</c>
						<c ca="left">
							<p>2,211,485</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Streptococcus mutans </it>UA159</p>
						</c>
						<c ca="left">
							<p>NC_004350</p>
						</c>
						<c ca="left">
							<p>2,030,921</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Streptococcus pneumoniae </it>R6</p>
						</c>
						<c ca="left">
							<p>NC_003098</p>
						</c>
						<c ca="left">
							<p>2,038,615</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Streptococcus pneumoniae </it>TIGR4</p>
						</c>
						<c ca="left">
							<p>NC_003028</p>
						</c>
						<c ca="left">
							<p>2,160,837</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Streptococcus pyogenes </it>M1 GAS</p>
						</c>
						<c ca="left">
							<p>NC_002737</p>
						</c>
						<c ca="left">
							<p>1,852,441</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Streptococcus pyogenes </it>MGAS315</p>
						</c>
						<c ca="left">
							<p>NC_004070</p>
						</c>
						<c ca="left">
							<p>1,900,521</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Streptococcus pyogenes </it>MGAS8232</p>
						</c>
						<c ca="left">
							<p>NC_003485</p>
						</c>
						<c ca="left">
							<p>1,895,017</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Streptococcus pyogenes </it>SSI-1</p>
						</c>
						<c ca="left">
							<p>NC_004606</p>
						</c>
						<c ca="left">
							<p>1,894,275</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Streptomyces avermitilis </it>MA-4680</p>
						</c>
						<c ca="left">
							<p>NC_003155</p>
						</c>
						<c ca="left">
							<p>9,025,608</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Streptomyces coelicolor </it>A3(2)</p>
						</c>
						<c ca="left">
							<p>NC_003888</p>
						</c>
						<c ca="left">
							<p>8,667,507</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Synechococcus </it>sp. WH 8102</p>
						</c>
						<c ca="left">
							<p>NC_005070</p>
						</c>
						<c ca="left">
							<p>2,434,428</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Synechocystis </it>sp. PCC 6803</p>
						</c>
						<c ca="left">
							<p>NC_000911</p>
						</c>
						<c ca="left">
							<p>3,573,470</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>
								<it>Thermoanaerobacter tengcongensis</it>
							</p>
						</c>
						<c ca="left">
							<p>NC_003869</p>
						</c>
						<c ca="left">
							<p>2,689,445</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Thermosynechococcus elongatus </it>BP-1</p>
						</c>
						<c ca="left">
							<p>NC_004113</p>
						</c>
						<c ca="left">
							<p>2,593,857</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>
								<it>Thermotoga maritima</it>
							</p>
						</c>
						<c ca="left">
							<p>NC_000853</p>
						</c>
						<c ca="left">
							<p>1,860,725</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>
								<it>Treponema pallidum</it>
							</p>
						</c>
						<c ca="left">
							<p>NC_000919</p>
						</c>
						<c ca="left">
							<p>1,138,011</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Tropheryma whipplei </it>TW08/27</p>
						</c>
						<c ca="left">
							<p>NC_004551</p>
						</c>
						<c ca="left">
							<p>925,938</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Tropheryma whipplei </it>str. Twist</p>
						</c>
						<c ca="left">
							<p>NC_004572</p>
						</c>
						<c ca="left">
							<p>927,303</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>
								<it>Ureaplasma urealyticum</it>
							</p>
						</c>
						<c ca="left">
							<p>NC_002162</p>
						</c>
						<c ca="left">
							<p>751,719</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>
								<it>Vibrio cholerae</it>
							</p>
						</c>
						<c ca="left">
							<p>NC_002505</p>
						</c>
						<c ca="left">
							<p>2,961,149</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Vibrio parahaemolyticus </it>RIMD 2210633</p>
						</c>
						<c ca="left">
							<p>NC_004603</p>
						</c>
						<c ca="left">
							<p>3,288,558</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Vibrio vulnificus </it>CMCP6</p>
						</c>
						<c ca="left">
							<p>NC_004459</p>
						</c>
						<c ca="left">
							<p>3,281,945</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Vibrio vulnificus </it>YJ016</p>
						</c>
						<c ca="left">
							<p>NC_005139</p>
						</c>
						<c ca="left">
							<p>3,354,505</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Wigglesworthia glossinidia </it>(from <it>Glossina brevipalpis</it>)</p>
						</c>
						<c ca="left">
							<p>NC_004344</p>
						</c>
						<c ca="left">
							<p>697,724</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p>
								<it>Wolinella succinogenes</it>
							</p>
						</c>
						<c ca="left">
							<p>NC_005090</p>
						</c>
						<c ca="left">
							<p>2,110,355</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Xanthomonas axonopodis </it>pv. <it>citri </it>str. 306</p>
						</c>
						<c ca="left">
							<p>NC_003919</p>
						</c>
						<c ca="left">
							<p>5,175,554</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Xanthomonas campestris </it>pv. <it>campestris </it>str. ATCC 33913</p>
						</c>
						<c ca="left">
							<p>NC_003902</p>
						</c>
						<c ca="left">
							<p>5,076,188</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Xylella fastidiosa </it>9a5c</p>
						</c>
						<c ca="left">
							<p>NC_002488</p>
						</c>
						<c ca="left">
							<p>2,679,306</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Xylella fastidiosa </it>Temecula1</p>
						</c>
						<c ca="left">
							<p>NC_004556</p>
						</c>
						<c ca="left">
							<p>2,519,802</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Yersinia pestis </it>CO92</p>
						</c>
						<c ca="left">
							<p>NC_003143</p>
						</c>
						<c ca="left">
							<p>4,653,728</p>
						</c>
					</r>
					<r>
						<c ca="left">
							<p><it>Yersinia pestis </it>KIM</p>
						</c>
						<c ca="left">
							<p>NC_004088</p>
						</c>
						<c ca="left">
							<p>4,600,755</p>
						</c>
					</r>
				</tblbdy>
			</tbl>
			<p>To detect potential paralogous genes, we carried out an all-against-all BLASTP <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> search, comparing every protein sequence in a genome against every other protein sequence in the same genome. We define paralogs as protein sequences whose BLASTP <abbrgrp><abbr bid="B40">40</abbr></abbrgrp> hit has an E-value of 10<sup>-5</sup> or lower and at least 30% sequence identity over more than 60% of their lengths <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>.</p>
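			<p>As an illustration only, and not a description of the pipeline actually used, the thresholds above could be applied to parsed BLASTP hits roughly as sketched below. The Hit record, the function name is_paralog_pair, the lengths lookup and the decision to require the 60% coverage on both the query and the subject are assumptions made for this example.</p>
			<p><![CDATA[
```python
from typing import Dict, NamedTuple


class Hit(NamedTuple):
    """One BLASTP hit (hypothetical record; fields mirror tabular BLAST output)."""
    query: str
    subject: str
    pident: float   # percent identity of the alignment
    q_start: int    # 1-based, inclusive alignment start on the query
    q_end: int
    s_start: int    # 1-based, inclusive alignment start on the subject
    s_end: int
    evalue: float


def is_paralog_pair(hit: Hit, lengths: Dict[str, int],
                    max_evalue: float = 1e-5,
                    min_identity: float = 30.0,
                    min_coverage: float = 0.60) -> bool:
    """Return True if the hit passes the E-value, identity and coverage cut-offs."""
    if hit.query == hit.subject:
        return False  # a protein matching itself is not a paralog pair
    if hit.evalue > max_evalue or hit.pident < min_identity:
        return False
    # Assumption: coverage is the aligned span divided by the full protein
    # length, and it must exceed the cut-off on BOTH proteins.
    q_cov = (hit.q_end - hit.q_start + 1) / lengths[hit.query]
    s_cov = (hit.s_end - hit.s_start + 1) / lengths[hit.subject]
    return q_cov > min_coverage and s_cov > min_coverage


if __name__ == "__main__":
    lengths = {"geneA": 100, "geneB": 110, "geneC": 300}
    hits = [
        Hit("geneA", "geneB", 45.0, 1, 90, 5, 95, 1e-20),
        Hit("geneA", "geneC", 50.0, 1, 90, 1, 90, 1e-20),  # covers only 30% of geneC
    ]
    for hit in hits:
        print(hit.query, hit.subject, is_paralog_pair(hit, lengths))
```
]]></p>
			<p>Requiring the coverage on both proteins is one conservative reading of the length criterion; it keeps a short protein that matches a single domain of a much longer one from being counted as its paralog.</p>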
			<p>When comparing paralogs between two species, a gene family was created for each homologous gene detected in both genomes. This gave rise to some redundant families but ensured that the comparison between species was done between equivalent gene families. To describe the functional assignment of paralogous genes, extended gene families were created <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> that contained all genes that were interrelated by hits among any of their members. This is based on the transitive nature of sequence homology <abbrgrp><abbr bid="B34">34</abbr></abbrgrp> and is supported by the findings on well-studied genomes of species with a relatively well-known metabolism. In these cases, extended gene families seem to be formed by genes involved in similar functions <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. To minimize the incorporation of multidomain proteins in a family together with unrelated members <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, length cut-offs were kept at 60%. The assignment of a function to a gene was based on the Clusters of Orthologous Groups (COGs) classification <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>.</p>
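			<p>The construction of extended gene families, in which genes are grouped whenever they are linked by a chain of hits, corresponds to finding connected components in a graph of paralog pairs. The following is a minimal sketch of that idea using a union-find structure; the function name extended_families and the input format (a list of gene-identifier pairs that already passed the paralog cut-offs) are assumptions for illustration and do not reproduce the original implementation.</p>
			<p><![CDATA[
```python
from typing import Dict, Iterable, List, Tuple


def extended_families(pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group genes into families by transitive closure over paralog pairs."""
    parent: Dict[str, str] = {}

    def find(gene: str) -> str:
        # Register unseen genes as their own root, then walk up to the root.
        parent.setdefault(gene, gene)
        root = gene
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path directly at the root.
        while parent[gene] != root:
            parent[gene], gene = root, parent[gene]
        return root

    for gene_a, gene_b in pairs:
        root_a, root_b = find(gene_a), find(gene_b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two families

    families: Dict[str, List[str]] = {}
    for gene in list(parent):
        families.setdefault(find(gene), []).append(gene)

    # Largest families first; members and ties sorted alphabetically.
    return sorted((sorted(members) for members in families.values()),
                  key=lambda members: (-len(members), members))


if __name__ == "__main__":
    # g1 and g3 share no direct hit but are joined through g2.
    example_pairs = [("g1", "g2"), ("g2", "g3"), ("g4", "g5")]
    for family in extended_families(example_pairs):
        print(len(family), family)
```
]]></p>
			<p>Because membership is transitive, a single spurious hit can merge two unrelated families, which is why a strict length cut-off on the input pairs matters for this kind of clustering.</p>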
		</sec>
		<sec>
			<st>
				<p>Additional data files</p>
			</st>
			<p>Additional data file 1 is a PDF file of a figure showing the number of paralogs and the percentage of paralogous genes in the different-sized gene families in <it>Pirellula </it>sp. compared to other large-sized genomes. Additional data file 2 is a PDF file of a figure showing gene family sizes in intracellular genomes that have undergone reductive evolution compared to related free-living organisms. Additional data file 3 contains legends to the figures in Additional data files 1 and 2. Additional data file 4 is a zip file containing the data from which the figures in the manuscript were made. The files are ordered following the figures as they appear in the text, and a readme text file explains the content of each file.</p>
			<suppl id="s1">
				<title>
					<p>Additional data file 1</p>
				</title>
				<caption>
					<p>A figure showing the number of paralogs and the percentage of paralogous genes in the different-sized gene families in <it>Pirellula </it>sp. compared to other large-sized genomes</p>
				</caption>
				<text>
					<p>A figure showing the number of paralogs and the percentage of paralogous genes in the different-sized gene families in <it>Pirellula </it>sp. compared to other large-sized genomes</p>
				</text>
				<file name="gb-2004-5-4-r27-s1.pdf">
					<p>Click here for additional data file</p>
				</file>
			</suppl>
			<suppl id="s2">
				<title>
					<p>Additional data file 2</p>
				</title>
				<caption>
					<p>A figure showing gene family sizes in intracellular genomes that have undergone reductive evolution compared to related free-living organisms</p>
				</caption>
				<text>
					<p>A figure showing gene family sizes in intracellular genomes that have undergone reductive evolution compared to related free-living organisms</p>
				</text>
				<file name="gb-2004-5-4-r27-s2.pdf">
					<p>Click here for additional data file</p>
				</file>
			</suppl>
			<suppl id="s3">
				<title>
					<p>Additional data file 3</p>
				</title>
				<caption>
					<p>The legends to the figures in Additional data files 1 and 2</p>
				</caption>
				<text>
					<p>The legends to the figures in Additional data files 1 and 2</p>
				</text>
				<file name="gb-2004-5-4-r27-s3.doc">
					<p>Click here for additional data file</p>
				</file>
			</suppl>
			<suppl id="s4">
				<title>
					<p>Additional data file 4</p>
				</title>
				<caption>
					<p>A zip file containing the data from which the figures in the manuscript were made</p>
				</caption>
				<text>
					<p>A zip file containing the data from which the figures in the manuscript were made</p>
				</text>
				<file name="gb-2004-5-4-r27-s4.zip">
					<p>Click here for additional data file</p>
				</file>
			</suppl>
		</sec>
	</bdy>
	<bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st>
				<p>A.M. is the recipient of a 'Ram&#243;n y Cajal' research contract from the Spanish Ministry of Science and Technology (MCyT). Support from European Commission Project GEMINI (QLK3-CT-2002-02056) and MCyT project PM1999-0078 is also acknowledged. We thank Stuart Ingham for help with the graphics.</p>
			</sec>
		</ack>
		<refgrp>
			<bibl id="B1">
				<aug>
					<au>
						<snm>Ohno</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Evolution by Gene Duplication</source>
				<publisher>New York: Springer</publisher>
				<pubdate>1970</pubdate>
			</bibl>
			<bibl id="B2">
				<title>
					<p>Orthologs, paralogs and genome comparisons.</p>
				</title>
				<aug>
					<au>
						<snm>Gogarten</snm>
						<fnm>JP</fnm>
					</au>
					<au>
						<snm>Olendzenski</snm>
						<fnm>L</fnm>
					</au>
				</aug>
				<source>Curr Opin Genet Dev</source>
				<pubdate>1999</pubdate>
				<volume>9</volume>
				<fpage>630</fpage>
				<lpage>636</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0959-437X(99)00029-5</pubid>
						<pubid idtype="pmpid" link="fulltext">10607614</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B3">
				<title>
					<p>The complete genome sequence of <it>Escherichia coli </it>K-12.</p>
				</title>
				<aug>
					<au>
						<snm>Blattner</snm>
						<fnm>FR</fnm>
					</au>
					<au>
						<snm>Plunkett</snm>
						<fnm>G</fnm>
						<suf>3rd</suf>
					</au>
					<au>
						<snm>Bloch</snm>
						<fnm>CA</fnm>
					</au>
					<au>
						<snm>Perna</snm>
						<fnm>NT</fnm>
					</au>
					<au>
						<snm>Burland</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Riley</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Collado-Vides</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Glasner</snm>
						<fnm>JD</fnm>
					</au>
					<au>
						<snm>Rode</snm>
						<fnm>CK</fnm>
					</au>
					<au>
						<snm>Mayhew</snm>
						<fnm>GF</fnm>
					</au>
					<etal/>
				</aug>
				<source>Science</source>
				<pubdate>1997</pubdate>
				<volume>277</volume>
				<fpage>1453</fpage>
				<lpage>1474</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.277.5331.1453</pubid>
						<pubid idtype="pmpid" link="fulltext">9278503</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B4">
				<title>
					<p>Physiological genomics of <it>Escherichia coli </it>protein families.</p>
				</title>
				<aug>
					<au>
						<snm>Liang</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Labedan</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Riley</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Physiol Genomics</source>
				<pubdate>2002</pubdate>
				<volume>9</volume>
				<fpage>15</fpage>
				<lpage>26</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">11948287</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B5">
				<title>
					<p>Duplication is more common among laterally transferred genes than among indigenous genes.</p>
				</title>
				<aug>
					<au>
						<snm>Hooper</snm>
						<fnm>SD</fnm>
					</au>
					<au>
						<snm>Berg</snm>
						<fnm>OG</fnm>
					</au>
				</aug>
				<source>Genome Biol</source>
				<pubdate>2003</pubdate>
				<volume>4</volume>
				<fpage>R48</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1186/gb-2003-4-8-r48</pubid>
						<pubid idtype="pmpid" link="fulltext">12914657</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B6">
				<title>
					<p>Protein families and TRIBES in genome sequence space.</p>
				</title>
				<aug>
					<au>
						<snm>Enright</snm>
						<fnm>AJ</fnm>
					</au>
					<au>
						<snm>Kunin</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Ouzounis</snm>
						<fnm>CA</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2003</pubdate>
				<volume>31</volume>
				<fpage>4632</fpage>
				<lpage>4638</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/nar/gkg495</pubid>
						<pubid idtype="pmpid" link="fulltext">12888524</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B7">
				<title>
					<p>Lineage-specific gene expansions in bacterial and archaeal genomes.</p>
				</title>
				<aug>
					<au>
						<snm>Jordan</snm>
						<fnm>IK</fnm>
					</au>
					<au>
						<snm>Makarova</snm>
						<fnm>KS</fnm>
					</au>
					<au>
						<snm>Spouge</snm>
						<fnm>JL</fnm>
					</au>
					<au>
						<snm>Wolf</snm>
						<fnm>YI</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2001</pubdate>
				<volume>11</volume>
				<fpage>555</fpage>
				<lpage>565</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1101/gr.GR-1660R</pubid>
						<pubid idtype="pmpid" link="fulltext">11282971</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B8">
				<title>
					<p>The evolutionary fate and consequences of duplicate genes.</p>
				</title>
				<aug>
					<au>
						<snm>Lynch</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Conery</snm>
						<fnm>JS</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>2000</pubdate>
				<volume>290</volume>
				<fpage>1151</fpage>
				<lpage>1155</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.290.5494.1151</pubid>
						<pubid idtype="pmpid" link="fulltext">11073452</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B9">
				<title>
					<p>A whole-genome microarray reveals genetic diversity among <it>Helicobacter pylori </it>strains.</p>
				</title>
				<aug>
					<au>
						<snm>Salama</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Guillemin</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>McDaniel</snm>
						<fnm>TK</fnm>
					</au>
					<au>
						<snm>Sherlock</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Tompkins</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Falkow</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2000</pubdate>
				<volume>97</volume>
				<fpage>14668</fpage>
				<lpage>14673</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1073/pnas.97.26.14668</pubid>
						<pubid idtype="pmpid" link="fulltext">11121067</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B10">
				<title>
					<p>Extensive mosaic structure revealed by the complete genome sequence of uropathogenic <it>Escherichia coli</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Welch</snm>
						<fnm>RA</fnm>
					</au>
					<au>
						<snm>Burland</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Plunkett</snm>
						<fnm>G</fnm>
						<suf>3rd</suf>
					</au>
					<au>
						<snm>Redford</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Roesch</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Rasko</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Buckles</snm>
						<fnm>EL</fnm>
					</au>
					<au>
						<snm>Liou</snm>
						<fnm>SR</fnm>
					</au>
					<au>
						<snm>Boutin</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Hackett</snm>
						<fnm>J</fnm>
					</au>
					<etal/>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2002</pubdate>
				<volume>99</volume>
				<fpage>17020</fpage>
				<lpage>17024</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1073/pnas.252529799</pubid>
						<pubid idtype="pmpid" link="fulltext">12471157</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B11">
				<title>
					<p>Lateral gene transfer and the nature of bacterial innovation.</p>
				</title>
				<aug>
					<au>
						<snm>Ochman</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Lawrence</snm>
						<fnm>JG</fnm>
					</au>
					<au>
						<snm>Groisman</snm>
						<fnm>EA</fnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>2000</pubdate>
				<volume>405</volume>
				<fpage>299</fpage>
				<lpage>304</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/35012500</pubid>
						<pubid idtype="pmpid" link="fulltext">10830951</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B12">
				<title>
					<p>Lateral gene transfer: when will adolescence end?</p>
				</title>
				<aug>
					<au>
						<snm>Lawrence</snm>
						<fnm>JG</fnm>
					</au>
					<au>
						<snm>Hendrickson</snm>
						<fnm>H</fnm>
					</au>
				</aug>
				<source>Mol Microbiol</source>
				<pubdate>2003</pubdate>
				<volume>50</volume>
				<fpage>739</fpage>
				<lpage>749</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1046/j.1365-2958.2003.03778.x</pubid>
						<pubid idtype="pmpid" link="fulltext">14617137</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B13">
				<title>
					<p>On the nature of gene innovation: duplication patterns in microbial genomes.</p>
				</title>
				<aug>
					<au>
						<snm>Hooper</snm>
						<fnm>SD</fnm>
					</au>
					<au>
						<snm>Berg</snm>
						<fnm>OG</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>2003</pubdate>
				<volume>20</volume>
				<fpage>945</fpage>
				<lpage>954</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/molbev/msg101</pubid>
						<pubid idtype="pmpid" link="fulltext">12716994</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B14">
				<title>
					<p>Genome evolution. Gene fusion versus gene fission.</p>
				</title>
				<aug>
					<au>
						<snm>Snel</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Bork</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Huynen</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Trends Genet</source>
				<pubdate>2000</pubdate>
				<volume>16</volume>
				<fpage>9</fpage>
				<lpage>11</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0168-9525(99)01924-1</pubid>
						<pubid idtype="pmpid" link="fulltext">10637623</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B15">
				<title>
					<p>Genomes in flux: the evolution of archaeal and proteobacterial gene content.</p>
				</title>
				<aug>
					<au>
						<snm>Snel</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Bork</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Huynen</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2002</pubdate>
				<volume>12</volume>
				<fpage>17</fpage>
				<lpage>25</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1101/gr.176501</pubid>
						<pubid idtype="pmpid" link="fulltext">11779827</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B16">
				<title>
					<p>The balance of driving forces during genome evolution in prokaryotes.</p>
				</title>
				<aug>
					<au>
						<snm>Kunin</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Ouzounis</snm>
						<fnm>CA</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2003</pubdate>
				<volume>13</volume>
				<fpage>1589</fpage>
				<lpage>1594</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1101/gr.1092603</pubid>
						<pubid idtype="pmpid" link="fulltext">12840037</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B17">
				<title>
					<p>Excision of large DNA regions termed pathogenicity islands from tRNA-specific loci in the chromosome of an <it>Escherichia coli </it>wild-type pathogen.</p>
				</title>
				<aug>
					<au>
						<snm>Blum</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Ott</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Lischewski</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Ritter</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Imrich</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Tsch&#228;pe</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Hacker</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Infect Immun</source>
				<pubdate>1994</pubdate>
				<volume>62</volume>
				<fpage>606</fpage>
				<lpage>614</lpage>
				<xrefbib>
					<pubid idtype="pmpid">7507897</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B18">
				<title>
					<p>Amelioration of bacterial genomes: rates of change and exchange.</p>
				</title>
				<aug>
					<au>
						<snm>Lawrence</snm>
						<fnm>JG</fnm>
					</au>
					<au>
						<snm>Ochman</snm>
						<fnm>H</fnm>
					</au>
				</aug>
				<source>J Mol Evol</source>
				<pubdate>1997</pubdate>
				<volume>44</volume>
				<fpage>383</fpage>
				<lpage>397</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">9089078</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B19">
				<title>
					<p>Complete genome sequence of the marine planctomycete <it>Pirellula </it>sp. strain 1.</p>
				</title>
				<aug>
					<au>
						<snm>Gl&#246;ckner</snm>
						<fnm>FO</fnm>
					</au>
					<au>
						<snm>Kube</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Bauer</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Teeling</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Lombardot</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Ludwig</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Gade</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Beck</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Borzym</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Heitmann</snm>
						<fnm>K</fnm>
					</au>
					<etal/>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2003</pubdate>
				<volume>100</volume>
				<fpage>8298</fpage>
				<lpage>8303</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1073/pnas.1431443100</pubid>
						<pubid idtype="pmpid" link="fulltext">12835416</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B20">
				<title>
					<p>The complete genomic sequence of <it>Mycoplasma penetrans</it>, an intracellular bacterial pathogen in humans.</p>
				</title>
				<aug>
					<au>
						<snm>Sasaki</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Ishikawa</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Yamashita</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Oshima</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Kenri</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Furuya</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Yoshino</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Horino</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Shiba</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Sasaki</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Hattori</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2002</pubdate>
				<volume>30</volume>
				<fpage>5293</fpage>
				<lpage>5300</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/nar/gkf667</pubid>
						<pubid idtype="pmpid" link="fulltext">12466555</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B21">
				<title>
					<p>Genome analysis of pathogenic bacteria - a review.</p>
				</title>
				<aug>
					<au>
						<snm>Nakazawa</snm>
						<fnm>T</fnm>
					</au>
				</aug>
				<source>Nippon Rinsho</source>
				<pubdate>2000</pubdate>
				<volume>58</volume>
				<fpage>1315</fpage>
				<lpage>1325</lpage>
				<xrefbib>
					<pubid idtype="pmpid">10879059</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B22">
				<title>
					<p>Reductive evolution of resident genomes.</p>
				</title>
				<aug>
					<au>
						<snm>Andersson</snm>
						<fnm>SG</fnm>
					</au>
					<au>
						<snm>Kurland</snm>
						<fnm>CG</fnm>
					</au>
				</aug>
				<source>Trends Microbiol</source>
				<pubdate>1998</pubdate>
				<volume>6</volume>
				<fpage>263</fpage>
				<lpage>268</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0966-842X(98)01312-2</pubid>
						<pubid idtype="pmpid" link="fulltext">9717214</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B23">
				<title>
					<p>The process of genome shrinkage in the obligate symbiont <it>Buchnera aphidicola</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Moran</snm>
						<fnm>NA</fnm>
					</au>
					<au>
						<snm>Mira</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Genome Biol</source>
				<pubdate>2001</pubdate>
				<volume>2</volume>
				<fpage>research0054.1</fpage>
				<lpage>0054.12</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmpid" link="fulltext">11790257</pubid>
						<pubid idtype="doi">10.1186/gb-2001-2-12-research0054</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B24">
				<title>
					<p>Genome sequence of <it>Shigella flexneri </it>2a: insights into pathogenicity through comparison with genomes of <it>Escherichia coli </it>K12 and O157.</p>
				</title>
				<aug>
					<au>
						<snm>Jin</snm>
						<fnm>Q</fnm>
					</au>
					<au>
						<snm>Yuan</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Xu</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Wang</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Shen</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Lu</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Wang</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Liu</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Yang</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Yang</snm>
						<fnm>F</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2002</pubdate>
				<volume>30</volume>
				<fpage>4432</fpage>
				<lpage>4441</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/nar/gkf566</pubid>
						<pubid idtype="pmpid" link="fulltext">12384590</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B25">
				<title>
					<p>Complete genome sequence and comparative genomics of <it>Shigella flexneri </it>serotype 2a strain 2457T.</p>
				</title>
				<aug>
					<au>
						<snm>Wei</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Goldberg</snm>
						<fnm>MB</fnm>
					</au>
					<au>
						<snm>Burland</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Venkatesan</snm>
						<fnm>MM</fnm>
					</au>
					<au>
						<snm>Deng</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Fournier</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Mayhew</snm>
						<fnm>GF</fnm>
					</au>
					<au>
						<snm>Plunkett</snm>
						<fnm>G</fnm>
						<suf>3rd</suf>
					</au>
					<au>
						<snm>Rose</snm>
						<fnm>DJ</fnm>
					</au>
					<au>
						<snm>Darling</snm>
						<fnm>A</fnm>
					</au>
					<etal/>
				</aug>
				<source>Infect Immun</source>
				<pubdate>2003</pubdate>
				<volume>71</volume>
				<fpage>2775</fpage>
				<lpage>2786</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1128/IAI.71.5.2775-2786.2003</pubid>
						<pubid idtype="pmpid" link="fulltext">12704152</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B26">
				<title>
					<p>Complete genome sequence of a multiple drug resistant <it>Salmonella enterica </it>serovar Typhi CT18.</p>
				</title>
				<aug>
					<au>
						<snm>Parkhill</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Dougan</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>James</snm>
						<fnm>KD</fnm>
					</au>
					<au>
						<snm>Thomson</snm>
						<fnm>NR</fnm>
					</au>
					<au>
						<snm>Pickard</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Wain</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Churcher</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Mungall</snm>
						<fnm>KL</fnm>
					</au>
					<au>
						<snm>Bentley</snm>
						<fnm>SD</fnm>
					</au>
					<au>
						<snm>Holden</snm>
						<fnm>MT</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nature</source>
				<pubdate>2001</pubdate>
				<volume>413</volume>
				<fpage>848</fpage>
				<lpage>852</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/35101607</pubid>
						<pubid idtype="pmpid" link="fulltext">11677608</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B27">
				<title>
					<p>Massive gene decay in the leprosy bacillus.</p>
				</title>
				<aug>
					<au>
						<snm>Cole</snm>
						<fnm>ST</fnm>
					</au>
					<au>
						<snm>Eiglmeier</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Parkhill</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>James</snm>
						<fnm>KD</fnm>
					</au>
					<au>
						<snm>Thomson</snm>
						<fnm>NR</fnm>
					</au>
					<au>
						<snm>Wheeler</snm>
						<fnm>PR</fnm>
					</au>
					<au>
						<snm>Honore</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Garnier</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Churcher</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Harris</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Mungall</snm>
						<fnm>K</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nature</source>
				<pubdate>2001</pubdate>
				<volume>409</volume>
				<fpage>1007</fpage>
				<lpage>1011</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/35059006</pubid>
						<pubid idtype="pmpid" link="fulltext">11234002</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B28">
				<title>
					<p>The PE multigene family: a 'molecular mantra' for mycobacteria.</p>
				</title>
				<aug>
					<au>
						<snm>Brennan</snm>
						<fnm>MJ</fnm>
					</au>
					<au>
						<snm>Delogu</snm>
						<fnm>G</fnm>
					</au>
				</aug>
				<source>Trends Microbiol</source>
				<pubdate>2002</pubdate>
				<volume>10</volume>
				<fpage>246</fpage>
				<lpage>249</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0966-842X(02)02335-1</pubid>
						<pubid idtype="pmpid" link="fulltext">11973159</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B29">
				<title>
					<p>Complete genome sequence of the <it>Arabidopsis </it>and tomato pathogen <it>Pseudomonas syringae </it>pv. tomato DC3000.</p>
				</title>
				<aug>
					<au>
						<snm>Buell</snm>
						<fnm>CR</fnm>
					</au>
					<au>
						<snm>Joardar</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Lindeberg</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Selengut</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Paulsen</snm>
						<fnm>IT</fnm>
					</au>
					<au>
						<snm>Gwinn</snm>
						<fnm>ML</fnm>
					</au>
					<au>
						<snm>Dodson</snm>
						<fnm>RJ</fnm>
					</au>
					<au>
						<snm>Deboy</snm>
						<fnm>RT</fnm>
					</au>
					<au>
						<snm>Durkin</snm>
						<fnm>AS</fnm>
					</au>
					<au>
						<snm>Kolonay</snm>
						<fnm>JF</fnm>
					</au>
					<etal/>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2003</pubdate>
				<volume>100</volume>
				<fpage>10181</fpage>
				<lpage>10186</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1073/pnas.1731982100</pubid>
						<pubid idtype="pmpid" link="fulltext">12928499</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B30">
				<title>
					<p>Evolution in bacteria: evidence for a universal substitution rate in cellular genomes.</p>
				</title>
				<aug>
					<au>
						<snm>Ochman</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Wilson</snm>
						<fnm>AC</fnm>
					</au>
				</aug>
				<source>J Mol Evol</source>
				<pubdate>1987</pubdate>
				<volume>26</volume>
				<fpage>74</fpage>
				<lpage>86</lpage>
				<note>Erratum in: J Mol Evol 26: 377</note>
				<xrefbib>
					<pubid idtype="pmpid">3125340</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B31">
				<title>
					<p>DNA sequence analysis of the genetic structure and evolution of <it>Salmonella enterica</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Selander</snm>
						<fnm>RK</fnm>
					</au>
				</aug>
				<source>In Ecology of Pathogenic Bacteria. Molecular and Evolutionary Aspects</source>
				<publisher>Amsterdam, The Netherlands: Royal Netherlands Academy of Arts and Sciences</publisher>
				<editor>van der Zeijst BAM, Hoekstra WPM, van Embden JDA, van Alphen AJW</editor>
				<pubdate>1997</pubdate>
				<fpage>191</fpage>
				<lpage>214</lpage>
			</bibl>
			<bibl id="B32">
				<title>
					<p>Role of nonhost environments in the lifestyles of <it>Salmonella </it>and <it>Escherichia coli</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Winfield</snm>
						<fnm>MD</fnm>
					</au>
					<au>
						<snm>Groisman</snm>
						<fnm>EA</fnm>
					</au>
				</aug>
				<source>Appl Environ Microbiol</source>
				<pubdate>2003</pubdate>
				<volume>69</volume>
				<fpage>3687</fpage>
				<lpage>3694</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1128/AEM.69.7.3687-3694.2003</pubid>
						<pubid idtype="pmpid" link="fulltext">12839733</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B33">
				<title>
					<p>Requirement for genes with homology to ABC transport systems for attachment and virulence of <it>Agrobacterium tumefaciens</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Matthysse</snm>
						<fnm>AG</fnm>
					</au>
					<au>
						<snm>Yarnall</snm>
						<fnm>HA</fnm>
					</au>
					<au>
						<snm>Young</snm>
						<fnm>N</fnm>
					</au>
				</aug>
				<source>J Bacteriol</source>
				<pubdate>1996</pubdate>
				<volume>178</volume>
				<fpage>5302</fpage>
				<lpage>5308</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">8752352</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B34">
				<aug>
					<au>
						<snm>Brown</snm>
						<fnm>SM</fnm>
					</au>
				</aug>
				<source>Bioinformatics: A Biologist's Guide to Biocomputing and the Internet</source>
				<publisher>Natick, MA: Eaton Publishing</publisher>
				<pubdate>2000</pubdate>
			</bibl>
			<bibl id="B35">
				<title>
					<p>Genomics. Are there bugs in our genome?</p>
				</title>
				<aug>
					<au>
						<snm>Andersson</snm>
						<fnm>JO</fnm>
					</au>
					<au>
						<snm>Doolittle</snm>
						<fnm>WF</fnm>
					</au>
					<au>
						<snm>Nesbo</snm>
						<fnm>CL</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>2001</pubdate>
				<volume>292</volume>
				<fpage>1848</fpage>
				<lpage>1850</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.1062241</pubid>
						<pubid idtype="pmpid" link="fulltext">11358998</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B36">
				<title>
					<p>The regulation of pap and type 1 fimbriation in <it>Escherichia coli</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Blomfield</snm>
						<fnm>IC</fnm>
					</au>
				</aug>
				<source>Adv Microb Physiol</source>
				<pubdate>2001</pubdate>
				<volume>45</volume>
				<fpage>1</fpage>
				<lpage>49</lpage>
				<xrefbib>
					<pubid idtype="pmpid">11450107</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B37">
				<title>
					<p>Structural assignments to the <it>Mycoplasma genitalium </it>proteins show extensive gene duplications and domain rearrangements.</p>
				</title>
				<aug>
					<au>
						<snm>Teichmann</snm>
						<fnm>SA</fnm>
					</au>
					<au>
						<snm>Park</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Chothia</snm>
						<fnm>C</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>1998</pubdate>
				<volume>95</volume>
				<fpage>14658</fpage>
				<lpage>14663</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1073/pnas.95.25.14658</pubid>
						<pubid idtype="pmpid" link="fulltext">9843945</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B38">
				<title>
					<p>Catalyzing bacterial speciation: correlating lateral transfer with genetic headroom.</p>
				</title>
				<aug>
					<au>
						<snm>Lawrence</snm>
						<fnm>JG</fnm>
					</au>
				</aug>
				<source>Syst Biol</source>
				<pubdate>2001</pubdate>
				<volume>50</volume>
				<fpage>479</fpage>
				<lpage>496</lpage>
				<xrefbib>
					<pubid idtype="pmpid">12116648</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B39">
				<title>
					<p>NCBI genomes</p>
				</title>
				<pubdate>2001</pubdate>
				<url>ftp://ftp.ncbi.nih.gov/genomes/Bacteria</url>
			</bibl>
			<bibl id="B40">
				<title>
					<p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.</p>
				</title>
				<aug>
					<au>
						<snm>Altschul</snm>
						<fnm>SF</fnm>
					</au>
					<au>
						<snm>Madden</snm>
						<fnm>TL</fnm>
					</au>
					<au>
						<snm>Schaffer</snm>
						<fnm>AA</fnm>
					</au>
					<au>
						<snm>Zhang</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Zhang</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Miller</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Lipman</snm>
						<fnm>DJ</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>1997</pubdate>
				<volume>25</volume>
				<fpage>3389</fpage>
				<lpage>3402</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
						<pubid idtype="pmpid" link="fulltext">9254694</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B41">
				<title>
					<p>From gene trees to organismal phylogeny in prokaryotes: the case of the &#947;-proteobacteria.</p>
				</title>
				<aug>
					<au>
						<snm>Lerat</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Daubin</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Moran</snm>
						<fnm>NA</fnm>
					</au>
				</aug>
				<source>PLoS Biol</source>
				<pubdate>2003</pubdate>
				<volume>1</volume>
				<fpage>E19</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmpid" link="fulltext">12975657</pubid>
						<pubid idtype="doi">10.1371/journal.pbio.0000019</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B42">
				<title>
					<p>The COG database: new developments in phylogenetic classification of proteins from complete genomes.</p>
				</title>
				<aug>
					<au>
						<snm>Tatusov</snm>
						<fnm>RL</fnm>
					</au>
					<au>
						<snm>Natale</snm>
						<fnm>DA</fnm>
					</au>
					<au>
						<snm>Garkavtsev</snm>
						<fnm>IV</fnm>
					</au>
					<au>
						<snm>Tatusova</snm>
						<fnm>TA</fnm>
					</au>
					<au>
						<snm>Shankavaram</snm>
						<fnm>UT</fnm>
					</au>
					<au>
						<snm>Rao</snm>
						<fnm>BS</fnm>
					</au>
					<au>
						<snm>Kiryutin</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Galperin</snm>
						<fnm>MY</fnm>
					</au>
					<au>
						<snm>Fedorova</snm>
						<fnm>ND</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2001</pubdate>
				<volume>29</volume>
				<fpage>22</fpage>
				<lpage>28</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/nar/29.1.22</pubid>
						<pubid idtype="pmpid" link="fulltext">11125040</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
		</refgrp>
	</bm>
</art>
