<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2148-5-42</ui>
   <ji>1471-2148</ji>
   <fm>
		<dochead>Research article</dochead>
		<bibl>
			<title>
				<p>Evolution of a microbial nitrilase gene family: a comparative and environmental genomics study</p>
			</title>
			<aug>
				<au id="A1" ca="yes">
					<snm>Podar</snm>
					<fnm>Mircea</fnm>
					<insr iid="I1"/>
					<email>mpodar@diversa.com</email>
				</au>
				<au id="A2">
					<snm>Eads</snm>
					<mi>R</mi>
					<fnm>Jonathan</fnm>
					<insr iid="I1"/>
					<email>jeads@diversa.com</email>
				</au>
				<au id="A3">
					<snm>Richardson</snm>
					<mi>H</mi>
					<fnm>Toby</fnm>
					<insr iid="I1"/>
					<email>trichardson@diversa.com</email>
				</au>
			</aug>
			<insg>
				<ins id="I1">
					<p>Diversa Corporation, 4955 Directors Place, San Diego, CA 92131 USA</p>
				</ins>
			</insg>
			<source>BMC Evolutionary Biology</source>
			<issn>1471-2148</issn>
			<pubdate>2005</pubdate>
			<volume>5</volume>
			<issue>1</issue>
			<fpage>42</fpage>
			<url>http://www.biomedcentral.com/1471-2148/5/42</url>
			<xrefbib>
				<pubidlist>
					<pubid idtype="pmpid">16083508</pubid>
					<pubid idtype="doi">10.1186/1471-2148-5-42</pubid>
				</pubidlist>
			</xrefbib>
		</bibl>
		<history>
			<rec>
				<date>
					<day>05</day>
					<month>5</month>
					<year>2005</year>
				</date>
			</rec>
			<acc>
				<date>
					<day>06</day>
					<month>8</month>
					<year>2005</year>
				</date>
			</acc>
			<pub>
				<date>
					<day>06</day>
					<month>8</month>
					<year>2005</year>
				</date>
			</pub>
		</history>
		<cpyrt>
			<year>2005</year>
			<collab>Podar et al; licensee BioMed Central Ltd.</collab>
			<note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
		</cpyrt>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<sec>
					<st>
						<p>Background</p>
					</st>
					<p>Completed genomes and environmental genomic sequences are contributing significantly to understanding the evolution of gene families, microbial metabolism and community eco-physiology. Here, we used comparative genomics and phylogenetic analyses in conjunction with enzymatic data to probe the evolution and functions of a microbial nitrilase gene family. Nitrilases are relatively rare in bacterial genomes, and their biological function is unclear.</p>
				</sec>
				<sec>
					<st>
						<p>Results</p>
					</st>
					<p>We examined the genetic neighborhood of the different subfamily genes and discovered conserved gene clusters or operons associated with specific nitrilase clades. The inferred evolutionary transitions that separate nitrilases that belong to different gene clusters correlated with changes in their enzymatic properties. We present evidence that Darwinian adaptation acted during one of those transitions and identify sites in the enzyme that may have been under positive selection.</p>
				</sec>
				<sec>
					<st>
						<p>Conclusion</p>
					</st>
					<p>Changes in the observed biochemical properties of the nitrilases associated with the different gene clusters are consistent with a hypothesis that those enzymes have been recruited to a novel metabolic pathway following gene duplication and neofunctionalization. These results demonstrate the benefits of combining environmental genomic sampling and completed genome data with evolutionary and biochemical analyses in the study of gene families. They also open new directions for studying the functions of nitrilases and the genes they are associated with.</p>
				</sec>
			</sec>
		</abs>
	</fm>
   <meta>
		<classifications>
			<classification type="bmc" subtype="user_supplied_xml" id="refman"/>
		</classifications>
	</meta>
   <bdy>
		<sec>
			<st>
				<p>Background</p>
			</st>
			<p>Having colonized virtually every environment, bacteria and archaea have evolved enzymatic solutions for a wide range of metabolic biochemical transformations <abbrgrp>
					<abbr bid="B1">1</abbr>
					<abbr bid="B2">2</abbr>
				</abbrgrp>. Studying enzymes derived from organisms inhabiting these environments is important for understanding how microbes adapt, react to and transform the environment. However, the overwhelming majority of microbial species remain uncultivated <abbrgrp>
					<abbr bid="B3">3</abbr>
				</abbrgrp>. A variety of functional and sequence-based approaches have been developed for discovering and characterizing genes, operons and even entire genomes directly from the environment, collectively referred to as metagenomics or environmental genomics <abbrgrp>
					<abbr bid="B4">4</abbr>
				</abbrgrp>. The use of environmental genomics has already led to important discoveries such as genes responsible for novel biological functions <abbrgrp>
					<abbr bid="B5">5</abbr>
				</abbrgrp>, microbial community metabolic traits <abbrgrp>
					<abbr bid="B6">6</abbr>
					<abbr bid="B7">7</abbr>
					<abbr bid="B8">8</abbr>
				</abbrgrp> and dramatic increases in the diversity of various enzyme families <abbrgrp>
					<abbr bid="B9">9</abbr>
					<abbr bid="B10">10</abbr>
				</abbrgrp>. Subsequent biochemical and evolutionary analyses can strengthen the biological and ecological inferences even before organisms that carry that genetic information are isolated in culture <abbrgrp>
					<abbr bid="B11">11</abbr>
					<abbr bid="B12">12</abbr>
					<abbr bid="B13">13</abbr>
				</abbrgrp>. From a practical perspective, microbial environmental genomics has been a successful approach for the discovery of enzymes for a broad spectrum of biotechnological applications <abbrgrp>
					<abbr bid="B14">14</abbr>
					<abbr bid="B15">15</abbr>
					<abbr bid="B16">16</abbr>
					<abbr bid="B17">17</abbr>
				</abbrgrp>.</p>
			<p>To gain insight into the evolution of function in a gene family that has been extensively sampled by environmental genomic screening and characterized biochemically, we focused on bacterial nitrilases. These enzymes are members of the carbon-nitrogen hydrolase superfamily which catalyze the hydrolysis of a wide range of non-peptide carbon-nitrogen bonds <abbrgrp>
					<abbr bid="B18">18</abbr>
					<abbr bid="B19">19</abbr>
					<abbr bid="B20">20</abbr>
				</abbrgrp>. The nitrilase family hydrolyzes nitriles to their corresponding carboxylic acids, releasing ammonia. This reaction is likely involved in detoxification of xenobiotics and nitriles produced as defense chemicals by other microorganisms and plants, as well as in secondary metabolite biosynthetic pathways. Nitrilases appear to be rare in bacteria (out of over 150 sequenced bacterial genomes only 10 contain nitrilase genes). Recently, over 130 nitrilases were identified by functional screening of hundreds of environmental DNA libraries, for use in industrial biocatalysis applications <abbrgrp>
					<abbr bid="B9">9</abbr>
				</abbrgrp>. Those enzymes were characterized biochemically and classified into six subfamilies, four of them with no representatives in known bacterial species. It was found that a number of enzymatic properties (substrate specificity and enantioselectivity) were specific to subfamilies and, in some cases, correlated with the biogeography and ecology of the environmental samples.</p>
			<p>The role of gene duplication, natural selection and functional diversification in the evolution of the nitrilase gene family is unknown. The correlation of distinct enzymatic properties with the different gene subfamilies suggests that nitrilases have diverged functionally to accommodate distinct biological roles in microbial communities that occupy various ecological niches. Functional divergence is the result of changes in selection pressure and is often accompanied by associations with novel gene clusters or operons that encode enzymes with coupled metabolic activities. To begin addressing some of these aspects, we analyzed the genetic neighborhoods of all available nitrilase genes, identified conserved patterns of gene clustering relative to biochemical data and phylogeny, and propose a hypothesis on nitrilase evolution involving gene duplications and Darwinian selection.</p>
		</sec>
		<sec>
			<st>
				<p>Results and discussion</p>
			</st>
			<sec>
				<st>
					<p>The nitrilases from cultivated bacteria belong to clade-specific gene clusters</p>
				</st>
				<p>Bacterial nitrilases (137 environmental sequences and 10 sequences from cultivated species) have been recently classified into six major clades <abbrgrp>
						<abbr bid="B9">9</abbr>
					</abbrgrp> that we refer to as subfamilies. We analyzed more recently released genome sequences and found an additional nine novel nitrilases. Phylogenetic analysis of a sequence dataset consisting of all nitrilase genes from cultivated bacteria shows that 18 sequences belong to subfamilies one and two (Fig. <figr fid="F1">1</figr>). The level of sequence similarity among these 18 enzymes is quite high, ranging from 50&#8211;70% pairwise identity in subfamily one to 30&#8211;40% in subfamily two. The relationships between the different nitrilases do not reflect the taxonomy of their host organisms. Additionally, for several genera or species that harbor two nitrilases (<it>Pseudomonas</it>, <it>Klebsiella pneumoniae </it>and <it>Burkholderia fungorum</it>), the genes belong to different subfamilies/clades, suggesting ancient gene duplications or acquisition by horizontal gene transfer (HGT). <it>Rhodococcus rhodochrous </it>on the other hand contains two closely related nitrilases, suggesting a more recent gene duplication event. Supporting the possibility of HGT, one of the nitrilase genes we identified by database mining is in the plasmid pLVPK of <it>Klebsiella pneumoniae</it>, which may be transferable to other bacteria. Also, several fungal cyanide hydratase genes form a clade deeply nested within subfamily two of bacterial nitrilases, suggesting HGT acquisition from bacteria, followed by neofunctionalization. The paucity of nitrilase genes in bacterial genomes makes it difficult to evaluate the contribution of the different evolutionary events (duplications, gene loss and HGT) to the observed distribution and the functional significance of the presence of different types of enzymes in related organisms.</p>
				<fig id="F1">
					<title>
						<p>Figure 1</p>
					</title>
					<caption>
						<p>Maximum likelihood tree of nitrilases from known bacterial species (accession numbers are in parentheses)</p>
					</caption>
					<text>
						<p>Maximum likelihood tree of nitrilases from known bacterial species (accession numbers are in parentheses). Bootstrap support values are indicated for the major groups only. The schematic organization of the gene clusters that contain a nitrilase ORF is shown for species where that sequence information is available.</p>
					</text>
					<graphic file="1471-2148-5-42-1"/>
				</fig>
				<p>In bacteria, genes are often organized in clusters (e.g. operons, regulons) that reflect involvement in a common metabolic process or association in a supramolecular complex <abbrgrp>
						<abbr bid="B21">21</abbr>
						<abbr bid="B22">22</abbr>
						<abbr bid="B23">23</abbr>
					</abbrgrp>. To determine if nitrilase function could be inferred from the nature of the surrounding genes, we analyzed those genes in the available genomic data. We found that all seven known subfamily 1 nitrilase genes (six genomic and one on a plasmid) belong to a conserved and previously undescribed cluster of seven genes, Nit1C (Figure <figr fid="F1">1</figr> and Figure <figr fid="F2">2</figr>). Six of the coding sequences are on the same DNA strand, separated by few or no intergenic nucleotides and are likely part of an operon/regulon. This hypothesis is supported by analysis using a recent method for operon prediction <abbrgrp>
						<abbr bid="B24">24</abbr>
					</abbrgrp> although we could not identify conserved transcription factor binding sites in the upstream region. The genes in this predicted operon occur in the order (1) hypothetical protein, (2) nitrilase, (3) radical S-adenosyl methionine superfamily member, (4) acetyltransferase, (5) AIR synthase, and (6) hypothetical protein. The seventh gene encodes a predicted flavoprotein, putatively involved in K<sup>+ </sup>transport and is located either at the beginning of the cluster but on the opposite strand (cyanobacteria <it>Synechocystis sp</it>. PCC6803 and <it>Synechococcus sp</it>. WH8102) or as the last gene of the cluster, in the same orientation as the others (proteobacteria <it>Burkholderia fungorum</it>, <it>Rubrivivax</it>, <it>Photorhabdus luminescens </it>and <it>Klebsiella pneumoniae</it>). In <it>Verrucomicrobium spinosum</it>, the cluster has been rearranged, as ORFs 6 and 7 occur in between ORFs 3 and 4. Yet another variation exists in the betaproteobacteria <it>Burkholderia </it>and <it>Rubrivivax </it>where a glycosyltransferase gene is inserted between ORFs 5 and 6. These slight variations in the cluster architecture correlate with the major taxonomic bacterial groups (Cyanobacteria, Beta- and Gammaproteobacteria). Outside of Nit1C there is no conservation between the different species in terms of genes or metabolic functions encoded by gene clusters. The presence of genes associated with mobile DNA elements (transposases, IS elements) immediately downstream of the Nit1C clusters in <it>Synechocystis </it>and <it>Photorhabdus </it>and the apparent interruption of a large polyketide synthase pathway by the nitrilase cluster in <it>Photorhabdus </it>may indicate HGT or internal chromosomal rearrangements.</p>
				<fig id="F2">
					<title>
						<p>Figure 2</p>
					</title>
					<caption>
						<p>Organization of gene clusters around the subfamily 1 nitrilases in sequenced bacterial genomes</p>
					</caption>
					<text>
						<p>Organization of gene clusters around the subfamily 1 nitrilases in sequenced bacterial genomes. The highly conserved gene cluster Nit1C is flanked by unrelated genomic neighbourhoods in the different species. Gene names are based on the available genomic annotation.</p>
					</text>
					<graphic file="1471-2148-5-42-2"/>
				</fig>
				<p>In the case of subfamily 2, gene neighborhood information was available for only four of the twelve genes from cultivated bacteria. In <it>Bacillus sp</it>. and <it>Pseudomonas syringae</it>, the nitrilase gene is apparently co-transcribed with a downstream phenylacetaldoxime dehydratase gene and preceded by an araC transcription factor transcribed from the other strand. The other nitrilase genes (from <it>Burkholderia</it>, <it>Bradyrhizobium </it>and <it>Ralstonia</it>) are part of unrelated clusters (Figure <figr fid="F1">1</figr>).</p>
				<p>In addition to the nitrilases from completed genomes of cultivated bacteria, we searched for such enzymes in two large environmental sequence datasets: the acid-mine drainage microbial mats <abbrgrp>
						<abbr bid="B7">7</abbr>
					</abbrgrp> and the Sargasso Sea <abbrgrp>
						<abbr bid="B10">10</abbr>
					</abbrgrp> using BLASTP. No nitrilases were found in the acid-mine dataset. In the Sargasso Sea dataset we identified 17 nitrilases that were full-length or long enough to be phylogenetically informative. Three of the genes appear to be eukaryotic while eight bacterial genes are close relatives of nitrilases from <it>Synechococcus </it>or <it>Burkholderia</it>. The remaining six genes do not appear to have close relatives among known nitrilases and belong to subfamilies 2, 4 and 5 [see <supplr sid="S1">Additional file 1</supplr>]. Finding so few nitrilase genes in such a large dataset suggests that for uncovering the sequence space of a gene family, functional screening of a large number of samples from very different environments is more efficient than deep sequence coverage of one or a few environments.</p>
				<suppl id="S1">
					<title>
						<p>Additional file 1</p>
					</title>
					<text>
						<p>Protein neighbor-joining tree for nitrilase genes from cultivated bacteria and from environmental samples. The environmental sequences are represented by GenBank accession numbers and gene names for those derived from Robertson <it>et al</it>, 2004. The Sargasso Sea sequences are shaded.</p>
					</text>
					<file name="1471-2148-5-42-S1.pdf">
						<p>Click here for file</p>
					</file>
				</suppl>
			</sec>
			<sec>
				<st>
					<p>Nitrilases associated with different types of gene clusters have distinct enzymatic properties</p>
				</st>
				<p>For the nitrilase genes identified from environmental DNA, the identity of the host organism is unknown. However, because those libraries were constructed using fragments of genomic DNA several times larger than the average nitrilase gene length (~1 kb), we also analyzed the gene neighborhoods of the environmental nitrilases. Because of the highly conserved nature of the Nit1C cluster and its occurrence in distant taxa of bacteria, we first focused on mapping its distribution among the environmental nitrilase clones. We found that the Nit1C cluster is strictly confined to a group of subfamily 1 nitrilases that includes the seven genes identified in completed genomes and 14 of the environmental ones. Four of the subfamily 1 nitrilases from the Sargasso Sea dataset had small flanking sequences and we identified the presence of the Nit1C type genes (ORFs 1 or 3), similar to those of their close relatives from <it>Synechococcus </it>and <it>Burkholderia</it>. However, because of their incomplete length, those sequences were not included in further analyses.</p>
				<p>The nitrilase genes that belong to the Nit1C cluster are indicated on a maximum likelihood phylogenetic tree calculated using the subfamily 1 genes as well as several outgroup sequences from subfamilies 2 and 3 (Figure <figr fid="F3">3A</figr>). Since the size of the genomic insert in the environmental clones was limited, not all the Nit1C genes were identified; however, we did not find evidence to suggest that the cluster was different in any of the host genomes (Figure <figr fid="F3">3B</figr>). We also identified a more recent evolutionary event that marks the loss of nitrilase association with the Nit1C cluster. After that transition event (TE), nitrilase genes are no longer associated with a highly conserved gene cluster. Instead, they are flanked by genes encoding MarR transcriptional regulators, epimerases, epoxide hydrolases and other ORFs. These latter genes were not so highly conserved in their order as those found in the Nit1C cluster. No cultivated bacteria that contain nitrilases from this group have been found so far.</p>
				<fig id="F3">
					<title>
						<p>Figure 3</p>
					</title>
					<caption>
						<p>(<b>A</b>). Protein maximum likelihood tree of subfamily 1 nitrilases</p>
					</caption>
					<text>
						<p>(<b>A</b>). Protein maximum likelihood tree of subfamily 1 nitrilases. The tree was arbitrarily rooted with sequences from the two most closely related subfamilies 2 and 3. Numbers at nodes represent bootstrap support (not shown if &lt;50). (<b>B</b>). Diagram of the gene clusters that include the nitrilase ORF. For environmental genes, the information was limited by the size of the genomic insert. (<b>C</b>). Histogram representing enzymatic enantioselectivity (R or S) on hydroxyglutaronitrile, based on data from [9] (na, not assayed; x, not active).</p>
					</text>
					<graphic file="1471-2148-5-42-3"/>
				</fig>
				<p>The sister group of subfamily 1 nitrilases, subfamily 3, consists of only three environmental type genes. We had sufficient flanking sequence to determine the nature of the neighboring genes for only one of the genes (3A1), flanked by two hypothetical ORFs with no identifiable homologs. Therefore, the Nit1C cluster appears to have originated with and is restricted to a subset of subfamily 1 nitrilases. The more distantly related nitrilases from subfamilies 4, 5 and 6 have no apparent associations with a conserved gene cluster (data not shown).</p>
				<p>In our previous study <abbrgrp>
						<abbr bid="B9">9</abbr>
					</abbrgrp> we uncovered a number of correlations between the biochemical properties of the environmental microbial nitrilases and their phylogenetic classification. Distinct gains or losses of activity or switches in enantioselectivity coincided with the evolutionary events that led to the formation of the main subfamilies. One of the most interesting findings was a reversal in enantioselectivity (R to S) that occurred in subfamily 1, against the model substrate hydroxyglutaronitrile. To correlate the differences in types of gene clusters with the nitrilase biochemical properties, we graphed the available hydroxyglutaronitrile activity data on the side of the phylogenetic tree (Figure <figr fid="F3">3C</figr>). With one exception (1B15), the enzymes that belong to the Nit1C group are R-enantioselective on hydroxyglutaronitrile. The transition event (TE) marks changes in biochemical properties leading to enantioselectivity reversal. The first enzyme not associated with Nit1C (1A21) was inactive on that substrate, while the next diverging ones (1A20, 1A22, 1A16, 1A17) were R-selective or not enantioselective (low bootstrap values do not support a robust branching order). However, the next statistically supported clade (1A14 and above in the Figure <figr fid="F3">3A</figr> tree) shows a reversal of enantioselectivity followed by a steep increase in selectivity to values over 95%.</p>
			</sec>
			<sec>
				<st>
					<p>Analysis of the subfamily 1 nitrilase gene clusters</p>
				</st>
				<p>Having determined that subfamily 1 nitrilases belong to two distinct subgroups based on their associated gene clusters and enzymatic properties, we analyzed the nitrilase neighboring genes for clues to their individual metabolic roles. First in the Nit1C cluster, ORF1 proteins are highly conserved in length (160&#8211;163 amino acids) and sequence (>60% identity between any two genes). However, no other homologs were found using standard searching techniques of current databases. Using HMM structural homology modeling (Superfamily 1.63 server) <abbrgrp>
						<abbr bid="B25">25</abbr>
					</abbrgrp>, we tentatively assigned the hypothetical protein 1 to the YchN1-like superfamily and fold, whose biochemical activity is unknown. Next in the cluster is the nitrilase gene. The third gene encodes a member of the radical SAM superfamily (Pfam 04055), enzymes that catalyze a wide variety of radical-based reactions through reductive cleavage of S-adenosylmethionine at an iron-sulfur center <abbrgrp>
						<abbr bid="B26">26</abbr>
					</abbrgrp>. The Nit1C SAM genes form a strongly supported clade (~50% average sequence identity), most closely related to bacterial and archaeal genes annotated as biotin synthase-related enzymes (COG2516) [see <supplr sid="S2">Additional file 2</supplr>]. ORF4 in the Nit1C cluster also forms a clade of closely related sequences and belongs to the GCN5-related N-acetyltransferase (GNAT) superfamily (Pfam 00583) <abbrgrp>
						<abbr bid="B27">27</abbr>
					</abbrgrp>. These enzymes are involved in antibiotic detoxification as well as in histone acetylation in eukaryotes. The closest homologs to the Nit1C GNAT genes are a number of other acetylases from bacteria like <it>Rhodobacter </it>and <it>Enterococcus </it>[see <supplr sid="S2">Additional file 2</supplr>]. The fifth gene in the cluster encodes members of the large 5'-phosphoribosyl-5-aminoimidazole synthase-related proteins superfamily (AIRS, Pfam 00586). Enzymes in this superfamily are involved in de novo purine biosynthesis, selenophosphate synthesis, or maturation of NiFe hydrogenase. These genes form a unique clade, most closely related to a group of archaeal genes encoding phosphoribosylformylglycinamide synthases [see <supplr sid="S2">Additional file 2</supplr>]. The last invariant position in the cluster, ORF6, encodes a protein of approximately 100 amino acids. While the sequence identity between the individual genes surpasses 70%, we could not find any other relatives to these genes by any sequence analysis approach. The seventh ORF of Nit1C is located at either end of the cluster, on either coding strand. This gene is a member of the pyridine nucleotide-disulphide oxidoreductases (Pfam 00070, COG2072), which include flavin-containing monooxygenases and flavoproteins involved in K<sup>+ </sup>transport. The closest relatives to the Nit1C genes are putative monooxygenases found in several species of <it>Pseudomonas </it>[see <supplr sid="S2">Additional file 2</supplr>]. All Nit1C genes form clusters of closely related sequences within their respective superfamilies, suggesting a common function, possibly in a pathway for detoxification of plant or microbial defense compounds.</p>
				<suppl id="S2">
					<title>
						<p>Additional file 2</p>
					</title>
					<text>
						<p>Maximum likelihood phylogenetic trees for genes that belong to the Nit1C clusters identified in known bacterial species, in the context of their respective protein families. Numbers represent bootstrap support (for major clades only). The Nit1C ORF sequences are shaded.</p>
					</text>
					<file name="1471-2148-5-42-S2.pdf">
						<p>Click here for file</p>
					</file>
				</suppl>
				<p>Members of the nitrilase clade that split after the transition event are exclusively of environmental origin, with no sequence representatives in characterized bacterial species. Approximately two thirds of the nitrilases in this group are associated with genes encoding a MarR transcriptional regulator, epimerases and epoxide hydrolases. MarR genes (Pfam 01047) encode transcriptional repressors controlling the expression of the Mar operon, involved in multiple antibiotic resistance <abbrgrp>
						<abbr bid="B28">28</abbr>
					</abbrgrp>. The nitrilase-associated MarR genes form a specific clade, most closely related to genes from <it>Xanthomonas </it>and <it>Desulfitobacterium </it>(30&#8211;40% identity) [see <supplr sid="S3">Additional file 3</supplr>] and are always upstream of the nitrilase gene. The location of the epimerase and epoxide hydrolase varies somewhat, the epimerase ORF being usually between the nitrilase and the epoxide hydrolase ORFs. Epimerases are a large class of enzymes that catalyze reversible stereochemical inversions of hydroxyl substituents in carbohydrates, participating in numerous metabolic pathways <abbrgrp>
						<abbr bid="B29">29</abbr>
						<abbr bid="B30">30</abbr>
					</abbrgrp>. The nitrilase-associated epimerases form a unique clade in which the relationship between the genes parallels that of their associated nitrilases. Their closest relatives are epimerases from species of <it>Streptomyces </it>(~35% identity) [see <supplr sid="S3">Additional file 3</supplr>]. Epoxide hydrolases belong to the large superfamily of alpha-beta fold hydrolases and hydrate chemically reactive epoxides to more stable dihydrodiols. This reaction is of major importance in detoxification of a large number of endogenous epoxide metabolites and xenobiotic compounds in all organisms <abbrgrp>
						<abbr bid="B31">31</abbr>
					</abbrgrp>. The association of all these genes with nitrilases could indicate the requirement for coupled reactions under the transcriptional control of MarR, perhaps involved in detoxifying sugar-based cyanogenic compounds in soils rich in decaying plant material.</p>
				<suppl id="S3">
					<title>
						<p>Additional file 3</p>
					</title>
					<text>
						<p>Maximum likelihood phylogenetic trees for two genes associated with nitrilases after the subfamily 1 cluster transition event, in the context of their respective larger protein families. The nitrilase associated genes are shaded. Numbers represent bootstrap support (for major clades only).</p>
					</text>
					<file name="1471-2148-5-42-S3.pdf">
						<p>Click here for file</p>
					</file>
				</suppl>
			</sec>
			<sec>
				<st>
					<p>Positive selection as a possible driving force for nitrilase functional diversification</p>
				</st>
				<p>The observed changes in associated gene clusters and in enzymatic properties suggest that the hypothetical gene duplication in subfamily 1 was followed by nitrilase recruitment to novel metabolic functions, possibly under selective constraints. A powerful approach to studying changes in the selective pressure in protein encoding genes involves calculation of the nonsynonymous/synonymous substitution rate ratio (&#969; = dN/dS) (reviewed in <abbrgrp>
						<abbr bid="B32">32</abbr>
						<abbr bid="B33">33</abbr>
					</abbrgrp>). A ratio below one indicates negative (purifying) selection, restricting amino acid changes that could interfere with a well-established protein function, while &#969; = 1 suggests that the gene evolves neutrally. On the other hand, a ratio significantly higher than one may indicate a selective advantage for fixation of amino acid changes. This can be considered evidence of positive selection associated with functional divergence after events such as gene duplications or changes in the environment (e.g. <abbrgrp>
						<abbr bid="B34">34</abbr>
						<abbr bid="B35">35</abbr>
					</abbrgrp>).</p>
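				<p>As a rough illustration of how the &#969; ratio is read, the following minimal Python sketch computes &#969; from a pair of rate estimates and assigns one of the three qualitative regimes described above. The function names and the tolerance around one are hypothetical choices made for illustration only; in practice, a departure from &#969; = 1 has to be established with a statistical test rather than a fixed cutoff.</p>

```python
def omega_ratio(dn, ds):
    """Return omega = dN/dS for nonsynonymous (dn) and synonymous (ds) rates."""
    if not ds > 0:
        raise ValueError("dS must be positive to form the ratio")
    return dn / ds


def interpret_omega(omega, tolerance=0.05):
    """Label the selective regime suggested by a point estimate of omega.

    The tolerance is an arbitrary illustrative band around 1, not a
    statistical criterion.
    """
    if omega > 1.0 + tolerance:
        return "positive"    # excess of amino acid replacements
    if 1.0 - tolerance > omega:
        return "purifying"   # amino acid changes are restricted
    return "neutral"         # omega close to 1


if __name__ == "__main__":
    # 0.0418 is the one-ratio (M0) estimate listed in Table 1.
    print(interpret_omega(0.0418))
```

				<p>Applied to the one-ratio estimate in Table 1, the sketch returns the purifying label, in line with the strong sequence conservation across subfamily 1.</p>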
				<p>Using a relative rate test <abbrgrp>
						<abbr bid="B36">36</abbr>
					</abbrgrp>, we first investigated the rate variation between the branches flanking the transition event (1A23/1A25 and 1A21). A likelihood ratio test based on a three-taxon tree (consisting of 1A25 and 1A21 as test sequences and 1A29 as outgroup) compared the null hypothesis (equal rates for both branches following the transition event) with an alternative model with unconstrained rates. The null model was rejected (P = 2 &#215; 10<sup>-6</sup>, df = 1), supporting a 5.6 times faster overall rate for the 1A21 lineage than for 1A25, which has maintained the Nit1C association. A rate increase is predicted when gene duplication is followed by functional divergence and could occur because of positive Darwinian selection or an increase in fixation of neutral mutations as a result of relaxation of functional constraints <abbrgrp>
						<abbr bid="B37">37</abbr>
						<abbr bid="B38">38</abbr>
						<abbr bid="B39">39</abbr>
						<abbr bid="B40">40</abbr>
					</abbrgrp>.</p>
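				<p>The arithmetic behind a likelihood ratio test with one degree of freedom can be sketched as follows. This is a minimal illustration and not the software used in the study: the function names are hypothetical, and any log-likelihood values passed to it are the user's own. It relies only on the fact that the upper tail of a &#967;<sup>2 </sup>distribution with one degree of freedom equals erfc(sqrt(x/2)).</p>

```python
import math


def chi2_sf_df1(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    if 0 > statistic:
        raise ValueError("the test statistic cannot be negative")
    return math.erfc(math.sqrt(statistic / 2.0))


def likelihood_ratio_test_df1(lnl_null, lnl_alt):
    """Compare two nested models that differ by one free parameter.

    lnl_null: log-likelihood of the constrained model (e.g. equal rates).
    lnl_alt:  log-likelihood of the unconstrained model.
    Returns the statistic 2 * (lnl_alt - lnl_null) and its P value.
    """
    # Small negative values can arise from optimizer noise; clamp them to zero.
    statistic = max(2.0 * (lnl_alt - lnl_null), 0.0)
    return statistic, chi2_sf_df1(statistic)


if __name__ == "__main__":
    # Illustrative log-likelihoods only; they are not values from the study.
    stat, p_value = likelihood_ratio_test_df1(-1000.0, -998.0)
    print(stat, p_value)
```

				<p>With one degree of freedom, a statistic of about 3.84 corresponds to P = 0.05, so the P value reported above implies a much larger difference in log-likelihood between the two models.</p>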
				<p>To test if positive selection acted along the nitrilase lineages flanking the cluster transition event, we used a maximum likelihood (ML) approach based on codon substitution models <abbrgrp>
						<abbr bid="B34">34</abbr>
					</abbrgrp>. These models take into account sequence features such as transition-transversion rate biases and codon usage variation, and allow testing hypotheses at specific branches in a phylogeny by employing heterogeneous &#969; values among sites and lineages. Positive selection can also be investigated using a parsimony-based method, although there is some controversy as to which of the two methods is more reliable <abbrgrp>
						<abbr bid="B41">41</abbr>
						<abbr bid="B42">42</abbr>
						<abbr bid="B43">43</abbr>
					</abbrgrp>.</p>
				<p>The tree used for &#969; estimation was obtained based on the nitrilase DNA sequences, focusing on the genes around the transition event (Figure 4A). The first set of likelihood models that we used, site-specific <abbrgrp>
						<abbr bid="B44">44</abbr>
					</abbrgrp>, assume variation in selective pressure across sites but no variation among individual genes. Using these models we determined that purifying selection has a dominant role across subfamily 1 nitrilases (&#969; = 0.04) (Table <tblr tid="T1">1</tblr>). This is reflected in the large number of conserved amino acids: 86 invariant (~25% of sites) and 149 conserved at the 90% level in this data set. No significant positive selection signal was identified using this category of models. However, since these models average the substitution ratios of individual sites over all lineages, they are known to lack sensitivity in detecting positive selection that acts only along a few lineages (e.g. <abbrgrp>
						<abbr bid="B44">44</abbr>
						<abbr bid="B45">45</abbr>
					</abbrgrp>).</p>
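				<p>As an illustration of how the &#969; estimates in Table <tblr tid="T1">1</tblr> are read, the short sketch below applies the standard interpretation of the dN/dS ratio (&#969; below 1, purifying selection; &#969; equal to 1, neutral evolution; &#969; above 1, positive selection) to two values reported in the table. The function name and the tolerance used for the neutral case are assumptions made for this example; the sketch is not part of the codeml analysis itself.</p>
				<p><![CDATA[
```python
def classify_omega(omega, tol=1e-9):
    """Return the selection regime implied by a dN/dS ratio (omega).

    `tol` is an illustrative tolerance for treating omega as exactly 1.
    """
    if omega < 0:
        raise ValueError("omega must be non-negative")
    if abs(omega - 1.0) <= tol:
        return "neutral"
    return "positive" if omega > 1.0 else "purifying"


# Values taken from Table 1 of this article.
estimates = {
    "M0, all sites and lineages": 0.0418,
    "Model B, branch 2, site class 2": 9.7,
}

for label, omega in estimates.items():
    print(f"{label}: omega = {omega} ({classify_omega(omega)} selection)")
```
]]></p>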
				<tbl id="T1">
					<title>
						<p>Table 1</p>
					</title>
					<caption>
						<p>Parameter estimates, likelihood scores and identified selected sites under various models. Branch numbers refer to Figure 4A. Parameters indicating positive selection are in bold. A likelihood ratio test (LRT) is used to compare a pair of nested models: one which accounts for sites with &#969; > 1 and one which does not (the null model). To accept or reject the &#969; > 1 hypothesis, twice the difference in log-likelihood scores is compared with a &#967;<sup>2 </sup>distribution with degrees of freedom equal to the difference in the number of parameters between the two models. When ML detects lineages with &#969; > 1, an empirical Bayes analysis identifies sites under positive selection and calculates posterior probabilities that provide a measure of confidence for that prediction.</p>
					</caption>
					<tblbdy cols="6">
						<r>
							<c ca="left">
								<p><b>Model</b></p>
							</c>
							<c ca="center">
								<p><b>p (number of parameters)</b></p>
							</c>
							<c ca="center">
								<p><b>l (log-likelihood)</b></p>
							</c>
							<c ca="left">
								<p><b>Parameter estimates</b></p>
							</c>
							<c ca="left">
								<p><b>Positively selected sites</b></p>
							</c>
							<c ca="left">
								<p><b>Likelihood Ratio Test</b></p>
							</c>
						</r>
						<r>
							<c cspan="6">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>M0:one ratio</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>-11903.5</p>
							</c>
							<c ca="left">
								<p>&#969; = 0.0418</p>
							</c>
							<c ca="left">
								<p>none</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c cspan="6">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Site-specific models</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c indent="1" ca="left">
								<p>M1:neutral (K = 2)</p>
							</c>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>-13195.5</p>
							</c>
							<c ca="left">
								<p>p<sub>0 </sub>= 0.298, p<sub>1 </sub>= 0.702</p>
							</c>
							<c ca="left">
								<p>not allowed</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c indent="1" ca="left">
								<p>M3:discrete (K = 2)</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>-11627.6</p>
							</c>
							<c ca="left">
								<p>p<sub>0 </sub>= 0.6, p<sub>1 </sub>= 0.4, &#969;<sub>0 </sub>= 0.012, &#969;<sub>1 </sub>= 0.098</p>
							</c>
							<c ca="left">
								<p>none</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c cspan="6">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Branch-site models</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c indent="1" ca="left">
								<p>Branch 1</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c indent="2" ca="left">
								<p>Model A</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>-13160.0</p>
							</c>
							<c ca="left">
								<p>p<sub>0 </sub>= 0.3, p<sub>1 </sub>= 0.70, p<sub>2</sub>+p<sub>3 </sub>= 0, &#969;<sub>2 </sub>= 0</p>
							</c>
							<c ca="left">
								<p>none</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c indent="2" ca="left">
								<p>Model B</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>-11627.6</p>
							</c>
							<c ca="left">
								<p>p<sub>0 </sub>= 0.4, p<sub>1 </sub>= 0.6, p<sub>2</sub>+p<sub>3 </sub>= 0</p>
								<p>&#969;<sub>0 </sub>= 0.098, &#969;<sub>1 </sub>= 0.012, &#969;<sub>2 </sub>= 0</p>
							</c>
							<c ca="left">
								<p>none</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c indent="1" ca="left">
								<p>Branch 2</p>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c indent="2" ca="left">
								<p>Model A</p>
							</c>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>-13188.7</p>
							</c>
							<c ca="left">
								<p>p<sub>0 </sub>= 0.296, p<sub>1 </sub>= 0.688, <b>p<sub>2</sub>+p<sub>3 </sub>= 0.016, &#969;<sub>2 </sub>= 129.6</b></p>
							</c>
							<c ca="left">
								<p>Q157 (P = 0.77), Q203 (P = 0.999), T41, Q157, Y184, N200, Q203, R284 (P > 0.9)</p>
							</c>
							<c ca="left">
								<p>LRT vs. M1 2&#916;l = 6.8, P = 0.03, df = 2</p>
							</c>
						</r>
						<r>
							<c indent="2" ca="left">
								<p>Model B</p>
							</c>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>-11621.4</p>
							</c>
							<c ca="left">
								<p>p<sub>0 </sub>= 0.356, p<sub>1 </sub>= 0.59, <b>p<sub>2</sub>+p<sub>3 </sub>= 0.05</b></p>
								<p>&#969;<sub>0 </sub>= 0.1, &#969;<sub>1 </sub>= 0.0125, <b>&#969;<sub>2 </sub>= 9.7</b></p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>LRT vs. M3 (K = 2) 2&#916;l = 6.2, P = 0.04, df = 2</p>
							</c>
						</r>
					</tblbdy>
				</tbl>
				<p>To investigate whether adaptive evolution acted along branches around the transition event, we also used a more recently developed set of maximum likelihood models, which allow the &#969; ratio to vary among both sites and lineages <abbrgrp>
						<abbr bid="B46">46</abbr>
					</abbrgrp>. These models are more sensitive in detecting positively selected sites along a pre-specified lineage of interest ("foreground" branch) as compared to the rest of the genes ("background" branches). The models were applied to the two lineages that followed the transition event (branches 1 and 2 in Figure <figr fid="F4">4A</figr>). For branch 1, which belongs to the Nit1C nitrilases and served as a negative control, we did not detect any positive selection signal. Branch 2 represents the basal lineage for the group of nitrilase genes that have lost the Nit1C cluster association, a loss that may have led to nitrilase neofunctionalization. Significant positive selection (&#969; = 9.7 under model B) was detected for that lineage, with the empirical Bayes analysis identifying residues T41, Q157, Y184, N200, Q203 and R284 as the targets of selection. These amino acid positions may represent hot spots for changes in substrate specificity or other nitrilase enzymatic properties. The variation of those amino acids across the subfamily is shown in Figure <figr fid="F4">4</figr>. Also shown is a site (residue 39) that is invariant before the transition event, changes at that event and then becomes invariant again.</p>
				<fig id="F4">
					<title>
						<p>Figure 4</p>
					</title>
					<caption>
						<p>(<b>A</b>) Maximum likelihood tree for subfamily 1 nitrilases used to test for positive selection</p>
					</caption>
					<text>
						<p>(<b>A</b>) Maximum likelihood tree for subfamily 1 nitrilases used to test for positive selection. Branch lengths are scaled to the mean number of substitutions per codon site under model M3. Branches 1 and 2 indicate the lineages, following the transition event, that were tested for a positive selection signal. The sequences illustrate the variability across the clade at positions identified as being under positive selection. (<b>B</b>) A three-dimensional model of the 1A21 nitrilase dimer. Shown are the catalytic triad (blue) and the residues under positive selection (red). Residue 39, invariant before and after the transition event, is shown in green.</p>
					</text>
					<graphic file="1471-2148-5-42-4"/>
				</fig>
				<p>High-resolution structures are not yet available for nitrilases. However, the structures of two homologs, the <it>C. elegans </it>NitFhit protein and the <it>Agrobacterium radiobacter </it>N-carbamoyl-D-amino acid amidohydrolase (D-NCAase), have been solved <abbrgrp>
						<abbr bid="B47">47</abbr>
						<abbr bid="B48">48</abbr>
					</abbrgrp>. Both proteins form tetramers with two dimer subunits and revealed a novel four-layer &#945;-&#946;-&#946;-&#945; fold. It is believed that all members of the nitrilase superfamily share this fold and the catalytic triad Glu-Lys-Cys in the active site. A three-dimensional model of 1A21 (the first nitrilase outside the Nit1C group) was derived from the D-NCAase structure coordinates and used to map the location of the residues under positive selection at the cluster transition event. Three of those, T41, Q157 and Y184, were found to be buried within the protein, close to the catalytic triad (E44, K126, C160) (Figure <figr fid="F4">4B</figr>). Those residues could be involved in the overall conformation of the active site or may have a direct role in the reaction by interacting with the substrate. The other three positively selected sites, N200, Q203 and R284, cluster on the surface interface between the molecules of the dimer. That interface has been shown in D-NCAase to form a hydrophobic pocket that is responsible for the tight dimer structure. It is known that the quaternary structures of nitrilases and cyanide hydratases can be quite different, ranging in size from monomers and dimers to oligomers containing 10, 14 or more subunits. Substrate binding has also been shown to play a role in the formation of active enzyme oligomers. The three interface residues may play a role in aspects of quaternary structure and substrate specificity associated with the proposed neofunctionalization after the cluster transition event.</p>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Conclusion</p>
			</st>
			<p>In this study, we combined genomic and biochemical analysis of a microbial enzyme family to understand evolutionary events that have shaped the genome organization and metabolism of organisms inhabiting various environments. It has long been known that bacterial genes often cluster based on linked functions. The gene location sometimes correlates with the order of the individual reactions in an enzymatic cascade or facilitates regulatory mechanisms of gene expression. Various models have been proposed to explain the formation and the evolutionary and physiological significance of operons and other gene clusters <abbrgrp>
					<abbr bid="B23">23</abbr>
				</abbrgrp>. Comparative genomic studies have shown that recognition of clusters can assist in functional annotation of novel genes but that clusters often break apart with increasing taxonomic distance <abbrgrp>
					<abbr bid="B49">49</abbr>
					<abbr bid="B50">50</abbr>
					<abbr bid="B51">51</abbr>
					<abbr bid="B52">52</abbr>
					<abbr bid="B53">53</abbr>
				</abbrgrp>. The Nit1C cluster that we described is remarkable in that it is highly conserved across several bacterial phyla and is present in organisms that inhabit extremely diverse environments. While limited rearrangements have occurred in Nit1C, the preservation of all seven genes suggests that there is selective pressure to maintain the entire gene cluster regardless of the genomic dynamics in that neighborhood. The internal rearrangements of Nit1C correlate with high-level taxa (cyanobacteria, beta and gamma proteobacteria).</p>
			<p>There is no experimental evidence for an involvement of any of the Nit1C genes in a known metabolic transformation. Two of the cluster genes have no close homologs or predictable biochemical activities, while the remaining genes, even though they have a predictable type of biochemical activity, belong to classes of enzymes that are involved in a wide range of transformations. Predicting function for remote homologs in the absence of experimental data is still a major difficulty in genomics <abbrgrp>
					<abbr bid="B54">54</abbr>
					<abbr bid="B55">55</abbr>
				</abbrgrp>. Having a defined cluster of genes such as Nit1C, likely to be functionally connected, lays the groundwork for future experimental genetic and biochemical investigation in search of its biological function.</p>
			<p>Phylogenetically, the nitrilases from the Nit1C cluster appear strictly confined to a basal subset of subfamily 1 genes. More recent diversification of the genes in this subfamily has been accompanied by a change in the type of associated gene clusters and is paralleled by changes in biochemical properties of the nitrilases. While subfamily 1 nitrilases are, overall, under strong purifying selection, we detected a significant positive selection signal for the lineage following the transition event and identified several residues under such selection. This supports the hypothesis that a group of nitrilases diverged functionally from the Nit1C-type enzymes and became associated with other metabolic enzymes, possibly as part of a novel pathway, and that advantageous mutations were fixed at specific sites under positive selection. Future studies of bacterial nitrilases, together with biochemical and genetic characterization of mutations at these residues, are needed to better understand the determinants of substrate specificity and the functional differences between the nitrilase subfamilies.</p>
			<p>Environmental microbial genomics has demonstrated its utility in studying large scale ecological processes <abbrgrp>
					<abbr bid="B5">5</abbr>
					<abbr bid="B6">6</abbr>
					<abbr bid="B11">11</abbr>
				</abbrgrp>, discovering valuable biocatalysts <abbrgrp>
					<abbr bid="B15">15</abbr>
				</abbrgrp> and reassembling the genomic and metabolic blueprint of natural microbial communities through shotgun sequencing <abbrgrp>
					<abbr bid="B7">7</abbr>
					<abbr bid="B8">8</abbr>
					<abbr bid="B10">10</abbr>
				</abbrgrp>. Vast amounts of sequence data could potentially be used to answer a wide range of questions, although there are open questions regarding experimental design, data analysis and breadth of biological significance <abbrgrp>
					<abbr bid="B4">4</abbr>
					<abbr bid="B56">56</abbr>
					<abbr bid="B57">57</abbr>
				</abbrgrp>. A broad environmental sampling from worldwide geographical locations coupled with experimental biochemical validation and comparative genomic analysis allowed us to test metabolic and evolutionary hypotheses difficult to approach by using sequence data from only a few environments.</p>
		</sec>
		<sec>
			<st>
				<p>Methods</p>
			</st>
			<sec>
				<st>
					<p>DNA sequences</p>
				</st>
				<p>The nitrilase sequences discovered from environmental DNA libraries are available from GenBank (<ext-link ext-link-type="gen" ext-link-id="AY487426">AY487426</ext-link>-<ext-link ext-link-type="gen" ext-link-id="AY487562">AY487562</ext-link>). Nitrilase sequences from sequenced bacterial genomes and their corresponding flanking genes were also obtained from GenBank; their names and accession numbers are indicated in the corresponding figures. For <it>Verrucomicrobium spinosum </it>DSM 4136, preliminary sequence data was obtained from The Institute for Genomic Research website <abbrgrp>
						<abbr bid="B58">58</abbr>
					</abbrgrp> and for <it>Burkholderia fungorum </it>and <it>Rubrivivax gelatinosus </it>from the DOE Joint Genome Institute website <abbrgrp>
						<abbr bid="B59">59</abbr>
					</abbrgrp>.</p>
			</sec>
			<sec>
				<st>
					<p>Enzymatic activity</p>
				</st>
				<p>The biochemical characterization data used in this study for the environmental nitrilases tested on the non-physiological substrate hydroxyglutaronitrile have been published previously <abbrgrp>
						<abbr bid="B9">9</abbr>
					</abbrgrp>.</p>
			</sec>
			<sec>
				<st>
					<p>Sequence analysis and annotation</p>
				</st>
				<p>For the analysis of the open reading frames (ORFs) flanking the nitrilase genes in known bacterial genomes, we used the sequence coordinates available in the corresponding GenBank files. For the environmental DNA clones containing nitrilase genes, we identified and annotated the other ORFs contiguous with the nitrilase in the genomic insert using standard approaches. The inserts varied in size from 1 to 7 kb and in most cases contained enough sequence to identify at least one ORF in addition to the nitrilase gene. Annotation was based on available experimental or predicted function or biochemical activity, using information associated with those genes in the GenBank, PFAM, COG and KEGG databases.</p>
			</sec>
			<sec>
				<st>
					<p>Phylogenetic reconstructions</p>
				</st>
				<p>Amino acid sequences were aligned in BioEdit <abbrgrp>
						<abbr bid="B60">60</abbr>
					</abbrgrp> followed by manual refinement. Sequence alignments are provided [see Additional files <supplr sid="S4">4</supplr>, <supplr sid="S5">5</supplr>]. Phylogenetic trees were constructed in PROML (PHYLIP 3.6) <abbrgrp>
						<abbr bid="B61">61</abbr>
					</abbrgrp> using maximum likelihood, the JTT amino acid substitution matrix, five global rearrangements with randomized sequence input order, and among-site rate variation modeled with an eight-category discrete approximation to a gamma distribution. The model parameters were estimated using TREE-PUZZLE 5.1 <abbrgrp>
						<abbr bid="B62">62</abbr>
					</abbrgrp>. Branch support was obtained by bootstrapping (100 replicates).</p>
				<suppl id="S4">
					<title>
						<p>Additional file 4</p>
					</title>
					<text>
						<p>Alignment of nitrilase amino acid sequences from cultivated bacteria (used to generate the tree in Figure <figr fid="F1">1</figr>).</p>
					</text>
					<file name="1471-2148-5-42-S4.txt">
						<p>Click here for file</p>
					</file>
				</suppl>
				<suppl id="S5">
					<title>
						<p>Additional file 5</p>
					</title>
					<text>
						<p>Alignment of nitrilase amino acid sequences used to generate the tree in Figure <figr fid="F3">3</figr>.</p>
					</text>
					<file name="1471-2148-5-42-S5.txt">
						<p>Click here for file</p>
					</file>
				</suppl>
			</sec>
			<sec>
				<st>
					<p>Analysis for positive selection</p>
				</st>
				<p>A DNA sequence alignment for the nitrilase genes was derived from the protein alignment [see <supplr sid="S6">Additional file 6</supplr>] and used for maximum likelihood phylogenetic reconstruction in PAUP* 4.0 <abbrgrp>
						<abbr bid="B63">63</abbr>
					</abbrgrp>. The model of sequence evolution (GTR+I+G) was selected using Modeltest v.3.06 <abbrgrp>
						<abbr bid="B64">64</abbr>
					</abbrgrp>. To test specific branches for possible rate changes we used Hy-Phy <abbrgrp>
						<abbr bid="B36">36</abbr>
					</abbrgrp>. The topologies for the DNA tree and the protein tree were identical.</p>
				<suppl id="S6">
					<title>
						<p>Additional file 6</p>
					</title>
					<text>
						<p>Alignment of DNA sequences of nitrilase genes used to test for positive selection and to generate the tree in Figure <figr fid="F4">4</figr>.</p>
					</text>
					<file name="1471-2148-5-42-S6.txt">
						<p>Click here for file</p>
					</file>
				</suppl>
				<p>The tree topology was used in the program codeml (PAML <abbrgrp>
						<abbr bid="B65">65</abbr>
					</abbrgrp>) to estimate dN/dS ratios based on maximum likelihood codon substitution models. Two categories of models were used: site-specific models <abbrgrp>
						<abbr bid="B44">44</abbr>
					</abbrgrp> as well as branch-site models <abbrgrp>
						<abbr bid="B46">46</abbr>
					</abbrgrp>. Statistical comparisons between the results from different nested models were done using likelihood ratio tests <abbrgrp>
						<abbr bid="B66">66</abbr>
					</abbrgrp>.</p>
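				<p>As a worked illustration of the likelihood ratio test, the sketch below converts the test statistics reported in Table <tblr tid="T1">1</tblr> for branch 2 (listed there as 2&#916;l) into P values under a &#967;<sup>2</sup> distribution with two degrees of freedom. For df = 2 the &#967;<sup>2</sup> survival function has the closed form exp(-x/2), so only the Python standard library is needed. The function name is an assumption made for this example, and the sketch is not the code used for the original analysis.</p>
				<p><![CDATA[
```python
import math


def lrt_p_value_df2(statistic):
    """P value of a likelihood ratio statistic under chi-square, df = 2.

    With two degrees of freedom the chi-square survival function is
    exp(-x / 2); other degrees of freedom need a statistics library.
    """
    if statistic < 0:
        raise ValueError("the LRT statistic must be non-negative")
    return math.exp(-statistic / 2.0)


# Test statistics as reported in Table 1 for branch 2 (df = 2 in both cases).
reported = {
    "Model A vs. M1": 6.8,
    "Model B vs. M3 (K = 2)": 6.2,
}

for comparison, statistic in reported.items():
    print(f"{comparison}: P = {lrt_p_value_df2(statistic):.3f}")
```
]]></p>
				<p>Both values fall below the conventional 0.05 threshold, in line with the P values listed in the table.</p>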
			</sec>
			<sec>
				<st>
					<p>Molecular modeling</p>
				</st>
				<p>A three-dimensional model for a clade 1 nitrilase (1A21) was obtained based on the structure of the homologous protein N-carbamoyl-D-amino acid amidohydrolase <abbrgrp>
						<abbr bid="B48">48</abbr>
					</abbrgrp>, using the Jackal software <abbrgrp>
						<abbr bid="B67">67</abbr>
					</abbrgrp>. Analysis of the model and mapping of amino acid residues involved in catalysis or subject to positive selection was done in PyMol [68].</p>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Authors' contributions</p>
			</st>
			<p>MP participated in the design of the study, performed phylogenetic, comparative genomic and statistical analyses and drafted the manuscript. JE performed sequence analysis and functional annotation. TR participated in the design of the study, performed comparative genomic and gene function analyses. All authors contributed to the writing and approved the final manuscript.</p>
		</sec>
	</bdy>
   <bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st>
				<p>We thank Jay Short and Michiel Noordewier for their support and guidance; the Diversa Research and Development team, especially Dan Robertson, Jenny Chaplin and Grace DeSantis, for leading the nitrilase discovery and characterization projects; David Lomelin and Cosmin Deciu for bioinformatics analysis support; and Mark Wall for the three-dimensional model of the nitrilase. Special thanks also to Melvin Simon and Phil Hugenholtz for stimulating discussions and suggestions.</p>
			</sec>
		</ack>
		<refgrp>
			<bibl id="B1">
				<title>
					<p>A molecular view of microbial diversity and the biosphere</p>
				</title>
				<aug>
					<au>
						<snm>Pace</snm>
						<fnm>NR</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>1997</pubdate>
				<volume>276</volume>
				<fpage>734</fpage>
				<lpage>740</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.276.5313.734</pubid>
						<pubid idtype="pmpid" link="fulltext">9115194</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B2">
				<title>
					<p>The uncultured microbial majority</p>
				</title>
				<aug>
					<au>
						<snm>Rappe</snm>
						<fnm>MS</fnm>
					</au>
					<au>
						<snm>Giovannoni</snm>
						<fnm>SJ</fnm>
					</au>
				</aug>
				<source>Annu Rev Microbiol</source>
				<pubdate>2003</pubdate>
				<volume>57</volume>
				<fpage>369</fpage>
				<lpage>394</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1146/annurev.micro.57.030502.090759</pubid>
						<pubid idtype="pmpid" link="fulltext">14527284</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B3">
				<title>
					<p>Tapping into microbial diversity</p>
				</title>
				<aug>
					<au>
						<snm>Keller</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Zengler</snm>
						<fnm>K</fnm>
					</au>
				</aug>
				<source>Nat Rev Microbiol</source>
				<pubdate>2004</pubdate>
				<volume>2</volume>
				<fpage>141</fpage>
				<lpage>150</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nrmicro819</pubid>
						<pubid idtype="pmpid">15040261</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B4">
				<title>
					<p>Metagenomics: application of genomics to uncultured microorganisms</p>
				</title>
				<aug>
					<au>
						<snm>Handelsman</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Microbiol Mol Biol Rev</source>
				<pubdate>2004</pubdate>
				<volume>68</volume>
				<fpage>669</fpage>
				<lpage>685</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">539003</pubid>
						<pubid idtype="pmpid" link="fulltext">15590779</pubid>
						<pubid idtype="doi">10.1128/MMBR.68.4.669-685.2004</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B5">
				<title>
					<p>Bacterial rhodopsin: evidence for a new type of phototrophy in the sea</p>
				</title>
				<aug>
					<au>
						<snm>Beja</snm>
						<fnm>O</fnm>
					</au>
					<au>
						<snm>Aravind</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
					<au>
						<snm>Suzuki</snm>
						<fnm>MT</fnm>
					</au>
					<au>
						<snm>Hadd</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Nguyen</snm>
						<fnm>LP</fnm>
					</au>
					<au>
						<snm>Jovanovich</snm>
						<fnm>SB</fnm>
					</au>
					<au>
						<snm>Gates</snm>
						<fnm>CM</fnm>
					</au>
					<au>
						<snm>Feldman</snm>
						<fnm>RA</fnm>
					</au>
					<au>
						<snm>Spudich</snm>
						<fnm>JL</fnm>
					</au>
					<au>
						<snm>Spudich</snm>
						<fnm>EN</fnm>
					</au>
					<au>
						<snm>DeLong</snm>
						<fnm>EF</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>2000</pubdate>
				<volume>289</volume>
				<fpage>1902</fpage>
				<lpage>1906</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.289.5486.1902</pubid>
						<pubid idtype="pmpid" link="fulltext">10988064</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B6">
				<title>
					<p>Reverse methanogenesis: testing the hypothesis with environmental genomics</p>
				</title>
				<aug>
					<au>
						<snm>Hallam</snm>
						<fnm>SJ</fnm>
					</au>
					<au>
						<snm>Putnam</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Preston</snm>
						<fnm>CM</fnm>
					</au>
					<au>
						<snm>Detter</snm>
						<fnm>JC</fnm>
					</au>
					<au>
						<snm>Rokhsar</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Richardson</snm>
						<fnm>PM</fnm>
					</au>
					<au>
						<snm>DeLong</snm>
						<fnm>EF</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>2004</pubdate>
				<volume>305</volume>
				<fpage>1457</fpage>
				<lpage>1462</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.1100025</pubid>
						<pubid idtype="pmpid" link="fulltext">15353801</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B7">
				<title>
					<p>Community structure and metabolism through reconstruction of microbial genomes from the environment</p>
				</title>
				<aug>
					<au>
						<snm>Tyson</snm>
						<fnm>GW</fnm>
					</au>
					<au>
						<snm>Chapman</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Hugenholtz</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Allen</snm>
						<fnm>EE</fnm>
					</au>
					<au>
						<snm>Ram</snm>
						<fnm>RJ</fnm>
					</au>
					<au>
						<snm>Richardson</snm>
						<fnm>PM</fnm>
					</au>
					<au>
						<snm>Solovyev</snm>
						<fnm>VV</fnm>
					</au>
					<au>
						<snm>Rubin</snm>
						<fnm>EM</fnm>
					</au>
					<au>
						<snm>Rokhsar</snm>
						<fnm>DS</fnm>
					</au>
					<au>
						<snm>Banfield</snm>
						<fnm>JF</fnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>2004</pubdate>
				<volume>428</volume>
				<fpage>37</fpage>
				<lpage>43</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nature02340</pubid>
						<pubid idtype="pmpid" link="fulltext">14961025</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B8">
				<title>
					<p>Comparative metagenomics of microbial communities</p>
				</title>
				<aug>
					<au>
						<snm>Tringe</snm>
						<fnm>SG</fnm>
					</au>
					<au>
						<snm>von Mering</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Kobayashi</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Salamov</snm>
						<fnm>AA</fnm>
					</au>
					<au>
						<snm>Chen</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Chang</snm>
						<fnm>HW</fnm>
					</au>
					<au>
						<snm>Podar</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Short</snm>
						<fnm>JM</fnm>
					</au>
					<au>
						<snm>Mathur</snm>
						<fnm>EJ</fnm>
					</au>
					<au>
						<snm>Detter</snm>
						<fnm>JC</fnm>
					</au>
					<au>
						<snm>Bork</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Hugenholtz</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Rubin</snm>
						<fnm>EM</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>2005</pubdate>
				<volume>308</volume>
				<fpage>554</fpage>
				<lpage>557</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.1107851</pubid>
						<pubid idtype="pmpid" link="fulltext">15845853</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B9">
				<title>
					<p>Exploring nitrilase sequence space for enantioselective catalysis</p>
				</title>
				<aug>
					<au>
						<snm>Robertson</snm>
						<fnm>DE</fnm>
					</au>
					<au>
						<snm>Chaplin</snm>
						<fnm>JA</fnm>
					</au>
					<au>
						<snm>DeSantis</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Podar</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Madden</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Chi</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Richardson</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Milan</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Miller</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Weiner</snm>
						<fnm>DP</fnm>
					</au>
					<au>
						<snm>Wong</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>McQuaid</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Farwell</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Preston</snm>
						<fnm>LA</fnm>
					</au>
					<au>
						<snm>Tan</snm>
						<fnm>X</fnm>
					</au>
					<au>
						<snm>Snead</snm>
						<fnm>MA</fnm>
					</au>
					<au>
						<snm>Keller</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Mathur</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Kretz</snm>
						<fnm>PL</fnm>
					</au>
					<au>
						<snm>Burk</snm>
						<fnm>MJ</fnm>
					</au>
					<au>
						<snm>Short</snm>
						<fnm>JM</fnm>
					</au>
				</aug>
				<source>Appl Environ Microbiol</source>
				<pubdate>2004</pubdate>
				<volume>70</volume>
				<fpage>2429</fpage>
				<lpage>2436</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">383143</pubid>
						<pubid idtype="pmpid" link="fulltext">15066841</pubid>
						<pubid idtype="doi">10.1128/AEM.70.4.2429-2436.2004</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B10">
				<title>
					<p>Environmental genome shotgun sequencing of the Sargasso Sea</p>
				</title>
				<aug>
					<au>
						<snm>Venter</snm>
						<fnm>JC</fnm>
					</au>
					<au>
						<snm>Remington</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Heidelberg</snm>
						<fnm>JF</fnm>
					</au>
					<au>
						<snm>Halpern</snm>
						<fnm>AL</fnm>
					</au>
					<au>
						<snm>Rusch</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Eisen</snm>
						<fnm>JA</fnm>
					</au>
					<au>
						<snm>Wu</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Paulsen</snm>
						<fnm>I</fnm>
					</au>
					<au>
						<snm>Nelson</snm>
						<fnm>KE</fnm>
					</au>
					<au>
						<snm>Nelson</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Fouts</snm>
						<fnm>DE</fnm>
					</au>
					<au>
						<snm>Levy</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Knap</snm>
						<fnm>AH</fnm>
					</au>
					<au>
						<snm>Lomas</snm>
						<fnm>MW</fnm>
					</au>
					<au>
						<snm>Nealson</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>White</snm>
						<fnm>O</fnm>
					</au>
					<au>
						<snm>Peterson</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Hoffman</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Parsons</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Baden-Tillson</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Pfannkoch</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Rogers</snm>
						<fnm>YH</fnm>
					</au>
					<au>
						<snm>Smith</snm>
						<fnm>HO</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>2004</pubdate>
				<volume>304</volume>
				<fpage>66</fpage>
				<lpage>74</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.1093857</pubid>
						<pubid idtype="pmpid" link="fulltext">15001713</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B11">
				<title>
					<p>Proteorhodopsin phototrophy in the ocean</p>
				</title>
				<aug>
					<au>
						<snm>Beja</snm>
						<fnm>O</fnm>
					</au>
					<au>
						<snm>Spudich</snm>
						<fnm>EN</fnm>
					</au>
					<au>
						<snm>Spudich</snm>
						<fnm>JL</fnm>
					</au>
					<au>
						<snm>Leclerc</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>DeLong</snm>
						<fnm>EF</fnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>2001</pubdate>
				<volume>411</volume>
				<fpage>786</fpage>
				<lpage>789</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/35081051</pubid>
						<pubid idtype="pmpid" link="fulltext">11459054</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B12">
				<title>
					<p>Darwinian adaptation of proteorhodopsin to different light intensities in the marine environment</p>
				</title>
				<aug>
					<au>
						<snm>Bielawski</snm>
						<fnm>JP</fnm>
					</au>
					<au>
						<snm>Dunn</snm>
						<fnm>KA</fnm>
					</au>
					<au>
						<snm>Sabehi</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Beja</snm>
						<fnm>O</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci U S A</source>
				<pubdate>2004</pubdate>
				<volume>101</volume>
				<fpage>14824</fpage>
				<lpage>14829</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">522022</pubid>
						<pubid idtype="pmpid" link="fulltext">15466697</pubid>
						<pubid idtype="doi">10.1073/pnas.0403999101</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B13">
				<title>
					<p>Diversification and spectral tuning in marine proteorhodopsins</p>
				</title>
				<aug>
					<au>
						<snm>Man</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Wang</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Sabehi</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Aravind</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Post</snm>
						<fnm>AF</fnm>
					</au>
					<au>
						<snm>Massana</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Spudich</snm>
						<fnm>EN</fnm>
					</au>
					<au>
						<snm>Spudich</snm>
						<fnm>JL</fnm>
					</au>
					<au>
						<snm>Beja</snm>
						<fnm>O</fnm>
					</au>
				</aug>
				<source>EMBO J</source>
				<pubdate>2003</pubdate>
				<volume>22</volume>
				<fpage>1725</fpage>
				<lpage>1731</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">154475</pubid>
						<pubid idtype="pmpid" link="fulltext">12682005</pubid>
						<pubid idtype="doi">10.1093/emboj/cdg183</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B14">
				<title>
					<p>Metagenomics and industrial applications</p>
				</title>
				<aug>
					<au>
						<snm>Lorenz</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Eck</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Nat Rev Microbiol</source>
				<pubdate>2005</pubdate>
				<volume>3</volume>
				<fpage>510</fpage>
				<lpage>516</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nrmicro1161</pubid>
						<pubid idtype="pmpid" link="fulltext">15931168</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B15">
				<title>
					<p>The discovery of new biocatalysts from microbial diversity</p>
				</title>
				<aug>
					<au>
						<snm>Robertson</snm>
						<fnm>DE</fnm>
					</au>
					<au>
						<snm>Mathur</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Swanson</snm>
						<fnm>RV</fnm>
					</au>
					<au>
						<snm>Marrs</snm>
						<fnm>BL</fnm>
					</au>
					<au>
						<snm>Short</snm>
						<fnm>JM</fnm>
					</au>
				</aug>
				<source>Society for Industrial Microbiology News</source>
				<pubdate>1996</pubdate>
				<volume>46</volume>
				<fpage>3</fpage>
				<lpage>8</lpage>
			</bibl>
			<bibl id="B16">
				<title>
					<p>Biotechnological prospects from metagenomics</p>
				</title>
				<aug>
					<au>
						<snm>Schloss</snm>
						<fnm>PD</fnm>
					</au>
					<au>
						<snm>Handelsman</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Curr Opin Biotechnol</source>
				<pubdate>2003</pubdate>
				<volume>14</volume>
				<fpage>303</fpage>
				<lpage>310</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0958-1669(03)00067-3</pubid>
						<pubid idtype="pmpid" link="fulltext">12849784</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B17">
				<title>
					<p>Recombinant approaches for accessing biodiversity</p>
				</title>
				<aug>
					<au>
						<snm>Short</snm>
						<fnm>JM</fnm>
					</au>
				</aug>
				<source>Nat Biotechnol</source>
				<pubdate>1997</pubdate>
				<volume>15</volume>
				<fpage>1322</fpage>
				<lpage>1323</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/nbt1297-1322</pubid>
						<pubid idtype="pmpid" link="fulltext">9415872</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B18">
				<title>
					<p>Catalysis in the nitrilase superfamily</p>
				</title>
				<aug>
					<au>
						<snm>Brenner</snm>
						<fnm>C</fnm>
					</au>
				</aug>
				<source>Curr Opin Struct Biol</source>
				<pubdate>2002</pubdate>
				<volume>12</volume>
				<fpage>775</fpage>
				<lpage>782</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0959-440X(02)00387-1</pubid>
						<pubid idtype="pmpid" link="fulltext">12504683</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B19">
				<title>
					<p>The nitrilase family of CN hydrolysing enzymes - a comparative study</p>
				</title>
				<aug>
					<au>
						<snm>O'Reilly</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Turner</snm>
						<fnm>PD</fnm>
					</au>
				</aug>
				<source>J Appl Microbiol</source>
				<pubdate>2003</pubdate>
				<volume>95</volume>
				<fpage>1161</fpage>
				<lpage>1174</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1046/j.1365-2672.2003.02123.x</pubid>
						<pubid idtype="pmpid" link="fulltext">14632988</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B20">
				<title>
					<p>The nitrilase superfamily: classification, structure and function</p>
				</title>
				<aug>
					<au>
						<snm>Pace</snm>
						<fnm>HC</fnm>
					</au>
					<au>
						<snm>Brenner</snm>
						<fnm>C</fnm>
					</au>
				</aug>
				<source>Genome Biol</source>
				<pubdate>2001</pubdate>
				<volume>2</volume>
				<fpage>reviews0001.1&#8211;0001.9</fpage>
				<xrefbib>
					<pubid idtype="doi">10.1186/gb-2001-2-1-reviews0001</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B21">
				<title>
					<p>Gene context conservation of a higher order than operons</p>
				</title>
				<aug>
					<au>
						<snm>Lathe</snm>
						<fnm>WCIII</fnm>
					</au>
					<au>
						<snm>Snel</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Bork</snm>
						<fnm>P</fnm>
					</au>
				</aug>
				<source>Trends Biochem Sci</source>
				<pubdate>2000</pubdate>
				<volume>25</volume>
				<fpage>474</fpage>
				<lpage>479</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0968-0004(00)01663-7</pubid>
						<pubid idtype="pmpid" link="fulltext">11050428</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B22">
				<title>
					<p>Connected gene neighborhoods in prokaryotic genomes</p>
				</title>
				<aug>
					<au>
						<snm>Rogozin</snm>
						<fnm>IB</fnm>
					</au>
					<au>
						<snm>Makarova</snm>
						<fnm>KS</fnm>
					</au>
					<au>
						<snm>Murvai</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Czabarka</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Wolf</snm>
						<fnm>YI</fnm>
					</au>
					<au>
						<snm>Tatusov</snm>
						<fnm>RL</fnm>
					</au>
					<au>
						<snm>Szekely</snm>
						<fnm>LA</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2002</pubdate>
				<volume>30</volume>
				<fpage>2212</fpage>
				<lpage>2223</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">115289</pubid>
						<pubid idtype="pmpid" link="fulltext">12000841</pubid>
						<pubid idtype="doi">10.1093/nar/30.10.2212</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B23">
				<title>
					<p>Gene organization: selection, selfishness, and serendipity</p>
				</title>
				<aug>
					<au>
						<snm>Lawrence</snm>
						<fnm>JG</fnm>
					</au>
				</aug>
				<source>Annu Rev Microbiol</source>
				<pubdate>2003</pubdate>
				<volume>57</volume>
				<fpage>419</fpage>
				<lpage>440</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1146/annurev.micro.57.030502.090816</pubid>
						<pubid idtype="pmpid" link="fulltext">14527286</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B24">
				<title>
					<p>A novel method for accurate operon predictions in all sequenced prokaryotes</p>
				</title>
				<aug>
					<au>
						<snm>Price</snm>
						<fnm>MN</fnm>
					</au>
					<au>
						<snm>Huang</snm>
						<fnm>KH</fnm>
					</au>
					<au>
						<snm>Alm</snm>
						<fnm>EJ</fnm>
					</au>
					<au>
						<snm>Arkin</snm>
						<fnm>AP</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2005</pubdate>
				<volume>33</volume>
				<fpage>880</fpage>
				<lpage>892</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">549399</pubid>
						<pubid idtype="pmpid" link="fulltext">15701760</pubid>
						<pubid idtype="doi">10.1093/nar/gki232</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B25">
				<title>
					<p>Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure</p>
				</title>
				<aug>
					<au>
						<snm>Gough</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Karplus</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Hughey</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Chothia</snm>
						<fnm>C</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>2001</pubdate>
				<volume>313</volume>
				<fpage>903</fpage>
				<lpage>919</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1006/jmbi.2001.5080</pubid>
						<pubid idtype="pmpid" link="fulltext">11697912</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B26">
				<title>
					<p>Radical SAM, a novel protein superfamily linking unresolved steps in familiar biosynthetic pathways with radical mechanisms: functional characterization using new analysis and information visualization methods</p>
				</title>
				<aug>
					<au>
						<snm>Sofia</snm>
						<fnm>HJ</fnm>
					</au>
					<au>
						<snm>Chen</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Hetzler</snm>
						<fnm>BG</fnm>
					</au>
					<au>
						<snm>Reyes-Spindola</snm>
						<fnm>JF</fnm>
					</au>
					<au>
						<snm>Miller</snm>
						<fnm>NE</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2001</pubdate>
				<volume>29</volume>
				<fpage>1097</fpage>
				<lpage>1106</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">29726</pubid>
						<pubid idtype="pmpid" link="fulltext">11222759</pubid>
						<pubid idtype="doi">10.1093/nar/29.5.1097</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B27">
				<title>
					<p>Crystal structure of an aminoglycoside 6'-N-acetyltransferase: defining the GCN5-related N-acetyltransferase superfamily fold</p>
				</title>
				<aug>
					<au>
						<snm>Wybenga-Groot</snm>
						<fnm>LE</fnm>
					</au>
					<au>
						<snm>Draker</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Wright</snm>
						<fnm>GD</fnm>
					</au>
					<au>
						<snm>Berghuis</snm>
						<fnm>AM</fnm>
					</au>
				</aug>
				<source>Structure Fold Des</source>
				<pubdate>1999</pubdate>
				<volume>7</volume>
				<fpage>497</fpage>
				<lpage>507</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0969-2126(99)80066-5</pubid>
						<pubid idtype="pmpid">10378269</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B28">
				<title>
					<p>The MarR repressor of the multiple antibiotic resistance (mar) operon in Escherichia coli: prototypic member of a family of bacterial regulatory proteins involved in sensing phenolic compounds</p>
				</title>
				<aug>
					<au>
						<snm>Sulavik</snm>
						<fnm>MC</fnm>
					</au>
					<au>
						<snm>Gambino</snm>
						<fnm>LF</fnm>
					</au>
					<au>
						<snm>Miller</snm>
						<fnm>PF</fnm>
					</au>
				</aug>
				<source>Mol Med</source>
				<pubdate>1995</pubdate>
				<volume>1</volume>
				<fpage>436</fpage>
				<lpage>446</lpage>
				<xrefbib>
					<pubid idtype="pmpid">8521301</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B29">
				<title>
					<p>Epimerases: structure, function and mechanism</p>
				</title>
				<aug>
					<au>
						<snm>Allard</snm>
						<fnm>ST</fnm>
					</au>
					<au>
						<snm>Giraud</snm>
						<fnm>MF</fnm>
					</au>
					<au>
						<snm>Naismith</snm>
						<fnm>JH</fnm>
					</au>
				</aug>
				<source>Cell Mol Life Sci</source>
				<pubdate>2001</pubdate>
				<volume>58</volume>
				<fpage>1650</fpage>
				<lpage>1665</lpage>
				<xrefbib>
					<pubid idtype="pmpid">11706991</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B30">
				<title>
					<p>Understanding nature's strategies for enzyme-catalyzed racemization and epimerization</p>
				</title>
				<aug>
					<au>
						<snm>Tanner</snm>
						<fnm>ME</fnm>
					</au>
				</aug>
				<source>Acc Chem Res</source>
				<pubdate>2002</pubdate>
				<volume>35</volume>
				<fpage>237</fpage>
				<lpage>246</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1021/ar000056y</pubid>
						<pubid idtype="pmpid" link="fulltext">11955052</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B31">
				<title>
					<p>Epoxide hydrolases: biochemistry and molecular biology</p>
				</title>
				<aug>
					<au>
						<snm>Fretland</snm>
						<fnm>AJ</fnm>
					</au>
					<au>
						<snm>Omiecinski</snm>
						<fnm>CJ</fnm>
					</au>
				</aug>
				<source>Chem Biol Interact</source>
				<pubdate>2000</pubdate>
				<volume>129</volume>
				<fpage>41</fpage>
				<lpage>59</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0009-2797(00)00197-6</pubid>
						<pubid idtype="pmpid" link="fulltext">11154734</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B32">
				<title>
					<p>Statistical methods for detecting molecular adaptation</p>
				</title>
				<aug>
					<au>
						<snm>Yang</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Bielawski</snm>
						<fnm>JP</fnm>
					</au>
				</aug>
				<source>Trends Ecol Evol</source>
				<pubdate>2000</pubdate>
				<volume>15</volume>
				<fpage>496</fpage>
				<lpage>503</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0169-5347(00)01994-7</pubid>
						<pubid idtype="pmpid">11114436</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B33">
				<title>
					<p>Inference of selection from multiple species alignments</p>
				</title>
				<aug>
					<au>
						<snm>Yang</snm>
						<fnm>Z</fnm>
					</au>
				</aug>
				<source>Curr Opin Genet Dev</source>
				<pubdate>2002</pubdate>
				<volume>12</volume>
				<fpage>688</fpage>
				<lpage>694</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0959-437X(02)00348-9</pubid>
						<pubid idtype="pmpid" link="fulltext">12433583</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B34">
				<title>
					<p>Maximum likelihood methods for detecting adaptive evolution after gene duplication</p>
				</title>
				<aug>
					<au>
						<snm>Bielawski</snm>
						<fnm>JP</fnm>
					</au>
					<au>
						<snm>Yang</snm>
						<fnm>Z</fnm>
					</au>
				</aug>
				<source>J Struct Funct Genomics</source>
				<pubdate>2003</pubdate>
				<volume>3</volume>
				<fpage>201</fpage>
				<lpage>212</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1023/A:1022642807731</pubid>
						<pubid idtype="pmpid" link="fulltext">12836699</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B35">
				<title>
					<p>Positive Darwinian selection after gene duplication in primate ribonuclease genes</p>
				</title>
				<aug>
					<au>
						<snm>Zhang</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Rosenberg</snm>
						<fnm>HF</fnm>
					</au>
					<au>
						<snm>Nei</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci U S A</source>
				<pubdate>1998</pubdate>
				<volume>95</volume>
				<fpage>3708</fpage>
				<lpage>3713</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">19901</pubid>
						<pubid idtype="pmpid" link="fulltext">9520431</pubid>
						<pubid idtype="doi">10.1073/pnas.95.7.3708</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B36">
				<title>
					<p>A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome</p>
				</title>
				<aug>
					<au>
						<snm>Muse</snm>
						<fnm>SV</fnm>
					</au>
					<au>
						<snm>Gaut</snm>
						<fnm>BS</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>1994</pubdate>
				<volume>11</volume>
				<fpage>715</fpage>
				<lpage>724</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">7968485</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B37">
				<aug>
					<au>
						<snm>Ohno</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Evolution by Gene Duplication</source>
				<publisher>Springer</publisher>
				<pubdate>1970</pubdate>
			</bibl>
			<bibl id="B38">
				<title>
					<p>Selective neutrality of 6PGD allozymes in E. coli and the effects of genetic background</p>
				</title>
				<aug>
					<au>
						<snm>Dykhuizen</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Hartl</snm>
						<fnm>DL</fnm>
					</au>
				</aug>
				<source>Genetics</source>
				<pubdate>1980</pubdate>
				<volume>96</volume>
				<fpage>801</fpage>
				<lpage>817</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">7021316</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B39">
				<title>
					<p>Convergent neofunctionalization by positive Darwinian selection after ancient recurrent duplications of the xanthine dehydrogenase gene</p>
				</title>
				<aug>
					<au>
						<snm>Rodriguez-Trelles</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Tarrio</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Ayala</snm>
						<fnm>FJ</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci U S A</source>
				<pubdate>2003</pubdate>
				<volume>100</volume>
				<fpage>13413</fpage>
				<lpage>13417</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">263828</pubid>
						<pubid idtype="pmpid" link="fulltext">14576276</pubid>
						<pubid idtype="doi">10.1073/pnas.1835646100</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B40">
				<title>
					<p>Evolution by gene duplication: an update</p>
				</title>
				<aug>
					<au>
						<snm>Zhang</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Trends Ecol Evol</source>
				<pubdate>2003</pubdate>
				<volume>18</volume>
				<fpage>292</fpage>
				<lpage>298</lpage>
				<xrefbib>
					<pubid idtype="doi">10.1016/S0169-5347(03)00033-8</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B41">
				<title>
					<p>Simulation study of the reliability and robustness of the statistical methods for detecting positive selection at single amino acid sites</p>
				</title>
				<aug>
					<au>
						<snm>Suzuki</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Nei</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>2002</pubdate>
				<volume>19</volume>
				<fpage>1865</fpage>
				<lpage>1869</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">12411595</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B42">
				<title>
					<p>False positive selection identified by ML-based methods: examples from the Sig1 gene of the diatom Thalassiosira weissflogii and the tax gene of a human T-cell lymphotropic virus</p>
				</title>
				<aug>
					<au>
						<snm>Suzuki</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Nei</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>2004</pubdate>
				<volume>21</volume>
				<fpage>914</fpage>
				<lpage>921</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/molbev/msh098</pubid>
						<pubid idtype="pmpid" link="fulltext">15014169</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B43">
				<title>
					<p>Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites</p>
				</title>
				<aug>
					<au>
						<snm>Wong</snm>
						<fnm>WS</fnm>
					</au>
					<au>
						<snm>Yang</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Goldman</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Nielsen</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>Genetics</source>
				<pubdate>2004</pubdate>
				<volume>168</volume>
				<fpage>1041</fpage>
				<lpage>1051</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1534/genetics.104.031153</pubid>
						<pubid idtype="pmpid" link="fulltext">15514074</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B44">
				<title>
					<p>Codon-substitution models for heterogeneous selection pressure at amino acid sites</p>
				</title>
				<aug>
					<au>
						<snm>Yang</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Nielsen</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Goldman</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Pedersen</snm>
						<fnm>AM</fnm>
					</au>
				</aug>
				<source>Genetics</source>
				<pubdate>2000</pubdate>
				<volume>155</volume>
				<fpage>431</fpage>
				<lpage>449</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">10790415</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B45">
				<title>
					<p>Large-scale search for genes on which positive selection may operate</p>
				</title>
				<aug>
					<au>
						<snm>Endo</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Ikeo</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Gojobori</snm>
						<fnm>T</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>1996</pubdate>
				<volume>13</volume>
				<fpage>685</fpage>
				<lpage>690</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">8676743</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B46">
				<title>
					<p>Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages</p>
				</title>
				<aug>
					<au>
						<snm>Yang</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Nielsen</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>2002</pubdate>
				<volume>19</volume>
				<fpage>908</fpage>
				<lpage>917</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">12032247</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B47">
				<title>
					<p>Crystal structure of the worm NitFhit Rosetta Stone protein reveals a Nit tetramer binding two Fhit dimers</p>
				</title>
				<aug>
					<au>
						<snm>Pace</snm>
						<fnm>HC</fnm>
					</au>
					<au>
						<snm>Hodawadekar</snm>
						<fnm>SC</fnm>
					</au>
					<au>
						<snm>Draganescu</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Huang</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Bieganowski</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Pekarsky</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Croce</snm>
						<fnm>CM</fnm>
					</au>
					<au>
						<snm>Brenner</snm>
						<fnm>C</fnm>
					</au>
				</aug>
				<source>Curr Biol</source>
				<pubdate>2000</pubdate>
				<volume>10</volume>
				<fpage>907</fpage>
				<lpage>917</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0960-9822(00)00621-7</pubid>
						<pubid idtype="pmpid" link="fulltext">10959838</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B48">
				<title>
					<p>Crystal structure and site-directed mutagenesis studies of N-carbamoyl-D-amino-acid amidohydrolase from Agrobacterium radiobacter reveals a homotetramer and insight into a catalytic cleft</p>
				</title>
				<aug>
					<au>
						<snm>Wang</snm>
						<fnm>WC</fnm>
					</au>
					<au>
						<snm>Hsu</snm>
						<fnm>WH</fnm>
					</au>
					<au>
						<snm>Chien</snm>
						<fnm>FT</fnm>
					</au>
					<au>
						<snm>Chen</snm>
						<fnm>CY</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>2001</pubdate>
				<volume>306</volume>
				<fpage>251</fpage>
				<lpage>261</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1006/jmbi.2000.4380</pubid>
						<pubid idtype="pmpid" link="fulltext">11237598</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B49">
				<title>
					<p>The use of gene clusters to infer functional coupling</p>
				</title>
				<aug>
					<au>
						<snm>Overbeek</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Fonstein</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>D'Souza</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Pusch</snm>
						<fnm>GD</fnm>
					</au>
					<au>
						<snm>Maltsev</snm>
						<fnm>N</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci U S A</source>
				<pubdate>1999</pubdate>
				<volume>96</volume>
				<fpage>2896</fpage>
				<lpage>2901</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">15866</pubid>
						<pubid idtype="pmpid" link="fulltext">10077608</pubid>
						<pubid idtype="doi">10.1073/pnas.96.6.2896</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B50">
				<title>
					<p>Evolutionary instability of operon structures disclosed by sequence comparisons of complete microbial genomes</p>
				</title>
				<aug>
					<au>
						<snm>Itoh</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Takemoto</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Mori</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Gojobori</snm>
						<fnm>T</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>1999</pubdate>
				<volume>16</volume>
				<fpage>332</fpage>
				<lpage>346</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">10331260</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B51">
				<title>
					<p>A comparative genomics approach to prediction of new members of regulons</p>
				</title>
				<aug>
					<au>
						<snm>Tan</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Moreno-Hagelsieb</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Collado-Vides</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Stormo</snm>
						<fnm>GD</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2001</pubdate>
				<volume>11</volume>
				<fpage>566</fpage>
				<lpage>584</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">311042</pubid>
						<pubid idtype="pmpid" link="fulltext">11282972</pubid>
						<pubid idtype="doi">10.1101/gr.149301</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B52">
				<title>
					<p>Missing genes in metabolic pathways: a comparative genomics approach</p>
				</title>
				<aug>
					<au>
						<snm>Osterman</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Overbeek</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>Curr Opin Chem Biol</source>
				<pubdate>2003</pubdate>
				<volume>7</volume>
				<fpage>238</fpage>
				<lpage>251</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S1367-5931(03)00027-9</pubid>
						<pubid idtype="pmpid" link="fulltext">12714058</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B53">
				<title>
					<p>Evolution of gene order conservation in prokaryotes</p>
				</title>
				<aug>
					<au>
						<snm>Tamames</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Genome Biol</source>
				<pubdate>2001</pubdate>
				<volume>2</volume>
				<issue>6</issue>
				<fpage>Research0020</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">33396</pubid>
						<pubid idtype="pmpid" link="fulltext">11423009</pubid>
						<pubid idtype="doi">10.1186/gb-2001-2-6-research0020</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B54">
				<title>
					<p>Comparative genomics of Archaea: how much have we learned in six years, and what's next?</p>
				</title>
				<aug>
					<au>
						<snm>Makarova</snm>
						<fnm>KS</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
				</aug>
				<source>Genome Biol</source>
				<pubdate>2003</pubdate>
				<volume>4</volume>
				<fpage>115</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">193635</pubid>
						<pubid idtype="pmpid" link="fulltext">12914651</pubid>
						<pubid idtype="doi">10.1186/gb-2003-4-8-115</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B55">
				<title>
					<p>Definitions of enzyme function for the structural genomics era</p>
				</title>
				<aug>
					<au>
						<snm>Babbitt</snm>
						<fnm>PC</fnm>
					</au>
				</aug>
				<source>Curr Opin Chem Biol</source>
				<pubdate>2003</pubdate>
				<volume>7</volume>
				<fpage>230</fpage>
				<lpage>237</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S1367-5931(03)00028-0</pubid>
						<pubid idtype="pmpid" link="fulltext">12714057</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B56">
				<title>
					<p>Microbial population genomics and ecology: the road ahead</p>
				</title>
				<aug>
					<au>
						<snm>DeLong</snm>
						<fnm>EF</fnm>
					</au>
				</aug>
				<source>Environ Microbiol</source>
				<pubdate>2004</pubdate>
				<volume>6</volume>
				<fpage>875</fpage>
				<lpage>878</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1111/j.1462-2920.2004.00668.x</pubid>
						<pubid idtype="pmpid" link="fulltext">15305912</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B57">
				<title>
					<p>Environmental genomics, the big picture</p>
				</title>
				<aug>
					<au>
						<snm>Rodriguez-Valera</snm>
						<fnm>F</fnm>
					</au>
				</aug>
				<source>FEMS Microbiol Lett</source>
				<pubdate>2004</pubdate>
				<volume>231</volume>
				<fpage>153</fpage>
				<lpage>158</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0378-1097(04)00006-0</pubid>
						<pubid idtype="pmpid">15027428</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B58">
				<title>
					<p>The Institute for Genomic Research</p>
				</title>
				<pubdate>2005</pubdate>
				<url>http://www.tigr.org</url>
			</bibl>
			<bibl id="B59">
				<title>
					<p>DOE Joint Genome Institute</p>
				</title>
				<pubdate>2005</pubdate>
				<url>http://www.jgi.doe.gov/</url>
			</bibl>
			<bibl id="B60">
				<title>
					<p>BioEdit</p>
				</title>
				<aug>
					<au>
						<snm>Hall</snm>
						<fnm>T</fnm>
					</au>
				</aug>
				<pubdate>2005</pubdate>
				<url>http://www.mbio.ncsu.edu/BioEdit/bioedit.html</url>
			</bibl>
			<bibl id="B61">
				<title>
					<p>PHYLIP -- Phylogeny Inference Package (Version 3.2)</p>
				</title>
				<aug>
					<au>
						<snm>Felsenstein</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Cladistics</source>
				<pubdate>1989</pubdate>
				<volume>5</volume>
				<fpage>164</fpage>
				<lpage>166</lpage>
			</bibl>
			<bibl id="B62">
				<title>
					<p>TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing</p>
				</title>
				<aug>
					<au>
						<snm>Schmidt</snm>
						<fnm>HA</fnm>
					</au>
					<au>
						<snm>Strimmer</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Vingron</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>von Haeseler</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2002</pubdate>
				<volume>18</volume>
				<fpage>502</fpage>
				<lpage>504</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/18.3.502</pubid>
						<pubid idtype="pmpid" link="fulltext">11934758</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B63">
				<title>
					<p>PAUP*: phylogenetic analysis using parsimony (*and other methods)</p>
				</title>
				<aug>
					<au>
						<snm>Swofford</snm>
						<fnm>DL</fnm>
					</au>
				</aug>
				<publisher>Sinauer Associates, Sunderland, Mass.</publisher>
				<pubdate>1998</pubdate>
				<url>http://paup.csit.fsu.edu/about.html</url>
			</bibl>
			<bibl id="B64">
				<title>
					<p>MODELTEST: testing the model of DNA substitution</p>
				</title>
				<aug>
					<au>
						<snm>Posada</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Crandall</snm>
						<fnm>KA</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>1998</pubdate>
				<volume>14</volume>
				<fpage>817</fpage>
				<lpage>818</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/14.9.817</pubid>
						<pubid idtype="pmpid" link="fulltext">9918953</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B65">
				<title>
					<p>PAML: a program package for phylogenetic analysis by maximum likelihood</p>
				</title>
				<aug>
					<au>
						<snm>Yang</snm>
						<fnm>Z</fnm>
					</au>
				</aug>
				<source>Comput Appl Biosci</source>
				<pubdate>1997</pubdate>
				<volume>13</volume>
				<fpage>555</fpage>
				<lpage>556</lpage>
				<xrefbib>
					<pubid idtype="pmpid">9367129</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B66">
				<title>
					<p>Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution</p>
				</title>
				<aug>
					<au>
						<snm>Yang</snm>
						<fnm>Z</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>1998</pubdate>
				<volume>15</volume>
				<fpage>568</fpage>
				<lpage>573</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">9580986</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B67">
				<title>
					<p>Jackal: A Protein Structure Modeling Package</p>
				</title>
				<aug>
					<au>
						<snm>Xiang</snm>
						<fnm>SZ</fnm>
					</au>
				</aug>
				<pubdate>2005</pubdate>
				<url>http://honiglab.cpmc.columbia.edu/programs/jackal</url>
			</bibl>
			<bibl id="B68">
				<title>
					<p>The PyMOL Molecular Graphics System</p>
				</title>
				<aug>
					<au>
						<snm>DeLano</snm>
						<fnm>WL</fnm>
					</au>
				</aug>
				<publisher>DeLano Scientific, San Carlos, CA, USA.</publisher>
				<pubdate>2002</pubdate>
				<url>http://www.pymol.org</url>
			</bibl>
		</refgrp>
	</bm>
</art>
