<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
	<ui>1471-2148-7-78</ui>
	<ji>1471-2148</ji>
	<fm>
		<dochead>Research article</dochead>
		<bibl>
			<title>
				<p>Phylogenetic analysis of condensation domains in NRPS sheds light on their functional evolution</p>
			</title>
			<aug>
				<au id="A1" ca="yes">
					<snm>Rausch</snm>
					<fnm>Christian</fnm>
					<insr iid="I1"/>
					<email>rausch@informatik.uni-tuebingen.de</email>
				</au>
				<au id="A2" ca="yes">
					<snm>Hoof</snm>
					<fnm>Ilka</fnm>
					<insr iid="I1"/>
					<insr iid="I2"/>
					<email>ilka@cbs.dtu.dk</email>
				</au>
				<au id="A3">
					<snm>Weber</snm>
					<fnm>Tilmann</fnm>
					<insr iid="I3"/>
					<email>tilmann.weber@biotech.uni-tuebingen.de</email>
				</au>
				<au id="A4">
					<snm>Wohlleben</snm>
					<fnm>Wolfgang</fnm>
					<insr iid="I3"/>
					<email>wolfgang.wohlleben@biotech.uni-tuebingen.de</email>
				</au>
				<au id="A5">
					<snm>Huson</snm>
					<mi>H</mi>
					<fnm>Daniel</fnm>
					<insr iid="I1"/>
					<email>huson@informatik.uni-tuebingen.de</email>
				</au>
			</aug>
			<insg>
				<ins id="I1">
					<p>Center for Bioinformatics T&#252;bingen (ZBIT), Eberhard-Karls-Universit&#228;t T&#252;bingen, Sand 14, 72076 T&#252;bingen, Germany</p>
				</ins>
				<ins id="I2">
					<p>Center for Biological Sequence Analysis, BioCentrum, Danmarks Tekniske Universitet, Building 208, 2800 Lyngby, Denmark</p>
				</ins>
				<ins id="I3">
					<p>Department of Microbiology/Biotechnology, Eberhard-Karls-Universit&#228;t T&#252;bingen, Auf der Morgenstelle 28, 72076 T&#252;bingen, Germany</p>
				</ins>
			</insg>
			<source>BMC Evolutionary Biology</source>
			<issn>1471-2148</issn>
			<pubdate>2007</pubdate>
			<volume>7</volume>
			<issue>1</issue>
			<fpage>78</fpage>
			<url>http://www.biomedcentral.com/1471-2148/7/78</url>
			<xrefbib>
				<pubidlist>
					<pubid idtype="pmpid">17506888</pubid>
					<pubid idtype="doi">10.1186/1471-2148-7-78</pubid>
				</pubidlist>
			</xrefbib>
		</bibl>
		<history>
			<rec>
				<date>
					<day>19</day>
					<month>12</month>
					<year>2006</year>
				</date>
			</rec>
			<acc>
				<date>
					<day>16</day>
					<month>5</month>
					<year>2007</year>
				</date>
			</acc>
			<pub>
				<date>
					<day>16</day>
					<month>5</month>
					<year>2007</year>
				</date>
			</pub>
		</history>
		<cpyrt>
			<year>2007</year>
			<collab>Rausch et al; licensee BioMed Central Ltd.</collab>
			<note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
		</cpyrt>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<sec>
					<st>
						<p>Background</p>
					</st>
					<p>Non-ribosomal peptide synthetases (NRPSs) are large multimodular enzymes that synthesize a wide range of biologically active natural peptide compounds, of which many are pharmacologically important. Peptide bond formation is catalyzed by the Condensation (C) domain. Various functional subtypes of the C domain exist: An <sup>L</sup>C<sub>L </sub>domain catalyzes a peptide bond between two L-amino acids, a <sup>D</sup>C<sub>L </sub>domain links an L-amino acid to a growing peptide ending with a D-amino acid, a Starter C domain (first denominated and classified as a separate subtype here) acylates the first amino acid with a <it>&#946;</it>-hydroxy-carboxylic acid (typically a <it>&#946;</it>-hydroxyl fatty acid), and Heterocyclization (Cyc) domains catalyze both peptide bond formation and subsequent cyclization of cysteine, serine or threonine residues. The homologous Epimerization (E) domain flips the chirality of the last amino acid in the growing peptide; Dual E/C domains catalyze both epimerization and condensation.</p>
				</sec>
				<sec>
					<st>
						<p>Results</p>
					</st>
					<p>In this paper, we report on the reconstruction of the phylogenetic relationship of NRPS C domain subtypes and analyze in detail the sequence motifs of recently discovered subtypes (Dual E/C, <sup>D</sup>C<sub>L </sub>and Starter domains) and their characteristic sequence differences, mutually and in comparison with <sup>L</sup>C<sub>L </sub>domains. Based on their phylogeny and the comparison of their sequence motifs, <sup>L</sup>C<sub>L </sub>and Starter domains appear to be more closely related to each other than to other subtypes, though pronounced differences in some segments of the protein account for the unequal donor substrates (amino vs. <it>&#946;</it>-hydroxy-carboxylic acid). Furthermore, on the basis of phylogeny and the comparison of sequence motifs, we conclude that Dual E/C and <sup>D</sup>C<sub>L </sub>domains share a common ancestor. In the same way, the evolutionary origin of a C domain of unknown function in glycopeptide (GP) NRPSs can be determined to be an <sup>L</sup>C<sub>L </sub>domain. In the case of two GP C domains which are most similar to <sup>D</sup>C<sub>L </sub>but which have <sup>L</sup>C<sub>L </sub>activity, we postulate convergent evolution.</p>
				</sec>
				<sec>
					<st>
						<p>Conclusion</p>
					</st>
					<p>We systematize all C domain subtypes including the novel Starter C domain. With our results, it will be easier to decide the subtype of unknown C domains as we provide profile Hidden Markov Models (pHMMs) for the sequence motifs as well as for the entire sequences. The determined specificity conferring positions will be helpful for the mutation of one subtype into another, e.g. turning <sup>D</sup>C<sub>L </sub>to <sup>L</sup>C<sub>L</sub>, which can be a useful step for obtaining novel products.</p>
				</sec>
			</sec>
		</abs>
	</fm>
	<bdy>
		<sec>
			<st>
				<p>Background</p>
			</st>
			<p>The biologically active products synthesized by non-ribosomal peptide synthetases (NRPSs) are of interest for a variety of reasons: Pharmaceutically, many of them are used as drugs, such as antibiotics (e.g. penicillin and vancomycin), anti-tumor agents and cytostatics (e.g. bleomycin), and anti-inflammatory agents and immunosuppressants (e.g. cyclosporin A); others are toxins (<it>&#945;</it>-amanitin, which is found in <it>Amanita phalloides </it>(death cap)) or siderophores. Scientifically, it is a challenge to discover how these structurally complex macromolecules are synthesized by the concerted interplay of the multi-domain proteins NRPS and polyketide synthases (PKS), which synthesize a peptide or ketide backbone, with several other modifying and "decorating" enzymes (halogenases, glycosyl transferases etc.). NRPS belong to the family of megasynthetases, which are among the largest known enzymes with molecular weights of up to ~2.3 MDa (~21,000 residues) <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. They possess several modules, each of which contains a set of enzymatic domains that, in their specificity, number, and organization, determine the primary structure of the corresponding peptide products; for recent reviews on NRPS, see Sieber and Marahiel <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> and Lautru and Challis <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. A complete module contains at least three enzymatic domains (see Fig. <figr fid="F1">1</figr>).</p>
			<fig id="F1">
				<title>
					<p>Figure 1</p>
				</title>
				<caption>
					<p>Modular structure of NRPSs</p>
				</caption>
				<text>
					<p><b>Modular structure of NRPSs</b>. Module and domain structure of NRPS. Top, center: one complete NRPS consisting of three modules. Bottom: enzymatic domains contained in a complete module: <it>Cond: </it>Condensation domain (the detail shows the approximate positions of the seven motifs shown in detail in Fig. 2), <it>Adenyl: </it>Adenylation domain (A domain), <it>N-Meth: </it>N-methylation domain (optional &#8211; does not appear in all NRPS), <it>PCP: </it>Thiolation domain (T domain or Peptidyl Carrier Protein domain), <it>Epi: </it>Epimerization domain (optional). Other optional domains are: Heterocyclization, Oxidation, Reduction and Formylation domains.</p>
				</text>
				<graphic file="1471-2148-7-78-1"/>
			</fig>
			<p>The adenylation (A) domain specifically recognizes one amino acid (or hydroxy acid) and activates it first through the formation of an aminoacyl adenylate and then via covalent bonding of the activated amino acid as a thioester to the 4'-phosphopantetheinyl (4'PPant) cofactor of the peptidyl carrier protein (PCP domain, also called phosphopantetheine attachment site or thiolation (T) domain). The third compulsory domain is the Condensation (C) domain, which catalyzes the elongation reaction of the peptidyl chain tethered to the phosphopantetheinyl arm of the upstream T domain to the amino acid bound to the downstream T domain (reviewed by Lautru and Challis <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>). This is why the first module of an NRPS usually lacks a C domain, so that the C-A-T domain arrangement typically first appears in the second module. The exceptions are C domains that we name <it>Starter C </it>domains; these acylate the first amino acid with a fatty acid (more precisely, with a <it>&#946;</it>-hydroxy-carboxylic acid, as discussed below). Chain elongation is terminated by the action of a thioesterase (TE) domain. It is usually the final domain of the last module in the assembly line and catalyzes either the hydrolysis or the intramolecular cyclization of the peptide chain, yielding a linear or macrocyclic product <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. Although the multi-domain proteins NRPS and PKS are also found in fungal and plant genomes, most of the known sequences stem from bacteria. The bacterial order <it>Actinomycetales </it>is known for the wealth of secondary metabolites produced by its members and comprises, among others, <it>Streptomyces </it>species, <it>Corynebacteria </it>and <it>Mycobacteria</it>. The majority of all currently known antibiotics and other therapeutic compounds are derived from <it>Streptomycetes </it><abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. 
Many members of <it>Corynebacteria </it>and <it>Mycobacteria </it>are human pathogens which produce toxins as secondary metabolites. The structural and functional diversity of non-ribosomal peptides, unlike ribosomally synthesized peptides, arises from the incorporation of unusual amino acids: During the assembly of the peptide backbone by the NRPS, both proteinogenic and non-proteinogenic amino acids (e.g. ornithine), including D-amino acids, may be integrated and modified "on-the-fly" by enzymatic domains within the NRPS protein. Possible (optional) modifications of the building blocks (= amino acids) are N-acylation of the first amino acid, epimerization (into D-amino acids), N-methylation, or cyclization of amino acids (cysteine, serine or threonine) with an amide-nitrogen of the peptide "backbone", resulting in oxazolines (e.g. in vibriobactin) and thiazolines (e.g. in bacitracin); these can be further oxidized or reduced by special domains <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, and further halogenation or hydroxylation may be mediated by specialized domains. Occasionally dehydration is performed on serines, resulting in dehydroalanine <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. Further modifications &#8211; glycosylation or phosphorylation &#8211; are usually performed by so-called "decorating" enzymes, usually clustered in proximity to the NRPS genes on the chromosome <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>.</p>
			<p>In this paper, we report on the functional variants (subtypes) and homologues of the Condensation (C) domain of NRPS. All C domain sequences of this study were extracted from NRPS that were detected in all available completely sequenced bacterial genomes and a comprehensive collection of annotated biosynthesis clusters. Besides A domains (and thioesterase II domains; see Sieber and Marahiel <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>), C domains also show specificity for their substrates (see below). An in-depth understanding of their function is thus crucial for re-engineering NRPS to produce novel bioactive compounds. In practice, it has been shown that it is possible to engineer synthetic systems for the production of novel products: Stachelhaus <it>et al</it>. <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> demonstrated that domain swapping, i.e. the recombination of domain-coding regions of desired specificity into a synthetic fusion protein, can create new variants of surfactin and is thus one possibility, although only one amino acid position in the product was varied, which did not alter its activity, and the total yield was very low (0.5% of wild-type yield).</p>
			<p>Because C domains have been shown to have non-negligible specificity for the amino acid that is activated by the downstream A domain, swapping whole modules or insertion/deletion seems to be more promising, provided that the integrity of the functional domains is carefully maintained and the modules are dissected in their linker regions <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>. Nevertheless, reduced catalytic efficiency and product yield are serious problems. A less invasive strategy involves the manipulation of the domains' specificity by point mutations, as demonstrated by Eppelmann <it>et al</it>. <abbrgrp><abbr bid="B10">10</abbr></abbrgrp> for the A domain. To apply such a strategy to C domains, in-depth knowledge of all their functional subtypes and homologues is indispensable. In this report, we reconstruct their phylogeny and reveal the sequence motifs of all subtypes and homologues, and their mutual differences. The insights gained will be helpful in future attempts to turn one sub-specificity into another, e.g. changing the stereoselectivity of the C domain.</p>
			<p>Furthermore, we have analyzed C domains and Epimerization (E) domains of glycopeptide NRPS. In these proteins, two Condensation domains preceded by former (now inactive) Epimerization domains have gained opposite stereoselectivity, probably due to convergent evolution, for which we accumulate evidence. Additionally, we discuss the origin of a C domain (often referred to as X* domain) at the C-terminus of glycopeptide NRPS, which is thought to be inactive.</p>
		</sec>
		<sec>
			<st>
				<p>Results and Discussion</p>
			</st>
			<sec>
				<st>
					<p>Current knowledge of subtypes <sup>L</sup>C<sub>L</sub>, <sup>D</sup>C<sub>L</sub>, Cyc, and Dual E/C</p>
				</st>
				<p>The C domain has two binding sites: one for the electrophilic donor substrate (the acyl group of the growing chain) and one for the nucleophilic acceptor substrate (the activated amino acid). The condensation reaction involves catalysis of a nucleophilic attack by the amino group of the aminoacyl adenylate bound to the downstream PCP on the acyl group of the growing peptide chain which is bound to the upstream PCP <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B11">11</abbr></abbrgrp>. The acceptor site of the C domain was shown to exhibit a strong stereoselectivity and significant side chain selectivity. The selectivity towards a specific side chain seems to be less pronounced at the donor site which, however, exhibits strong stereoselectivity <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>.</p>
				<p>In particular, C domains succeeding an E domain are expected to show specificity towards the configuration (L or D) of the C-terminal residue that is bound at the donor site because the preceding E domain does not specifically catalyze the epimerization from L to D but provides a mixture of configurations. It is the role of the C domain to select the correct enantiomer <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. Moreover, the C domain represents some kind of selectivity filter in that it supports the selection of the correct downstream nucleophile and prevents product mixtures <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>.</p>
				<p>C domains immediately downstream of E domains were shown to be D-specific for the upstream donor and L-specific for the downstream acceptor, thus catalyzing the condensation reaction between a D- and an L-residue. These C domains were termed <sup>D</sup>C<sub>L</sub>-catalysts because of this behavior <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>.</p>
				<p>Accordingly, <sup>L</sup>C<sub>L</sub>-catalysts promote the condensation of two L-amino acids. Both <sup>L</sup>C<sub>L</sub>- and <sup>D</sup>C<sub>L</sub>-catalysts possess a conserved His-motif in their active site. The consensus sequence of this motif is HHxxxDG where x denotes any residue (see Fig. <figr fid="F2">2</figr>, motif 3). The second His-residue seems to be essential for the catalytic function of the domain <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>.</p>
				<fig id="F2">
					<title>
						<p>Figure 2</p>
					</title>
					<caption>
						<p>Core motifs C1 through C7 of C domain subtypes <sup>L</sup>C<sub>L</sub>, Starter, <sup>D</sup>C<sub>L </sub>and Dual E/C domains</p>
					</caption>
					<text>
						<p><b>Core motifs C1 through C7 of C domain subtypes <sup>L</sup>C<sub>L</sub>, Starter, <sup>D</sup>C<sub>L </sub>and Dual E/C domains</b>. Compared to Marahiel <it>et al</it>. [29], motifs are extended in both directions to include more significantly conserved positions. Yellow bars indicate significant specificity determining positions between <sup>L</sup>C<sub>L</sub>, Starter and <sup>D</sup>C<sub>L </sub>domains; those with red stars on top are the most significant positions. Numbers above the letter stacks indicate residues of functional and structural importance referred to in Subsection "Key residues in Condensation domains" and Table 1.</p>
					</text>
					<graphic file="1471-2148-7-78-2"/>
				</fig>
				<p>As a third type of C domain, so-called Dual Epimerization/Condensation (E/C) domains have recently been identified. This finding was based on the observation that some NRPS yield products containing D-residues although the NRPS itself does not show an E domain in the corresponding module. Biochemical experiments supported the hypothesis that Dual E/C domains exist which are <sup>D</sup>C<sub>L</sub>-catalysts with epimerase activity <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. In the assembly line, a Dual E/C domain follows directly after a C-A-T module which activates and incorporates an L-amino acid. The module which contains the Dual domain also activates an L-amino acid. Then the Dual domain catalyzes the epimerization of the L-residue into D configuration and subsequently promotes the condensation of those two residues. In addition to the active site His-motif which is found in all C domains, Dual E/C domains exhibit a second His-motif, HH[I/L]xxxxGD, which is located close to the N-terminus of the domain <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>; it partly overlaps motifs C1 and C2 (see Fig. <figr fid="F2">2</figr>).</p>
				<p>C domains may be replaced by Heterocyclization (Cyc) domains, which catalyze both peptide bond formation and subsequent cyclization of cysteine (Cys), serine (Ser), and threonine (Thr) residues. The five-membered heterocyclic rings which result from this reaction are important for chelating metals or for interaction with proteins, DNA or RNA. Cyc domains are structurally related to C domains and are thought to be evolutionarily specialized C domains <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. In Cyc domains, however, the active site His motif is replaced by another conserved motif, DxxxxD. Keating <it>et al</it>. <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> found that the aspartate (Asp, D) residues are critical for both condensation and heterocyclization.</p>
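				<p>As an illustrative aside, the three consensus motifs quoted in this section (HHxxxDG, HH[I/L]xxxxGD and DxxxxD) can be written as regular expressions and searched for in a protein sequence. The following minimal Python sketch is not the method used in this study, which relies on profile HMMs; the names MOTIFS and find_motifs and the toy sequence are hypothetical and serve illustration only. Short patterns such as DxxxxD will also match by chance, so a hit should be read as a hint rather than a classification.</p>
				<p>
```python
import re

# Consensus motifs transcribed from the text; "x" (any residue) becomes ".".
# The dictionary keys are hypothetical labels chosen for this sketch.
MOTIFS = {
    "active_site_His": re.compile(r"HH.{3}DG"),        # HHxxxDG
    "dual_EC_second_His": re.compile(r"HH[IL].{4}GD"),  # HH[I/L]xxxxGD
    "cyc_Asp": re.compile(r"D.{4}D"),                  # DxxxxD
}


def find_motifs(sequence):
    """Return a dict mapping each motif label to the 0-based start
    positions of its non-overlapping matches in the sequence."""
    seq = sequence.upper()
    return {
        name: [match.start() for match in pattern.finditer(seq)]
        for name, pattern in MOTIFS.items()
    }


if __name__ == "__main__":
    # Toy sequence (not a real C domain) containing HHxxxDG at index 2.
    print(find_motifs("MKHHAGSDGLT"))
```
				</p>
				<p>A regular expression only tests for an exact consensus pattern, whereas a profile HMM scores every position and tolerates substitutions, which is why the latter is better suited to subtype assignment.</p>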
			</sec>
			<sec>
				<st>
					<p>Collected C domain sequence data and their phylogenetic tree</p>
				</st>
				<p>A total of 481 Condensation domains (including their homologues, Epimerization and Heterocyclization domains) were extracted from 182 (non-identical) NRPS and 31 NRPS/PKS hybrid sequences found in 62 bacterial genomes out of the 256 bacterial genomes screened, employing pHMMs as described in Section Methods. (Note that only one genome was considered for our analysis if sequences of several strains of the same species were available, which reduced the number of NRPS or 'hybrid NRPS/PKS' containing genomes from 62 to 43.) Altogether 108 C domains were obtained from 42 NRPS sequences from gene clusters downloaded from the UniProt database. After removing duplicates, all 525 non-identical C domains and homologues obtained were multiply aligned and phylogenetic trees were built. The resulting tree topology was clearly dominated by the functional categories that are known for C domains (as described in the previous section), rather than species phylogeny or substrate specificity alone. The four main functions are: <it>1</it>. condensation performed by ordinary C domains; <it>2</it>. condensation and subsequent heterocyclization catalyzed by Heterocyclization (Cyc) domains; <it>3</it>. epimerization followed by condensation, both catalyzed by a Dual E/C domain; <it>4</it>. acylation of the subsequent amino acid by Starter domains (see below), which are found on initiation (= first) modules.</p>
				<p>Ordinary C domains may further be classified into <sup>L</sup>C<sub>L</sub>-catalysts and <sup>D</sup>C<sub>L</sub>-catalysts according to the stereochemistry of their substrates. The existence of all these functional subtypes is reflected by the phylogeny. Fig. <figr fid="F3">3</figr> shows a phylogenetic tree for subsets of each C domain subtype, as the whole tree of 525 taxa is far too large to be displayed here (see Additional files <supplr sid="S1">1</supplr> and <supplr sid="S2">2</supplr>). The tree of all taxa showed a similar topology perfectly reflecting the functional categories.</p>
				<suppl id="S1">
					<title>
						<p>Additional file 1</p>
					</title>
					<text>
						<p><b>Phylogenetic tree of all 525 C domain sequences of this study reconstructed using phyml</b>. Zipped Nexus file (file name extension .nex.zip, to be unpacked and opened with SplitsTree <abbrgrp><abbr bid="B67">67</abbr><abbr bid="B71">71</abbr></abbrgrp>).</p>
					</text>
					<file name="1471-2148-7-78-S1.zip">
						<p>Click here for file</p>
					</file>
				</suppl>
				<suppl id="S2">
					<title>
						<p>Additional file 2</p>
					</title>
					<text>
						<p>Phylogenetic tree of all 525 C domain sequences of this study reconstructed using phyml.</p>
					</text>
					<file name="1471-2148-7-78-S2.pdf">
						<p>Click here for file</p>
					</file>
				</suppl>
				<fig id="F3">
					<title>
						<p>Figure 3</p>
					</title>
					<caption>
						<p>Phylogenetic trees of all C subtypes</p>
					</caption>
					<text>
						<p><b>Phylogenetic trees of all C subtypes</b>. Phylogenetic tree of all C subtypes (<sup>L</sup>C<sub>L</sub>, <sup>D</sup>C<sub>L</sub>, Starter, Dual E/C, Epimerization and Heterocyclization domains). The phylogeny was reconstructed using phyml, employing the JTT model of amino acid substitution and a gamma-distributed rate variation with four categories. The support values are based on 100-fold bootstrapping.</p>
					</text>
					<graphic file="1471-2148-7-78-3"/>
				</fig>
				<p>For further analysis, the different subtypes were examined separately. While Cyc and Dual E/C domains could be identified by means of their characteristic sequence motifs (see Section Methods/Predicting of functional subtypes), <sup>L</sup>C<sub>L</sub>- and <sup>D</sup>C<sub>L</sub>-catalysts were distinguished either according to their domain structure or by their position in the phylogenetic tree. In this way, 275 of the 525 C domains were classified as <sup>L</sup>C<sub>L</sub>-catalysts, 69 as <sup>D</sup>C<sub>L</sub>-catalysts and 42 as Starter C domains (see next section).</p>
			</sec>
			<sec>
				<st>
					<p>Description of a new C domain subtype: The Starter C domain</p>
				</st>
				<p>When analyzing the Condensation (C) domain phylogeny, it became apparent that some domains did not cluster with the known C domain subtypes. A closer look at the location of these deviating C domains revealed that all of them were the very first C domain of the corresponding NRPS assembly line. The remaining C domains of these assembly lines appeared in other subtrees in the phylogeny.</p>
				<p>Included in this set of Starter C domains are those stemming from the biosynthesis clusters for the lipopeptides surfactin <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>, lichenysin <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>, fengycin <abbrgrp><abbr bid="B17">17</abbr></abbrgrp> and arthrofactin <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. These lipopeptides are characterized by a <it>&#946;</it>-hydroxy fatty acid which is connected to the first amino acid of the peptide chain <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. The peptide synthetases involved in the production of these lipopeptides all have a C domain as their very first domain. This C domain is thought to serve as an acceptor for a fatty acid which is transferred from an acyltransferase <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. This acylation process has also been observed for surfactin <abbrgrp><abbr bid="B20">20</abbr></abbrgrp> and fengycin biosynthesis <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. Moreover, common to the Starter C domains of these biosynthesis clusters is their low sequence similarity to the remaining C domains of the same biosynthesis cluster <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>.</p>
				<p>The same has been observed for the synthesis of the acidic lipopeptide CDA in <it>Streptomyces coelicolor </it><abbrgrp><abbr bid="B22">22</abbr></abbrgrp> and the recently identified lipopeptide produced by protein NP_960354.1 of <it>Mycobacterium avium </it><abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. The Starter C domain of the pristinamycin cluster appears to diverge from this pattern at first glance. The C domain is the first domain of the polypeptide SnbC but the biosynthesis of pristinamycin is initiated by SnbA, which contains an A domain that activates 3-hydroxypicolinic acid (3-hydroxypyridine-2-carboxylic acid, "2-hydroxy-6-azabenzoate") but lacks an ACP <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. SnbA is homologous to EntE, which contains an A domain specific for 2,3-dihydroxybenzoate (DHB) and which is involved in the biosynthesis of enterobactin <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. A similar organization can be found in actinomycin biosynthesis. The process is initiated by AcmA, which activates 4-methyl-3-hydroxyanthranilic acid (MHA, 4-methyl-3-hydroxy-2-aminobenzoate) <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. In conclusion, what the C domains of SnbC, AcmB and EntF have in common is that they catalyze bond formation between a derivative of salicylic acid (2-hydroxy-benzoate) and an <it>&#945;</it>-amino acid. Encouraged by the fact that these Starter C domains match the profile HMM built from the Starter C domain sequences that process <it>&#946;</it>-hydroxy fatty acids significantly well, we compared salicylic acid with <it>&#946;</it>-hydroxy fatty acids. Because both are <it>&#946;</it>-hydroxy-carboxylic acids with no amino-substituent at the <it>&#945; </it>position, as <it>&#945;</it>-amino acids would have, we assume that this is the structural characteristic recognized by the prototype of Starter C domains. 
The profile HMM built from all Starter C domains in our data set (together with the pHMMs of the other domains) presents a powerful instrument for exploring and understanding tricky NRPS domain-product relations.</p>
				<p>Note that Formylation domains, as found, for example, at the N-terminus of linear gramicidin synthetase subunit A <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>, are not C domains but belong to the Pfam "formyl transferase" domain family.</p>
			</sec>
			<sec>
				<st>
					<p>Characteristic Sequence Motifs of <sup>L</sup>C<sub>L</sub>, <sup>D</sup>C<sub>L</sub>, Starter C domains and Dual E/C domains</p>
				</st>
				<p>The different core motifs in Condensation domains were first described by de Cr&#233;cy-Lagard <it>et al</it>. <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> and recompiled by Marahiel <it>et al</it>. <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> but have not been updated since. The core motifs of the C domain homologues, Epimerization and Heterocyclization domains, are listed in the publication by Marahiel <it>et al</it>. <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, but the sequence motifs of the recently discovered <sup>D</sup>C<sub>L </sub>domains <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B30">30</abbr></abbrgrp> as well as the Dual E/C <abbrgrp><abbr bid="B13">13</abbr></abbrgrp> domains have never been comprehensively analyzed. Moreover, the Starter C domain has not yet been recognized in the literature as a proper separate subtype.</p>
				<p>The sequence motifs represented in Fig. <figr fid="F2">2</figr> improve the C domain core motif consensus sequences published by Marahiel <it>et al</it>. <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> which, at that time, were based on far fewer sequences and did not differentiate between the C domain subtypes. The motifs are represented as sequence logos <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>, which make it easier to identify variably conserved positions than simple consensus sequences do. We adhere to the core motifs identified by Marahiel <it>et al</it>. <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, and also show the surrounding "landscape" if there are highly conserved positions nearby, especially if they are important for distinguishing between the C domain subtypes. The motifs were built on the basis of 40 verified and 198 predicted <sup>L</sup>C<sub>L </sub>sequences, in which "predicted" means that they were classified based purely on their position in the phylogenetic tree while "verified" sequences were checked individually taking into account their position in the succession of neighboring NRPS domains, the presence of discriminative unique motifs (see Methods Section) and/or literature information. For the <sup>D</sup>C<sub>L </sub>motifs, 23 verified and 46 predicted sequences were used, 7 verified and 35 predicted for the Starter domains, and 9 verified and 47 predicted for the Dual E/C domains.</p>
			</sec>
			<sec>
				<st>
					<p>Key residues in Condensation domains derived from the literature</p>
				</st>
				<p>Based on three publications, four residues are likely to be essential for the catalytic activity of the C domain. The most important residue is the 2nd His of the active site His-motif <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>.</p>
				<p>Furthermore, six residues have been identified as being structurally important or as playing a role in correct folding of the domain. In the following, these residues are presented, grouped by their role (the numbering is according to their linear occurrence on the peptide; see Fig. <figr fid="F2">2</figr>). This information is also presented in Table <tblr tid="T1">1</tblr> where the sites are sorted by their relative position in the domain.</p>
				<tbl id="T1">
					<title>
						<p>Table 1</p>
					</title>
					<caption>
						<p>Residues of importance for catalytic activity, structure or correct folding. Residues for which the importance has been previously determined are shown in Fig. 2, giving their numbers, their role and the bibliographic reference of the appropriate mutation study.</p>
					</caption>
					<tblbdy cols="4">
						<r>
							<c ca="center">
								<p>Nb. in Fig. 2</p>
							</c>
							<c ca="center">
								<p>Importance:</p>
							</c>
							<c ca="left">
								<p>Position is homologous to:</p>
							</c>
							<c ca="center">
								<p>Reference:</p>
							</c>
						</r>
						<r>
							<c cspan="4">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>1</p>
							</c>
							<c ca="center">
								<p>structure</p>
							</c>
							<c ca="left">
								<p>Arg62 (R) in TycB1</p>
							</c>
							<c ca="center">
								<p>[34]</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>2</p>
							</c>
							<c ca="center">
								<p>folding</p>
							</c>
							<c ca="left">
								<p>Arg67 (R) in TycB1</p>
							</c>
							<c ca="center">
								<p>[34]</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>3</p>
							</c>
							<c ca="center">
								<p>folding</p>
							</c>
							<c ca="left">
								<p>His146 in TycB1 (1st His of active site His-motif)</p>
							</c>
							<c ca="center">
								<p>[34]</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>4</p>
							</c>
							<c ca="center">
								<p>catalytic activity</p>
							</c>
							<c ca="left">
								<p>His126 (2nd His of the active site His-motif) in VibH</p>
							</c>
							<c ca="center">
								<p>[14,33,34]</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>5</p>
							</c>
							<c ca="center">
								<p>structure</p>
							</c>
							<c ca="left">
								<p>Asp130 (D) in VibH</p>
							</c>
							<c ca="center">
								<p>[14,33,34]</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>6</p>
							</c>
							<c ca="center">
								<p>catalytic activity</p>
							</c>
							<c ca="left">
								<p>Gly131 (G of the active site His-motif) in VibH</p>
							</c>
							<c ca="center">
								<p>[33]</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>7</p>
							</c>
							<c ca="center">
								<p>folding</p>
							</c>
							<c ca="left">
								<p>Trp202 (W) in TycB1</p>
							</c>
							<c ca="center">
								<p>[34]</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>8</p>
							</c>
							<c ca="center">
								<p>structure</p>
							</c>
							<c ca="left">
								<p>Arg263 (R) in VibH = Arg278 (R) in EntF</p>
							</c>
							<c ca="center">
								<p>[14,33]</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>9</p>
							</c>
							<c ca="center">
								<p>catalytic activity</p>
							</c>
							<c ca="left">
								<p>Trp264 (W) in VibH according to Keating <it>et al</it>., but absent in <sup>L</sup>C<sub>L</sub>, <sup>D</sup>C<sub>L </sub>and Starter C domains</p>
							</c>
							<c ca="center">
								<p>[14]</p>
							</c>
						</r>
						<r>
							<c ca="center">
								<p>10</p>
							</c>
							<c ca="center">
								<p>catalytic activity</p>
							</c>
							<c ca="left">
								<p>Asn335 (N) in VibH</p>
							</c>
							<c ca="center">
								<p>[33]</p>
							</c>
						</r>
					</tblbdy>
				</tbl>
				<sec>
					<st>
						<p>Residues of importance for catalytic activity of the domain</p>
					</st>
					<p>#4 His 126 (2nd His of the active site His-motif) with respect to (w.r.t.) VibH <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp></p>
					<p>#9 Trp264 (W) is catalytically important in VibH according to Keating <it>et al</it>. <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, but the corresponding position is not conserved in any of the C domain subtypes <sup>L</sup>C<sub>L</sub>, <sup>D</sup>C<sub>L </sub>or Starter.</p>
					<p>#10 Asn335 (N) w.r.t. VibH <abbrgrp><abbr bid="B33">33</abbr></abbrgrp></p>
					<p>#6 Gly131 (G of the active site His-motif) w.r.t. VibH <abbrgrp><abbr bid="B33">33</abbr></abbrgrp></p>
				</sec>
				<sec>
					<st>
						<p>Residues of structural importance</p>
					</st>
					<p>#1 Arg62 (R) w.r.t. TycB1 <abbrgrp><abbr bid="B34">34</abbr></abbrgrp></p>
					<p>#5 Asp130 (D) w.r.t. VibH <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B33">33</abbr><abbr bid="B34">34</abbr></abbrgrp></p>
					<p>#8 Arg263 (R) w.r.t. VibH <abbrgrp><abbr bid="B14">14</abbr></abbrgrp> = Arg278 (R) w.r.t. EntF <abbrgrp><abbr bid="B33">33</abbr></abbrgrp></p>
				</sec>
				<sec>
					<st>
						<p>Residues important for correct folding</p>
					</st>
					<p>#2 Arg67 (R) w.r.t. TycB1 <abbrgrp><abbr bid="B34">34</abbr></abbrgrp></p>
					<p>#3 His146 w.r.t. TycB1 (1st His of active site His-motif) <abbrgrp><abbr bid="B34">34</abbr></abbrgrp></p>
					<p>#7 Trp202 (W) w.r.t. TycB1 <abbrgrp><abbr bid="B34">34</abbr></abbrgrp></p>
				</sec>
				<sec>
					<st>
						<p><sup>L</sup>C<sub>L </sub>vs. <sup>D</sup>C<sub>L</sub></p>
					</st>
					<p><sup>L</sup>C<sub>L </sub>and <sup>D</sup>C<sub>L </sub>domains do not differ significantly in any of the residues identified as being of catalytic or structural importance (except residues Nb. 9 and Nb. 10). However, using the methods described in the Methods section, we detected 20 positions at which <sup>L</sup>C<sub>L </sub>and <sup>D</sup>C<sub>L </sub>differ significantly according to SDPpred <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>, plus 5 additional high-scoring positions within the extended motifs according to FRpred <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. A comparison of the motifs shows that motif C4 differs noticeably between the <sup>L</sup>C<sub>L </sub>and <sup>D</sup>C<sub>L </sub>subtypes. The same is true for the region downstream of C4 (after the Trp at pos. 184 in VibH coordinates, which is highly conserved in both subtypes), where a moderately conserved motif LPxDxxRP is seen in <sup>L</sup>C<sub>L </sub>that is completely absent in <sup>D</sup>C<sub>L </sub>(see Additional file <supplr sid="S3">3</supplr>).</p>
					<suppl id="S3">
						<title>
							<p>Additional file 3</p>
						</title>
						<text>
							<p><b>Comparison of the logos generated from the pHMMs for the 3 subtypes <sup>L</sup>C<sub>L</sub>, Starter and <sup>D</sup>C<sub>L </sub>domain using LogoMat-P </b><abbrgrp><abbr bid="B72">72</abbr></abbrgrp>.</p>
						</text>
						<file name="1471-2148-7-78-S3.pdf">
							<p>Click here for file</p>
						</file>
					</suppl>
				</sec>
			</sec>
			<sec>
				<st>
					<p><sup>L</sup>C<sub>L </sub>vs. Starter domain</p>
				</st>
				<p>Residues Nb. 5, Nb. 7, Nb. 9, and Nb. 10 are not conserved in the putative Starter domains, but the remaining 6 functionally important residues are highly conserved throughout them. When comparing <sup>L</sup>C<sub>L </sub>and Starter domains, 18 discriminative positions were found by SDPpred and 5 more were found in the motifs by FRpred. Those positions are highlighted in Fig. <figr fid="F2">2</figr>. What these residues have in common is that they seem to be highly conserved among extender (= <sup>L</sup>C<sub>L</sub>) domains but show no conservation among Starter C domains. A comparison of the C domain sequence motifs shows that motifs C2 and C4, despite being well conserved in <sup>L</sup>C<sub>L</sub>, are not conserved in Starter domains, which can presumably be explained by the much broader structural range of substrates processed by Starter domains.</p>
			</sec>
			<sec>
				<st>
					<p>What the phylogeny tells about the relationship of <sup>L</sup>C<sub>L </sub>vs. Starter and <sup>D</sup>C<sub>L </sub>vs. Dual E/C domains</p>
				</st>
				<p>The reconstructed phylogeny of C domain subtypes reveals that <sup>L</sup>C<sub>L </sub>and Starter C domains are more closely related to each other than to other subtypes (see Fig. <figr fid="F3">3</figr>). Comparing sequence motifs confirms this observation, though pronounced differences in some segments of the protein (especially in motifs C2 and C3, as can be seen in Fig. <figr fid="F2">2</figr>) reflect the different donor substrates (amino acid vs. <it>&#946;</it>-hydroxy-carboxylic acid). Furthermore, the phylogenetic tree shows that Dual E/C and <sup>D</sup>C<sub>L </sub>domains share a common ancestor. We tested the reliability of the phylogenies depicted in Fig. <figr fid="F3">3</figr> and Fig. <figr fid="F4">4</figr> by repeating the reconstruction on biased profile alignments. These biased alignments were generated by producing MUSCLE profile-profile alignments in a step-wise manner, assuming evolutionary relationships of the different domain subtypes that contradict what the original trees suggest. The topology of the resulting trees nevertheless supports the shared ancestry of <sup>L</sup>C<sub>L </sub>and Starter C domains as well as of Dual E/C and <sup>D</sup>C<sub>L </sub>domains. In addition, we generated an alignment using DIALIGN <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>, which is a non-progressive alignment method, and subsequently reconstructed a PHYML tree based on this alignment. Here also, the Dual E/C and <sup>D</sup>C<sub>L </sub>domains are grouped together, as are <sup>L</sup>C<sub>L </sub>and Starter C domains.</p>
				<fig id="F4">
					<title>
						<p>Figure 4</p>
					</title>
					<caption>
						<p>Phylogenetic trees of all C subtypes including C domains from glycopeptide clusters</p>
					</caption>
					<text>
						<p><b>Phylogenetic trees of all C subtypes including C domains from glycopeptide clusters</b>. Additionally, this tree includes all C domains of glycopeptide antibiotic biosynthesis clusters (in dashed boxes). The phylogeny was reconstructed using phyml, employing the JTT model of amino acid substitution and a gamma-distributed rate variation with four categories. The support values are based on 100-fold bootstrapping.</p>
					</text>
					<graphic file="1471-2148-7-78-4"/>
				</fig>
				<p>Especially in motif C5, Dual E/C and <sup>D</sup>C<sub>L </sub>domains are very similar to each other and dissimilar to <sup>L</sup>C<sub>L </sub>and Starter domains. This observation of the relationship between the four subtypes is consistent with the stereochemistry of the substrates, bearing in mind that Dual E/C domains function as <sup>D</sup>C<sub>L </sub>because the substrate L-amino acid is first epimerized by the intrinsic epimerization activity of the domain <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>.</p>
				<p>Within the subtrees of <sup>D</sup>C<sub>L </sub>and <sup>L</sup>C<sub>L </sub>domains, the tree topology reflects the species phylogeny of the bacteria rather than substrate specificity of any kind. We analyzed this by reconstructing phylogenies for <sup>D</sup>C<sub>L </sub>domains and <sup>L</sup>C<sub>L </sub>domains separately to be able to see the topology within these subtypes in more detail (data not shown). The reconstructed phylogenies did not give any evidence that would support the hypothesis that C domains cluster according to their specificity towards the condensed amino acids. This analysis, however, is based on the complete C domain sequence. A strategy to investigate whether C domains exhibit substrate specificity would involve predicting putative specificity determining positions using entropy and/or conservation based approaches (e.g. SDPpred, FRpred), or inferring putative active site residues by homology with the VibH structure (as done by Rausch et al. <abbrgrp><abbr bid="B38">38</abbr></abbrgrp> for the adenylation domain).</p>
			</sec>
			<sec>
				<st>
					<p>Enigmatic Glycopeptide antibiotic NRPS</p>
				</st>
				<p>Glycopeptide antibiotics are a subgroup of nonribosomal peptide antibiotics of which the best known representatives are probably vancomycin and teicoplanin. To date, all identified glycopeptide antibiotics are produced by actinomycetes. They interrupt cell wall formation of gram-positive bacteria by binding to the D-Ala-D-Ala termini of the growing peptidoglycan, thereby inhibiting the transpeptidation reaction. All glycopeptide antibiotics consist of a heptapeptide backbone which is synthesized by NRPS.</p>
				<p>Modification reactions involve extensive cross-linking of the aromatic side chains to rigidify the molecule <abbrgrp><abbr bid="B39">39</abbr><abbr bid="B40">40</abbr></abbrgrp>. The modular organization of some NRPSs that were identified in glycopeptide-producing actinomycetes is depicted in Fig. <figr fid="F5">5</figr>.</p>
				<fig id="F5">
					<title>
						<p>Figure 5</p>
					</title>
					<caption>
						<p>Modular organization of NRPS involved in glycopeptide synthesis</p>
					</caption>
					<text>
						<p><b>Modular organization of NRPS involved in glycopeptide synthesis</b>. Domains marked in light gray (Complestatin) are inactive and corrupt. Moreover, E domains in ComB and StaB are also thought to be inactive.</p>
					</text>
					<graphic file="1471-2148-7-78-5"/>
				</fig>
				<p>All these NRPSs comprise seven modules. They show an identical domain composition, with the exceptions of module M3 in the A47934 (<it>sta</it>) and M3 and M6 in complestatin (<it>com</it>) clusters which contain an E domain not present in the other clusters. The M3-E domain, however, is assumed to be inactive <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>, while the presence of an E domain in <it>com </it>M6 has not been reported elsewhere so far. We were able to detect it with an hmmpfam scan using the specific E domain pHMM. All six NRPSs contain a domain X* of unknown function. Until now, it has been characterized as an atypical C or E domain but its role in glycopeptide synthesis remains to be clarified. In general, it is assumed that the stereochemistry of an NRPS product can be predicted from its domain structure. In the case of the known glycopeptides, the domain organization implies the stereochemistry NH<sub>2</sub>-L-D-L-D-D-L-L-COOH, provided that the E in module M3 is inactive and that the X* domain does not function as an E domain. This stereochemistry is inconsistent with the chemically determined structure of the products: NH<sub>2</sub>-D-D-L-D-D-L-L-COOH <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>. The assumption is that the A domain of the first module activates a D-amino acid. For the <it>cep </it>cluster, however, Trauger and Walsh <abbrgrp><abbr bid="B42">42</abbr></abbrgrp> show that the A domain of M1 prefers L-Leu over D-Leu in a 6:1 ratio; on the other hand, they could not show which stereoisomer is processed further. This suggests the existence of an unknown E domain that acts on the L-Leu activated by M1. With the discovery of Dual E/C domains, a new possible strategy arises for the incorporation of a D-residue by the first module. However, no Dual E/C domain could be detected in any glyco-NRPS. Alternatively, one could imagine an external racemase, as is found in the cyclosporin cluster <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>, which provides a D-Leu that can be incorporated directly.</p>
				<p>Having gained knowledge about the differences between <sup>L</sup>C<sub>L</sub>, Starter and <sup>D</sup>C<sub>L </sub>domains as described above, we examined all glyco-NRPSs. When we reconstructed the phylogeny of C domains including all homologous domains from glyco-NRPSs, it was striking to find that all C domains were clustered in the <sup>D</sup>C<sub>L </sub>subtree and the X* domain clustered in the <sup>L</sup>C<sub>L </sub>subtree (see Fig. <figr fid="F4">4</figr>). This finding could be confirmed by analyzing all instances of the C domain motifs found in these domains. How could this be interpreted, given the fact that M4 and M7 C domains clearly act as <sup>L</sup>C<sub>L </sub>domains, as we can tell by the stereochemistry of the products? Our hypothesis is that those C domains are former <sup>D</sup>C<sub>L </sub>domains that have developed <sup>L</sup>C<sub>L </sub>activity by convergent evolution. Supporting evidence can be gathered: when we look at the phylogeny of the C domains, the sequences of the <it>com </it>cluster from <it>Streptomyces lavendulae </it>are always most distant from the others and more closely related to the hypothetical common ancestor, implying that they can serve as a model for the archetype of glyco-C domains. It is likely that in the archetype, all C domains were true <sup>D</sup>C<sub>L </sub>catalysts, supposing that the E domains which are still present in <it>com </it>modules M4 and M7 were still active.</p>
				<p>In a similar way, we can trace back the origin of the X* domain: in the <it>com </it>cluster (and only there) it is followed by remnants of an adenylation domain (which has several larger insertions and deletions; see Additional file <supplr sid="S4">4</supplr>). This tells us that the X* domain used to be the first domain of a new module followed by an adenylation domain.</p>
				<suppl id="S4">
					<title>
						<p>Additional file 4</p>
					</title>
					<text>
						<p><b>HMMER outputs of glyco-NRPS: fossils in ComC and ComD</b>. ZIP file containing two text files.</p>
					</text>
					<file name="1471-2148-7-78-S4.zip">
						<p>Click here for file</p>
					</file>
				</suppl>
				<p>The assumption that the diverged C domains of modules M4 and M7 would have adopted mutations at positions that we have previously determined as "specificity determining positions" was disproved. Probably, a few spontaneous mutations in the <sup>D</sup>C<sub>L </sub>domains relaxed the stereo-selectivity; supposing that this altered stereochemistry of the product resulted in a strong selective advantage (arising from a vancomycin-like product), the loss of the functional E domains in M3 and M6 would have been a selective gain. Comparing all M4 and/or all M7 C domains with all <sup>D</sup>C<sub>L </sub>domains using SDPpred did not reveal any significant positions; comparing them against the other glyco-C domains gave thirty positions. As all glyco-C domains are very closely related and differences between them might also reflect substrate selectivity (not only stereo-selectivity) or different inter-domain interacting residues, we cannot decide which of them confer the altered stereo-selectivity. One point to note, however, is a (positively charged) His in all M4 glyco-C domains at position 6 in the extended motif C2, where an (uncharged polar) Gln is highly conserved in other <sup>D</sup>C<sub>L </sub>domains. This position has also been selected by FRpred as a significant (= subtyping) position. The other positions do not represent mutations in highly conserved residues (data not shown). It would be necessary to check their significance experimentally with mutation studies. It would also be helpful to compare the peculiar sequences with more glyco-C domains, but others are, unfortunately, not publicly available.</p>
				<p>Although we could not discover which altered positions are responsible for the functional shift from <sup>D</sup>C<sub>L </sub>to <sup>L</sup>C<sub>L </sub>in glyco-C domains, interesting experimental questions can be formulated based on our findings. For example, one could design mutational studies with the goal of altering the stereo-selectivity of a <sup>D</sup>C<sub>L </sub>domain and determining the relevant residues experimentally. A starting point could be, for example, the M6 C domain of any glyco-NRPS.</p>
			</sec>
			<sec>
				<st>
					<p>Glycopeptide-AB module M7 vs <sup>L</sup>C<sub>L</sub></p>
				</st>
				<p>The second His of the His-motif in motif C3 which is important for catalysis is replaced by Arg (R). Also, the Gly of the His-motif is not present but replaced by Arg in all but one X* domain. Note, however, that while the second active site His is invariant in C domains, Gly138 is not.</p>
				<p>SDPpred predicted 13 specificity determining residues when comparing M7-X* to <sup>L</sup>C<sub>L</sub>-domains of <it>Streptomyces </it>species. Only three of these coincide with residues of functional importance: His126, Arg278 and Asn335. Furthermore, a C-terminal region could be detected in which M7-X* and <sup>L</sup>C<sub>L </sub>differ strikingly. The concordance of M7-X* with the most highly conserved residues of Streptomycete <sup>L</sup>C<sub>L </sub>domains supports the phylogenetically based suggestion that M7-X* is an inactive <sup>L</sup>C<sub>L </sub>domain.</p>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Conclusion</p>
			</st>
			<p>In this study, we present the evolutionary relationship of homologues of the NRPS Condensation domain, which include enzymatic domains catalyzing Epimerization, Heterocyclization, Condensation and Epimerization with subsequent Condensation in one domain (called the Dual E/C domain). The Condensation domain itself appears in three subtypes according to the stereochemistry of the substrates: <sup>L</sup>C<sub>L </sub>domains, which condense two L-amino acids, <sup>D</sup>C<sub>L </sub>domains, which condense a D-amino acid (N-terminal part of the growing peptide) with an L-amino acid, and Starter C domains (an expression that we coin here), which connect a <it>&#946;</it>-hydroxy-carboxylic acid (e.g. <it>&#946;</it>-hydroxy fatty acid) with an L-amino acid. The phylogeny of C domain homologues is reconstructed using NRPS sequences (including hybrid NRPS) from completely sequenced genomes (43 genomes contained NRPS) and selected biosynthesis clusters, involving 525 non-identical C domain sequences. The sequence motifs of <sup>L</sup>C<sub>L</sub>, <sup>D</sup>C<sub>L </sub>and Starter domains have been extracted and are presented as sequence logos: for <sup>L</sup>C<sub>L </sub>domains, this represents an update of consensus sequences published by Marahiel <it>et al</it>. <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>; <sup>D</sup>C<sub>L </sub>and Starter domain motifs are analyzed and mutually compared for the first time. For comparison, the homologous motifs are also presented for Dual E/C domains, which were first described by Balibar <it>et al</it>. <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>.</p>
			<p>We have investigated the "mysterious" evolutionary origin of C domains in glycopeptide antibiotic synthesis clusters and have discovered that two of the six C domains present in these glyco-NRPSs appear in the <sup>D</sup>C<sub>L </sub>subtree of the phylogenetic tree and show all <sup>D</sup>C<sub>L </sub>sequence motifs, although they clearly have <sup>L</sup>C<sub>L </sub>activity. This suggests that they might be an example of convergent evolution. Even though this is probably a rare event, its possibility has to be kept in mind when uncharacterized C domains are to be classified, e.g. using profile HMMs provided as Additional files <supplr sid="S5">5</supplr>, <supplr sid="S6">6</supplr>, <supplr sid="S7">7</supplr>. Furthermore, we found that a C domain-like segment of glyco-NRPS, called X*, is related to the <sup>L</sup>C<sub>L </sub>domains and is followed by remnants of an A domain, implying an additional complete module in the ancestor of glyco-NRPS.</p>
			<suppl id="S5">
				<title>
					<p>Additional file 5</p>
				</title>
				<text>
					<p><b>Profile HMMs of the 4 complete C domain subtypes (<sup>L</sup>C<sub>L</sub>, Starter, <sup>D</sup>C<sub>L</sub>, Dual) which can be used to detect and distinguish between the subtypes</b>. Zipped text file (file name extension .hmm to be used with the program package HMMER <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>).</p>
				</text>
				<file name="1471-2148-7-78-S5.zip">
					<p>Click here for file</p>
				</file>
			</suppl>
			<suppl id="S6">
				<title>
					<p>Additional file 6</p>
				</title>
				<text>
					<p><b>Aligned full length condensation domains of this study</b>. Zipped sequence file (aligned protein sequences in FASTA format).</p>
				</text>
				<file name="1471-2148-7-78-S6.zip">
					<p>Click here for file</p>
				</file>
			</suppl>
			<suppl id="S7">
				<title>
					<p>Additional file 7</p>
				</title>
				<text>
					<p><b>Profile HMMs of all 7 motifs of all subtypes (<sup>L</sup>C<sub>L</sub>, Starter, <sup>D</sup>C<sub>L</sub>, Dual)</b>. Zipped text file (file name extension .hmm to be used with the program package HMMER <abbrgrp><abbr bid="B49">49</abbr></abbrgrp>).</p>
				</text>
				<file name="1471-2148-7-78-S7.zip">
					<p>Click here for file</p>
				</file>
			</suppl>
			<p>Roongsawang <it>et al</it>. <abbrgrp><abbr bid="B44">44</abbr></abbrgrp> have already performed a study of the phylogeny of C domains which compares the three C domain subtypes. However, this study shows no awareness of the Dual E/C domain, which has since been discovered. Moreover, we used a much more comprehensive dataset of C domain subsequences (525, as opposed to the 162 of Roongsawang <it>et al</it>.) compiled from all complete bacterial genomes and biosynthesis clusters. Because of the omission of Dual E/C domains, their conclusions need to be revised, as we have shown.</p>
		</sec>
		<sec>
			<st>
				<p>Methods</p>
			</st>
			<sec>
				<st>
					<p>Genomes and sequences</p>
				</st>
				<p>The protein sequences and GenBank entries for all completely sequenced bacterial genomes available to date were obtained from the NCBI FTP site <url>ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/</url>. In total, the genomes of 256 bacterial species were downloaded and screened for NRPS protein sequences (including NRPS/PKS hybrids). Additional protein sequences of PKS and NRPS which are part of known secondary metabolite biosynthesis clusters were obtained from the UniProt database <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>. NRPSs were retrieved from 14 known biosynthesis clusters, of which 13 came from <it>Actinomycetes </it>and one from <it>Pseudomonas </it>(see Additional file <supplr sid="S8">8</supplr>).</p>
				<suppl id="S8">
					<title>
						<p>Additional file 8</p>
					</title>
					<text>
						<p>Listing of NRPSs from known biosynthesis clusters used in this study.</p>
					</text>
					<file name="1471-2148-7-78-S8.pdf">
						<p>Click here for file</p>
					</file>
				</suppl>
			</sec>
			<sec>
				<st>
					<p>Identification of enzymatic domains</p>
				</st>
				<p>A common strategy for the identification of a specific type of domain is to use Profile Hidden Markov Models (pHMMs), which are statistical models extracted from multiple sequence alignments. In contrast to simple sequence motifs of fixed length, i.e. position specific scoring matrices, pHMMs are suited for identifying motifs that are interrupted by segments of variable length, and are used to characterize position-specific sequence similarities within a family of proteins. A collection of pHMMs for a wide array of domains and domain families is available from the databases Pfam <abbrgrp><abbr bid="B46">46</abbr></abbrgrp> and TIGRFAMs <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>. The pHMM implementation HMMER <abbrgrp><abbr bid="B48">48</abbr><abbr bid="B49">49</abbr></abbrgrp> and self-written Perl <abbrgrp><abbr bid="B50">50</abbr></abbrgrp> scripts and BioPerl <abbrgrp><abbr bid="B51">51</abbr></abbrgrp> scripts were used to search for NRPS in the genome sequences and biosynthesis clusters and to extract single domains from a given protein sequence. To identify a protein sequence as an NRPS, the occurrence of at least one complete NRPS module with one C domain, one A domain and one T domain was required (Pfam accession numbers PF00668, PF00501 and PF00550), with an E-value threshold of 0.1. We thus accepted that freestanding starter modules containing only A and T domains would be missed, or had to be added manually, as in the case of the biosynthesis clusters.</p>
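				<p>The module criterion described above can be illustrated with a minimal Python sketch. It is not the Perl/BioPerl code used in this study: the <b>Hit</b> record, the function name and the requirement that the C, A and T hits be directly adjacent in sequence order are assumptions made for illustration. Only the three Pfam accessions and the E-value threshold of 0.1 are taken from the text.</p>

```python
from typing import NamedTuple

# Pfam accessions named in the text for the C, A and T domains.
PFAM_TO_DOMAIN = {"PF00668": "C", "PF00501": "A", "PF00550": "T"}


class Hit(NamedTuple):
    """Hypothetical record for one pHMM hit on a protein."""
    accession: str  # Pfam accession of the matching pHMM
    start: int      # start coordinate of the hit on the protein
    evalue: float   # E-value reported by the search


def has_complete_module(hits, max_evalue=0.1):
    """Return True if the accepted hits contain C, A, T as neighbours, in that order."""
    # Keep only C/A/T hits whose E-value does not exceed the threshold.
    accepted = [h for h in hits
                if h.accession in PFAM_TO_DOMAIN and max_evalue >= h.evalue]
    # Spell out the domain order along the protein as a string such as "CATCAT".
    order = "".join(PFAM_TO_DOMAIN[h.accession]
                    for h in sorted(accepted, key=lambda h: h.start))
    return "CAT" in order
```

				<p>In this sketch, a protein with only A and T hits returns False, which mirrors the freestanding starter modules that the criterion misses.</p>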
				<p>The Pfam pHMM Condensation (PF00668) recognizes both the Condensation (C) and Epimerization (E) domain of NRPS. The intention, however, is to be able to distinguish between these two domain types. Therefore, C domain and E domain specific pHMMs were generated from a multiple sequence alignment (MSA) of Epimerization domains and non-Epimerization domains, both of which were recognized by the Pfam C pHMM. To obtain a set of Epimerization domains, all NRPS sequences with complete modules were extracted from all bacterial protein sequences in the UniProt database <abbrgrp><abbr bid="B45">45</abbr></abbrgrp> as described above. Whenever two consecutive C domains followed by an A domain were detected with Pfam pHMMs, the "first C" domain was extracted. That way, we obtained a set consisting mainly of E domains (151 of 237 sequences). By phylogenetic subtyping (as described below) we determined the E domain sequences from the phylogenetic tree of the "first C" domains, in which they formed a distinct subtree. The E and non-E sequences were aligned with MUSCLE <abbrgrp><abbr bid="B52">52</abbr><abbr bid="B53">53</abbr></abbrgrp>, and specific pHMMs were built for them with hmmbuild and hmmcalibrate from the HMMER package. As a control, we confirmed that no E domains could be detected in the 771 "second C" domains. The domain sequence covered by our own pHMMs for C and E domains is identical with that of the Pfam Condensation pHMM; in other words, it extends from four positions before our extended C1 motif to the fourth position after the extended C5 motif (these motifs were first revealed by de Cr&#233;cy-Lagard <it>et al</it>. <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> and reviewed by Marahiel <it>et al</it>. <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>). Phylogenetic reconstruction is always based on this part of the C domain (see Fig. <figr fid="F2">2</figr>). To extract the complete N-terminal part of the C domains, we followed the dissections applied by Roche and Walsh <abbrgrp><abbr bid="B33">33</abbr></abbrgrp> and checked the secondary structure with Quick2D of the MPI Bioinformatics Toolkit <abbrgrp><abbr bid="B54">54</abbr><abbr bid="B55">55</abbr></abbrgrp>.</p>
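				<p>The extraction of "first C" domains can be sketched as follows. This is an illustrative Python fragment under stated assumptions, not the script used in the study: the input is assumed to be the ordered list of domain labels assigned by the Pfam pHMMs to one protein, where "C" stands for any hit of the Condensation pHMM (true C or E domain), and the function name is hypothetical.</p>

```python
def first_c_indices(domains):
    """Return the index of the 'first C' hit in every C, C, A run of a domain list."""
    return [i for i in range(len(domains) - 2)
            if list(domains[i:i + 3]) == ["C", "C", "A"]]
```

				<p>For the label list C, A, T, C, C, A, T the sketch returns the index of the fourth entry, the candidate E domain that precedes the next module.</p>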
			</sec>
			<sec>
				<st>
					<p>Generation of multiple sequence alignments</p>
				</st>
				<p>The quality of a reconstructed phylogenetic tree crucially depends on the underlying multiple sequence alignment. All sequence alignments in our study were generated using MUSCLE <abbrgrp><abbr bid="B52">52</abbr><abbr bid="B53">53</abbr></abbrgrp>. The alignment algorithm can be divided into three stages. First, a progressive alignment is built based on a UPGMA guide-tree. In the second stage, the underlying guide-tree is iteratively improved, yielding a new progressive alignment. The third stage refines the alignment: based on the tree, bipartitions of the dataset are produced; their profiles are extracted and realigned to each other. Thus, the final alignment is not based solely on a single guide-tree, which is why we can rule out that the phylogenies reconstructed on the basis of these alignments merely reflect the guide-tree used in the first step of the algorithm.</p>
			</sec>
			<sec>
				<st>
					<p>Predicting substrate specificity</p>
				</st>
				<p>C domains catalyze the condensation of two amino acids and therefore have two binding sites: the acceptor and the donor site. To investigate whether the substrate specificity of one of these sites influences the phylogeny of the domain, the specificity of the preceding and succeeding A domain in the assembly line was predicted with the NRPSpredictor <abbrgrp><abbr bid="B38">38</abbr></abbrgrp> and stored for each C domain.</p>
			</sec>
			<sec>
				<st>
					<p>Predicting functional subtypes</p>
				</st>
				<p>Functional subtypes may be distinguished on the basis of sequence features, domain architecture or clustering behavior during tree reconstruction. Condensation and Heterocyclization domains may be discriminated by the sequence motif they exhibit at their active site. The occurrence of a sequence motif within a longer sequence can be detected with the help of a position specific score matrix (PSSM) <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>. PSSMs were generated and applied for the detection of the active site His-motif of the C domain and the DxxxxD-motif of the Heterocyclization domain. These were used to discriminate between the two subtypes. The His-motif was built from 86 sequences and the Cyc motif from 15 sequences. The PSSMs were only applied to a region of 100 residues which was expected to contain the active site. In addition, a PSSM was generated for the N-terminal His-motif found in Dual E/C domains. It was constructed from 55 sequences which had been identified as Dual E/C domains by their clustering behavior in the phylogeny and by additional visual inspection of the alignment. The PSSM was applied for validation purposes to make sure that this N-terminal His-motif is unique to Dual E/C domains and cannot be found in any other C domain subtype. Predicting whether a C domain is a <sup>L</sup>C<sub>L</sub>- or a <sup>D</sup>C<sub>L</sub>-catalyst was established according to the observed domain organization of the modules in an NRPS sequence (<sup>D</sup>C<sub>L</sub>-catalysts were first described by Luo <it>et al</it>. <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>). It is assumed that the role of a module with the domain structure C-A-T-E is the activation and epimerization of a residue that is in the L stereo configuration with the intention of incorporating a D residue into the final product. 
Accordingly, a C domain directly following an E domain is expected to be selective for residues in the D configuration and was therefore assigned to the <sup>D</sup>C<sub>L</sub>-type. All other C domains were assumed to be <sup>L</sup>C<sub>L</sub>-catalysts. Classification as a <sup>D</sup>C<sub>L</sub>-catalyst is expected to be fairly reliable: a false positive should occur only if the preceding epimerase turns out to be nonfunctional. The <sup>L</sup>C<sub>L</sub> classification, however, is prone to errors when the respective C domain is the very first (N-terminal) domain in the protein. In this case, the type of the condensation reaction can only be assigned if the order in which the proteins act in the assembly line is known. To overcome this problem, we checked all assignments against the classification suggested by the phylogeny.</p>
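				<p>As a minimal sketch of how a PSSM of this kind can be built and applied to a restricted region (not the implementation used in this study), the following Python code derives log-odds scores from ungapped motif instances and reports the best-scoring window. The uniform background, the pseudocount and the function names build_pssm and best_hit are assumptions made for illustration only.</p>

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(motif_instances, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length, ungapped motif instances."""
    length = len(motif_instances[0])
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    pssm = []
    for pos in range(length):
        column = [seq[pos] for seq in motif_instances]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def best_hit(pssm, sequence, start=0, end=None):
    """Return (score, offset) of the best window inside sequence[start:end].

    Residues outside the 20 standard amino acids contribute a score of 0.
    Returns None if the region is shorter than the motif.
    """
    end = len(sequence) if end is None else end
    width = len(pssm)
    best = None
    for offset in range(start, end - width + 1):
        window = sequence[offset:offset + width]
        score = sum(col.get(aa, 0.0) for col, aa in zip(pssm, window))
        if best is None or score > best[0]:
            best = (score, offset)
    return best


# Toy motif instances (invented for illustration, not taken from the dataset).
toy_pssm = build_pssm(["HHILFDG", "HHIVADG", "HHLIMDG"])
hit = best_hit(toy_pssm, "MKTAYHHILFDGWSLE")
```

				<p>Restricting the scan through the start and end arguments mirrors the idea of searching only the region expected to contain the active site; a score threshold for calling a subtype would still have to be chosen separately.</p>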
				<p>If the order of the subunits is unknown, temporarily incorrect assignments can only be revised later in the analysis.</p>
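				<p>The architecture-based rule described above can be sketched as follows. This is an illustrative reading of the rule, not the code used in this study: the one-letter domain labels, the function name classify_c_domains and the "provisional" note for N-terminal C domains are assumptions.</p>

```python
def classify_c_domains(domains):
    """Assign a provisional chirality class to every C domain of one protein.

    `domains` lists the domain labels of a single protein in N- to C-terminal
    order, for example ["C", "A", "T", "E", "C", "A", "T"].
    Returns a list of (index, label, note) tuples, one per C domain.
    """
    labels = []
    for index, domain in enumerate(domains):
        if domain != "C":
            continue
        if index == 0:
            # The preceding module lies on another protein, so the call is
            # only a default and should be checked against the phylogeny.
            labels.append((index, "LCL", "provisional: N-terminal C domain"))
        elif domains[index - 1] == "E":
            labels.append((index, "DCL", "follows an E domain"))
        else:
            labels.append((index, "LCL", "default"))
    return labels


calls = classify_c_domains(["C", "A", "T", "E", "C", "A", "T"])
```

				<p>Flagging N-terminal C domains separately keeps the uncertain cases visible, so that they can be revised once the order of the subunits or the phylogenetic placement is known.</p>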
			</sec>
			<sec>
				<st>
					<p>Analysis of multiple sequence alignments for specificity determining positions</p>
				</st>
				<p>In a set of homologous enzymes, we may find subsets that each contain sequences with one distinct substrate specificity. These subsets of common function are called subtypes. Subtypes often differ from one another at certain alignment positions, while the same positions are conserved within a given subtype. Li <it>et al</it>. <abbrgrp><abbr bid="B56">56</abbr></abbrgrp> call these specificity-determining residues (SDR); Kalinina <it>et al</it>. <abbrgrp><abbr bid="B35">35</abbr></abbrgrp> refer to them as specificity determining positions (SDP). One way to determine SDPs from an alignment is to calculate the mutual information of each column, as described by Li <it>et al</it>. <abbrgrp><abbr bid="B56">56</abbr></abbrgrp> and Kalinina <it>et al</it>. <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. For this paper, SDPs were determined using the freely accessible SDPpred server <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. Here, the mutual information is based on so-called smoothed frequencies, which allow for substitutions between residues with similar physico-chemical properties. In addition, the significance of the mutual information at each position is assessed by means of Z-scores. Predictions by SDPpred were compared with the highest scoring positions predicted by FRpred <abbrgrp><abbr bid="B36">36</abbr><abbr bid="B57">57</abbr></abbrgrp>, which combines a mutual information term with a conservation score.</p>
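				<p>To illustrate the underlying quantity, the following sketch computes the plain mutual information between the residues of each alignment column and the subtype labels. It deliberately omits the smoothed frequencies and the Z-score statistics of SDPpred, so it is a simplified stand-in and not a reimplementation of that server; the function names are assumptions.</p>

```python
import math
from collections import Counter


def column_mutual_information(column, subtypes):
    """Mutual information (in bits) between residues and subtype labels."""
    n = len(column)
    joint = Counter(zip(column, subtypes))
    residue_counts = Counter(column)
    subtype_counts = Counter(subtypes)
    mi = 0.0
    for (residue, subtype), count in joint.items():
        p_joint = count / n
        p_residue = residue_counts[residue] / n
        p_subtype = subtype_counts[subtype] / n
        mi += p_joint * math.log2(p_joint / (p_residue * p_subtype))
    return mi


def rank_columns(alignment, subtypes):
    """Return column indices sorted by decreasing mutual information."""
    scores = [column_mutual_information(col, subtypes)
              for col in zip(*alignment)]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy alignment (invented): column 0 separates the two subtypes perfectly,
# column 1 is fully conserved, column 2 varies independently of the subtype.
toy_alignment = ["AKD", "AKE", "GKD", "GKE"]
toy_subtypes = ["x", "x", "y", "y"]
ranking = rank_columns(toy_alignment, toy_subtypes)
```

				<p>In this toy case only the first column carries information about the subtype; a fully conserved column and a column that varies independently of the grouping both score zero, which is why mutual information alone does not distinguish conserved from uninformative positions.</p>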
			</sec>
			<sec>
				<st>
					<p>Reconstruction of phylogenetic trees</p>
				</st>
				<p>Several methods were applied for reconstructing phylogenetic trees from the multiple sequence alignments that were generated for each domain type. The trees presented in this article were reconstructed from protein sequences: amino acid sequences are preferred to nucleotide sequences because they are more conserved and are not influenced by compositional biases such as G+C content and codon usage. In addition, the mathematical model for the evolutionary change of amino acid sequences is much simpler than that for nucleotide sequences; since only a suitable substitution matrix has to be selected, this reduces the risk that the phylogeny is based on wrong evolutionary assumptions <abbrgrp><abbr bid="B58">58</abbr></abbrgrp>. The amino acid substitution matrix employed in this study was the Jones-Taylor-Thornton (JTT) matrix <abbrgrp><abbr bid="B59">59</abbr></abbrgrp>.</p>
				<p>In some cases, the rate of amino acid substitution may be assumed to be the same for all positions in the alignment. In general, however, this does not reflect reality since the substitution rate is usually higher at positions of lower functional importance. A more realistic model is achieved if the substitution rate is taken to vary among sites according to the gamma distribution <abbrgrp><abbr bid="B60">60</abbr></abbrgrp>.</p>
				<p>Apart from PHYLIP <abbrgrp><abbr bid="B61">61</abbr></abbrgrp>, all methods used in this study offer, as an option, estimation of the parameter <it>&#945; </it>that determines the shape of the &#915; distribution. Whenever a gamma distributed rate variation was assumed, four gamma-rate categories were used to approximate the distribution. Several tree reconstruction methods were applied to each dataset to determine whether different methods yield different topologies, which in turn would indicate that the proposed topologies are unreliable. As a distance-based method, the Neighbor-Joining (NJ) method <abbrgrp><abbr bid="B62">62</abbr></abbrgrp> was applied. The distances were calculated with the program protdist and NJ was performed with neighbor, both available from the PHYLIP package. For NJ, only uniform substitution rates were used. As maximum likelihood methods, the programs IQPNNI <abbrgrp><abbr bid="B63">63</abbr></abbrgrp> and PHYML <abbrgrp><abbr bid="B64">64</abbr></abbrgrp> were applied.</p>
				<p>Bootstrapping <abbrgrp><abbr bid="B65">65</abbr></abbrgrp> was performed to test the reliability of the topologies.</p>
				<p>In general, a topology is taken as reliable if tree reconstruction results in the same topology for at least 95% of the datasets generated by bootstrapping. This is quite a strict criterion, and it has been shown that subtrees of a tree may be accepted as being significant if they are supported by only 70% of the trees <abbrgrp><abbr bid="B66">66</abbr></abbrgrp>. Using the PHYLIP package, bootstrap datasets were generated with seqboot and used as input data for neighbor. PHYML also offers an option that allows a bootstrap analysis of the original data. This results in a set of trees which can be visualized as a <it>consensus network </it>using SplitsTree4 <abbrgrp><abbr bid="B67">67</abbr></abbrgrp>. Specifying a cutoff value gives a clearer view of the bootstrap tree/network, in which only those edges that are supported by bootstrap values higher than the cutoff are included.</p>
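				<p>The two ideas in this paragraph, resampling alignment columns and keeping only well-supported edges, can be sketched as below. This is a conceptual illustration and does not reproduce seqboot, PHYML or SplitsTree4: tree reconstruction itself is left out, each replicate tree is assumed to be given as a collection of splits (here, frozensets holding the taxa on one side of an edge), and all function names are hypothetical.</p>

```python
import random
from collections import Counter


def bootstrap_alignment(alignment, rng):
    """Draw one pseudo-replicate by sampling columns with replacement."""
    length = len(alignment[0])
    picks = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[i] for i in picks) for seq in alignment]


def split_support(replicate_splits):
    """Fraction of replicate trees that contain each split."""
    counts = Counter()
    for splits in replicate_splits:
        counts.update(set(splits))
    total = len(replicate_splits)
    return {split: count / total for split, count in counts.items()}


def filter_splits(support, cutoff):
    """Keep only splits whose support is strictly higher than the cutoff."""
    return {split: value for split, value in support.items() if value > cutoff}


# Toy example with taxa A-E (invented): four replicate trees, each described
# by its two non-trivial splits.
AB, DE, CD = frozenset("AB"), frozenset("DE"), frozenset("CD")
replicates = [[AB, DE], [AB, DE], [AB, CD], [AB, DE]]
kept = filter_splits(split_support(replicates), cutoff=0.7)

# One resampled alignment of the same width as the input.
replicate = bootstrap_alignment(["ACDE", "ACDF"], random.Random(0))
```

				<p>With a cutoff of 0.7 the splits AB and DE are retained and CD is dropped; lowering the cutoff would admit conflicting splits, which is what turns a consensus tree into a consensus network.</p>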
			</sec>
			<sec>
				<st>
					<p>Detection of sequence motifs and their representation</p>
				</st>
				<p>The program meme <abbrgrp><abbr bid="B68">68</abbr><abbr bid="B69">69</abbr></abbrgrp> was used to detect the sequence motifs in C domains. Meme discovers one or more motifs in a collection of unaligned DNA or protein sequences. The C domain subtypes were aligned using MUSCLE <abbrgrp><abbr bid="B52">52</abbr><abbr bid="B53">53</abbr></abbrgrp>, the multiple alignments were visualized using JalView <abbrgrp><abbr bid="B70">70</abbr></abbrgrp> and the motifs found by meme were extracted (cut out). We ensured that the extracted regions included the C domain motifs described by Sieber and Marahiel <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> as well as notable sequence positions in the proximity of the motifs, such as single conserved residues or positions that were important for discerning the subtypes. The extracted motif sequences were used to create pHMMs with HMMER and also to create sequence logos using seqlogo by Crooks <it>et al</it>. <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. Sequence logos were preferred over consensus sequences, as they provide a more precise description of sequence similarity and reveal significant features of the alignment which are otherwise difficult to perceive. For sequence logos, positions with > 10% gaps were removed. Sequence logos of all C domain motifs created with seqlogo are available online as Additional file <supplr sid="S9">9</supplr>.</p>
				<suppl id="S9">
					<title>
						<p>Additional file 9</p>
					</title>
					<text>
						<p><b>Sequence logos of all C domain motifs created with weblogo </b><abbrgrp><abbr bid="B31">31</abbr></abbrgrp>. ZIP file containing image files in the PNG file format.</p>
					</text>
					<file name="1471-2148-7-78-S9.zip">
						<p>Click here for file</p>
					</file>
				</suppl>
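				<p>As a rough sketch of the preprocessing and of the quantity a sequence logo displays, the code below removes alignment columns with more than 10% gaps and computes the information content of a column in bits. It omits the small-sample correction that logo programs typically apply and is not the seqlogo implementation; the function names and the toy input are assumptions.</p>

```python
import math
from collections import Counter


def drop_gappy_columns(alignment, max_gap_fraction=0.10):
    """Remove columns whose gap fraction exceeds max_gap_fraction.

    Returns an empty list if no column survives.
    """
    columns = list(zip(*alignment))
    kept = [col for col in columns
            if max_gap_fraction >= col.count("-") / len(col)]
    return ["".join(row) for row in zip(*kept)]


def information_content(column, alphabet_size=20):
    """Information content (bits) of one column, ignoring gap characters."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(alphabet_size) - entropy


# Toy alignment (invented): only the first column has no gaps.
trimmed = drop_gappy_columns(["AC-", "ACD", "A-D"])
heights = [information_content(col) for col in zip(*trimmed)]
```

				<p>A fully conserved protein column reaches the maximum of log2(20), about 4.32 bits, whereas a column split evenly between two residues carries one bit less; in a logo, this value sets the total height of the letter stack at that position.</p>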
			</sec>
		</sec>
		<sec>
			<st>
				<p>Authors' contributions</p>
			</st>
			<p>Authors CR and IH are the principal authors of this article. IH gathered the sequences and constructed and analyzed the phylogenetic trees. CR analyzed the subtype determining residues and constructed and interpreted the sequence logos. CR wrote the manuscript with the participation of IH in several sections. Authors TW and WW made important contributions to biological questions and DHH contributed to phylogenetic questions. All authors read and approved the final version of the manuscript.</p>
		</sec>
	</bdy>
	<bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st>
				<p>We thank Beatrix Weber, who constructed the E domain and C (non-E) domain pHMMs used during this study as part of her diploma thesis. We also acknowledge Nadine Schracke for providing the C domain detail in Fig. <figr fid="F1">1</figr>. Tilmann Weber was supported by a grant from the German Ministry for Education and Research (BMBF-FKZ: 03138505J). Funding for Christian Rausch, Daniel H. Huson and (partially) Ilka Hoof, and also the publication costs for this article, was provided by the Deutsche Forschungsgemeinschaft (funding for the ZBIT, BIZ 1/1-2 and BIZ 1/1-3).</p>
			</sec>
		</ack>
		<refgrp>
			<bibl id="B1">
				<title>
					<p>Identification of peptaibols from <it>Trichoderma virens </it>and cloning of a peptaibol synthetase</p>
				</title>
				<aug>
					<au>
						<snm>Wiest</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Grzegorski</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Xu</snm>
						<fnm>BW</fnm>
					</au>
					<au>
						<snm>Goulard</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Rebuffat</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Ebbole</snm>
						<fnm>DJ</fnm>
					</au>
					<au>
						<snm>Bodo</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Kenerley</snm>
						<fnm>C</fnm>
					</au>
				</aug>
				<source>J Biol Chem</source>
				<pubdate>2002</pubdate>
				<volume>277</volume>
				<issue>23</issue>
				<fpage>20862</fpage>
				<lpage>20868</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1074/jbc.M201654200</pubid>
						<pubid idtype="pmpid" link="fulltext">11909873</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B2">
				<title>
					<p>Molecular mechanisms underlying nonribosomal peptide synthesis: approaches to new antibiotics</p>
				</title>
				<aug>
					<au>
						<snm>Sieber</snm>
						<fnm>SA</fnm>
					</au>
					<au>
						<snm>Marahiel</snm>
						<fnm>MA</fnm>
					</au>
				</aug>
				<source>Chem Rev</source>
				<pubdate>2005</pubdate>
				<volume>105</volume>
				<issue>2</issue>
				<fpage>715</fpage>
				<lpage>738</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmpid" link="fulltext">15700962</pubid>
						<pubid idtype="doi">10.1021/cr0301191</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B3">
				<title>
					<p>Substrate recognition by nonribosomal peptide synthetase multi-enzymes</p>
				</title>
				<aug>
					<au>
						<snm>Lautru</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Challis</snm>
						<fnm>GL</fnm>
					</au>
				</aug>
				<source>Microbiology</source>
				<pubdate>2004</pubdate>
				<volume>150</volume>
				<issue>Pt 6</issue>
				<fpage>1629</fpage>
				<lpage>1636</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmpid" link="fulltext">15184549</pubid>
						<pubid idtype="doi">10.1099/mic.0.26837-0</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B4">
				<title>
					<p>The thioesterase domain of the fengycin biosynthesis cluster: a structural base for the macrocyclization of a non-ribosomal lipopeptide</p>
				</title>
				<aug>
					<au>
						<snm>Samel</snm>
						<fnm>SA</fnm>
					</au>
					<au>
						<snm>Wagner</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Marahiel</snm>
						<fnm>MA</fnm>
					</au>
					<au>
						<snm>Essen</snm>
						<fnm>LO</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>2006</pubdate>
				<volume>359</volume>
				<issue>4</issue>
				<fpage>876</fpage>
				<lpage>889</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmpid" link="fulltext">16697411</pubid>
						<pubid idtype="doi">10.1016/j.jmb.2006.03.062</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B5">
				<title>
					<p>Bioactive microbial metabolites</p>
				</title>
				<aug>
					<au>
						<snm>B&#233;rdy</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>J Antibiot (Tokyo)</source>
				<pubdate>2005</pubdate>
				<volume>58</volume>
				<fpage>1</fpage>
				<lpage>26</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">15813176</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B6">
				<title>
					<p>Structural organization of microcystin biosynthesis in <it>Microcystis aeruginosa </it>PCC7806: an integrated peptide-polyketide synthetase system</p>
				</title>
				<aug>
					<au>
						<snm>Tillett</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Dittmann</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Erhard</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>von D&#246;hren</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>B&#246;rner</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Neilan</snm>
						<fnm>BA</fnm>
					</au>
				</aug>
				<source>Chem Biol</source>
				<pubdate>2000</pubdate>
				<volume>7</volume>
				<issue>10</issue>
				<fpage>753</fpage>
				<lpage>764</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S1074-5521(00)00021-1</pubid>
						<pubid idtype="pmpid" link="fulltext">11033079</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B7">
				<title>
					<p>Rational design of peptide antibiotics by targeted replacement of bacterial and fungal domains</p>
				</title>
				<aug>
					<au>
						<snm>Stachelhaus</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Schneider</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Marahiel</snm>
						<fnm>MA</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>1995</pubdate>
				<volume>269</volume>
				<issue>5220</issue>
				<fpage>69</fpage>
				<lpage>72</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.7604280</pubid>
						<pubid idtype="pmpid" link="fulltext">7604280</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B8">
				<title>
					<p>Construction of hybrid peptide synthetases by module and domain fusions</p>
				</title>
				<aug>
					<au>
						<snm>Mootz</snm>
						<fnm>HD</fnm>
					</au>
					<au>
						<snm>Schwarzer</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Marahiel</snm>
						<fnm>MA</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2000</pubdate>
				<volume>97</volume>
				<issue>11</issue>
				<fpage>5848</fpage>
				<lpage>5853</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">18522</pubid>
						<pubid idtype="pmpid" link="fulltext">10811885</pubid>
						<pubid idtype="doi">10.1073/pnas.100075897</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B9">
				<title>
					<p>Decreasing the ring size of a cyclic nonribosomal peptide antibiotic by in-frame module deletion in the biosynthetic genes</p>
				</title>
				<aug>
					<au>
						<snm>Mootz</snm>
						<fnm>HD</fnm>
					</au>
					<au>
						<snm>Kessler</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Linne</snm>
						<fnm>U</fnm>
					</au>
					<au>
						<snm>Eppelmann</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Schwarzer</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Marahiel</snm>
						<fnm>MA</fnm>
					</au>
				</aug>
				<source>J Am Chem Soc</source>
				<pubdate>2002</pubdate>
				<volume>124</volume>
				<issue>37</issue>
				<fpage>10980</fpage>
				<lpage>10981</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1021/ja027276m</pubid>
						<pubid idtype="pmpid" link="fulltext">12224936</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B10">
				<title>
					<p>Exploitation of the selectivity-conferring code of nonribosomal peptide synthetases for the rational design of novel peptide antibiotics</p>
				</title>
				<aug>
					<au>
						<snm>Eppelmann</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Stachelhaus</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Marahiel</snm>
						<fnm>MA</fnm>
					</au>
				</aug>
				<source>Biochemistry</source>
				<pubdate>2002</pubdate>
				<volume>41</volume>
				<issue>30</issue>
				<fpage>9718</fpage>
				<lpage>9726</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1021/bi0259406</pubid>
						<pubid idtype="pmpid" link="fulltext">12135394</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B11">
				<title>
					<p>Biosynthesis of nonribosomal peptides</p>
				</title>
				<aug>
					<au>
						<snm>Finking</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Marahiel</snm>
						<fnm>MA</fnm>
					</au>
				</aug>
				<source>Annu Rev Microbiol</source>
				<pubdate>2004</pubdate>
				<volume>58</volume>
				<fpage>453</fpage>
				<lpage>488</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmpid" link="fulltext">15487945</pubid>
						<pubid idtype="doi">10.1146/annurev.micro.58.030603.123615</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B12">
				<title>
					<p>Chirality of peptide bond-forming condensation domains in nonribosomal peptide synthetases: the C5 domain of tyrocidine synthetase is a <sup>D</sup>C<sub>L </sub>catalyst</p>
				</title>
				<aug>
					<au>
						<snm>Clugston</snm>
						<fnm>SL</fnm>
					</au>
					<au>
						<snm>Sieber</snm>
						<fnm>SA</fnm>
					</au>
					<au>
						<snm>Marahiel</snm>
						<fnm>MA</fnm>
					</au>
					<au>
						<snm>Walsh</snm>
						<fnm>CT</fnm>
					</au>
				</aug>
				<source>Biochemistry</source>
				<pubdate>2003</pubdate>
				<volume>42</volume>
				<issue>41</issue>
				<fpage>12095</fpage>
				<lpage>12104</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmpid" link="fulltext">14556641</pubid>
						<pubid idtype="doi">10.1021/bi035090+</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B13">
				<title>
					<p>Generation of D amino acid residues in assembly of arthrofactin by dual condensation/epimerization domains</p>
				</title>
				<aug>
					<au>
						<snm>Balibar</snm>
						<fnm>CJ</fnm>
					</au>
					<au>
						<snm>Vaillancourt</snm>
						<fnm>FH</fnm>
					</au>
					<au>
						<snm>Walsh</snm>
						<fnm>CT</fnm>
					</au>
				</aug>
				<source>Chem Biol</source>
				<pubdate>2005</pubdate>
				<volume>12</volume>
				<issue>11</issue>
				<fpage>1189</fpage>
				<lpage>1200</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmpid" link="fulltext">16298298</pubid>
						<pubid idtype="doi">10.1016/j.chembiol.2005.08.010</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B14">
				<title>
					<p>The structure of VibH represents nonribosomal peptide synthetase condensation, cyclization and epimerization domains</p>
				</title>
				<aug>
					<au>
						<snm>Keating</snm>
						<fnm>TA</fnm>
					</au>
					<au>
						<snm>Marshall</snm>
						<fnm>CG</fnm>
					</au>
					<au>
						<snm>Walsh</snm>
						<fnm>CT</fnm>
					</au>
					<au>
						<snm>Keating</snm>
						<fnm>AE</fnm>
					</au>
				</aug>
				<source>Nat Struct Biol</source>
				<pubdate>2002</pubdate>
				<volume>9</volume>
				<issue>7</issue>
				<fpage>522</fpage>
				<lpage>526</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmpid" link="fulltext">12055621</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B15">
				<title>
					<p>Surfactin, a crystalline peptidelipid surfactant produced by <it>Bacillus subtilis </it>: isolation, characterization and its inhibition of fibrin clot formation</p>
				</title>
				<aug>
					<au>
						<snm>Arima</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Kakinuma</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Tamura</snm>
						<fnm>G</fnm>
					</au>
				</aug>
				<source>Biochem Biophys Res Commun</source>
				<pubdate>1968</pubdate>
				<volume>31</volume>
				<issue>3</issue>
				<fpage>488</fpage>
				<lpage>494</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/0006-291X(68)90503-2</pubid>
						<pubid idtype="pmpid" link="fulltext">4968234</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B16">
				<title>
					<p>Structural analysis of <it>Bacillus licheniformis </it>86 surfactant</p>
				</title>
				<aug>
					<au>
						<snm>Horowitz</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Griffin</snm>
						<fnm>W</fnm>
					</au>
				</aug>
				<source>J Ind Microbiol</source>
				<pubdate>1991</pubdate>
				<volume>7</volume>
				<fpage>45</fpage>
				<lpage>52</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1007/BF01575602</pubid>
						<pubid idtype="pmpid">1367206</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B17">
				<title>
					<p>Sequence completion, identification and definition of the fengycin operon in <it>Bacillus subtilis </it>168</p>
				</title>
				<aug>
					<au>
						<snm>Tosato</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Albertini</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Zotti</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Sonda</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Bruschi</snm>
						<fnm>C</fnm>
					</au>
				</aug>
				<source>Microbiology</source>
				<pubdate>1997</pubdate>
				<volume>143</volume>
				<issue>Pt 11</issue>
				<fpage>3443</fpage>
				<lpage>3450</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">9387222</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B18">
				<title>
					<p>A new lipopeptide biosurfactant produced by <it>Arthrobacter </it>sp. strain MIS38</p>
				</title>
				<aug>
					<au>
						<snm>Morikawa</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Daido</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Takao</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Murata</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Shimonishi</snm>
						<fnm>Y</fnm>
					</au>
					<au>
						<snm>Imanaka</snm>
						<fnm>T</fnm>
					</au>
				</aug>
				<source>J Bacteriol</source>
				<pubdate>1993</pubdate>
				<volume>175</volume>
				<issue>20</issue>
				<fpage>6459</fpage>
				<lpage>6466</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">206754</pubid>
						<pubid idtype="pmpid" link="fulltext">8407822</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B19">
				<title>
					<p>Molecular and biochemical characterization of the protein template controlling biosynthesis of the lipopeptide lichenysin</p>
				</title>
				<aug>
					<au>
						<snm>Konz</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Doekel</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Marahiel</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>J Bacteriol</source>
				<pubdate>1999</pubdate>
				<volume>181</volume>
				<fpage>133</fpage>
				<lpage>140</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">103541</pubid>
						<pubid idtype="pmpid" link="fulltext">9864322</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B20">
				<title>
					<p>Sequence and analysis of the genetic locus responsible for surfactin synthesis in <it>Bacillus subtilis</it></p>
				</title>
				<aug>
					<au>
						<snm>Cosmina</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Rodriguez</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>de Ferra</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Grandi</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Perego</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Venema</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>van Sinderen</snm>
						<fnm>D</fnm>
					</au>
				</aug>
				<source>Mol Microbiol</source>
				<pubdate>1993</pubdate>
				<volume>8</volume>
				<issue>5</issue>
				<fpage>821</fpage>
				<lpage>831</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1111/j.1365-2958.1993.tb01629.x</pubid>
						<pubid idtype="pmpid">8355609</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B21">
				<title>
					<p>A putative new peptide synthase operon in <it>Bacillus subtilis </it>: partial characterization</p>
				</title>
				<aug>
					<au>
						<snm>Tognoni</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Franchi</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Magistrelli</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Colombo</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Cosmina</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Grandi</snm>
						<fnm>G</fnm>
					</au>
				</aug>
				<source>Microbiology</source>
				<pubdate>1995</pubdate>
				<volume>141</volume>
				<issue>Pt 3</issue>
				<fpage>645</fpage>
				<lpage>648</lpage>
				<xrefbib>
					<pubid idtype="pmpid">7711903</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B22">
				<title>
					<p>Structure, biosynthetic origin, and engineered biosynthesis of calcium-dependent antibiotics from <it>Streptomyces coelicolor</it></p>
				</title>
				<aug>
					<au>
						<snm>Hojati</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Milne</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Harvey</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Gordon</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Borg</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Flett</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Wilkinson</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Sidebottom</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Rudd</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Hayes</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Smith</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Micklefield</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Chem Biol</source>
				<pubdate>2002</pubdate>
				<volume>9</volume>
				<issue>11</issue>
				<fpage>1175</fpage>
				<lpage>1187</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S1074-5521(02)00252-1</pubid>
						<pubid idtype="pmpid" link="fulltext">12445768</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B23">
				<title>
					<p>A major cell wall lipopeptide of <it>Mycobacterium avium </it>subspecies <it>paratuberculosis</it></p>
				</title>
				<aug>
					<au>
						<snm>Eckstein</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Chandrasekaran</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Mahapatra</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>McNeil</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Chatterjee</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Rithner</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Ryan</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Belisle</snm>
						<fnm>JT</fnm>
					</au>
					<au>
						<snm>Inamine</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>J Biol Chem</source>
				<pubdate>2006</pubdate>
				<volume>281</volume>
				<issue>8</issue>
				<fpage>5209</fpage>
				<lpage>5215</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1074/jbc.M512465200</pubid>
						<pubid idtype="pmpid" link="fulltext">16339155</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B24">
				<title>
					<p>Pristinamycin I biosynthesis in <it>Streptomyces pristinaespiralis </it>: molecular characterization of the first two structural peptide synthetase genes</p>
				</title>
				<aug>
					<au>
						<snm>de Cr&#233;cy-Lagard</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Blanc</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Gil</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Naudin</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Lorenzon</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Famechon</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Bamas-Jacques</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Crouzet</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Thibaut</snm>
						<fnm>D</fnm>
					</au>
				</aug>
				<source>J Bacteriol</source>
				<pubdate>1997</pubdate>
				<volume>179</volume>
				<issue>3</issue>
				<fpage>705</fpage>
				<lpage>713</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">178751</pubid>
						<pubid idtype="pmpid" link="fulltext">9006024</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B25">
				<title>
					<p>Subcloning, expression, and purification of the enterobactin biosynthetic enzyme 2,3-dihydroxybenzoate-AMP ligase: demonstration of enzyme-bound (2,3-dihydroxybenzoyl)adenylate product</p>
				</title>
				<aug>
					<au>
						<snm>Rusnak</snm>
						<fnm>FW</fnm>
					</au>
					<au>
						<snm>Faraci</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Walsh</snm>
						<fnm>CT</fnm>
					</au>
				</aug>
				<source>Biochemistry</source>
				<pubdate>1989</pubdate>
				<volume>28</volume>
				<issue>17</issue>
				<fpage>6827</fpage>
				<lpage>6835</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1021/bi00443a008</pubid>
						<pubid idtype="pmpid">2531000</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B26">
				<title>
					<p>Molecular cloning of the actinomycin synthetase gene cluster from <it>Streptomyces chrysomallus </it>and functional heterologous expression of the gene encoding actinomycin synthetase II</p>
				</title>
				<aug>
					<au>
						<snm>Schauwecker</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Pfennig</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Schroder</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Keller</snm>
						<fnm>U</fnm>
					</au>
				</aug>
				<source>J Bacteriol</source>
				<pubdate>1998</pubdate>
				<volume>180</volume>
				<issue>9</issue>
				<fpage>2468</fpage>
				<lpage>2474</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">107190</pubid>
						<pubid idtype="pmpid" link="fulltext">9573200</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B27">
				<title>
					<p>Formylation domain: an essential modifying enzyme for the nonribosomal biosynthesis of linear gramicidin</p>
				</title>
				<aug>
					<au>
						<snm>Schoenafinger</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Schracke</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Linne</snm>
						<fnm>U</fnm>
					</au>
					<au>
						<snm>Marahiel</snm>
						<fnm>MA</fnm>
					</au>
				</aug>
				<source>J Am Chem Soc</source>
				<pubdate>2006</pubdate>
				<volume>128</volume>
				<issue>23</issue>
				<fpage>7406</fpage>
				<lpage>7407</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmpid" link="fulltext">16756271</pubid>
						<pubid idtype="doi">10.1021/ja0611240</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B28">
				<title>
					<p>Multienzymatic non ribosomal peptide biosynthesis: identification of the functional domains catalysing peptide elongation and epimerisation</p>
				</title>
				<aug>
					<au>
						<snm>de Cr&#233;cy-Lagard</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Marli&#232;re</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Saurin</snm>
						<fnm>W</fnm>
					</au>
				</aug>
				<source>C R Acad Sci III</source>
				<pubdate>1995</pubdate>
				<volume>318</volume>
				<issue>9</issue>
				<fpage>927</fpage>
				<lpage>936</lpage>
				<xrefbib>
					<pubid idtype="pmpid">8521076</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B29">
				<title>
					<p>Modular Peptide Synthetases Involved in Nonribosomal Peptide Synthesis</p>
				</title>
				<aug>
					<au>
						<snm>Marahiel</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Stachelhaus</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Mootz</snm>
						<fnm>H</fnm>
					</au>
				</aug>
				<source>Chem Rev</source>
				<pubdate>1997</pubdate>
				<volume>97</volume>
				<issue>7</issue>
				<fpage>2651</fpage>
				<lpage>2674</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1021/cr960029e</pubid>
						<pubid idtype="pmpid" link="fulltext">11851476</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B30">
				<title>
					<p>Timing of epimerization and condensation reactions in nonribosomal peptide assembly lines: kinetic analysis of phenylalanine activating elongation modules of tyrocidine synthetase B</p>
				</title>
				<aug>
					<au>
						<snm>Luo</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Kohli</snm>
						<fnm>RM</fnm>
					</au>
					<au>
						<snm>Onishi</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Linne</snm>
						<fnm>U</fnm>
					</au>
					<au>
						<snm>Marahiel</snm>
						<fnm>MA</fnm>
					</au>
					<au>
						<snm>Walsh</snm>
						<fnm>CT</fnm>
					</au>
				</aug>
				<source>Biochemistry</source>
				<pubdate>2002</pubdate>
				<volume>41</volume>
				<issue>29</issue>
				<fpage>9184</fpage>
				<lpage>9196</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1021/bi026047+</pubid>
						<pubid idtype="pmpid" link="fulltext">12119033</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B31">
				<title>
					<p>WebLogo: a sequence logo generator</p>
				</title>
				<aug>
					<au>
						<snm>Crooks</snm>
						<fnm>GE</fnm>
					</au>
					<au>
						<snm>Hon</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Chandonia</snm>
						<fnm>JM</fnm>
					</au>
					<au>
						<snm>Brenner</snm>
						<fnm>SE</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2004</pubdate>
				<volume>14</volume>
				<issue>6</issue>
				<fpage>1188</fpage>
				<lpage>1190</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">419797</pubid>
						<pubid idtype="pmpid" link="fulltext">15173120</pubid>
						<pubid idtype="doi">10.1101/gr.849004</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B32">
				<title>
					<p>Peptide bond formation in nonribosomal peptide biosynthesis. Catalytic role of the condensation domain</p>
				</title>
				<aug>
					<au>
						<snm>Stachelhaus</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Mootz</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Bergendahl</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Marahiel</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>J Biol Chem</source>
				<pubdate>1998</pubdate>
				<volume>273</volume>
				<issue>35</issue>
				<fpage>22773</fpage>
				<lpage>22781</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1074/jbc.273.35.22773</pubid>
						<pubid idtype="pmpid" link="fulltext">9712910</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B33">
				<title>
					<p>Dissection of the EntF condensation domain boundary and active site residues in nonribosomal peptide synthesis</p>
				</title>
				<aug>
					<au>
						<snm>Roche</snm>
						<fnm>ED</fnm>
					</au>
					<au>
						<snm>Walsh</snm>
						<fnm>CT</fnm>
					</au>
				</aug>
				<source>Biochemistry</source>
				<pubdate>2003</pubdate>
				<volume>42</volume>
				<issue>5</issue>
				<fpage>1334</fpage>
				<lpage>1344</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmpid" link="fulltext">12564937</pubid>
						<pubid idtype="doi">10.1021/bi026867m</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B34">
				<title>
					<p>Mutational analysis of the C-domain in nonribosomal peptide synthesis</p>
				</title>
				<aug>
					<au>
						<snm>Bergendahl</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Linne</snm>
						<fnm>U</fnm>
					</au>
					<au>
						<snm>Marahiel</snm>
						<fnm>MA</fnm>
					</au>
				</aug>
				<source>Eur J Biochem</source>
				<pubdate>2002</pubdate>
				<volume>269</volume>
				<issue>2</issue>
				<fpage>620</fpage>
				<lpage>629</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1046/j.0014-2956.2001.02691.x</pubid>
						<pubid idtype="pmpid" link="fulltext">11856321</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B35">
				<title>
					<p>Automated selection of positions determining functional specificity of proteins by comparative analysis of orthologous groups in protein families</p>
				</title>
				<aug>
					<au>
						<snm>Kalinina</snm>
						<fnm>OV</fnm>
					</au>
					<au>
						<snm>Mironov</snm>
						<fnm>AA</fnm>
					</au>
					<au>
						<snm>Gelfand</snm>
						<fnm>MS</fnm>
					</au>
					<au>
						<snm>Rakhmaninova</snm>
						<fnm>AB</fnm>
					</au>
				</aug>
				<source>Protein Sci</source>
				<pubdate>2004</pubdate>
				<volume>13</volume>
				<issue>2</issue>
				<fpage>443</fpage>
				<lpage>456</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmpid" link="fulltext">14739328</pubid>
						<pubid idtype="doi">10.1110/ps.03191704</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B36">
				<title>
					<p>FRpred &#8211; A Package for Prediction of Functional Residues in Protein Multiple Sequence Alignments</p>
				</title>
				<aug>
					<au>
						<snm>Fischer</snm>
						<fnm>JD</fnm>
					</au>
					<au>
						<snm>Ponjavic</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Kohlbacher</snm>
						<fnm>O</fnm>
					</au>
					<au>
						<snm>Lupas</snm>
						<fnm>AN</fnm>
					</au>
					<au>
						<snm>S&#246;ding</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Proceedings of the German Conference in Bioinformatics 2006 &#8211; Poster Abstracts</source>
				<pubdate>2006</pubdate>
			</bibl>
			<bibl id="B37">
				<title>
					<p>DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment</p>
				</title>
				<aug>
					<au>
						<snm>Morgenstern</snm>
						<fnm>B</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>1999</pubdate>
				<volume>15</volume>
				<issue>3</issue>
				<fpage>211</fpage>
				<lpage>218</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmpid" link="fulltext">10222408</pubid>
						<pubid idtype="doi">10.1093/bioinformatics/15.3.211</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B38">
				<title>
					<p>Specificity prediction of adenylation domains in nonribosomal peptide synthetases (NRPS) using transductive support vector machines (TSVMs)</p>
				</title>
				<aug>
					<au>
						<snm>Rausch</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Weber</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Kohlbacher</snm>
						<fnm>O</fnm>
					</au>
					<au>
						<snm>Wohlleben</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Huson</snm>
						<fnm>DH</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2005</pubdate>
				<volume>33</volume>
				<issue>18</issue>
				<fpage>5799</fpage>
				<lpage>5808</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1253831</pubid>
						<pubid idtype="pmpid" link="fulltext">16221976</pubid>
						<pubid idtype="doi">10.1093/nar/gki885</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B39">
				<title>
					<p>The Biosynthesis of Vancomycin-Type Glycopeptide Antibiotics &#8211; The Order of the Cyclization Steps</p>
				</title>
				<aug>
					<au>
						<snm>Bischoff</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Pelzer</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Bister</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Nicholson</snm>
						<fnm>GJ</fnm>
					</au>
					<au>
						<snm>Stockert</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Schirle</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Wohlleben</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Jung</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>S&#252;ssmuth</snm>
						<fnm>RD</fnm>
					</au>
				</aug>
				<source>Angew Chem Int Ed Engl</source>
				<pubdate>2001</pubdate>
				<volume>40</volume>
				<issue>24</issue>
				<fpage>4688</fpage>
				<lpage>4691</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1002/1521-3773(20011217)40:24&lt;4688::AID-ANIE4688>3.0.CO;2-M</pubid>
						<pubid idtype="pmpid" link="fulltext">12404385</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B40">
				<title>
					<p>The Biosynthesis of Vancomycin-Type Glycopeptide Antibiotics &#8211; New Insights into the Cyclization Steps</p>
				</title>
				<aug>
					<au>
						<snm>Bischoff</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Pelzer</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>H&#246;ltzel</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Nicholson</snm>
						<fnm>GJ</fnm>
					</au>
					<au>
						<snm>Stockert</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Wohlleben</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Jung</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>S&#252;ssmuth</snm>
						<fnm>RD</fnm>
					</au>
				</aug>
				<source>Angew Chem Int Ed Engl</source>
				<pubdate>2001</pubdate>
				<volume>40</volume>
				<issue>9</issue>
				<fpage>1693</fpage>
				<lpage>1696</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1002/1521-3773(20010504)40:9&lt;1693::AID-ANIE16930>3.0.CO;2-8</pubid>
						<pubid idtype="pmpid" link="fulltext">11353482</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B41">
				<title>
					<p>The gene cluster for the biosynthesis of the glycopeptide antibiotic A40926 by <it>Nonomuraea </it>species</p>
				</title>
				<aug>
					<au>
						<snm>Sosio</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Stinchi</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Beltrametti</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Lazzarini</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Donadio</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Chem Biol</source>
				<pubdate>2003</pubdate>
				<volume>10</volume>
				<issue>6</issue>
				<fpage>541</fpage>
				<lpage>549</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S1074-5521(03)00120-0</pubid>
						<pubid idtype="pmpid" link="fulltext">12837387</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B42">
				<title>
					<p>Heterologous expression in <it>Escherichia coli </it>of the first module of the nonribosomal peptide synthetase for chloroeremomycin, a vancomycin-type glycopeptide antibiotic</p>
				</title>
				<aug>
					<au>
						<snm>Trauger</snm>
						<fnm>JW</fnm>
					</au>
					<au>
						<snm>Walsh</snm>
						<fnm>CT</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2000</pubdate>
				<volume>97</volume>
				<issue>7</issue>
				<fpage>3112</fpage>
				<lpage>3117</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">16201</pubid>
						<pubid idtype="pmpid" link="fulltext">10716695</pubid>
						<pubid idtype="doi">10.1073/pnas.040560597</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B43">
				<title>
					<p>Purification and characterization of eucaryotic alanine racemase acting as key enzyme in cyclosporin biosynthesis</p>
				</title>
				<aug>
					<au>
						<snm>Hoffmann</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Schneider-Scherzer</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Kleinkauf</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Zocher</snm>
						<fnm>R</fnm>
					</au>
				</aug>
				<source>J Biol Chem</source>
				<pubdate>1994</pubdate>
				<volume>269</volume>
				<issue>17</issue>
				<fpage>12710</fpage>
				<lpage>12714</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">8175682</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B44">
				<title>
					<p>Phylogenetic analysis of condensation domains in the nonribosomal peptide synthetases</p>
				</title>
				<aug>
					<au>
						<snm>Roongsawang</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Lim</snm>
						<fnm>SP</fnm>
					</au>
					<au>
						<snm>Washio</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Takano</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Kanaya</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Morikawa</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>FEMS Microbiol Lett</source>
				<pubdate>2005</pubdate>
				<volume>252</volume>
				<fpage>143</fpage>
				<lpage>151</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmpid" link="fulltext">16182472</pubid>
						<pubid idtype="doi">10.1016/j.femsle.2005.08.041</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B45">
				<title>
					<p>The Universal Protein Resource (UniProt): an expanding universe of protein information</p>
				</title>
				<aug>
					<au>
						<snm>Wu</snm>
						<fnm>CH</fnm>
					</au>
					<au>
						<snm>Apweiler</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Bairoch</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Natale</snm>
						<fnm>DA</fnm>
					</au>
					<au>
						<snm>Barker</snm>
						<fnm>WC</fnm>
					</au>
					<au>
						<snm>Boeckmann</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Ferro</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Gasteiger</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Huang</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Lopez</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Magrane</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Martin</snm>
						<fnm>MJ</fnm>
					</au>
					<au>
						<snm>Mazumder</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>O'Donovan</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Redaschi</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Suzek</snm>
						<fnm>B</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2006</pubdate>
				<issue>34 Database</issue>
				<fpage>D187</fpage>
				<lpage>D191</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1347523</pubid>
						<pubid idtype="pmpid" link="fulltext">16381842</pubid>
						<pubid idtype="doi">10.1093/nar/gkj161</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B46">
				<title>
					<p>The Pfam protein families database</p>
				</title>
				<aug>
					<au>
						<snm>Bateman</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Coin</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Durbin</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Finn</snm>
						<fnm>RD</fnm>
					</au>
					<au>
						<snm>Hollich</snm>
						<fnm>V</fnm>
					</au>
					<au>
						<snm>Griffiths-Jones</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Khanna</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Marshall</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Moxon</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Sonnhammer</snm>
						<fnm>EL</fnm>
					</au>
					<au>
						<snm>Studholme</snm>
						<fnm>DJ</fnm>
					</au>
					<au>
						<snm>Yeats</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Eddy</snm>
						<fnm>SR</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2004</pubdate>
				<issue>32 Database</issue>
				<fpage>D138</fpage>
				<lpage>D141</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">308855</pubid>
						<pubid idtype="pmpid" link="fulltext">14681378</pubid>
						<pubid idtype="doi">10.1093/nar/gkh121</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B47">
				<title>
					<p>The TIGRFAMs database of protein families</p>
				</title>
				<aug>
					<au>
						<snm>Haft</snm>
						<fnm>DH</fnm>
					</au>
					<au>
						<snm>Selengut</snm>
						<fnm>JD</fnm>
					</au>
					<au>
						<snm>White</snm>
						<fnm>O</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2003</pubdate>
				<volume>31</volume>
				<fpage>371</fpage>
				<lpage>373</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">165575</pubid>
						<pubid idtype="pmpid" link="fulltext">12520025</pubid>
						<pubid idtype="doi">10.1093/nar/gkg128</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B48">
				<aug>
					<au>
						<snm>Durbin</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Eddy</snm>
						<fnm>SR</fnm>
					</au>
					<au>
						<snm>Krogh</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Mitchison</snm>
						<fnm>G</fnm>
					</au>
				</aug>
				<source>Biological Sequence Analysis</source>
				<publisher>Cambridge, UK: Cambridge University Press</publisher>
				<pubdate>1998</pubdate>
			</bibl>
			<bibl id="B49">
				<title>
					<p>Biosequence analysis using profile hidden Markov models</p>
				</title>
				<url>http://hmmer.janelia.org</url>
			</bibl>
			<bibl id="B50">
				<title>
					<p>The Perl Directory</p>
				</title>
				<url>http://www.perl.org</url>
			</bibl>
			<bibl id="B51">
				<title>
					<p>The Bioperl toolkit: Perl modules for the life sciences</p>
				</title>
				<aug>
					<au>
						<snm>Stajich</snm>
						<fnm>JE</fnm>
					</au>
					<au>
						<snm>Block</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Boulez</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Brenner</snm>
						<fnm>SE</fnm>
					</au>
					<au>
						<snm>Chervitz</snm>
						<fnm>SA</fnm>
					</au>
					<au>
						<snm>Dagdigian</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Fuellen</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Gilbert</snm>
						<fnm>JGR</fnm>
					</au>
					<au>
						<snm>Korf</snm>
						<fnm>I</fnm>
					</au>
					<au>
						<snm>Lapp</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Lehv&#228;slaiho</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Matsalla</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Mungall</snm>
						<fnm>CJ</fnm>
					</au>
					<au>
						<snm>Osborne</snm>
						<fnm>BI</fnm>
					</au>
					<au>
						<snm>Pocock</snm>
						<fnm>MR</fnm>
					</au>
					<au>
						<snm>Schattner</snm>
						<fnm>P</fnm>
					</au>
					<au>
						<snm>Senger</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Stein</snm>
						<fnm>LD</fnm>
					</au>
					<au>
						<snm>Stupka</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Wilkinson</snm>
						<fnm>MD</fnm>
					</au>
					<au>
						<snm>Birney</snm>
						<fnm>E</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2002</pubdate>
				<volume>12</volume>
				<issue>10</issue>
				<fpage>1611</fpage>
				<lpage>1618</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">187536</pubid>
						<pubid idtype="pmpid" link="fulltext">12368254</pubid>
						<pubid idtype="doi">10.1101/gr.361602</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B52">
				<title>
					<p>MUSCLE: a multiple sequence alignment method with reduced time and space complexity</p>
				</title>
				<aug>
					<au>
						<snm>Edgar</snm>
						<fnm>RC</fnm>
					</au>
				</aug>
				<source>BMC Bioinformatics</source>
				<pubdate>2004</pubdate>
				<volume>5</volume>
				<fpage>113</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">517706</pubid>
						<pubid idtype="pmpid" link="fulltext">15318951</pubid>
						<pubid idtype="doi">10.1186/1471-2105-5-113</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B53">
				<title>
					<p>MUSCLE: multiple sequence alignment with high accuracy and high throughput</p>
				</title>
				<aug>
					<au>
						<snm>Edgar</snm>
						<fnm>RC</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2004</pubdate>
				<volume>32</volume>
				<issue>5</issue>
				<fpage>1792</fpage>
				<lpage>1797</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">390337</pubid>
						<pubid idtype="pmpid" link="fulltext">15034147</pubid>
						<pubid idtype="doi">10.1093/nar/gkh340</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B54">
				<title>
					<p>Quick2D</p>
				</title>
				<url>http://toolkit.tuebingen.mpg.de/quick2_d</url>
			</bibl>
			<bibl id="B55">
				<title>
					<p>The MPI Bioinformatics Toolkit for protein sequence analysis</p>
				</title>
				<aug>
					<au>
						<snm>Biegert</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Mayer</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Remmert</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>S&#246;ding</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Lupas</snm>
						<fnm>AN</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2006</pubdate>
				<issue>34 Web Server</issue>
				<fpage>W335</fpage>
				<lpage>W339</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">1538786</pubid>
						<pubid idtype="pmpid" link="fulltext">16845021</pubid>
						<pubid idtype="doi">10.1093/nar/gkl217</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B56">
				<title>
					<p>Amino acids determining enzyme-substrate specificity in prokaryotic and eukaryotic protein kinases</p>
				</title>
				<aug>
					<au>
						<snm>Li</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Shakhnovich</snm>
						<fnm>EI</fnm>
					</au>
					<au>
						<snm>Mirny</snm>
						<fnm>LA</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2003</pubdate>
				<volume>100</volume>
				<issue>8</issue>
				<fpage>4463</fpage>
				<lpage>4468</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">153578</pubid>
						<pubid idtype="pmpid" link="fulltext">12679523</pubid>
						<pubid idtype="doi">10.1073/pnas.0737647100</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B57">
				<title>
					<p>FRpred &#8211; Bioinformatics Toolkit &#8211; Max Planck Institute for Developmental Biology</p>
				</title>
				<url>http://toolkit.tuebingen.mpg.de/frpred</url>
			</bibl>
			<bibl id="B58">
				<aug>
					<au>
						<snm>Nei</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Kumar</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>Molecular Evolution and Phylogenetics</source>
				<publisher>Oxford University Press Inc, USA</publisher>
				<pubdate>2000</pubdate>
			</bibl>
			<bibl id="B59">
				<title>
					<p>The rapid generation of mutation data matrices from protein sequences</p>
				</title>
				<aug>
					<au>
						<snm>Jones</snm>
						<fnm>DT</fnm>
					</au>
					<au>
						<snm>Taylor</snm>
						<fnm>WR</fnm>
					</au>
					<au>
						<snm>Thornton</snm>
						<fnm>JM</fnm>
					</au>
				</aug>
				<source>Comput Appl Biosci</source>
				<pubdate>1992</pubdate>
				<volume>8</volume>
				<issue>3</issue>
				<fpage>275</fpage>
				<lpage>282</lpage>
				<xrefbib>
					<pubid idtype="pmpid">1633570</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B60">
				<title>
					<p>A simple method for estimating the parameter of substitution rate variation among sites</p>
				</title>
				<aug>
					<au>
						<snm>Gu</snm>
						<fnm>X</fnm>
					</au>
					<au>
						<snm>Zhang</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>1997</pubdate>
				<volume>14</volume>
				<issue>11</issue>
				<fpage>1106</fpage>
				<lpage>1113</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">9364768</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B61">
				<title>
					<p>PHYLIP (PHYLogeny Inference Package) version 3.66</p>
				</title>
				<aug>
					<au>
						<snm>Felsenstein</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Distributed by the author. Department of Genome Sciences, University of Washington, Seattle</source>
				<pubdate>2006</pubdate>
				<url>http://evolution.genetics.washington.edu/phylip.html</url>
			</bibl>
			<bibl id="B62">
				<title>
					<p>The Neighbor-Joining method: a new method for reconstructing phylogenetic trees</p>
				</title>
				<aug>
					<au>
						<snm>Saitou</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Nei</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>1987</pubdate>
				<volume>4</volume>
				<issue>4</issue>
				<fpage>406</fpage>
				<lpage>425</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">3447015</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B63">
				<title>
					<p>IQPNNI: moving fast through tree space and stopping in time</p>
				</title>
				<aug>
					<au>
						<snm>Vinh</snm>
						<fnm>LS</fnm>
					</au>
					<au>
						<snm>von Haeseler</snm>
						<fnm>A</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>2004</pubdate>
				<volume>21</volume>
				<issue>8</issue>
				<fpage>1565</fpage>
				<lpage>1571</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmpid" link="fulltext">15163768</pubid>
						<pubid idtype="doi">10.1093/molbev/msh176</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B64">
				<title>
					<p>A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood</p>
				</title>
				<aug>
					<au>
						<snm>Guindon</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Gascuel</snm>
						<fnm>O</fnm>
					</au>
				</aug>
				<source>Syst Biol</source>
				<pubdate>2003</pubdate>
				<volume>52</volume>
				<issue>5</issue>
				<fpage>696</fpage>
				<lpage>704</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1080/10635150390235520</pubid>
						<pubid idtype="pmpid">14530136</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B65">
				<title>
					<p>Confidence limits on phylogenies: An approach using the bootstrap</p>
				</title>
				<aug>
					<au>
						<snm>Felsenstein</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Evolution</source>
				<pubdate>1985</pubdate>
				<volume>39</volume>
				<fpage>783</fpage>
				<lpage>791</lpage>
				<xrefbib>
					<pubid idtype="doi">10.2307/2408678</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B66">
				<title>
					<p>An Empirical Test of Bootstrapping as a Method for Assessing Confidence in Phylogenetic Analysis</p>
				</title>
				<aug>
					<au>
						<snm>Hillis</snm>
						<fnm>DM</fnm>
					</au>
					<au>
						<snm>Bull</snm>
						<fnm>JJ</fnm>
					</au>
				</aug>
				<source>Syst Biol</source>
				<pubdate>1993</pubdate>
				<volume>42</volume>
				<issue>2</issue>
				<fpage>182</fpage>
				<lpage>192</lpage>
				<xrefbib>
					<pubid idtype="doi">10.2307/2992540</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B67">
				<title>
					<p>Application of phylogenetic networks in evolutionary studies</p>
				</title>
				<aug>
					<au>
						<snm>Huson</snm>
						<fnm>DH</fnm>
					</au>
					<au>
						<snm>Bryant</snm>
						<fnm>D</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>2006</pubdate>
				<volume>23</volume>
				<issue>2</issue>
				<fpage>254</fpage>
				<lpage>267</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmpid" link="fulltext">16221896</pubid>
						<pubid idtype="doi">10.1093/molbev/msj030</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B68">
				<title>
					<p>Fitting a mixture model by expectation maximization to discover motifs in biopolymers</p>
				</title>
				<aug>
					<au>
						<snm>Bailey</snm>
						<fnm>TL</fnm>
					</au>
					<au>
						<snm>Elkan</snm>
						<fnm>C</fnm>
					</au>
				</aug>
				<source>Proc Int Conf Intell Syst Mol Biol</source>
				<pubdate>1994</pubdate>
				<volume>2</volume>
				<fpage>28</fpage>
				<lpage>36</lpage>
				<xrefbib>
					<pubid idtype="pmpid">7584402</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B69">
				<title>
					<p>MEME &#8211; Multiple EM for Motif Elicitation</p>
				</title>
				<url>http://meme.sdsc.edu</url>
			</bibl>
			<bibl id="B70">
				<title>
					<p>The Jalview Java alignment editor</p>
				</title>
				<aug>
					<au>
						<snm>Clamp</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Cuff</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Searle</snm>
						<fnm>SM</fnm>
					</au>
					<au>
						<snm>Barton</snm>
						<fnm>GJ</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2004</pubdate>
				<volume>20</volume>
				<issue>3</issue>
				<fpage>426</fpage>
				<lpage>427</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmpid" link="fulltext">14960472</pubid>
						<pubid idtype="doi">10.1093/bioinformatics/btg430</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B71">
				<title>
					<p>SplitsTree4</p>
				</title>
				<url>http://www.splitstree.org</url>
			</bibl>
			<bibl id="B72">
				<title>
					<p>HMM Logos for visualization of protein families</p>
				</title>
				<aug>
					<au>
						<snm>Schuster-B&#246;ckler</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Schultz</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Rahmann</snm>
						<fnm>S</fnm>
					</au>
				</aug>
				<source>BMC Bioinformatics</source>
				<pubdate>2004</pubdate>
				<volume>5</volume>
				<fpage>7</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">341448</pubid>
						<pubid idtype="pmpid" link="fulltext">14736340</pubid>
						<pubid idtype="doi">10.1186/1471-2105-5-7</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
		</refgrp>
	</bm>
</art>
