<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2003-4-6-r36</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>The mosaic structure of the symbiotic plasmid of <it>Rhizobium etli </it> CFN42 and its relation to other symbiotic genome compartments</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Gonz&#225;lez</snm>
               <fnm>V&#237;ctor</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A2">
               <snm>Bustos</snm>
               <fnm>Patricia</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A3">
               <snm>Ram&#237;rez-Romero</snm>
               <mi>A</mi>
               <fnm>Miguel</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A4">
               <snm>Medrano-Soto</snm>
               <fnm>Arturo</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A5">
               <snm>Salgado</snm>
               <fnm>Heladia</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A6">
               <snm>Hern&#225;ndez-Gonz&#225;lez</snm>
               <fnm>Ismael</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A7">
               <snm>Hern&#225;ndez-Celis</snm>
               <mnm>Carlos</mnm>
               <fnm>Juan</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A8">
               <snm>Quintero</snm>
               <fnm>Ver&#243;nica</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A9">
               <snm>Moreno-Hagelsieb</snm>
               <fnm>Gabriel</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A10">
               <snm>Girard</snm>
               <fnm>Lourdes</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A11">
               <snm>Rodr&#237;guez</snm>
               <fnm>Oscar</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A12">
               <snm>Flores</snm>
               <fnm>Margarita</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A13">
               <snm>Cevallos</snm>
               <mi>A</mi>
               <fnm>Miguel</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A14">
               <snm>Collado-Vides</snm>
               <fnm>Julio</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A15">
               <snm>Romero</snm>
               <fnm>David</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A16" ca="yes">
               <snm>D&#225;vila</snm>
               <fnm>Guillermo</fnm>
               <insr iid="I1"/>
               <email>davila@cifn.unam.mx</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Centro de Investigaci&#243;n Sobre Fijaci&#243;n de Nitr&#243;geno, Universidad Nacional Aut&#243;noma de M&#233;xico, Cuernavaca, Morelos, M&#233;xico 62210</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2003</pubdate>
         <volume>4</volume>
         <issue>6</issue>
         <fpage>R36</fpage>
         <url>http://genomebiology.com/2003/4/6/R36</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">12801410</pubid>
               <pubid idtype="doi">10.1186/gb-2003-4-6-r36</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>1</day>
               <month>11</month>
               <year>2002</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>6</day>
               <month>3</month>
               <year>2003</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>2</day>
               <month>4</month>
               <year>2003</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>13</day>
               <month>5</month>
               <year>2003</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2003</year>
         <collab>Gonz&#225;lez et al.; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all
media for any purpose, provided this notice is preserved along with the article's original URL.</collab>
      </cpyrt>
      <shorttitle>
         <p>The mosaic structure of the symbiotic plasmid of <it>Rhizobium etli </it> CFN42 and its relation to other symbiotic genome compartments</p>
      </shorttitle>
      <shortabs>
         <p>In rhizobia, essential genes for symbiosis are compartmentalized in symbiotic plasmids or in chromosomal symbiotic islands. The complete sequence of the symbiotic plasmid of <it>Rhizobium etli</it> CFN42, a microsymbiont of beans, is reported, along with a comparison with the other available sequences of symbiotic genome compartments.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Symbiotic bacteria known as rhizobia interact with the roots of legumes and induce the formation of nitrogen-fixing nodules. In rhizobia, essential genes for symbiosis are compartmentalized either in symbiotic plasmids or in chromosomal symbiotic islands. To understand the structure and evolution of the symbiotic genome compartments (SGCs), it is necessary to analyze their common genetic content and organization as well as to study their differences. To date, five SGCs belonging to distinct species of rhizobia have been completely sequenced. We report the complete sequence of the symbiotic plasmid of <it>Rhizobium etli </it>CFN42, a microsymbiont of beans, together with a comparison with the other available SGC sequences.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>The symbiotic plasmid is a circular molecule of 371,255 base-pairs containing 359 coding sequences. Nodulation and nitrogen-fixation genes common to other rhizobia are clustered in a region of 125 kilobases. Numerous sequences related to mobile elements are scattered throughout. In some cases the mobile elements flank blocks of functionally related sequences, thereby suggesting a role in transposition. The plasmid contains 12 reiterated DNA families that are likely to participate in genomic rearrangements. Comparisons between this plasmid and complete rhizobial genomes and symbiotic compartments already sequenced show a general lack of synteny and colinearity, with the exception of some transcriptional units. There are only 20 symbiotic genes that are shared by all SGCs.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>Our data support the notion that the symbiotic compartments of rhizobia genomes are mosaic structures that have been frequently tailored by recombination, horizontal transfer and transposition.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010014">Microbiology and parasitology</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010009">Genetics</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010008">Evolution</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Nitrogen-fixing symbiotic bacteria grouped within the Rhizobiaceae, Phyllobacteriaceae and Bradyrhizobiaceae families are widespread in nature <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Commonly known as rhizobia, these organisms have genomes consisting of one or two chromosomes and several large plasmids ranging in size from about 100 kilobases (kb) to more than 2 megabases (Mb). A common feature of rhizobial genomes is that the genes involved in the symbiotic process are located in specific symbiotic genome compartments (SGCs), either as independent replicons known as symbiotic plasmids (pSym) or as symbiotic islands or regions within the chromosome. Complete genome sequences have recently been reported for <it>Mesorhizobium loti </it>MAFF303099 <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, <it>Sinorhizobium meliloti </it>1021 <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp>, <it>Bradyrhizobium japonicum </it>USDA110 <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> and the non-nitrogen-fixing close relative <it>Agrobacterium tumefaciens </it>C58 <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>. In addition, the sequence of the pSym of <it>Rhizobium </it>species NGR234, pNGR234a <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, as well as those of the chromosomal symbiotic regions of <it>B. japonicum </it>USDA110 and <it>M. loti </it>R7A, have been reported <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp>. Genomic comparisons reveal that the chromosomes of <it>S. meliloti </it>and <it>M. loti </it>and the circular chromosome of <it>A. tumefaciens </it>have more than 50% of their genes in common as orthologs <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>. A clear syntenic relationship is observed between the chromosome of <it>S. meliloti </it>and the circular chromosome of <it>A. tumefaciens </it><abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>, and, albeit to a lesser extent, synteny is also apparent when both are compared with the chromosome of <it>M. loti </it><abbrgrp><abbr bid="B6">6</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>. These results have led to the hypothesis that rhizobial chromosomes have a common ancestral origin <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>. Other genome constituents of rhizobia (that is, other chromosomes and plasmids) are thought to be the result of subsequent genomic rearrangements and horizontal transfer <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp>, but the precise mechanisms involved in their generation have not yet been elucidated.</p>
         <p>Here we report the complete DNA sequence of the pSym (p42d) of <it>Rhizobium etli </it>CFN42 and its comparative analysis with other rhizobial SGCs. <it>R. etli </it>is the symbiont of the common bean <it>Phaseolus vulgaris </it>and has been widely used as a model for studies of metabolism and genome dynamics <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>. Its genome is composed of one chromosome and six plasmids ranging in size from 184 kb to about 600 kb <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. The physical map of p42d was previously determined and served as the basis for obtaining the entire sequence <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. In this study we show that the SGCs are heterogeneous in sequence, gene composition and gene order. Only 20 symbiotic genes are shared by all SGCs. Some conserved clusters of functionally related genes are present in some SGCs but absent from others. Besides genes unique to a particular SGC, several orthologous genes are located in different genomic contexts in other rhizobia. Other features common to all SGCs, such as reiterated genes, pseudogenes and a large number of insertion sequences (ISs), support the view that p42d, like other SGCs, is a mosaic structure that may have been assembled from different genomic contexts, either chromosomal or plasmid.</p>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <sec>
            <st>
               <p>General features of p42d</p>
            </st>
            <p>The symbiotic plasmid p42d is a circular molecule of 371,255 base-pairs (bp) (Figure <figr fid="F1">1</figr>) that belongs to the <it>repABC </it>type of replicator <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. We identified 359 coding sequences (CDSs), of which 63% have an assigned function, 17% have homologs in databases but no assigned function, and 20% are orphans (Figure <figr fid="F1">1</figr>, see also Additional data file 1). The CDSs are asymmetrically distributed between the two strands, with 61% of them located on the minus strand. The plus and minus strands were defined according to the previously reported physical map <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. The plus strand contains two reiterated <it>nifHDK </it>gene clusters in clockwise orientation (NRa and NRb, Figure <figr fid="F1">1</figr>). The main functional classes of genes identified are transport, nitrogen fixation, nodulation and transcriptional regulation. We identified ten pseudogenes, related to known genes, that carry deletions and frameshifts at their amino or carboxyl termini. The plasmid also contains many reiterated sequences and a large number of elements related to insertion sequences (ERIS), which account for 10% of the entire sequence. The major reiterations (28 elements) were grouped into 12 families on the basis of their sequence similarity (see below).</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Structure of the symbiotic plasmid p42d of <it>R. etli </it>CFN42</p>
               </caption>
               <text>
                  <p>Structure of the symbiotic plasmid p42d of <it>R. etli </it>CFN42. The structure of p42d is represented as five concentric circles. Outermost circle, relevant regions referred to in the text: NRa, b and c, regions containing nitrogenase structural genes; FIX1 and FIX2, clusters containing nitrogen-fixation genes; NOD, major cluster of nodulation genes; CPX, cluster for cytochrome P450; TRA, cluster for <it>tra </it>genes; REP, replicator region; TSSIII and IV, clusters for transport secretion system genes. The 125 kb region that contains most of the symbiotic genes, described in the text as a putative mobile element, is shown in green. Second circle, predicted CDSs positioned according to their direction of transcription and color-coded as below; those transcribed on the plus strand are shown in the outer half of the circle. For each class, the number of CDSs and the percentage of the total are: hypothetical (70) 19.5% (dark red); hypothetical conserved (62) 17.3% (red); integration recombination (55) 15.3% (purple); various enzymatic functions (45) 12.5% (khaki); transport secretion systems (37) 10.3% (gray); nitrogen fixation (35) 9.8% (yellow); nodulation (18) 5% (dark blue); transcriptional regulation (15) 4.2% (light blue); plasmid maintenance (10) 2.8% (orange); electron transfer (7) 1.9% (magenta); chemotaxis (3) 0.8% (pink); and polysaccharide synthesis (2) 0.6% (green). Third circle, elements related to insertion sequences (ERIS): putative partial ISs (purple) and putative complete ISs (black). Fourth circle, reiterated DNA families; the major reiterated families (see text) are shown in different colors. Innermost circle, potential genomic rearrangements. Arrowheads indicate the sites of homologous recombination leading to genomic rearrangements; black lines connect sites for amplification or deletion events, and red lines connect sites for inversion.</p>
               </text>
               <graphic file="gb-2003-4-6-r36-1"/>
            </fig>
            <p>The average GC content of the plasmid is 58.1%. When genes were classified as having low, average or high GC content (using the mean GC &#177; 1 standard deviation as thresholds), some gene clusters stood out clearly as high- or low-GC (Figure <figr fid="F2">2a</figr>). Several hypothetical genes and the <it>nod </it>genes show the lowest GC values (&lt; 55%), whereas the highest GC values (> 62%) are found in the genes for cytochrome P450 (CPX), the <it>tra </it>genes (TRA), and the genes for the type III (TSSIII) and type IV (TSSIV) transport secretion systems. Similarly, when the genes were classified as having poor, typical or rich codon usage (CU) (see Materials and methods for details), genes with high GC also exhibited rich CU (Figure <figr fid="F2">2b</figr>), whereas the correlation between GC and CU was weaker for other genes. For example, the regions that contain the nitrogenase structural genes and other <it>nif </it>genes (NRa, b and c, see below), as well as the <it>fixNOQPGHIS </it>genes (FIX1), showed average GC content but rich CU. The variable correlation between GC content and CU reveals sequence heterogeneity within p42d and suggests a dynamic structure for this plasmid, presumably as a consequence of extensive genomic rearrangements, differences in recombination rates, lateral transfer, and relaxation or intensification of selective pressures.</p>
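<p>The low, average and high GC classification described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in this study: the function names and the toy sequences are hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption, since the text specifies only the mean GC &#177; 1 standard deviation as thresholds.</p>

```python
import statistics


def gc_content(seq: str) -> float:
    """Percentage of G plus C bases in a nucleotide sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def classify_by_gc(cds: dict[str, str]) -> dict[str, str]:
    """Label each CDS as 'low', 'average' or 'high' GC.

    Thresholds are the mean GC of all CDSs plus or minus one standard
    deviation; using the population standard deviation is an assumption.
    """
    gc = {name: gc_content(seq) for name, seq in cds.items()}
    values = list(gc.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    labels = {}
    for name, value in gc.items():
        if value > mean + sd:
            labels[name] = "high"
        elif mean - sd > value:
            labels[name] = "low"
        else:
            labels[name] = "average"
    return labels


# Hypothetical toy CDSs (not p42d sequences).
toy_cds = {
    "geneA": "GCGCGCGCGC",
    "geneB": "ATATATATAT",
    "geneC": "GCGCATATGC",
    "geneD": "GCATGCATGC",
}
print(classify_by_gc(toy_cds))
# {'geneA': 'high', 'geneB': 'low', 'geneC': 'average', 'geneD': 'average'}
```

<p>Applied to the 359 CDSs of p42d, the same thresholds would correspond to the blue lines of Figure 2a; an analogous function over per-gene CU values would give the poor, typical and rich CU classes.</p>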
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Compositional features of the coding sequences (CDS) of p42d</p>
               </caption>
               <text>
                  <p>Compositional features of the coding sequences (CDS) of p42d. <b>(a) </b>GC content and <b>(b) </b>CU of the 359 CDSs of p42d. Red lines indicate the average GC (58.1%) and CU (0.58). Blue lines indicate &#177;1 standard deviation (GC, &#177;3.5%; CU, &#177;0.16). The highest and lowest GC values are 69.4% and 45.8%, respectively. CU values range from 0.11 to 1.00. <b>(c) </b>CDS distribution, with the color codes for functional classes and the relevant regions described in Figure <figr fid="F1">1</figr>.</p>
               </text>
               <graphic file="gb-2003-4-6-r36-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Organization of genes involved in nodulation</p>
            </st>
            <p>Most <it>nod </it>genes present in p42d are clustered in a 16 kb region (NOD); however, <it>nodA </it>is separated from <it>nodBC </it>by 27 kb. The Nod factor backbone of <it>R. etli </it>CFN42 is an <it>N</it>-acetylglucosamine pentasaccharide synthesized by the products of the common <it>nodA </it>and <it>nodBC </it>genes. This backbone is modified by methyl and carbamoyl groups at the non-reducing end, while the reducing end is modified by the addition of a fucosyl group that is in turn acetylated <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. The methyltransferase, fucosyltransferase and acetyltransferase activities required for these modifications are encoded by <it>nodS</it>, <it>nodZ </it>and <it>nolL</it>, respectively. It is unclear, however, which gene product is responsible for carbamoylation of the Nod factor, as <it>nodU</it>, the most likely candidate for this function, is a pseudogene. The two membrane proteins encoded by <it>nodI </it>and <it>nodJ </it>(located downstream of <it>nodBCSU</it>) participate in the transport of the Nod factor out of the cell <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. Other genes present in p42d whose homologs in other rhizobia have a role in nodulation are <it>nolO</it>, <it>nolE</it>, <it>nolT </it>and <it>nolV</it>, the last two being part of the TSSIII system (see below). In addition to <it>nodU</it>, two other pseudogenes, <it>noeI </it>and <it>nodQ</it>, were identified.</p>
            <p>The expression of <it>nod </it>genes depends on the activity of NodD proteins <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, which interact with specific sites known as <it>nod </it>boxes located upstream of the <it>nod </it>operons <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. The p42d sequence revealed three <it>nodD </it>genes: <it>nodD</it><sub>1 </sub>is present in the NOD region, while <it>nodD</it><sub>2 </sub>and <it>nodD</it><sub>3 </sub>are 50 kb apart. We also predict 15 potential <it>nod </it>boxes (see Materials and methods and Additional data file 2), seven of which are associated with almost all <it>nod </it>genes: <it>nodA</it>, <it>nodZ</it>, <it>nodBCSU</it>, <it>nolE</it>, <it>nodD</it><sub>1</sub>, <it>nodD</it><sub>2</sub>, and <it>nodD</it><sub>3</sub>. The remaining <it>nod </it>boxes are located close to genes not so far related to the nodulation process, namely <it>bglS </it>(&#946;-glucosidase), <it>yp108 </it>(putative monooxygenase), and the orphans <it>yh005</it>, <it>yh007 </it>and <it>yh050</it>. There is also a putative <it>nod </it>box upstream of the gene encoding NifA, the major transcriptional regulator of the nitrogen-fixation genes. Although the regulation of <it>nifA </it>varies among rhizobia, whether its expression depends on flavonoid induction is unknown.</p>
         </sec>
         <sec>
            <st>
               <p>Organization of the genes involved in nitrogen fixation</p>
            </st>
            <p>The <it>nif </it>and <it>fix </it>genes are distributed in five regions spanning a total of 125 kb (Figure <figr fid="F1">1</figr>); the NOD cluster mentioned previously maps within this section as well. There are three copies of the nitrogenase reductase gene <it>nifH </it><abbrgrp><abbr bid="B22">22</abbr></abbrgrp>, defining the three <it>nif </it>regions (NR) a, b and c (Figure <figr fid="F1">1</figr>). NRa contains <it>nifHDK </it>genes and a truncated <it>nifE </it>pseudogene; NRb contains the <it>nifHDKENX </it>genes; NRc contains <it>nifH </it>and a truncated <it>nifD </it>pseudogene. The largest reiterated regions found in p42d correspond to NRa and NRb regions that share 4,470 identical nucleotides. The NRc region of 1,131 nucleotides is identical to sequences within NRa and NRb. The orientation of NRc is inverted in relation to the direction of NRa and NRb. Recent duplications of these NR regions might underlie the unusually high sequence identity between them. Alternatively, a mechanism of 'copy-correction' resembling gene conversion may be involved in maintaining nucleotide identity <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>.</p>
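<p>The identity shared by NRa, NRb and NRc, including the inverted orientation of NRc, can be detected by searching for the longest identical segment on both strands. The sketch below assumes plain uppercase DNA strings and applies a textbook dynamic-programming longest-common-substring search to the direct sequence and to its reverse complement. It is an illustration only, not the method used to delimit the NR regions; for whole-plasmid comparisons a local aligner such as BLAST would be the practical choice. All names and the toy sequences are hypothetical.</p>

```python
# Reverse-complement table for unambiguous DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def longest_shared_segment(a: str, b: str) -> str:
    """Longest run of identical nucleotides present in both a and b.

    Dynamic-programming longest common substring: O(len(a) * len(b)) time
    and O(len(b)) memory, adequate for regions of a few kilobases.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def best_shared_segment(query: str, target: str) -> tuple[int, str]:
    """Length and relative orientation of the longest identical segment."""
    direct = longest_shared_segment(query, target)
    inverted = longest_shared_segment(reverse_complement(query), target)
    if len(inverted) > len(direct):
        return len(inverted), "inverted"
    return len(direct), "direct"


# Hypothetical toy example: the target carries the query in inverted orientation.
query = "GATTACAGG"
target = "TTTT" + reverse_complement(query) + "TTTT"
print(best_shared_segment(query, target))  # (9, 'inverted')
```

<p>Comparing NRc against NRa or NRb in this way would report the shared segment as "inverted", mirroring the opposite orientation of NRc described above.</p>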
            <p>The highest density of <it>nif </it>and <it>fix </it>genes in p42d occurs 10 kb upstream of NRb, in the FIX2 region. This region contains the <it>fixABCX </it>genes, which encode a flavoprotein <abbrgrp><abbr bid="B24">24</abbr></abbrgrp> (see below); <it>nifB</it>, which is needed for the synthesis of the iron-molybdenum cofactor <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>; <it>nifW </it>and <it>nifZ</it>, whose products may be required to protect nitrogenase from oxygen <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr></abbrgrp>; and the genes for the regulatory proteins NifA and RpoN2. The genes <it>nifU</it>, <it>nifS </it>and <it>hesB </it>(also named <it>iscN</it>) also map in the FIX2 region. The products of these genes have been implicated in the formation of the Fe-S cluster required for the function of the nitrogenase complex <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. In <it>R. etli </it>CNPAF512, inactivation of <it>hesB </it>(<it>iscN</it>) results in a Fix<sup>- </sup>phenotype <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. Other genes commonly found in <it>nif </it>regions of rhizobia were also identified in the FIX2 region: the ferredoxin gene <it>fdxN</it>, which is essential for nitrogen fixation in <it>S. meliloti </it><abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, and the gene for the anaerobic transcriptional regulator FnrNd (see below). The products of <it>nifV </it>and <it>nifQ </it>have been implicated in the synthesis of the iron-molybdenum cofactor <abbrgrp><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr></abbrgrp>; however, <it>nifV </it>is absent from p42d, and <it>nifQ </it>is located upstream of <it>nifHc </it>in the NRc region.</p>
         </sec>
         <sec>
            <st>
               <p>RpoN regulation</p>
            </st>
            <p>The RpoN (&#963;<sup>N</sup>, also known as &#963;<sup>54</sup>) subunit of RNA polymerase, encoded by <it>rpoN</it>, and the transcriptional activator NifA, encoded by <it>nifA </it>(both present in the FIX2 region, Figure <figr fid="F1">1</figr>), participate in the regulation of <it>nif </it>genes. RpoN binds to specific promoter regions and interacts with NifA, which binds to specific upstream activator sequences (UAS) <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. In <it>R. etli </it>CNPAF512, two <it>rpoN </it>genes have been described <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>, one located on the chromosome (<it>rpoN</it><sub>1</sub>) and the other on the pSym (<it>rpoN</it><sub>2</sub>). The <it>rpoN </it>gene found in p42d is orthologous to <it>rpoN</it><sub>2</sub>. Regulation by RpoN and NifA has been demonstrated for <it>nifH </it>a, b and c <abbrgrp><abbr bid="B34">34</abbr></abbrgrp>. Using the approach described in Materials and methods, we predicted potential RpoN-binding sites and NifA UAS in the upstream regions of several genes (see Additional data file 3). Both types of sites were also identified upstream of other genes: the reiterated <it>yp003</it>, <it>yp021 </it>and <it>yp099 </it>genes, which encode the recently described BacS protein that is highly expressed in nodules <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>; <it>yp010</it>, in the putative operon for terpenoid synthesis <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>; the <it>fixA</it>, <it>hesB </it>and <it>cpxA5 </it>genes; and <it>yp104</it>, which encodes a toxin-transport-related protein. The expression of <it>yp003 </it>(<it>bacS</it>) and <it>hesB </it>(<it>iscN</it>) has recently been shown to depend on NifA <abbrgrp><abbr bid="B28">28</abbr><abbr bid="B35">35</abbr></abbrgrp>.</p>
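<p>Candidate sites of the kind described above can be located with the widely cited consensus motifs for the NifA UAS (TGT-N10-ACA) and the &#963;<sup>54</sup> -24/-12 promoter (TGGCACG-N4-TTGC). The sketch below matches these consensus sequences exactly and flags regions in which a complete UAS lies upstream of an RpoN promoter. It is a simplification under those assumptions, not the prediction procedure of this study (see Materials and methods), which would need to tolerate deviations from the consensus; the function names and toy sequences are hypothetical.</p>

```python
import re

# Consensus motifs, matched exactly here (real sites often deviate from them):
#   NifA upstream activator sequence (UAS): TGT-N10-ACA
#   RpoN (sigma-54) -24/-12 promoter:       TGGCACG-N4-TTGC
# The lookahead lets overlapping matches be reported.
UAS_NIFA = re.compile(r"(?=(TGT[ACGT]{10}ACA))")
RPON_PROMOTER = re.compile(r"(?=(TGGCACG[ACGT]{4}TTGC))")
UAS_LEN = 16  # length of a TGT-N10-ACA site


def find_sites(pattern: re.Pattern, seq: str) -> list[int]:
    """0-based start positions of every match of pattern in seq."""
    return [m.start() for m in pattern.finditer(seq.upper())]


def nifa_rpon_candidate(upstream: str) -> bool:
    """True if an RpoN promoter lies downstream of a complete NifA UAS.

    upstream is assumed to be written 5'->3' on the coding strand of the
    gene it precedes, so larger positions are closer to the gene.
    """
    uas_sites = find_sites(UAS_NIFA, upstream)
    rpon_sites = find_sites(RPON_PROMOTER, upstream)
    return any(r >= u + UAS_LEN for u in uas_sites for r in rpon_sites)


# Hypothetical toy upstream region: UAS, 60 bp spacer, then RpoN promoter.
site_uas = "TGT" + "C" * 10 + "ACA"
site_rpon = "TGGCACG" + "AAAA" + "TTGC"
region = "AAAA" + site_uas + "A" * 60 + site_rpon + "AAAA"
print(find_sites(UAS_NIFA, region), find_sites(RPON_PROMOTER, region))  # [4] [80]
print(nifa_rpon_candidate(region))  # True
```

<p>Because the NifA UAS consensus is its own reverse complement, a single-strand scan finds it in either orientation; the RpoN promoter is not palindromic, so real scans would also search the opposite strand.</p>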
            <p>RpoN-like promoters were also predicted upstream of several genes for which no associated NifA-binding sites could be detected (see Additional data file 3). Among them are the nitrogen-fixation genes <it>fixO</it>, <it>nifQ </it>and <it>nifB</it>, and the genes for a putative decarboxylase (<it>pcaC1</it>) and an alcohol dehydrogenase (<it>xylB2</it>). Potential RpoN sites were also found upstream of several genes of unknown function. Recently, Dombrecht <it>et al. </it><abbrgrp><abbr bid="B37">37</abbr></abbrgrp> predicted RpoN promoter sites in all complete rhizobial genomes and in p42d; here we report a larger set of genes potentially regulated by RpoN in p42d, including genes for nitrogen fixation, electron transfer and transport, and several of unknown function. The genes reported by Dombrecht <it>et al. </it>are mainly <it>nif </it>and <it>fix </it>genes, the ferredoxin genes (<it>fdxB </it>and <it>N</it>; not predicted by us), and some genes of unknown function <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. The differences between their results and ours may be explained in part by the different strategies used to construct the weight matrices in the two studies: ours was built from only the 85 RpoN promoters whose transcription start sites have been experimentally determined, rather than the whole set of 186 promoters used by Dombrecht <it>et al. </it>(<abbrgrp><abbr bid="B37">37</abbr></abbrgrp>, see also <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>); see Materials and methods for details.</p>
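<p>Because the predictions depend on the weight matrix, and the matrix depends on which promoters are used to build it, a minimal position weight matrix (PWM) sketch may make this point concrete. It builds a log2-odds matrix from aligned sites, with a pseudocount and a uniform background (both assumptions), and then scores every window of a sequence. The training sites below are invented for illustration and are not the 85 or 186 promoters discussed above; this is a generic PWM construction, not necessarily the exact procedure used here or by Dombrecht <it>et al.</it></p>

```python
import math

BASES = "ACGT"


def build_pwm(sites: list[str], pseudocount: float = 0.5) -> list[dict[str, float]]:
    """Log2-odds position weight matrix from aligned, equal-length sites.

    Assumes a uniform background (0.25 per base); a genome-specific
    background would be more realistic for a GC-rich replicon.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for i in range(length):
        column = [s[i].upper() for s in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / 0.25)
            for base in BASES
        })
    return pwm


def score(pwm: list[dict[str, float]], window: str) -> float:
    """Sum of per-position log-odds for a window as long as the matrix."""
    return sum(col[base] for col, base in zip(pwm, window.upper()))


def scan(pwm: list[dict[str, float]], seq: str, threshold: float) -> list[tuple[int, float]]:
    """(position, score) of every window scoring at or above threshold."""
    width = len(pwm)
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if set(window) - set(BASES):  # skip windows with N or other codes
            continue
        s = score(pwm, window)
        if s >= threshold:
            hits.append((i, round(s, 2)))
    return hits


# Hypothetical aligned training sites, for illustration only.
training = ["TGGCACGAAAATTGC", "TGGCACGATTTTTGC", "TGGCATGGCAATTGC"]
pwm = build_pwm(training)
print(scan(pwm, "A" * 10 + training[0] + "A" * 10, threshold=10.0))
```

<p>Training the same function on a different set of aligned sites changes every column of the matrix, and therefore which windows pass the threshold; this is one way in which two promoter collections can yield different sets of predicted sites.</p>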
         </sec>
         <sec>
            <st>
               <p>Energy supply and anaerobic regulation</p>
            </st>
            <p>The electron flux and energy supply for the reduction of molecular nitrogen require the flavoprotein encoded by the <it>fixABCX </it>genes mentioned above (FIX2 region, Figure <figr fid="F1">1</figr>), a specific cytochrome oxidase encoded by the <it>fixNOQP </it>genes, and a cation pump encoded by the <it>fixGHIS </it>genes. The latter two gene sets are clustered in the FIX1 region (Figure <figr fid="F1">1</figr>). A second copy of the <it>fixNOQP </it>and <it>fixG </it>genes has been found in plasmid p42f <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>.</p>
            <p>In the symbiotic state, the terminal cytochrome oxidases encoded by the <it>fixNOQP </it>operon provide the energy required to fix nitrogen <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>. Cytochrome production is regulated in response to oxygen concentration, and the products of the <it>fixLJ</it>, <it>fixK </it>and <it>fnrNd </it>genes are known to be involved in this regulation <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>. In <it>R. etli</it>, the duplicated <it>fixNOQP </it>operons are differentially regulated, and only <it>fixNOQP</it>d is required for symbiosis <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. An inactive <it>fixK</it>d is present in p42d, but no <it>fixJ </it>genes have been found in <it>R. etli </it>CFN42 <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. FixKf has been shown to control both <it>fixNOQP </it>operons; loss of FixLf (presumably a fusion protein of FixL and FixJ) suppresses <it>fixNOQP</it>f expression but has only a moderate effect on that of <it>fixNOQP</it>d <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>. Two <it>fnrN </it>genes have been described in <it>R. etli </it>CFN42: one is chromosomal (<it>fnrN</it>chr) and the other is on p42d (<it>fnrN</it>d). Both regulators participate in the activation of the <it>fixNOQP</it>d operon <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>.</p>
            <p>In <it>Escherichia coli</it>, Fnr is an oxygen-responsive global transcriptional regulator that binds to conserved boxes (anaeroboxes) upstream of several genes <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>. Using computational methods, we predicted 45 possible anaeroboxes in p42d (see Materials and methods). In some cases, pairs of anaeroboxes occur in the same region. For example, two anaeroboxes lie within the intergenic region between the divergent operons <it>fixK</it>d and <it>fixNOQP</it>d, and pairs were also detected upstream of <it>fixG</it>, <it>nocR</it>, <it>nodD</it><sub>3 </sub>and <it>fnrN</it>d. Other genes with single anaeroboxes are <it>fixX</it>, <it>nifW</it>, <it>nifU</it>, <it>hemN</it><sub>2</sub>, <it>psiB</it>, <it>hesB</it>, <it>mcpC</it>, <it>teuB</it><sub>1 </sub>and some genes of unknown function. Although there is no direct transcriptional evidence for the expression of these genes under microaerobic conditions, previous observations suggest that several regions of p42d are activated under these conditions <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>.</p>
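<p>A simple way to search for anaeroboxes is to compare each window of a sequence with the <it>E. coli </it>Fnr consensus TTGAT-N4-ATCAA while allowing a small number of mismatches. The sketch below does this; the mismatch limit, function names and toy sequence are hypothetical choices, and it is not necessarily the computational method used to predict the 45 sites in p42d (see Materials and methods).</p>

```python
FNR_CONSENSUS = "TTGATNNNNATCAA"  # E. coli Fnr site; N = any base


def mismatches(window: str, consensus: str = FNR_CONSENSUS) -> int:
    """Number of mismatches at the specified (non-N) consensus positions."""
    return sum(1 for w, c in zip(window, consensus) if c != "N" and w != c)


def find_anaeroboxes(seq: str, max_mismatches: int = 1) -> list[tuple[int, int]]:
    """(position, mismatches) for every window within the mismatch limit.

    The consensus is its own reverse complement, so scanning one strand
    also finds sites on the other strand.
    """
    seq = seq.upper()
    width = len(FNR_CONSENSUS)
    hits = []
    for i in range(len(seq) - width + 1):
        m = mismatches(seq[i:i + width])
        if max_mismatches >= m:
            hits.append((i, m))
    return hits


# Hypothetical toy intergenic region with two nearby candidate boxes.
region = "CCCC" + "TTGATCCGGATCAA" + "C" * 10 + "TTGACGGCCATCAA" + "CCCC"
print(find_anaeroboxes(region))  # [(4, 0), (28, 1)]
```

<p>Pairs of anaeroboxes within one intergenic region, such as those between <it>fixK</it>d and <it>fixNOQP</it>d, would appear in such output as two nearby positions; raising the mismatch limit increases sensitivity at the cost of more false positives.</p>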
         </sec>
         <sec>
            <st>
               <p>Complex systems for macromolecular transport</p>
            </st>
            <p>A variety of transporters, which account for 10% of the CDSs, are scattered throughout p42d. These include several partial and complete ABC transporters for sugars, as well as the type III (TSSIII), and type IV (TSSIV) large-molecule secretion systems (Figure <figr fid="F1">1</figr>).</p>
            <p>In several pathogenic bacteria, type III secretion systems translocate virulence factors into eukaryotic cells <abbrgrp><abbr bid="B44">44</abbr></abbrgrp>. In <it>Rhizobium</it>, this system was first found in pNGR234a <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, and it has been shown to play a role in nodulation efficiency on some host plants <abbrgrp><abbr bid="B45">45</abbr></abbrgrp>. Genes encoding proteins implicated in this system are also present in some of the SGCs <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> and were detected in the p42d sequence. Interestingly, a homolog of <it>hrpW</it>, which encodes an elicitor of the hypersensitive response in plants, is present only in p42d. This gene might form an operon with <it>pcrD</it>, which encodes a calcium-binding membrane protein that is also part of the type III secretion system.</p>
            <p>The TSSIV, encoded by the <it>virB </it>genes, has been described in several &#945;-proteobacterial pathogens and plant symbionts <abbrgrp><abbr bid="B46">46</abbr></abbrgrp> (see below). It forms a membrane channel that delivers proteins or DNA into eukaryotic cells. In p42d, a complete set of <it>virB </it>genes, from <it>virB</it><sub>1 </sub>to <it>virB</it><sub>11</sub>, is present (Figure <figr fid="F1">1</figr>). Other TSSIV correspond to the conjugative transfer machinery encoded by the <it>tra </it>genes. Although p42d is not a self-conjugative plasmid <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>, it contains the <it>traACDG </it>genes, an <it>oriT </it>and a truncated <it>traI </it>pseudogene (<it>yp096</it>), suggesting that p42d might have lost its capacity for self-transfer.</p>
         </sec>
         <sec>
            <st>
               <p>Other functions</p>
            </st>
            <p>In addition to <it>nifA</it>, <it>fnrN</it>, <it>rpoN</it><sub>2</sub>, and <it>nodD</it><sub>1-3</sub>, p42d contains 12 predicted genes encoding potential transcriptional regulators from different families, including LysR, AraC, Crp and GntR. The plasmid also encodes functions involved in plasmid maintenance, electron transfer, polysaccharide biosynthesis, melanin synthesis and secondary metabolism. The sequence of p42d also revealed a putative methionyl-tRNA synthetase gene that could be a reiterated copy or could have another functional role (for example, in antibiotic resistance) <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Elements related to insertion sequences (ERIS)</p>
            </st>
            <p>In general, large numbers of ERIS have been found in the symbiotic compartments of rhizobia <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B7">7</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>. The genome of <it>S. meliloti</it>, however, contains relatively few of these elements, and their distribution is asymmetric: ERIS are more abundant in pSymA, especially near symbiotic genes <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. In p42d, ERIS belonging to 12 known IS families make up 10% of the entire DNA sequence, and the great majority of them belong to the IS3 and IS66 families. Although most ERIS are incomplete and presumably inactive IS sequences, some of them constitute complete IS elements (Figure <figr fid="F1">1</figr>).</p>
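            <p>Estimating the fraction of a replicon occupied by ERIS is a coverage calculation in which overlapping annotations must not be counted twice. The following is a minimal sketch that assumes ERIS are given as hypothetical 1-based, inclusive (start, end) coordinates. The input format and function names are illustrative and were not taken from this study.</p>
```python
def merge_intervals(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or abutting 1-based inclusive (start, end) intervals."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and merged[-1][1] + 1 >= start:
            # Extend the previous interval rather than counting the overlap twice.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def percent_covered(intervals: list[tuple[int, int]], genome_length: int) -> float:
    """Percentage of the sequence covered by at least one interval."""
    covered = sum(end - start + 1 for start, end in merge_intervals(intervals))
    return 100.0 * covered / genome_length
```
            <p>For p42d, genome_length would be 371,255 bp, which is the plasmid length given in the Figure 4 legend.</p>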
            <p>The positions of some ERIS suggest a role in plasmid shuffling. The 125 kb region that contains most of the symbiotic genes (Figure <figr fid="F1">1</figr>) is flanked by two complete IS elements. Both elements share identical 30 bp direct repeats at their borders, which suggests that they may be able to transpose. The presence of a gene for an integrase-like protein (<it>yp018</it>), together with the fact that the 125 kb region separates the <it>repABC </it>and <it>tra </it>genes, raises the possibility that the entire symbiotic region is a mobile element. In addition, some groups of genes flanked by ERIS might have arrived in p42d as parts of composite transposons. Examples are the cytochrome P450 cluster (see below), the NRb region and a putative ATPase of an ABC transporter.</p>
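            <p>Identical direct repeats at the two borders can be detected by looking for sequence that occurs, in the same orientation, at both ends. The following is a minimal exact-match sketch that assumes the two border sequences have already been extracted. The function name is hypothetical, and a real analysis would usually tolerate mismatches and also report inverted repeats.</p>
```python
def shared_direct_repeats(left: str, right: str, k: int = 30) -> set[str]:
    """Return every k-mer found, in the same orientation, in both border sequences."""
    left, right = left.upper(), right.upper()
    left_kmers = {left[i:i + k] for i in range(len(left) - k + 1)}
    right_kmers = {right[i:i + k] for i in range(len(right) - k + 1)}
    # Only same-orientation matches are kept, i.e. direct (not inverted) repeats.
    return left_kmers.intersection(right_kmers)
```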
         </sec>
         <sec>
            <st>
               <p>Reiterated DNA families and genomic rearrangements</p>
            </st>
            <p>It has previously been shown that p42d contains several reiterated DNA sequences <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> that can recombine, leading to genomic rearrangements <abbrgrp><abbr bid="B14">14</abbr><abbr bid="B49">49</abbr><abbr bid="B50">50</abbr></abbrgrp>. The plasmid sequence revealed a large amount of DNA reiteration. The major reiterated families were defined as sets of sequences sharing a continuous stretch of at least 300 nucleotides of identical sequence. There are 12 such families, each with two or three members (Figure <figr fid="F1">1</figr>). In addition to the <it>nif </it>family described above, five families are related to ERIS. The remaining families correspond to other genes, such as those encoding the BacS protein <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>, or to gene fragments.</p>
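            <p>Because the families were defined by identical stretches of at least 300 nucleotides, candidate members can be found by indexing every 300-mer on both strands and keeping those seen more than once. The following is a minimal sketch of that idea, not necessarily the way the p42d families were identified. The function names are hypothetical.</p>
```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def repeated_kmers(seq: str, k: int = 300) -> dict[str, list[tuple[int, str]]]:
    """Map each k-mer seen two or more times (on either strand) to its occurrences.

    A k-mer and its reverse complement share one key (the lexicographically
    smaller of the two), so direct and inverted copies fall into one family.
    Occurrences are (0-based start, strand), where strand is "+" or "-".
    """
    seq = seq.upper()
    occurrences: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        key = min(kmer, revcomp(kmer))
        strand = "+" if key == kmer else "-"
        occurrences[key].append((start, strand))
    return {key: hits for key, hits in occurrences.items() if len(hits) >= 2}
```
            <p>A long repeat produces one entry for every overlapping k-mer it contains. Joining consecutive positions into maximal repeats is left out for brevity. For a sequence the size of p42d, hashing the k-mers or using a dedicated repeat finder would also reduce memory use.</p>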
            <p>As previously shown for pNGR234a <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>, the DNA sequence allows the potential rearrangements generated by homologous recombination to be predicted, identified and isolated. The complete sequence of p42d will also allow the precise sites of previously identified genomic rearrangements to be located <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>. In the present study, we predicted the major potential rearrangements in p42d as previously described <abbrgrp><abbr bid="B51">51</abbr></abbrgrp>. These include amplifications, deletions and inversions such as those illustrated in Figure <figr fid="F1">1</figr>.</p>
            <p>In other SGCs, differences in the number, organization, orientation and length of the reiterated elements predict specific genome rearrangements. The rearrangements involving the <it>nifH </it>reiterations of p42d and pNGR234a are examples <abbrgrp><abbr bid="B50">50</abbr><abbr bid="B51">51</abbr></abbrgrp>.</p>
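            <p>The orientation of repeated copies determines which rearrangements homologous recombination between them can produce. Directly repeated copies can delete the segment between them, or amplify it through unequal crossing-over. Inverted copies can invert it. The following is a minimal sketch of this rule. It assumes each copy is described by a hypothetical (start position, strand) pair and ignores copy length and identity, which also influence recombination.</p>
```python
from itertools import combinations

# Same strand (direct repeats): deletion, or amplification by unequal crossing-over.
# Opposite strands (inverted repeats): inversion of the intervening segment.
OUTCOME_BY_ORIENTATION = {True: "deletion or amplification", False: "inversion"}


def predicted_rearrangements(copies: list[tuple[int, str]]) -> list[tuple[int, int, str]]:
    """List the rearrangement expected for each pair of copies of one repeat family.

    Each copy is a hypothetical (start position, strand) pair, strand "+" or "-".
    Returns (start of first copy, start of second copy, outcome) for every pair.
    """
    for _, strand in copies:
        if strand not in ("+", "-"):
            raise ValueError(f"unexpected strand: {strand!r}")
    results = []
    for (pos_a, strand_a), (pos_b, strand_b) in combinations(sorted(copies), 2):
        results.append((pos_a, pos_b, OUTCOME_BY_ORIENTATION[strand_a == strand_b]))
    return results
```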
         </sec>
         <sec>
            <st>
               <p>Genetic information of p42d in the context of other genomes</p>
            </st>
            <p>The putative protein sequences of p42d were compared with the proteomes of several complete bacterial genomes obtained from GenBank <abbrgrp><abbr bid="B52">52</abbr></abbrgrp> (see Materials and methods), as well as with the SGC sequences available to date (Figure <figr fid="F3">3</figr>). We identified all pairs of potential orthologs between p42d and each of the genomes analyzed, following the strategy and definition described in Materials and methods. As expected, the highest percentage of orthologs shared with p42d was found among the nitrogen-fixing symbiotic bacteria. <it>S. meliloti </it>and <it>M. loti </it>have 51% and 45%, respectively, of the orthologs found in p42d (see Additional data file 5). Members of the &#945;-proteobacterial subclass such as <it>Caulobacter crescentus</it>, <it>Brucella melitensis </it>and <it>A. tumefaciens </it>(a plant-pathogenic member of the Rhizobiaceae) have 25% to 32% of the orthologs present in p42d. Among plant pathogens, the percentage of p42d orthologs ranges from 17% in <it>Xylella fastidiosa </it>to 31% in <it>Ralstonia solanacearum</it>. The fewest shared orthologs were found in human pathogens such as <it>Haemophilus influenzae </it>and <it>Helicobacter pylori</it>, in bacteria with small genomes such as <it>Rickettsia prowazekii </it>and <it>Mycoplasma genitalium</it>, and in the archaea compared here. Examples of putative orthologs found in p42d and in some complete bacterial genomes are shown in Figure <figr fid="F3">3a</figr>. In general, a set of orthologs involved in diverse enzymatic activities is present both in p42d and in most of the genomes compared here. 
They include the genes <it>hemN</it><sub>1</sub>, <it>hemN</it><sub>2</sub>, <it>ctrE</it>, <it>hisC</it>, <it>icfA</it>, <it>pgmV</it>, <it>aatC</it>, <it>pcaC</it><sub>1</sub>, <it>adhE</it>, <it>ribAB</it>, <it>bglS</it>, <it>kprS</it>, <it>mcpG</it>, <it>mcpA </it>and <it>mmsB </it>(see Additional data file 1 for the assigned functions). In most cases their identity is 50% or lower.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Comparison of predicted proteins from p42d with those from other genomes and SGCs</p>
               </caption>
               <text>
                  <p>Comparison of predicted proteins from p42d with those from other genomes and SGCs. Bidirectional best hits (BDBHs) between p42d and other genomes are shown. The bars in all rows represent the percentage identity (number of identities/length of the alignment) of BDBHs between p42d and the indicated genome (see below for color code). The horizontal red line in each row marks 50% identity. A color code is shown for each genome or compartment. <b>(a) </b>Different organisms: <it>Bacillus subtilis </it>(dark magenta); <it>Brucella melitensis </it>(yellow); <it>Caulobacter crescentus </it>(red); <it>Escherichia coli </it>K12 (light magenta); <it>Methanobacterium thermoautotrophicum </it>(dark purple); and <it>Ralstonia solanacearum </it>(purple). <b>(b) </b><it>A. tumefaciens </it>C58 circular chromosome (white), linear chromosome (pale gray), pAT (gray), and pTi (dark gray). <b>(c) </b><it>B. japonicum </it>USDA110 SGC (turquoise). <b>(d) </b>pNGR234a (blue green). <b>(e) </b><it>M. loti </it>R7A SGC (green). <b>(f) </b><it>M. loti </it>MAFF303099 SGC (dark blue) and the rest of the chromosome (light blue). <b>(g) </b><it>S. meliloti </it>pSymA (pale yellow), pSymB (yellow), and the chromosome (dark yellow). <b>(h) </b>CDS distribution for p42d with the color codes for functional classes and the relevant regions as indicated in Figure <figr fid="F1">1</figr>.</p>
               </text>
               <graphic file="gb-2003-4-6-r36-3"/>
            </fig>
            <p>We examined the distribution of orthologs of the six SGCs (see above), including p42d, using the genomes of <it>M. loti </it>and <it>S. meliloti </it>as references. Half of the hits lie in the respective SGCs, and the rest are dispersed among other replicons, including the chromosomes (Table <tblr tid="T1">1</tblr>, Figure <figr fid="F3">3f,3g</figr>). In general, the symbiotic genes are very well conserved among the SGCs. In contrast, orthologs of genes not involved in symbiosis are distributed among nonsymbiotic plasmids and chromosomes (Figure <figr fid="F3">3c,3d,3e</figr>).</p>
            <tbl id="T1" hint_layout="double">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Number of bidirectional best hits between pairs of SGCs or complete genomes</p>
               </caption>
               <tblbdy cols="11">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>p42d</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>pNGR234a</p>
                     </c>
                     <c ca="center">
                        <p>120</p>
                     </c>
                     <c ca="center">
                         <p>pNGR234a</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>SGC<it>Bj</it></p>
                     </c>
                     <c ca="center">
                        <p>88*</p>
                     </c>
                     <c ca="center">
                        <p>133</p>
                     </c>
                     <c ca="center">
                        <p>SGC<it>Bj</it></p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Sm</it>Chr</p>
                     </c>
                     <c ca="center">
                        <p>63</p>
                     </c>
                     <c ca="center">
                        <p>86</p>
                     </c>
                     <c ca="center">
                        <p>59</p>
                     </c>
                     <c ca="center">
                        <p><it>Sm</it>Chr</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>pSymA</p>
                     </c>
                     <c ca="center">
                        <p>100</p>
                     </c>
                     <c ca="center">
                        <p>77</p>
                     </c>
                     <c ca="center">
                        <p>47</p>
                     </c>
                     <c ca="center">
                        <p>ND</p>
                     </c>
                     <c ca="center">
                        <p>pSymA</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>pSymB</p>
                     </c>
                     <c ca="center">
                        <p>20</p>
                     </c>
                     <c ca="center">
                        <p>43</p>
                     </c>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c ca="center">
                        <p>ND</p>
                     </c>
                     <c ca="center">
                        <p>ND</p>
                     </c>
                     <c ca="center">
                        <p>pSymB</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p><it>Ml</it>Chr</p>
                     </c>
                     <c ca="center">
                        <p>62</p>
                     </c>
                     <c ca="center">
                        <p>127</p>
                     </c>
                     <c ca="center">
                        <p>66</p>
                     </c>
                     <c ca="center">
                        <p>2367</p>
                     </c>
                     <c ca="center">
                        <p>321</p>
                     </c>
                     <c ca="center">
                        <p>613</p>
                     </c>
                     <c ca="center">
                        <p><it>Ml</it>Chr</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>pMLa</p>
                     </c>
                     <c ca="center">
                        <p>15</p>
                     </c>
                     <c ca="center">
                        <p>29</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>18</p>
                     </c>
                     <c ca="center">
                        <p>17</p>
                     </c>
                     <c ca="center">
                        <p>10</p>
                     </c>
                     <c ca="center">
                        <p>ND</p>
                     </c>
                     <c ca="center">
                        <p>pMLa</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>pMLb</p>
                     </c>
                     <c ca="center">
                        <p>5</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>11</p>
                     </c>
                     <c ca="center">
                        <p>8</p>
                     </c>
                     <c ca="center">
                        <p>23</p>
                     </c>
                     <c ca="center">
                        <p>12</p>
                     </c>
                     <c ca="center">
                        <p>ND</p>
                     </c>
                     <c ca="center">
                        <p>ND</p>
                     </c>
                     <c ca="center">
                        <p>pMLb</p>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>SGC<it>Ml</it></p>
                     </c>
                     <c ca="center">
                        <p>81</p>
                     </c>
                     <c ca="center">
                        <p>116</p>
                     </c>
                     <c ca="center">
                        <p>79</p>
                     </c>
                     <c ca="center">
                        <p>ND</p>
                     </c>
                     <c ca="center">
                        <p>65</p>
                     </c>
                     <c ca="center">
                        <p>ND</p>
                     </c>
                     <c ca="center">
                        <p>ND</p>
                     </c>
                     <c ca="center">
                        <p>ND</p>
                     </c>
                     <c ca="center">
                        <p>ND</p>
                     </c>
                     <c ca="center">
                        <p>SGC<it>Ml</it></p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>SGCR7A</p>
                     </c>
                     <c ca="center">
                        <p>101</p>
                     </c>
                     <c ca="center">
                        <p>135</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                     <c ca="center">
                        <p>86</p>
                     </c>
                     <c ca="center">
                        <p>73</p>
                     </c>
                     <c ca="center">
                        <p>54</p>
                     </c>
                     <c ca="center">
                        <p>30</p>
                     </c>
                     <c ca="center">
                        <p>21</p>
                     </c>
                     <c ca="center">
                        <p>2</p>
                     </c>
                     <c ca="center">
                        <p>240</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Bidirectional best hits (BDBHs) were calculated in pairwise comparisons using BLASTP. All reciprocal matches with an e-value of up to 1e<sup>-04 </sup>and a coverage of at least 50% of the length of the shorter CDS were collected. p42d, the symbiotic plasmid of <it>R. etli</it>, 371 kb, 359 CDS; pNGR234a, the symbiotic plasmid of <it>Rhizobium </it>sp. NGR234, 536 kb, 416 CDS; SGC<it>Bj</it>, <it>B. japonicum </it>USDA110 symbiotic chromosomal region, 410 kb, 388 CDS; <it>Sm</it>Chr, <it>S. meliloti </it>chromosome, 3,600 kb, 3,396 CDS; pSymA, <it>S. meliloti </it>symbiotic plasmid A, 1,354 kb, 1,295 CDS; pSymB, <it>S. meliloti </it>symbiotic plasmid B, 1,683 kb, 1,571 CDS; <it>Ml</it>Chr, <it>M. loti </it>MAFF303099 chromosome without the symbiotic island, 6,425 kb, 6,172 CDS; pMLa, <it>M. loti </it>MAFF303099 cryptic plasmid a, 351 kb, 320 CDS; pMLb, <it>M. loti </it>MAFF303099 cryptic plasmid b, 208 kb, 209 CDS; SGC<it>Ml</it>, <it>M. loti </it>MAFF303099 symbiotic island, 611 kb, 580 CDS; SGCR7A, <it>M. loti </it>R7A symbiotic island, 502 kb, 414 CDS. ND, not determined. *The number of BDBHs with the complete genome of <it>B. japonicum </it>USDA110 is 150.</p>
               </tblfn>
            </tbl>
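            <p>The reciprocal-best-hit criterion described in the Table 1 footnote can be sketched as follows. This is a minimal illustration, not the pipeline used in this study. The Hit record is a hypothetical stand-in for parsed BLASTP output, and the best hit for each query is chosen by bit score. Coverage is assumed to mean alignment length divided by the length of the shorter CDS. Percentage identity is computed as in the Figure 3 legend, as the number of identities divided by the alignment length.</p>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One BLASTP alignment; the field names are illustrative, not a BLAST format."""
    query: str
    subject: str
    evalue: float
    bitscore: float
    identities: int
    align_len: int
    qlen: int
    slen: int


def passes_filters(hit: Hit, max_evalue: float = 1e-4, min_cov: float = 0.5) -> bool:
    # Assumption: coverage = alignment length / length of the shorter CDS.
    coverage = hit.align_len / min(hit.qlen, hit.slen)
    return max_evalue >= hit.evalue and coverage >= min_cov


def best_hits(hits: list[Hit]) -> dict[str, Hit]:
    """Keep the highest-scoring hit that passes the filters for each query."""
    best: dict[str, Hit] = {}
    for hit in hits:
        if not passes_filters(hit):
            continue
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return best


def bdbh(a_vs_b: list[Hit], b_vs_a: list[Hit]) -> dict[str, tuple[str, float]]:
    """Return {gene_in_a: (gene_in_b, percent identity)} for reciprocal best hits."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    pairs = {}
    for query, hit in best_ab.items():
        back = best_ba.get(hit.subject)
        if back is not None and back.subject == query:
            pairs[query] = (hit.subject, 100.0 * hit.identities / hit.align_len)
    return pairs
```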
            <p>A total of 177 p42d CDSs (49%) have orthologs in at least one SGC. Of these, 80 CDSs belong to the 120 kb symbiotic region (Figure <figr fid="F3">3c,3d,3e,3f,3g</figr>, from the NRa to the NRb region; Table <tblr tid="T1">1</tblr>), and the rest are interspersed in the remaining 251 kb of the plasmid. Among the SGCs compared, pNGR234a shares the highest percentage of orthologs (30%) with p42d (Table <tblr tid="T1">1</tblr>, Figure <figr fid="F3">3d</figr>), followed by pSymA (28%) and the SGC of <it>M. loti </it>R7A (Table <tblr tid="T1">1</tblr>, Figure <figr fid="F3">3g</figr> and <figr fid="F3">3e</figr>, respectively). The SGCs of <it>M. loti </it>MAFF303099 and <it>B. japonicum </it>share the fewest orthologs (24%) with p42d (Table <tblr tid="T1">1</tblr>, Figure <figr fid="F3">3f</figr> and <figr fid="F3">3c</figr>, respectively). The <it>A. tumefaciens </it>plasmids are most similar to the TSSIV, TRA and REP regions of p42d. The remaining matches are distributed over the circular and linear chromosomes (Figure <figr fid="F3">3b</figr>).</p>
            <p>There are 20 genes common to all SGCs. These correspond exclusively to symbiotic genes including both nitrogen fixation (<it>nifHDKENXAB</it>, <it>fixABCX</it>, <it>fdxN</it>, <it>fdxB</it>) and nodulation (<it>nodABCIJD</it>) genes. The essential <it>nodBC </it>genes, however, have possible paralogs in some plant pathogens such as <it>A. tumefaciens</it>, <it>Ralstonia solanacearum </it>and <it>Xanthomonas</it>. Possible paralogs of the transport genes <it>nodIJ </it>are present in all the genomes analyzed. In these bacterial species, putative paralogs of <it>nod </it>genes might participate in the synthesis and secretion of outer membrane lipopolysaccharides <abbrgrp><abbr bid="B53">53</abbr></abbrgrp>.</p>
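            <p>The counts above reduce to set operations on the p42d CDSs that have BDBHs in each SGC. A union gives the CDSs with an ortholog in at least one SGC. An intersection gives the genes shared by all of them, here taken relative to p42d as the reference. The following is a minimal sketch with hypothetical names, assuming the per-SGC BDBH sets have already been computed.</p>
```python
def ortholog_summary(all_cds: set[str], hits_per_sgc: dict[str, set[str]]) -> dict:
    """Summarize which reference CDSs have BDBHs in the other SGCs.

    hits_per_sgc maps an SGC name to the set of reference CDSs with a BDBH in it.
    """
    n = len(all_cds)
    in_any = set().union(*hits_per_sgc.values())
    in_all = set.intersection(*hits_per_sgc.values())
    return {
        "percent_per_sgc": {
            name: round(100 * len(cds) / n, 1) for name, cds in hits_per_sgc.items()
        },
        "in_at_least_one": len(in_any),
        "percent_in_at_least_one": round(100 * len(in_any) / n, 1),
        "common_to_all": in_all,
        "no_bdbh_in_any_sgc": all_cds - in_any,
    }
```
            <p>With the 177 of 359 p42d CDSs reported above, percent_in_at_least_one would be 49.3.</p>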
         </sec>
         <sec>
            <st>
               <p>Conserved gene clusters in SGCs and other genomes</p>
            </st>
            <p>The <it>fixNOQPGHIS </it>genes, common to different nitrogen-fixing symbiotic rhizobia, are not always confined to the SGCs <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>. As mentioned above, in <it>R. etli </it>CFN42 the <it>fix </it>genes are distributed between two replicons, p42d and p42f. Some of them are reiterated, as is frequently observed in other genomes <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>. In <it>S. meliloti</it>, these genes are reiterated three times in pSymA <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, and in <it>M. loti </it>there are two copies of the entire operon <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. In <it>B. japonicum</it>, they lie outside the 410 kb SGC defined by Gottfert <it>et al. </it><abbrgrp><abbr bid="B11">11</abbr></abbrgrp> but are included in the equivalent 681 kb SGC of the complete genome <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. Moreover, in <it>Rhizobium </it>sp. NGR234 they are chromosomal <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>. The <it>fixNOQPGHIS </it>cluster was also identified in the genomes of the plant pathogen <it>A. tumefaciens </it>(circular chromosome), the intracellular parasite <it>Brucella melitensis </it>(chromosome I) and the free-living aquatic bacterium <it>C. crescentus</it>, all of which belong to the &#945;-proteobacterial subdivision. Among the &#947;-proteobacteria, the plant pathogen <it>Pseudomonas aeruginosa </it>has this <it>fix </it>cluster, which is absent from <it>E. coli</it>. Orthologs of this gene cluster are also conserved in <it>R. solanacearum</it>, a plant pathogen that belongs to the &#946;-proteobacteria.</p>
            <p>The <it>fixABCX </it>operon is highly conserved in diazotrophs and in a wide variety of other bacterial and archaeal species, such as <it>E. coli</it>, <it>Mycoplasma genitalium</it>, <it>Bacillus subtilis</it>, <it>Thermotoga maritima </it>and <it>Archaeoglobus fulgidus</it>. In <it>E. coli</it>, these <it>fix </it>genes are related to the carnitine pathway, but their function in the other species is unknown <abbrgrp><abbr bid="B55">55</abbr></abbrgrp>. The ferredoxin genes <it>fdxN </it>and <it>fdxB </it>are always linked to <it>nif </it>genes in both symbiotic and nonsymbiotic organisms. In <it>S. meliloti</it>, mutations in <it>fdxN </it>significantly impair nitrogen fixation <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>.</p>
            <p>As mentioned above, the CPX cluster (9 kb, 15 CDSs) in p42d has GC content and codon usage (CU) profiles that differ from those of the rest of the genes. The function of the CPX genes is unknown, and no symbiotic role has yet been assigned to them. In <it>B. japonicum</it>, some of these genes might participate in terpenoid synthesis <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. The genes of the CPX region show a similar organization in the SGCs of <it>M. loti </it>(strains MAFF303099 and R7A), in pNGR234a and in p42d. In <it>B. japonicum</it>, the CPX cluster was not located in the 410 kb SGC <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B36">36</abbr></abbrgrp> but is present in the 680 kb SGC determined by Kaneko <it>et al. </it><abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. In pSymA of <it>S. meliloti</it>, this cluster is partially represented by homologs of <it>cpxP2</it>, <it>cpxP4</it>, <it>ctrE </it>and some conserved hypothetical genes (<it>yp013</it>-<it>yp015</it>). IS homologs are located at the right border of the CPX region in pNGR234a and pSymA, whereas in <it>M. loti </it>R7A they lie to the left of the SGC. The CPX region in p42d is flanked by ERIS, which points to its potential for transposition.</p>
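            <p>Genes with atypical composition can be screened for by comparing each CDS with the distribution over all CDSs of the replicon. The following is a minimal sketch that uses overall GC content and GC content at third codon positions (GC3) as simple stand-ins for the GC and CU profiles. It flags values more than two standard deviations from the mean, an arbitrary cutoff. This is not necessarily the analysis applied to the CPX cluster, and the function names are hypothetical.</p>
```python
from statistics import mean, stdev


def gc_fraction(seq: str) -> float:
    """Fraction of G and C in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc3_fraction(cds: str) -> float:
    """GC content at third codon positions, a common codon-usage proxy."""
    return gc_fraction(cds.upper()[2::3])


def atypical_cds(cdss: dict[str, str], threshold: float = 2.0) -> list[str]:
    """Return names of CDSs whose GC or GC3 lies over `threshold` SDs from the mean."""
    names = list(cdss)
    flagged = set()
    for metric in (gc_fraction, gc3_fraction):
        values = [metric(cdss[name]) for name in names]
        mu, sd = mean(values), stdev(values)
        for name, value in zip(names, values):
            if sd > 0 and abs(value - mu) / sd > threshold:
                flagged.add(name)
    return sorted(flagged)
```
            <p>With few CDSs, the sample standard deviation limits how extreme a single value can appear. The cutoff should therefore be chosen with the number of genes in mind.</p>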
            <p>A common feature of the SGCs is the presence of either the TSSIII or the TSSIV secretion system. The TSSIII is found in pNGR234a and in the SGCs of <it>B. japonicum </it>and <it>M. loti </it>MAFF303099. The TSSIV is located in pSymA of <it>S. meliloti </it>and in the symbiotic island of <it>M. loti </it>R7A. Both secretion systems are present in p42d. In the pTi and pRi plasmids of <it>A. tumefaciens </it>C58, the TSSIV system is used to transfer the T-DNA into plant cells. Because the SGCs lack T-DNA, the precise function of these systems in them is not clear. Both TSSIII and TSSIV are also found in bacterial pathogens of plants and animals, as well as in some &#945;-proteobacteria. Complete or partial TSSIII or TSSIV are present in <it>Brucella melitensis</it>, <it>C. crescentus</it>, <it>X. citri </it>and <it>X. campestris</it>, whereas in <it>Rickettsia prowazekii </it>some <it>virB </it>genes are conserved. <it>P. aeruginosa </it>contains a complete TSSIII but lacks TSSIV homologs. In <it>Xylella fastidiosa</it>, nine putative conjugative proteins of the plasmid pXF41 are clear orthologs of the corresponding <it>virB </it>gene products found in other microorganisms.</p>
         </sec>
         <sec>
            <st>
               <p>Absence of synteny among SGCs</p>
            </st>
            <p>Gene order is generally conserved among closely related strains and species. According to our estimates, the six SGCs compared here, except those of the two <it>M. loti </it>strains, have 20-30% of their genes in common (Table <tblr tid="T1">1</tblr>). Most of these genes lie within the conserved clusters described above. Moreover, genes unique to each individual genome are interspersed among genes present in all SGCs. For example, p42d contains 71 orphan genes distributed throughout the plasmid.</p>
            <p>The SGCs of <it>M. loti </it>strains MAFF303099 and R7A share large conserved segments that contain all the symbiotic genes <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> (Figure <figr fid="F4">4j</figr>). This colinearity is interrupted by genes unique to one or the other SGC. The smallest region that encloses the 20 common orthologous genes (essentially <it>nod </it>and <it>nif </it>genes) spans about 50 kb in pSymA, 120 kb in p42d, 250 kb in pNGR234a, 300 kb in the SGC of <it>B. japonicum </it>and 320 kb in the two SGCs of <it>M. loti </it>(Figure <figr fid="F5">5</figr>). Such variability in gene order suggests that the SGCs have frequently recombined with other genomic elements.</p>
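            <p>Finding the smallest region that encloses a set of genes is a minimum-window problem. On a circular replicon such as p42d, that window may span the coordinate origin. The following is a minimal sketch that represents each gene by a single coordinate, for example its start, and ignores gene lengths. The function name is hypothetical. For linear regions such as the chromosomal symbiotic islands, circular=False gives the plain span.</p>
```python
def smallest_enclosing_span(positions: list[int], replicon_length: int, circular: bool = True) -> int:
    """Length of the shortest stretch of the replicon that contains all positions.

    On a circular replicon this is the circumference minus the largest gap
    between consecutive positions (including the gap that wraps around the
    origin); on a linear sequence it is simply max - min.
    """
    if not positions:
        raise ValueError("need at least one position")
    pts = sorted(positions)
    if not circular:
        return pts[-1] - pts[0]
    wrap_gap = replicon_length - pts[-1] + pts[0]
    largest_gap = max([wrap_gap] + [b - a for a, b in zip(pts, pts[1:])])
    return replicon_length - largest_gap
```
            <p>For genes at positions 10, 20 and 95 on a 100 bp circle, the shortest enclosing arc runs from 95 through the origin to 20 and is 25 bp long. A linear span would be 85 bp.</p>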
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>Analysis of synteny among the SGCs</p>
               </caption>
               <text>
                  <p>Analysis of synteny among the SGCs. Pairs of orthologous proteins between genomes or SGCs are plotted. Each pair is positioned according to the coordinates of the predicted translation starts of the two genes within their respective DNA regions. The axes correspond to the total length of each DNA region: p42d 371,255 bp; <it>M. loti </it>MAFF303099 symbiotic island 610,975 bp; <it>M. loti </it>R7A symbiotic island 502,000 bp; <it>S. meliloti </it>pSymA 354,226 bp; pNGR234a 536,165 bp; and <it>B. japonicum </it>symbiotic region 410,573 bp. In each panel, the first region listed corresponds to the <it>x</it>-axis. <b>(a) </b>p42d vs pNGR234a; <b>(b) </b>p42d vs pSymA; <b>(c) </b>p42d vs <it>B. japonicum </it>symbiotic region; <b>(d) </b>p42d vs <it>M. loti </it>MAFF303099 symbiotic island; <b>(e) </b>p42d vs <it>M. loti </it>R7A symbiotic island; <b>(f) </b>pNGR234a vs <it>S. meliloti </it>pSymA; <b>(g) </b>pNGR234a vs <it>B. japonicum </it>symbiotic region; <b>(h) </b>pNGR234a vs <it>M. loti </it>MAFF303099 symbiotic island; <b>(i) </b>pNGR234a vs <it>M. loti </it>R7A symbiotic island; <b>(j) </b><it>M. loti </it>MAFF303099 symbiotic island vs <it>M. loti </it>R7A symbiotic island.</p>
               </text>
               <graphic file="gb-2003-4-6-r36-4"/>
            </fig>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Distribution of the 20 genes common to all the SGCs analyzed</p>
               </caption>
               <text>
                  <p>Distribution of the 20 genes common to all the SGCs analyzed. <b>(a) </b>p42d; <b>(b) </b><it>M. loti </it>MAFF303099 SGC; <b>(c) </b>pNGR234a; <b>(d) </b><it>M. loti </it>R7A SGC; <b>(e) </b><it>B. japonicum </it>SGC; <b>(f) </b><it>S. meliloti </it>pSymA. The color bars indicate the position of the genes. The nodulation genes <it>nodABCDIJ </it>are represented in blue, and the nitrogen-fixation genes <it>nifHDKNEXAB, fixABCX </it>and <it>fdxBN </it>are represented in yellow.</p>
               </text>
               <graphic file="gb-2003-4-6-r36-5"/>
            </fig>
            <p>Several transcriptional units that are conserved in some SGCs appear to have been rearranged in others. Examples from the <it>nif</it>, <it>fix </it>and <it>nod </it>operons are illustrated in Additional data file 6. The <it>nifHDK </it>and <it>nifENX </it>operons are neighboring, conserved transcriptional units in p42d, in the two SGCs of <it>M. loti</it>, and in pNGR234a. However, <it>nifH </it>and <it>nifN </it>are separated from their respective operons in the SGC of <it>B. japonicum </it>and in pSymA of <it>S. meliloti</it>, respectively. Similarly, <it>nodA </it>is located away from the <it>nodBC </it>genes in p42d, and <it>nodB </it>is located apart in the SGCs of the <it>M. loti </it>strains. The <it>fixABCX </it>operon is disrupted in the SGC of <it>B. japonicum</it>, where <it>fixA </it>forms an operon with <it>nifA</it>. In other SGCs, by contrast, <it>nifA </it>is commonly found in an operon with <it>nifB </it>and <it>fdxN</it>. Phylogenetic analyses of the 20 common genes in the six SGCs yield incongruent trees, even for genes organized in the same operon (data not shown). For example, trees derived from the genes of the <it>nifHDK </it>and <it>fixABCX </it>operons are incongruent, indicating that intraoperon recombination has also been frequent.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
         <p>Our results indicate that p42d contains several regions that deviate significantly from the average GC content and typical CU. The plasmid harbors a large number of ERIS and several families of reiterated DNA. In addition, it contains 10 pseudogenes. These features resemble those found in other SGCs. All SGCs sequenced so far are heterogeneous in gene content, and the genes common to most of them are mainly those involved in nodulation and nitrogen fixation. Other common genes are present either in SGCs or in other genome locations (see above). The lack of synteny between p42d and the other SGCs analyzed further supports the notion that the symbiotic compartments of rhizobial genomes are mosaic structures <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>, presumably assembled from regions derived from diverse genomic contexts, which may have been frequently modified by transposition, recombination and lateral transfer events.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Sequencing strategy</p>
            </st>
            <p>A minimal set of cosmids covering the entire p42d <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> was used to generate shotgun libraries (1-2 kb mean insert size) in M13 or pUC19 vectors. Sequencing reactions were performed with the Big-Dye Terminator kit and run on an automated 373A DNA Sequencer (Applied Biosystems, Foster City, CA). Gaps were closed by primer walking and by sequencing appropriate clones from pBR328 and pSUP202 libraries. A total of 6,210 reads averaging 450 bases were collected, giving approximately 7&#215; coverage of the entire p42d.</p>
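            <p>The 7&#215; figure can be checked with the usual depth estimate: total bases read divided by target length. The sketch below also applies the Lander-Waterman (Poisson) approximation, which is not used in the text and ignores cloning and sequencing biases, to estimate how much of the plasmid would be expected to remain unsequenced at that depth. This gives a rough sense of why some gap closure is needed even at about 7&#215;.</p>
```python
import math


def fold_coverage(n_reads, mean_read_length, target_length):
    """Average depth: total bases sequenced divided by target length."""
    return n_reads * mean_read_length / target_length


def expected_uncovered_fraction(coverage):
    """Lander-Waterman (Poisson) fraction of bases covered by zero reads."""
    return math.exp(-coverage)


# Figures from the text: 6,210 reads of ~450 bases over the 371,255-bp p42d.
cov = fold_coverage(6210, 450, 371255)
print(f"coverage: {cov:.1f}x")  # coverage: 7.5x

# Idealized estimate only: real libraries leave more gaps than this.
missing = expected_uncovered_fraction(cov) * 371255
print(f"bases expected to be unsequenced (idealized): {missing:.0f}")
```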
         </sec>
         <sec>
            <st>
               <p>Assembly</p>
            </st>
            <p>Base calling was done with PHRED and the assembly was obtained with PHRAP <abbrgrp><abbr bid="B56">56</abbr><abbr bid="B57">57</abbr></abbrgrp>. The assembly was visualized and edited with CONSED <abbrgrp><abbr bid="B58">58</abbr></abbrgrp>. Low-quality and single-stranded regions were identified and resequenced. Using the base qualities determined by PHRAP, the error rate was estimated at less than 1 per 10,000 bases. To confirm the assembly, pairs of forward and reverse primers were designed and used to amplify overlapping PCR products with an average size of 5 kb, covering the entire plasmid as a single circular contig. The PCR products obtained agreed well with the determined sequence (data not shown).</p>
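            <p>PHRED-style quality values encode an error probability P through Q = -10 log10(P), so an error rate can be estimated by summing per-base error probabilities over the consensus. The sketch below shows that arithmetic on a hypothetical quality profile. It assumes one integer quality per consensus base and is not PHRAP's internal procedure.</p>
```python
def phred_to_error_prob(q):
    """Invert the PHRED definition Q = -10 * log10(P) to get P."""
    return 10 ** (-q / 10)


def estimated_errors_per_10kb(qualities):
    """Expected number of errors per 10,000 bases of consensus sequence.

    qualities: per-base consensus quality values (one integer per base).
    The expected error count is the sum of the per-base error probabilities.
    """
    expected_errors = sum(phred_to_error_prob(q) for q in qualities)
    return expected_errors / len(qualities) * 10_000


# Hypothetical quality profile: mostly Q50 with a short stretch of Q30.
quals = [50] * 9_900 + [30] * 100
print(f"{estimated_errors_per_10kb(quals):.2f} errors per 10,000 bases")  # 0.20 errors per 10,000 bases
```
            <p>A uniform Q40 consensus corresponds to exactly 1 expected error per 10,000 bases, so an estimate below that value requires most bases to exceed Q40.</p>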
         </sec>
         <sec>
            <st>
               <p>CDS prediction and annotation</p>
            </st>
            <p>The coding capacity of p42d was determined by applying GLIMMER 2.02 <abbrgrp><abbr bid="B59">59</abbr><abbr bid="B60">60</abbr></abbrgrp> iteratively to improve the overall prediction. Because there is evidence that GLIMMER-based predictions are less effective for plasmids <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>, our approach also took into account the existence of several gene classes with different codon-usage (CU) patterns <abbrgrp><abbr bid="B61">61</abbr></abbrgrp>, and used a p42d-specific ribosome-binding site (RBS) to help GLIMMER select start codons.</p>
            <sec>
               <st>
                  <p>RBS prediction</p>
               </st>
                <p>An initial set of presumably functional genes with an upstream RBS was identified by BLASTX <abbrgrp><abbr bid="B62">62</abbr></abbrgrp> comparisons (maximum e-value 0.001) of the entire plasmid against the nonredundant (nr) database <abbrgrp><abbr bid="B52">52</abbr></abbrgrp> at the National Center for Biotechnology Information (NCBI). All matches to hypothetical or putative proteins were discarded, as were hits with an upstream neighboring hit closer than 50 bp, to avoid genes located within operons. We retained only hits with identity &#8805; 40% that started at the first amino acid of the matched protein and covered at least 80% of its length. We then extended the selected hits toward the 5' end and kept those with an upstream in-frame stop codon before any alternative start codon. This procedure left 21 hits. From the p42d sequence we extracted the 20-bp regions upstream of the start codons of these hits and inferred the most probable RBS (6 bp long) with the CONSENSUS program <abbrgrp><abbr bid="B63">63</abbr></abbrgrp>. The resulting consensus matrix supported the sequence GGAGAG, with an expected frequency of 2.034 &#215; 10<sup>-8</sup>.</p>
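                <p>The hit-selection rules above can be written as simple predicates. The sketch below is a minimal illustration under assumptions not stated in the text: a simplified plus-strand hit record (the Hit fields are hypothetical and do not follow any BLAST output format), ATG, GTG and TTG as the possible alternative start codons, and 0-based coordinates.</p>
```python
from dataclasses import dataclass

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG"}  # assumed set of alternative starts


@dataclass
class Hit:
    """Hypothetical, simplified plus-strand BLASTX hit (0-based coordinates)."""
    start: int            # first nucleotide of the putative start codon
    end: int              # last nucleotide of the hit
    description: str      # description of the matched nr protein
    identity: float       # percent identity
    subject_start: int    # first aligned residue of the nr protein (1-based)
    aligned_length: int   # aligned residues of the nr protein
    subject_length: int   # full length of the nr protein


def passes_filters(hit, upstream_hit_end=None):
    """Selection criteria from the text, applied to a single hit."""
    desc = hit.description.lower()
    if "hypothetical" in desc or "putative" in desc:
        return False
    # A neighbouring hit less than 50 bp upstream suggests an operon-internal gene.
    if upstream_hit_end is not None and 50 > hit.start - upstream_hit_end:
        return False
    return (hit.identity >= 40
            and hit.subject_start == 1
            and hit.aligned_length >= 0.8 * hit.subject_length)


def start_is_unambiguous(seq, start):
    """Walk 5' in frame: True if a stop codon appears before another start."""
    pos = start - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            return True
        if codon in START_CODONS:
            return False
        pos -= 3
    return False  # reached the sequence end without an in-frame stop


def upstream_region(seq, start, length=20):
    """The `length` bp immediately 5' of the start codon (CONSENSUS input)."""
    return seq[max(0, start - length):start]


demo = "TAAGGAGAGCCCATGAAA"
print(start_is_unambiguous(demo, 12), upstream_region(demo, 12))  # True TAAGGAGAGCCC
```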
            </sec>
            <sec>
               <st>
                  <p>CDS prediction</p>
               </st>
                <p>To train GLIMMER, we took the initial output of the BLASTX comparison described above and selected as the training set all hits with an alignment length &#8805; 100 amino acids, again discarding matches to hypothetical/putative proteins. Overlapping hits matching the same protein were merged into a single larger hit, yielding a total of 183 DNA segments. The RBS and this training set were then used to run GLIMMER, which predicted 460 CDSs that included 93% of the 183 training segments. However, running GLIMMER iteratively gave better results: it predicted fewer CDSs, while a greater number of training segments mapped within predicted CDSs. We applied the method of A.M.-S., G.M.-H., A. Christen and J.C.-V. (unpublished work) to split the initial set of 460 CDSs into three groups with poor, typical and rich codon usage. Essentially, this method quantifies the extent to which individual genes use the most abundant codons in the plasmid. Each group was used as a training set, and GLIMMER was run for 20 iterations to predict CDSs &#8805; 300 bp (the CDSs &#8805; 500 bp predicted in the first iteration formed the training set for the second iteration, and so forth). The best prediction for each CU group was selected, and the three resulting predictions were merged into a single set of 396 CDSs that recovered 97.75% of the initial training set.</p>
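                <p>Merging overlapping hits to the same protein is a standard interval merge. The sketch below assumes hits have already been filtered to alignments of at least 100 amino acids and are given as hypothetical (protein_id, start, end) tuples with inclusive coordinates. Hits that merely touch at one coordinate are treated as overlapping.</p>
```python
from collections import defaultdict


def merge_hits(hits):
    """Merge overlapping hits that match the same protein.

    hits: iterable of (protein_id, start, end), inclusive coordinates with
    start not greater than end. Returns merged (protein_id, start, end)
    segments ordered by position.
    """
    by_protein = defaultdict(list)
    for protein, start, end in hits:
        by_protein[protein].append((start, end))

    segments = []
    for protein, intervals in by_protein.items():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start > cur_end:
                # No overlap: close the current segment and open a new one.
                segments.append((protein, cur_start, cur_end))
                cur_start, cur_end = start, end
            else:
                # Overlap: extend the current segment.
                cur_end = max(cur_end, end)
        segments.append((protein, cur_start, cur_end))
    return sorted(segments, key=lambda s: (s[1], s[2]))


hits = [("P1", 100, 400), ("P1", 350, 600), ("P2", 500, 800), ("P1", 900, 1000)]
print(merge_hits(hits))
# [('P1', 100, 600), ('P2', 500, 800), ('P1', 900, 1000)]
```
            <p>Hits to different proteins are never merged, even when they overlap on the plasmid, which keeps segments such as P1 and P2 above distinct.</p>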
            </sec>
            <sec>
               <st>
                  <p>Annotation</p>
               </st>
                <p>All CDSs were manually curated using BLASTX comparisons (e-value &#8804; 0.001) against the nr database. The following annotation criteria were applied: CDSs with no detectable homolog were tagged as hypothetical (<it>yh</it>); CDSs displaying strong similarity to hypothetical proteins or weak similarity to known genes were tagged as conserved hypothetical (<it>yp</it>); CDSs with &#8805; 50% similarity along the entire length of known genes were given the name of the matching gene; and CDSs related to insertion sequences (IS) and transposons (<it>yi</it>) were compared by BLASTN and BLASTX against the IS database <abbrgrp><abbr bid="B64">64</abbr></abbrgrp> to identify the family to which they belong. These elements were also examined for inverted repeats at their borders using OLIGO 6.4 <abbrgrp><abbr bid="B65">65</abbr></abbrgrp> and BLAST2. Functional classification followed the categories proposed by Freiberg <it>et al</it>. <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. Additional support for annotation was obtained by searching for protein domains and motifs with the InterPro suite <abbrgrp><abbr bid="B66">66</abbr></abbrgrp>. Transmembrane domains and leader peptides were predicted with PSORT <abbrgrp><abbr bid="B67">67</abbr></abbrgrp>. A relational database compiling all this information is available at <abbrgrp><abbr bid="B68">68</abbr></abbrgrp>, and the set of 359 annotated CDSs is listed in Additional data file 1.</p>
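                <p>The tagging criteria above amount to a short decision procedure. The sketch below encodes them under stated assumptions: the dictionary keys are hypothetical, "along the entire length" is approximated by a hypothetical coverage cut-off of 0.9, IS-related CDSs are recognized by keywords in the hit description, and any match to a hypothetical protein is tagged yp regardless of its strength. It does not reproduce the manual curation, which weighed each case individually.</p>
```python
FULL_LENGTH = 0.9  # hypothetical cut-off for "along the entire length"
IS_KEYWORDS = ("transposase", "transposon", "insertion sequence", "insertion element")


def annotation_tag(best_hit):
    """Return 'yh', 'yp', 'yi' or a gene name for one CDS (illustrative rules).

    best_hit: None when BLASTX finds no homolog; otherwise a dict with the
    hypothetical keys 'name', 'description', 'similarity' (percent) and
    'coverage' (fraction of the known protein's length that is aligned).
    """
    if best_hit is None:
        return "yh"                    # no detectable homolog
    desc = best_hit["description"].lower()
    if any(k in desc for k in IS_KEYWORDS):
        return "yi"                    # IS/transposon-related; family assigned separately
    if "hypothetical" in desc:
        return "yp"                    # similarity to a hypothetical protein
    if best_hit["similarity"] >= 50 and best_hit["coverage"] >= FULL_LENGTH:
        return best_hit["name"]        # inherit the name of the known gene
    return "yp"                        # only weak similarity to a known gene


print(annotation_tag(None))  # yh
print(annotation_tag({"name": "nifH", "description": "nitrogenase iron protein",
                      "similarity": 92.0, "coverage": 0.98}))  # nifH
```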
            </sec>
         </sec>
         <sec>
            <st>
               <p>Transcription units, RpoN promoters and regulatory binding sites</p>
            </st>
            <p>By applying a previously reported distance-based method <abbrgrp><abbr bid="B69">69</abbr></abbrgrp>, we predicted that the CDSs of p42d are organized into 235 transcription units (TUs). Binding sites were searched in the upstream regions of all annotated CDSs in the pSym; the length of these regions varied and is specified for each case. To identify genes potentially transcribed from RpoN promoters, we compiled an initial training set of 85 prokaryotic promoters whose transcription start sites have been experimentally mapped <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>. The CONSENSUS/PATSER programs <abbrgrp><abbr bid="B63">63</abbr></abbrgrp> were then used to predict 16-bp promoters within 250-bp upstream regions. A final set of 37 RpoN promoters was obtained using as the PATSER threshold the mean (&#956;) minus one standard deviation (&#963;) of the scores of the 85 training promoters (&#956; - 1&#963; = 6.33). NifA binding sites (UAS) were predicted using seven reported sites <abbrgrp><abbr bid="B27">27</abbr><abbr bid="B70">70</abbr><abbr bid="B71">71</abbr><abbr bid="B72">72</abbr><abbr bid="B73">73</abbr></abbrgrp> as the training set. CONSENSUS/PATSER was run to predict 16-bp sites within the -400 to +50 bp regions, yielding 21 sites with a PATSER score &#8805; 8.03 (&#956; - 1&#963;); with a more stringent threshold, several known sites go undetected. We further discarded all predicted UAS without an associated RpoN promoter. For <it>nod </it>boxes, we applied the dyad-sweeping method <abbrgrp><abbr bid="B74">74</abbr></abbrgrp> to a set of six reported sites <abbrgrp><abbr bid="B75">75</abbr><abbr bid="B76">76</abbr><abbr bid="B77">77</abbr><abbr bid="B78">78</abbr></abbrgrp> to locate potential <it>nod </it>boxes in p42d (as clusters of five or more dyads), and then used CONSENSUS/PATSER to identify 47-bp sites within the -600 to +50 bp regions.</p>
            <p>Seven putative <it>nod </it>boxes were found by these approaches; however, several known functional sites were still undetected, so we retrained CONSENSUS/PATSER with the seven p42d sites found. Because the mean PATSER score for these sites is very high (21.48), the usual threshold (&#956; - 1&#963;) is correspondingly high (16.21), and no additional sites could be predicted. We therefore used as threshold the lowest PATSER score (9.15) obtained among the seven training sequences, which yielded a final set of 15 predicted <it>nod </it>boxes. CONSENSUS/PATSER was also applied to identify Fnr regulatory motifs, based on 30 known binding sites in <it>E. coli </it>extracted from RegulonDB <abbrgrp><abbr bid="B79">79</abbr></abbrgrp>. Predictions were carried out in the -400 to +50 bp regions with a PATSER score threshold of &#8805; 6.2 (&#956; - 1&#963;), which yielded 45 potential Fnr binding sites in p42d. With the stricter threshold of the mean PATSER score (9.77), only eight sites are detected. Nonetheless, given evidence of high transcriptional activity of p42d under low-oxygen conditions <abbrgrp><abbr bid="B43">43</abbr></abbrgrp>, we chose the relaxed threshold to retain more candidate Fnr sites.</p>
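            <p>The thresholds discussed above are statistics of the scores that the training sites themselves obtain under the weight matrix built from them. The sketch below reproduces that logic only approximately: it builds a log2 weight matrix with a uniform background and a pseudocount of our choosing, uses the sample standard deviation, scans one strand only, and reports both the &#956; - 1&#963; cut-off and the lowest training score (the alternative used for the <it>nod </it>boxes). PATSER's actual weighting and score normalization may differ, and the toy training sites are hypothetical.</p>
```python
import math
import statistics

BASES = "ACGT"


def log_odds_matrix(sites, pseudocount=1.0):
    """Log2 weight matrix from equal-length aligned training sites.

    Assumes a uniform background (0.25 per base) and a pseudocount spread
    across bases in proportion to the background.
    """
    background = 0.25
    n = len(sites)
    matrix = []
    for column in zip(*sites):
        weights = {}
        for b in BASES:
            freq = (column.count(b) + pseudocount * background) / (n + pseudocount)
            weights[b] = math.log2(freq / background)
        matrix.append(weights)
    return matrix


def site_score(matrix, seq):
    """Sum of per-position weights for one window of the matrix width."""
    return sum(col[base] for col, base in zip(matrix, seq))


def thresholds(matrix, training_sites):
    """Return (mean - 1 SD, minimum) of the training-site scores."""
    scores = [site_score(matrix, s) for s in training_sites]
    return statistics.mean(scores) - statistics.stdev(scores), min(scores)


def scan(matrix, region, threshold):
    """(offset, score) for windows on this strand scoring at or above threshold."""
    w = len(matrix)
    hits = []
    for i in range(len(region) - w + 1):
        s = site_score(matrix, region[i:i + w])
        if s >= threshold:
            hits.append((i, round(s, 2)))
    return hits


training = ["ACGT", "ACGA", "ACGT", "TCGT"]  # toy sites, not real binding sites
m = log_odds_matrix(training)
mu_minus_sd, lowest = thresholds(m, training)
print(scan(m, "GGACGTGG", mu_minus_sd))  # [(2, 6.29)]
```
            <p>When the training sites score uniformly high, as for the seven p42d <it>nod </it>boxes, the mean minus one standard deviation stays high as well, which is why falling back to the lowest training score admits more candidates.</p>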
         </sec>
         <sec>
            <st>
               <p>Genome comparisons</p>
            </st>
            <p>Protein sequences from different genomes or symbiotic compartments were obtained from GenBank <abbrgrp><abbr bid="B52">52</abbr></abbrgrp>: pNGR234a U00090; <it>B. japonicum </it>USDA110 symbiotic region AF322012 and AF322013; <it>S. meliloti </it>AL591688; <it>M. loti </it>MAFF303099 NC_002678; <it>M. loti </it>R7A symbiotic island AL672111; <it>A. tumefaciens </it>C58 (U. Washington) AE008688 and AE008689; <it>A. tumefaciens </it>C58 (Cereon) AE007869 and AE007870; <it>Ralstonia solanacearum </it>AL646052; <it>C. crescentus </it>AE005673; <it>Rickettsia prowazekii </it>AJ235269; <it>E. coli </it>O157:H7 BA000007; <it>E. coli </it>K12 U00096; <it>Brucella melitensis </it>AE008917; <it>P. aeruginosa </it>AE004091; <it>Xanthomonas citri </it>AE008923; <it>Nostoc </it>NC_003272; <it>Xanthomonas campestris </it>AE008922; <it>Synechocystis </it>PCC6803 AB001339; <it>Xylella fastidiosa </it>AE003851; <it>Borrelia burgdorferi </it>AE000783; <it>Buchnera </it>sp. APS AP000398; <it>Mycoplasma genitalium </it>L43967; <it>Thermotoga maritima </it>AE000512; <it>Aquifex aeolicus </it>AE000657; <it>Archaeoglobus fulgidus </it>AE000782; <it>Aeropyrum pernix </it>BA000002; <it>Methanobacterium thermoautotrophicum </it>AE000666; <it>Methanococcus jannaschii </it>L77117; <it>Methanopyrus kandleri </it>AE009439. The most probable orthologs were identified with a previously reported method <abbrgrp><abbr bid="B69">69</abbr></abbrgrp>. Essentially, this method performs pairwise BLASTP comparisons against the protein sequences of p42d and uses bidirectional best hits (BDBHs) to define the most likely orthologs. Only BDBHs with an e-value &#8804; 0.0001 and an alignment covering at least 50% of the shorter CDS were considered.</p>
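            <p>Bidirectional best hits can be computed from two directional sets of BLASTP results. The sketch below applies the e-value and coverage cut-offs given above. It assumes hypothetical (query, subject, evalue, bitscore, coverage) tuples, with coverage already computed as the aligned fraction of the shorter CDS, and it ranks hits by bit score, which is our choice rather than a detail stated in the text.</p>
```python
def best_hit_map(hits, max_evalue=1e-4, min_coverage=0.5):
    """Best-scoring subject per query after applying the cut-offs in the text.

    hits: (query, subject, evalue, bitscore, coverage) tuples, where coverage
    is the aligned fraction of the shorter of the two CDSs.
    """
    best = {}
    for query, subject, evalue, bitscore, coverage in hits:
        if evalue > max_evalue or min_coverage > coverage:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {q: s for q, (s, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a, **cutoffs):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hit_map(a_vs_b, **cutoffs)
    reverse = best_hit_map(b_vs_a, **cutoffs)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


a_vs_b = [("p1", "x1", 1e-50, 200.0, 0.9), ("p1", "x2", 1e-20, 100.0, 0.9),
          ("p2", "x2", 1e-30, 150.0, 0.8), ("p3", "x3", 1e-3, 50.0, 0.9)]
b_vs_a = [("x1", "p1", 1e-50, 200.0, 0.9), ("x2", "p1", 1e-25, 120.0, 0.7),
          ("x2", "p2", 1e-30, 150.0, 0.8)]
print(bidirectional_best_hits(a_vs_b, b_vs_a))  # [('p1', 'x1'), ('p2', 'x2')]
```
            <p>In this toy example, p3 is dropped by the e-value cut-off. The pair p1/x2 is not reported because x2's own best hit is p2.</p>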
         </sec>
         <sec>
            <st>
               <p>Nucleotide sequence accession number</p>
            </st>
            <p>The nucleotide sequence reported here has been deposited in GenBank under the accession number U80928.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Additional data files</p>
         </st>
         <p>The most relevant features of the functional annotation of p42d can be found in Additional data file <supplr sid="s1">1</supplr>, available with the online version of this paper. It lists the name, predicted protein size, best-matching nr homolog, and percentage similarity/identity of each CDS. Lists of predicted binding sites are given in Additional data file <supplr sid="s2">2</supplr> (<it>nod </it>boxes), Additional data file <supplr sid="s3">3</supplr> (RpoN promoters and NifA UAS) and Additional data file <supplr sid="s4">4</supplr> (anaeroboxes). The number of BDBHs between several complete genomes and p42d is given in Additional data file <supplr sid="s5">5</supplr>. The topological representation of the 20 common genes in the six SGCs is detailed in Additional data file <supplr sid="s6">6</supplr> (A, p42d; B, <it>M. loti </it>MAFF303099 SGC; C, pNGR234a; D, <it>M. loti </it>R7A SGC; E, <it>B. japonicum </it>SGC; F, <it>S. meliloti </it>pSymA).</p>
         <suppl id="s1">
            <title>
               <p>Additional data file 1</p>
            </title>
            <caption>
               <p>The most relevant features of the functional annotation of the p42d</p>
            </caption>
            <text>
               <p>The most relevant features of the functional annotation of the p42d</p>
            </text>
            <file name="gb-2003-4-6-r36-s1.pdf">
               <p>Click here for additional data file</p>
            </file>
         </suppl>
         <suppl id="s2">
            <title>
               <p>Additional data file 2</p>
            </title>
            <caption>
               <p>A list of predicted binding sites for <it>nod </it>boxes</p>
            </caption>
            <text>
               <p>A list of predicted binding sites for <it>nod </it>boxes</p>
            </text>
            <file name="gb-2003-4-6-r36-s2.doc">
               <p>Click here for additional data file</p>
            </file>
         </suppl>
         <suppl id="s3">
            <title>
               <p>Additional data file 3</p>
            </title>
            <caption>
               <p>A list of predicted binding sites for RpoN promoters and NifA UAS</p>
            </caption>
            <text>
               <p>A list of predicted binding sites for RpoN promoters and NifA UAS</p>
            </text>
            <file name="gb-2003-4-6-r36-s3.doc">
               <p>Click here for additional data file</p>
            </file>
         </suppl>
         <suppl id="s4">
            <title>
               <p>Additional data file 4</p>
            </title>
            <caption>
               <p>A list of predicted binding sites for anaeroboxes</p>
            </caption>
            <text>
               <p>A list of predicted binding sites for anaeroboxes</p>
            </text>
            <file name="gb-2003-4-6-r36-s4.doc">
               <p>Click here for additional data file</p>
            </file>
         </suppl>
         <suppl id="s5">
            <title>
               <p>Additional data file 5</p>
            </title>
            <caption>
               <p>The number of BDBHs between several complete genomes and the p42d</p>
            </caption>
            <text>
               <p>The number of BDBHs between several complete genomes and the p42d</p>
            </text>
            <file name="gb-2003-4-6-r36-s5.doc">
               <p>Click here for additional data file</p>
            </file>
         </suppl>
         <suppl id="s6">
            <title>
               <p>Additional data file 6</p>
            </title>
            <caption>
               <p>The topological representation of the 20 common genes in the six SGCs</p>
            </caption>
            <text>
               <p>The topological representation of the 20 common genes in the six SGCs</p>
            </text>
            <file name="gb-2003-4-6-r36-s6.pdf">
               <p>Click here for additional data file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We dedicate this paper to Rafael Palacios and Jaime Mora in gratitude for their support and stimulating critical discussions. We are grateful for the skillful technical support and advice given by J.A. Gama, R.E. G&#243;mez, R.I. Santamar&#237;a, S. Caro, J. Esp&#237;ritu, D. Garc&#237;a, F. S&#225;nchez, E. D&#237;az, E. P&#233;rez-Rueda, V. del Moral, K.D. Noel, J. Sanjuan, M. Rosenblueth, P. Gayt&#225;n, E. L&#243;pez, P. Rabinowicz, and P.M. Reddy. This work was partially supported by a public grant from CONACyT (M&#233;xico) under the Program for Emerging Areas (N-028).</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Taxonomic outline of the Procaryotes. Release 3.0.</p>
            </title>
            <aug>
               <au>
                  <snm>Garrity</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Johnson</snm>
                  <fnm>KL</fnm>
               </au>
               <au>
                  <snm>Bell</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Searles</snm>
                  <fnm>DB</fnm>
               </au>
            </aug>
            <source>In Bergey's Manual of Systematic Bacteriology.</source>
            <publisher>New York: Springer-Verlag</publisher>
            <pubdate>2002</pubdate>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Complete genome structure of the nitrogen-fixing symbiotic bacterium <it>Mesorhizobium loti</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Kaneko</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Nakamura</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Sato</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Asamizu</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Kato</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Sasamoto</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Watanabe</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Idesawa</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ishikawa</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kawashima</snm>
                  <fnm>K</fnm>
               </au>
               <etal/>
            </aug>
            <source>DNA Res</source>
            <pubdate>2000</pubdate>
            <volume>7</volume>
            <fpage>331</fpage>
            <lpage>338</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">55547</pubid>
                  <pubid idtype="pmpid">11214968</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Nucleotide sequence and predicted functions of the entire <it>Sinorhizobium meliloti </it>pSymA megaplasmid.</p>
            </title>
            <aug>
               <au>
                  <snm>Barnett</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Fisher</snm>
                  <fnm>RF</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Komp</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Abola</snm>
                  <fnm>AP</fnm>
               </au>
               <au>
                  <snm>Barloy-Hubler</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Bowser</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Capela</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Galibert</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Gouzy</snm>
                  <fnm>J</fnm>
               </au>
               <etal/>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <fpage>9883</fpage>
            <lpage>9888</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">55546</pubid>
                  <pubid idtype="pmpid" link="fulltext">11481432</pubid>
                  <pubid idtype="doi">10.1073/pnas.161294798</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Analysis of the chromosome sequence of the legume symbiont <it>Sinorhizobium meliloti </it>strain 1021.</p>
            </title>
            <aug>
               <au>
                  <snm>Capela</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Barloy-Hubler</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Gouzy</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Bothe</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Ampe</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Batut</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Boistard</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Becker</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Boutry</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Cadieu</snm>
                  <fnm>E</fnm>
               </au>
               <etal/>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <fpage>9877</fpage>
            <lpage>9882</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">55548</pubid>
                  <pubid idtype="pmpid" link="fulltext">11481430</pubid>
                  <pubid idtype="doi">10.1073/pnas.161294398</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>The complete sequence of the 1,683-kb pSymB megaplasmid from the N2-fixing endosymbiont <it>Sinorhizobium meliloti</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Finan</snm>
                  <fnm>TM</fnm>
               </au>
               <au>
                  <snm>Weidner</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Buhrmester</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Chain</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Vorholter</snm>
                  <fnm>FJ</fnm>
               </au>
               <au>
                  <snm>Hern&#225;ndez-Lucas</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Becker</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Cowie</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gouzy</snm>
                  <fnm>J</fnm>
               </au>
               <etal/>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <fpage>9889</fpage>
            <lpage>9894</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">95015</pubid>
                  <pubid idtype="pmpid" link="fulltext">11481431</pubid>
                  <pubid idtype="doi">10.1073/pnas.161294698</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>The composite genome of the legume symbiont <it>Sinorhizobium meliloti</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Galibert</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Finan</snm>
                  <fnm>TM</fnm>
               </au>
               <au>
                  <snm>Long</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>P&#252;hler</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Abola</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Ampe</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Barloy-Hubler</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Barnett</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Becker</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Boistard</snm>
                  <fnm>P</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2001</pubdate>
            <volume>293</volume>
            <fpage>668</fpage>
            <lpage>672</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">135072</pubid>
                  <pubid idtype="pmpid" link="fulltext">11474104</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Complete genomic sequence of nitrogen-fixing symbiotic bacterium <it>Bradyrhizobium japonicum </it>USDA110.</p>
            </title>
            <aug>
               <au>
                  <snm>Kaneko</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Nakamura</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Sato</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Minamisawa</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Uchiumi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Sasamoto</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Watanabe</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Idesawa</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Iriguchi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kawashima</snm>
                  <fnm>K</fnm>
               </au>
               <etal/>
            </aug>
            <source>DNA Res</source>
            <pubdate>2002</pubdate>
            <volume>9</volume>
            <fpage>189</fpage>
            <lpage>197</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">46067</pubid>
                  <pubid idtype="pmpid">12597275</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Genome sequence of the plant pathogen and biotechnology agent <it>Agrobacterium tumefaciens </it>C58.</p>
            </title>
            <aug>
               <au>
                  <snm>Goodner</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Hinkle</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Gattung</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Blanchard</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Qurollo</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Goldman</snm>
                  <fnm>BS</fnm>
               </au>
               <au>
                  <snm>Cao</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Askenazi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Halling</snm>
                  <fnm>C</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2001</pubdate>
            <volume>294</volume>
            <fpage>2323</fpage>
            <lpage>2328</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">107084</pubid>
                  <pubid idtype="pmpid" link="fulltext">11743194</pubid>
                  <pubid idtype="doi">10.1126/science.1066803</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>The genome of the natural genetic engineer <it>Agrobacterium tumefaciens </it>C58.</p>
            </title>
            <aug>
               <au>
                  <snm>Wood</snm>
                  <fnm>DW</fnm>
               </au>
               <au>
                  <snm>Setubal</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Kaul</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Monks</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Kitajima</snm>
                  <fnm>JP</fnm>
               </au>
               <au>
                  <snm>Okura</snm>
                  <fnm>VK</fnm>
               </au>
               <au>
                  <snm>Zhou</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Wood</snm>
                  <fnm>GE</fnm>
               </au>
               <au>
                  <snm>Almeida</snm>
                  <fnm>NF Jr</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2001</pubdate>
            <volume>294</volume>
            <fpage>2317</fpage>
            <lpage>2323</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">151178</pubid>
                  <pubid idtype="pmpid" link="fulltext">11743193</pubid>
                  <pubid idtype="doi">10.1126/science.1066804</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Molecular basis of symbiosis between <it>Rhizobium </it>and legumes.</p>
            </title>
            <aug>
               <au>
                  <snm>Freiberg</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Fellay</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Bairoch</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Broughton</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Rosenthal</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Perret</snm>
                  <fnm>X</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1997</pubdate>
            <volume>387</volume>
            <fpage>394</fpage>
            <lpage>401</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">148710</pubid>
                  <pubid idtype="pmpid">9163424</pubid>
                  <pubid idtype="doi">10.1038/387394a0</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Potential symbiosis-specific genes uncovered by sequencing a 410-kilobase DNA region of the <it>Bradyrhizobium japonicum </it>chromosome.</p>
            </title>
            <aug>
               <au>
                  <snm>G&#246;ttfert</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>R&#246;thlisberger</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>K&#252;ndig</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Beck</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Marty</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hennecke</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>2001</pubdate>
            <volume>183</volume>
            <fpage>1405</fpage>
            <lpage>1412</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">95546</pubid>
                  <pubid idtype="pmpid" link="fulltext">11157954</pubid>
                  <pubid idtype="doi">10.1128/JB.183.4.1405-1412.2001</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Comparative sequence analysis of the symbiosis island of <it>Mesorhizobium loti </it>strain R7A.</p>
            </title>
            <aug>
               <au>
                  <snm>Sullivan</snm>
                  <fnm>JT</fnm>
               </au>
               <au>
                  <snm>Trzebiatowski</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Cruickshank</snm>
                  <fnm>RW</fnm>
               </au>
               <au>
                  <snm>Gouzy</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Elliot</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Fleetwood</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>McCallum</snm>
                  <fnm>NG</fnm>
               </au>
               <au>
                  <snm>Rossbach</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Stuart</snm>
                  <fnm>GS</fnm>
               </au>
               <etal/>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>2002</pubdate>
            <volume>184</volume>
            <fpage>3086</fpage>
            <lpage>3095</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">46627</pubid>
                  <pubid idtype="pmpid" link="fulltext">12003951</pubid>
                  <pubid idtype="doi">10.1128/JB.184.11.3086-3095.2002</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Fermentative and aerobic metabolism in <it>Rhizobium etli</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Encarnaci&#243;n</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Dunn</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Willms</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Mora</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>1995</pubdate>
            <volume>177</volume>
            <fpage>3058</fpage>
            <lpage>3066</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">16835</pubid>
                  <pubid idtype="pmpid" link="fulltext">7768801</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Amplification and deletion of a <it>nod-nif </it>region in the symbiotic plasmid of <it>Rhizobium phaseoli</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Romero</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Brom</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Mart&#237;nez-Salazar</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Girard</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Palacios</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>D&#225;vila</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>1991</pubdate>
            <volume>173</volume>
            <fpage>2435</fpage>
            <lpage>2441</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">16145</pubid>
                  <pubid idtype="pmpid">2013567</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Different plasmids of <it>Rhizobium leguminosarum </it>bv. <it>phaseoli </it>are required for optimal symbiotic performance.</p>
            </title>
            <aug>
               <au>
                  <snm>Brom</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Garc&#237;a de los Santos</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Stepkowsky</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Flores</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>D&#225;vila</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Romero</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Palacios</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>1992</pubdate>
            <volume>174</volume>
            <fpage>5183</fpage>
            <lpage>5189</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">148753</pubid>
                  <pubid idtype="pmpid">1644746</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Structural complexity of the symbiotic plasmid of <it>Rhizobium leguminosarum </it>bv. <it>phaseoli</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Girard</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Flores</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Brom</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Romero</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Palacios</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>D&#225;vila</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>1991</pubdate>
            <volume>173</volume>
            <fpage>2411</fpage>
            <lpage>2419</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">147303</pubid>
                  <pubid idtype="pmpid">2013564</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Sequence, localization and characteristics of the replicator region of the symbiotic plasmid of <it>Rhizobium etli</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Ram&#237;rez-Romero</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Bustos</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Girard</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Rodr&#237;guez</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Cevallos</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>D&#225;vila</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Microbiology</source>
            <pubdate>1997</pubdate>
            <volume>143</volume>
            <fpage>2825</fpage>
            <lpage>2831</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146917</pubid>
                  <pubid idtype="pmpid" link="fulltext">9274036</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Wild type <it>Rhizobium etli</it>, a bean symbiont, produces acetyl-fucosylated, <it>N</it>-methylated, and carbamoylated nodulation factors.</p>
            </title>
            <aug>
               <au>
                  <snm>Poupot</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Mart&#237;nez-Romero</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Gautier</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Prom&#233;</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>1995</pubdate>
            <volume>270</volume>
            <fpage>6050</fpage>
            <lpage>6055</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">18656</pubid>
                  <pubid idtype="pmpid" link="fulltext">7890737</pubid>
                  <pubid idtype="doi">10.1074/jbc.270.11.6050</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>The role of the <it>nodI </it>and <it>nodJ </it>genes in the transport of Nod metabolites in <it>Rhizobium etli</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>C&#225;rdenas</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Dom&#237;nguez</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Santana</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Quinto</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>1996</pubdate>
            <volume>173</volume>
            <fpage>183</fpage>
            <lpage>187</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">88811</pubid>
                  <pubid idtype="pmpid" link="fulltext">8964496</pubid>
                  <pubid idtype="doi">10.1016/0378-1119(96)00166-7</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Genetic organization and transcriptional regulation of rhizobial nodulation genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Schlaman</snm>
                  <fnm>HR</fnm>
               </au>
               <au>
                  <snm>Phillips</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kondorosi</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>In The Rhizobiaceae.</source>
            <publisher>Dordrecht: Kluwer</publisher>
            <editor>Spaink HP, Kondorosi A, Hooykaas PJJ</editor>
            <pubdate>1998</pubdate>
            <fpage>361</fpage>
            <lpage>386</lpage>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Conservation and extended promoter regions of nodulation genes in <it>Rhizobium</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Rostas</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Kondorosi</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Horvath</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Simoncsits</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kondorosi</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1986</pubdate>
            <volume>83</volume>
            <fpage>1757</fpage>
            <lpage>1761</lpage>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Reiteration of nitrogen fixation gene sequences in <it>Rhizobium phaseoli</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Quinto</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>de la Vega</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Flores</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Fern&#225;ndez</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Ballado</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Sober&#243;n</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Palacios</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1982</pubdate>
            <volume>299</volume>
            <fpage>724</fpage>
            <lpage>726</lpage>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Multiple recombination events maintain sequence identity among members of the nitrogenase multigene family in <it>Rhizobium etli</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Rodr&#237;guez</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Romero</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1998</pubdate>
            <volume>149</volume>
            <fpage>785</fpage>
            <lpage>794</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">29794</pubid>
                  <pubid idtype="pmpid" link="fulltext">9611191</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Genetic and structural analysis of the <it>Rhizobium meliloti fixA, fixB, fixC</it>, and <it>fixX </it>genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Earl</snm>
                  <fnm>CD</fnm>
               </au>
               <au>
                  <snm>Ronson</snm>
                  <fnm>CW</fnm>
               </au>
               <au>
                  <snm>Ausubel</snm>
                  <fnm>FM</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>1987</pubdate>
            <volume>169</volume>
            <fpage>1127</fpage>
            <lpage>1136</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">211910</pubid>
                  <pubid idtype="pmpid">3029021</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>A survey of symbiotic nitrogen fixation by rhizobia.</p>
            </title>
            <aug>
               <au>
                  <snm>Kaminski</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Batut</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Boistard</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>In The Rhizobiaceae.</source>
            <publisher>Dordrecht: Kluwer</publisher>
            <editor>Spaink HP, Kondorosi A, Hooykaas PJJ</editor>
            <pubdate>1998</pubdate>
            <fpage>431</fpage>
            <lpage>460</lpage>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Evidence for the direct interaction of the <it>nifW </it>gene product with the MoFe protein.</p>
            </title>
            <aug>
               <au>
                  <snm>Kim</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Burgess</snm>
                  <fnm>BK</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>1996</pubdate>
            <volume>271</volume>
            <fpage>9764</fpage>
            <lpage>9770</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.271.16.9764</pubid>
                  <pubid idtype="pmpid" link="fulltext">8621656</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Activity of purified NIFA, a transcriptional activator of nitrogen fixation genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Lee</snm>
                  <fnm>HS</fnm>
               </au>
               <au>
                  <snm>Berger</snm>
                  <fnm>DK</fnm>
               </au>
               <au>
                  <snm>Kustu</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>1993</pubdate>
            <volume>90</volume>
            <fpage>2266</fpage>
            <lpage>2270</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">46067</pubid>
                  <pubid idtype="pmpid" link="fulltext">8460132</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>The <it>Rhizobium etli </it>gene <it>iscN </it>is highly expressed in bacteroids and required for nitrogen fixation.</p>
            </title>
            <aug>
               <au>
                  <snm>Dombrecht</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Tesfay</snm>
                  <fnm>MZ</fnm>
               </au>
               <au>
                  <snm>Verreth</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Heusdens</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Napoles</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Vanderleyden</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Michiels</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Mol Genet Genomics</source>
            <pubdate>2002</pubdate>
            <volume>267</volume>
            <fpage>820</fpage>
            <lpage>828</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1007/s00438-002-0715-0</pubid>
                  <pubid idtype="pmpid" link="fulltext">12207230</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>The <it>Rhizobium meliloti fdxN </it>gene encoding a ferredoxin-like protein is necessary for nitrogen fixation and is cotranscribed with <it>nifA </it>and <it>nifB</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Klipp</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Reil&#228;nder</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Schl&#252;ter</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Krey</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>P&#252;hler</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Mol Gen Genet</source>
            <pubdate>1989</pubdate>
            <volume>216</volume>
            <fpage>293</fpage>
            <lpage>302</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid">2747618</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Identification of the V factor needed for synthesis of the iron-molybdenum cofactor of nitrogenase as homocitrate.</p>
            </title>
            <aug>
               <au>
                  <snm>Hoover</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Robertson</snm>
                  <fnm>AD</fnm>
               </au>
               <au>
                  <snm>Cerny</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Hayes</snm>
                  <fnm>RN</fnm>
               </au>
               <au>
                  <snm>Imperial</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Shah</snm>
                  <fnm>VK</fnm>
               </au>
               <au>
                  <snm>Ludden</snm>
                  <fnm>PW</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1987</pubdate>
            <volume>329</volume>
            <fpage>855</fpage>
            <lpage>857</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/329855a0</pubid>
                  <pubid idtype="pmpid" link="fulltext">3313054</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Role of the <it>nifQ </it>gene product in the incorporation of molybdenum into nitrogenase in <it>Klebsiella pneumoniae</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Imperial</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ugalde</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Shah</snm>
                  <fnm>VK</fnm>
               </au>
               <au>
                  <snm>Brill</snm>
                  <fnm>WJ</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>1984</pubdate>
            <volume>158</volume>
            <fpage>187</fpage>
            <lpage>194</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">215397</pubid>
                  <pubid idtype="pmpid">6370956</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Genetic regulation of nitrogen fixation in rhizobia.</p>
            </title>
            <aug>
               <au>
                  <snm>Fischer</snm>
                  <fnm>HM</fnm>
               </au>
            </aug>
            <source>Microbiol Rev</source>
            <pubdate>1994</pubdate>
            <volume>58</volume>
            <fpage>352</fpage>
            <lpage>386</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7968919</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>The <it>Rhizobium etli rpoN </it>locus: DNA sequence analysis and phenotypical characterization of <it>rpoN, ptsN</it>, and <it>ptsA </it>mutants.</p>
            </title>
            <aug>
               <au>
                  <snm>Michiels</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Van Soom</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>D'Hooghe</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Dombrecht</snm>
                  <fnm>B</fnm>
               </au>
        