<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2180-8-72</ui>
   <ji>1471-2180</ji>
   <fm>
      <dochead>Research article</dochead>
      <bibl>
         <title>
            <p>Identification of new genes in <it>Sinorhizobium meliloti </it>using the Genome Sequencer FLX system</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Mao</snm>
               <fnm>Chunhong</fnm>
               <insr iid="I1"/>
               <email>chmao@vt.edu</email>
            </au>
            <au id="A2">
               <snm>Evans</snm>
               <fnm>Clive</fnm>
               <insr iid="I1"/>
               <email>cevans@vbi.vt.edu</email>
            </au>
            <au id="A3">
               <snm>Jensen</snm>
               <mi>V</mi>
               <fnm>Roderick</fnm>
               <insr iid="I1"/>
               <email>rvjensen@vt.edu</email>
            </au>
            <au id="A4">
               <snm>Sobral</snm>
               <mi>WS</mi>
               <fnm>Bruno</fnm>
               <insr iid="I1"/>
               <email>sobral@vbi.vt.edu</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA</p>
            </ins>
         </insg>
         <source>BMC Microbiology</source>
         <issn>1471-2180</issn>
         <pubdate>2008</pubdate>
         <volume>8</volume>
         <issue>1</issue>
         <fpage>72</fpage>
         <url>http://www.biomedcentral.com/1471-2180/8/72</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">18454850</pubid>
               <pubid idtype="doi">10.1186/1471-2180-8-72</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>09</day>
               <month>1</month>
               <year>2008</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>02</day>
               <month>5</month>
               <year>2008</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>02</day>
               <month>5</month>
               <year>2008</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2008</year>
         <collab>Mao et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p><it>Sinorhizobium meliloti </it>is an agriculturally important model symbiont. There is an ongoing need to update and improve its genome annotation. In this study, we used a high-throughput pyrosequencing approach to sequence the transcriptome of <it>S. meliloti </it>and to search for new bacterial genes missed in the previous genome annotation. This is the first report of sequencing a bacterial transcriptome using pyrosequencing technology.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Our pilot sequencing run generated 19,005 reads with an average length of 136 nucleotides per read. From these data, we identified 20 new genes. These new gene transcripts were confirmed by RT-PCR and their possible functions were analyzed.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>Our results indicate that high-throughput sequence analysis of bacterial transcriptomes is feasible and that next-generation sequencing technologies will greatly facilitate the discovery of new genes and improve genome annotation.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p><it>Sinorhizobium meliloti </it>is a micro-symbiont associated with legume plants. This soil bacterium inhabits nodules on the roots of host legume plants, where it reduces atmospheric nitrogen to organic nitrogenous compounds that can be utilized by its hosts. Because of its agricultural and ecological importance, <it>S. meliloti </it>has been extensively studied as a model symbiont. The <it>S. meliloti </it>1021 genome sequence and the initial annotation of the genome were completed in 2001 <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. The <it>S. meliloti </it>genome comprises three replicons: the 3.65 Mb chromosome, the 1.35 Mb megaplasmid pSymA, and the 1.68 Mb megaplasmid pSymB <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. According to RefSeq <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, the <it>S. meliloti </it>1021 genome has 6205 predicted protein-encoding genes. Among these, more than one-third were annotated as "hypothetical" or "unknown". Many research papers have been published on <it>S. meliloti </it>since its genome sequence was completed. In addition, more genomes of closely related species such as <it>Brucella </it>spp., <it>Rhodopseudomonas palustris</it>, and <it>S. medicae </it>WSM419 have been sequenced. Comparative genomics that includes these newly sequenced genomes provides new information about the genome of <it>S. meliloti</it>. There is an ongoing need to update and improve its genome annotation. So far, there has been no systematic effort to directly sequence its entire transcriptome. Microarray data are available, but most microarray designs are based on annotated genes <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. High-density whole-genome tiling arrays are not yet available.</p>
         <p>The goal of this study was to develop a high-throughput experimental approach to search for new genes of <it>S. meliloti </it>missed in the previous genome annotation <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. We used pyrosequencing <abbrgrp><abbr bid="B8">8</abbr></abbrgrp> to sequence the transcriptome of <it>S. meliloti</it>. The GS FLX system from Roche and 454 Life Sciences can generate more than 100 million bases per sequencing run, with an average yield of more than 400,000 reads and an average read length of 250 bases. This platform supports a broad range of applications, including whole genome sequencing <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>, transcriptome and gene regulation studies <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr><abbr bid="B15">15</abbr></abbrgrp>, metagenomics analysis <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> and amplicon sequencing <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>. Although pyrosequencing has been used to sequence microbial genomes, relatively few applications to transcriptome analysis have been reported. Here, we present the first report of sequencing a bacterial transcriptome using the GS FLX platform as an experimental approach for gene discovery.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>Gene prediction</p>
            </st>
            <p>We used an automated gene annotation pipeline provided by PATRIC <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> to predict genes in <it>S. meliloti</it>. This pipeline uses a combination of gene prediction programs, Glimmer <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, GeneMark <abbrgrp><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp>, TICO <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> and RBSfinder <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>, to predict genes and compares the predictions with the genes in RefSeq. A total of 512 new protein-coding genes (with length &gt;90 nt) in the intergenic regions of the genome were predicted through this automated pipeline (Additional file <supplr sid="S1">1</supplr>). The number of predicted genes in different length ranges is shown in Figure <figr fid="F1">1</figr>. Most of the predicted new genes are relatively small (length &lt;400 nt). The average length is about 200 nt. These genes were BLASTed against the NCBI non-redundant (NR) protein database <abbrgrp><abbr bid="B25">25</abbr><abbr bid="B26">26</abbr></abbrgrp>. The results showed that 159 candidates had BLAST hits in the NR database with E-values less than 0.01, whereas the remaining 353 candidates had no significant hits. Small gene size and lack of BLAST hits may explain why the predicted new gene candidates were missed in the original genome annotation process.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Length distribution of predicted gene candidates</p>
               </caption>
               <text>
                  <p>
                     <b>Length distribution of predicted gene candidates.</b>
                  </p>
               </text>
               <graphic file="1471-2180-8-72-1"/>
            </fig>
            <suppl id="S1">
               <title>
                  <p>Additional file 1</p>
               </title>
               <text>
                  <p>Supplementary Table S1. New protein-coding genes predicted by PATRIC that are located in the intergenic regions</p>
               </text>
               <file name="1471-2180-8-72-S1.xls">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>Sequence analysis</p>
            </st>
            <p>Total RNA was extracted from <it>S. meliloti </it>1021 cells grown to mid-exponential phase in TY medium and treated with DNase I to remove genomic DNA (Methods). The 16S and 23S rRNAs were depleted and the RNA samples were then amplified to produce cDNA fragments with an average length of about 150 nt (Methods). With two test cDNA samples each loaded onto 4 lanes of a 16-lane sequencing plate, the titration run generated a total of 19,005 high-quality reads with an average length of 136 nt (Table <tblr tid="T1">1</tblr>). Although the Agilent 2100 Bioanalyzer indicated that our rRNA removal step depleted more than 90% of the rRNAs, approximately 90% of the reads still aligned to the rRNA operons (Figure <figr fid="F2">2</figr>). This may be due to the relatively low abundance of mRNA in <it>S. meliloti </it>cells. Of the 17,092 reads aligned to the rRNA operons, 3 reads matched 5S rRNA, 2860 reads matched 16S rRNA, 13,983 reads matched 23S rRNA, and the remaining 246 reads aligned to the intergenic regions between the rRNA genes in the rRNA operons. Of the 1854 non-rRNA sequences, 1774 matched 737 of the 6271 RefSeq genes (proteins and RNAs) and 59 matched 32 of the 512 new protein-coding genes predicted through our gene prediction pipeline. The remaining 21 sequences mainly matched sequences either immediately before or immediately after a coding region, presumably 5' or 3' UTRs.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>GS FLX sequencing results</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="right">
                        <p>Sample 1</p>
                     </c>
                     <c ca="right">
                        <p>Sample 2</p>
                     </c>
                     <c ca="right">
                        <p>Total</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p># Sequence reads</p>
                     </c>
                     <c ca="right">
                        <p>8694</p>
                     </c>
                     <c ca="right">
                        <p>10311</p>
                     </c>
                     <c ca="right">
                        <p>19005</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Average sequence length</p>
                     </c>
                     <c ca="right">
                        <p>139</p>
                     </c>
                     <c ca="right">
                        <p>133</p>
                     </c>
                     <c ca="right">
                        <p>136</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p># Sequences aligned to genes</p>
                     </c>
                     <c ca="right">
                        <p>1165</p>
                     </c>
                     <c ca="right">
                        <p>689</p>
                     </c>
                     <c ca="right">
                        <p>1854</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p># Sequences in rRNA operons</p>
                     </c>
                     <c ca="right">
                        <p>7513</p>
                     </c>
                     <c ca="right">
                        <p>9579</p>
                     </c>
                     <c ca="right">
                        <p>17092</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p># Sequences not aligned to the genome (E-value &lt;0.01)</p>
                     </c>
                     <c ca="right">
                        <p>16</p>
                     </c>
                     <c ca="right">
                        <p>43</p>
                     </c>
                     <c ca="right">
                        <p>59</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Removal of 16S and 23S rRNAs using MICROB <it>Express</it>&#8482; kit from Ambion</p>
               </caption>
               <text>
                  <p><b>Removal of 16S and 23S rRNAs using MICROB <it>Express</it>&#8482; kit from Ambion</b>. Total RNA samples before (in red) and after (in blue) rRNA depletion were analyzed on the Agilent 2100 Bioanalyzer.</p>
               </text>
               <graphic file="1471-2180-8-72-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Validation of new genes using RT-PCR</p>
            </st>
            <p>Twenty new gene candidates with multiple GS FLX sequence hits or long sequence hits (&gt;80 nt) were chosen for further verification using RT-PCR analysis of the original total RNA samples. These new gene candidates were predicted by the PATRIC pipeline as protein-coding genes. Figure <figr fid="F3">3</figr> shows an example of a new gene candidate (VBISMc1000) in the <it>Sinorhizobium </it>Genome Browser, which was built using the GBrowse software <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. All 20 genes were detected in the RT-PCR experiment, and the PCR products were sequenced and confirmed (Figure <figr fid="F4">4</figr>). The negative controls indicated that there was no genomic DNA contamination in the RNA samples tested.</p>
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Genome view of a new gene (in red)</p>
               </caption>
               <text>
                  <p><b>Genome view of a new gene (in red)</b>. GS FLX sequences (in green from prep 1 and in blue from prep 2) are aligned to VBISMc1000.</p>
               </text>
               <graphic file="1471-2180-8-72-3"/>
            </fig>
            <fig id="F4">
               <title>
                  <p>Figure 4</p>
               </title>
               <caption>
                  <p>RT-PCR of 20 gene candidates</p>
               </caption>
               <text>
                  <p><b>RT-PCR of 20 gene candidates</b>. Lane 1: low molecular weight DNA ladder from New England Biolabs. Size range: 25 bp to 766 bp. Lanes 2-21: 20 gene candidates, VBISMa0080, VBISMa0492, VBISMa1337, VBISMb0078, VBISMb0839, VBISMc0095, VBISMc0802, VBISMc1000, VBISMc1221, VBISMc1492, VBISMc1793, VBISMc2171, VBISMc2174, VBISMc2596, VBISMc2940, VBISMc2955, VBISMc3188, VBISMc3282, VBISMc4046 and VBISMc4289, respectively. Most RT-PCR reactions each produced one corresponding PCR product (lanes 3-5, 7, 9-12, 14-19 and 21). These PCR products were directly sequenced and their sequences matched the corresponding gene candidates. Multiple PCR products were found in lanes 2, 6, 8, 13 and 20. The bands with the correct PCR product sizes are labeled with *. These PCR products were used as templates for a second round of PCR to produce enough DNA for sequencing. The sequencing results confirmed that they matched the corresponding gene candidates. The most abundant PCR product in lane 2 was sequenced and determined to be part of the 23S rRNA sequence. Lanes 22-29: negative controls using the RNA sample that was not reverse transcribed and primer pairs of new genes, to show that there was no genomic DNA contamination. In each of lanes 22 to 29, combined primer pairs of two or three genes were used. Lane 30: no template control. Primer pairs of cm0012a, cm012b and cm016a, cm016b were used.</p>
               </text>
               <graphic file="1471-2180-8-72-4"/>
            </fig>
            <p>To test whether the new genes are co-transcribed with their upstream or downstream flanking genes (if any), a set of primer pairs was designed (Figure <figr fid="F5">5</figr>, Additional file <supplr sid="S2">2</supplr>) to detect RT-PCR products of transcripts that include the flanking genes. The results are summarized in Table <tblr tid="T2">2</tblr>. The columns "co-transcribed with upstream gene" and "co-transcribed with downstream gene" indicate whether RT-PCR products were detected for transcripts spanning from the upstream flanking gene to the new gene or from the new gene to the downstream flanking gene. For ten of the predicted genes, no co-transcription with either the upstream or the downstream flanking gene was detected (Table <tblr tid="T2">2</tblr>).</p>
            <fig id="F5">
               <title>
                  <p>Figure 5</p>
               </title>
               <caption>
                  <p>Primer design for testing co-transcription using RT-PCR</p>
               </caption>
               <text>
                  <p><b>Primer design for testing co-transcription using RT-PCR</b>. PL1 and PR1 form the primer set for testing the transcript from the upstream flanking gene to the new gene, and PL2 and PR2 form the primer set for testing the transcript from the new gene to the downstream flanking gene.</p>
               </text>
               <graphic file="1471-2180-8-72-5"/>
            </fig>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Summary of new genes</p>
               </caption>
               <tblbdy cols="11">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Gene</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Replicon*</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Predicted start</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Predicted end</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Strand</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Length</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Co-transcribed with upstream gene</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Co-transcribed with downstream gene</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Target description for predicted genes</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>E-value</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Percent similarity</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="11">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VBISMa0080</p>
                     </c>
                     <c ca="center">
                        <p>A</p>
                     </c>
                     <c ca="center">
                        <p>65258</p>
                     </c>
                     <c ca="center">
                        <p>65494</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>237</p>
                     </c>
                     <c ca="center">
                        <p>SMa0121</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VBISMa0492</p>
                     </c>
                     <c ca="center">
                        <p>A</p>
                     </c>
                     <c ca="center">
                        <p>410418</p>
                     </c>
                     <c ca="center">
                        <p>410660</p>
                     </c>
                     <c ca="center">
                        <p>+</p>
                     </c>
                     <c ca="center">
                        <p>243</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VBISMa1337</p>
                     </c>
                     <c ca="center">
                        <p>A</p>
                     </c>
                     <c ca="center">
                        <p>1191271</p>
                     </c>
                     <c ca="center">
                        <p>1191525</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>255</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>putative dioxygenase, slightly similar to catechol 1,2-dioxygenase protein [<it>S. meliloti </it>1021]</p>
                     </c>
                     <c ca="center">
                        <p>2e-4</p>
                     </c>
                     <c ca="center">
                        <p>25</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VBISMb0078</p>
                     </c>
                     <c ca="center">
                        <p>B</p>
                     </c>
                     <c ca="center">
                        <p>66440</p>
                     </c>
                     <c ca="center">
                        <p>66619</p>
                     </c>
                     <c ca="center">
                        <p>+</p>
                     </c>
                     <c ca="center">
                        <p>180</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>SMb20056</p>
                     </c>
                     <c ca="center">
                        <p>hypothetical protein BMEII0534 [<it>B. melitensis </it>16M]</p>
                     </c>
                     <c ca="center">
                        <p>4e-3</p>
                     </c>
                     <c ca="center">
                        <p>30</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VBISMb0839</p>
                     </c>
                     <c ca="center">
                        <p>B</p>
                     </c>
                     <c ca="center">
                        <p>679069</p>
                     </c>
                     <c ca="center">
                        <p>679800</p>
                     </c>
                     <c ca="center">
                        <p>+</p>
                     </c>
                     <c ca="center">
                        <p>732</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>two component transcriptional regulator, LuxR family [<it>S. medicae </it>WSM419]</p>
                     </c>
                     <c ca="center">
                        <p>2e-68</p>
                     </c>
                     <c ca="center">
                        <p>69</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VBISMc0095</p>
                     </c>
                     <c ca="center">
                        <p>C</p>
                     </c>
                     <c ca="center">
                        <p>83906</p>
                     </c>
                     <c ca="center">
                        <p>84025</p>
                     </c>
                     <c ca="center">
                        <p>+</p>
                     </c>
                     <c ca="center">
                        <p>120</p>
                     </c>
                     <c ca="center">
                        <p>tRNA-Ala</p>
                     </c>
                     <c ca="center">
                        <p>23S rRNA</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VBISMc0802</p>
                     </c>
                     <c ca="center">
                        <p>C</p>
                     </c>
                     <c ca="center">
                        <p>662922</p>
                     </c>
                     <c ca="center">
                        <p>663287</p>
                     </c>
                     <c ca="center">
                        <p>+</p>
                     </c>
                     <c ca="center">
                        <p>366</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VBISMc1000</p>
                     </c>
                     <c ca="center">
                        <p>C</p>
                     </c>
                     <c ca="center">
                        <p>815598</p>
                     </c>
                     <c ca="center">
                        <p>816023</p>
                     </c>
                     <c ca="center">
                        <p>+</p>
                     </c>
                     <c ca="center">
                        <p>426</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>hypothetical protein Smed_0338 [<it>S. medicae </it>WSM419]</p>
                     </c>
                     <c ca="center">
                        <p>6e-20</p>
                     </c>
                     <c ca="center">
                        <p>36</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VBISMc1221</p>
                     </c>
                     <c ca="center">
                        <p>C</p>
                     </c>
                     <c ca="center">
                        <p>997383</p>
                     </c>
                     <c ca="center">
                        <p>997565</p>
                     </c>
                     <c ca="center">
                        <p>+</p>
                     </c>
                     <c ca="center">
                        <p>183</p>
                     </c>
                     <c ca="center">
                        <p>ctaE</p>
                     </c>
                     <c ca="center">
                        <p>SMc00014</p>
                     </c>
                     <c ca="center">
                        <p>hypothetical protein Smed_0524 [<it>S. medicae </it>WSM419]</p>
                     </c>
                     <c ca="center">
                        <p>1e-15</p>
                     </c>
                     <c ca="center">
                        <p>63</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VBISMc1492</p>
                     </c>
                     <c ca="center">
                        <p>C</p>
                     </c>
                     <c ca="center">
                        <p>1208848</p>
                     </c>
                     <c ca="center">
                        <p>1209090</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>243</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VBISMc1793</p>
                     </c>
                     <c ca="center">
                        <p>C</p>
                     </c>
                     <c ca="center">
                        <p>1444596</p>
                     </c>
                     <c ca="center">
                        <p>1444832</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>237</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>rne</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VBISMc2171</p>
                     </c>
                     <c ca="center">
                        <p>C</p>
                     </c>
                     <c ca="center">
                        <p>1725819</p>
                     </c>
                     <c ca="center">
                        <p>1726607</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>789</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>SMc01204</p>
                     </c>
                     <c ca="center">
                        <p>hypothetical protein Smed_1270 [<it>S. medicae </it>WSM419]</p>
                     </c>
                     <c ca="center">
                        <p>8e-124</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VBISMc2174</p>
                     </c>
                     <c ca="center">
                        <p>C</p>
                     </c>
                     <c ca="center">
                        <p>1728081</p>
                     </c>
                     <c ca="center">
                        <p>1728221</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>141</p>
                     </c>
                     <c ca="center">
                        <p>SMc01200</p>
                     </c>
                     <c ca="center">
                        <p>SMc01202</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VBISMc2596</p>
                     </c>
                     <c ca="center">
                        <p>C</p>
                     </c>
                     <c ca="center">
                        <p>2060940</p>
                     </c>
                     <c ca="center">
                        <p>2061143</p>
                     </c>
                     <c ca="center">
                        <p>+</p>
                     </c>
                     <c ca="center">
                        <p>204</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>csp4</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VBISMc2940</p>
                     </c>
                     <c ca="center">
                        <p>C</p>
                     </c>
                     <c ca="center">
                        <p>2312866</p>
                     </c>
                     <c ca="center">
                        <p>2313225</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>360</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>conserved hypothetical signal peptide protein [<it>S. medicae </it>WSM419]</p>
                     </c>
                     <c ca="center">
                        <p>2e-54</p>
                     </c>
                     <c ca="center">
                        <p>89</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VBISMc2955</p>
                     </c>
                     <c ca="center">
                        <p>C</p>
                     </c>
                     <c ca="center">
                        <p>2321074</p>
                     </c>
                     <c ca="center">
                        <p>2321289</p>
                     </c>
                     <c ca="center">
                        <p>+</p>
                     </c>
                     <c ca="center">
                        <p>216</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VBISMc3188</p>
                     </c>
                     <c ca="center">
                        <p>C</p>
                     </c>
                     <c ca="center">
                        <p>2520862</p>
                     </c>
                     <c ca="center">
                        <p>2520993</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>132</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VBISMc3282</p>
                     </c>
                     <c ca="center">
                        <p>C</p>
                     </c>
                     <c ca="center">
                        <p>2595016</p>
                     </c>
                     <c ca="center">
                        <p>2595537</p>
                     </c>
                     <c ca="center">
                        <p>+</p>
                     </c>
                     <c ca="center">
                        <p>522</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>SMc01535</p>
                     </c>
                     <c ca="center">
                        <p>Nodulation protein nolR</p>
                     </c>
                     <c ca="center">
                        <p>9e-50</p>
                     </c>
                     <c ca="center">
                        <p>59</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VBISMc4046</p>
                     </c>
                     <c ca="center">
                        <p>C</p>
                     </c>
                     <c ca="center">
                        <p>3233965</p>
                     </c>
                     <c ca="center">
                        <p>3234153</p>
                     </c>
                     <c ca="center">
                        <p>+</p>
                     </c>
                     <c ca="center">
                        <p>189</p>
                     </c>
                     <c ca="center">
                        <p>SMc03108</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>hypothetical protein pRL110117 [<it>R. leguminosarum </it>bv. viciae 3841]</p>
                     </c>
                     <c ca="center">
                        <p>1e-5</p>
                     </c>
                     <c ca="center">
                        <p>56</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>VBISMc4289</p>
                     </c>
                     <c ca="center">
                        <p>C</p>
                     </c>
                     <c ca="center">
                        <p>3417257</p>
                     </c>
                     <c ca="center">
                        <p>3419215</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>1959</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>-</p>
                     </c>
                     <c ca="center">
                        <p>hypothetical protein Cvib_0070 [<it>P. vibrioformis </it>DSM 265]</p>
                     </c>
                     <c ca="center">
                        <p>4e-35</p>
                     </c>
                     <c ca="center">
                        <p>47</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>*: replicon A: pSymA; B: pSymB; C: Chromosome.</p>
               </tblfn>
            </tbl>
            <suppl id="S2">
               <title>
                  <p>Additional file 2</p>
               </title>
               <text>
                  <p>Supplementary Table S2. Primers</p>
               </text>
               <file name="1471-2180-8-72-S2.xls">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
         <sec>
            <st>
               <p>Functional annotation of the new genes</p>
            </st>
            <p>The sequences of putative new genes were searched against the NR database from NCBI and SwissProt from EBI <abbrgrp><abbr bid="B28">28</abbr></abbrgrp> using BLASTX <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> and Smith-Waterman programs (<abbrgrp><abbr bid="B29">29</abbr></abbrgrp>; Table <tblr tid="T2">2</tblr>). Both programs produced very similar results: 10 of the 20 new genes had no significant hits, and the other 10 had either a full-length or a partial match with proteins in the NR database (Table <tblr tid="T2">2</tblr>). The genes with no significant hits were relatively short, with lengths ranging from 120 to 366 nt.</p>
            <p>Four predicted genes showed significant matches with genes with known or putative functions (Table <tblr tid="T2">2</tblr>). VBISMc2940 had 89% similarity to a conserved hypothetical signal peptide protein from <it>S. medicae </it>WSM419. VBISMb0839 had 69% similarity to a two-component transcriptional regulator of the LuxR family from <it>S. medicae </it>WSM419. VBISMa1337 partially matched a putative dioxygenase with 25% similarity. VBISMc3282 matched the nodulation protein NolR with 59% similarity. The NolR protein is a transcriptional regulator of the common nodulation genes as well as the three nodD copies present in <it>S. meliloti</it>. Previous studies have shown that the <it>nolR </it>gene in <it>S. meliloti </it>strain 1021 has a single insertion in the C-terminal coding sequence that abolishes the DNA-binding ability of the NolR protein <abbrgrp><abbr bid="B30">30</abbr><abbr bid="B31">31</abbr></abbrgrp>. Thus, Rm1021 has no NolR activity. NolR<sup>- </sup>strains nodulate host plants less efficiently than NolR<sup>+ </sup>strains. In the RefSeq database, VBISMc3282 was not previously annotated as a gene although it is a mutant form of the <it>nolR </it>gene. Here, we show that this mutant gene is expressed.</p>
            <p>No neighboring genes were found to be co-transcribed with VBISMc2940, VBISMb0839 or VBISMa1337, while VBISMc3282 was co-transcribed with the downstream gene SMc01535, which encodes a hypothetical protein. The other six genes with BLAST hits matched only hypothetical proteins (Table <tblr tid="T2">2</tblr>).</p>
         </sec>
         <sec>
            <st>
               <p>Gene expression levels</p>
            </st>
            <p>Because the RNA amplification step was linear (Methods), we expect that the cDNA samples we prepared represent relative mRNA levels in the cell. Thus, for a full sequencing run with high coverage, the number of sequences per gene would be a good indication of gene expression levels. Because of the low coverage of our pilot experiment, we cannot yet estimate gene expression levels from the number of sequences for each gene. However, we expect that most of the genes that showed five or more matches to our transcriptome sequences are highly expressed in the cell population (Additional file <supplr sid="S3">3</supplr>). The known genes with high sequence copy number are consistent with what is known about the high expression level of those genes under the same growth conditions (our unpublished microarray data).</p>
            <suppl id="S3">
               <title>
                  <p>Additional file 3</p>
               </title>
               <text>
                  <p>Supplementary Table S3. Genes with high copy number of GS FLX sequences</p>
               </text>
               <file name="1471-2180-8-72-S3.xls">
                  <p>Click here for file</p>
               </file>
            </suppl>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>Our study demonstrated that many genes were missed in the initial genome annotation and that large-scale transcriptome analysis is useful for revealing these genes and validating their status. Our results showed that sequencing bacterial transcriptomes using the GS FLX system is feasible and helps to discover new genes and improve the genome annotation. A full GS FLX sequencing run can produce an average yield of more than 400,000 reads, which is 20-fold greater than the yield from our titration run for this study. Even if rRNA makes up 90% of the sample, there will still be more than 40,000 reads from non-rRNA transcripts. This provides an average 6X coverage of non-rRNA genes. Our pilot experiment with only 1854 reads already identified 20 new genes. With a full sequencing run, which produces more than 20-fold more reads than the titration run, we expect to discover many more new gene transcripts that have been previously missed. However, a full sequencing run with 6X coverage of non-rRNA genes will still not be sufficient to discover all possible new genes expressed, especially those with low expression levels, and the conditions under which some genes are expressed may not be known or covered by any particular set of experiments. According to previous microarray studies, about 70-80% of annotated genes are expressed under the same growth conditions as used in this study (<abbrgrp><abbr bid="B6">6</abbr></abbrgrp> and our unpublished data). Nevertheless, our study suggests two ways to improve the results: the first is to remove rRNA more effectively or use a normalized cDNA library; the second is to employ "deep" sequencing techniques, either by performing multiple GS FLX runs or by using Illumina <abbrgrp><abbr bid="B32">32</abbr></abbrgrp> or ABI <abbrgrp><abbr bid="B33">33</abbr></abbrgrp> methods, which produce millions of reads but of shorter average length.</p>
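         <p>The read-count projection in this section can be reproduced with a few lines of arithmetic. The following is only a sketch: the variable names are illustrative, the 400,000-read yield and the 90% rRNA share are the round figures quoted in the text, and 19,005 is the number of high quality reads reported for the titration run in Methods. The 6X coverage estimate is not recomputed, because the read length and target size it depends on are not stated here.</p>

```python
# Figures quoted in the text; names are illustrative, not from the authors' pipeline.
full_run_reads = 400_000      # average yield of a full GS FLX run (lower bound)
titration_reads = 19_005      # high quality reads from the titration run (Methods)
rrna_percent = 90             # assumed share of rRNA reads in the sample

# Fold increase of a full run over the titration run.
fold_increase = full_run_reads / titration_reads

# Reads left for non-rRNA transcripts; integer arithmetic avoids rounding noise.
non_rrna_reads = full_run_reads * (100 - rrna_percent) // 100

print(f"fold increase over titration run: {fold_increase:.1f}")
print(f"expected non-rRNA reads: {non_rrna_reads}")
```

         <p>Under these assumptions the fold increase comes out slightly above 20, in line with the "more than 20-fold" statement, and the non-rRNA read count matches the 40,000 lower bound given in the text.</p>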
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>Our study indicated that there are still many genes missed in the initial genome annotation of <it>S. meliloti</it>. High-throughput sequence analysis of bacterial transcriptomes is feasible for the identification of new genes. Next-generation sequencing technologies will greatly facilitate the gene discovery process and improve genome annotation.</p>
      </sec>
      <sec>
         <st>
            <p>Methods</p>
         </st>
         <sec>
            <st>
               <p>Cell culture and RNA isolation</p>
            </st>
            <p><it>Sinorhizobium meliloti </it>strain 1021 was grown at 30&#176;C in TY medium <abbrgrp><abbr bid="B34">34</abbr></abbrgrp> to mid-exponential phase (OD<sub>600 </sub>= 0.6). Cell growth was stopped by adding 1/9<sup>th </sup>volume of stop solution (5% buffer-equilibrated phenol, pH 7.4, in ethanol), and the cultures were placed on ice. Cells were collected by centrifugation in a microcentrifuge at maximum speed for 3 minutes. The cell pellets were stored at -80&#176;C. Total RNA was isolated using the Qiagen RNeasy bacterial RNA purification kit (Qiagen, Valencia, CA). The total RNA was treated with DNase I on the RNeasy mini column before being eluted with RNase-free water. For RT-PCR experiments, an additional DNase I treatment was performed after the RNA was eluted from the RNeasy mini column to ensure that there was no genomic DNA contamination. 20 &#956;l of total RNA eluted from the RNeasy mini column was treated with 5 &#956;l DNase I (Qiagen) in 10 &#956;l RDD buffer, 1 &#956;l RNase inhibitor (Invitrogen, Carlsbad, CA) and 64 &#956;l RNase-free water (Qiagen) at 25&#176;C for 30 minutes. The RNA was then extracted with phenol/chloroform and precipitated with ethanol using standard protocols. 16S and 23S rRNAs were then depleted using the MICROBExpress&#8482; Bacterial mRNA Enrichment Kit (Ambion, Austin, TX). Total RNAs and rRNA-depleted RNAs were quantified and analyzed on the Agilent 2100 Bioanalyzer. 7 &#956;g of total RNA was used per reaction. After the 16S and 23S rRNAs were depleted, about 0.5 &#956;g (7%) of the RNA was recovered. The total RNA samples had an RNA integrity number (RIN value) of 8.0 or better. As shown in Figure <figr fid="F2">2</figr>, more than 90% of the 16S and 23S rRNAs were depleted. We analyzed more than 10 independent rRNA-depleted preparations on the Agilent 2100 Bioanalyzer. The results were consistent and showed that the 16S and 23S rRNA peaks were greatly reduced in these preparations but not completely removed (Figure <figr fid="F2">2</figr>). In addition, two small peaks immediately before the 16S and 23S rRNAs could not be removed by the Ambion MICROBExpress kit. The two peaks were consistently present in all of our RNA preparations.</p>
            <p>Two RNA samples were prepared for RNA amplification. Sample 1 was the 16S and 23S rRNA-depleted RNA sample described above. Sample 2 was the 16S and 23S rRNA-depleted RNA sample ligated to a 3' RNA adaptor (5'-PO<sub>4</sub>-UUCGCUGUUC UUAGCGGCCG CAUGCUC-idT-3'; idT: 3' inverted deoxythymidine) (Dharmacon Research, Lafayette, CO) and a 5' RNA adaptor (5'-OH-AUGUGCGCGA CUUCCUGUAG ACGGAACGCU AGAAGAAA-OH-3') (Dharmacon Research). The 3' and 5' adaptor ligations were done as described in Argaman <it>et al</it>. 2001 <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>RNA amplification and cDNA preparation</p>
            </st>
            <p>To obtain enough cDNA for sequencing, the 16S and 23S rRNA-depleted RNAs (samples 1 and 2) were amplified using the Nugen WT-Ovation Pico RNA amplification system <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>. 5 ng of starting RNA was used. The SPIA&#8482; amplified single-stranded cDNA (2.5 &#956;g) was then taken through second-strand cDNA synthesis using the following conditions: 5X 2nd strand reaction mix 30 &#956;l (Invitrogen), dNTP, 10 mM 3 &#956;l (Invitrogen), <it>E. coli </it>DNA ligase 1 &#956;l (Invitrogen), <it>E. coli </it>DNA polymerase I 4 &#956;l (Invitrogen), RNase H 1 &#956;l (Invitrogen), RNase-free water 91 &#956;l (Ambion). The reaction mix was incubated at 16&#176;C for 2 hours. The cDNA was then purified using the Qiagen PCR clean up kit, resulting in 4 &#956;g of cDNA as quantified with the Nanodrop spectrophotometer. 1 &#956;g of cDNA from each sample was size selected (>100 bp) following the recommendations of Roche's GS FLX Library Preparation Guide (no nebulization was necessary because of the size range of the cDNA), and a single-stranded library was created.</p>
         </sec>
         <sec>
            <st>
               <p>GS FLX sequencing and data filtering</p>
            </st>
            <p>The DNA sequencing libraries for the two samples were combined with the sequencing beads at 4 different concentrations to determine the optimal conditions for emPCR amplification. All 8 preparations were sequenced in 8 lanes of a GS FLX sequencing plate using the standard Roche/454 protocols. Sequencing data were obtained after a 7 hour run on the GS FLX. The 54,162 raw reads from the GS FLX sequencing run that passed the sample key code filter (initial bases TCAG) were further filtered by the 454 software to eliminate 10,521 mixed reads (with two or more different DNA strands/bead), 15,604 excessively short reads (less than about 50 bp), and 9,032 interrupted reads ("dots"). In this titration run, 35% of the raw reads passed all filters, providing the 19,005 high quality reads used in this study.</p>
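            <p>The filtering tally in this section can be checked with a short calculation. This is a minimal sketch that uses only the read counts quoted in the text; the category labels and variable names are illustrative and do not correspond to the output of the 454 software.</p>

```python
# Read counts as reported in the text; the dictionary keys are illustrative labels.
raw_reads = 54162
removed = {
    "mixed": 10521,        # two or more different DNA strands per bead
    "too_short": 15604,    # less than about 50 bp
    "interrupted": 9032,   # "dots"
}

# Reads that survive all three filters, and their share of the raw reads.
passed = raw_reads - sum(removed.values())
pass_rate = passed / raw_reads

print(f"high quality reads: {passed} ({pass_rate:.0%} of raw reads)")
for label, count in removed.items():
    print(f"removed as {label}: {count} ({count / raw_reads:.0%})")
```

            <p>The three removed categories and the remaining reads add up to the 54,162 raw reads, and the surviving fraction rounds to the 35% stated in the text.</p>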
         </sec>
         <sec>
            <st>
               <p>Sequence analysis and mapping to <it>S. meliloti </it>genome</p>
            </st>
            <p>GS FLX sequences that passed the filtering criteria were aligned to the <it>S. meliloti </it>genome using BLASTN. Sequences matching rRNA operons were filtered out. The remaining sequences were searched with BLAST against RefSeq genes and the new genes predicted by our gene annotation pipeline.</p>
         </sec>
         <sec>
            <st>
               <p>RT-PCR</p>
            </st>
            <p>5 &#956;g of DNase I-treated total RNA was reverse transcribed using SuperScript II with 4 pmoles of equally mixed gene-specific primers for each candidate gene selected (cm012b-cm0031b, Table S1). Primers were designed using Primer3 <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. For PCR, each 40 &#956;l reaction included 0.5 &#956;l of the 40 &#956;l reverse transcription reaction, 20 &#956;l of 2X GoTaq Green Master Mix (Promega, Madison, WI) and 0.5 &#956;M of the primer pair for each gene (IDT, Coralville, IA). PCR conditions were 95&#176;C for 2 min; 30 cycles of 95&#176;C for 45 s, 52&#176;C for 45 s and 72&#176;C for 60 s; and a final step of 72&#176;C for 10 min. PCR products were examined by electrophoresis in a 2.5% agarose/TAE/EtBr gel. Sequencing was performed using the BigDye Terminator Cycle Sequencing Kit (Applied Biosystems) and analyzed on an Applied Biosystems model 3730 automated capillary DNA sequencer.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>CM designed and executed the experiments, performed data analysis and drafted the manuscript. CE developed the RNA amplification method and performed RNA amplification and cDNA sample preparation for GS FLX sequencing. RVJ helped with data analysis. CE and RVJ wrote portions of the Methods. BWSS and RVJ critically edited and revised the manuscript. BWSS provided funding, coordination and oversight of the project. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We thank Timothy Driscoll for helpful discussions, Jessica Kraszewski for assistance in the GS FLX runs and Chunxia Wang for providing microarray data. This work was funded by the Commonwealth Research Initiative (CRI) from the Commonwealth of Virginia and by the Virginia Bioinformatics Institute.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Nucleotide sequence and predicted functions of the entire Sinorhizobium meliloti pSymA megaplasmid</p>
            </title>
            <aug>
               <au>
                  <snm>Barnett</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Fisher</snm>
                  <fnm>RF</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Komp</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Abola</snm>
                  <fnm>AP</fnm>
               </au>
               <au>
                  <snm>Barloy-Hubler</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Bowser</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Capela</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Galibert</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Gouzy</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Gurjal</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hong</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Huizar</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Hyman</snm>
                  <fnm>RW</fnm>
               </au>
               <au>
                  <snm>Kahn</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kahn</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Kalman</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Keating</snm>
                  <fnm>DH</fnm>
               </au>
               <au>
                  <snm>Palm</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Peck</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Surzycki</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Wells</snm>
                  <fnm>DH</fnm>
               </au>
               <au>
                  <snm>Yeh</snm>
                  <fnm>KC</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>RW</fnm>
               </au>
               <au>
                  <snm>Federspiel</snm>
                  <fnm>NA</fnm>
               </au>
               <au>
                  <snm>Long</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <issue>17</issue>
            <fpage>9883</fpage>
            <lpage>9888</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">55547</pubid>
                  <pubid idtype="pmpid" link="fulltext">11481432</pubid>
                  <pubid idtype="doi">10.1073/pnas.161294798</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Analysis of the chromosome sequence of the legume symbiont <it>Sinorhizobium meliloti</it> strain 1021</p>
            </title>
            <aug>
               <au>
                  <snm>Capela</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Barloy-Hubler</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Gouzy</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Bothe</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Ampe</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Batut</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Boistard</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Becker</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Boutry</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Cadieu</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Dreano</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gloux</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Godrie</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Goffeau</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kahn</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kiss</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Lelaure</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Masuy</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Pohl</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Portetelle</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Puhler</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Purnelle</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Ramsperger</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Renard</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Thebault</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Vandenbol</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Weidner</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Galibert</snm>
                  <fnm>F</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <issue>17</issue>
            <fpage>9877</fpage>
            <lpage>9882</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">55546</pubid>
                  <pubid idtype="pmpid" link="fulltext">11481430</pubid>
                  <pubid idtype="doi">10.1073/pnas.161294398</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>The complete sequence of the 1,683-kb pSymB megaplasmid from the N2-fixing endosymbiont <it>Sinorhizobium meliloti</it></p>
            </title>
            <aug>
               <au>
                  <snm>Finan</snm>
                  <fnm>TM</fnm>
               </au>
               <au>
                  <snm>Weidner</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wong</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Buhrmester</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Chain</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Vorholter</snm>
                  <fnm>FJ</fnm>
               </au>
               <au>
                  <snm>Hernandez-Lucas</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Becker</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Cowie</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Gouzy</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Golding</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Puhler</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2001</pubdate>
            <volume>98</volume>
            <issue>17</issue>
            <fpage>9889</fpage>
            <lpage>9894</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">55548</pubid>
                  <pubid idtype="pmpid" link="fulltext">11481431</pubid>
                  <pubid idtype="doi">10.1073/pnas.161294698</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>The composite genome of the legume symbiont <it>Sinorhizobium meliloti</it></p>
            </title>
            <aug>
               <au>
                  <snm>Galibert</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Finan</snm>
                  <fnm>TM</fnm>
               </au>
               <au>
                  <snm>Long</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Puhler</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Abola</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Ampe</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Barloy-Hubler</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Barnett</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Becker</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Boistard</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bothe</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Boutry</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bowser</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Buhrmester</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Cadieu</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Capela</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Chain</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Cowie</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>RW</fnm>
               </au>
               <au>
                  <snm>Dreano</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Federspiel</snm>
                  <fnm>NA</fnm>
               </au>
               <au>
                  <snm>Fisher</snm>
                  <fnm>RF</fnm>
               </au>
               <au>
                  <snm>Gloux</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Godrie</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Goffeau</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Golding</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Gouzy</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Gurjal</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Batut</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2001</pubdate>
            <volume>293</volume>
            <issue>5530</issue>
            <fpage>668</fpage>
            <lpage>672</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1060966</pubid>
                  <pubid idtype="pmpid" link="fulltext">11474104</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins</p>
            </title>
            <aug>
               <au>
                  <snm>Pruitt</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Tatusova</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Maglott</snm>
                  <fnm>DR</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2007</pubdate>
            <volume>35</volume>
            <issue>Database issue</issue>
            <fpage>D61</fpage>
            <lpage>D65</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1716718</pubid>
                  <pubid idtype="pmpid" link="fulltext">17130148</pubid>
                  <pubid idtype="doi">10.1093/nar/gkl842</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>A dual-genome Symbiosis Chip for coordinate study of signal exchange and development in a prokaryote-host interaction</p>
            </title>
            <aug>
               <au>
                  <snm>Barnett</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Toman</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Fisher</snm>
                  <fnm>RF</fnm>
               </au>
               <au>
                  <snm>Long</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2004</pubdate>
            <volume>101</volume>
            <issue>47</issue>
            <fpage>16636</fpage>
            <lpage>16641</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">527922</pubid>
                  <pubid idtype="pmpid" link="fulltext">15542588</pubid>
                  <pubid idtype="doi">10.1073/pnas.0407269101</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Construction and validation of a <it>Sinorhizobium meliloti</it> whole genome DNA microarray: genome-wide profiling of osmoadaptive gene expression</p>
            </title>
            <aug>
               <au>
                  <snm>Ruberg</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Tian</snm>
                  <fnm>ZX</fnm>
               </au>
               <au>
                  <snm>Krol</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Linke</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Meyer</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Puhler</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Weidner</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Becker</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>J Biotechnol</source>
            <pubdate>2003</pubdate>
            <volume>106</volume>
            <issue>2-3</issue>
            <fpage>255</fpage>
            <lpage>268</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.jbiotec.2003.08.005</pubid>
                  <pubid idtype="pmpid" link="fulltext">14651866</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Genome sequencing in microfabricated high-density picolitre reactors</p>
            </title>
            <aug>
               <au>
                  <snm>Margulies</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Egholm</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Altman</snm>
                  <fnm>WE</fnm>
               </au>
               <au>
                  <snm>Attiya</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bader</snm>
                  <fnm>JS</fnm>
               </au>
               <au>
                  <snm>Bemben</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Berka</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Braverman</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>YJ</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Dewell</snm>
                  <fnm>SB</fnm>
               </au>
               <au>
                  <snm>Du</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Fierro</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Gomes</snm>
                  <fnm>XV</fnm>
               </au>
               <au>
                  <snm>Godwin</snm>
                  <fnm>BC</fnm>
               </au>
               <au>
                  <snm>Begley</snm>
                  <fnm>RF</fnm>
               </au>
               <au>
                  <snm>Rothberg</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2005</pubdate>
            <volume>437</volume>
            <issue>7057</issue>
            <fpage>376</fpage>
            <lpage>380</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1464427</pubid>
                  <pubid idtype="pmpid" link="fulltext">16056220</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Analysis of one million base pairs of Neanderthal DNA</p>
            </title>
            <aug>
               <au>
                  <snm>Green</snm>
                  <fnm>RE</fnm>
               </au>
               <au>
                  <snm>Krause</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Ptak</snm>
                  <fnm>SE</fnm>
               </au>
               <au>
                  <snm>Briggs</snm>
                  <fnm>AW</fnm>
               </au>
               <au>
                  <snm>Ronan</snm>
                  <fnm>MT</fnm>
               </au>
               <au>
                  <snm>Simons</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Du</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Egholm</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rothberg</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Paunovic</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Paabo</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2006</pubdate>
            <volume>444</volume>
            <issue>7117</issue>
            <fpage>330</fpage>
            <lpage>336</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nature05336</pubid>
                  <pubid idtype="pmpid" link="fulltext">17108958</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>The complete genome sequence of <it>Campylobacter jejuni</it> strain 81116 (NCTC11828)</p>
            </title>
            <aug>
               <au>
                  <snm>Pearson</snm>
                  <fnm>BM</fnm>
               </au>
               <au>
                  <snm>Gaskin</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Segers</snm>
                  <fnm>RP</fnm>
               </au>
               <au>
                  <snm>Wells</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Nuijten</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>van Vliet</snm>
                  <fnm>AH</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>2007</pubdate>
            <volume>189</volume>
            <issue>22</issue>
            <fpage>8402</fpage>
            <lpage>8403</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">2168669</pubid>
                  <pubid idtype="pmpid" link="fulltext">17873037</pubid>
                  <pubid idtype="doi">10.1128/JB.01404-07</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Metagenomics to paleogenomics: large-scale sequencing of mammoth DNA</p>
            </title>
            <aug>
               <au>
                  <snm>Poinar</snm>
                  <fnm>HN</fnm>
               </au>
               <au>
                  <snm>Schwarz</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Qi</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Shapiro</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Macphee</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Buigues</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Tikhonov</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Huson</snm>
                  <fnm>DH</fnm>
               </au>
               <au>
                  <snm>Tomsho</snm>
                  <fnm>LP</fnm>
               </au>
               <au>
                  <snm>Auch</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Rampp</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Schuster</snm>
                  <fnm>SC</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2006</pubdate>
            <volume>311</volume>
            <issue>5759</issue>
            <fpage>392</fpage>
            <lpage>394</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1123360</pubid>
                  <pubid idtype="pmpid" link="fulltext">16368896</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Analysis of the prostate cancer cell line LNCaP transcriptome using a sequencing-by-synthesis approach</p>
            </title>
            <aug>
               <au>
                  <snm>Bainbridge</snm>
                  <fnm>MN</fnm>
               </au>
               <au>
                  <snm>Warren</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Hirst</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Romanuik</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Zeng</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Go</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Delaney</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Griffith</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hickenbotham</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Magrini</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Mardis</snm>
                  <fnm>ER</fnm>
               </au>
               <au>
                  <snm>Sadar</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Siddiqui</snm>
                  <fnm>AS</fnm>
               </au>
               <au>
                  <snm>Marra</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Jones</snm>
                  <fnm>SJ</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2006</pubdate>
            <volume>7</volume>
            <fpage>246</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1592491</pubid>
                  <pubid idtype="pmpid" link="fulltext">17010196</pubid>
                  <pubid idtype="doi">10.1186/1471-2164-7-246</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Diversity of microRNAs in human and chimpanzee brain</p>
            </title>
            <aug>
               <au>
                  <snm>Berezikov</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Thuemmler</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>van Laake</snm>
                  <fnm>LW</fnm>
               </au>
               <au>
                  <snm>Kondova</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Bontrop</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Cuppen</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Plasterk</snm>
                  <fnm>RH</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2006</pubdate>
            <volume>38</volume>
            <issue>12</issue>
            <fpage>1375</fpage>
            <lpage>1377</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng1914</pubid>
                  <pubid idtype="pmpid" link="fulltext">17072315</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Gene discovery and annotation using LCM-454 transcriptome sequencing</p>
            </title>
            <aug>
               <au>
                  <snm>Emrich</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Barbazuk</snm>
                  <fnm>WB</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Schnable</snm>
                  <fnm>PS</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2007</pubdate>
            <volume>17</volume>
            <issue>1</issue>
            <fpage>69</fpage>
            <lpage>73</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1716268</pubid>
                  <pubid idtype="pmpid" link="fulltext">17095711</pubid>
                  <pubid idtype="doi">10.1101/gr.5145806</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Robust analysis of 5'-transcript ends (5'-RATE): a novel technique for transcriptome analysis and genome annotation</p>
            </title>
            <aug>
               <au>
                  <snm>Gowda</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Alessi</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Pratt</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>GL</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <issue>19</issue>
            <fpage>e126</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1636456</pubid>
                  <pubid idtype="pmpid" link="fulltext">17012272</pubid>
                  <pubid idtype="doi">10.1093/nar/gkl522</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Finding novel genes in bacterial communities isolated from the environment</p>
            </title>
            <aug>
               <au>
                  <snm>Krause</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Diaz</snm>
                  <fnm>NN</fnm>
               </au>
               <au>
                  <snm>Bartels</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Edwards</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Puhler</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Rohwer</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Meyer</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Stoye</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2006</pubdate>
            <volume>22</volume>
            <issue>14</issue>
            <fpage>e281</fpage>
            <lpage>e289</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btl247</pubid>
                  <pubid idtype="pmpid" link="fulltext">16873483</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Microbial diversity in the deep sea and the underexplored "rare biosphere"</p>
            </title>
            <aug>
               <au>
                  <snm>Sogin</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Morrison</snm>
                  <fnm>HG</fnm>
               </au>
               <au>
                  <snm>Huber</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Mark Welch</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Huse</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Neal</snm>
                  <fnm>PR</fnm>
               </au>
               <au>
                  <snm>Arrieta</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Herndl</snm>
                  <fnm>GJ</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2006</pubdate>
            <volume>103</volume>
            <issue>32</issue>
            <fpage>12115</fpage>
            <lpage>12120</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1524930</pubid>
                  <pubid idtype="pmpid" link="fulltext">16880384</pubid>
                  <pubid idtype="doi">10.1073/pnas.0605127103</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Ultradeep bisulfite sequencing analysis of DNA methylation patterns in multiple gene promoters by 454 sequencing</p>
            </title>
            <aug>
               <au>
                  <snm>Taylor</snm>
                  <fnm>KH</fnm>
               </au>
               <au>
                  <snm>Kramer</snm>
                  <fnm>RS</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Guo</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Duff</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Xu</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Caldwell</snm>
                  <fnm>CW</fnm>
               </au>
               <au>
                  <snm>Shi</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Cancer Res</source>
            <pubdate>2007</pubdate>
            <volume>67</volume>
            <issue>18</issue>
            <fpage>8511</fpage>
            <lpage>8518</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1158/0008-5472.CAN-07-1016</pubid>
                  <pubid idtype="pmpid" link="fulltext">17875690</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>PATRIC: the VBI PathoSystems Resource Integration Center</p>
            </title>
            <aug>
               <au>
                  <snm>Snyder</snm>
                  <fnm>EE</fnm>
               </au>
               <au>
                  <snm>Kampanya</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Lu</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Nordberg</snm>
                  <fnm>EK</fnm>
               </au>
               <au>
                  <snm>Karur</snm>
                  <fnm>HR</fnm>
               </au>
               <au>
                  <snm>Shukla</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Soneja</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Tian</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Xue</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Yoo</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Dharmanolla</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Dongre</snm>
                  <fnm>NV</fnm>
               </au>
               <au>
                  <snm>Gillespie</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Hamelius</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hance</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Huntington</snm>
                  <fnm>KI</fnm>
               </au>
               <au>
                  <snm>Jukneliene</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Setubal</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Sobral</snm>
                  <fnm>BW</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2007</pubdate>
            <volume>35</volume>
            <issue>Database issue</issue>
            <fpage>D401</fpage>
            <lpage>D406</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1669763</pubid>
                  <pubid idtype="pmpid" link="fulltext">17142235</pubid>
                  <pubid idtype="doi">10.1093/nar/gkl858</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Improved microbial gene identification with GLIMMER</p>
            </title>
            <aug>
               <au>
                  <snm>Delcher</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Harmon</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kasif</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1999</pubdate>
            <volume>27</volume>
            <issue>23</issue>
            <fpage>4636</fpage>
            <lpage>4641</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">148753</pubid>
                  <pubid idtype="pmpid" link="fulltext">10556321</pubid>
                  <pubid idtype="doi">10.1093/nar/27.23.4636</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Recognition of genes in DNA sequence with ambiguities</p>
            </title>
            <aug>
               <au>
                  <snm>Borodovsky</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>McIninch</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Biosystems</source>
            <pubdate>1993</pubdate>
            <volume>30</volume>
            <issue>1-3</issue>
            <fpage>161</fpage>
            <lpage>171</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0303-2647(93)90068-N</pubid>
                  <pubid idtype="pmpid">8374073</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>GeneMark.hmm: new solutions for gene finding</p>
            </title>
            <aug>
               <au>
                  <snm>Lukashin</snm>
                  <fnm>AV</fnm>
               </au>
               <au>
                  <snm>Borodovsky</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1998</pubdate>
            <volume>26</volume>
            <issue>4</issue>
            <fpage>1107</fpage>
            <lpage>1115</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">147337</pubid>
                  <pubid idtype="pmpid" link="fulltext">9461475</pubid>
                  <pubid idtype="doi">10.1093/nar/26.4.1107</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>TICO: a tool for improving predictions of prokaryotic translation initiation sites</p>
            </title>
            <aug>
               <au>
                  <snm>Tech</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Pfeifer</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Morgenstern</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Meinicke</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <issue>17</issue>
            <fpage>3568</fpage>
            <lpage>3569</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti563</pubid>
                  <pubid idtype="pmpid" link="fulltext">15994191</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>A probabilistic method for identifying start codons in bacterial genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Suzek</snm>
                  <fnm>BE</fnm>
               </au>
               <au>
                  <snm>Ermolaeva</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Schreiber</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2001</pubdate>
            <volume>17</volume>
            <issue>12</issue>
            <fpage>1123</fpage>
            <lpage>1130</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/17.12.1123</pubid>
                  <pubid idtype="pmpid" link="fulltext">11751220</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Madden</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Schäffer</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <issue>17</issue>
            <fpage>3389</fpage>
            <lpage>3402</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146917</pubid>
                  <pubid idtype="pmpid" link="fulltext">9254694</pubid>
                  <pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>NCBI</p>
            </title>
            <url>http://www.ncbi.nlm.nih.gov</url>
         </bibl>
         <bibl id="B27">
            <title>
               <p>The generic genome browser: a building block for a model organism system database</p>
            </title>
            <aug>
               <au>
                  <snm>Stein</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Mungall</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Shu</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Caudy</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Mangone</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Day</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Nickerson</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Stajich</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Harris</snm>
                  <fnm>TW</fnm>
               </au>
               <au>
                  <snm>Arva</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Lewis</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <issue>10</issue>
            <fpage>1599</fpage>
            <lpage>1610</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">187535</pubid>
                  <pubid idtype="pmpid" link="fulltext">12368253</pubid>
                  <pubid idtype="doi">10.1101/gr.403602</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>SwissProt</p>
            </title>
            <url>http://www.ebi.ac.uk/swissprot</url>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Identification of common molecular subsequences</p>
            </title>
            <aug>
               <au>
                  <snm>Smith</snm>
                  <fnm>TF</fnm>
               </au>
               <au>
                  <snm>Waterman</snm>
                  <fnm>MS</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1981</pubdate>
            <volume>147</volume>
            <issue>1</issue>
            <fpage>195</fpage>
            <lpage>197</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0022-2836(81)90087-5</pubid>
                  <pubid idtype="pmpid" link="fulltext">7265238</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>An insertional point mutation inactivates NolR repressor in <it>Rhizobium meliloti</it> 1021</p>
            </title>
            <aug>
               <au>
                  <snm>Cren</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kondorosi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kondorosi</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>J Bacteriol</source>
            <pubdate>1994</pubdate>
            <volume>176</volume>
            <issue>2</issue>
            <fpage>518</fpage>
            <lpage>519</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">205077</pubid>
                  <pubid idtype="pmpid" link="fulltext">8288547</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Analysis of differences between <it>Sinorhizobium meliloti</it> 1021 and 2011 strains using the host calcium spiking response</p>
            </title>
            <aug>
               <au>
                  <snm>Wais</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Wells</snm>
                  <fnm>DH</fnm>
               </au>
               <au>
                  <snm>Long</snm>
                  <fnm>SR</fnm>
               </au>
            </aug>
            <source>Mol Plant Microbe Interact</source>
            <pubdate>2002</pubdate>
            <volume>15</volume>
            <issue>12</issue>
            <fpage>1245</fpage>
            <lpage>1252</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1094/MPMI.2002.15.12.1245</pubid>
                  <pubid idtype="pmpid" link="fulltext">12481997</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Illumina</p>
            </title>
            <url>http://www.illumina.com</url>
         </bibl>
         <bibl id="B33">
            <title>
               <p>ABI</p>
            </title>
            <url>http://www.appliedbiosystems.com</url>
         </bibl>
         <bibl id="B34">
            <title>
               <p>R factor transfer in <it>Rhizobium leguminosarum</it></p>
            </title>
            <aug>
               <au>
                  <snm>Beringer</snm>
                  <fnm>JE</fnm>
               </au>
            </aug>
            <source>J Gen Microbiol</source>
            <pubdate>1974</pubdate>
            <volume>84</volume>
            <issue>1</issue>
            <fpage>188</fpage>
            <lpage>198</lpage>
            <xrefbib>
               <pubid idtype="pmpid">4612098</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Novel small RNA-encoding genes in the intergenic regions of <it>Escherichia coli</it></p>
            </title>
            <aug>
               <au>
                  <snm>Argaman</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Hershberg</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Vogel</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Bejerano</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Wagner</snm>
                  <fnm>EG</fnm>
               </au>
               <au>
                  <snm>Margalit</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Altuvia</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Curr Biol</source>
            <pubdate>2001</pubdate>
            <volume>11</volume>
            <issue>12</issue>
            <fpage>941</fpage>
            <lpage>950</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0960-9822(01)00270-6</pubid>
                  <pubid idtype="pmpid" link="fulltext">11448770</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Nugen</p>
            </title>
            <url>http://www.nugeninc.com</url>
         </bibl>
         <bibl id="B37">
            <title>
               <p>Primer3 on the WWW for general users and for biologist programmers</p>
            </title>
            <aug>
               <au>
                  <snm>Rozen</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Skaletsky</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Methods Mol Biol</source>
            <pubdate>2000</pubdate>
            <volume>132</volume>
            <fpage>365</fpage>
            <lpage>386</lpage>
            <xrefbib>
               <pubid idtype="pmpid">10547847</pubid>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
