<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2007-8-2-r20</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Interrupted coding sequences in <it>Mycobacterium smegmatis</it>: authentic mutations or sequencing errors?</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Deshayes</snm>
               <fnm>Caroline</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>deshayes@necker.fr</email>
            </au>
            <au id="A2">
               <snm>Perrodou</snm>
               <fnm>Emmanuel</fnm>
               <insr iid="I3"/>
               <email>perrodou@titus.u-strasbg.fr</email>
            </au>
            <au id="A3">
               <snm>Gallien</snm>
               <fnm>Sebastien</fnm>
               <insr iid="I4"/>
               <email>sgallien@chimie.u-strasbg.fr</email>
            </au>
            <au id="A4">
               <snm>Euphrasie</snm>
               <fnm>Daniel</fnm>
               <insr iid="I1"/>
               <email>euphrasi@necker.fr</email>
            </au>
            <au id="A5">
               <snm>Schaeffer</snm>
               <fnm>Christine</fnm>
               <insr iid="I4"/>
               <email>cSchaeff@chimie.u-strasbg.fr</email>
            </au>
            <au id="A6">
               <snm>Van-Dorsselaer</snm>
               <fnm>Alain</fnm>
               <insr iid="I4"/>
               <email>vandors@chimie.u-strasbg.fr</email>
            </au>
            <au id="A7">
               <snm>Poch</snm>
               <fnm>Olivier</fnm>
               <insr iid="I3"/>
               <email>poch@titus.u-strasbg.fr</email>
            </au>
            <au id="A8">
               <snm>Lecompte</snm>
               <fnm>Odile</fnm>
               <insr iid="I3"/>
               <email>lecompte@titus.u-strasbg.fr</email>
            </au>
            <au id="A9" ca="yes">
               <snm>Reyrat</snm>
               <fnm>Jean-Marc</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>jmreyrat@necker.fr</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Universit&#233; Paris Descartes, Facult&#233; de M&#233;decine Ren&#233; Descartes, Paris Cedex 15, F-75730, France</p>
            </ins>
            <ins id="I2">
               <p>Inserm, U570, Unit&#233; de Pathog&#233;nie des Infections Syst&#233;miques-Groupe AVENIR, Paris Cedex 15, F-75730, France</p>
            </ins>
            <ins id="I3">
               <p>Laboratoire de Biologie et G&#233;nomique Structurales, IGBMC CNRS/INSERM/ULP, BP 163, 67404 Illkirch Cedex, France</p>
            </ins>
            <ins id="I4">
               <p>Laboratoire de Spectrom&#233;trie de Masse Bio-Organique, UMR7178, ECPM, rue Becquerel, Strasbourg, F-67087 cedex 2, France</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>2</issue>
         <fpage>R20</fpage>
         <url>http://genomebiology.com/2007/8/2/R20</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17295914</pubid>
               <pubid idtype="doi">10.1186/gb-2007-8-2-r20</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>7</day>
               <month>9</month>
               <year>2006</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>20</day>
               <month>11</month>
               <year>2006</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>12</day>
               <month>02</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>12</day>
               <month>02</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Deshayes et al.; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <shorttitle>
         <p>Interrupted coding sequences in <it>Mycobacterium smegmatis</it></p>
      </shorttitle>
      <shortabs>
         <p>Bioinformatic, proteomic and sequencing analyses reveal that a significant proportion of the interrupted coding sequences (ICDSs) in the deposited genome sequence of <it>Mycobacterium smegmatis </it>result from sequencing errors, raising the question of whether bacterial ICDSs should be individually verified to produce an informative genome sequence.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
                <p><it>In silico </it>analysis has shown that all bacterial genomes contain a low percentage of ORFs with undetected frameshifts and in-frame stop codons. These interrupted coding sequences (ICDSs) may be genuinely present in the organism or may result from misannotation based on sequencing errors. Whether these sequences are authentic has major implications for all subsequent functional characterization steps, including module prediction, comparative genomics and high-throughput proteomic projects.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>We show here, using <it>Mycobacterium smegmatis </it>as a model species, that a significant proportion of these ICDSs result from sequencing errors. We used a resequencing procedure and mass spectrometry analysis to determine the nature of a number of ICDSs in this organism. We found that 28 of the 73 ICDSs investigated correspond to sequencing errors.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>The correction of these errors results in modification of the predicted amino acid sequences of the corresponding proteins and changes in annotation. We suggest that each bacterial ICDS should be investigated individually, to determine its true status and to ensure that the genome sequence is appropriate for comparative genomics analyses.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010014">Microbiology and parasitology</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010008">Evolution</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>More than 250 complete bacterial genome sequences are now available, providing unprecedented opportunities for investigating gene and protein functions <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. The introduction of errors at the first stage of genome sequencing and gene prediction has a major impact on all subsequent studies. One source of errors in genome annotation is the sequence itself. The development of programs identifying position-specific errors has considerably increased the quality of genomic sequences <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. These errors may introduce stop codons or 'artificial' frameshifts in the coding region that are easily detected by computer-assisted methods <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. Such sequence errors lead to errors in annotation and comparison. An <it>in silico </it>survey of the published bacterial genomes shows that most contain interrupted coding sequences (ICDSs) <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. They occur at low frequency (between 2 and 258 per Mb), and this frequency is not correlated with the size or GC content of the genome. A mean of 74 ICDSs was identified per prokaryotic genome tested <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Expressed as a proportion of the total coding sequences, this corresponds to 1% to 5%, and similar figures have been reported by various independent studies <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B8">8</abbr></abbrgrp>. The only notable exception is <it>Mycobacterium leprae</it>, which has 30% ICDSs, frequently described as pseudogenes <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. ICDSs may be present in genes of known or unknown function. 
A number of bacterial species are known to have developed sophisticated mechanisms for bypassing frameshifts and restoring the correct reading frame, but such mechanisms are unlikely to be general <abbrgrp><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>. Moreover, the frameshifts bypassed by the ribosome are generally preceded by a unique sequence that can be identified <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>. Thus, the detected ICDSs may either reflect the real genome sequence of the organism, with all the ensuing consequences for the composition of the encoded protein, or they may result from sequencing errors.</p>
         <p>We used <it>M. smegmatis </it>mc<sup>2</sup>155 as the model species for this study. This saprophytic bacterium, which is often used as a model organism for studies of <it>M. tuberculosis </it>functions, has recently been sequenced <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. By resequencing the ICDSs of this strain, we show that the genome sequence of this organism contains multiple errors. We systematically corrected the errors, and in all cases, these corrections rendered the predicted protein more similar to its ortholog. We also confirm, by a combined proteome and mass spectrometry analysis, that the sequences of some proteins have been incorrectly predicted due to sequencing errors. However, several ICDSs do correspond to true frameshifts. Authentic frameshifts provide a positive addition to our knowledge and make it possible to investigate gene and protein function, whereas sequencing errors generate false knowledge and confound comparative analyses. We show here that the individual analysis of ICDSs can lead to re-evaluation of the annotation of the genome and the proteome. We suggest that each bacterial ICDS should be investigated individually to ascertain its status and to produce a genome sequence suitable for productive comparative genomics.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <sec>
            <st>
               <p>ICDSs in <it>M. smegmatis </it>mc<sup>2</sup>155: a resequencing analysis</p>
            </st>
            <p>An <it>in silico </it>analysis of the genome of <it>M. smegmatis </it>mc<sup>2</sup>155 revealed that it contains 94 ICDSs <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. The ICDS database was created using a program based on the analysis of physically adjacent genes to predict putative ICDSs in complete genomes. Briefly, pairs of adjacent genes with at least one common homolog are defined as 'coding sequences (CDSs) containing common hits' and may correspond to a pair of adjacent paralogs or ICDSs. We excluded paralogs from the analysis by searching for sequence similarity between the two 'CDSs containing common hits'. The remaining CDSs are considered to be ICDSs, indicating frameshifts or in-frame stop codon insertion, due to sequencing errors or authentic events. These 94 ICDSs account for 1.4% of the total coding capacity of this organism. They may result from mutations acquired during evolution or from errors in genome sequencing.</p>
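            <p>The adjacency-based selection described above can be sketched as follows. This is a minimal illustration only, not the program used to build the ICDS database: the function name, the representation of homologs as sets of hit identifiers, and the paralogy test passed in as a callable are all assumptions made for the example, and the toy gene and hit names are hypothetical.</p>

```python
from typing import Callable, Dict, List, Set, Tuple


def find_icds_candidates(
    genes: List[str],
    homologs: Dict[str, Set[str]],
    are_paralogs: Callable[[str, str], bool],
) -> List[Tuple[str, str, List[str]]]:
    """Return adjacent gene pairs that share a homolog and are not paralogs.

    genes        -- gene identifiers in chromosomal order (assumed input)
    homologs     -- gene identifier -> set of homolog (hit) identifiers
    are_paralogs -- hypothetical similarity test between two adjacent genes
    """
    candidates = []
    # Walk over physically adjacent pairs only (wrap-around is ignored here).
    for left, right in zip(genes, genes[1:]):
        shared = homologs.get(left, set()).intersection(homologs.get(right, set()))
        if not shared:
            continue  # no common hit: not a 'CDS containing common hits'
        if are_paralogs(left, right):
            continue  # the two genes resemble each other: adjacent paralogs
        candidates.append((left, right, sorted(shared)))
    return candidates


# Toy data (hypothetical identifiers, for illustration only).
genes = ["orf1", "orf2", "orf3", "orf4", "orf5"]
homologs = {
    "orf1": {"P1"},
    "orf2": {"P2"},
    "orf3": {"P2", "P3"},
    "orf4": {"P4"},
    "orf5": {"P4"},
}
similar_pairs = {("orf4", "orf5")}


def are_paralogs(a: str, b: str) -> bool:
    # Stand-in for a sequence similarity search between the two CDSs.
    return (a, b) in similar_pairs or (b, a) in similar_pairs


result = find_icds_candidates(genes, homologs, are_paralogs)
print(result)
```

            <p>In this toy example, orf2 and orf3 share a hit and are not similar to each other, so they are retained as a putative ICDS, whereas orf4 and orf5 share a hit but are excluded as adjacent paralogs. Whether a retained pair reflects a sequencing error or an authentic event cannot be decided at this stage and requires resequencing.</p>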
            <p>We resequenced these loci in this strain to determine the status of the ICDSs. We did not resequence 21 ICDSs because of the duplication of some open reading frames (ORFs) or high levels of paralogy. The remaining 73 ICDSs were amplified and sequenced on both strands. We compared the nucleotide sequences obtained with the publicly available genome sequence of <it>M. smegmatis </it>mc<sup>2</sup>155. We found that 28 of the 73 ICDSs investigated correspond to sequencing errors (Table <tblr tid="T1">1</tblr>). These 28 genes containing sequencing errors correspond to 4 errors per megabase in the complete genome. In most cases, correction of the error merged two adjacent ORFs into a single ORF, replacing the two small ORFs of the original sequence (Figure <figr fid="F1">1</figr>).</p>
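            <p>The counts in this paragraph can be checked with a short calculation. The genome size used below (about 6.99 Mb) is an assumption introduced for the example and is not stated in this article; the other numbers are taken from the text.</p>

```python
# Counts taken from the text.
TOTAL_ICDS = 94          # ICDSs predicted in silico
NOT_RESEQUENCED = 21     # skipped (duplicated ORFs or high paralogy)
SEQUENCING_ERRORS = 28   # ICDSs shown to be sequencing errors

# Assumption: approximate genome size of M. smegmatis mc2 155 in base pairs.
ASSUMED_GENOME_SIZE_BP = 6_990_000


def per_megabase(count: int, genome_size_bp: int) -> float:
    """Convert a count into a density per megabase."""
    return count / (genome_size_bp / 1_000_000)


resequenced = TOTAL_ICDS - NOT_RESEQUENCED
error_fraction = SEQUENCING_ERRORS / resequenced
errors_per_mb = per_megabase(SEQUENCING_ERRORS, ASSUMED_GENOME_SIZE_BP)

print(f"ICDSs resequenced: {resequenced}")
print(f"Fraction due to sequencing errors: {error_fraction:.1%}")
print(f"Erroneous ICDSs per Mb: {errors_per_mb:.1f}")
```

            <p>Under this assumed genome size, the 28 erroneous ICDSs correspond to roughly 4 per megabase, in agreement with the figure given above, and to about 38% of the 73 ICDSs resequenced.</p>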
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>ICDSs shown by resequencing to correspond to sequencing errors in <it>M. smegmatis </it>mc<sup>2</sup>155</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c ca="left">
                        <p>ICDS number</p>
                     </c>
                     <c ca="center">
                        <p>5' position</p>
                     </c>
                     <c ca="left">
                        <p>ORF number</p>
                     </c>
                     <c ca="left">
                        <p>Putative function</p>
                     </c>
                     <c ca="left">
                        <p>Functional classification</p>
                     </c>
                     <c ca="left">
                        <p>Accession number</p>
                     </c>
                     <c ca="center">
                        <p>Type of event</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0012</p>
                     </c>
                     <c ca="center">
                        <p>1639371</p>
                     </c>
                     <c ca="left">
                        <p>1547</p>
                     </c>
                     <c ca="left">
                        <p>Hypothetical</p>
                     </c>
                     <c ca="left">
                        <p>Unknown</p>
                     </c>
                     <c ca="left">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="DQ866846">DQ866846</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>U</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0019</p>
                     </c>
                     <c ca="center">
                        <p>1918521</p>
                     </c>
                     <c ca="left">
                        <p>1842-1843</p>
                     </c>
                     <c ca="left">
                        <p>Adenosylhomocysteinase</p>
                     </c>
                     <c ca="left">
                        <p>Intermediary metabolism</p>
                     </c>
                     <c ca="left">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="DQ866847">DQ866847</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>U</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0022</p>
                     </c>
                     <c ca="center">
                        <p>1930746</p>
                     </c>
                     <c ca="left">
                        <p>1854-1855</p>
                     </c>
                     <c ca="left">
                        <p>Sodium/proton antiporter</p>
                     </c>
                     <c ca="left">
                        <p>Cell wall, process</p>
                     </c>
                     <c ca="left">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="DQ866848">DQ866848</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>U</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0024</p>
                     </c>
                     <c ca="center">
                        <p>2055797</p>
                     </c>
                     <c ca="left">
                        <p>1975-1976</p>
                     </c>
                     <c ca="left">
                        <p>Methane/phenol/toluene hydroxylase</p>
                     </c>
                     <c ca="left">
                        <p>Intermediary metabolism</p>
                     </c>
                     <c ca="left">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="DQ866849">DQ866849</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>O</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0026</p>
                     </c>
                     <c ca="center">
                        <p>2119141</p>
                     </c>
                     <c ca="left">
                        <p>2042</p>
                     </c>
                     <c ca="left">
                        <p>Conserved hypothetical</p>
                     </c>
                     <c ca="left">
                        <p>Unknown</p>
                     </c>
                     <c ca="left">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="DQ866850">DQ866850</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>O</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0027</p>
                     </c>
                     <c ca="center">
                        <p>2162020</p>
                     </c>
                     <c ca="left">
                        <p>2086-2087</p>
                     </c>
                     <c ca="left">
                        <p>Ferredoxin-NADP reductase</p>
                     </c>
                     <c ca="left">
                        <p>Intermediary metabolism</p>
                     </c>
                     <c ca="left">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="DQ866851">DQ866851</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>O</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0028</p>
                     </c>
                     <c ca="center">
                        <p>2221312</p>
                     </c>
                     <c ca="left">
                        <p>2149-2150</p>
                     </c>
                     <c ca="left">
                        <p>Hypothetical</p>
                     </c>
                     <c ca="left">
                        <p>Unknown</p>
                     </c>
                     <c ca="left">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="DQ866852">DQ866852</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>U</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0030</p>
                     </c>
                     <c ca="center">
                        <p>2290855</p>
                     </c>
                     <c ca="left">
                        <p>2215-2216</p>
                     </c>
                     <c ca="left">
                        <p>CoA-transferase</p>
                     </c>
                     <c ca="left">
                        <p>Intermediary metabolism</p>
                     </c>
                     <c ca="left">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="DQ866853">DQ866853</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>O</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0035</p>
                     </c>
                     <c ca="center">
                        <p>2799279</p>
                     </c>
                     <c ca="left">
                        <p>2732-2733</p>
                     </c>
                     <c ca="left">
                        <p>Conserved hypothetical</p>
                     </c>
                     <c ca="left">
                        <p>Unknown</p>
                     </c>
                     <c ca="left">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="DQ866854">DQ866854</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>U</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0039</p>
                     </c>
                     <c ca="center">
                        <p>3216877</p>
                     </c>
                     <c ca="left">
                        <p>3151</p>
                     </c>
                     <c ca="left">
                        <p>Aconitate hydratase</p>
                     </c>
                     <c ca="left">
                        <p>Intermediary metabolism</p>
                     </c>
                     <c ca="left">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="DQ866855">DQ866855</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>O (&#215; 2)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0040</p>
                     </c>
                     <c ca="center">
                        <p>3262835</p>
                     </c>
                     <c ca="left">
                        <p>3192-3193</p>
                     </c>
                     <c ca="left">
                        <p>Maltooligosyltrehalose synthase</p>
                     </c>
                     <c ca="left">
                        <p>Intermediary metabolism</p>
                     </c>
                     <c ca="left">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="DQ866856">DQ866856</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>U</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0041</p>
                     </c>
                     <c ca="center">
                        <p>3313327</p>
                     </c>
                     <c ca="left">
                        <p>3240</p>
                     </c>
                     <c ca="left">
                        <p>ABC transporter (CydC)</p>
                     </c>
                     <c ca="left">
                        <p>Intermediary metabolism</p>
                     </c>
                     <c ca="left">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="DQ866857">DQ866857</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>O</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0051</p>
                     </c>
                     <c ca="center">
                        <p>3902349</p>
                     </c>
                     <c ca="left">
                        <p>3837</p>
                     </c>
                     <c ca="left">
                        <p>Dephospho-CoA kinase</p>
                     </c>
                     <c ca="left">
                        <p>Intermediary metabolism</p>
                     </c>
                     <c ca="left">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="DQ866858">DQ866858</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>O (&#215; 2)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0053</p>
                     </c>
                     <c ca="center">
                        <p>3961899</p>
                     </c>
                     <c ca="left">
                        <p>3892-3893</p>
                     </c>
                     <c ca="left">
                        <p>Transcriptional regulator</p>
                     </c>
                     <c ca="left">
                        <p>Regulation</p>
                     </c>
                     <c ca="left">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="DQ866859">DQ866859</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>O</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0054</p>
                     </c>
                     <c ca="center">
                        <p>4017126</p>
                     </c>
                     <c ca="left">
                        <p>3952-3953</p>
                     </c>
                     <c ca="left">
                        <p>Hypothetical</p>
                     </c>
                     <c ca="left">
                        <p>Unknown</p>
                     </c>
                     <c ca="left">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="DQ866860">DQ866860</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>O</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0057</p>
                     </c>
                     <c ca="center">
                        <p>4255762</p>
                     </c>
                     <c ca="left">
                        <p>4183</p>
                     </c>
                     <c ca="left">
                        <p>Pyruvate dehydrogenase</p>
                     </c>
                     <c ca="left">
                        <p>Intermediary metabolism</p>
                     </c>
                     <c ca="left">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="DQ866861">DQ866861</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>U</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0058</p>
                     </c>
                     <c ca="center">
                        <p>4288648</p>
                     </c>
                     <c ca="left">
                        <p>4211-4212</p>
                     </c>
                     <c ca="left">
                        <p>Nitrate reductase</p>
                     </c>
                     <c ca="left">
                        <p>Intermediary metabolism</p>
                     </c>
                     <c ca="left">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="DQ866862">DQ866862</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>U</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0061</p>
                     </c>
                     <c ca="center">
                        <p>4637174</p>
                     </c>
                     <c ca="left">
                        <p>4539-4540</p>
                     </c>
                     <c ca="left">
                        <p>Oxidoreductase</p>
                     </c>
                     <c ca="left">
                        <p>Intermediary metabolism</p>
                     </c>
                     <c ca="left">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="DQ866863">DQ866863</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>O</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0072</p>
                     </c>
                     <c ca="center">
                        <p>5644787</p>
                     </c>
                     <c ca="left">
                        <p>5533-5534</p>
                     </c>
                     <c ca="left">
                        <p>Hypothetical</p>
                     </c>
                     <c ca="left">
                        <p>Unknown</p>
                     </c>
                     <c ca="left">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="DQ866864">DQ866864</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>U</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0073</p>
                     </c>
                     <c ca="center">
                        <p>5855980</p>
                     </c>
                     <c ca="left">
                        <p>5754</p>
                     </c>
                     <c ca="left">
                        <p>Acetyltransferase</p>
                     </c>
                     <c ca="left">
                        <p>Intermediary metabolism</p>
                     </c>
                     <c ca="left">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="DQ866865">DQ866865</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>O</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0076</p>
                     </c>
                     <c ca="center">
                        <p>6078397</p>
                     </c>
                     <c ca="left">
                        <p>5970-5971</p>
                     </c>
                     <c ca="left">
                        <p>Fatty-acid CoA synthetase</p>
                     </c>
                     <c ca="left">
                        <p>Lipid metabolism</p>
                     </c>
                     <c ca="left">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="DQ866866">DQ866866</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>U</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0080</p>
                     </c>
                     <c ca="center">
                        <p>6600510</p>
                     </c>
                     <c ca="left">
                        <p>6504-6505</p>
                     </c>
                     <c ca="left">
                        <p>Conserved hypothetical</p>
                     </c>
                     <c ca="left">
                        <p>Unknown</p>
                     </c>
                     <c ca="left">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="DQ866867">DQ866867</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>U</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0082</p>
                     </c>
                     <c ca="center">
                        <p>6670969</p>
                     </c>
                     <c ca="left">
                        <p>6579</p>
                     </c>
                     <c ca="left">
                        <p>Helicase</p>
                     </c>
                     <c ca="left">
                        <p>DNA metabolism</p>
                     </c>
                     <c ca="left">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="DQ866868">DQ866868</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>O</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0083</p>
                     </c>
                     <c ca="center">
                        <p>6673489</p>
                     </c>
                     <c ca="left">
                        <p>6581</p>
                     </c>
                     <c ca="left">
                        <p>Hypothetical</p>
                     </c>
                     <c ca="left">
                        <p>Unknown</p>
                     </c>
                     <c ca="left">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="DQ866869">DQ866869</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>U</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0089</p>
                     </c>
                     <c ca="center">
                        <p>342400</p>
                     </c>
                     <c ca="left">
                        <p>*</p>
                     </c>
                     <c ca="left">
                        <p>Methyltransferase</p>
                     </c>
                     <c ca="left">
                        <p>Intermediary metabolism</p>
                     </c>
                     <c ca="left">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="DQ866870">DQ866870</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>U</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0091</p>
                     </c>
                     <c ca="center">
                        <p>601272</p>
                     </c>
                     <c ca="left">
                        <p>0511-0512</p>
                     </c>
                     <c ca="left">
                        <p>Hypothetical</p>
                     </c>
                     <c ca="left">
                        <p>Unknown</p>
                     </c>
                     <c ca="left">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="DQ866871">DQ866871</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>U</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0092</p>
                     </c>
                     <c ca="center">
                        <p>809979</p>
                     </c>
                     <c ca="left">
                        <p>0716-0717</p>
                     </c>
                     <c ca="left">
                        <p>Transcriptional regulator</p>
                     </c>
                     <c ca="left">
                        <p>Regulation</p>
                     </c>
                     <c ca="left">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="DQ866872">DQ866872</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>U</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0093</p>
                     </c>
                     <c ca="center">
                        <p>428949</p>
                     </c>
                     <c ca="left">
                        <p>1395-1396</p>
                     </c>
                     <c ca="left">
                        <p>Elongation factor G</p>
                     </c>
                     <c ca="left">
                        <p>Translation</p>
                     </c>
                     <c ca="left">
                        <p>
                           <ext-link ext-link-type="gen" ext-link-id="DQ866873">DQ866873</ext-link>
                        </p>
                     </c>
                     <c ca="center">
                        <p>O</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>For each ICDS, the table gives the nucleotide position, the affected ORF (according to the TIGR website), the putative function computed after correction of the sequencing errors, the functional classification and the accession number. The asterisk indicates an ORF not predicted by TIGR. Two types of error were observed: overcall (O), in which an extra nucleotide not present in the target sequence was initially predicted at a given position; and undercall (U), in which a nucleotide present in the true target sequence was not predicted at a given position.</p>
               </tblfn>
            </tbl>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Scheme for ICDS detection and resolution strategy</p>
               </caption>
               <text>
                  <p>Scheme for ICDS detection and resolution strategy. <b>(a) </b>ICDSs are detected within the genome by <it>in silico </it>analysis. The double daggers (&#8225;) indicate the regions containing the identified frameshift. Upon resolution by sequencing and mass spectrometry analysis, the ICDSs can be classified as <b>(b) </b>true frameshifts or <b>(c) </b>sequencing errors. The hash symbol (#) indicates the region of the ORF containing the frameshift. The asterisks (*) indicate sites of corrected sequencing errors resulting in the reconstitution of a full-length ORF. The ORFs are depicted with arrows and may or may not be in the same frame. Proteins are represented by ellipses.</p>
               </text>
               <graphic file="gb-2007-8-2-r20-1"/>
            </fig>
            <p>Three types of error can be distinguished: miscall, overcall and undercall (Table <tblr tid="T1">1</tblr>) <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. However, no miscalls (incorrect prediction of a specific nucleotide at a given position) were observed within the 28 sequences containing errors, owing to the nature of the program used. The predicted amino acid sequences derived from the corrected nucleotide sequences differed greatly from the original predicted sequences and were, in all cases, more similar to their orthologs. In one case (ICDS0089), the ORF containing the frameshift had not been predicted at all, probably because the frameshift prevented its assignment. The genes affected by the sequencing errors encode proteins of several classes, including 'unknown', 'intermediary metabolism', 'regulation' and 'lipid metabolism' (Table <tblr tid="T1">1</tblr>). The genes containing authentic frameshifts likewise encode proteins of several classes, including all of those cited above (Table <tblr tid="T2">2</tblr>). No particular nucleotide pattern was associated with either the 28 sequences containing errors or the 45 sequences containing frameshifts.</p>
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>ICDSs shown by resequencing to correspond to authentic mutations in both <it>M. smegmatis </it>mc<sup>2</sup>155 and ATCC607</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>ICDS number</p>
                     </c>
                     <c ca="center">
                        <p>5' position</p>
                     </c>
                     <c ca="left">
                        <p>ORF number</p>
                     </c>
                     <c ca="left">
                        <p>Putative function</p>
                     </c>
                     <c ca="left">
                        <p>Functional classification</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0003</p>
                     </c>
                     <c ca="center">
                        <p>1169121</p>
                     </c>
                     <c ca="left">
                        <p>1094-1095</p>
                     </c>
                     <c ca="left">
                        <p>Oxidoreductase</p>
                     </c>
                     <c ca="left">
                        <p>Intermediary metabolism</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0004</p>
                     </c>
                     <c ca="center">
                        <p>1232918</p>
                     </c>
                     <c ca="left">
                        <p>1164-1165</p>
                     </c>
                     <c ca="left">
                        <p>Arsenic resistance protein</p>
                     </c>
                     <c ca="left">
                        <p>Cell wall, process</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0005</p>
                     </c>
                     <c ca="center">
                        <p>1277324</p>
                     </c>
                     <c ca="left">
                        <p>1200-1201</p>
                     </c>
                     <c ca="left">
                        <p>Glycosyltransferase</p>
                     </c>
                     <c ca="left">
                        <p>Intermediary metabolism</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0006</p>
                     </c>
                     <c ca="center">
                        <p>1304141</p>
                     </c>
                     <c ca="left">
                        <p>1226-1227</p>
                     </c>
                     <c ca="left">
                        <p>ABC transporter (permease)</p>
                     </c>
                     <c ca="left">
                        <p>Cell wall, process</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0007</p>
                     </c>
                     <c ca="center">
                        <p>1508649</p>
                     </c>
                     <c ca="left">
                        <p>1403-1404</p>
                     </c>
                     <c ca="left">
                        <p>Sodium/proton antiporter</p>
                     </c>
                     <c ca="left">
                        <p>Cell wall, process</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0008</p>
                     </c>
                     <c ca="center">
                        <p>1510156</p>
                     </c>
                     <c ca="left">
                        <p>1405-1406</p>
                     </c>
                     <c ca="left">
                        <p>Arginine/ornithine antiporter</p>
                     </c>
                     <c ca="left">
                        <p>Cell wall, process</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0009</p>
                     </c>
                     <c ca="center">
                        <p>1510156</p>
                     </c>
                     <c ca="left">
                        <p>1405-1407</p>
                     </c>
                     <c ca="left">
                        <p>Arginine/ornithine antiporter</p>
                     </c>
                     <c ca="left">
                        <p>Cell wall, process</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0010</p>
                     </c>
                     <c ca="center">
                        <p>1510315</p>
                     </c>
                     <c ca="left">
                        <p>1406-1407</p>
                     </c>
                     <c ca="left">
                        <p>Arginine/ornithine antiporter</p>
                     </c>
                     <c ca="left">
                        <p>Cell wall, process</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0011</p>
                     </c>
                     <c ca="center">
                        <p>1545509</p>
                     </c>
                     <c ca="left">
                        <p>1447</p>
                     </c>
                     <c ca="left">
                        <p>Secreted immunogenic protein (Mpt70)</p>
                     </c>
                     <c ca="left">
                        <p>Cell wall, process</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0013</p>
                     </c>
                     <c ca="center">
                        <p>1645546</p>
                     </c>
                     <c ca="left">
                        <p>1552-1553</p>
                     </c>
                     <c ca="left">
                        <p>Conserved hypothetical</p>
                     </c>
                     <c ca="left">
                        <p>Unknown</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0014</p>
                     </c>
                     <c ca="center">
                        <p>1650143</p>
                     </c>
                     <c ca="left">
                        <p>1557-1558</p>
                     </c>
                     <c ca="left">
                        <p>Hypothetical</p>
                     </c>
                     <c ca="left">
                        <p>Unknown</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0015</p>
                     </c>
                     <c ca="center">
                        <p>1669043</p>
                     </c>
                     <c ca="left">
                        <p>1575-1576</p>
                     </c>
                     <c ca="left">
                        <p>Hypothetical</p>
                     </c>
                     <c ca="left">
                        <p>Unknown</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0020</p>
                     </c>
                     <c ca="center">
                        <p>1922875</p>
                     </c>
                     <c ca="left">
                        <p>1848-1849</p>
                     </c>
                     <c ca="left">
                        <p>Formate dehydrogenase, alpha subunit</p>
                     </c>
                     <c ca="left">
                        <p>Intermediary metabolism</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0021</p>
                     </c>
                     <c ca="center">
                        <p>1924487</p>
                     </c>
                     <c ca="left">
                        <p>1849</p>
                     </c>
                     <c ca="left">
                        <p>Formate dehydrogenase, alpha subunit</p>
                     </c>
                     <c ca="left">
                        <p>Intermediary metabolism</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0023</p>
                     </c>
                     <c ca="center">
                        <p>2026072</p>
                     </c>
                     <c ca="left">
                        <p>1949-1950</p>
                     </c>
                     <c ca="left">
                        <p>Hypothetical</p>
                     </c>
                     <c ca="left">
                        <p>Unknown</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0025</p>
                     </c>
                     <c ca="center">
                        <p>2097821</p>
                     </c>
                     <c ca="left">
                        <p>2019-2020</p>
                     </c>
                     <c ca="left">
                        <p>Cytochrome P450</p>
                     </c>
                     <c ca="left">
                        <p>Intermediary metabolism</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0029</p>
                     </c>
                     <c ca="center">
                        <p>2234814</p>
                     </c>
                     <c ca="left">
                        <p>2164-2165</p>
                     </c>
                     <c ca="left">
                        <p>Substrate-CoA ligase</p>
                     </c>
                     <c ca="left">
                        <p>Lipid metabolism</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0033</p>
                     </c>
                     <c ca="center">
                        <p>2557504</p>
                     </c>
                     <c ca="left">
                        <p>2472-2473</p>
                     </c>
                     <c ca="left">
                        <p>Sugar transporter</p>
                     </c>
                     <c ca="left">
                        <p>Cell wall, process</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0036</p>
                     </c>
                     <c ca="center">
                        <p>2877071</p>
                     </c>
                     <c ca="left">
                        <p>2816-2817</p>
                     </c>
                     <c ca="left">
                        <p>Two-component system regulator</p>
                     </c>
                     <c ca="left">
                        <p>Cell wall, process</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0038</p>
                     </c>
                     <c ca="center">
                        <p>3161135</p>
                     </c>
                     <c ca="left">
                        <p>3097-3098</p>
                     </c>
                     <c ca="left">
                        <p>O-methyltransferase</p>
                     </c>
                     <c ca="left">
                        <p>Intermediary metabolism</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0042</p>
                     </c>
                     <c ca="center">
                        <p>3351460</p>
                     </c>
                     <c ca="left">
                        <p>3281-3282</p>
                     </c>
                     <c ca="left">
                        <p>Sugar ABC transporter</p>
                     </c>
                     <c ca="left">
                        <p>Cell wall, process</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0043</p>
                     </c>
                     <c ca="center">
                        <p>3410192</p>
                     </c>
                     <c ca="left">
                        <p>3341</p>
                     </c>
                     <c ca="left">
                        <p>Fatty acid desaturase (DesA3)</p>
                     </c>
                     <c ca="left">
                        <p>Lipid metabolism</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0044</p>
                     </c>
                     <c ca="center">
                        <p>3442071</p>
                     </c>
                     <c ca="left">
                        <p>3378</p>
                     </c>
                     <c ca="left">
                        <p>Dehydrogenase/reductase</p>
                     </c>
                     <c ca="left">
                        <p>Intermediary metabolism</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0045</p>
                     </c>
                     <c ca="center">
                        <p>3471038</p>
                     </c>
                     <c ca="left">
                        <p>3405-3406</p>
                     </c>
                     <c ca="left">
                        <p>Hypothetical</p>
                     </c>
                     <c ca="left">
                        <p>Unknown</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0046</p>
                     </c>
                     <c ca="center">
                        <p>3506575</p>
                     </c>
                     <c ca="left">
                        <p>3443-3444</p>
                     </c>
                     <c ca="left">
                        <p>Hypothetical</p>
                     </c>
                     <c ca="left">
                        <p>Unknown</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0049</p>
                     </c>
                     <c ca="center">
                        <p>3849109</p>
                     </c>
                     <c ca="left">
                        <p>3785</p>
                     </c>
                     <c ca="left">
                        <p>Conserved hypothetical</p>
                     </c>
                     <c ca="left">
                        <p>Unknown</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0052</p>
                     </c>
                     <c ca="center">
                        <p>3930423</p>
                     </c>
                     <c ca="left">
                        <p>3862-3863</p>
                     </c>
                     <c ca="left">
                        <p>Polyprenol-monophosphomannose synthase (Ppm1)</p>
                     </c>
                     <c ca="left">
                        <p>Cell wall, process</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0055</p>
                     </c>
                     <c ca="center">
                        <p>4172910</p>
                     </c>
                     <c ca="left">
                        <p>4102-4103</p>
                     </c>
                     <c ca="left">
                        <p>Dehydrogenase</p>
                     </c>
                     <c ca="left">
                        <p>Intermediary metabolism</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0059</p>
                     </c>
                     <c ca="center">
                        <p>4551995</p>
                     </c>
                     <c ca="left">
                        <p>4464-4465</p>
                     </c>
                     <c ca="left">
                        <p>Hypothetical</p>
                     </c>
                     <c ca="left">
                        <p>Unknown</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0063</p>
                     </c>
                     <c ca="center">
                        <p>5113475</p>
                     </c>
                     <c ca="left">
                        <p>5001</p>
                     </c>
                     <c ca="left">
                        <p>Transporter</p>
                     </c>
                     <c ca="left">
                        <p>Cell wall, process</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0064</p>
                     </c>
                     <c ca="center">
                        <p>5127828</p>
                     </c>
                     <c ca="left">
                        <p>5017-5018</p>
                     </c>
                     <c ca="left">
                        <p>Multidrug resistance efflux protein (Tap)</p>
                     </c>
                     <c ca="left">
                        <p>Cell wall, process</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0067</p>
                     </c>
                     <c ca="center">
                        <p>5238606</p>
                     </c>
                     <c ca="left">
                        <p>5122-5123</p>
                     </c>
                     <c ca="left">
                        <p>Nitrate reductase (NarX)</p>
                     </c>
                     <c ca="left">
                        <p>Intermediary metabolism</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0070</p>
                     </c>
                     <c ca="center">
                        <p>5596138</p>
                     </c>
                     <c ca="left">
                        <p>5488</p>
                     </c>
                     <c ca="left">
                        <p>Conserved hypothetical</p>
                     </c>
                     <c ca="left">
                        <p>Unknown</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0071</p>
                     </c>
                     <c ca="center">
                        <p>5639815</p>
                     </c>
                     <c ca="left">
                        <p>5527-5528</p>
                     </c>
                     <c ca="left">
                        <p>Protein-glutamate methylesterase</p>
                     </c>
                     <c ca="left">
                        <p>Intermediary metabolism</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0074</p>
                     </c>
                     <c ca="center">
                        <p>6014123</p>
                     </c>
                     <c ca="left">
                        <p>5909-5910</p>
                     </c>
                     <c ca="left">
                        <p>Hypothetical</p>
                     </c>
                     <c ca="left">
                        <p>Unknown</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0075</p>
                     </c>
                     <c ca="center">
                        <p>6071755</p>
                     </c>
                     <c ca="left">
                        <p>5963-5964</p>
                     </c>
                     <c ca="left">
                        <p>Integral membrane protein</p>
                     </c>
                     <c ca="left">
                        <p>Unknown</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0078</p>
                     </c>
                     <c ca="center">
                        <p>6147983</p>
                     </c>
                     <c ca="left">
                        <p>6046</p>
                     </c>
                     <c ca="left">
                        <p>AraC-family transcriptional regulator</p>
                     </c>
                     <c ca="left">
                        <p>Regulation</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0079</p>
                     </c>
                     <c ca="center">
                        <p>6260084</p>
                     </c>
                     <c ca="left">
                        <p>6152-6153</p>
                     </c>
                     <c ca="left">
                        <p>Anion transporter</p>
                     </c>
                     <c ca="left">
                        <p>Cell wall, process</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0084</p>
                     </c>
                     <c ca="center">
                        <p>6846273</p>
                     </c>
                     <c ca="left">
                        <p>6761</p>
                     </c>
                     <c ca="left">
                        <p>Oxidoreductase</p>
                     </c>
                     <c ca="left">
                        <p>Intermediary metabolism</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0085</p>
                     </c>
                     <c ca="center">
                        <p>6862121</p>
                     </c>
                     <c ca="left">
                        <p>6775</p>
                     </c>
                     <c ca="left">
                        <p>Major facilitator transporter</p>
                     </c>
                     <c ca="left">
                        <p>Cell wall, process</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0086</p>
                     </c>
                     <c ca="center">
                        <p>6955671</p>
                     </c>
                     <c ca="left">
                        <p>6870-6871</p>
                     </c>
                     <c ca="left">
                        <p>Glutamine transporter</p>
                     </c>
                     <c ca="left">
                        <p>Cell wall, process</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0087</p>
                     </c>
                     <c ca="center">
                        <p>6977889</p>
                     </c>
                     <c ca="left">
                        <p>6889-6890</p>
                     </c>
                     <c ca="left">
                        <p>Thioredoxin</p>
                     </c>
                     <c ca="left">
                        <p>Intermediary metabolism</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0088</p>
                     </c>
                     <c ca="center">
                        <p>17247</p>
                     </c>
                     <c ca="left">
                        <p>0017-0018</p>
                     </c>
                     <c ca="left">
                        <p>Hypothetical</p>
                     </c>
                     <c ca="left">
                        <p>Unknown</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0094</p>
                     </c>
                     <c ca="center">
                        <p>3456823</p>
                     </c>
                     <c ca="left">
                        <p>*</p>
                     </c>
                     <c ca="left">
                        <p>Dihydrolipoamide dehydrogenase</p>
                     </c>
                     <c ca="left">
                        <p>Intermediary metabolism</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The nucleotide position, the affected ORF (according to the TIGR website), its putative function and its functional classification are indicated for each ICDS. The asterisk indicates an ORF not predicted by TIGR.</p>
               </tblfn>
            </tbl>
            <p>As <it>M. smegmatis </it>mc<sup>2</sup>155 was derived from strain ATCC607, we carried out a comparative analysis of the ICDSs in these two strains. The mc<sup>2</sup>155 strain was generated from ATCC607 by selection for adaptation to genetic manipulation <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. The mc<sup>2</sup>155 strain differs phenotypically from its progenitor (ATCC607) in several ways <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>. The frameshifts in mc<sup>2</sup>155 may well have been acquired recently in the laboratory, due either to counter-selection of pathways of little utility or selection for genetic manipulability. We therefore investigated whether the frameshifts were acquired before or after the divergence of the two strains. The genome of the ATCC607 strain has not been sequenced, but as both strains belong to the same species (<it>M. smegmatis</it>), the sequencing primers originally designed for the mc<sup>2</sup>155 strain could also be used for the ATCC607 strain. We resequenced in ATCC607 the 45 genes that contain a frameshift in the mc<sup>2</sup>155 strain (Table <tblr tid="T2">2</tblr>). All these genes but one (ICDS0020) also contain a frameshift in the progenitor (ATCC607), suggesting that these mutations were acquired before the divergence of the two strains. Thus, the selection of the mc<sup>2</sup>155 strain and its repeated culture in laboratory conditions had no major impact on frameshift acquisition and pseudogene formation.</p>
            <p>Our analysis shows that the genome sequence of <it>M. smegmatis </it>mc<sup>2</sup>155 contains ICDSs, some of which correspond to authentic mutations acquired during evolution, with others resulting entirely from sequencing errors. Our results show that 18 predicted genes do not actually exist in this species (due to fusion of the two ORFs following the correction of the errors) and that one gene was not even predicted in the former sequence, presumably due to these sequencing errors. In all cases, the new predicted genes are actually more similar than previously thought to orthologs in other species.</p>
         </sec>
         <sec>
            <st>
               <p>ICDSs in <it>M. smegmatis </it>mc<sup>2</sup>155: a proteome analysis</p>
            </st>
            <p>As ICDSs (corresponding to authentic events or to sequencing errors) accounted for 1.4% of the ORF content of <it>M. smegmatis </it>mc<sup>2</sup>155, we surveyed a fraction of the proteome to determine the percentage of proteins originating from ORFs not predicted due to misannotations. We carried out two-dimensional electrophoresis of a soluble protein extract. The major spots (120) were excised, digested and analyzed by nano-LC-MS-MS (nanoflow liquid chromatography coupled to tandem mass spectrometry). We were able to identify about 250 proteins unambiguously from the MS-MS data obtained for the tryptic peptides. To prevent the introduction of bias, we compared these MS-MS data directly with public nucleotide sequences, rather than using the classic comparison of MS-MS data with protein sequences <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>. The identification of several proteins for a single spot is not surprising and has been widely reported in proteomic analysis <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. For four spots, the tryptic peptides identified by nano-LC-MS-MS analysis each matched two contiguous hypothetical ORFs (Table <tblr tid="T3">3</tblr>, Figure <figr fid="F2">2</figr>). There are two possible explanations for this finding. Firstly, two different proteins, encoded by two different frames in the same genome region, may be present in the same two-dimensional gel electrophoresis spot. This is unlikely, due to differences in molecular masses (Table <tblr tid="T3">3</tblr>), but cannot be entirely excluded. Secondly, these peptides may be derived from the same protein. In this case, a bypassed stop codon or a sequencing error could account for such an observation.</p>
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>ICDSs shown by nano-LC-MS-MS analysis to correspond to sequencing errors in <it>M. smegmatis </it>mc<sup>2</sup>155</p>
               </caption>
               <tblbdy cols="4">
                  <r>
                     <c ca="left">
                        <p>ICDS number</p>
                     </c>
                     <c ca="left">
                        <p>Affected ORF</p>
                     </c>
                     <c ca="left">
                        <p>Calculated mass before correction</p>
                     </c>
                     <c ca="left">
                        <p>Calculated mass after correction</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="4">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0019</p>
                     </c>
                     <c ca="left">
                        <p>1842-1843</p>
                     </c>
                     <c ca="left">
                        <p>45,980-7,370</p>
                     </c>
                     <c ca="left">
                        <p>53,460</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0039</p>
                     </c>
                     <c ca="left">
                        <p>3151</p>
                     </c>
                     <c ca="left">
                        <p>64,570</p>
                     </c>
                     <c ca="left">
                        <p>101,200</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0040</p>
                     </c>
                     <c ca="left">
                        <p>3192-3193</p>
                     </c>
                     <c ca="left">
                        <p>48,730-33,880</p>
                     </c>
                     <c ca="left">
                        <p>83,490</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>0093</p>
                     </c>
                     <c ca="left">
                        <p>1395-1396</p>
                     </c>
                     <c ca="left">
                        <p>21,560-63,800</p>
                     </c>
                     <c ca="left">
                        <p>77,220</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The affected ORFs (according to the TIGR website) and their predicted molecular weights before and after genomic correction are indicated.</p>
               </tblfn>
            </tbl>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Comparison of genomic prediction with proteomic results (example of ICDS0040)</p>
               </caption>
               <text>
                  <p>Comparison of genomic prediction with proteomic results (example of ICDS0040). <b>(a) </b>Representation of the DNA region and its predicted ORFs (in color). <b>(b) </b>Detailed view of the two-dimensional gel. Nano-LC-MS-MS data are obtained after extraction and digestion of the protein. The matching peptides are boxed in the translated genomic sequence (a,c). <b>(c) </b>Representation of the DNA region and its predicted ORF upon correction of the sequencing errors (depicted in the ellipse). Correction of the sequencing errors reassociates the two peptides to give a single protein, accounting for their appearance at a single spot.</p>
               </text>
               <graphic file="gb-2007-8-2-r20-2"/>
            </fig>
            <p>For the four proteins concerned, MS-BLAST showed that all the tryptic peptides identified matched the same protein on the basis of sequence similarity with other organisms. We carried out a new search with the MS-MS data obtained for the four two-dimensional gel electrophoresis spots using the corrected sequences obtained after resequencing of all the ICDSs. For all four spots the peptides were found to match in the same frame and new peptides from the proteins were detected (Table <tblr tid="T3">3</tblr>, Figure <figr fid="F2">2</figr>). We can conclude, therefore, that the four ICDSs detected were due to sequencing errors. These ICDSs are ICDS0019, ICDS0039, ICDS0040 and ICDS0093. We show ICDS0040 as an example in Figure <figr fid="F2">2</figr>.</p>
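            <p>The frame-consistency reasoning used here can be illustrated with a minimal sketch. The Python code below is not the search software used in this study; it is a toy example in which the function names (translate, peptide_frames), the two peptides and the DNA sequences are all hypothetical, and only the three forward reading frames are considered. It translates a nucleotide sequence in each frame with the standard codon table and reports the frame(s) in which each peptide is found.</p>

```python
from itertools import product

# Standard codon table, built in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame):
    """Translate the forward strand starting at offset `frame` (0, 1 or 2)."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def peptide_frames(dna, peptides):
    """Map each peptide to the forward frames whose translation contains it."""
    return {p: [f for f in range(3) if p in translate(dna, f)] for p in peptides}


# Hypothetical toy data: two peptides from one protein, joined by one codon.
PEP1_DNA = "ATGAAAACCGCGTATCGT"   # encodes MKTAYR
LINKER = "GCG"                    # encodes A
PEP2_DNA = "GGCCTGAGCGATGAAAAA"   # encodes GLSDEK
PEPTIDES = ["MKTAYR", "GLSDEK"]

corrected = PEP1_DNA + LINKER + PEP2_DNA
# Simulated sequencing error: one spurious G inserted between the two peptides.
erroneous = PEP1_DNA + LINKER + "G" + PEP2_DNA

print(peptide_frames(erroneous, PEPTIDES))  # peptides fall in different frames
print(peptide_frames(corrected, PEPTIDES))  # peptides fall in the same frame
```

            <p>In this toy case, the single inserted base places the second peptide in a different frame from the first and introduces a premature stop codon in the original frame; removing the base restores a single continuous ORF containing both peptides.</p>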
            <p>Thus, proteome analysis identified errors in sequences that were not predicted to correspond to an ORF. All four cases detected in this way were found to correspond to sequencing errors (Table <tblr tid="T1">1</tblr>). There is, therefore, strong congruence between <it>in silico </it>data and nucleotide and proteomic analyses.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Discussion</p>
         </st>
         <p>Previous <it>in silico </it>analyses have shown that all bacterial species contain ICDSs in their genome <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Here, using <it>M. smegmatis </it>and two experimentally independent approaches, we show that these ICDSs correspond to authentic mutations and to sequencing errors. By contrast, a recent large-scale proteome analysis (more than 900 proteins) of <it>M. smegmatis </it>mc<sup>2</sup>155 provided no evidence of sequencing errors <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. Statistically, 16 sequencing errors should have been detected. Possible explanations for this discrepancy are that, by chance, no protein corresponding to an ICDS was extracted, or that proteins in conflict with genomic data were excluded from the analysis.</p>
         <p>True frameshifts provide positive information, useful for characterization of the variation of amino acid sequences between various orthologs, whereas sequencing errors introduce noise and create artifactual genetic differences between strains and species. These sequencing errors may result from under-representation of the region in the genomic library or from structures that make sequencing difficult. Although most genomes have been sequenced with eight-fold coverage (each nucleotide being sequenced eight times), the sequences generated remain a statistical estimation and many regions of low coverage (less than three-fold) still exist in genome sequences <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>. No assembly data are available for the <it>M. smegmatis </it>genome project, but the sequencing errors are probably located in such low-coverage regions. In <it>M. smegmatis </it>mc<sup>2</sup>155, 28 of the 73 resequenced ICDSs were shown to result from errors. The correction of these errors modified the predicted amino acid sequences of the corresponding proteins. These changes in amino acid sequence increased similarity to orthologs, with consequences for comparative genomics. Unfortunately, it was not possible to associate a particular sequence or stretch of nucleotides with sequencing errors. It is, therefore, not possible to predict whether a given ICDS corresponds to an authentic event or to a sequencing error. The nature of each ICDS must, therefore, be investigated individually.</p>
         <p>Modern biology approaches based on massive sequence comparisons need accurate sequences for meaningful analyses of genetic differences and similarities. Re-sequencing and the correction of errors in genomic sequences are likely to lead to the identification of new protein sequences. For instance, in <it>M. leprae</it>, which has a large number of ICDSs in its genome (845), even a small proportion of sequencing errors will provide researchers with substantial numbers of new protein sequences, making it possible to identify new functional genes, or to develop new serological tests.</p>
         <p>Other mycobacterial species also contain ICDSs in their genome, some of which have been shown to correspond to authentic mutations acquired during evolution. For instance, the genomes of <it>M. tuberculosis </it>H37Rv, <it>M. tuberculosis </it>CDC1551 and <it>M. bovis </it>contain 96, 123 and 111 ICDSs, respectively, corresponding to about 2% of total gene content in each case <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Interestingly, a number of ICDSs corresponding to authentic events have been fortuitously characterized. In several cases it has been shown that these events inactivate the gene. For instance, ICDS0066 of <it>M. tuberculosis </it>H37Rv, corresponding to a gene encoding a polyketide synthase (<it>pks1</it>), includes a frameshift, generating two distinct ORFs, <it>pks1 </it>and <it>pks15</it>. In contrast, <it>M. bovis </it>and <it>M. leprae </it>carry a <it>pks1 </it>gene with no frameshift. The complementation of <it>M. tuberculosis </it>with the <it>pks1 </it>of <it>M. bovis </it>leads to the synthesis of a new metabolite, phenolphthiocerol <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>. Thus, <it>M. tuberculosis </it>has clearly lost the ability to synthesize phenolphthiocerol due to a frameshift within the <it>pks1 </it>gene. Another example is ICDS0067 in <it>M. bovis</it>, which occurs in a sequence encoding a putative glycosyltransferase. The ortholog of this gene has no frameshift in <it>M. tuberculosis </it>(Rv2958) <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. The complementation of <it>M. bovis </it>BCG with Rv2958 from <it>M. tuberculosis </it>leads to the accumulation of a new product in this strain: diglycosylated phenolglycolipid <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>. Thus, <it>M. bovis </it>has lost the ability to metabolize the diglycosylated phenolglycolipid due to the frameshift within the glycosyltransferase gene.</p>
         <p>These two examples, taken from published work, illustrate that, as expected, a frameshift within an ORF may lead to a loss of function. It should be noted that the genes for which function has been lost (such as <it>pks1 </it>or <it>Rv2958</it>) have been split into only two pieces and could, therefore, theoretically revert to the wild-type allele with ease. These genes containing frameshifts are in the process of becoming pseudogenes (pseudogenization) but need to acquire additional mutations before they are fixed, leading to an almost irreversible loss of function.</p>
         <p>The conclusion of this work may be extended to most, if not all, bacterial genomes sequenced to date. These findings have major implications for comparative genomics. Firstly, the resolution of sequencing errors reduces protein variability, facilitating the precise definition of module composition and function. Secondly, as ICDSs corresponding to authentic mutations probably lead to a loss of protein function, the choice of strain or species is of particular importance for investigations of the function of a particular gene. Researchers should carefully consider their investment before creating mutants in these ORFs or producing the corresponding polypeptides. It should be noted that a small number of ORFs containing frameshifts may retain their function or even lead to the acquisition of a new function. It would be interesting to re-frame these ORFs to evaluate the impact on protein function.</p>
         <p>We have shown here that 28 of the 73 ICDSs resulted from sequencing errors. It seems highly likely that all sequenced genomes contain ICDSs resulting from sequencing errors. The current ICDS database contains more than 6,600 ICDSs (in 120 genomes) awaiting characterization. In this study, we detected sequencing errors at a rate of 4 per megabase. The calculated number of ICDSs is obviously an underestimate of the reality as some events such as fusion or fission that maintain the correct frame are not detected by the algorithm used <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>.</p>
         <p>Very few articles have dealt with sequence fidelity. TIGR has reported an error rate for finished genomes of 1 in 88,000 nucleotides <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp> whereas Weinstock <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> estimated that the frequency of error was between 10<sup>-3 </sup>and 10<sup>-5</sup>. The frequency of errors clearly depends on the chemical system used and the research centers carrying out the sequencing work <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. The development of error prediction programs has greatly helped to reduce the error rate <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B4">4</abbr></abbrgrp>. However, as shown in this study, sequencing errors are clearly a persistent problem in genomic databases. The major problem is that the bioinformaticians who assemble genomes have, for years, discarded precious information about how all the individual sequence fragments align on the assembled chromosome. The only way to test the nature of the ICDSs is to re-sequence the fragment. The NCBI has recently developed the 'Assembly Archive', which stores records of both the way in which a particular assembly was constructed and alignments of any set of traces to a reference genome <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. This resource makes it possible to determine whether an ICDS corresponds to a region of low coverage and to evaluate the quality of the raw data. It would clearly be easier to resolve the ICDSs in various genomes if all the sequencing centers made complete assembly data available.</p>
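         <p>The error rates quoted above are expressed in different units, and a short calculation places them on a common scale. The Python sketch below is illustrative only: the function name is hypothetical, and the genome length of about 7 Mb is an assumption (it is not stated in this section, although the ICDS positions listed in Table 2 extend to about 6.98 Mb). The rate derived from ICDSs counts only errors that interrupt a coding sequence, so it is best read as a lower bound rather than as a total error rate.</p>

```python
def errors_per_megabase(n_errors, n_bases):
    """Convert a count of errors over n_bases nucleotides to errors per Mb."""
    return n_errors / (n_bases / 1_000_000)


# ASSUMPTION: genome length of about 7 Mb, not stated in this section.
ASSUMED_GENOME_BP = 7_000_000

# 28 ICDSs shown to result from sequencing errors in this study.
icds_rate = errors_per_megabase(28, ASSUMED_GENOME_BP)

# TIGR estimate for finished genomes: 1 error in 88,000 nucleotides.
tigr_rate = errors_per_megabase(1, 88_000)

# Range estimated by Weinstock: 1 error in 1,000 to 1 in 100,000 nucleotides.
weinstock_high = errors_per_megabase(1, 1_000)
weinstock_low = errors_per_megabase(1, 100_000)

print(f"ICDS-derived errors per Mb: {icds_rate:.1f}")
print(f"TIGR estimate per Mb: {tigr_rate:.1f}")
print(f"Weinstock range per Mb: {weinstock_low:.0f} to {weinstock_high:.0f}")
```

         <p>Under this assumption, the ICDS-derived figure is consistent with the rate of 4 per megabase given above and lies below both published estimates, as expected for a method that detects only the errors interrupting coding sequences.</p>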
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Bacterial strains</p>
            </st>
            <p><it>M. smegmatis </it>mc<sup>2</sup>155 (ATCC700084) and <it>M. smegmatis </it>NRRL B-692 (Trevisan) Lehman and Neumann (ATCC607) were purchased from the American Type Culture Collection (Manassas, Virginia, USA).</p>
         </sec>
         <sec>
            <st>
               <p>ICDS detection in <it>M. smegmatis </it>mc<sup>2</sup>155</p>
            </st>
            <p>The genome sequence of <it>M. smegmatis </it>mc<sup>2</sup>155 was taken from the TIGR website <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. The ICDSs were detected using the method developed by Perrodou <it>et al</it>. <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>.</p>
         </sec>
         <sec>
            <st>
               <p>Primer design and sequence analysis</p>
            </st>
            <p>The primers used to sequence frameshifts were designed as previously described <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> using an optimized version of the CADO4MI program (Computer Assisted Design of Oligonucleotides for Microarray). This program is freeware (GNU General Public License) and is accessible online <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>. For each genome, sequencing primers are available online <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>. The chromosomal DNA of the mc<sup>2</sup>155 and ATCC607 strains of <it>M. smegmatis </it>used for PCR amplification was purified as previously described <abbrgrp><abbr bid="B28">28</abbr></abbrgrp>. Pairs of primers were used for amplification with Pfu Turbo DNA polymerase (Stratagene, La Jolla, CA, USA). PCR samples were run on a 0.8% agarose gel and the fragments were excised from the gel and purified using the QIAquick Gel purification kit (Qiagen, Chatsworth, CA, USA). The PCR fragments had a mean length of 300 base-pairs. Purified PCR fragments were used as templates in sequencing reactions with each primer used for PCR amplification. The nucleotide and inferred amino acid sequences were analyzed with DNA Strider <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>. Three independent amplicons were sequenced for each ICDS.</p>
         </sec>
         <sec>
            <st>
               <p>Protein extraction and two-dimensional gel electrophoresis</p>
            </st>
            <p><it>M. smegmatis </it>strain mc<sup>2</sup>155 (1 liter) was grown in M9 minimal medium (Difco, Detroit, USA) for 5 days and then centrifuged. Bacterial pellets were used for two-dimensional electrophoresis. Unless otherwise specified, all chemicals were obtained from Sigma (St Louis, MO, USA). Dithiothreitol (DTT) and iodoacetamide were obtained from Fluka (Buchs, Switzerland). The pellet fraction was incubated with extraction buffer (50 mM Tris, pH 7.5, 1 mM phenylmethylsulfonyl fluoride, 1 mM EDTA, 1 mM DTT, protease inhibitor mixture (complete from Roche, Basel, Switzerland)) for 45 minutes at 4&#176;C. The mixture was sonicated for a few seconds and its protein concentration determined by Bradford assay. The solvent of the protein extract was evaporated off and the protein residue was suspended in rehydration buffer (8 M urea, 2 M thiourea, 4% 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonic acid, 0.5% Triton X-100, 1% DTT, 20 mM spermine, 2% Pharmalyte (Amersham Pharmacia Biotech, Piscataway, NJ, USA)). The sample was incubated for 30 minutes at 20&#176;C and centrifuged at 15,000 rpm at 20&#176;C.</p>
            <p>Protein extract was run on a strip of gel of pH range 3 to 10 (Bio-Rad Laboratories, Hercules, CA, USA) for 15 h at 20&#176;C under 50 V in a PROTEAN isoelectric focusing cell (Bio-Rad). Isoelectric focusing was carried out with several voltage steps: 1 h at 200 V, then 4 h at 1,000 V followed by 16 h at 5,000 V and finally 7 h at 500 V at 20&#176;C. The strips were incubated for 30 minutes at 20&#176;C in electrophoresis buffer (50 mM Tris-HCl, pH 8.8, 6 M urea, 30% (v/v) glycerol, 2% (w/v) SDS, and 1% DTT), followed by 30 minutes in the same buffer supplemented with 2.5% iodoacetamide. Electrophoresis in a gradient gel (5% to 20% acrylamide) on a PROTEAN II (Bio-Rad) apparatus at 5 mA for 1 h and 10 mA overnight was used as the second dimension. The gel was stained with Colloidal blue (G260, Sigma); 120 spots were selected by visual inspection and gel slices were excised with a Proteineer SP automated spot picker (Bruker Daltonics, Bremen, Germany) according to the manufacturer's instructions.</p>
         </sec>
         <sec>
            <st>
               <p>Mass spectrometry</p>
            </st>
            <p>The two-dimensional gel spots were excised, washed, destained, reduced, alkylated and dehydrated for in-gel digestion of the proteins with an automated protein digestion system, MassPREP Station (Waters, Milford, MA, USA). The proteins were digested overnight at room temperature with trypsin. They were then extracted with 60% (v/v) acetonitrile in 5% (v/v) formic acid and then with 100% acetonitrile. The resulting peptide extracts were analyzed directly by nano-LC-MS-MS on an Agilent 1100 Series capillary LC system (Agilent Technologies, Palo Alto, USA) coupled to an HCT Ultra ion trap (Bruker Daltonics). This instrument was equipped with a nanospray ion source and chromatographic separation was carried out on reverse phase (RP) capillary columns (C18, 75 &#956;m id, 15 cm length, Agilent Technologies) with a flow rate of 200 nl/minute. The voltage applied to the capillary cap was optimized to -2,000 V. MS-MS scanning was performed in the Ultra Scan resolution mode at a scan rate of 26,000 m/z per second. Eight scans were averaged to obtain an MS-MS mass spectrum. The complete system was fully controlled by Agilent ChemStation and EsquireControl (Bruker Daltonics) software. The generated peak-lists of fragments were used for public <it>M. smegmatis </it>genome database searches.</p>
         </sec>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>Data were obtained from the TIGR website <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>. We thank INSERM for funding this project through an Avenir program grant to JMR, Charg&#233; de Recherches at INSERM. This work was also funded by a 'Prot&#233;omique et Genie des Prot&#233;ines' grant (project no. PGP 04-013), the RNG (R&#233;seau National de G&#233;nopoles) Strasbourg Bioinformatics Platform infrastructures and EVI-GENORET (LSHG-CT-2005-512036). CD is funded by a doctoral grant from INSERM - R&#233;gion Ile de France. We thank E Stewart for critical reading and for correcting the English of this manuscript.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Genomes OnLine Database (GOLD): a monitor of genome projects world-wide.</p>
            </title>
            <aug>
               <au>
                  <snm>Bernal</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ear</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Kyrpides</snm>
                  <fnm>N</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <fpage>126</fpage>
            <lpage>127</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">29859</pubid>
                  <pubid idtype="pmpid" link="fulltext">11125068</pubid>
                  <pubid idtype="doi">10.1093/nar/29.1.126</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Assignment of position-specific error probability to primary DNA sequence data.</p>
            </title>
            <aug>
               <au>
                  <snm>Lawrence</snm>
                  <fnm>CB</fnm>
               </au>
               <au>
                  <snm>Solovyev</snm>
                  <fnm>VV</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1994</pubdate>
            <volume>22</volume>
            <fpage>1272</fpage>
            <lpage>1280</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">523653</pubid>
                  <pubid idtype="pmpid" link="fulltext">8165143</pubid>
                  <pubid idtype="doi">10.1093/nar/22.7.1272</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Base-calling of automated sequencer traces using phred. II. Error probabilities.</p>
            </title>
            <aug>
               <au>
                  <snm>Ewing</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>1998</pubdate>
            <volume>8</volume>
            <fpage>186</fpage>
            <lpage>194</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9521922</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Base-calling of automated sequencer traces using phred. I. Accuracy assessment.</p>
            </title>
            <aug>
               <au>
                  <snm>Ewing</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Hillier</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Wendl</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Green</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>1998</pubdate>
            <volume>8</volume>
            <fpage>175</fpage>
            <lpage>185</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">9521921</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>ICDS database: interrupted CoDing sequences in prokaryotic genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Perrodou</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Deshayes</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Muller</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Schaeffer</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Van Dorsselaer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Ripp</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Poch</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Reyrat</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Lecompte</snm>
                  <fnm>O</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>D338</fpage>
            <lpage>343</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1347423</pubid>
                  <pubid idtype="pmpid" link="fulltext">16381882</pubid>
                  <pubid idtype="doi">10.1093/nar/gkj060</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Frame: detection of genomic sequencing errors.</p>
            </title>
            <aug>
               <au>
                  <snm>Brown</snm>
                  <fnm>NP</fnm>
               </au>
               <au>
                  <snm>Sander</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>1998</pubdate>
            <volume>14</volume>
            <fpage>367</fpage>
            <lpage>371</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/14.4.367</pubid>
                  <pubid idtype="pmpid" link="fulltext">9632832</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Detecting and analyzing DNA sequencing errors: toward a higher quality of the <it>Bacillus subtilis </it>genome sequence.</p>
            </title>
            <aug>
               <au>
                  <snm>Medigue</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Rose</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Viari</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Danchin</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>1999</pubdate>
            <volume>9</volume>
            <fpage>1116</fpage>
            <lpage>1127</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">310837</pubid>
                  <pubid idtype="pmpid" link="fulltext">10568751</pubid>
                  <pubid idtype="doi">10.1101/gr.9.11.1116</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Comprehensive analysis of pseudogenes in prokaryotes: widespread gene decay and failure of putative horizontally transferred genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Liu</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Harrison</snm>
                  <fnm>PM</fnm>
               </au>
               <au>
                  <snm>Kunin</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Gerstein</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>R64</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">522871</pubid>
                  <pubid idtype="pmpid" link="fulltext">15345048</pubid>
                  <pubid idtype="doi">10.1186/gb-2004-5-9-r64</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Lewis antigens in <it>Helicobacter pylori</it>: biosynthesis and phase variation.</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Ge</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Rasko</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>DE</fnm>
               </au>
            </aug>
            <source>Mol Microbiol</source>
            <pubdate>2000</pubdate>
            <volume>36</volume>
            <fpage>1187</fpage>
            <lpage>1196</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1046/j.1365-2958.2000.01934.x</pubid>
                  <pubid idtype="pmpid" link="fulltext">10931272</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Translational bypassing: a new reading alternative of the genetic code.</p>
            </title>
            <aug>
               <au>
                  <snm>Groisman</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Engelberg-Kulka</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Biochem Cell Biol</source>
            <pubdate>1995</pubdate>
            <volume>73</volume>
            <fpage>1055</fpage>
            <lpage>1059</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8722021</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Sequences that direct significant levels of frameshifting are frequent in coding regions of <it>Escherichia coli</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Gurvich</snm>
                  <fnm>OL</fnm>
               </au>
               <au>
                  <snm>Baranov</snm>
                  <fnm>PV</fnm>
               </au>
               <au>
                  <snm>Zhou</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hammer</snm>
                  <fnm>AW</fnm>
               </au>
               <au>
                  <snm>Gesteland</snm>
                  <fnm>RF</fnm>
               </au>
               <au>
                  <snm>Atkins</snm>
                  <fnm>JF</fnm>
               </au>
            </aug>
            <source>EMBO J</source>
            <pubdate>2003</pubdate>
            <volume>22</volume>
            <fpage>5941</fpage>
            <lpage>5950</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">275418</pubid>
                  <pubid idtype="pmpid" link="fulltext">14592990</pubid>
                  <pubid idtype="doi">10.1093/emboj/cdg561</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p><it>Mycobacterium smegmatis mc</it><sup>2 </sup><it>155 </it>Genome Page</p>
            </title>
            <url>http://cmr.tigr.org/tigr-scripts/CMR/GenomePage.cgi?database=gms</url>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Isolation and characterization of efficient plasmid transformation mutants of <it>Mycobacterium smegmatis</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Snapper</snm>
                  <fnm>SB</fnm>
               </au>
               <au>
                  <snm>Melton</snm>
                  <fnm>RE</fnm>
               </au>
               <au>
                  <snm>Mustafa</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kieser</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Jacobs</snm>
                  <fnm>WR</fnm>
                  <suf>Jr</suf>
               </au>
            </aug>
            <source>Mol Microbiol</source>
            <pubdate>1990</pubdate>
            <volume>4</volume>
            <fpage>1911</fpage>
            <lpage>1919</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1365-2958.1990.tb02040.x</pubid>
                  <pubid idtype="pmpid">2082148</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>The impact of the absence of glycopeptidolipids on the ultrastructure, cell surface and cell wall properties, and phagocytosis of <it>Mycobacterium smegmatis</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Etienne</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Villeneuve</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Billman-Jacobe</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Astarie-Dequeker</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Dupont</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Daffe</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Microbiology</source>
            <pubdate>2002</pubdate>
            <volume>148</volume>
            <fpage>3089</fpage>
            <lpage>3100</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12368442</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Revised draft guidelines for proteomic data publication.</p>
            </title>
            <aug>
               <au>
                  <snm>Bradshaw</snm>
                  <fnm>RA</fnm>
               </au>
            </aug>
            <source>Mol Cell Proteomics</source>
            <pubdate>2005</pubdate>
            <volume>4</volume>
            <fpage>1223</fpage>
            <lpage>1225</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">16160100</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>The ABC's (and XYZ's) of peptide sequencing.</p>
            </title>
            <aug>
               <au>
                  <snm>Steen</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Mann</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nat Rev Mol Cell Biol</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>699</fpage>
            <lpage>711</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrm1468</pubid>
                  <pubid idtype="pmpid" link="fulltext">15340378</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Spot overlapping in two-dimensional maps: a serious problem ignored for much too long.</p>
            </title>
            <aug>
               <au>
                  <snm>Campostrini</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Areces</snm>
                  <fnm>LB</fnm>
               </au>
               <au>
                  <snm>Rappsilber</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Pietrogrande</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Dondi</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Pastorino</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Ponzoni</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Righetti</snm>
                  <fnm>PG</fnm>
               </au>
            </aug>
            <source>Proteomics</source>
            <pubdate>2005</pubdate>
            <volume>5</volume>
            <fpage>2385</fpage>
            <lpage>2395</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/pmic.200401253</pubid>
                  <pubid idtype="pmpid" link="fulltext">15880804</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Mass spectrometry of the <it>M. smegmatis </it>proteome: protein expression levels correlate with function, operons, and codon bias.</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Prince</snm>
                  <fnm>JT</fnm>
               </au>
               <au>
                  <snm>Marcotte</snm>
                  <fnm>EM</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <fpage>1118</fpage>
            <lpage>1126</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1182224</pubid>
                  <pubid idtype="pmpid" link="fulltext">16077011</pubid>
                  <pubid idtype="doi">10.1101/gr.3994105</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Genomics and bacterial pathogenesis.</p>
            </title>
            <aug>
               <au>
                  <snm>Weinstock</snm>
                  <fnm>GM</fnm>
               </au>
            </aug>
            <source>Emerg Infect Dis</source>
            <pubdate>2000</pubdate>
            <volume>6</volume>
            <fpage>496</fpage>
            <lpage>504</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10998381</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Role of the <it>pks15/1 </it>gene in the biosynthesis of phenolglycolipids in the <it>Mycobacterium tuberculosis </it>complex. Evidence that all strains synthesize glycosylated p-hydroxybenzoic methyl esters and that strains devoid of phenolglycolipids harbor a frameshift mutation in the <it>pks15/1 </it>gene.</p>
            </title>
            <aug>
               <au>
                  <snm>Constant</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Perez</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Malaga</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Laneelle</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Saurel</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Daffe</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Guilhot</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>2002</pubdate>
            <volume>277</volume>
            <fpage>38148</fpage>
            <lpage>38158</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.M206538200</pubid>
                  <pubid idtype="pmpid" link="fulltext">12138124</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Characterization of three glycosyltransferases involved in the biosynthesis of the phenolic glycolipid antigens from the <it>Mycobacterium tuberculosis </it>complex.</p>
            </title>
            <aug>
               <au>
                  <snm>Perez</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Constant</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Lemassu</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Laval</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Daffe</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Guilhot</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>2004</pubdate>
            <volume>279</volume>
            <fpage>42574</fpage>
            <lpage>42583</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.M406246200</pubid>
                  <pubid idtype="pmpid" link="fulltext">15292272</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Comparative genome sequencing for discovery of novel polymorphisms in <it>Bacillus anthracis</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Read</snm>
                  <fnm>TD</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Pop</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Shumway</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Umayam</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Jiang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Holtzapple</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Busch</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>KL</fnm>
               </au>
               <au>
                  <snm>Schupp</snm>
                  <fnm>JM</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2002</pubdate>
            <volume>296</volume>
            <fpage>2028</fpage>
            <lpage>2033</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1071837</pubid>
                  <pubid idtype="pmpid" link="fulltext">12004073</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Single nucleotide polymorphisms in <it>Mycobacterium tuberculosis </it>structural genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Fleischmann</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Emerg Infect Dis</source>
            <pubdate>2001</pubdate>
            <volume>7</volume>
            <fpage>487</fpage>
            <lpage>488</lpage>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Estimation of errors in "raw" DNA sequences: a validation study.</p>
            </title>
            <aug>
               <au>
                  <snm>Richterich</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>1998</pubdate>
            <volume>8</volume>
            <fpage>251</fpage>
            <lpage>259</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">310698</pubid>
                  <pubid idtype="pmpid" link="fulltext">9521928</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>The Genome Assembly Archive: a new public resource.</p>
            </title>
            <aug>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Church</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>DiCuccio</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Yaschenko</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Ostell</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2004</pubdate>
            <volume>2</volume>
            <fpage>E285</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">516794</pubid>
                  <pubid idtype="pmpid" link="fulltext">15367931</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0020285</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Computer Assisted Design of Oligonucleotides for Microarray</p>
            </title>
            <url>http://bips.u-strasbg.fr/CADO4MI/</url>
         </bibl>
         <bibl id="B27">
            <title>
               <p>ICDS Database</p>
            </title>
            <url>http://alnitak.u-strasbg.fr/ICDS/</url>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Generation of unmarked directed mutations in mycobacteria, using sucrose counter-selectable suicide vectors.</p>
            </title>
            <aug>
               <au>
                  <snm>Pelicic</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Reyrat</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Gicquel</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Mol Microbiol</source>
            <pubdate>1996</pubdate>
            <volume>20</volume>
            <fpage>919</fpage>
            <lpage>925</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1111/j.1365-2958.1996.tb02533.x</pubid>
                  <pubid idtype="pmpid">8809745</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>'DNA Strider': a 'C' program for the fast analysis of DNA and protein sequences on the Apple Macintosh family of computers.</p>
            </title>
            <aug>
               <au>
                  <snm>Marck</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1988</pubdate>
            <volume>16</volume>
            <fpage>1829</fpage>
            <lpage>1836</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">338177</pubid>
                  <pubid idtype="pmpid" link="fulltext">2832831</pubid>
                  <pubid idtype="doi">10.1093/nar/16.5.1829</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>The Institute for Genomic Research</p>
            </title>
            <url>http://www.tigr.org</url>
         </bibl>
      </refgrp>
   </bm>
</art>
