<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-5-23</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Software</dochead>
      <bibl>
         <title>
            <p>Gene prediction using the Self-Organizing Map: automatic generation of multiple gene models</p>
         </title>
         <aug>
            <au id="A1" ca="yes">
               <snm>Mahony</snm>
               <fnm>Shaun</fnm>
               <insr iid="I1"/>
               <email>shaun.mahony@nuigalway.ie</email>
            </au>
            <au id="A2">
               <snm>McInerney</snm>
               <mi>O</mi>
               <fnm>James</fnm>
               <insr iid="I2"/>
               <email>james.o.mcinerney@may.ie</email>
            </au>
            <au id="A3">
               <snm>Smith</snm>
               <mi>J</mi>
               <fnm>Terry</fnm>
               <insr iid="I1"/>
               <email>terry.smith@nuigalway.ie</email>
            </au>
            <au id="A4">
               <snm>Golden</snm>
               <fnm>Aaron</fnm>
               <insr iid="I3"/>
               <email>aaron.golden@nuigalway.ie</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>National Centre for Biomedical Engineering Science, NUI, Galway, Galway, Ireland</p>
            </ins>
            <ins id="I2">
               <p>Bioinformatics and Pharmacogenomics Laboratory, NUI, Maynooth, Co. Kildare, Ireland</p>
            </ins>
            <ins id="I3">
               <p>Department of Information Technology, NUI, Galway, Galway, Ireland</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2004</pubdate>
         <volume>5</volume>
         <issue>1</issue>
         <fpage>23</fpage>
         <url>http://www.biomedcentral.com/1471-2105/5/23</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="doi">10.1186/1471-2105-5-23</pubid>
               <pubid idtype="pmpid">15070404</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>26</day>
               <month>11</month>
               <year>2003</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>05</day>
               <month>3</month>
               <year>2004</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>05</day>
               <month>3</month>
               <year>2004</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2004</year>
         <collab>Mahony et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.</collab>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>Many current gene prediction methods use only one model to represent protein-coding regions in a genome, and so are less likely to predict the location of genes that have an atypical sequence composition. It is likely that future improvements in gene finding will involve the development of methods that can adequately deal with intra-genomic compositional variation.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>This work explores a new approach to gene-prediction, based on the Self-Organizing Map, which has the ability to automatically identify multiple gene models within a genome. The current implementation, named RescueNet, uses relative synonymous codon usage as the indicator of protein-coding potential.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>While its raw accuracy rate can be lower than that of other methods, RescueNet consistently identifies some genes that other methods do not, and should therefore be of interest to gene-prediction software developers and genome annotation teams alike. RescueNet is recommended for use in conjunction with, or as a complement to, other gene prediction methods.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="bmc" subtype="user_supplied_xml" id="endnote"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Computational gene prediction methods have yet to achieve perfect accuracy, even in the relatively simple prokaryotic genomes. Problems in gene prediction centre on the fact that many protein families remain uncharacterised. As a result, it seems that only approximately half of an organism's genes can be confidently predicted on the basis of homology to other known genes <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>, so <it>ab initio </it>prediction methods are usually employed to identify many protein-coding regions of DNA.</p>
         <p>Currently, the most popular prokaryotic gene-prediction methods, such as GeneMark.hmm <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> and Glimmer2 <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, are based on probabilistic Markov models that aim to predict each base of a DNA sequence using a number of preceding bases in the sequence. These methods are undoubtedly very successful, with published sensitivity rates between 90% and 99% for most prokaryotic genomes. However, as the sensitivity rates of the methods rise, specificity generally tends to fall, and while the application of sophisticated post-processing rules can correct many false-positive predictions, no method has yet achieved 100% accuracy. This is especially the case in the more complex eukaryotic gene-finding problem, where fewer than 80% of exons in anonymous genomic sequences are correctly predicted by current methods <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>.</p>
         <p>For the foreseeable future it does not seem that the exact set of genes in any organism can be automatically predicted by any single computational method. In practice, this has meant that the best predictions are to be found by combining evidence from two or more independent methods <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B9">9</abbr></abbrgrp>. Genome annotation teams often compare the evidence offered by multiple gene-finders in order to predict the gene complement of a given genome. Because of the degree of 'manual' annotation that now takes place in the major genome sequencing centres, a gene-prediction tool will be of practical use if it can exclusively predict genes that other gene-finders cannot.</p>
         <p>Many <it>ab initio </it>gene-prediction methods are based on single models of protein-coding regions and therefore make the implicit assumption that all protein-coding regions within a particular genome will share similar statistical properties. However, evidence has mounted that single gene models of intrinsic coding measures are no longer fully satisfactory <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr></abbrgrp>. The problem with single model methods centres on the degree of oligonucleotide composition variation that exists within most genomes. On the codon level, intra-genomic variation in codon bias has long been correlated with expression level <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. Counterbalancing the translational selection theory of codon bias is the effect of mutational bias <abbrgrp><abbr bid="B13">13</abbr><abbr bid="B14">14</abbr></abbrgrp>. Many other, often more subtle levels of variation have been recognised over the years, with many disparate evolutionary pressures shown to be acting on codon usage bias <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. For example, strand-specific codon usage biases have often been recognised <abbrgrp><abbr bid="B16">16</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr></abbrgrp>, leading to more general studies of correlation between the location of the gene on the genome and codon bias <abbrgrp><abbr bid="B21">21</abbr></abbrgrp>, and the more specific discovery of an A+T skewed bias near the replication terminus of bacterial genomes <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. Other effects shown to shape codon usage are gene length <abbrgrp><abbr bid="B23">23</abbr></abbrgrp> and selection at the amino acid level <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. 
It has also been suggested that content variation can occur at the exon level in eukaryotic genes, the possibility existing that some exons in a gene may have different codon usage patterns to others <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. Given that some of the above pressures on codon usage have only recently been discovered, it is likely that some more subtle patterns have yet to be recognised, and therefore it is difficult to predict the level of compositional variation that will be present in an anonymous genomic sequence.</p>
         <p>The need for gene-finding methods that can overcome the problems presented by intra-genomic variation was recognised and addressed in the case of prokaryotic genomes by GeneMark-Genesis <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>, which derives two models for each genome according to typical and atypical codon usage clusters in that genome. This increase in the number of gene models led to an increase in accuracy of the GeneMark method. While Hayes &amp; Borodovsky experimented with a third ('highly-typical') codon usage cluster and an associated model in some cases, they did not see the need to further sub-cluster the atypical codon usage set in order to make even more models. Overly sub-clustering the training data would not be useful in the case of Markov-based methods, as the data contained in each sub-cluster may not allow for a good estimation of model parameters. However, generating more specific models for subtle patterns found in the training set can only be advantageous if it can be done in a way that minimises loss of overall accuracy and produces no extra false-positive predictions.</p>
         <p>This paper aims to show how the Self-Organizing Map neural network algorithm can be used to automatically identify the major trends in oligonucleotide variation in a genome, and in doing so provide multiple gene models for use in gene prediction. It will be explained that this approach is an effective solution to the problem of intra-genomic variation. Specific examples of genes predicted only by this method are offered, thus demonstrating the usefulness of the approach in genome annotation.</p>
         <p>A further advantage of using the Self-Organizing Map for gene prediction is the ability of the algorithm to use complex descriptors as measures of gene coding potential. We demonstrate this ability using relative synonymous codon usage (RSCU) as our measure of gene coding potential. Unlike other gene coding measures, RSCU is not based on the absolute frequency of k-mers, but instead describes the codon choice for each amino acid. Markov chains based on the RSCU measure would have transition probabilities that are conditioned on the underlying amino acids. Although theoretically possible <abbrgrp><abbr bid="B26">26</abbr></abbrgrp>, the practical computation of such Markov chains would give rise to major difficulties. Therefore, the ability to make use of a sophisticated gene coding descriptor such as RSCU is a distinct advantage of our approach over Markov model based methods.</p>
      </sec>
      <sec>
         <st>
            <p>Implementation</p>
         </st>
         <sec>
            <st>
               <p>Coding measure</p>
            </st>
            <p>In this study, relative synonymous codon usage (RSCU) vectors are used as the measure of protein-coding potential for a given window of sequence. The RSCU value for a codon 'i' is defined as:</p>
            <p>
               <graphic file="1471-2105-5-23-i1.gif"/>
            </p>
            <p>where Obs<sub>i </sub>is the observed number of occurrences of codon 'i', and Exp<sub>i </sub>is the expected number of occurrences of the same codon (based on the number of times the relevant amino acid is present in the gene and the number of synonymous alternatives to 'i', assuming a uniform choice of synonymous codons). In order to make the data more compatible with the mathematical methods used, the log of each RSCU<sub>i </sub>value is found so that the resulting value is positive if the codon is used more than expected in that gene, and negative if the codon is used less than expected. Values were capped at &#177; 10, and set to 0 in the case of the non-occurrence of an amino acid in the sample. Taking the RSCU values for each of the codons with synonymous alternatives (and ignoring the 3 stop codons and the Trp and Met codons), each sample can be represented by a vector of 59 values.</p>
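            <p>The construction of the 59-value vector described above can be sketched as follows. This is a minimal illustration and not the RescueNet source code: the function name <it>log_rscu_vector</it>, the use of the natural logarithm, the alphabetical ordering of the vector by amino acid, and the choice to assign the lower cap to an unused codon of an amino acid that is present are all assumptions made for the example.</p>
            <p><![CDATA[
```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Synonymous families, ignoring the stop codons and the single-codon Trp and Met.
FAMILIES = {}
for codon, aa in CODON_TABLE.items():
    if aa not in "*WM":
        FAMILIES.setdefault(aa, []).append(codon)

# Fixed codon order of the vector (59 entries).
CODON_ORDER = [c for aa in sorted(FAMILIES) for c in FAMILIES[aa]]


def log_rscu_vector(seq, cap=10.0):
    """Return the capped log-RSCU values of an in-frame sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    vec = []
    for aa in sorted(FAMILIES):
        codons = FAMILIES[aa]
        total = sum(counts[c] for c in codons)
        for c in codons:
            if total == 0:
                # Amino acid absent from the sample: value set to 0.
                vec.append(0.0)
            elif counts[c] == 0:
                # log(0) is undefined; assumed here to take the lower cap.
                vec.append(-cap)
            else:
                expected = total / len(codons)  # uniform synonymous choice
                value = math.log(counts[c] / expected)
                vec.append(max(-cap, min(cap, value)))
    return vec
```
]]></p>
            <p>For a toy sequence such as GCTGCTGCTGCC, in which alanine is encoded three times by GCT and once by GCC, the GCT entry is log 3, the GCC entry is 0, and the entries for all absent amino acids are 0.</p>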
         </sec>
         <sec>
            <st>
               <p>Self-Organizing Map</p>
            </st>
            <p>The Self-Organizing Map (SOM) is based around the concept of a lattice of interconnected nodes, each of which contains a model. The models begin as random values, but during the iterative training process they are modified to represent different subsets of the training set. In this work for example, the training set and the lattice node models are 59-dimensional RSCU vectors, and the models change during training to become similar to common or repeated patterns in the training set. The algorithm is fully described elsewhere <abbrgrp><abbr bid="B27">27</abbr></abbrgrp>, but we briefly summarize for our context:</p>
            <p>(1) A vector (<it>X</it><sub><it>i</it></sub>), corresponding to a gene's RSCU values, is loaded from the training dataset.</p>
            <p>(2) The lattice node is found whose model vector most closely resembles the input pattern. This node is denoted the 'winning node'.</p>
            <p>(3) The winning node's model, <it>W </it>(as well as a certain number of 'neighbourhood bubble' node models) is changed to be more similar to the input vector by the equation:</p>
            <p><it>W</it><sub><it>new </it></sub>= <it>W</it><sub><it>old </it></sub>+ &#951; (<it>X</it><sub><it>i </it></sub>- <it>W</it><sub><it>old</it></sub>) &#160;&#160;&#160; (2)</p>
            <p>(4) Steps 1 to 3 are repeated for each vector in the training dataset. Once all the vectors have been processed, an epoch is said to have been completed. In this study, all SOMs are trained for 3000 epochs.</p>
            <p>The 'neighbourhood bubble' mentioned in step 3 is a group of nodes centered at the winning node. The radius of this bubble is initialised to be large and is linearly decreased during training until only the winning node's model is changed. Changing the models on the winning node's neighbours allows the clustering of similar patterns. The learning rate (&#951;) in step 3 is initialised close to 1 and is also linearly decreased during training until it is held constant at a predefined fraction. The linear decrease in learning rate means that each node's model will not get changed as much or as often as training progresses. Two recognised phases of training result: an ordering phase where the lattice takes its general shape, and a convergence phase where the nodes get more specialised to respond to specific patterns.</p>
            <p>In this work, similarity between two vectors is measured by finding the cosine of the angle between them. A cosine of 1 indicates that two vectors point in exactly the same direction (maximal similarity), while a cosine of 0 indicates that they are orthogonal and share no common trend.</p>
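            <p>The training loop and the cosine similarity measure can be illustrated with the short sketch below. It is written in plain Python for readability and is not the RescueNet implementation: the function names, the square shape of the neighbourhood bubble, the initial radius, the range of the random initial values, and the default learning-rate settings are all assumptions chosen for the example.</p>
            <p><![CDATA[
```python
import math
import random


def cosine(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def train_som(data, rows, cols, epochs, eta_start=0.9, eta_floor=0.01, seed=0):
    """Train a rows x cols SOM; returns {(row, col): model vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Models begin as random values.
    nodes = {(r, c): [rng.uniform(-1, 1) for _ in range(dim)]
             for r in range(rows) for c in range(cols)}
    start_radius = max(rows, cols) / 2
    for epoch in range(epochs):
        frac = epoch / max(1, epochs - 1)
        # Learning rate and bubble radius both decrease linearly;
        # the learning rate is held at a floor once it gets there.
        eta = max(eta_floor, eta_start * (1 - frac))
        radius = start_radius * (1 - frac)
        for x in data:
            # Step 2: the winning node has the most similar model.
            win = max(nodes, key=lambda n: cosine(nodes[n], x))
            # Step 3: move the winner and its bubble towards the input.
            for r, c in list(nodes):
                if max(abs(r - win[0]), abs(c - win[1])) <= radius:
                    w = nodes[(r, c)]
                    nodes[(r, c)] = [wi + eta * (xi - wi)
                                     for wi, xi in zip(w, x)]
    return nodes


def best_match(nodes, x):
    """Return the winning node for x and its cosine similarity score."""
    node = max(nodes, key=lambda n: cosine(nodes[n], x))
    return node, cosine(nodes[node], x)
```
]]></p>
            <p>In use, each training vector would be a 59-dimensional RSCU vector and the map would be far larger than a toy example; the function <it>best_match</it> corresponds to presenting a new sequence to the trained map.</p>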
            <p>The SOM is used mainly in data visualisation, as it can be effectively used to reduce high-dimensional data to a two dimensional map. One of the main strengths of the method is the ability to automatically cluster similar patterns in its training set. In the context of codon usage data, the SOM has been previously used to cluster genes on the basis of similar codon usage <abbrgrp><abbr bid="B28">28</abbr><abbr bid="B29">29</abbr><abbr bid="B30">30</abbr></abbrgrp>. However, the previous studies have concentrated on identifying genes with atypical codon usage and hypothesising their origin as horizontally transferred genes. It has since been shown that atypical codon usage is not sufficient evidence to show that a gene has origins in horizontal transfer events <abbrgrp><abbr bid="B22">22</abbr><abbr bid="B31">31</abbr></abbrgrp>. In contrast, this study uses the fact that once a SOM has been trained using codon usage information, the nodes of the SOM encapsulate models that are representative of the major codon usage patterns within the training set.</p>
            <p>When a new sequence is presented to a trained SOM, the node whose model is most similar to the sequence can readily be identified, along with, most importantly, the degree of similarity. The similarity (cosine) score is then converted to a measure of the probability that the sequence is protein-coding. This is achieved by finding the mean cosine score received by a set of random-length, random-sequence genes that are generated using the same nucleotide bias as the mutational bias found in the genome. Using the mean score, each similarity score can be converted to a z-score, which in effect indicates how likely it is that the sequence is not a random sequence.</p>
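            <p>One possible reading of this score conversion is sketched below. The text specifies only that the mean score of the random sequences is used; the use of the sample standard deviation to form the z-score, the mapping of the z-score to a probability through the normal cumulative distribution, the length range of the random sequences, and all function and parameter names are assumptions made for illustration. The argument <it>score_fn</it> is a hypothetical stand-in for whatever function returns the best cosine score of a sequence against the trained map.</p>
            <p><![CDATA[
```python
import math
import random
import statistics


def random_sequence(n_codons, base_freqs, rng):
    """Random DNA of n_codons triplets drawn with the given base frequencies."""
    bases = list(base_freqs)
    weights = [base_freqs[b] for b in bases]
    return "".join(rng.choices(bases, weights=weights, k=3 * n_codons))


def background_stats(score_fn, base_freqs, n_samples=200,
                     min_codons=75, max_codons=500, seed=0):
    """Mean and standard deviation of score_fn over random-length sequences."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_samples):
        n_codons = rng.randint(min_codons, max_codons)
        scores.append(score_fn(random_sequence(n_codons, base_freqs, rng)))
    return statistics.mean(scores), statistics.stdev(scores)


def coding_probability(score, mean, stdev):
    """Convert a score to a z-score, then to a normal CDF value (assumed)."""
    z = (score - mean) / stdev
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))
```
]]></p>
            <p>Under these assumptions, a sequence scoring exactly at the random-sequence mean receives a value of 0.5, and values approach 1 as the score rises further above that mean.</p>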
         </sec>
         <sec>
            <st>
               <p>Using the SOM to find genes</p>
            </st>
            <p>Separate SOMs are trained for each of the 15 genomes under test. The SOMs are each 15&#215;15 nodes in size and trained for 3000 epochs. Finding genes via homology search is usually the first step to be carried out in a genome annotation process, so our training sets consist of all genes in the relevant organism that were previously confirmed by homology searches and are also at least 750 bp long. Note that unlike other gene-finding methods, no statistical knowledge of non-coding DNA is necessary as part of the SOM's training.</p>
            <p>In analysing an entire genome sequence, a sliding window is used to split each of the six reading frames into small samples. The default window size is 110 triplets, which has been chosen as a balance between having a window size long enough to evaluate a meaningful RSCU vector, and short enough to predict short genes. Each window is offset from the next by 10 triplets. An arbitrary probability score of 0.1 is used as the threshold for deciding whether a sequence is protein-coding, and all samples that score higher than 0.1 are recorded as predictions. If a stop codon lies in the sample, the gene prediction is annotated as having ended at that point. Note, however, that no effort is made to find stop codons if they are not within the prediction, and no effort whatsoever is made to find any start codons in the prediction.</p>
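            <p>The six-frame sliding window can be sketched as a simple generator. The window and offset defaults follow the values given above; the function names and the convention of reporting start positions relative to the strand being read are assumptions of this example, and stop-codon handling is left out.</p>
            <p><![CDATA[
```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def six_frame_windows(seq, window=110, step=10):
    """Yield (strand, frame, start, subsequence) for every window.

    window and step are given in triplets; start is a 0-based nucleotide
    offset on the strand being read (an assumed convention).
    """
    win_nt, step_nt = 3 * window, 3 * step
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for start in range(frame, len(s) - win_nt + 1, step_nt):
                yield strand, frame, start, s[start:start + win_nt]
```
]]></p>
            <p>Each yielded subsequence would then be converted to an RSCU vector and scored against the trained map.</p>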
         </sec>
         <sec>
            <st>
               <p>Post-processing the predictions</p>
            </st>
            <p>Once all the samples are processed, some simple post-processing is carried out. Naturally, all same-frame concurrent predictions are merged. Predictions that are totally overlapped by another prediction are deleted if they are less than 75% the length of the other. Similarly, any prediction in which more than half its length is overlapped is deleted if it is less than half the length of the other prediction. Alternatively, any prediction that is less than 90% as long as the overlapping prediction and receives a lower score is deleted. Finally, any prediction that is overlapped on both ends to a total overlap of at least 70% is also deleted. A prediction size of 75 codons was found by trial and error to be the smallest gene-coding region that could accurately be found using RescueNet.</p>
            <p>While the above rules aim to delete smaller erroneous predictions, it is recognised that the loose nature of the rules leaves room for many other overlapping predictions. However, it was found that in many overlapping cases it was difficult to decide which prediction to delete. Therefore, the best solution is to leave both predictions rather than misleading an annotator by giving only one, possibly erroneous, prediction.</p>
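            <p>Two of the post-processing rules, the merging of same-frame predictions and the deletion of totally overlapped predictions shorter than 75% of the overlapping one, can be sketched as follows. The remaining rules are not shown. The tuple layout (frame label, start, end with an exclusive end, all in one genome coordinate system), the treatment of touching windows as mergeable, and the function names are assumptions made for the example.</p>
            <p><![CDATA[
```python
def merge_same_frame(preds):
    """Merge overlapping or touching predictions that share a frame.

    preds: iterable of (frame, start, end) tuples, end exclusive.
    """
    merged = []
    for frame, start, end in sorted(preds):
        last = merged[-1] if merged else None
        if last and last[0] == frame and start <= last[2]:
            # Extend the previous prediction in this frame.
            merged[-1] = (frame, last[1], max(last[2], end))
        else:
            merged.append((frame, start, end))
    return merged


def drop_contained(preds, ratio=0.75):
    """Delete predictions lying wholly inside a sufficiently longer one."""
    kept = []
    for p in preds:
        length = p[2] - p[1]
        contained = any(
            q is not p
            and q[1] <= p[1] and p[2] <= q[2]
            and length < ratio * (q[2] - q[1])
            for q in preds
        )
        if not contained:
            kept.append(p)
    return kept
```
]]></p>
            <p>For example, two overlapping windows in frame +1 spanning 0 to 330 and 30 to 360 merge into a single prediction spanning 0 to 360, and a 100 bp prediction in another frame lying entirely within it is then removed.</p>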
            <p>In assessing the accuracy of our method, we had to take into account that our method will not predict most start sites, and some stop sites, exactly. We assume that our method will be of most use to annotation teams who rigorously inspect the results of our method in conjunction with the results of other gene prediction programs. Such annotation teams base their final genome annotation on widespread evidence, so the fact that our method may produce inexact start and stop sites will not be a major disadvantage. Therefore, a correct prediction is defined here as one that predicts more than 50% of an annotated gene in the correct frame. This criterion means that only predictions that are useful to annotators are considered to be correct.</p>
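            <p>The correctness criterion amounts to a simple interval test, sketched below. The tuple layout (frame label, start, end with an exclusive end) and the function name are assumptions of the example.</p>
            <p><![CDATA[
```python
def covers_gene(pred, gene, min_fraction=0.5):
    """True if pred covers more than min_fraction of gene in the same frame.

    pred and gene are (frame, start, end) tuples, end exclusive,
    expressed in the same coordinate system.
    """
    if pred[0] != gene[0]:
        return False
    overlap = min(pred[2], gene[2]) - max(pred[1], gene[1])
    return overlap > min_fraction * (gene[2] - gene[1])
```
]]></p>
            <p>Note that the comparison is strict: a prediction covering exactly half of an annotated gene is not counted as correct under this reading.</p>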
         </sec>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <sec>
            <st>
               <p>Evaluating accuracy rates</p>
            </st>
            <p>Previous studies discuss the possibility that the GenBank annotation of various genomes may be incomplete or incorrect in some cases <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B32">32</abbr></abbrgrp>. Since many GenBank annotations are not experimentally corroborated, this possibility remains strong. Large-scale benchmarking of gene-prediction algorithms is therefore difficult, because few 'gold standard' annotations exist for prokaryotic genomes. Also, in most cases hypothetical gene annotations in the public databases have their roots in the predictions of an <it>ab initio </it>method, thus biasing any comparison of accuracy in favour of the particular method used in the annotation of that genome. However, for the purpose of defining accuracy in this study we must assume that all GenBank annotations are correct and complete. Sensitivity (Sn) is defined here as the percentage of GenBank gene records that are predicted correctly by our method. Specificity (Sp) is defined as the percentage of total RescueNet gene predictions that are correct.</p>
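            <p>These two definitions can be written out as a short function. The sketch assumes that annotations and predictions are available as lists and that a correctness test is supplied by the caller; the names <it>sensitivity_specificity</it> and <it>is_correct</it> are hypothetical.</p>
            <p><![CDATA[
```python
def sensitivity_specificity(genes, preds, is_correct):
    """Return (Sn, Sp) as percentages.

    genes: annotated gene records; preds: predictions;
    is_correct(pred, gene) -> bool decides whether pred correctly
    predicts gene.
    """
    # Sn: share of annotated genes matched by at least one prediction.
    found = sum(any(is_correct(p, g) for p in preds) for g in genes)
    # Sp: share of predictions that match at least one annotated gene.
    correct = sum(any(is_correct(p, g) for g in genes) for p in preds)
    sn = 100.0 * found / len(genes) if genes else 0.0
    sp = 100.0 * correct / len(preds) if preds else 0.0
    return sn, sp
```
]]></p>
            <p>Because every GenBank record is assumed correct and complete, any prediction that matches no annotated gene lowers Sp, even if it corresponds to a real but unannotated gene.</p>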
            <p>Table <tblr tid="T1">1</tblr> shows the results of RescueNet's predictions in 15 genomes. All results were generated using the default settings described above. Sensitivity and specificity values for each genome are shown, along with sensitivity values for those genes that are above the prediction length threshold of 75 codons (225 bp) and sensitivity values for those genes that have database matches.</p>
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Accuracy of RescueNet in 15 bacterial genomes.</p>
               </caption>
               <tblbdy cols="8">
                  <r>
                     <c ca="left">
                        <p>
                           <b>Organism</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>GC %</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Number of Genes Annotated</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Training Set Size</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Sn. (%)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Sn. >225 bp (%)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Sn. Conserved (%)</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>Sp. (%)</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c cspan="8">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>
                              <it>Buchnera</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>26.2</p>
                     </c>
                     <c ca="center">
                        <p>564</p>
                     </c>
                     <c ca="center">
                        <p>292</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>88.65</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>91.24</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>89.97</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>96.18</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>
                              <it>B. burgdorferi</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>28.6</p>
                     </c>
                     <c ca="center">
                        <p>857</p>
                     </c>
                     <c ca="center">
                        <p>403</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>90.54</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>96.39</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>95.66</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>98.02</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>
                              <it>C. jejuni</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>30.6</p>
                     </c>
                     <c ca="center">
                        <p>1654</p>
                     </c>
                     <c ca="center">
                        <p>673</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>90.14</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>95.08</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>92.14</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>99.23</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>
                              <it>M. jannaschii</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>31.4</p>
                     </c>
                     <c ca="center">
                        <p>1715</p>
                     </c>
                     <c ca="center">
                        <p>692</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>88.39</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>91.82</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>91.02</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>96.50</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>
                              <it>M. genitalium</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>31.7</p>
                     </c>
                     <c ca="center">
                        <p>483</p>
                     </c>
                     <c ca="center">
                        <p>301</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>89.44</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>91.52</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>89.89</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>92.32</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>
                              <it>H. influenzae</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>38.0</p>
                     </c>
                     <c ca="center">
                        <p>1754</p>
                     </c>
                     <c ca="center">
                        <p>885</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>91.56</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>96.34</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>93.10</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>98.01</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>
                              <it>H. pylori</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>38.9</p>
                     </c>
                     <c ca="center">
                        <p>1593</p>
                     </c>
                     <c ca="center">
                        <p>712</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>91.39</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>96.80</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>95.70</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>95.49</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>
                              <it>A. aeolicus</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>43.3</p>
                     </c>
                     <c ca="center">
                        <p>1517</p>
                     </c>
                     <c ca="center">
                        <p>723</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>95.78</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>96.54</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>95.57</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>87.80</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>
                              <it>B. subtilis</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>43.5</p>
                     </c>
                     <c ca="center">
                        <p>4220</p>
                     </c>
                     <c ca="center">
                        <p>1832</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>87.93</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>94.95</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>89.86</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>89.47</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>
                              <it>Synechocystis</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>47.6</p>
                     </c>
                     <c ca="center">
                        <p>3169</p>
                     </c>
                     <c ca="center">
                        <p>954</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>93.18</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>96.53</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>91.55</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>90.95</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>
                              <it>Y. pestis</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>47.6</p>
                     </c>
                     <c ca="center">
                        <p>4043</p>
                     </c>
                     <c ca="center">
                        <p>1640</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>91.04</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>94.84</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>93.66</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>88.29</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>
                              <it>E. coli</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>50.8</p>
                     </c>
                     <c ca="center">
                        <p>4290</p>
                     </c>
                     <c ca="center">
                        <p>1983</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>89.39</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>92.85</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>92.54</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>89.04</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>
                              <it>D. radiodurans</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>67.0</p>
                     </c>
                     <c ca="center">
                        <p>2622</p>
                     </c>
                     <c ca="center">
                        <p>1436</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>84.28</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>85.65</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>92.61</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>95.50</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>
                              <it>R. solanacearum</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>67.0</p>
                     </c>
                     <c ca="center">
                        <p>3442</p>
                     </c>
                     <c ca="center">
                        <p>1748</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>84.74</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>88.60</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>89.82</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>93.20</b>
                        </p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>
                              <it>S. coelicolor</it>
                           </b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>72.1</p>
                     </c>
                     <c ca="center">
                        <p>7851</p>
                     </c>
                     <c ca="center">
                        <p>956</p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>88.35</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>91.55</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>91.55</b>
                        </p>
                     </c>
                     <c ca="center">
                        <p>
                           <b>90.10</b>
                        </p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>The genomes are listed according to ascending G+C content. For each genome, the table shows: Genome GC content (GC %), the number of genes annotated in GenBank for that genome, the number of genes in the RescueNet training set, overall RescueNet sensitivity (Sn.), the sensitivity of RescueNet in finding genes longer than the 225 bp minimum prediction size (Sn. >225 bp), the sensitivity of RescueNet in finding genes that have been confirmed by homology with other genes in GenBank (Sn. Conserved), and finally, overall RescueNet specificity (Sp.)</p>
               </tblfn>
            </tbl>
            <p>Sequence data used in this study include the following 15 genomes and associated published genes available from the GenBank database: <it>A. aeolicus </it><abbrgrp><abbr bid="B33">33</abbr></abbrgrp>, <it>B. subtilis </it><abbrgrp><abbr bid="B34">34</abbr></abbrgrp>, <it>Buchnera sp. </it><abbrgrp><abbr bid="B35">35</abbr></abbrgrp>, <it>B. burgdorferi </it><abbrgrp><abbr bid="B36">36</abbr></abbrgrp>, <it>C. jejuni </it><abbrgrp><abbr bid="B37">37</abbr></abbrgrp>, <it>D. radiodurans </it>(chromosome 1) <abbrgrp><abbr bid="B38">38</abbr></abbrgrp>, <it>E. coli </it><abbrgrp><abbr bid="B39">39</abbr></abbrgrp>, <it>H. influenzae </it><abbrgrp><abbr bid="B40">40</abbr></abbrgrp>, <it>H. pylori </it><abbrgrp><abbr bid="B41">41</abbr></abbrgrp>, <it>M. genitalium </it><abbrgrp><abbr bid="B42">42</abbr></abbrgrp>, <it>M. jannaschii </it><abbrgrp><abbr bid="B43">43</abbr></abbrgrp>, <it>R. solanacearum </it><abbrgrp><abbr bid="B44">44</abbr></abbrgrp>, <it>S. coelicolor </it><abbrgrp><abbr bid="B45">45</abbr></abbrgrp>, <it>Synechocystis sp. </it><abbrgrp><abbr bid="B46">46</abbr></abbrgrp>, and <it>Y. pestis </it><abbrgrp><abbr bid="B47">47</abbr></abbrgrp>. These genomes were chosen to be representative of a wide range of GC content.</p>
         </sec>
         <sec>
            <st>
               <p>High G+C content genomes</p>
            </st>
            <p>Three of the genomes tested have very high G+C content (<it>D. radiodurans</it>, <it>R. solanacearum </it>and <it>S. coelicolor</it>). High G+C content genomes present a problem to many gene-finding methods because of the relative infrequency of randomly occurring stop codons. The scarcity of stop codons has the effect of a large number of long, overlapping ORFs occurring in the sequence, relatively few of which are actually protein-coding. Many of the current gene-finders fail to discriminate accurately between coding and non-coding ORFs in this type of situation.</p>
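            <p>The effect of G+C content on stop codon frequency can be illustrated with a simple back-of-the-envelope model. The sketch below is not part of RescueNet; it assumes that bases occur independently, with G and C equally frequent and A and T equally frequent, which real genomes only approximate. The function names are hypothetical.</p>

```python
def stop_codon_probability(gc_fraction: float) -> float:
    """Probability that a random codon is TAA, TAG or TGA.

    Assumption: independent bases with P(G) = P(C) = gc/2 and
    P(A) = P(T) = (1 - gc)/2.
    """
    p_at = (1.0 - gc_fraction) / 2.0  # probability of A, and also of T
    p_gc = gc_fraction / 2.0          # probability of G, and also of C
    taa = p_at * p_at * p_at
    tag = p_at * p_at * p_gc
    tga = p_at * p_gc * p_at
    return taa + tag + tga


def expected_codons_to_stop(gc_fraction: float) -> float:
    """Mean number of codons read up to and including the first stop codon
    (mean of a geometric distribution)."""
    return 1.0 / stop_codon_probability(gc_fraction)


# G+C percentages taken from Table 1.
for name, gc_percent in [("M. genitalium", 31.7),
                         ("E. coli", 50.8),
                         ("S. coelicolor", 72.1)]:
    codons = expected_codons_to_stop(gc_percent / 100.0)
    print(f"{name}: about {codons:.0f} codons between random stops")
```

            <p>Under these assumptions the expected distance between randomly occurring stop codons grows several-fold between the low and high G+C genomes in Table 1, which is consistent with the long, overlapping non-coding ORFs described above.</p>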
            <p>In our method, the relatively high specificity in each high G+C content genome suggests that RescueNet may have advantages in their annotation (see Table <tblr tid="T1">1</tblr>). To illustrate a case where RescueNet may be of practical use, we can consider the ORF annotated as DR1142 (see Figure <figr fid="F1">1</figr>) from <it>D. radiodurans</it>. This ORF is annotated to be protein-coding on the basis of the Glimmer2 prediction only. The RescueNet prediction in this area overlaps DR1142, but on the opposite strand. This type of situation, where a RescueNet prediction directly contradicts a GenBank/Glimmer2 annotation, occurs at least 23 times in the <it>D. radiodurans </it>genome. It is entirely possible that the Glimmer2 predictions are wrong in some of these cases, and the RescueNet predictions correct, but this cannot be proven without biochemical characterisation of the relevant gene. However, in the specific case of the DR1142 annotation, the RescueNet prediction has a much stronger database match than the GenBank annotation, and so has a high possibility of being correct.</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>Screenshot from the Artemis sequence viewer <abbrgrp><abbr bid="B49">49</abbr></abbrgrp> showing a sample region of <it>D. radiodurans </it>and accompanying RescueNet predictions</p>
               </caption>
               <text>
                  <p>Screenshot from the Artemis sequence viewer <abbrgrp><abbr bid="B49">49</abbr></abbrgrp> showing a sample region of <it>D. radiodurans </it>and accompanying RescueNet predictions. Annotated genes are shown as white blocks, and predictions are shown in-frame as shaded blocks. Note the relative infrequency of stop codons (vertical lines in each frame) and the many ORFs that are not protein-coding regions. Note also the selected gene DR1142 and the contradicting RescueNet prediction. DR1142 is a hypothetical gene, predicted to be so by Glimmer2, and there is a strong possibility that the CDS marked by RescueNet is the correct prediction. The possibility is also raised by RescueNet that the gene DR1143 may be longer than previously annotated and contains a frameshift.</p>
               </text>
               <graphic file="1471-2105-5-23-1"/>
            </fig>
            <p>Another interesting pointer to the advantages of RescueNet in high G+C content genomes is the substantially higher percentage of genes with database matches that are correctly predicted by RescueNet (see Table <tblr tid="T1">1</tblr>). In <it>D. radiodurans</it>, for example, 92.61% of genes with database matches are correctly predicted by RescueNet compared with only 84.28% of the total GenBank gene annotations. These figures suggest that hypothetical genes that are predicted only by Markov-based methods are poorly recognised by RescueNet, possibly because many hypothetical genes in high G+C content genomes may in fact be false gene predictions.</p>
         </sec>
         <sec>
            <st>
               <p>Predicting the location of frameshifts</p>
            </st>
            <p>The general location of frameshifts within a gene sequence can be found by our method. Two features of our approach facilitate this. Firstly, even though the overall codon usage of a frameshifted gene could seem unusual, the two coding sections of the gene should each retain the organism's native codon usage. Secondly, our approach does not require that a prediction be bounded by a start and a stop codon. The sliding window used in our algorithm can therefore predict the correct coding frames each side of the frameshift.</p>
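            <p>The idea that a sliding window can recover the coding frame on each side of a frameshift can be shown with a toy example. The sketch below is only an illustration under strong assumptions: a hand-made set of "preferred" codons stands in for a trained coding measure, and the names PREFERRED, window_score and best_frames are hypothetical. It does not reproduce the Self-Organizing Map or the RescueNet scoring scheme.</p>

```python
# Toy stand-in for a coding measure: codons assumed to be typical of genes.
PREFERRED = {"GCC", "CTG", "AAG"}
WINDOW_CODONS = 4


def window_score(seq, start, n_codons=WINDOW_CODONS):
    """Count preferred codons among n_codons codons read from position start."""
    codons = (seq[start + 3 * k:start + 3 * k + 3] for k in range(n_codons))
    return sum(1 for codon in codons if codon in PREFERRED)


def best_frames(seq, n_codons=WINDOW_CODONS):
    """Slide a window one codon at a time and return, for each window
    position, the frame (0, 1 or 2) with the highest score.
    Ties go to the lowest-numbered frame."""
    window_nt = 3 * n_codons
    frames = []
    for pos in range(0, len(seq) - window_nt - 1, 3):
        frames.append(
            max(range(3), key=lambda f: window_score(seq, pos + f, n_codons))
        )
    return frames


unit = "GCCCTGAAG"                      # three preferred codons in frame 0
gene = unit * 4 + "A" + unit * 4        # one inserted base shifts the frame
frames = best_frames(gene)
print(frames)
```

            <p>In this toy sequence, windows upstream of the inserted base score best in frame 0 and windows downstream score best in frame 1, so the position at which the preferred frame changes marks the approximate location of the frameshift. No start or stop codon is consulted at any point.</p>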
            <p>In an interesting example in Figure <figr fid="F1">1</figr>, two RescueNet predictions overlap the <it>D. radiodurans </it>gene DR1143 in such a way that it seems that there may be a frameshift that extends the protein-coding region of the gene past the annotated stop codon. In fact, combining the two RescueNet predictions offers a better database match to the same genes that the original annotation matches. This increases the possibility that the actual gene contains an authentic frameshift or at least that the extra RescueNet prediction is an evolutionary artefact.</p>
            <p>Figure <figr fid="F2">2</figr> shows another example of frameshifts which are detected by RescueNet. In this case, the <it>H. influenzae </it>genes HI0218 and HI0220 both contain frameshifts, but both are handled by RescueNet's predictions. Note that the GeneMark algorithms are known to show the location of frameshifted regions in much the same manner as we have described, but our approach has required no modification to our basic algorithm in order to facilitate the prediction of frameshifted genes.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>Artemis screenshot showing a sample region of the <it>H. influenzae </it>genome and associated RescueNet predictions</p>
               </caption>
               <text>
                  <p>Artemis screenshot showing a sample region of the <it>H. influenzae </it>genome and associated RescueNet predictions. As in Fig. <figr fid="F1">1</figr>, the annotated genes are shown as white blocks and the RescueNet predictions are shown in-frame as shaded blocks. Note that genes HI0218 and HI0220 contain authentic frameshifts. RescueNet gives two predictions that overlap each of these genes, and they meet near the frameshift point.</p>
               </text>
               <graphic file="1471-2105-5-23-2"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Comparison with a Markov-based method</p>
            </st>
            <p>There may be a perception that any method using codon usage as the coding measure will only give predictions that are a subset of the predictions given by a Markov-based method that uses a 4<sup>th </sup>or 5<sup>th </sup>order model. To counter this argument, we compared the predictions of our method in two genomes (<it>H. influenzae </it>and <it>H. pylori</it>) to those of the web-based version of GeneMark.hmm 2.1 for prokaryotes <url>http://opal.biology.gatech.edu/GeneMark/gmhmm2_prok.cgi</url>, which generated results using two models: the 'typical' and 'atypical' models.</p>
            <p>The published sensitivity of GeneMark.hmm in the <it>H. influenzae </it>genome (96.2%, see <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>) is higher than that of our method, and the published specificity (89.8%) is lower, so GeneMark.hmm should give more predictions overall for this genome. However, our method correctly predicts 11 <it>H. influenzae </it>genes that are not predicted by GeneMark.hmm using 5<sup>th </sup>order models, and 14 genes that are not predicted by GeneMark.hmm using 4<sup>th </sup>order models. In the <it>H. pylori </it>genome, GeneMark.hmm again has a higher sensitivity and a lower specificity (94.0% and 91.3% respectively), but even more genes are predicted exclusively by our method: 25 genes as compared to the 5<sup>th </sup>order GeneMark.hmm models and 30 genes as compared to the 4<sup>th </sup>order models. Although these genes represent a small proportion of the total number of genes in the respective organisms, the fact that they are only predicted by RescueNet gives some indication of the advantage of using RescueNet in conjunction with other gene prediction methods.</p>
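            <p>Counting the genes predicted exclusively by one method amounts to a set difference over the correctly predicted gene identifiers. The following minimal sketch shows the bookkeeping only; the identifiers and the function name are hypothetical placeholders and are not taken from the RescueNet or GeneMark.hmm output.</p>

```python
def exclusive_to_first(first_method, second_method):
    """Return, in sorted order, the gene identifiers correctly predicted by
    the first method but absent from the second method's predictions."""
    return sorted(set(first_method) - set(second_method))


# Hypothetical identifiers, for illustration only.
som_hits = {"geneA", "geneB", "geneC", "geneD"}
markov_hits = {"geneB", "geneC", "geneE"}

only_som = exclusive_to_first(som_hits, markov_hits)
shared = sorted(som_hits.intersection(markov_hits))

print("predicted only by the first method:", only_som)
print("predicted by both methods:", shared)
```

            <p>A non-empty difference shows that the first method's predictions are not simply a subset of the second's, which is the point of the comparison above.</p>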
         </sec>
         <sec>
            <st>
               <p>Possible future improvements</p>
            </st>
            <p>RSCU is only one of many possible criteria with which to measure coding potential (see <abbrgrp><abbr bid="B48">48</abbr></abbrgrp> for a review of others). In-phase hexamers are accepted as the most accurate k-mer frequency based measure of coding potential, and so their use as the coding measure in a Self-Organizing Map may offer improvement in accuracy over RescueNet. However, the larger space dimension of the hexamer coding measure may force a larger sliding window to be used and therefore the use of hexamers could actually decrease the precision of gene prediction.</p>
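            <p>For readers unfamiliar with the measure, relative synonymous codon usage (RSCU) is commonly defined as the observed count of a codon divided by the mean count of all codons that encode the same amino acid. The sketch below computes it under that standard definition, using the standard genetic code; it is an illustration only and makes no claim about how RescueNet builds its input vectors. The function name rscu is an assumption.</p>

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(sequence):
    """Return a dict mapping codon to RSCU for an in-frame DNA sequence.

    Stop codons and single-codon amino acids (Met, Trp) are skipped, as are
    amino acids that do not occur in the sequence.
    """
    seq = sequence.upper()
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))

    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, codons in families.items():
        if aa == "*" or len(codons) == 1:
            continue
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        mean = total / len(codons)  # expected count if usage were uniform
        for c in codons:
            values[c] = counts[c] / mean
    return values


example = rscu("GCCGCCGCTGCA")  # four alanine codons
print(example)
```

            <p>With stop codons, Met and Trp excluded, this definition yields at most 59 values per window, whereas an in-phase hexamer measure has 4<sup>6</sup> = 4096 possible entries. That difference in dimension is why a hexamer-based measure would need far more sequence per window to be estimated reliably.</p>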
            <p>The future use of alternative coding measures with our approach may also help to overcome difficulties in recognising genes that are reputed to be horizontally transferred in origin. Horizontally transferred genes would be more likely to have dissimilar codon usage patterns to other genes in the genome. Since our approach currently relies on the codon usage patterns it finds in the training set, it is unlikely to mark areas of unseen codon usage as protein-coding regions. Note, however, we are not suggesting all genes that were not recognised by our approach are of horizontally transferred origin. There are many explanations for a gene displaying atypical codon usage, and codon usage cannot be used as an accurate indicator of horizontal transfer.</p>
            <p>There may be other ways to improve the accuracy of our method. The current implementation has a rather simple post-processing step that does not rely on modifying the prediction in order to include start or stop codons. While the practice of not constraining a prediction to be bounded by a start and stop codon stands in stark contrast to other methods, we did not wish to lengthen or shorten any predictions artificially, since doing so can mislead annotation teams (especially in start site annotation). Relatively simple post-processing steps may, in fact, be advantageous. Our predictions represent a raw account of regions of the genome that display typical or native codon usage patterns, and this in itself may be of interest to annotation teams who use codon usage plots as the basis for some genomic feature annotations.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>Gene-finding in prokaryotic genomes is still not a completely solved problem, partly because current methods use a limited number of models to represent the training data. In this paper, we have introduced an alternative, independent approach to the problem. The Self-Organizing Map approach has the potential to overcome the issue of variation in the statistical properties of the training set data, and can automatically train a representative number of gene-models, depending on the degree of variation within the training data.</p>
         <p>While the current implementation of our approach produces lower raw sensitivity scores in comparison to established Markov-based techniques, we have clearly shown that our method can predict some genes that other methods cannot. We have also demonstrated advantages in annotating the traditionally 'difficult' high G+C content genomes. Annotation teams who are concerned with the complete and accurate annotation of a sequenced genome should find our method useful when used alongside other gene-finding methods. The relatively high specificity of our method, coupled with the independent nature of the algorithm, should make it a useful tool in confirming the predictions of other software programs and in some cases pointing out areas of conflicting or contradictory predictions that are worthy of further examination.</p>
      </sec>
      <sec>
         <st>
            <p>Availability</p>
         </st>
         <p>Project name: RescueNet</p>
         <p>Project home page: <url>http://bioinf.nuigalway.ie/RescueNet/</url></p>
         <p>Operating systems: Windows, Linux, IRIX, Digital Unix. Source code also available.</p>
         <p>Programming Language: C++</p>
         <p>Licence: GNU GPL</p>
         <p>Restrictions to use by non-academics: Please contact the authors.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>SM conceived of the study, and designed, implemented and tested the RescueNet software. JMcI, TS and AG supervised, and participated in the design of, the study. All authors read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>The authors would like to thank Kim Rutherford of the Wellcome Trust Sanger Institute Pathogen Sequencing Group for his help and suggestions. We also thank the two anonymous reviewers for their useful comments. S.M. is funded by an EMBARK postgraduate fellowship from the Irish Research Council for Science, Engineering &amp; Technology.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Current methods of gene prediction, their strengths and weaknesses</p>
            </title>
            <aug>
               <au>
                  <snm>Mathe</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Sagot</snm>
                  <fnm>MF</fnm>
               </au>
               <au>
                  <snm>Schiex</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Rouze</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2002</pubdate>
            <volume>30</volume>
            <fpage>4103</fpage>
            <lpage>4117</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkf543</pubid>
                  <pubid idtype="pmpid">12364589</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Computational methods for the identification of genes in vertebrate genomic sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Claverie</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Hum Mol Genet</source>
            <pubdate>1997</pubdate>
            <volume>6</volume>
            <fpage>1735</fpage>
            <lpage>1744</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/hmg/6.10.1735</pubid>
                  <pubid idtype="pmpid">9300666</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Finding genes by computer: the state of the art</p>
            </title>
            <aug>
               <au>
                  <snm>Fickett</snm>
                  <fnm>JW</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>1996</pubdate>
            <volume>12</volume>
            <fpage>316</fpage>
            <lpage>320</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0168-9525(96)10038-X</pubid>
                  <pubid idtype="pmpid">8783942</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>GeneMark.hmm: new solutions for gene finding</p>
            </title>
            <aug>
               <au>
                  <snm>Lukashin</snm>
                  <fnm>AV</fnm>
               </au>
               <au>
                  <snm>Borodovsky</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1998</pubdate>
            <volume>26</volume>
            <fpage>1107</fpage>
            <lpage>1115</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/26.4.1107</pubid>
                  <pubid idtype="pmpid">9461475</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Improved microbial gene identification with GLIMMER</p>
            </title>
            <aug>
               <au>
                  <snm>Delcher</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Harmon</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Kasif</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1999</pubdate>
            <volume>27</volume>
            <fpage>4636</fpage>
            <lpage>4641</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/27.23.4636</pubid>
                  <pubid idtype="pmpid">10556321</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>An assessment of gene prediction accuracy in large DNA sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Guigo</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Agarwal</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Abril</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Burset</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Fickett</snm>
                  <fnm>JW</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2000</pubdate>
            <volume>10</volume>
            <fpage>1631</fpage>
            <lpage>1642</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.122800</pubid>
                  <pubid idtype="pmpid">11042160</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Powers and pitfalls in sequence analysis: the 70% hurdle</p>
            </title>
            <aug>
               <au>
                  <snm>Bork</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2000</pubdate>
            <volume>10</volume>
            <fpage>398</fpage>
            <lpage>400</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.10.4.398</pubid>
                  <pubid idtype="pmpid">10779480</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Genome annotation assessment in Drosophila melanogaster</p>
            </title>
            <aug>
               <au>
                  <snm>Reese</snm>
                  <fnm>MG</fnm>
               </au>
               <au>
                  <snm>Hartzell</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Harris</snm>
                  <fnm>NL</fnm>
               </au>
               <au>
                  <snm>Ohler</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Abril</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Lewis</snm>
                  <fnm>SE</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2000</pubdate>
            <volume>10</volume>
            <fpage>483</fpage>
            <lpage>501</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.10.4.483</pubid>
                  <pubid idtype="pmpid">10779488</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Improving gene recognition accuracy by combining predictions from two gene-finding programs</p>
            </title>
            <aug>
               <au>
                  <snm>Rogic</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Ouellette</snm>
                  <fnm>BF</fnm>
               </au>
               <au>
                  <snm>Mackworth</snm>
                  <fnm>AK</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <fpage>1034</fpage>
            <lpage>1045</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/18.8.1034</pubid>
                  <pubid idtype="pmpid">12176826</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Detection of new genes in a bacterial genome using Markov models for three gene classes</p>
            </title>
            <aug>
               <au>
                  <snm>Borodovsky</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>McIninch</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Koonin</snm>
                  <fnm>EV</fnm>
               </au>
               <au>
                  <snm>Rudd</snm>
                  <fnm>KE</fnm>
               </au>
               <au>
                  <snm>Medigue</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Danchin</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1995</pubdate>
            <volume>23</volume>
            <fpage>3554</fpage>
            <lpage>3562</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7567469</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Gene prediction and gene classes in Arabidopsis thaliana</p>
            </title>
            <aug>
               <au>
                  <snm>Mathe</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Dehais</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Pavy</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Rombauts</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Van Montagu</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rouze</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>J Biotechnol</source>
            <pubdate>2000</pubdate>
            <volume>78</volume>
            <fpage>293</fpage>
            <lpage>299</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-1656(00)00196-6</pubid>
                  <pubid idtype="pmpid">10751690</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Correlation between the Abundance of Escherichia coli Transfer RNAs and the Occurrence of the Respective Codons in its Protein Genes: A Proposal for a Synonymous Codon Choice that is Optimal for the E. coli Translational System</p>
            </title>
            <aug>
               <au>
                  <snm>Ikemura</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1981</pubdate>
            <volume>151</volume>
            <fpage>389</fpage>
            <lpage>409</lpage>
            <xrefbib>
               <pubid idtype="pmpid">6175758</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>The selection-mutation-drift theory of synonymous codon usage</p>
            </title>
            <aug>
               <au>
                  <snm>Bulmer</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>1991</pubdate>
            <volume>129</volume>
            <fpage>897</fpage>
            <lpage>907</lpage>
            <xrefbib>
               <pubid idtype="pmpid">1752426</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Codon usage: mutational bias, translational selection, or both?</p>
            </title>
            <aug>
               <au>
                  <snm>Sharp</snm>
                  <fnm>PM</fnm>
               </au>
               <au>
                  <snm>Stenico</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Peden</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Lloyd</snm>
                  <fnm>AT</fnm>
               </au>
            </aug>
            <source>Biochem Soc Trans</source>
            <pubdate>1993</pubdate>
            <volume>21</volume>
            <fpage>835</fpage>
            <lpage>841</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8132077</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Evolution of synonymous codon usage in metazoans</p>
            </title>
            <aug>
               <au>
                  <snm>Duret</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Curr Opin Genet Dev</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>640</fpage>
            <lpage>649</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0959-437X(02)00353-2</pubid>
                  <pubid idtype="pmpid">12433576</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Asymmetric substitution patterns in the two DNA strands of bacteria</p>
            </title>
            <aug>
               <au>
                  <snm>Lobry</snm>
                  <fnm>JR</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>1996</pubdate>
            <volume>13</volume>
            <fpage>660</fpage>
            <lpage>665</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8676740</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Strand asymmetries in DNA evolution</p>
            </title>
            <aug>
               <au>
                  <snm>Francino</snm>
                  <fnm>MP</fnm>
               </au>
               <au>
                  <snm>Ochman</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>1997</pubdate>
            <volume>13</volume>
            <fpage>240</fpage>
            <lpage>245</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(97)01118-9</pubid>
                  <pubid idtype="pmpid">9196330</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Replicational and transcriptional selection on codon usage in Borrelia burgdorferi</p>
            </title>
            <aug>
               <au>
                  <snm>McInerney</snm>
                  <fnm>JO</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>1998</pubdate>
            <volume>95</volume>
            <fpage>10698</fpage>
            <lpage>10703</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.95.18.10698</pubid>
                  <pubid idtype="pmpid">9724767</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Strand compositional asymmetry in bacterial and large viral genomes</p>
            </title>
            <aug>
               <au>
                  <snm>Mrazek</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Karlin</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>1998</pubdate>
            <volume>95</volume>
            <fpage>3720</fpage>
            <lpage>3725</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.95.7.3720</pubid>
                  <pubid idtype="pmpid">9520433</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Proteome composition and codon usage in spirochaetes: species-specific and DNA strand-specific mutational biases</p>
            </title>
            <aug>
               <au>
                  <snm>Lafay</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Lloyd</snm>
                  <fnm>AT</fnm>
               </au>
               <au>
                  <snm>McLean</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Devine</snm>
                  <fnm>KM</fnm>
               </au>
               <au>
                  <snm>Sharp</snm>
                  <fnm>PM</fnm>
               </au>
               <au>
                  <snm>Wolfe</snm>
                  <fnm>KH</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1999</pubdate>
            <volume>27</volume>
            <fpage>1642</fpage>
            <lpage>1649</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/27.7.1642</pubid>
                  <pubid idtype="pmpid">10075995</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Base Composition Skews, Replication Orientation, and Gene Orientation in 12 Prokaryote Genomes</p>
            </title>
            <aug>
               <au>
                  <snm>McLean</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Wolfe</snm>
                  <fnm>KH</fnm>
               </au>
               <au>
                  <snm>Devine</snm>
                  <fnm>KM</fnm>
               </au>
            </aug>
            <source>J Mol Evol</source>
            <pubdate>1998</pubdate>
            <volume>47</volume>
            <fpage>691</fpage>
            <lpage>696</lpage>
            <xrefbib>
               <pubid idtype="pmpid">9847411</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Intragenomic base content variation is a potential source of biases when searching for horizontally transferred genes</p>
            </title>
            <aug>
               <au>
                  <snm>Guindon</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Perriere</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2001</pubdate>
            <volume>18</volume>
            <fpage>1838</fpage>
            <lpage>1840</lpage>
            <xrefbib>
               <pubid idtype="pmpid">11504864</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis</p>
            </title>
            <aug>
               <au>
                  <snm>Duret</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Mouchiroud</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>1999</pubdate>
            <volume>96</volume>
            <fpage>4482</fpage>
            <lpage>4487</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1073/pnas.96.8.4482</pubid>
                  <pubid idtype="pmpid">10200288</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>Selection at the amino acid level can influence synonymous codon usage: implications for the study of codon adaptation in plastid genes</p>
            </title>
            <aug>
               <au>
                  <snm>Morton</snm>
                  <fnm>BR</fnm>
               </au>
            </aug>
            <source>Genetics</source>
            <pubdate>2001</pubdate>
            <volume>159</volume>
            <fpage>347</fpage>
            <lpage>358</lpage>
            <xrefbib>
               <pubid idtype="pmpid">11560910</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>How to interpret an anonymous bacterial genome: machine learning approach to gene identification</p>
            </title>
            <aug>
               <au>
                  <snm>Hayes</snm>
                  <fnm>WS</fnm>
               </au>
               <au>
                  <snm>Borodovsky</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>1998</pubdate>
            <volume>8</volume>
            <fpage>1154</fpage>
            <lpage>1171</lpage>
            <xrefbib>
               <pubid idtype="pmpid">9847079</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>Translation conditional models for protein coding sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Rodolphe</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Mathe</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>J Comput Biol</source>
            <pubdate>2000</pubdate>
            <volume>7</volume>
            <fpage>249</fpage>
            <lpage>260</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1089/10665270050081504</pubid>
                  <pubid idtype="pmpid">10890400</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Self-Organizing Maps</p>
            </title>
            <aug>
               <au>
                  <snm>Kohonen</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <publisher>Berlin, Springer-Verlag</publisher>
            <pubdate>1995</pubdate>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Analysis of codon usage diversity of bacterial genes with a self-organizing map (SOM): characterization of horizontally transferred genes with emphasis on the E. coli O157 genome</p>
            </title>
            <aug>
               <au>
                  <snm>Kanaya</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kinouchi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Abe</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Kudo</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Yamada</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Nishi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Mori</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Ikemura</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>2001</pubdate>
            <volume>276</volume>
            <fpage>89</fpage>
            <lpage>99</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0378-1119(01)00673-4</pubid>
                  <pubid idtype="pmpid">11591475</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Analysis of codon usage patterns of bacterial genomes using the self-organizing map</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>HC</fnm>
               </au>
               <au>
                  <snm>Badger</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kearney</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Li</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2001</pubdate>
            <volume>18</volume>
            <fpage>792</fpage>
            <lpage>800</lpage>
            <xrefbib>
               <pubid idtype="pmpid">11319263</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Informatics for unveiling hidden genome signatures</p>
            </title>
            <aug>
               <au>
                  <snm>Abe</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Kanaya</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kinouchi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ichiba</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Kozuki</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Ikemura</snm>
                  <fnm>T</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>693</fpage>
            <lpage>702</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.634603</pubid>
                  <pubid idtype="pmpid">12671005</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>Codon Bias and Base Composition Are Poor Indicators of Horizontally Transferred Genes</p>
            </title>
            <aug>
               <au>
                  <snm>Koski</snm>
                  <fnm>LB</fnm>
               </au>
               <au>
                  <snm>Morton</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Golding</snm>
                  <fnm>GB</fnm>
               </au>
            </aug>
            <source>Mol Biol Evol</source>
            <pubdate>2001</pubdate>
            <volume>18</volume>
            <fpage>404</fpage>
            <lpage>412</lpage>
            <xrefbib>
               <pubid idtype="pmpid">11230541</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions</p>
            </title>
            <aug>
               <au>
                  <snm>Besemer</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Lomsadze</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Borodovsky</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2001</pubdate>
            <volume>29</volume>
            <fpage>2607</fpage>
            <lpage>2618</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/29.12.2607</pubid>
                  <pubid idtype="pmpid">11410670</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>The complete genome of the hyperthermophilic bacterium Aquifex aeolicus</p>
            </title>
            <aug>
               <au>
                  <snm>Deckert</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Warren</snm>
                  <fnm>PV</fnm>
               </au>
               <au>
                  <snm>Gaasterland</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Young</snm>
                  <fnm>WG</fnm>
               </au>
               <au>
                  <snm>Lenox</snm>
                  <fnm>AL</fnm>
               </au>
               <au>
                  <snm>Graham</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Overbeek</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Snead</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Keller</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Aujay</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Huber</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Feldman</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Short</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Olsen</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Swanson</snm>
                  <fnm>RV</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1998</pubdate>
            <volume>392</volume>
            <fpage>353</fpage>
            <lpage>358</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/32831</pubid>
                  <pubid idtype="pmpid">9537320</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>The complete genome sequence of the gram-positive bacterium Bacillus subtilis</p>
            </title>
            <aug>
               <au>
                  <snm>Kunst</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Ogasawara</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Moszer</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Albertini</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Alloni</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Azevedo</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Bertero</snm>
                  <fnm>MG</fnm>
               </au>
               <au>
                  <snm>Bessieres</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Bolotin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Borchert</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Borriss</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Boursier</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Brans</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Braun</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Brignell</snm>
                  <fnm>SC</fnm>
               </au>
               <au>
                  <snm>Bron</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Brouillet</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Bruschi</snm>
                  <fnm>CV</fnm>
               </au>
               <au>
                  <snm>Caldwell</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Capuano</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Carter</snm>
                  <fnm>NM</fnm>
               </au>
               <au>
                  <snm>Choi</snm>
                  <fnm>SK</fnm>
               </au>
               <au>
                  <snm>Codani</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Connerton</snm>
                  <fnm>IF</fnm>
               </au>
               <au>
                  <snm>Danchin</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1997</pubdate>
            <volume>390</volume>
            <fpage>249</fpage>
            <lpage>256</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/36786</pubid>
                  <pubid idtype="pmpid">9384377</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>Genome sequence of the endocellular bacterial symbiont of aphids Buchnera sp. APS</p>
            </title>
            <aug>
               <au>
                  <snm>Shigenobu</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Watanabe</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Hattori</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sakaki</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Ishikawa</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2000</pubdate>
            <volume>407</volume>
            <fpage>81</fpage>
            <lpage>86</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35024074</pubid>
                  <pubid idtype="pmpid">10993077</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B36">
            <title>
               <p>Genomic sequence of a Lyme disease spirochaete, Borrelia burgdorferi</p>
            </title>
            <aug>
               <au>
                  <snm>Fraser</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Casjens</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>WM</fnm>
               </au>
               <au>
                  <snm>Sutton</snm>
                  <fnm>GG</fnm>
               </au>
               <au>
                  <snm>Clayton</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Lathigra</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Ketchum</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Dodson</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Hickey</snm>
                  <fnm>EK</fnm>
               </au>
               <au>
                  <snm>Gwinn</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Dougherty</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Tomb</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Fleischmann</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Richardson</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Peterson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kerlavage</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Quackenbush</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hanson</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>van Vugt</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Palmer</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Adams</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Gocayne</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Venter</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1997</pubdate>
            <volume>390</volume>
            <fpage>580</fpage>
            <lpage>586</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/37551</pubid>
                  <pubid idtype="pmpid">9403685</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B37">
            <title>
               <p>The genome sequence of the food-borne pathogen Campylobacter jejuni reveals hypervariable sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Parkhill</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wren</snm>
                  <fnm>BW</fnm>
               </au>
               <au>
                  <snm>Mungall</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Ketley</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Churcher</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Basham</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Chillingworth</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Davies</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Feltwell</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Holroyd</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Jagels</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Karlyshev</snm>
                  <fnm>AV</fnm>
               </au>
               <au>
                  <snm>Moule</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Pallen</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Penn</snm>
                  <fnm>CW</fnm>
               </au>
               <au>
                  <snm>Quail</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Rajandream</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Rutherford</snm>
                  <fnm>KM</fnm>
               </au>
               <au>
                  <snm>van Vliet</snm>
                  <fnm>AH</fnm>
               </au>
               <au>
                  <snm>Whitehead</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Barrell</snm>
                  <fnm>BG</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2000</pubdate>
            <volume>403</volume>
            <fpage>665</fpage>
            <lpage>668</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35001088</pubid>
                  <pubid idtype="pmpid">10688204</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Genome sequence of the radioresistant bacterium Deinococcus radiodurans R1</p>
            </title>
            <aug>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Eisen</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Heidelberg</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Hickey</snm>
                  <fnm>EK</fnm>
               </au>
               <au>
                  <snm>Peterson</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Dodson</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Haft</snm>
                  <fnm>DH</fnm>
               </au>
               <au>
                  <snm>Gwinn</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>WC</fnm>
               </au>
               <au>
                  <snm>Richardson</snm>
                  <fnm>DL</fnm>
               </au>
               <au>
                  <snm>Moffat</snm>
                  <fnm>KS</fnm>
               </au>
               <au>
                  <snm>Qin</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Jiang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Pamphile</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Crosby</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Shen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Vamathevan</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Lam</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>McDonald</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Utterback</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Zalewski</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Makarova</snm>
                  <fnm>KS</fnm>
               </au>
               <au>
                  <snm>Aravind</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Daly</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Fraser</snm>
                  <fnm>CM</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1999</pubdate>
            <volume>286</volume>
            <fpage>1571</fpage>
            <lpage>1577</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.286.5444.1571</pubid>
                  <pubid idtype="pmpid">10567266</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>The complete genome sequence of Escherichia coli K-12</p>
            </title>
            <aug>
               <au>
                  <snm>Blattner</snm>
                  <fnm>FR</fnm>
               </au>
               <au>
                  <snm>Plunkett</snm>
                  <fnm>G., 3rd</fnm>
               </au>
               <au>
                  <snm>Bloch</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Perna</snm>
                  <fnm>NT</fnm>
               </au>
               <au>
                  <snm>Burland</snm>
                  <fnm>V</fnm>
               </au>
               <au>
                  <snm>Riley</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Collado-Vides</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Glasner</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Rode</snm>
                  <fnm>CK</fnm>
               </au>
               <au>
                  <snm>Mayhew</snm>
                  <fnm>GF</fnm>
               </au>
               <au>
                  <snm>Gregor</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>NW</fnm>
               </au>
               <au>
                  <snm>Kirkpatrick</snm>
                  <fnm>HA</fnm>
               </au>
               <au>
                  <snm>Goeden</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Rose</snm>
                  <fnm>DJ</fnm>
               </au>
               <au>
                  <snm>Mau</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Shao</snm>
                  <fnm>Y</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1997</pubdate>
            <volume>277</volume>
            <fpage>1453</fpage>
            <lpage>1474</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.277.5331.1453</pubid>
                  <pubid idtype="pmpid">9278503</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Whole-genome random sequencing and assembly of Haemophilus influenzae Rd</p>
            </title>
            <aug>
               <au>
                  <snm>Fleischmann</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Adams</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Clayton</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Kirkness</snm>
                  <fnm>EF</fnm>
               </au>
               <au>
                  <snm>Kerlavage</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Bult</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Tomb</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Dougherty</snm>
                  <fnm>BA</fnm>
               </au>
               <au>
                  <snm>Merrick</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1995</pubdate>
            <volume>269</volume>
            <fpage>496</fpage>
            <lpage>512</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7542800</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>The complete genome sequence of the gastric pathogen Helicobacter pylori</p>
            </title>
            <aug>
               <au>
                  <snm>Tomb</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Kerlavage</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Clayton</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Sutton</snm>
                  <fnm>GG</fnm>
               </au>
               <au>
                  <snm>Fleischmann</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Ketchum</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Klenk</snm>
                  <fnm>HP</fnm>
               </au>
               <au>
                  <snm>Gill</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Dougherty</snm>
                  <fnm>BA</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Quackenbush</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhou</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Kirkness</snm>
                  <fnm>EF</fnm>
               </au>
               <au>
                  <snm>Peterson</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Loftus</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Richardson</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Dodson</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Khalak</snm>
                  <fnm>HG</fnm>
               </au>
               <au>
                  <snm>Glodek</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>McKenney</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>FitzGerald</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Adams</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Venter</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>1997</pubdate>
            <volume>388</volume>
            <fpage>539</fpage>
            <lpage>547</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/41483</pubid>
                  <pubid idtype="pmpid">9252185</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>The minimal gene complement of Mycoplasma genitalium</p>
            </title>
            <aug>
               <au>
                  <snm>Fraser</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Gocayne</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Adams</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Clayton</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Fleischmann</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Bult</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Kerlavage</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Sutton</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Kelley</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1995</pubdate>
            <volume>270</volume>
            <fpage>397</fpage>
            <lpage>403</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7569993</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii</p>
            </title>
            <aug>
               <au>
                  <snm>Bult</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Olsen</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>Zhou</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Fleischmann</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Sutton</snm>
                  <fnm>GG</fnm>
               </au>
               <au>
                  <snm>Blake</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>FitzGerald</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Clayton</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Gocayne</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Kerlavage</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Dougherty</snm>
                  <fnm>BA</fnm>
               </au>
               <au>
                  <snm>Tomb</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Adams</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Reich</snm>
                  <fnm>CI</fnm>
               </au>
               <au>
                  <snm>Overbeek</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Kirkness</snm>
                  <fnm>EF</fnm>
               </au>
               <au>
                  <snm>Weinstock</snm>
                  <fnm>KG</fnm>
               </au>
               <au>
                  <snm>Merrick</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Glodek</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Scott</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Geoghagen</snm>
                  <fnm>NS</fnm>
               </au>
               <au>
                  <snm>Venter</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1996</pubdate>
            <volume>273</volume>
            <fpage>1058</fpage>
            <lpage>1073</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8688087</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Genome sequence of the plant pathogen Ralstonia solanacearum</p>
            </title>
            <aug>
               <au>
                  <snm>Salanoubat</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Genin</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Artiguenave</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Gouzy</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Mangenot</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Arlat</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Billault</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Brottier</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Camus</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Cattolico</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Chandler</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Choisne</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Claudel-Renard</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Cunnac</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Demange</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Gaspin</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Lavie</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Moisan</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Robert</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Saurin</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Schiex</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Siguier</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Thebault</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Whalen</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Wincker</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Levy</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Weissenbach</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Boucher</snm>
                  <fnm>CA</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>415</volume>
            <fpage>497</fpage>
            <lpage>502</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/415497a</pubid>
                  <pubid idtype="pmpid">11823852</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Complete genome sequence of the model actinomycete Streptomyces coelicolor A3(2)</p>
            </title>
            <aug>
               <au>
                  <snm>Bentley</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Chater</snm>
                  <fnm>KF</fnm>
               </au>
               <au>
                  <snm>Cerdeno-Tarraga</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Challis</snm>
                  <fnm>GL</fnm>
               </au>
               <au>
                  <snm>Thomson</snm>
                  <fnm>NR</fnm>
               </au>
               <au>
                  <snm>James</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Harris</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Quail</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Kieser</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Harper</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Bateman</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Brown</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Chandra</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>CW</fnm>
               </au>
               <au>
                  <snm>Collins</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Cronin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Fraser</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Goble</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Hidalgo</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hornsby</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Howarth</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>CH</fnm>
               </au>
               <au>
                  <snm>Kieser</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Larke</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Murphy</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Oliver</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>O'Neil</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Rabbinowitsch</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Rajandream</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Rutherford</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Rutter</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Seeger</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Saunders</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Sharp</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Squares</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Squares</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Taylor</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Warren</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Wietzorrek</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Woodward</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Barrell</snm>
                  <fnm>BG</fnm>
               </au>
               <au>
                  <snm>Parkhill</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Hopwood</snm>
                  <fnm>DA</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2002</pubdate>
            <volume>417</volume>
            <fpage>141</fpage>
            <lpage>147</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/417141a</pubid>
                  <pubid idtype="pmpid">12000953</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions</p>
            </title>
            <aug>
               <au>
                  <snm>Kaneko</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Sato</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kotani</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Tanaka</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Asamizu</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Nakamura</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Miyajima</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Hirosawa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sugiura</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Sasamoto</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kimura</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Hosouchi</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Matsuno</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Muraki</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Nakazaki</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Naruo</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Okumura</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Shimpo</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Takeuchi</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Wada</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Watanabe</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Yamada</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Yasuda</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Tabata</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>DNA Res</source>
            <pubdate>1996</pubdate>
            <volume>3</volume>
            <fpage>109</fpage>
            <lpage>136</lpage>
            <xrefbib>
               <pubid idtype="pmpid">8905231</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Genome sequence of Yersinia pestis, the causative agent of plague</p>
            </title>
            <aug>
               <au>
                  <snm>Parkhill</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wren</snm>
                  <fnm>BW</fnm>
               </au>
               <au>
                  <snm>Thomson</snm>
                  <fnm>NR</fnm>
               </au>
               <au>
                  <snm>Titball</snm>
                  <fnm>RW</fnm>
               </au>
               <au>
                  <snm>Holden</snm>
                  <fnm>MT</fnm>
               </au>
               <au>
                  <snm>Prentice</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Sebaihia</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>James</snm>
                  <fnm>KD</fnm>
               </au>
               <au>
                  <snm>Churcher</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Mungall</snm>
                  <fnm>KL</fnm>
               </au>
               <au>
                  <snm>Baker</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Basham</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Bentley</snm>
                  <fnm>SD</fnm>
               </au>
               <au>
                  <snm>Brooks</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Cerdeno-Tarraga</snm>
                  <fnm>AM</fnm>
               </au>
               <au>
                  <snm>Chillingworth</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Cronin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Davies</snm>
                  <fnm>RM</fnm>
               </au>
               <au>
                  <snm>Davis</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Dougan</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Feltwell</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Hamlin</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Holroyd</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Jagels</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Karlyshev</snm>
                  <fnm>AV</fnm>
               </au>
               <au>
                  <snm>Leather</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Moule</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Oyston</snm>
                  <fnm>PC</fnm>
               </au>
               <au>
                  <snm>Quail</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rutherford</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Simmonds</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Skelton</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Stevens</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Whitehead</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Barrell</snm>
                  <fnm>BG</fnm>
               </au>
            </aug>
            <source>Nature</source>
            <pubdate>2001</pubdate>
            <volume>413</volume>
            <fpage>523</fpage>
            <lpage>527</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/35097083</pubid>
                  <pubid idtype="pmpid">11586360</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Assessment of protein coding measures</p>
            </title>
            <aug>
               <au>
                  <snm>Fickett</snm>
                  <fnm>JW</fnm>
               </au>
               <au>
                  <snm>Tung</snm>
                  <fnm>CS</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1992</pubdate>
            <volume>20</volume>
            <fpage>6441</fpage>
            <lpage>6450</lpage>
            <xrefbib>
               <pubid idtype="pmpid">1480466</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>Artemis: sequence visualization and annotation</p>
            </title>
            <aug>
               <au>
                  <snm>Rutherford</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Parkhill</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Crook</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Horsnell</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Rice</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Rajandream</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Barrell</snm>
                  <fnm>B</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>944</fpage>
            <lpage>945</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/16.10.944</pubid>
                  <pubid idtype="pmpid">11120685</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
