<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>1471-2105-6-2</ui>
   <ji>1471-2105</ji>
   <fm>
      <dochead>Software</dochead>
      <bibl>
         <title>
            <p>Visualization of comparative genomic analyses by BLAST score ratio</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Rasko</snm>
               <mi>A</mi>
               <fnm>David</fnm>
               <insr iid="I1"/>
               <email>drasko@tigr.org</email>
            </au>
            <au id="A2">
               <snm>Myers</snm>
               <mi>SA</mi>
               <fnm>Garry</fnm>
               <insr iid="I1"/>
               <email>gmyers@tigr.org</email>
            </au>
            <au id="A3" ca="yes">
               <snm>Ravel</snm>
               <fnm>Jacques</fnm>
               <insr iid="I1"/>
               <email>jravel@tigr.org</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850 USA</p>
            </ins>
         </insg>
         <source>BMC Bioinformatics</source>
         <issn>1471-2105</issn>
         <pubdate>2005</pubdate>
         <volume>6</volume>
         <issue>1</issue>
         <fpage>2</fpage>
         <url>http://www.biomedcentral.com/1471-2105/6/2</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">15634352</pubid>
               <pubid idtype="doi">10.1186/1471-2105-6-2</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>18</day>
               <month>10</month>
               <year>2004</year>
            </date>
         </rec>
         <acc>
            <date>
               <day>05</day>
               <month>1</month>
               <year>2005</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>05</day>
               <month>1</month>
               <year>2005</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2005</year>
         <collab>Rasko et al; licensee BioMed Central Ltd.</collab>
         <note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>The first microbial genome sequence, that of <it>Haemophilus influenzae</it>, was published in 1995. Since then, more than 400 microbial genome sequences have been completed or commenced. This massive influx of data provides the opportunity to obtain biological insights through comparative genomics. However, few tools are available for this scale of comparative analysis.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>The BLAST Score Ratio (BSR) approach, implemented in a Perl script, classifies all putative peptides within three genomes using a measure of similarity based on the ratio of BLAST scores. The output of the BSR analysis enables global visualization of the degree of proteome similarity between all three genomes. Additional output enables the genomic synteny (conserved gene order) between each genome pair to be assessed. Furthermore, we extend this synteny analysis by overlaying BSR data as a color dimension, enabling visualization of the degree of similarity of the peptides being compared.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusions</p>
               </st>
               <p>Combining the degree of similarity, synteny and annotation will allow rapid identification of conserved genomic regions as well as a number of common genomic rearrangements such as insertions, deletions and inversions. The script and example visualizations are available at: <url>http://www.microbialgenomics.org/BSR/</url>.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>In the decade since the publication of the <it>Haemophilus influenzae </it>genome sequence in 1995 <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>, 191 microbial genomes have been completed, with another 276 in progress (<abbrgrp><abbr bid="B2">2</abbr></abbrgrp>; as of October 14, 2004). Multiple strains of the same organism, or multiple species of the same genus, are being sequenced or have been completed, making comparative genomic analysis possible on an unprecedented scale. As the technology continues to improve, the number of completed microbial genome sequences will increase further, and a major challenge of the comparative genomic era is to exploit these data fully. However, the development of tools for analysis of such data sets has not kept pace.</p>
         <p>BLAST analysis has become a ubiquitous method of interrogating new sequence data, but there are limitations to using BLAST alone as a discriminating tool. Many methods and investigators use BLAST output E-values as a criterion for data parsing. While this measure may be efficient, the output is often skewed by both the database used for comparison and the length of the match <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. Small regions of high similarity can generate an artificially low E-value and obscure the global level of similarity exhibited by the sequence. This bias is eliminated when using the BLAST raw score, as it is directly derived from the similarity of the match. However, the value of the BLAST score varies with the length of the peptide queried, and hence is not suitable alone for comparative analysis using universal cutoffs <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>.</p>
         <p>Several other tools utilize the BLAST algorithms to compare nucleotide or peptide sequences from genome projects. The Wellcome Trust Sanger Institute ACT software <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> can display nucleotide similarity between two genomes based on BLASTN E-values. ACT builds upon Artemis and displays regions of high similarity mapped on the genome annotation. The GenomeComp tool <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> displays a similar analysis, also based on BLASTN E-values, to compare genome sequences. NCBI Taxplot, a three-way genome comparison tool based on precomputed BLASTP E-values, displays a point for each protein in the Reference genome based on the best alignment with proteins in each of the two genomes being compared <abbrgrp><abbr bid="B7">7</abbr></abbrgrp>. In contrast, the SimiTri program utilizes BLASTP comparisons of three proteomes and uses the raw BLAST score rather than E-values. However, only protein similarity data are represented, and no information on the comparative structure of the genomes is provided <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. Moreover, the SimiTri program does not address BLAST artifacts derived from the size of the database or the length of the match. This paper describes the BLAST score ratio (BSR) algorithm, which enables comparative analysis of multiple proteomes together with visualization of genome structure (synteny).</p>
         <p>BSR analysis is a departure from traditional genome scale analyses as it overcomes the limitations of BLAST E-values in comparative studies by normalizing the BLAST raw scores. BSR analysis is a tool for the rapid comparison of complete proteomes of any three genomes, and enables a visual evaluation of the overall degree of similarity of these proteomes and their genomic structure.</p>
      </sec>
      <sec>
         <st>
            <p>Implementation</p>
         </st>
         <p>We have implemented the BSR algorithm using Perl. The inputs are the predicted proteomes of each of the three genomes under analysis, formatted as multi-FASTA files. An additional file for each of the proteomes is required. This file must contain a unique identifier, matching the FASTA header of the corresponding peptide in the multi-FASTA file, the relative genomic location of the start and stop of the coding region as well as the annotation for each peptide. The user selects one proteome as the "Reference"; the other two proteomes are termed "Query<sub>1</sub>" and "Query<sub>2</sub>" respectively. Initiation of the script results in each of the putative peptides in the Reference proteome being compared to all of the other peptides in the Reference and Query proteomes using NCBI BLASTP.</p>
         <p>The BSR is then computed as follows. The BLAST raw score for each Reference peptide against itself is stored as the Reference score. Each Reference peptide is then compared to each peptide in the Query<sub>1 </sub>and Query<sub>2 </sub>proteomes, with each best BLAST raw score recorded as Query<sub>1 </sub>and Query<sub>2</sub>, respectively (Figure <figr fid="F1">1</figr>). The BSR is calculated by dividing the Query score by the Reference score for each Reference peptide (Figure <figr fid="F1">1A</figr>). Thus, for each peptide in the Reference genome, two numbers are generated, one from each of the best matches in Query<sub>1 </sub>and Query<sub>2</sub>, and all scores are normalized to the range of 0 to 1. A score of 1 indicates a perfect match of the Reference peptide to a Query peptide and a score of 0 indicates no BLAST match of the Reference peptide in the Query proteome. The BLAST raw score is used rather than the E-value for the BLASTP results as it more accurately accounts for the length of the similarity between the Reference and Query peptides <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B9">9</abbr></abbrgrp>. This normalized pair of numbers can be plotted as coordinates in Cartesian space for each peptide in the Reference proteome, enabling the visualization of the entire Reference proteome in comparison to the two Query proteomes (Figure <figr fid="F1">1B</figr>).</p>
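         <p>The calculation described above can be illustrated with a minimal Python sketch. This is not the published Perl script: the function names and the use of None to represent a missing BLAST hit are assumptions made here for illustration only.</p>
         <p><![CDATA[
```python
def blast_score_ratio(query_score, reference_score):
    """Divide a best Query raw score by the Reference self-hit raw score."""
    if reference_score <= 0:
        raise ValueError("Reference self-hit score must be positive")
    return query_score / reference_score


def bsr_pair(reference_score, best_q1_score, best_q2_score):
    """Return (BSR against Query1, BSR against Query2) for one Reference peptide.

    Assumption: a peptide with no BLAST hit in a Query proteome is passed
    as None and receives a BSR of 0, as described in the text.
    """
    q1 = 0.0 if best_q1_score is None else blast_score_ratio(best_q1_score, reference_score)
    q2 = 0.0 if best_q2_score is None else blast_score_ratio(best_q2_score, reference_score)
    return (q1, q2)


# Hypothetical raw scores: self-hit 200, best Query1 hit 150, no Query2 hit.
x, y = bsr_pair(200.0, 150.0, None)
print(x, y)
```
]]></p>
         <p>In this sketch the pair (x, y) is the coordinate at which the Reference peptide would be placed in the scatter plot.</p>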
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>BSR rationale and scatter plot example</p>
            </caption>
            <text>
               <p><b>BSR rationale and scatter plot example. </b><b>A. </b>BLAST score ratio analysis (BSR) calculation demonstrating how the two coordinates for plotting in figures B and C are calculated. <b>B. </b>The location of the peptide spot reveals the similarity that the peptide has to the two Query genomes. Use of a 0.4 separator is based on ~30% amino acid identity over 30% of the length of the peptide [10]. <b>C. </b>Sample data obtained from comparison of <it>Chlamydia caviae </it>GPIC (GenBank Accession Number AE015925) to the proteomes of <it>Chlamydia muridarum </it>strain Nigg (GenBank Accession Number AE002160) and <it>Chlamydia pneumoniae </it>AR39 (GenBank Accession Number AE002161) [17]. Each point in the figure represents a single peptide in <it>Chlamydia caviae </it>GPIC. This analysis reveals that while these organisms are very similar, <it>C. caviae </it>is more similar to <it>C. pneumoniae </it>AR39 than to <it>C. muridarum </it>strain Nigg, owing to the skew of peptides with a slope of greater than 1.</p>
            </text>
            <graphic file="1471-2105-6-2-1"/>
         </fig>
      </sec>
      <sec>
         <st>
            <p>Outputs</p>
         </st>
         <p>Following calculation of the BSRs, a number of output files are generated, including both text and graphical formats. The text files are tab-delimited for ease of parsing; filenames are derived from the named proteome files used as input into the script. The R_Q1_Q2.txt (Reference_Query<sub>1</sub>_Query<sub>2</sub>.txt) output contains an ordered list including the Reference peptide unique identifier, annotation, and Reference BLAST raw score, in addition to the unique identifier of the best hits in the Query proteomes, corresponding BLAST raw scores and the calculated BSR. Additionally, four unique files are generated corresponding to the peptides within the four quadrants delineated in Figure <figr fid="F1">1B</figr>. The four quadrants are derived from a BSR threshold value of 0.4, which was empirically determined to represent approximately 30% amino acid identity over approximately 30% of the peptide length, a commonly used threshold for peptide similarity <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. This threshold value can be adjusted using the "-C" option (see help file).</p>
         <p>The graphical output files are viewed with Gnuplot <abbrgrp><abbr bid="B11">11</abbr></abbrgrp> to reveal the global similarity of the compared genomes as well as the level of conserved genome structure. PostScript and xfig <abbrgrp><abbr bid="B12">12</abbr></abbrgrp> graphic files are subsequently generated by Gnuplot. The scatter or similarity plot provides an overall view of the level and number of similar and dissimilar proteins in the Reference proteome when compared to the Query proteomes (Figure <figr fid="F1">1C</figr>). The regions of the graph are color-coded depending on the level of similarity between the three genomes (Figure <figr fid="F1">1B</figr>). Quadrant A (both BSRs &lt; 0.4), colored orange, contains peptides unique to the Reference proteome with little similarity in either of the Query proteomes. Quadrant C (both BSRs > 0.4), colored red, contains peptides that have significant similarity in all three compared proteomes. Quadrant B, colored green, contains Reference peptides with similarity to only Query proteome 2, whereas Quadrant D, colored blue, contains Reference peptides that have similarity to only Query proteome 1.</p>
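         <p>The quadrant assignment can be sketched in Python as follows. This is an illustrative sketch rather than the script's own code: the function name is hypothetical, and the treatment of a BSR exactly equal to the threshold (counted here as similar) is an assumption, since the text does not specify it.</p>
         <p><![CDATA[
```python
from collections import Counter


def quadrant(bsr_q1, bsr_q2, threshold=0.4):
    """Assign a Reference peptide to quadrant A, B, C or D.

    A: little similarity in either Query proteome
    B: similarity to Query proteome 2 only
    C: similarity in both Query proteomes
    D: similarity to Query proteome 1 only
    Assumption: a BSR equal to the threshold counts as similar.
    """
    similar_q1 = bsr_q1 >= threshold
    similar_q2 = bsr_q2 >= threshold
    if similar_q1 and similar_q2:
        return "C"
    if similar_q2:
        return "B"
    if similar_q1:
        return "D"
    return "A"


# Hypothetical (BSR Query1, BSR Query2) pairs for four Reference peptides.
pairs = [(0.9, 0.8), (0.1, 0.7), (0.6, 0.2), (0.0, 0.0)]
counts = Counter(quadrant(q1, q2) for q1, q2 in pairs)
print(dict(counts))
```
]]></p>
         <p>Tallying the quadrant labels in this way gives the proportion of the Reference proteome falling in each region of the scatter plot.</p>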
         <p>Two additional plots, termed synteny plots, are generated, one for comparison of the Reference proteome to each Query proteome, by plotting the genomic location of the Reference peptide on the X-axis and the genomic location of the most similar Query peptide on the Y-axis. This plot alone would demonstrate the level of synteny (conservation of gene order) between the two genomes <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>; however, an additional level of information is included by coloring each point based on the BSR (see legend Figure <figr fid="F2">2</figr>). The color provides an additional visual clue to the global level of similarity of the proteomes. For example, genomes can be highly syntenic with relatively low levels of proteomic similarity, as is shown in Figure <figr fid="F2">2A</figr> and <figr fid="F2">2B</figr>, or they may have a high degree of protein similarity <b><ul>and</ul></b> conserved genome structure (Figure <figr fid="F2">2C</figr>).</p>
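         <p>As an illustration of how the points of a synteny plot could be assembled, the following Python sketch pairs each Reference peptide location with the location of its best Query hit and carries the BSR along as the value to be colored. The dictionary layout and the function name are assumptions made for this example and do not reflect the script's actual data structures.</p>
         <p><![CDATA[
```python
def synteny_points(reference_locations, best_hits, query_locations):
    """Return sorted (x, y, bsr) tuples for one Reference/Query comparison.

    reference_locations: Reference peptide id -> genomic location (x)
    best_hits:           Reference peptide id -> (best Query peptide id, BSR)
    query_locations:     Query peptide id -> genomic location (y)
    Reference peptides without a Query hit have no y value and are skipped.
    """
    points = []
    for ref_id, x in reference_locations.items():
        hit = best_hits.get(ref_id)
        if hit is None:
            continue
        query_id, bsr = hit
        points.append((x, query_locations[query_id], bsr))
    return sorted(points)


# Hypothetical identifiers and coordinates.
reference_locations = {"r1": 100, "r2": 900, "r3": 500}
best_hits = {"r1": ("q7", 0.95), "r2": ("q2", 0.5)}
query_locations = {"q7": 120, "q2": 880}

for x, y, bsr in synteny_points(reference_locations, best_hits, query_locations):
    # Tab-delimited rows: X location, Y location, BSR used for coloring.
    print(f"{x}\t{y}\t{bsr}")
```
]]></p>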
         <fig id="F2">
            <title>
               <p>Figure 2</p>
            </title>
            <caption>
               <p>Genome structure visualization</p>
            </caption>
            <text>
               <p><b>Genome structure visualization. </b>Direct comparison of two genomes at a time demonstrating some examples of large-scale genomic rearrangements. Each protein is plotted by the genomic location of the coding region and is color-coded by the degree of similarity based on the BSR, as shown in the legend. <b>A. </b>Comparison of <it>C. caviae </it>GPIC and <it>C. pneumoniae </it>AR39. This comparison contains two genomic rearrangements of different sizes as indicated by the arrows. <b>B. </b><it>C. caviae </it>GPIC and <it>C. muridarum </it>strain Nigg comparison reveals more extensive genomic rearrangements, suggesting that while these organisms are proteomically similar, the genomes have diverged significantly. <b>C. </b><it>E. coli </it>CFT073 (GenBank Accession Number AE014075) vs. <it>E. coli </it>K12 (GenBank Accession Number U00096). <it>E. coli </it>CFT073 contains a number of unique insertions that are represented as breakpoints in the plot and highlighted with arrows. These genomes exhibit a high level of synteny and similarity.</p>
            </text>
            <graphic file="1471-2105-6-2-2"/>
         </fig>
         <p>The Gnuplot, PostScript and xfig outputs allow publication-quality, global visualization of the similarity and synteny of the selected genomes. However, these formats do not allow the annotation associated with individual peptides to be viewed interactively. To overcome this limitation, additional XML files for the similarity and synteny plots described above are generated. These files are the input for the freely available GGobi software. GGobi is a data visualization system for viewing high-dimensional data <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. The tools provided in the GGobi software package allow the annotation associated with individual points within the similarity and synteny plots to be viewed interactively (Figure <figr fid="F3">3</figr>).</p>
         <fig id="F3">
            <title>
               <p>Figure 3</p>
            </title>
            <caption>
               <p>Visualization with GGobi</p>
            </caption>
            <text>
               <p><b>Visualization with GGobi. </b>GGobi screenshots of the graphical outputs from the BSR. The proteins for tryptophan synthase alpha and beta subunits are highlighted as they were unique in the <it>C. caviae </it>genome and represented a significant metabolic adaptation of this species in comparison to the other species compared [17]. <b>A. </b>The scatter plot represents the same figure as shown in Figure 1C; however, the interactive nature of GGobi allows visualization of the annotation associated with any of the peptides. <b>B. </b>Synteny plots as seen in GGobi. The same genes from Figure 3A can be highlighted in the synteny plots and their genomic location can be observed. To take advantage of the interactive mouseover, the BSR is included with the annotation.</p>
            </text>
            <graphic file="1471-2105-6-2-3"/>
         </fig>
         <p>The GGobi package also allows the expansion of the BSR approach to include more than three genomes or other additional parameters associated with proteomic or genomic data, enabling interactive, user-driven exploration of these complex datasets. The current BSR implementation uses three genomes as input; however, additional genomes can readily be added as new dimensions simply by repeating the analysis with the same Reference genome and varying the Query genomes. Additional non-BSR dimensions are readily included, such as pI or %GC, or factors such as surface localization or some other feature of the peptides of interest.</p>
      </sec>
      <sec>
         <st>
            <p>Results</p>
         </st>
         <p>Genome structure is often altered during the evolution of species <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. Visualization of this structure often lends insight into genome evolution, and examination of the various BSR outputs rapidly reveals alterations of the genome structure as well as the overall similarity of the two Query proteomes to the Reference proteome. The genomes of the Order Chlamydiales (Figures <figr fid="F1">1</figr>, <figr fid="F2">2A</figr> and <figr fid="F2">2B</figr>) provide an example of this insight. In Figure <figr fid="F1">1</figr> a large proportion of the peptides are conserved, with 71.7% of the proteins shared between all three proteomes. If each Query proteome is used in turn as the Reference proteome, a similar trend is still observed (data not shown). Additionally, the proteome of <it>C. pneumoniae </it>AR39 (GenBank Accession Number AE002161) is more similar to <it>C. caviae </it>GPIC (GenBank Accession Number AE015925) than is that of <it>C. muridarum </it>strain Nigg (GenBank Accession Number AE002160), as 7.3% of the proteome is shared between only <it>C. caviae </it>GPIC and <it>C. pneumoniae </it>AR39 compared to only 1.6% between <it>C. caviae </it>GPIC and <it>C. muridarum </it>strain Nigg. Finally, Figure <figr fid="F1">1</figr> demonstrates that 19.4% of the <it>C. caviae </it>proteome has no significant hit to any of the peptides in the Query proteomes, although many of these peptides (78.2%) are currently annotated as hypothetical.</p>
         <p>From the analysis in Figure <figr fid="F1">1</figr> we could conclude that the chlamydial proteomes are extremely similar and suggest that the genome structure will also be similar. However, the synteny plots in Figure <figr fid="F2">2A</figr> and <figr fid="F2">2B</figr> demonstrate that while the chlamydial proteomes exhibit a high degree of similarity, there is significant alteration in the genomic structure. The comparison of the proteomically similar organisms <it>C. caviae </it>GPIC and <it>C. pneumoniae </it>AR39 reveals that the genomes contain two points of inversion (arrows in Figure <figr fid="F2">2A</figr>). One of these points of inversion is centered on the terminus of replication. There are more extensive genomic rearrangements between the <it>C. caviae </it>GPIC and <it>C. muridarum </it>strain Nigg genomes (Figure <figr fid="F2">2B</figr>). The additional color information extends the utility of these synteny plots. While the chlamydial genomes show regions of conserved synteny, as demonstrated by the peptides in the same genomic location forming a line with a slope of 1 or -1, the absolute degree of similarity between the peptides, demonstrated by color, indicates divergence. By contrast, the synteny plot of two <it>Escherichia coli </it>genomes (Figure <figr fid="F2">2C</figr>) demonstrates a high level of synteny with a number of unique insertions; however, no inversions are present. Moreover, the color dimension on this plot reveals that, unlike the chlamydial proteome comparisons, the <it>E. coli </it>proteomes have a high level of similarity <b><ul>and</ul></b> synteny.</p>
         <p>In the analysis of the chlamydial proteomes using both BSR and BLAST E-values, approximately 1% of the peptides examined have a BSR > 0.4 and a BLAST E-value > 1 &#215; 10<sup>-15</sup>. These peptides were all very small (&lt; 70 amino acids) and shared greater than 50% amino acid identity. This group of peptides is more readily identified by BSR analysis than by BLAST E-value, which is artificially high (less significant) because of the small peptide size. Additionally, peptides that have a BSR &lt; 0.4 but a BLAST E-value &lt; 1 &#215; 10<sup>-15</sup> correspond to 7.8% of the proteome. These represent divergent peptides with an artificially low (more significant) BLAST E-value resulting from limited regions of identity. The BSR analysis more accurately classifies these peptides based on the amino acid identity over the entire peptide. As the BSR comparison utilizes a single genome as a reference, the BSR is calculated using a unidirectional best BLAST hit. However, when the chlamydial proteomes were compared, only one case in over 1000 could be found with a BSR > 0.4 that was not also a bidirectional best BLAST hit.</p>
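         <p>The two discordant groups discussed above can be identified with a simple comparison of the two criteria. The following Python sketch is illustrative only: the function name and category labels are hypothetical, and the default cutoffs are the 0.4 and 1 &#215; 10<sup>-15</sup> values quoted in the text.</p>
         <p><![CDATA[
```python
def compare_calls(bsr, evalue, bsr_cutoff=0.4, evalue_cutoff=1e-15):
    """Label a best hit by whether the BSR and E-value criteria agree.

    "bsr_only":    BSR above its cutoff, E-value not below its cutoff
    "evalue_only": E-value below its cutoff, BSR not above its cutoff
    "concordant":  both criteria give the same call
    """
    similar_by_bsr = bsr > bsr_cutoff
    similar_by_evalue = evalue < evalue_cutoff
    if similar_by_bsr and not similar_by_evalue:
        return "bsr_only"
    if similar_by_evalue and not similar_by_bsr:
        return "evalue_only"
    return "concordant"


# Hypothetical hits: a short, highly similar peptide and a long divergent one.
print(compare_calls(0.8, 1e-10))
print(compare_calls(0.2, 1e-40))
```
]]></p>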
      </sec>
      <sec>
         <st>
            <p>Conclusions</p>
         </st>
         <p>The BSR approach allows rapid evaluation of the level of conservation of any three proteomes and the degree to which the genome structure between the three genomes is similar. While in this report we discuss the applications of this approach to whole genomes, the analysis has been performed on portions of genomes such as genomic or pathogenicity islands, plasmids and phage to identify peptide similarity and regional structure.</p>
         <p>More genome sequences are being generated from closely related organisms &#8211; a trend which shows no sign of abating. The BSR approach has become a crucial tool in our comparative genomics armamentarium and has been utilized in a number of genomic comparisons, revealing regions of similarity and difference between both closely and distantly related organisms <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>.</p>
      </sec>
      <sec>
         <st>
            <p>Availability and requirements</p>
         </st>
         <p><b>Project name: </b>BSR.pl</p>
         <p>
            <b>Project homepage: </b>
            <url>http://www.microbialgenomics.org/BSR/</url>
         </p>
         <p><b>Operating System: </b>Unix and MacOS X</p>
         <p><b>Programming language: </b>Perl</p>
         <p><b>Other requirements: </b>Perl Statistics::Descriptive module <url>http://search.cpan.org/dist/Statistics-Descriptive</url></p>
         <p><b>License: </b>None</p>
         <p><b>Any restrictions to use by non-academics: </b>None</p>
      </sec>
      <sec>
         <st>
            <p>List of abbreviations</p>
         </st>
         <p>BSR &#8211; BLAST score ratio; BLAST &#8211; basic local alignment search tool.</p>
      </sec>
      <sec>
         <st>
            <p>Authors' contributions</p>
         </st>
         <p>DAR, GSAM and JR conceived and implemented the first versions of BSR and prepared the manuscript. All authors have read and approved the final manuscript.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgments</p>
            </st>
            <p>The authors would like to thank Timothy D. Read for initial consultation on development of the BSR algorithm. JR and DAR are supported by Federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, under Contract No. N01-AI15447. GSAM is supported by NIAID R01 AI051472.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Whole-genome random sequencing and assembly of <it>Haemophilus influenzae </it>Rd</p>
            </title>
            <aug>
               <au>
                  <snm>Fleischmann</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Adams</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Clayton</snm>
                  <fnm>RA</fnm>
               </au>
               <au>
                  <snm>Kirkness</snm>
                  <fnm>EF</fnm>
               </au>
               <au>
                  <snm>Kerlavage</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Bult</snm>
                  <fnm>CJ</fnm>
               </au>
               <au>
                  <snm>Tomb</snm>
                  <fnm>J-F</fnm>
               </au>
               <au>
                  <snm>Dougherty</snm>
                  <fnm>BA</fnm>
               </au>
               <au>
                  <snm>Merrick</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>McKenney</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Sutton</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Fitzhugh</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Fields</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Gocayne</snm>
                  <fnm>JD</fnm>
               </au>
               <au>
                  <snm>Scott</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Shirley</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>L-I</fnm>
               </au>
               <au>
                  <snm>Glodek</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kelley</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Wiedman</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Phillips</snm>
                  <fnm>CA</fnm>
               </au>
               <au>
                  <snm>Spriggs</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Hedblom</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Cotton</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Utterback</snm>
                  <fnm>TR</fnm>
               </au>
               <au>
                  <snm>Hanna</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Nguyen</snm>
                  <fnm>DT</fnm>
               </au>
               <au>
                  <snm>Saudek</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Brandon</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Fine</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Fritchman</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Fuhrman</snm>
                  <fnm>JL</fnm>
               </au>
               <au>
                  <snm>Geoghagen</snm>
                  <fnm>NSM</fnm>
               </au>
               <au>
                  <snm>Gnehm</snm>
                  <fnm>CL</fnm>
               </au>
               <au>
                  <snm>McDonald</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Small</snm>
                  <fnm>KV</fnm>
               </au>
               <au>
                  <snm>Fraser</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>HO</fnm>
               </au>
               <au>
                  <snm>Venter</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1995</pubdate>
            <volume>269</volume>
            <fpage>496</fpage>
            <lpage>512</lpage>
            <xrefbib>
               <pubid idtype="pmpid">7542800</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>The National Center for Biotechnology Information &#8211; Genomes</p>
            </title>
            <url>http://www.ncbi.nlm.nih.gov/Genomes/index.html</url>
         </bibl>
         <bibl id="B3">
            <title>
               <p>BLAST</p>
            </title>
            <aug>
               <au>
                  <snm>Korf</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Yandell</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bedell</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Sebastopol: O'Reilly &amp; Associates, Inc</source>
            <pubdate>2003</pubdate>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Basic local alignment search tool</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Gish</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>EW</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Journal of Molecular Biology</source>
            <pubdate>1990</pubdate>
            <volume>215</volume>
            <fpage>403</fpage>
            <lpage>410</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1990.9999</pubid>
                  <pubid idtype="pmpid" link="fulltext">2231712</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>The Wellcome Trust Sanger Institute</p>
            </title>
            <url>http://www.sanger.ac.uk/Software/ACT/</url>
         </bibl>
         <bibl id="B6">
            <title>
               <p>GenomeComp: a visualization tool for microbial genome comparison</p>
            </title>
            <aug>
               <au>
                  <snm>Yang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Yao</snm>
                  <fnm>ZJ</fnm>
               </au>
               <au>
                  <snm>Jin</snm>
                  <fnm>Q</fnm>
               </au>
               <au>
                  <snm>Shen</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>J Microbiol Methods</source>
            <pubdate>2003</pubdate>
            <volume>54</volume>
            <issue>3</issue>
            <fpage>423</fpage>
            <lpage>426</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0167-7012(03)00094-0</pubid>
                  <pubid idtype="pmpid" link="fulltext">12842490</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>The National Center for Biotechnology Information &#8211; Taxplot</p>
            </title>
            <url>http://www.ncbi.nlm.nih.gov/sutils/taxik2.cgi?isbact=1</url>
         </bibl>
         <bibl id="B8">
            <title>
               <p>SimiTri &#8211; visualizing similarity relationships for groups of sequences</p>
            </title>
            <aug>
               <au>
                  <snm>Parkinson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Blaxter</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2003</pubdate>
            <volume>19</volume>
            <issue>3</issue>
            <fpage>390</fpage>
            <lpage>395</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/btf870</pubid>
                  <pubid idtype="pmpid" link="fulltext">12584125</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</p>
            </title>
            <aug>
               <au>
                  <snm>Altschul</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Madden</snm>
                  <fnm>TL</fnm>
               </au>
               <au>
                  <snm>Schaffer</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Lipman</snm>
                  <fnm>DJ</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>1997</pubdate>
            <volume>25</volume>
            <fpage>3389</fpage>
            <lpage>3402</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">146917</pubid>
                  <pubid idtype="pmpid" link="fulltext">9254694</pubid>
                  <pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>The genome sequence of <it>Bacillus cereus </it>ATCC 10987 reveals metabolic adaptations and a large plasmid related to <it>Bacillus anthracis </it>pXO1</p>
            </title>
            <aug>
               <au>
                  <snm>Rasko</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Ravel</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Okstad</snm>
                  <fnm>OA</fnm>
               </au>
               <au>
                  <snm>Helgason</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Cer</snm>
                  <fnm>RZ</fnm>
               </au>
               <au>
                  <snm>Jiang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Shores</snm>
                  <fnm>KA</fnm>
               </au>
               <au>
                  <snm>Fouts</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Tourasse</snm>
                  <fnm>NJ</fnm>
               </au>
               <au>
                  <snm>Angiuoli</snm>
                  <fnm>SV</fnm>
               </au>
               <au>
                  <snm>Kolonay</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>WC</fnm>
               </au>
               <au>
                  <snm>Kolsto</snm>
                  <fnm>AB</fnm>
               </au>
               <au>
                  <snm>Fraser</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Read</snm>
                  <fnm>TD</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <issue>3</issue>
            <fpage>977</fpage>
            <lpage>988</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">373394</pubid>
                  <pubid idtype="pmpid" link="fulltext">14960714</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh258</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Gnuplot</p>
            </title>
            <url>http://www.gnuplot.org</url>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Xfig</p>
            </title>
            <url>http://www.xfig.org/</url>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Evidence for symmetric chromosomal inversions around the replication origin in bacteria</p>
            </title>
            <aug>
               <au>
                  <snm>Eisen</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Heidelberg</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>SL</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2000</pubdate>
            <volume>1</volume>
            <issue>6</issue>
         </bibl>
         <bibl id="B14">
            <title>
               <p>GGobi Data Visualization System</p>
            </title>
            <url>http://www.ggobi.org</url>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Identification of anthrax toxin genes in a <it>Bacillus cereus </it>associated with an illness resembling inhalation anthrax</p>
            </title>
            <aug>
               <au>
                  <snm>Hoffmaster</snm>
                  <fnm>AR</fnm>
               </au>
               <au>
                  <snm>Ravel</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Rasko</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Chapman</snm>
                  <fnm>GD</fnm>
               </au>
               <au>
                  <snm>Chute</snm>
                  <fnm>MD</fnm>
               </au>
               <au>
                  <snm>Marston</snm>
                  <fnm>CK</fnm>
               </au>
               <au>
                  <snm>De</snm>
                  <fnm>BK</fnm>
               </au>
               <au>
                  <snm>Sacchi</snm>
                  <fnm>CT</fnm>
               </au>
               <au>
                  <snm>Fitzgerald</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Mayer</snm>
                  <fnm>LW</fnm>
               </au>
               <au>
                  <snm>Maiden</snm>
                  <fnm>MC</fnm>
               </au>
               <au>
                  <snm>Priest</snm>
                  <fnm>FG</fnm>
               </au>
               <au>
                  <snm>Barker</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Jiang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Cer</snm>
                  <fnm>RZ</fnm>
               </au>
               <au>
                  <snm>Rilstone</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Peterson</snm>
                  <fnm>SN</fnm>
               </au>
               <au>
                  <snm>Weyant</snm>
                  <fnm>RS</fnm>
               </au>
               <au>
                  <snm>Galloway</snm>
                  <fnm>DR</fnm>
               </au>
               <au>
                  <snm>Read</snm>
                  <fnm>TD</fnm>
               </au>
               <au>
                  <snm>Popovic</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Fraser</snm>
                  <fnm>CM</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci U S A</source>
            <pubdate>2004</pubdate>
            <volume>101</volume>
            <issue>22</issue>
            <fpage>8449</fpage>
            <lpage>8454</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">420414</pubid>
                  <pubid idtype="pmpid" link="fulltext">15155910</pubid>
                  <pubid idtype="doi">10.1073/pnas.0402414101</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>Whole genome comparisons of serotype 4b and 1/2a strains of the food-borne pathogen <it>Listeria monocytogenes </it>reveal new insights into the core genome components of this species</p>
            </title>
            <aug>
               <au>
                  <snm>Nelson</snm>
                  <fnm>KE</fnm>
               </au>
               <au>
                  <snm>Fouts</snm>
                  <fnm>DE</fnm>
               </au>
               <au>
                  <snm>Mongodin</snm>
                  <fnm>EF</fnm>
               </au>
               <au>
                  <snm>Ravel</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>DeBoy</snm>
                  <fnm>RT</fnm>
               </au>
               <au>
                  <snm>Kolonay</snm>
                  <fnm>JF</fnm>
               </au>
               <au>
                  <snm>Rasko</snm>
                  <fnm>DA</fnm>
               </au>
               <au>
                  <snm>Angiuoli</snm>
                  <fnm>SV</fnm>
               </au>
               <au>
                  <snm>Gill</snm>
                  <fnm>SR</fnm>
               </au>
               <au>
                  <snm>Paulsen</snm>
                  <fnm>IT</fnm>
               </au>
               <au>
                  <snm>Peterson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>WC</fnm>
               </au>
               <au>
                  <snm>Nierman</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Beanan</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>Brinkac</snm>
                  <fnm>LM</fnm>
               </au>
               <au>
                  <snm>Daugherty</snm>
                  <fnm>SC</fnm>
               </au>
               <au>
                  <snm>Dodson</snm>
                  <fnm>RJ</fnm>
               </au>
               <au>
                  <snm>Durkin</snm>
                  <fnm>AS</fnm>
               </au>
               <au>
                  <snm>Madupu</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Haft</snm>
                  <fnm>DH</fnm>
               </au>
               <au>
                  <snm>Selengut</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Van Aken</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Khouri</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Fedorova</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Forberger</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Tran</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Kathariou</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Wonderling</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Uhlich</snm>
                  <fnm>GA</fnm>
               </au>
               <au>
                  <snm>Bayles</snm>
                  <fnm>DO</fnm>
               </au>
               <au>
                  <snm>Luchansky</snm>
                  <fnm>JB</fnm>
               </au>
               <au>
                  <snm>Fraser</snm>
                  <fnm>CM</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <issue>8</issue>
            <fpage>2386</fpage>
            <lpage>2395</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">419451</pubid>
                  <pubid idtype="pmpid" link="fulltext">15115801</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh562</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Genome sequence of <it>Chlamydophila caviae </it>(<it>Chlamydia psittaci </it>GPIC): examining the role of niche-specific genes in the evolution of the Chlamydiaceae</p>
            </title>
            <aug>
               <au>
                  <snm>Read</snm>
                  <fnm>TD</fnm>
               </au>
               <au>
                  <snm>Myers</snm>
                  <fnm>GS</fnm>
               </au>
               <au>
                  <snm>Brunham</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>Nelson</snm>
                  <fnm>WC</fnm>
               </au>
               <au>
                  <snm>Paulsen</snm>
                  <fnm>IT</fnm>
               </au>
               <au>
                  <snm>Heidelberg</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Holtzapple</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Khouri</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Federova</snm>
                  <fnm>NB</fnm>
               </au>
               <au>
                  <snm>Carty</snm>
                  <fnm>HA</fnm>
               </au>
               <au>
                  <snm>Umayam</snm>
                  <fnm>LA</fnm>
               </au>
               <au>
                  <snm>Haft</snm>
                  <fnm>DH</fnm>
               </au>
               <au>
                  <snm>Peterson</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Beanan</snm>
                  <fnm>MJ</fnm>
               </au>
               <au>
                  <snm>White</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Salzberg</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Hsia</snm>
                  <fnm>RC</fnm>
               </au>
               <au>
                  <snm>McClarty</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Rank</snm>
                  <fnm>RG</fnm>
               </au>
               <au>
                  <snm>Bavoil</snm>
                  <fnm>PM</fnm>
               </au>
               <au>
                  <snm>Fraser</snm>
                  <fnm>CM</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2003</pubdate>
            <volume>31</volume>
            <issue>8</issue>
            <fpage>2134</fpage>
            <lpage>2147</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">153749</pubid>
                  <pubid idtype="pmpid" link="fulltext">12682364</pubid>
                  <pubid idtype="doi">10.1093/nar/gkg321</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
