<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
	<ui>1471-2105-7-S4-S18</ui>
	<ji>1471-2105</ji>
	<fm>
		<dochead>Research</dochead>
		<bibl>
			<title>
				<p>GenomeBlast: a web tool for small genome comparison</p>
			</title>
			<aug>
				<au id="A1" ca="yes">
					<snm>Lu</snm>
					<fnm>Guoqing</fnm>
					<insr iid="I1"/>
					<insr iid="I5"/>
					<email>glu3@mail.unomaha.edu</email>
				</au>
				<au id="A2">
					<snm>Jiang</snm>
					<fnm>Liying</fnm>
					<insr iid="I2"/>
					<email>ljiang@cse.unl.edu</email>
				</au>
				<au id="A3">
					<snm>Helikar</snm>
					<mi>MK</mi>
					<fnm>Resa</fnm>
					<insr iid="I3"/>
					<email>rkotalik@mail.unomaha.edu</email>
				</au>
				<au id="A4">
					<snm>Rowley</snm>
					<mi>W</mi>
					<fnm>Thaine</fnm>
					<insr iid="I3"/>
					<email>trowley@mail.unomaha.edu</email>
				</au>
				<au id="A5">
					<snm>Zhang</snm>
					<fnm>Luwen</fnm>
					<insr iid="I4"/>
					<insr iid="I5"/>
					<email>lzhang2@unlnotes.unl.edu</email>
				</au>
				<au id="A6">
					<snm>Chen</snm>
					<fnm>Xianfeng</fnm>
					<insr iid="I6"/>
					<email>xchen@vbi.vt.edu</email>
				</au>
				<au id="A7">
					<snm>Moriyama</snm>
					<mi>N</mi>
					<fnm>Etsuko</fnm>
					<insr iid="I4"/>
					<insr iid="I7"/>
					<email>emoriyama2@unlnotes.unl.edu</email>
				</au>
			</aug>
			<insg>
				<ins id="I1">
					<p>Department of Biology, University of Nebraska at Omaha, Omaha, NE 68182, USA</p>
				</ins>
				<ins id="I2">
					<p>Department of Computer Science, University of Nebraska-Lincoln, Lincoln, NE 68588, USA</p>
				</ins>
				<ins id="I3">
					<p>Department of Computer Science, University of Nebraska at Omaha, Omaha, NE 68182, USA</p>
				</ins>
				<ins id="I4">
					<p>School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, NE 68588, USA</p>
				</ins>
				<ins id="I5">
					<p>Nebraska Center for Virology, University of Nebraska-Lincoln, Lincoln, NE 68588, USA</p>
				</ins>
				<ins id="I6">
					<p>Virginia Bioinformatics Institute, Virginia Tech Blacksburg, VA 24061, USA</p>
				</ins>
				<ins id="I7">
					<p>Plant Science Initiative, University of Nebraska-Lincoln, Lincoln, NE 68588, USA</p>
				</ins>
			</insg>
			<source>BMC Bioinformatics</source>
			<supplement>
				<title>
					<p>Symposium of Computations in Bioinformatics and Bioscience (SCBB06)</p>
				</title>
				<editor>Youping Deng, Jun Ni</editor>
				<note>Research</note>
				<url>http://www.biomedcentral.com/content/pdf/1471-2105-7-S4-info.pdf</url>
			</supplement>
			<conference>
				<title>
					<p>Symposium of Computations in Bioinformatics and Bioscience (SCBB06) in conjunction with the International Multi-Symposiums on Computer and Computational Sciences 2006 (IMSCCS|06)</p>
				</title>
				<location>Hangzhou, China</location>
				<date-range>June 20&#8211;24, 2006</date-range>
				<url>http://mfgn.usm.edu/ebl/SCBB06</url>
			</conference>
			<issn>1471-2105</issn>
			<pubdate>2006</pubdate>
			<volume>7</volume>
			<issue>Suppl 4</issue>
			<fpage>S18</fpage>
			<xrefbib>
				<pubidlist><pubid idtype="pmpid">17217510</pubid><pubid idtype="doi">10.1186/1471-2105-7-S4-S18</pubid>
				</pubidlist></xrefbib>
		</bibl>
		<history>
			<pub>
				<date>
					<day>12</day>
					<month>12</month>
					<year>2006</year>
				</date>
			</pub>
		</history>
		<cpyrt>
			<year>2006</year>
			<collab>Lu et al; licensee BioMed Central Ltd</collab>
			<note>This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
		</cpyrt>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<sec>
					<st>
						<p>Background</p>
					</st>
					<p>Comparative genomics has become an essential approach for identifying homologous gene candidates and their functions, and for studying genome evolution. There are many tools available for genome comparisons. Unfortunately, most of them are not applicable for the identification of unique genes and the inference of phylogenetic relationships in a given set of genomes.</p>
				</sec>
				<sec>
					<st>
						<p>Results</p>
					</st>
					<p>GenomeBlast is a Web tool developed for comparative analysis of multiple small genomes. A new parameter called "coverage" was introduced and used along with sequence identity to evaluate global similarity between genes. With GenomeBlast, the following results can be obtained: (1) unique genes in each genome; (2) homologous gene candidates among compared genomes; (3) 2D plots of homologous gene candidates for all pairwise genome comparisons; and (4) a table of gene presence/absence information and a genome phylogeny. We demonstrated the functions of GenomeBlast with an example analysis of multiple herpesviral genomes and illustrated how GenomeBlast is useful for small genome comparison.</p>
				</sec>
				<sec>
					<st>
						<p>Conclusion</p>
					</st>
					<p>We developed a Web tool for comparative analysis of small genomes, which allows the user not only to identify unique genes and homologous gene candidates among multiple genomes, but also to view their graphical distributions on genomes and to reconstruct genome phylogeny. GenomeBlast runs on a Linux server with 4 CPUs and 4 GB memory. The online version of GenomeBlast is publicly available at <url>http://bioinfo-srv1.awh.unomaha.edu/genomeblast/</url>.</p>
				</sec>
			</sec>
		</abs>
	</fm>
	<bdy>
		<sec>
			<st>
				<p>Background</p>
			</st>
			<p>With the rapidly increasing availability of complete genome sequences, genome-wide sequence comparison has become an essential approach for finding homologous gene candidates, for identifying gene functions, and for studying genome evolution <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr></abbrgrp>. Genome comparison can be used to find genes that characterize unique features in a given organism such as specific phenotypic variation or particular pathogenicity <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. Meanwhile, genome phylogenies based on gene content or gene order shed new light on the construction of the Tree of Life <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp>.</p>
			<p>Currently, many tools such as MUMmer and Artemis are available for comparative genomic analysis <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr><abbr bid="B8">8</abbr></abbrgrp>. These tools can be used for pairwise genome alignment (e.g., <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>) as well as multiple genome alignment (e.g., <abbrgrp><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr></abbrgrp>). Unfortunately, most of them are not applicable to the identification of unique genes in a given set of genomes, since in most cases the tools were developed for homologous gene detection. Additionally, only a few tools can be used to study phylogeny from the genomic point of view <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>.</p>
			<p>The BLAST (Basic Local Alignment Search Tool) algorithm and other anchor-based algorithms are commonly used to identify homologous gene candidates across diverse genomes <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B14">14</abbr></abbrgrp>. Although BLAST is fast and accurate in detecting local, highly similar sequence regions, it has two drawbacks when used to identify global sequence similarity: (1) genes that share only local, highly similar regions can be erroneously identified as homologue candidates; and (2) multiple local hits against the same subject sequence need to be combined to obtain the overall aligned region between the query and subject sequences.</p>
			<p>To address these problems, we developed a Web tool, GenomeBlast. It performs multiple genome comparisons, identifies unique genes as well as shared (possibly homologous) genes among the genomes, and reconstructs the genome phylogeny. Homologous gene candidates are identified by detecting global sequence similarity using alignment coverage information. This paper describes its architecture, algorithms, and implementation. We demonstrate the practical use of GenomeBlast with an example using herpesviral genomes, and discuss planned improvements.</p>
		</sec>
		<sec>
			<st>
				<p>Implementation</p>
			</st>
			<sec>
				<st>
					<p>Architecture</p>
				</st>
				<p>The architecture of GenomeBlast is illustrated in Figure <figr fid="F1">1</figr>. In addition to input and output modules, it consists of sequence extraction, database formatting, sequence comparison, output filtering, and visual presentation of results.</p>
				<fig id="F1">
					<title>
						<p>Figure 1</p>
					</title>
					<caption>
						<p>The architecture of GenomeBlast</p>
					</caption>
					<text>
						<p><b>The architecture of GenomeBlast</b>. GenomeBlast consists of sequence extraction, database formatting, sequence comparison, output filtering, and visual presentation of results. The inputs to GenomeBlast are genome sequences in the GenBank format, each in a single file. The outputs include three-level results: 1) putative unique genes and homologous genes; 2) 2D plots of homologous gene candidates for pairwise genome comparisons; 3) a table of gene presence/absence information, genome phylogeny, and a summary table for multiple genome comparison.</p>
					</text>
					<graphic file="1471-2105-7-S4-S18-1"/>
				</fig>
				<p>The inputs to GenomeBlast are genome sequences in the GenBank format, each in a single file. Each genome sequence record needs to include the FEATURES table with coding sequence (CDS) annotations. Such data can be downloaded from public databases such as the National Center for Biotechnology Information (NCBI) <abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. Protein sequences are extracted from the translation records in the CDS annotations. The formatdb program is used to generate protein database files from the protein dataset for each genome. These protein database files can be used with the blastp program. An all-against-all BLAST strategy is used for genome comparison: each protein sequence from one genome is compared against the protein sequences from all other genomes. The BLAST results are then filtered and presented in various outputs.</p>
				<p>GenomeBlast generates outputs at three levels: (1) candidates for unique genes and homologous genes; (2) 2D plots of homologous gene candidates for pairwise genome comparisons; and (3) a table of gene presence/absence information, a genome phylogeny, and a summary table for multiple genome comparison.</p>
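				<p>The protein extraction step can be illustrated with a minimal Python sketch. It is not the GenomeBlast implementation (which the paper states was written in Perl); the function name extract_translations and the sample record are hypothetical, and the sketch assumes that each CDS feature carries a quoted /translation qualifier, as in standard GenBank flat files.</p>
				<p><![CDATA[
```python
import re

def extract_translations(genbank_text):
    """Return the protein sequences found in /translation qualifiers."""
    proteins = []
    # A /translation value is quoted and may wrap over several lines.
    for match in re.finditer(r'/translation="([^"]*)"', genbank_text):
        # Remove the line breaks and indentation inside the quoted value.
        proteins.append(re.sub(r"\s+", "", match.group(1)))
    return proteins

# Hypothetical fragment of a GenBank feature table, for illustration only.
sample = '''     CDS             1..30
                     /gene="demo"
                     /translation="MKTAYIAKQR
                     QISFVKSHFS"
'''
print(extract_translations(sample))
```
]]></p>
				<p>In a complete pipeline, the extracted sequences would be written to one FASTA file per genome before database formatting.</p>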
			</sec>
			<sec>
				<st>
					<p>Algorithm</p>
				</st>
				<sec>
					<st>
						<p>Coverage calculation</p>
					</st>
					<p>We used the blastp algorithm for protein sequence comparison. A BLAST search may identify only short local similarities (which can arise from conserved domains or regions even when the sequences are not derived from homologous genes) or multiple short similarities within the same CDS (Figure <figr fid="F2">2</figr>). We therefore introduced a parameter called "coverage" to detect gene-wide sequence similarity. The percent alignment coverage (<it>c</it>) is calculated using the following equation:</p>
					<fig id="F2">
						<title>
							<p>Figure 2</p>
						</title>
						<caption>
							<p>A possible output generated by the blast program</p>
						</caption>
						<text>
							<p><b>A possible output generated by the blast program</b>. The blast program may find two or more highly similar regions of the same subject sequence, which need to be combined before we can evaluate global sequence similarity between the query and the subject sequence.</p>
						</text>
						<graphic file="1471-2105-7-S4-S18-2"/>
					</fig>
					<p>
						<graphic file="1471-2105-7-S4-S18-i1.gif"/>
					</p>
					<p>where <it>L</it><sub><it>i</it></sub>, <it>L</it><sub><it>i</it>,<it>j</it></sub>, and <it>L</it><sub><it>query </it></sub>represent the alignment length for the <it>i</it><sup>th </sup>hit, the overlap length between the hits <it>i </it>and <it>j</it>, and the query length, respectively; and <it>k </it>is the total number of hits to the same subject sequence for a given query sequence.</p>
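					<p>As a rough illustration of the coverage calculation, the Python sketch below combines the hits to one subject sequence and reports the covered percentage of the query. It is an assumption-laden sketch rather than the authors' code: the function name percent_coverage is hypothetical, hits are given as 1-based inclusive query coordinates, and the overlap correction is obtained by taking the union of the hit intervals instead of subtracting pairwise overlaps (the two agree whenever no position is covered by more than two hits).</p>
					<p><![CDATA[
```python
def percent_coverage(hits, query_length):
    """Percent of the query covered by the union of hit intervals.

    hits: list of (start, end) query coordinates, 1-based and inclusive.
    """
    covered = 0
    current_start = current_end = None
    for start, end in sorted(hits):
        if current_end is None or start > current_end:
            # Disjoint from the running interval: close it and start anew.
            if current_end is not None:
                covered += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlapping hit: extend the running interval.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start + 1
    return 100.0 * covered / query_length

# Two hits of length 100 and 70 that overlap by 20 residues on a
# 200-residue query: (100 + 70 - 20) / 200 = 75%.
print(percent_coverage([(1, 100), (81, 150)], 200))
```
]]></p>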
				</sec>
				<sec>
					<st>
						<p>Identification of homologous gene candidates</p>
					</st>
					<p>To identify homologous gene candidates and to exclude related genes that share similarity only in limited regions, GenomeBlast can use a combination of the following thresholds:</p>
					<p>i) Coverage. The coverage is the percentage of the query length covered by the aligned regions, calculated as above. The default threshold is 50%.</p>
					<p>ii) Identity. The identity is the proportion (%) of identical amino acid pairs in the aligned region. The default threshold is 30%.</p>
					<p>iii) E-value. The E-value (expectation value) is the number of different alignments with scores equal to or better than the observed score that are expected to occur in a database search by chance. The default threshold is 10. In the default setting, GenomeBlast uses only the coverage and identity thresholds, not the E-value threshold.</p>
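					<p>The way these thresholds combine can be sketched as follows. The function is_homolog_candidate and its parameter names are hypothetical, and treating the thresholds as inclusive bounds is an assumption, since the text does not state whether a value exactly at the threshold passes.</p>
					<p><![CDATA[
```python
def is_homolog_candidate(coverage, identity, evalue=None,
                         min_coverage=50.0, min_identity=30.0,
                         max_evalue=None):
    """Return True when a query-subject pair passes the active thresholds."""
    if not (coverage >= min_coverage and identity >= min_identity):
        return False
    # The E-value test applies only when a cutoff is supplied,
    # mirroring the default setting in which it is not used.
    if max_evalue is not None and evalue is not None:
        return max_evalue >= evalue
    return True

print(is_homolog_candidate(coverage=75.0, identity=42.0))
print(is_homolog_candidate(coverage=40.0, identity=90.0))
```
]]></p>
					<p>The second call fails on coverage despite its high identity, which is the case the coverage parameter is meant to exclude: a strong but short local match.</p>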
				</sec>
				<sec>
					<st>
						<p>Genome phylogeny reconstruction</p>
					</st>
					<p>Based on the results of multiple genome comparison, the presence or absence of each CDS in each genome is tabulated as 1 (presence) or 0 (absence). Using this binary character matrix, the maximum parsimony method <abbrgrp><abbr bid="B16">16</abbr></abbrgrp> with the branch-and-bound tree search algorithm is used to infer the genome phylogeny. The branch-and-bound algorithm searches the possible tree topologies efficiently and guarantees finding the most parsimonious phylogeny <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>.</p>
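					<p>A small Python sketch of the binary character matrix is given below. The grouping of CDSs into shared gene families is assumed to have been done already; the function name presence_absence_matrix, the family identifiers, and the genome names are hypothetical and serve only to show the shape of the data passed to the parsimony step.</p>
					<p><![CDATA[
```python
def presence_absence_matrix(gene_families, genomes):
    """Build one row of 1s and 0s per genome.

    gene_families: dict mapping a family id to the set of genomes
    in which a member of that family was found.
    """
    family_ids = sorted(gene_families)  # fixed column order
    return {
        genome: [1 if genome in gene_families[fid] else 0
                 for fid in family_ids]
        for genome in genomes
    }

# Hypothetical input: three gene families across three genomes.
families = {
    "fam1": {"A", "B"},
    "fam2": {"B"},
    "fam3": {"A", "B", "C"},
}
matrix = presence_absence_matrix(families, ["A", "B", "C"])
for genome, row in matrix.items():
    print(genome, "".join(str(value) for value in row))
```
]]></p>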
				</sec>
			</sec>
			<sec>
				<st>
					<p>Backend programs and the Web server</p>
				</st>
				<p>The blastp program in the BLAST stand-alone package <url>ftp://ftp.ncbi.nih.gov/blast/</url> was used for protein sequence comparison. The PENNY program of the PHYLIP package implements the maximum parsimony phylogenetic method using the branch-and-bound tree search algorithm and a binary character data matrix <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. The data processing/analysis and integration of the blastp and PENNY programs into GenomeBlast were implemented with the PERL programming language. The Web applications were developed using PHP. GenomeBlast runs on a Linux server, which has four processors (2.0 GHz each), 4 GB memory, and 400 GB disk space.</p>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Results</p>
			</st>
			<p>We will use thirteen herpesviral genomes described in <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> as an example, and go through GenomeBlast step by step to demonstrate its functions (Figure <figr fid="F1">1</figr>).</p>
			<p>The first step is to set up the blastp options. We did not select the filter options for masking regions of low compositional complexity or for masking the lookup table only. We used the default values provided in GenomeBlast (E-value: 10, word size: 3, gap existence cost: 11, gap extension cost: 1, and scoring matrix: BLOSUM62).</p>
			<p>The next step is to upload genome sequence files. We set up the number of genomes to compare as 13 and clicked the OK button. We then uploaded the 13 herpesviral genome sequence files, which were originally downloaded from NCBI in the GenBank format. The average size of these genomes was approximately 150 kb. Formatting databases and performing all-against-all blastp comparison took 5 minutes 16 seconds on our server.</p>
			<p>The third step is to set up parameters for gene comparisons. We used the default threshold values, i.e., 50% coverage and 30% identity, for determining homologous CDS. The last step is to view genome comparison results at three different levels, i.e., the single-genome, pairwise-genome, and multiple-genome levels. We chose two gammaherpesviruses, EBV and EHV2, to show the functions available for single-genome level analysis. Note that any combination of genomes can be used for unique gene or homologous gene candidate identification. A total of 45 and 38 unique gene candidates were found in EBV and EHV2, respectively (Figure <figr fid="F3">3</figr>), whereas 82 homologous CDS candidates were identified between these two genomes (Figure <figr fid="F4">4</figr>).</p>
			<fig id="F3">
				<title>
					<p>Figure 3</p>
				</title>
				<caption>
					<p>Output window of putative unique genes</p>
				</caption>
				<text>
					<p><b>Output window of putative unique genes</b>. Two gammaherpesviruses, EBV and EHV2, were selected for comparison. A total of 45 and 38 unique CDS candidates were found in EBV and EHV2, respectively.</p>
				</text>
				<graphic file="1471-2105-7-S4-S18-3"/>
			</fig>
			<fig id="F4">
				<title>
					<p>Figure 4</p>
				</title>
				<caption>
					<p>Output window of putative homologous genes</p>
				</caption>
				<text>
					<p><b>Output window of putative homologous genes</b>. EBV and EHV2 were selected for comparison. 82 homologous CDS candidates were identified between them.</p>
				</text>
				<graphic file="1471-2105-7-S4-S18-4"/>
			</fig>
			<p>For pairwise-genome comparisons, any two genomes can be chosen and a 2D plot of the distribution of homologous gene candidates is generated. We clicked the hyperlink EBV.gb-EHV2.gb (alternatively, the pair can be chosen from the drop-down menu) and a 2D plot was displayed in a new window, as shown in Figure <figr fid="F5">5</figr>. Interestingly, the plot suggests that a genomic inversion might have occurred between these two viruses. Clicking a dot in the plot displays the corresponding information, including the query name, subject name, and % identity. Of the 82 homologous CDS candidates, only two proteins had sequence identities higher than 80% (colored in red), 20 proteins had identities between 50% and 80% (colored in pink), and the rest had identities between 30% and 50% (colored in yellow).</p>
			<fig id="F5">
				<title>
					<p>Figure 5</p>
				</title>
				<caption>
					<p>A 2D plot of homologous gene candidates in genomes</p>
				</caption>
				<text>
					<p><b>A 2D plot of homologous gene candidates in genomes</b>. EBV and EHV2 were selected for comparison. The plot shows the distribution of homologous CDS on EBV and EHV2 genomes. The threshold values used for homologous CDS identification and the color scheme for identity representation are illustrated.</p>
				</text>
				<graphic file="1471-2105-7-S4-S18-5"/>
			</fig>
			<p>At the multiple-genome level, we can obtain the binary gene presence/absence table (not shown) and the genome phylogeny as shown in Figure <figr fid="F6">6A</figr>. The phylogeny indicates that there are three virus groups, which is more clearly shown in the phylogeny redrawn with the TreeView program <abbrgrp><abbr bid="B19">19</abbr></abbrgrp> (Figure <figr fid="F6">6B</figr>). This result showing three groups of herpesviruses is consistent with previous reports <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B4">4</abbr></abbrgrp>.</p>
			<fig id="F6">
				<title>
					<p>Figure 6</p>
				</title>
				<caption>
					<p>Genome phylogeny among the herpes viruses</p>
				</caption>
				<text>
					<p><b>Genome phylogeny among the herpes viruses</b>. The 13 herpesviral genomes described in [1, 4] were used for phylogeny inference. Panel A was generated from GenomeBlast, whereas Panel B was produced with the TreeView program using the same tree file from GenomeBlast.</p>
				</text>
				<graphic file="1471-2105-7-S4-S18-6"/>
			</fig>
		</sec>
		<sec>
			<st>
				<p>Discussion</p>
			</st>
			<p>GenomeBlast has several unique features compared with other comparative genomics tools <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr><abbr bid="B11">11</abbr><abbr bid="B12">12</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr></abbrgrp>. Instead of focusing on generating alignments, GenomeBlast identifies unique and shared, possibly homologous, CDS sets among multiple genomes and presents the information in a summary table. It generates 2D plots depicting the distribution of homologous CDS between given pairs of genomes. To identify possible homologous CDS, GenomeBlast uses the blastp sequence similarity search program. It evaluates gene-wide similarity by combining the alignment coverage with the % identity of the aligned region, which filters out hits that are similar only over a short local region. GenomeBlast also provides flexibility in choosing different combinations of parameters and their threshold values. Once the BLAST search is done, it does not need to be repeated: the user can return to the parameter-setting page and reset the thresholds for identifying homologous gene candidates.</p>
			<p>GenomeBlast reconstructs genome phylogeny based on gene content using the maximum parsimony method. In this respect, GenomeBlast overlaps with the Web server SHOT <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. SHOT also includes a gene-order phylogeny method. Whereas SHOT can be used for only a certain set of genomes, GenomeBlast offers more flexibility.</p>
			<p>Montague and Hutchison <abbrgrp><abbr bid="B4">4</abbr></abbrgrp> reconstructed whole-genome phylogenies for 13 herpesviral genomes based on the Clusters of Orthologous Groups (COGs) data <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>. Before reconstructing the genome phylogenies, they used several computer programs and packages, including the Wisconsin Package (GCG) <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, BLAST programs, and PAUP (Phylogenetic Analysis Using Parsimony) <abbrgrp><abbr bid="B24">24</abbr></abbrgrp>. We performed the same analysis using GenomeBlast alone, and our genome phylogeny agreed with their result <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. This demonstrates that GenomeBlast is a useful application for small genome comparison. Planned extensions to GenomeBlast include automatic CDS extraction/translation, support for the FASTA sequence format, DNA-level analysis using blastn, and gene-order based genome phylogeny.</p>
			<p>GenomeBlast is suitable for small genome comparison. We do not expect it to be used to compare large genomes, such as the human and mouse genomes, because such computation is extremely expensive and would take several days or even weeks to complete. For larger genomes, standalone programs such as MUMmer and Artemis can be used. For model organisms, homologous gene databases such as HomoloGene <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> and Inparanoid are available <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>.</p>
		</sec>
		<sec>
			<st>
				<p>Conclusion</p>
			</st>
			<p>We have developed a Web tool for comparative analysis of small genomes. With GenomeBlast, we can identify unique genes and homologous gene candidates among multiple genomes, view their graphical distributions on genomes, and reconstruct genome phylogeny. An example with 13 herpesviral genomes demonstrated that GenomeBlast is a useful tool for genome comparison.</p>
		</sec>
		<sec>
			<st>
				<p>Availability and requirements</p>
			</st>
			<p>&#8226; Project name: GenomeBlast project</p>
			<p>&#8226; Project home page: <url>http://bioinfo-srv1.awh.unomaha.edu/genomeblast/index.php</url></p>
			<p>&#8226; Operating system(s): Linux</p>
			<p>&#8226; Programming language: PERL and PHP</p>
			<p>&#8226; Other requirements: Any standard Web browser (e.g., Microsoft Internet Explorer 6.0 or later)</p>
			<p>&#8226; Any restrictions to use by non-academics: yes, contact the author GL for details</p>
		</sec>
		<sec>
			<st>
				<p>Authors' contributions</p>
			</st>
			<p>GL conceived of the study, participated in its design and coordination, and drafted the manuscript. LJ participated in the design and implementation. RMK participated in the testing and helped to develop the Web site. TWR participated in the implementation and testing. LZ conceived of the study and helped to draft the manuscript. CZ carried out the software testing and helped to draft the manuscript. EM conceived of the study, participated in its design and coordination, and drafted the manuscript. All authors read and approved the final manuscript.</p>
		</sec>
	</bdy>
	<bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st>
				<p>This work was funded by the University of Nebraska &#8211; Lincoln Biomedical Research Enhancement Funds. G.L. acknowledges the Pre-tenure Award from the University of Nebraska at Omaha.</p>
				<p>This article has been published as part of <it>BMC Bioinformatics </it>Volume 7, Supplement 4, 2006: Symposium of Computations in Bioinformatics and Bioscience (SCBB06). The full contents of the supplement are available online at <url>http://www.biomedcentral.com/1471-2105/7?issue=S4</url>.</p>
			</sec>
		</ack>
		<refgrp>
			<bibl id="B1">
				<title>
					<p>Genomewide function conservation and phylogeny in the Herpesviridae</p>
				</title>
				<aug>
					<au>
						<snm>Alba</snm>
						<fnm>MM</fnm>
					</au>
					<au>
						<snm>Das</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Orengo</snm>
						<fnm>CA</fnm>
					</au>
					<au>
						<snm>Kellam</snm>
						<fnm>P</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2001</pubdate>
				<volume>11</volume>
				<issue>1</issue>
				<fpage>43</fpage>
				<lpage>54</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">311046</pubid>
						<pubid idtype="pmpid" link="fulltext">11156614</pubid>
						<pubid idtype="doi">10.1101/gr.149801</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B2">
				<title>
					<p>Versatile and open software for comparing large genomes</p>
				</title>
				<aug>
					<au>
						<snm>Kurtz</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Phillippy</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Delcher</snm>
						<fnm>AL</fnm>
					</au>
					<au>
						<snm>Smoot</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Shumway</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Antonescu</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Salzberg</snm>
						<fnm>SL</fnm>
					</au>
				</aug>
				<source>Genome Biol</source>
				<pubdate>2004</pubdate>
				<volume>5</volume>
				<issue>2</issue>
				<fpage>R12</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">395750</pubid>
						<pubid idtype="pmpid" link="fulltext">14759262</pubid>
						<pubid idtype="doi">10.1186/gb-2004-5-2-r12</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B3">
				<title>
					<p>Web-based visualization tools for bacterial genome alignments</p>
				</title>
				<aug>
					<au>
						<snm>Florea</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Riemer</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Schwartz</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Zhang</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Stojanovic</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Miller</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>McClelland</snm>
						<fnm>M</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2000</pubdate>
				<volume>28</volume>
				<issue>18</issue>
				<fpage>3486</fpage>
				<lpage>3496</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">110741</pubid>
						<pubid idtype="pmpid" link="fulltext">10982867</pubid>
						<pubid idtype="doi">10.1093/nar/28.18.3486</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B4">
				<title>
					<p>Gene content phylogeny of herpesviruses</p>
				</title>
				<aug>
					<au>
						<snm>Montague</snm>
						<fnm>MG</fnm>
					</au>
					<au>
						<snm>Hutchison</snm>
						<fnm>CA</fnm>
						<suf>3rd</suf>
					</au>
				</aug>
				<source>Proc Natl Acad Sci U S A</source>
				<pubdate>2000</pubdate>
				<volume>97</volume>
				<issue>10</issue>
				<fpage>5334</fpage>
				<lpage>5339</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">25829</pubid>
						<pubid idtype="pmpid" link="fulltext">10805793</pubid>
						<pubid idtype="doi">10.1073/pnas.97.10.5334</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B5">
				<title>
					<p>Phylogeny determined by protein domain content</p>
				</title>
				<aug>
					<au>
						<snm>Yang</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Doolittle</snm>
						<fnm>RF</fnm>
					</au>
					<au>
						<snm>Bourne</snm>
						<fnm>PE</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci U S A</source>
				<pubdate>2005</pubdate>
				<volume>102</volume>
				<issue>2</issue>
				<fpage>373</fpage>
				<lpage>378</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">540256</pubid>
						<pubid idtype="pmpid" link="fulltext">15630082</pubid>
						<pubid idtype="doi">10.1073/pnas.0408810102</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B6">
				<title>
					<p>The MUMmer package</p>
				</title>
				<url>http://www.tigr.org</url>
			</bibl>
			<bibl id="B7">
				<title>
					<p>The Artemis software</p>
				</title>
				<url>http://www.sanger.ac.uk/Software/Artemis/</url>
			</bibl>
			<bibl id="B8">
				<title>
					<p>Viewing and annotating sequence data with Artemis</p>
				</title>
				<aug>
					<au>
						<snm>Berriman</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Rutherford</snm>
						<fnm>K</fnm>
					</au>
				</aug>
				<source>Brief Bioinform</source>
				<pubdate>2003</pubdate>
				<volume>4</volume>
				<issue>2</issue>
				<fpage>124</fpage>
				<lpage>132</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bib/4.2.124</pubid>
						<pubid idtype="pmpid" link="fulltext">12846394</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B9">
				<title>
					<p>ACGT-a comparative genomics tool</p>
				</title>
				<aug>
					<au>
						<snm>Xie</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Hood</snm>
						<fnm>L</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2003</pubdate>
				<volume>19</volume>
				<issue>8</issue>
				<fpage>1039</fpage>
				<lpage>1040</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/btg121</pubid>
						<pubid idtype="pmpid" link="fulltext">12761070</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B10">
				<title>
					<p>VISTA: computational tools for comparative genomics</p>
				</title>
				<aug>
					<au>
						<snm>Frazer</snm>
						<fnm>KA</fnm>
					</au>
					<au>
						<snm>Pachter</snm>
						<fnm>L</fnm>
					</au>
					<au>
						<snm>Poliakov</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Rubin</snm>
						<fnm>EM</fnm>
					</au>
					<au>
						<snm>Dubchak</snm>
						<fnm>I</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2004</pubdate>
				<volume>32</volume>
				<issue>Web Server</issue>
				<fpage>W273</fpage>
				<lpage>W279</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">441596</pubid>
						<pubid idtype="pmpid" link="fulltext">15215394</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B11">
				<title>
					<p>ACT: the Artemis Comparison Tool</p>
				</title>
				<aug>
					<au>
						<snm>Carver</snm>
						<fnm>TJ</fnm>
					</au>
					<au>
						<snm>Rutherford</snm>
						<fnm>KM</fnm>
					</au>
					<au>
						<snm>Berriman</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Rajandream</snm>
						<fnm>MA</fnm>
					</au>
					<au>
						<snm>Barrell</snm>
						<fnm>BG</fnm>
					</au>
					<au>
						<snm>Parkhill</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<source>Bioinformatics</source>
				<pubdate>2005</pubdate>
				<volume>21</volume>
				<issue>16</issue>
				<fpage>3422</fpage>
				<lpage>3423</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/bioinformatics/bti553</pubid>
						<pubid idtype="pmpid" link="fulltext">15976072</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B12">
				<title>
					<p>M-GCAT: Multiple Genome Comparison and Alignment Tool</p>
				</title>
				<aug>
					<au>
						<snm>Treangen</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Messeguer</snm>
						<fnm>X</fnm>
					</au>
				</aug>
				<source>5th Annual Spanish Bioinformatics Conference (JBI 2004)</source>
				<pubdate>2004</pubdate>
				<fpage>30</fpage>
				<lpage>33</lpage>
			</bibl>
			<bibl id="B13">
				<title>
					<p>SHOT: a web server for the construction of genome phylogenies</p>
				</title>
				<aug>
					<au>
						<snm>Korbel</snm>
						<fnm>JO</fnm>
					</au>
					<au>
						<snm>Snel</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Huynen</snm>
						<fnm>MA</fnm>
					</au>
					<au>
						<snm>Bork</snm>
						<fnm>P</fnm>
					</au>
				</aug>
				<source>Trends Genet</source>
				<pubdate>2002</pubdate>
				<volume>18</volume>
				<issue>3</issue>
				<fpage>158</fpage>
				<lpage>162</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0168-9525(01)02597-5</pubid>
						<pubid idtype="pmpid" link="fulltext">11858840</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B14">
				<title>
					<p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</p>
				</title>
				<aug>
					<au>
						<snm>Altschul</snm>
						<fnm>SF</fnm>
					</au>
					<au>
						<snm>Madden</snm>
						<fnm>TL</fnm>
					</au>
					<au>
						<snm>Schaffer</snm>
						<fnm>AA</fnm>
					</au>
					<au>
						<snm>Zhang</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Zhang</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Miller</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Lipman</snm>
						<fnm>DJ</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>1997</pubdate>
				<volume>25</volume>
				<issue>17</issue>
				<fpage>3389</fpage>
				<lpage>3402</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">146917</pubid>
						<pubid idtype="pmpid" link="fulltext">9254694</pubid>
						<pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B15">
				<title>
					<p>National Center for Biotechnology Information (NCBI)</p>
				</title>
				<url>http://www.ncbi.nlm.nih.gov</url>
			</bibl>
			<bibl id="B16">
				<title>
					<p>Branch and bound algorithms to determine minimal evolutionary trees</p>
				</title>
				<aug>
					<au>
						<snm>Hendy</snm>
						<fnm>MD</fnm>
					</au>
					<au>
						<snm>Penny</snm>
						<fnm>D</fnm>
					</au>
				</aug>
				<source>Mathematical Biosciences</source>
				<pubdate>1982</pubdate>
				<volume>59</volume>
				<fpage>277</fpage>
				<lpage>290</lpage>
				<xrefbib>
					<pubid idtype="doi">10.1016/0025-5564(82)90027-X</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B17">
				<title>
					<p>Phylogeny reconstruction</p>
				</title>
				<aug>
					<au>
						<snm>Swofford</snm>
						<fnm>DL</fnm>
					</au>
					<au>
						<snm>Olsen</snm>
						<fnm>GJ</fnm>
					</au>
				</aug>
				<source>Molecular Systematics</source>
				<publisher>Sunderland, Massachusetts: Sinauer Associates</publisher>
				<editor>Hillis DM, Moritz C</editor>
				<pubdate>1990</pubdate>
				<volume>11</volume>
				<fpage>411</fpage>
				<lpage>501</lpage>
			</bibl>
			<bibl id="B18">
				<title>
					<p>PHYLIP (Phylogeny Inference Package) version 3.6. Distributed by the author. Department of Genome Sciences, University of Washington, Seattle</p>
				</title>
				<aug>
					<au>
						<snm>Felsenstein</snm>
						<fnm>J</fnm>
					</au>
				</aug>
				<pubdate>2005</pubdate>
			</bibl>
			<bibl id="B19">
				<title>
					<p>TreeView: an application to display phylogenetic trees on personal computers</p>
				</title>
				<aug>
					<au>
						<snm>Page</snm>
						<fnm>RD</fnm>
					</au>
				</aug>
				<source>Comput Appl Biosci</source>
				<pubdate>1996</pubdate>
				<volume>12</volume>
				<issue>4</issue>
				<fpage>357</fpage>
				<lpage>358</lpage>
				<xrefbib>
					<pubid idtype="pmpid">8902363</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B20">
				<title>
					<p>Alignment of whole genomes</p>
				</title>
				<aug>
					<au>
						<snm>Delcher</snm>
						<fnm>AL</fnm>
					</au>
					<au>
						<snm>Kasif</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Fleischmann</snm>
						<fnm>RD</fnm>
					</au>
					<au>
						<snm>Peterson</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>White</snm>
						<fnm>O</fnm>
					</au>
					<au>
						<snm>Salzberg</snm>
						<fnm>SL</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>1999</pubdate>
				<volume>27</volume>
				<issue>11</issue>
				<fpage>2369</fpage>
				<lpage>2376</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">148804</pubid>
						<pubid idtype="pmpid" link="fulltext">10325427</pubid>
						<pubid idtype="doi">10.1093/nar/27.11.2369</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B21">
				<title>
					<p>PipMaker &#8211; a web server for aligning two genomic DNA sequences</p>
				</title>
				<aug>
					<au>
						<snm>Schwartz</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Zhang</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Frazer</snm>
						<fnm>KA</fnm>
					</au>
					<au>
						<snm>Smit</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Riemer</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Bouck</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Gibbs</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Hardison</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Miller</snm>
						<fnm>W</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2000</pubdate>
				<volume>10</volume>
				<issue>4</issue>
				<fpage>577</fpage>
				<lpage>586</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">310868</pubid>
						<pubid idtype="pmpid" link="fulltext">10779500</pubid>
						<pubid idtype="doi">10.1101/gr.10.4.577</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B22">
				<title>
					<p>A genomic perspective on protein families</p>
				</title>
				<aug>
					<au>
						<snm>Tatusov</snm>
						<fnm>RL</fnm>
					</au>
					<au>
						<snm>Koonin</snm>
						<fnm>EV</fnm>
					</au>
					<au>
						<snm>Lipman</snm>
						<fnm>DJ</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>1997</pubdate>
				<volume>278</volume>
				<issue>5338</issue>
				<fpage>631</fpage>
				<lpage>637</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.278.5338.631</pubid>
						<pubid idtype="pmpid" link="fulltext">9381173</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B23">
				<title>
					<p>GCG: The Wisconsin Package of sequence analysis programs</p>
				</title>
				<aug>
					<au>
						<snm>Womble</snm>
						<fnm>DD</fnm>
					</au>
				</aug>
				<source>Methods Mol Biol</source>
				<pubdate>2000</pubdate>
				<volume>132</volume>
				<fpage>3</fpage>
				<lpage>22</lpage>
				<xrefbib>
					<pubid idtype="pmpid">10547828</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B24">
				<title>
					<p>PAUP package</p>
				</title>
				<url>http://paup.csit.fsu.edu/index.html</url>
			</bibl>
			<bibl id="B25">
				<title>
					<p>HomoloGene</p>
				</title>
				<url>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=homologene</url>
			</bibl>
			<bibl id="B26">
				<title>
					<p>Inparanoid</p>
				</title>
				<url>http://inparanoid.cgb.ki.se/</url>
			</bibl>
			<bibl id="B27">
				<title>
					<p>Inparanoid: a comprehensive database of eukaryotic orthologs</p>
				</title>
				<aug>
					<au>
						<snm>O'Brien</snm>
						<fnm>KP</fnm>
					</au>
					<au>
						<snm>Remm</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Sonnhammer</snm>
						<fnm>EL</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2005</pubdate>
				<volume>33</volume>
				<issue>Database</issue>
				<fpage>D476</fpage>
				<lpage>D480</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">540061</pubid>
						<pubid idtype="pmpid" link="fulltext">15608241</pubid>
						<pubid idtype="doi">10.1093/nar/gki107</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B28">
				<title>
					<p>Automatic clustering of orthologs and in-paralogs from pairwise species comparisons</p>
				</title>
				<aug>
					<au>
						<snm>Remm</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Storm</snm>
						<fnm>CE</fnm>
					</au>
					<au>
						<snm>Sonnhammer</snm>
						<fnm>EL</fnm>
					</au>
				</aug>
				<source>J Mol Biol</source>
				<pubdate>2001</pubdate>
				<volume>314</volume>
				<issue>5</issue>
				<fpage>1041</fpage>
				<lpage>1052</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1006/jmbi.2000.5197</pubid>
						<pubid idtype="pmpid" link="fulltext">11743721</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
		</refgrp>
	</bm>
</art>

