CGUG: in silico proteome and genome parsing tool for the determination of "core" and unique genes in the analysis of genomes up to ca. 1.9 Mb
1 Department of Bioinformatics and Computational Biology, George Mason University, 10900 University Boulevard, MSN 5B3, Manassas, VA, 20110, USA
2 Current address: Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA
3 Current address: Kingdomain Corporation, 10305 Nantucket Court, Fairfax, VA 22032, USA
BMC Research Notes 2009, 2:168 doi:10.1186/1756-0500-2-168Published: 25 August 2009
Viruses and small-genome bacteria (~2 megabases and smaller) comprise a considerable population in the biosphere and are of interest to many researchers. These genomes are now sequenced at an unprecedented rate and require complementary computational tools to analyze. "CoreGenesUniqueGenes" (CGUG) is an in silico genome data mining tool that determines a "core" set of genes from two to five organisms with genomes in this size range. Core and unique genes may reflect similar niches and needs, and may be used in classifying organisms.
CGUG is available at http://binf.gmu.edu/geneorder.html webcite as a web-based on-the-fly tool that performs iterative BLASTP analyses using a reference genome and up to four query genomes to provide a table of genes common to these genomes. The result is an in silico display of genomes and their proteomes, allowing for further analysis. CGUG can be used for "genome annotation by homology", as demonstrated with Chlamydophila and Francisella genomes.
CGUG is used to reanalyze the ICTV-based classifications of bacteriophages, to reconfirm long-standing relationships and to explore new classifications. These genomes have been problematic in the past, due largely to horizontal gene transfers. CGUG is validated as a tool for reannotating small genome bacteria using more up-to-date annotations by similarity or homology. These serve as an entry point for wet-bench experiments to confirm the functions of these "hypothetical" and "unknown" proteins.