Open Access Technical Note

CGUG: in silico proteome and genome parsing tool for the determination of "core" and unique genes in the analysis of genomes up to ca. 1.9 Mb

Padmanabhan Mahadevan1,2, John F King1,3 and Donald Seto1*

Author Affiliations

1 Department of Bioinformatics and Computational Biology, George Mason University, 10900 University Boulevard, MSN 5B3, Manassas, VA, 20110, USA

2 Current address: Department of Biological Sciences, Vanderbilt University, Nashville, TN 37235, USA

3 Current address: Kingdomain Corporation, 10305 Nantucket Court, Fairfax, VA 22032, USA

BMC Research Notes 2009, 2:168  doi:10.1186/1756-0500-2-168

Published: 25 August 2009



Viruses and small-genome bacteria (~2 megabases and smaller) make up a considerable population in the biosphere and are of interest to many researchers. These genomes are now being sequenced at an unprecedented rate and require complementary computational tools for their analysis. "CoreGenesUniqueGenes" (CGUG) is an in silico genome data mining tool that determines a "core" set of genes from two to five organisms with genomes in this size range. Core and unique genes may reflect similar niches and needs, and may be used in classifying organisms.


CGUG is available as a web-based, on-the-fly tool that performs iterative BLASTP analyses using a reference genome and up to four query genomes to produce a table of genes common to these genomes. The result is an in silico display of genomes and their proteomes, allowing for further analysis. CGUG can be used for "genome annotation by homology", as demonstrated with Chlamydophila and Francisella genomes.
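The iterative comparison described above can be illustrated with a minimal sketch. It is not the CGUG implementation. It assumes BLASTP has already been run and filtered by a score threshold, so that each query genome is represented by a table mapping reference protein IDs to matching query protein IDs. The function name `core_and_unique`, the table layout, and the definition of "unique" as reference proteins with no match in any query genome are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input layout: one table per query genome, mapping a reference
# protein ID to the set of query protein IDs that passed the BLASTP threshold.
Hits = Dict[str, Set[str]]


def core_and_unique(
    reference: List[str], hits_per_query: List[Hits]
) -> Tuple[List[Tuple[str, List[List[str]]]], List[str]]:
    """Return (core table, unique reference proteins) under the assumptions above."""
    if not 1 <= len(hits_per_query) <= 4:
        raise ValueError("expected one to four query genomes")

    # Iterative step: start from the whole reference proteome and, for each
    # query genome in turn, keep only the proteins that have a match in it.
    core = list(reference)
    for hits in hits_per_query:
        core = [protein for protein in core if hits.get(protein)]

    # Assumed definition: a reference protein is "unique" if no query matches it.
    unique = [
        protein
        for protein in reference
        if not any(hits.get(protein) for hits in hits_per_query)
    ]

    # One row per core protein, with its matches in each query genome.
    table = [
        (protein, [sorted(hits[protein]) for hits in hits_per_query])
        for protein in core
    ]
    return table, unique


if __name__ == "__main__":
    reference_proteins = ["r1", "r2", "r3"]
    query_a = {"r1": {"a1"}, "r2": {"a2"}}
    query_b = {"r1": {"b0", "b1"}}
    core_table, unique_proteins = core_and_unique(
        reference_proteins, [query_a, query_b]
    )
    print(core_table)
    print(unique_proteins)
```

Filtering the candidate list once per query genome mirrors the "reference plus up to four queries" design: the core set can only shrink as each genome is added, and the reference protein order is preserved in the resulting table.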


CGUG is used to reanalyze the ICTV-based classifications of bacteriophages, to reconfirm long-standing relationships and to explore new classifications. Classifying these genomes has been problematic in the past, due largely to horizontal gene transfers. CGUG is also validated as a tool for reannotating small-genome bacteria using more up-to-date annotations by similarity or homology. These reannotations serve as an entry point for wet-bench experiments to confirm the functions of "hypothetical" and "unknown" proteins.