Genomorama: genome visualization and analysis

Gans, Jason D; Wolinsky, Murray

doi:10.1186/1471-2105-8-204

Software
Open access
Published: 14 June 2007

BMC Bioinformatics volume 8, Article number: 204 (2007)

Abstract

Background

The ability to visualize genomic features and design experimental assays that can target specific regions of a genome is essential for modern biology. To assist in these tasks, we present Genomorama, a software program for interactively displaying multiple genomes and identifying potential DNA hybridization sites for assay design.

Results

Useful features of Genomorama include genome search by DNA hybridization (probe binding and PCR amplification), efficient multi-scale display and manipulation of multiple genomes, support for many genome file types and the ability to search for and retrieve data from the National Center for Biotechnology Information (NCBI) Entrez server.

Conclusion

Genomorama provides an efficient computational platform for visualizing and analyzing multiple genomes.

Background

The rapid growth in the number of sequenced genomes has been accompanied by a corresponding proliferation of computational tools for viewing, comparing and searching genome sequences and annotations. These tools can be divided into two broad categories [1]: database-client and stand-alone. In general, database-client tools offer static (or semi-static) visualizations of small sets of predefined genomes, while stand-alone tools allow interactive visualizations of locally stored genomes. Stand-alone tools can also serve as graphical front ends for displaying the output of locally run calculations. A high-level comparison of common features for these stand-alone tools [2–19] is shown in Table 1 and reveals several trends. Almost all of the tools are implemented in an interpreted language (e.g. Java, Perl, Tcl/Tk). While this provides cross-platform portability, the responsiveness (e.g. rendering speed, file loading speed) of these applications is poor. While all of the tools can display genome annotations, additional functionality (e.g. sequence- and annotation-based searching, multiple sequence alignment, annotation editing) varies widely between programs.

Table 1 Comparing features of freely available, stand-alone genome viewers

Not content with the performance or feature set of existing programs, we wrote Genomorama, a stand-alone tool originally developed to assist in computational signature design for bacterial and viral pathogen detection. Genomorama allows users designing DNA-based hybridization assays, such as PCR or DNA probes, to easily identify the regions of a genome targeted by a given assay. It is distinguished from existing tools by its DNA hybridization-based sequence searching, its rapid execution speed, and its ability to read and export a diverse set of common file formats. Despite its origins as a viewer for viral and bacterial genomes, Genomorama can also visualize large eukaryotic genomes (e.g. human chromosomes).

Implementation

Genomorama is a software program for interactively displaying and analyzing multiple genomes. It provides a powerful yet easy-to-use interface that leverages the visualization power of modern computers (via OpenGL) and the substantial bioinformatic infrastructure provided by the NCBI (via the NCBI C toolkit). Genomorama is written in portable, highly optimized C++ and comes in three "flavors" that allow it to run natively on (most) modern operating systems: OS X (using Carbon), Microsoft Windows (using the Microsoft Foundation Classes) and Linux (using Motif). The Motif version allows any X-windows client that supports OpenGL to remotely run Genomorama. Executables and source code are freely provided for all flavors.

Results and discussion

To visualize and compare annotated genome features at all relevant size scales, genomes are displayed on the computer screen as linear, scale-dependent maps. The user interacts with a map using the mouse, keyboard and scroll bars. Semantic zooming [20] is used to display genomic features that occur at a wide range of scales, e.g. ~10⁵ bases for a mammalian gene, ~10⁴ bases for a pathogenicity island, ~10³ bases for a bacterial gene, ~10² bases for a tRNA, ~10¹ bases for a transcription factor binding site and 10⁰ bases for a single nucleotide polymorphism. Optional 2D graphs, including %G+C, GC skew (automatically computed from the genome sequence) and external data sets (provided by the user in a separate file), can be superimposed on genome maps. Publication-quality, WYSIWYG ("What You See Is What You Get") images can be saved in either GIF or PostScript formats.
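As a rough illustration of how %G+C and GC skew tracks can be derived from sequence alone, the following Python sketch computes both over fixed-size windows. It is not Genomorama's implementation: the function name `gc_windows`, the `window` and `step` parameters, and the use of the conventional skew definition (G − C)/(G + C) are all assumptions made for this example.

```python
def gc_windows(seq, window=1000, step=1000):
    """Yield (start, percent_gc, gc_skew) for each full window of seq.

    Hypothetical helper for illustration; window and step are assumed
    parameters, not values taken from Genomorama.
    """
    seq = seq.upper()
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        percent_gc = 100.0 * (g + c) / window
        # Conventional GC skew: (G - C) / (G + C).
        # Defined here as 0.0 when the window contains no G or C.
        skew = (g - c) / (g + c) if (g + c) else 0.0
        yield start, percent_gc, skew


if __name__ == "__main__":
    demo = "GGGGCCATATATATCC"
    for start, pct, skew in gc_windows(demo, window=8, step=8):
        print(start, pct, round(skew, 3))
```

Each tuple gives the window start and the two values that would be plotted at that position; a smaller `step` than `window` gives overlapping windows and a smoother curve.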

Genome annotations and sequences are available in a large number of file formats and Genomorama can read a substantial subset of these formats, including GenBank (GBK), European Molecular Biology Laboratory (EMBL), Abstract Syntax Notation One (ASN.1), Protein Table (PTT) and FASTA. Unlike existing programs, Genomorama can read the multi-part GBK, EMBL and ASN.1 files used to store annotations and sequence data for partially assembled genomes of both prokaryotic and eukaryotic organisms. The ability to load multi-part annotation files allows access to preliminary annotation information provided by sequencing centers during the whole genome shotgun sequencing of an organism (these files are available from the NCBI ftp site [21]). A screenshot of five contigs and associated sequencing quality scores from the genome of Sphingopyxis alaskensis RB2256 is shown in Figure 1.

Genomorama can load large (> 10⁸ bases) genomes. Support for large genomes is crucial for visualizing entire eukaryotic chromosomes. A comparison between loading times for Genomorama and two Java-based visualization tools is shown in Figure 2. Conservative memory usage and efficient C++ implementation enable Genomorama to load the sequence and annotations for human chromosome 1 substantially faster (more than an order of magnitude) than either of the Java-based programs on a range of desktop computers.

To assist in experimental design and analysis, Genomorama provides DNA hybridization-based searches to identify probe binding locations and PCR amplification products. Given a pair of PCR primers, Genomorama will display all corresponding PCR amplicons from a target sequence. Both traditional PCR primer and Padlock probe [22] queries are supported. These searches employ a sequence similarity criterion defined by DNA melting temperature [23–28], which allows for non-Watson-Crick base pairing (but currently not gaps or DNA bulges), and an optional number of exact matching bases at the 3' end of each primer. All possible combinations of the forward and reverse PCR primers are tested (i.e. forward-reverse, reverse-forward, forward-forward and reverse-reverse). In contrast, existing in-silico PCR tools are either inflexible (e.g. requiring a preconfigured server) [29] or rely on heuristic similarity measures (e.g. the number of mismatches between primer and template) [30, 31].
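The way the four primer combinations are enumerated can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Genomorama's algorithm: exact string matching stands in for the melting-temperature criterion described above, the template is treated as linear, and the names `in_silico_pcr`, `find_all` and `max_amplicon` are hypothetical.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_all(text, pattern):
    """Return every start index of pattern in text, including overlaps."""
    hits, i = [], text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits


def in_silico_pcr(template, forward, reverse, max_amplicon=2000):
    """List amplicons for all four primer pairings on a linear template.

    Assumption: binding requires an exact match (a stand-in for a
    melting-temperature test). Returns tuples of
    (left_primer, right_primer, start, end, amplicon_sequence).
    """
    template = template.upper()
    primers = {"forward": forward.upper(), "reverse": reverse.upper()}
    amplicons = []
    # forward-forward, forward-reverse, reverse-forward, reverse-reverse
    for left_name, right_name in product(primers, repeat=2):
        left = primers[left_name]
        # The right-hand primer binds the given strand, so its reverse
        # complement is what appears in the template as written.
        right_site = reverse_complement(primers[right_name])
        for start in find_all(template, left):
            for site in find_all(template, right_site):
                end = site + len(right_site)
                # Require non-overlapping sites and a bounded product size.
                if site >= start + len(left) and end - start <= max_amplicon:
                    amplicons.append(
                        (left_name, right_name, start, end, template[start:end])
                    )
    return amplicons
```

Calling `in_silico_pcr(template, fwd, rev)` on a template and on its reverse complement should report the same product under the forward-reverse and reverse-forward pairings respectively, which is why every ordered pairing is tested rather than forward-reverse alone.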

Genomorama also performs primer prediction by computing all potential forward and reverse PCR primers that satisfy primer length, melting temperature, %G+C and heuristic base composition requirements. An example of PCR primer-based searching, using the B. anthracis-specific primers [32], is shown in Figure 3. Finally, sequence searching (both exact and hybridization-based) is sensitive to the topology of the target DNA molecule (i.e. either linear or circular) and, as a result, can identify query matches that span the start/stop (i.e. nucleotide 0) of circular genomes.
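One common way to make an exact search topology-aware is to extend a circular sequence with its own first few bases, so that a match spanning the origin becomes an ordinary substring match. The Python sketch below shows this idea for a single strand; it is an assumed approach for illustration (the article does not describe how Genomorama handles the wrap-around), and `find_matches` is a hypothetical name.

```python
def find_matches(genome, query, circular=False):
    """Return 0-based start positions of exact matches of query in genome.

    Illustrative only: searches the given strand and uses exact matching.
    """
    genome, query = genome.upper(), query.upper()
    if not query or len(query) > len(genome):
        return []
    # For a circular molecule, append the first len(query) - 1 bases so
    # that matches spanning the origin are found. Every reported start
    # is still a valid index into the original genome.
    text = genome + genome[:len(query) - 1] if circular else genome
    hits, i = [], text.find(query)
    while i != -1:
        hits.append(i)
        i = text.find(query, i + 1)
    return hits


if __name__ == "__main__":
    genome = "ACGTTTGA"
    print(find_matches(genome, "GAAC"))                 # linear search
    print(find_matches(genome, "GAAC", circular=True))  # circular search
```

In this example the query `GAAC` is absent from the linear sequence but is found at position 6 once the sequence is treated as circular, because it covers the last two and first two bases.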

Conclusion

Genomorama is an easy-to-use computational tool for a number of genome comparison tasks, including real-time display of multiple genomes, high-quality output and novel hybridization-based sequence searching.

Availability and requirements

Project name: Genomorama
Project homepage: http://snp.lanl.gov/genomorama
Operating systems: OS X, Windows, Linux
Programming language: C++
License: Freely available
Any restrictions on use by non-academics: No

References

1. Loraine AE, Helt GA: Visualizing the genome: techniques for presenting human genome data and annotations. BMC Bioinformatics 2002, 3(19).
2. Lewis SE, Searle SMJ, Harris N, Gibson M, Iyer V, Richter J, Wiel C, Bayraktaroglu L, Birney E, Crosby MA, Kaminker JS, Matthews BB, Prochnik SE, Smithy CD, Tupy JL, Rubin GM, Misra S, Mungall CJ, Clamp ME: Apollo: a sequence annotation editor. Genome Biology 2002, 3(12):1–14. 10.1186/gb-2002-3-12-research0082
3. Engels R, Yu T, Burge C, Mesirov JP, DeCaprio D, Galagan JE: Combo: a whole genome comparative browser. Bioinformatics 2006, 22(4):1782–1783. 10.1093/bioinformatics/btl193
4. Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream MA, Barrell B: Artemis: sequence visualization and annotation. Bioinformatics 2000, 16(10):944–945. 10.1093/bioinformatics/16.10.944
5. Turinsky AL, Ah-Seng AC, Gordon PMK, Stromer JN, Taschuk ML, Xu EW, Sensen CW: Bioinformatics visualization and integration with open standards: The Bluejay genomic browser. In Silico Biology 2004, 5(18).
6. Stothard P, Wishart DS: Circular genome visualization and exploration using CGView. Bioinformatics 2005, 21(4):537–539. 10.1093/bioinformatics/bti054
7. Fiers MW, van de Wetering H, Peeters TH, van Wijk JJ, Nap JP: DNAVis: interactive visualization of comparative genome annotations. Bioinformatics 2006, 22(3):354–355. 10.1093/bioinformatics/bti807
8. Nix DA, Eisen MB: GATA: a graphic alignment tool for comparative sequence analysis. BMC Bioinformatics 2005, 6(9).
9. Vernikos G, Gkogkas C, Promponas V, Hamodrakas S: GeneViTo: Visualizing gene-product functional and structural features in genomic datasets. BMC Bioinformatics 2003, 4(1).
10. Sato N, Ehira S: GenoMap, a circular genome data viewer. Bioinformatics 2003, 19(12):1583–1584. 10.1093/bioinformatics/btg195
11. Baerends R, Smits W, de Jong A, Hamoen L, Kok J, Kuipers O: Genome2D: a visualization tool for the rapid analysis of bacterial transcriptome data. Genome Biology 2004, 5(5):R37. 10.1186/gb-2004-5-5-r37
12. Yang J, Wang J, Yao ZJ, Jin Q, Shen Y, Chen R: GenomeComp: a visualization tool for microbial genome comparison. J Microbiol Methods 2003, 54(3):423–426. 10.1016/S0167-7012(03)00094-0
13. Gibson R, Smith DR: Genome visualization made fast and simple. Bioinformatics 2003, 19(11):1449–1450. 10.1093/bioinformatics/btg152
14. Ghai R, Hain T, Chakraborty T: GenomeViz: visualizing microbial genomes. BMC Bioinformatics 2004, 5(1).
15. DiCuccio M, Cherry J, Lebedev V, Shomrat M, Smith R, Tereshkov V, Voronov Y, Yazhuk A: Genome Workbench. [http://www.ncbi.nlm.nih.gov/projects/gbench/]
16. Affymetrix: IGB. [http://www.affymetrix.com/support/developer/tools/download_igb.affx]
17. Darling A, Mau B, Blattner FR, Perna NT: Mauve: Multiple Alignment of Conserved Genomic Sequence With Rearrangements. Genome Res 2004, 14(7):1394–1403. 10.1101/gr.2289704
18. Hu Z, Frith M, Niu T, Weng Z: SeqVISTA: a graphical tool for sequence feature visualization and comparison. BMC Bioinformatics 2003, 4(1).
19. Montgomery SB, Astakhova T, Bilenky M, Birney E, Fu T, Hassel M, Melsopp C, Rak M, Robertson AG, Sleumer M, Siddiqui AS, Jones SJM: Sockeye: A 3D Environment for Comparative Genomics. Genome Res 2004, 14(5):956–962. 10.1101/gr.1890304
20. Bederson BB, Hollan JD, Perlin K, Meyer J, Bacon D, Furnas G: Pad++: A Zoomable Graphical Sketchpad For Exploring Alternate Interface Physics. Journal of Visual Languages and Computing 1995, 7: 3–31. 10.1006/jvlc.1996.0002
21. NCBI ftp site [ftp://ftp.ncbi.nih.gov/genbank/wgs/]
22. Nilsson M, Banér J, Mendel-Hartvig M, Dahl F, Antson DO, Gullberg M, Landegren U: Making ends meet in genetic analysis using padlock probes. Human Mutation 2002, 19: 410–415. 10.1002/humu.10073
23. Allawi HT, SantaLucia J: Thermodynamics and NMR of Internal G-T Mismatches in DNA. Biochemistry 1997, 36: 10581–10594. 10.1021/bi962590c
24. Allawi HT, SantaLucia J: Thermodynamics of internal C-T mismatches in DNA. Nucleic Acids Research 1998, 26(11):2694–2701. 10.1093/nar/26.11.2694
25. Allawi HT, SantaLucia J: Nearest Neighbor Thermodynamic Parameters for Internal C-A Mismatches in DNA. Biochemistry 1998, 37: 2170–2179. 10.1021/bi9724873
26. Allawi HT, SantaLucia J: Nearest-Neighbor Thermodynamics of Internal A-C Mismatches in DNA: Sequence Dependence and pH Effects. Biochemistry 1998, 37: 9435–9444. 10.1021/bi9803729
27. Bommarito S, Peyret N, SantaLucia J: Thermodynamic parameters for DNA sequences with dangling ends. Nucleic Acids Research 2000, 28(9):1929–1934. 10.1093/nar/28.9.1929
28. Peyret N, Seneviratne PA, Allawi HT, SantaLucia J: Nearest-Neighbor Thermodynamics and NMR of DNA Sequences with Internal A-A, C-C, G-G, and T-T Mismatches. Biochemistry 1999, 38: 3468–3477. 10.1021/bi9825091
29. Lexa M, Horak J, Brzobohaty B: Virtual PCR. Bioinformatics 2001, 17(2):192–193. 10.1093/bioinformatics/17.2.192
30. Bikandi J, Millán RS, Rementeria A, Garaizar J: In silico analysis of complete bacterial genomes: PCR, AFLP-PCR and endonuclease restriction. Bioinformatics 2004, 20(5):798–799. 10.1093/bioinformatics/btg491
31. Schuler GD: Sequence Mapping by Electronic PCR. Genome Research 1997, 7(5):541–550.
32. Radnedge L, Agron P, Hill K, Jackson P, Ticknor L, Kiem P, Anderson G: Genome differences that distinguish Bacillus anthracis from Bacillus cereus and Bacillus thuringiensis. Applied and Environmental Microbiology 2003, 69(5):2755–2764. 10.1128/AEM.69.5.2755-2764.2003

Acknowledgements

This research was supported in part by the DOE/DHS Chemical Biological National Security Program (CBNP), the DOD/USAMRMC Toxin and Virulence Factor Database Effort (MIPR 2MCTC32157) and the Los Alamos National Laboratory Directed Research Development Program (LDRD 20070010DR). The authors would like to thank N. Pawley for helpful discussions and K. Sirotkin and J. Kans for assistance with the NCBI toolkit.

Author information

Authors and Affiliations

Biosciences Division, Los Alamos National Laboratory, Los Alamos, NM, USA
Jason D Gans & Murray Wolinsky

Corresponding author

Correspondence to Jason D Gans.

Additional information

Authors' contributions

JG wrote the program and documentation. MW oversaw the development process. Both authors prepared and approved the manuscript.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Gans, J.D., Wolinsky, M. Genomorama: genome visualization and analysis. BMC Bioinformatics 8, 204 (2007). https://doi.org/10.1186/1471-2105-8-204

Received: 04 January 2007
Accepted: 14 June 2007
Published: 14 June 2007
DOI: https://doi.org/10.1186/1471-2105-8-204