Open Access Highly Accessed Software

GO2MSIG, an automated GO based multi-species gene set generator for gene set enrichment analysis

Justin Andrew Christiaan Powell

Author Affiliations

Takeda Cambridge Ltd, 418 Cambridge Science Park, Milton Road, Cambridge CB4 0PZ, UK

BMC Bioinformatics 2014, 15:146  doi:10.1186/1471-2105-15-146

Published: 17 May 2014

Abstract

Background

Despite the widespread use of high throughput expression platforms and the availability of a desktop implementation of Gene Set Enrichment Analysis (GSEA) that enables non-experts to perform gene set based analyses, the necessary precompiled gene sets are rarely available for species other than human.

Results

A software tool (GO2MSIG) was implemented that combines data from various publicly available sources and uses the Gene Ontology (GO) project term relationships to produce GSEA compatible hierarchical GO based gene sets for all species for which association data are available. Annotation sources include the GO association database (which contains data for over 200,000 species), the Entrez gene2go table, and various manufacturers’ array annotation files. This enables the creation of gene sets from the most up-to-date annotation data available. Additional features include the ability to restrict by evidence code, to remap gene descriptors, to filter by set size and to speed up repeat queries by caching the GO term hierarchy. Synonymous GO terms are remapped to the version preferred by the supplied ontology. The tool can be used in standalone form or via a web interface. Prebuilt gene set collections constructed from the September 2013 GO release are also available for common species, including human. In contrast, the human GO based sets available from the Broad Institute itself date from 2008.
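The core idea of hierarchical GO based gene sets can be illustrated with a minimal Python sketch. It is not the GO2MSIG implementation: the function names (`build_gene_sets`, `to_gmt`), the toy term identifiers (`T:root`, `T:child1`, ...) and the gene names are all hypothetical. The sketch assumes a term-to-parents mapping and a list of (gene, term, evidence code) associations, propagates each gene to every ancestor of its annotated term, optionally restricts by evidence code and set size, and writes the tab-delimited GMT layout read by GSEA (set name, description, then members).

```python
from collections import defaultdict


def build_gene_sets(parents, annotations, allowed_evidence=None,
                    min_size=1, max_size=None):
    """Propagate gene annotations up a term hierarchy, then filter by set size."""
    cache = {}  # memoised ancestor sets, so repeated lookups are cheap

    def ancestors(term):
        # Returns the term itself plus every term reachable via parent links.
        if term not in cache:
            result = {term}
            for parent in parents.get(term, ()):
                result |= ancestors(parent)
            cache[term] = result
        return cache[term]

    gene_sets = defaultdict(set)
    for gene, term, evidence in annotations:
        # Optional restriction by evidence code (e.g. keep only "IDA").
        if allowed_evidence is not None and evidence not in allowed_evidence:
            continue
        for t in ancestors(term):
            gene_sets[t].add(gene)

    return {
        term: sorted(genes)
        for term, genes in gene_sets.items()
        if len(genes) >= min_size
        and (max_size is None or len(genes) <= max_size)
    }


def to_gmt(gene_sets, descriptions=None):
    """Render gene sets as GMT text: name, description, then member genes."""
    descriptions = descriptions or {}
    lines = []
    for term in sorted(gene_sets):
        fields = [term, descriptions.get(term, "na"), *gene_sets[term]]
        lines.append("\t".join(fields))
    return "\n".join(lines)


# Hypothetical toy hierarchy: child term -> list of parent terms.
parents = {
    "T:child1": ["T:root"],
    "T:child2": ["T:root"],
    "T:leaf": ["T:child1"],
}

# Hypothetical associations: (gene, annotated term, evidence code).
annotations = [
    ("geneA", "T:leaf", "IDA"),
    ("geneB", "T:child1", "IEA"),
    ("geneC", "T:child2", "IDA"),
]

all_sets = build_gene_sets(parents, annotations)
filtered = build_gene_sets(parents, annotations,
                           allowed_evidence={"IDA"}, min_size=2)
print(to_gmt(all_sets))
print(to_gmt(filtered))
```

In this toy example, `geneA` is annotated only to `T:leaf` but also appears in the sets for `T:child1` and `T:root`, which is what makes the sets hierarchical. With the evidence restriction and a minimum size of 2, only the `T:root` set survives.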

Conclusions

GO2MSIG enables the bioinformatician and non-bioinformatician alike to generate the gene sets required for GSEA for almost any organism for which GO term association data exist. The output gene sets may be used directly within GSEA, and generating them does not require knowledge of programming languages such as Perl, R or Python. The output sets can also be used with other analysis software, such as ErmineJ, that accepts gene sets in the same format. Source code can be downloaded and installed locally from http://www.bioinformatics.org/go2msig/releases/ or used via the web interface at http://www.go2msig.org/cgi-bin/go2msig.cgi.

Keywords:
Gene set enrichment analysis (GSEA); GO ontology; Gene set collection; ErmineJ