Email updates

Keep up to date with the latest news and content from BMC Bioinformatics and BioMed Central.

Open Access Highly Accessed Methodology article

Computation of significance scores of unweighted Gene Set Enrichment Analyses

Andreas Keller*, Christina Backes and Hans-Peter Lenhof

Author Affiliations

Center for Bioinformatics, Saarland University, Building E1 1, 66804 Saarbrücken, Germany

For all author emails, please log on.

BMC Bioinformatics 2007, 8:290  doi:10.1186/1471-2105-8-290

Published: 6 August 2007

Abstract

Background

Gene Set Enrichment Analysis (GSEA) is a computational method for the statistical evaluation of sorted lists of genes or proteins. Originally GSEA was developed for interpreting microarray gene expression data, but it can be applied to any sorted list of genes. Given the gene list and an arbitrary biological category, GSEA evaluates whether the genes of the considered category are randomly distributed or accumulated on top or bottom of the list. Usually, significance scores (p-values) of GSEA are computed by nonparametric permutation tests, a time consuming procedure that yields only estimates of the p-values.

Results

We present a novel dynamic programming algorithm for calculating exact significance values of unweighted Gene Set Enrichment Analyses. Our algorithm avoids typical problems of nonparametric permutation tests, as varying findings in different runs caused by the random sampling procedure. Another advantage of the presented dynamic programming algorithm is its runtime and memory efficiency. To test our algorithm, we applied it not only to simulated data sets, but additionally evaluated expression profiles of squamous cell lung cancer tissue and autologous unaffected tissue.