BMC Bioinformatics


Open Access Methodology article

A general modular framework for gene set enrichment analysis

Marit Ackermann¹ and Korbinian Strimmer²*

Author Affiliations

¹ Biotechnology Center, Technical University Dresden, 01062 Dresden, Germany

² Institute for Medical Informatics, Statistics and Epidemiology, University of Leipzig, Härtelstr. 16-18, 04107 Leipzig, Germany


BMC Bioinformatics 2009, 10:47 doi:10.1186/1471-2105-10-47

Published: 3 February 2009

Abstract

Background

Analyzing microarray and other high-throughput data on the basis of gene sets, rather than individual genes, is becoming increasingly important in genomic studies. Accordingly, many statistical approaches for detecting gene set enrichment have been proposed, but how these methods relate to one another and how they compare in performance remain largely unclear.

Results

We conduct an extensive survey of statistical approaches for gene set analysis and identify a common modular structure underlying most published methods. Based on this finding, we propose a general framework for detecting gene set enrichment. This framework provides a meta-theory of gene set analysis that clarifies the relative merits of each embedded approach, enables a principled comparison of the methods, and offers insight into how they interact.
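The modular structure described above can be illustrated with a minimal Python sketch. For illustration only, it assumes four interchangeable modules: a gene-level statistic (here a Welch t statistic), a transformation of that statistic (default: absolute value), a gene set summary (default: mean), and a significance assessment by permuting sample labels. The abstract does not list the modules, so this decomposition and the helper names `welch_t`, `gene_level_stats` and `enrichment_test` are hypothetical and are not the authors' implementation. `data` is assumed to be a list of per-gene rows of expression values and `labels` a list of 0/1 sample group labels.

```python
import random
import statistics
from typing import Sequence


def welch_t(x: Sequence[float], y: Sequence[float]) -> float:
    """Assumed gene-level statistic: Welch two-sample t (mean of x minus mean of y)."""
    var_term = statistics.variance(x) / len(x) + statistics.variance(y) / len(y)
    return (statistics.fmean(x) - statistics.fmean(y)) / var_term ** 0.5


def gene_level_stats(rows, labels, gene_stat):
    """Assumed module 1: one statistic per gene, comparing label-1 vs label-0 samples."""
    stats = []
    for row in rows:
        group1 = [v for v, lab in zip(row, labels) if lab == 1]
        group0 = [v for v, lab in zip(row, labels) if lab == 0]
        stats.append(gene_stat(group1, group0))
    return stats


def enrichment_test(data, labels, gene_set, gene_stat=welch_t, transform=abs,
                    summary=statistics.fmean, n_perm=999, seed=0):
    """Hypothetical pipeline returning (observed set score, permutation p-value)."""
    subset = [data[i] for i in gene_set]  # only the genes in the set are needed

    def score(labs):
        stats = gene_level_stats(subset, labs, gene_stat)
        # Assumed modules 2 and 3: transform each statistic, then summarize the set.
        return summary([transform(t) for t in stats])

    observed = score(labels)
    # Assumed module 4: null distribution from random permutations of sample labels.
    rng = random.Random(seed)
    perm = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if score(perm) >= observed:
            exceed += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (exceed + 1) / (n_perm + 1)
```

Swapping a module amounts to passing a different function, for example `enrichment_test(data, labels, gene_set, transform=lambda t: t * t, summary=max)`. A real analysis testing many gene sets would also need to correct the resulting p-values for multiple testing.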

Conclusion

Using this framework, we conduct a computer simulation comparing 261 variants of gene set enrichment procedures and analyze two experimental data sets. Based on the results, we recommend best practices for choosing effective gene set enrichment procedures.
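Within a modular framework, each combination of one choice per module defines a distinct procedure, which is how a single framework can yield many variants to compare. The sketch below enumerates such combinations with `itertools.product`. The module names and option lists are hypothetical placeholders, so the resulting count of 72 is illustrative only and does not reconstruct the 261 variants compared in the study.

```python
from itertools import product

# Hypothetical option lists for each module; NOT the choices used in the study.
MODULE_OPTIONS = {
    "gene_statistic": ["t", "moderated_t", "fold_change"],
    "transformation": ["none", "abs", "square", "rank"],
    "set_statistic": ["mean", "median", "max"],
    "significance": ["sample_permutation", "gene_randomization"],
}


def enumerate_variants(options):
    """Return one dict per procedure: every combination of one choice per module."""
    modules = list(options)
    return [dict(zip(modules, combo))
            for combo in product(*(options[m] for m in modules))]


variants = enumerate_variants(MODULE_OPTIONS)
print(len(variants))  # 72 = 3 * 4 * 3 * 2
```

Each resulting configuration could then be run on the same simulated data, so that differences in performance can be attributed to the module choices.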