Open Access Research article

Robust extraction of functional signals from gene set analysis using a generalized threshold free scoring function

Petri Törönen¹, Pauli J Ojala², Pekka Marttinen³ and Liisa Holm¹,⁴

¹ The Holm Group, Biocenter II, Institute of Biotechnology, PO Box 56, 00014 University of Helsinki, Finland

² Finnish Red Cross Blood Service Research and Development, Kivihaantie 7, 00310 Helsinki, Finland

³ Department of Mathematics and Statistics, P.O. Box 68, 00014 University of Helsinki, Finland

⁴ Department of Biological and Environmental Sciences, P.O. Box 56, 00014 University of Helsinki, Finland


BMC Bioinformatics 2009, 10:307 doi:10.1186/1471-2105-10-307

Published: 23 September 2009

Abstract

Background

A central task in contemporary biosciences is identifying the biological processes that respond in genome-wide differential gene expression experiments. Two types of analysis are common. In the first, one orders the probed genes by their differential expression values and examines the tail areas of the resulting list for over-representation of various functional classes. In the second, one monitors the average differential expression level of the genes belonging to a given functional class. So far, these two types of method have not been combined.
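The two common types of analysis can be illustrated with a minimal sketch. This is our own illustration under stated assumptions, not code from the article: `hypergeometric_tail` (a hypothetical helper) gives the over-representation p-value for a class in the head of the ordered list at one fixed cutoff, and `mean_z_score` (also hypothetical) standardizes the class mean against random gene sets of the same size drawn without replacement, in the spirit of Random Sets-style scoring. The toy values are invented for illustration.

```python
import math
from statistics import mean, pstdev
from typing import Sequence, Set

def hypergeometric_tail(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k): at least k class genes among the top n of N genes, K of which are in the class."""
    total = math.comb(N, n)
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return hits / total

def mean_z_score(values: Sequence[float], members: Set[int]) -> float:
    """Z-score of the class mean against random gene sets of equal size (sampling without replacement)."""
    N, n = len(values), len(members)
    if not 0 < n < N:
        raise ValueError("class size must be between 1 and N-1")
    mu, sigma = mean(values), pstdev(values)
    class_mean = mean(values[i] for i in members)
    # Standard error of a sample mean, including the finite-population correction.
    se = sigma / math.sqrt(n) * math.sqrt((N - n) / (N - 1))
    return (class_mean - mu) / se

# Hypothetical toy data: ten genes already sorted by differential expression;
# the functional class consists of genes 0, 1 and 3.
values = [3.0, 2.5, 2.0, 1.0, 0.5, 0.0, -0.5, -1.0, -2.0, -3.0]
members = {0, 1, 3}
cutoff = 3  # assumed fixed cutoff: the top three genes form the "tail"
hits = sum(1 for i in range(cutoff) if i in members)  # list is pre-sorted here
print(hypergeometric_tail(len(values), len(members), cutoff, hits))  # ~0.1833
print(mean_z_score(values, members))
```

The first function depends only on which genes pass the cutoff, so the choice of cutoff matters; the second uses every value but ignores position in the list. This is the gap the abstract describes.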

Results

We introduce a scoring function, the Gene Set Z-score (GSZ), for analysing functional class over-representation that combines the two previous types of analysis. GSZ includes popular functions such as correlation, the hypergeometric test, Max-Mean and Random Sets as limiting cases. In tests with randomized data, GSZ remains stable against changes in class size and across different positions in the analysed gene list. In a detailed comparison with popular functions on artificial data, GSZ shows the best overall performance. Likewise, GSZ stands out in a cross-validation of methods using split real data. A comparison of empirical p-values further shows a strong difference in favour of GSZ, which reports clearly better p-values for the top classes than the other methods do. GSZ also detects relevant biological themes that the other methods miss. These observations also hold when GSZ is compared with popular program packages.
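The abstract does not give the GSZ formula, so the following is only a hypothetical, simplified sketch of the general idea of a threshold-free score that combines list position with differential expression values. It is written under our own assumptions and is not the published scoring function. At every cutoff k of the ordered list, it compares the sum of class-member values in the top k genes with its mean and variance under random class membership, and it keeps the largest standardized difference. The names `threshold_free_score`, `values` and `members` are illustrative.

```python
import math
from typing import Sequence, Set, Tuple

def threshold_free_score(values: Sequence[float], members: Set[int]) -> Tuple[float, int]:
    """Hypothetical simplified score, NOT the published GSZ formula.

    For every cutoff k of the list ordered by decreasing value, compare the sum of
    class-member values in the top k genes with its mean and variance under random
    class membership (n genes drawn without replacement), and return the largest
    standardized difference together with the cutoff where it occurs.
    """
    N, n = len(values), len(members)
    if not 0 < n < N:
        raise ValueError("class size must be between 1 and N-1")
    order = sorted(range(N), key=lambda i: values[i], reverse=True)
    p = n / N                      # probability that a given gene is a class member
    a = n * (N - n) / N**2         # variance of one membership indicator
    best_z, best_k = -math.inf, 0
    head_sum = head_sq = observed = 0.0
    for k, gene in enumerate(order, start=1):
        v = values[gene]
        head_sum += v
        head_sq += v * v
        if gene in members:
            observed += v
        expected = p * head_sum
        # Membership indicators have covariance -a/(N-1) when sampling without
        # replacement, which gives this variance for the sum over the top k genes.
        var = a * (head_sq - (head_sum**2 - head_sq) / (N - 1))
        if var <= 0:
            continue
        z = (observed - expected) / math.sqrt(var)
        if z > best_z:
            best_z, best_k = z, k
    return best_z, best_k

# Hypothetical toy data, invented for illustration.
values = [3.0, 2.5, 2.0, 1.0, 0.5, 0.0, -0.5, -1.0, -2.0, -3.0]
print(threshold_free_score(values, {0, 1, 3}))  # class near the top of the list
print(threshold_free_score(values, {8, 9}))     # class at the bottom of the list
```

Two properties loosely echo the limiting cases mentioned above. At k = N the standardized sum equals the class-mean z-score against random sets of the same size. If the values are 0/1 indicators marking the genes above a fixed cutoff, the variance reduces to the hypergeometric variance of the class count. Taking the maximum over very short heads can be noisy; this sketch does nothing to address that.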

Conclusion

GSZ and improved versions of earlier methods are a useful contribution to the analysis of differential gene expression. The methods and supplementary material are available from the website http://ekhidna.biocenter.helsinki.fi/users/petri/public/GSZ/GSZscore.html.

