Open Access Highly Accessed Methodology article

Gene set analyses for interpreting microarray experiments on prokaryotic organisms

Nathan L Tintle1*, Aaron A Best2, Matthew DeJongh3, Dirk Van Bruggen1,3, Fred Heffron4, Steffen Porwollik5 and Ronald C Taylor6

Author affiliations

1 Department of Mathematics, Hope College, Holland, Michigan, USA

2 Department of Biology, Hope College, Holland, Michigan, USA

3 Department of Computer Science, Hope College, Holland, Michigan, USA

4 Department of Molecular Microbiology and Immunology, Oregon Health and Science University, Portland, Oregon, USA

5 Sidney Kimmel Cancer Center, San Diego, California, USA

6 Computational Biology & Bioinformatics Group, Pacific Northwest National Laboratory, Richland, Washington, USA

Citation and License

BMC Bioinformatics 2008, 9:469 doi:10.1186/1471-2105-9-469

Published: 5 November 2008

Abstract

Background

Despite the widespread use of DNA microarrays, questions remain about how best to interpret the wealth of gene-by-gene transcriptional levels that they measure. Recently, methods have been proposed that use biologically defined sets of genes in interpretation, instead of examining results gene-by-gene. Despite a serious limitation, a method based on Fisher's exact test remains one of the few plausible options for gene set analysis when an experiment has few replicates, as is typically the case for prokaryotes.
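For orientation, a gene set test based on Fisher's exact test can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each gene has already been labeled significant or not by some cutoff, and it computes a one-sided over-representation p-value from the hypergeometric distribution. The function name `fisher_exact_enrichment` and its parameters are hypothetical, as are the numbers in the usage line.

```python
from math import comb


def fisher_exact_enrichment(n_genes, n_sig, set_size, set_sig):
    """One-sided Fisher's exact test for over-representation.

    n_genes  : total genes on the array
    n_sig    : genes labeled significant (assumed cutoff, chosen beforehand)
    set_size : genes in the gene set
    set_sig  : genes in the set that are labeled significant

    Returns P(X >= set_sig), where X is the number of significant genes in
    a random set of size set_size (hypergeometric distribution).
    """
    if n_sig > n_genes or set_size > n_genes:
        raise ValueError("n_sig and set_size cannot exceed n_genes")
    upper = min(n_sig, set_size)
    if not 0 <= set_sig <= upper:
        raise ValueError("set_sig is out of range for the given margins")

    # Number of ways to draw a set of this size from all genes.
    total = comb(n_genes, set_size)
    # Sum the hypergeometric probabilities over the upper tail.
    tail = sum(
        comb(n_sig, k) * comb(n_genes - n_sig, set_size - k)
        for k in range(set_sig, upper + 1)
    )
    return tail / total


# Hypothetical usage: 4000 genes, 200 labeled significant,
# and a set of 20 genes of which 6 are significant.
p_value = fisher_exact_enrichment(4000, 200, 20, 6)
```

The sketch uses exact integer arithmetic from the standard library, which is adequate for illustration; for large arrays a statistics library's hypergeometric routines would likely be preferable.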

Results

We extend five methods of gene set analysis, originally designed for experiments with multiple replicates, for use on experiments with few replicates. We then use simulated and real data to compare these methods with each other and with the Fisher's exact test (FET) method. In simulation, we find that a method named MAXMEAN-NR maintains the nominal rate of false positive findings (type I error rate) while offering good statistical power and robustness to a variety of gene set distributions for set sizes of at least 10. Other methods (ABSSUM-NR or SUM-NR) are shown to be powerful for set sizes less than 10. Analysis of three sets of experimental data shows similar results. Furthermore, the MAXMEAN-NR method is shown to detect biologically relevant sets as significant when other methods (including FET) cannot. We also find that the popular GSEA-NR method performs poorly when compared to MAXMEAN-NR.

Conclusion

MAXMEAN-NR is a method of gene set analysis for experiments with few replicates, as is common for prokaryotes. Results of simulation and real data analysis suggest that the MAXMEAN-NR method offers increased robustness and biological relevance of findings as compared to FET and other methods, while maintaining the nominal type I error rate.