Open Access Research article

Heading Down the Wrong Pathway: on the Influence of Correlation within Gene Sets

Daniel M Gatti (1), William T Barry (2), Andrew B Nobel (3,4,5), Ivan Rusyn (1,5) and Fred A Wright (4,5)*

Author Affiliations

1 Department of Environmental Sciences & Engineering, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA

2 Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA

3 Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA

4 Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA

5 Centers for Environmental Bioinformatics and Computational Toxicology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA

BMC Genomics 2010, 11:574  doi:10.1186/1471-2164-11-574

Published: 18 October 2010

Additional files

Additional file 1:

Additional file 1, Figure S1. False positive rates are greatly increased using independence assumption methods, KEGG pathways. The proportion of permutations in which at least one KEGG pathway is called significant using an independence assumption method with a Bonferroni correction (α = 0.05), the Benjamini & Hochberg FDR (α = 0.05, 0.10), and the resampling approach described in this manuscript.

Additional file 1, Figure S2. Variance inflation due to gene expression correlation increases the false positive rate, even when using a Bonferroni correction. The percentage of permutations in which at least one KEGG pathway was called significant is plotted versus the variance of the standardized gene set statistic (signed square root of the χ² statistic). Results are shown for two human (a, b) and two mouse (c, d) arrays.

Additional file 1, Figure S3. KEGG pathways that are called significant by chance under permutation are likely to be called significant in the observed data. The proportion of times that a KEGG pathway is declared significant under permutation is plotted versus the proportion of times it is called significant in the observed data.

Additional file 1, Figure S4. The variance of the gene set statistic (signed square root of the χ² statistic) increases in proportion to the variance inflation factor (VIF = 1 + (m-1)ρ). The VIF is plotted versus the variance of the gene set statistic for two human (a, b) and two mouse (c, d) arrays. Spearman correlations are shown in the upper right corner.
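As a rough illustration of the VIF formula quoted in the Figure S4 caption, the sketch below evaluates VIF = 1 + (m-1)ρ for a few made-up inputs. It assumes that m is the number of genes in a set and ρ is the mean pairwise correlation among them; this listing does not define either symbol, so treat both readings as assumptions. The function name and the example values are hypothetical and are not taken from the paper's data.

```python
def variance_inflation_factor(m: int, rho: float) -> float:
    """Return VIF = 1 + (m - 1) * rho.

    Assumption: m is the gene set size and rho is the mean pairwise
    correlation between genes in the set (not defined in this listing).
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    return 1.0 + (m - 1) * rho


# Illustrative inputs only; none of these values come from the paper.
for m, rho in [(20, 0.0), (20, 0.05), (100, 0.05)]:
    print(f"m={m}, rho={rho}: VIF={variance_inflation_factor(m, rho):.2f}")
```

With ρ = 0 the factor is 1, meaning no inflation. For a fixed positive ρ the factor grows linearly with the set size, so under these assumptions even a weak average correlation can matter for large sets.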

Format: DOC Size: 1.1MB

Additional file 2:

Additional file 2, Table S1. HG-U95A array datasets. PubMed, GEO and other IDs and descriptions of the datasets used.

Additional file 2, Table S2. HG-U133A array datasets. PubMed, GEO and other IDs and descriptions of the datasets used.

Additional file 2, Table S3. mgu74a array datasets. PubMed, GEO and other IDs and descriptions of the datasets used.

Additional file 2, Table S4. moe430a array datasets. PubMed, GEO and other IDs and descriptions of the datasets used.

Format: DOC Size: 389KB
