Basic Research Program, SAIC-Frederick, Inc. NCI-Frederick, Frederick, MD, USA

Chaire de Bioinformatique, Conservatoire National des Arts et Metiers, 75003, Paris, France

Laboratory of Genomic Diversity, NCI-Frederick, Frederick, MD, USA

Abstract

Background

As we enter an era when testing millions of SNPs in a single gene association study will become the standard, consideration of multiple comparisons is an essential part of determining statistical significance. Bonferroni adjustments can be made but are conservative due to the preponderance of linkage disequilibrium (LD) between genetic markers, and permutation testing is not always a viable option. Three major classes of corrections have been proposed to correct the dependent nature of genetic data in Bonferroni adjustments: permutation testing and related alternatives, principal components analysis (PCA), and analysis of blocks of LD across the genome. We consider seven implementations of these commonly used methods using data from 1514 European American participants genotyped for 700,078 SNPs in a GWAS for AIDS.

Results

A Bonferroni correction using the number of LD blocks found by the three algorithms implemented by Haploview resulted in an insufficiently conservative threshold, corresponding to a genome-wide significance level of α = 0.15 - 0.20. We observed a moderate increase in power when using PRESTO, SLIDE, and simpleℳ when compared with traditional Bonferroni methods for population data genotyped on the Affymetrix 6.0 platform in European Americans (α = 0.05 thresholds between 1 × 10^{-7} and 7 × 10^{-8}).

Conclusions

Correcting for the number of LD blocks resulted in an anti-conservative Bonferroni adjustment. SLIDE and simpleℳ are particularly useful when using a statistical test not handled in optimized permutation testing packages, and genome-wide corrected p-values obtained using SLIDE are much easier to interpret for consumers of GWAS results.

Background

Since the first successful genome-wide association studies (GWAS) in 2005, over 600 GWAS have been reported

The probability of a Type I error (incorrectly ascribing scientific significance to a statistical test) is generally controlled by setting the significance level, α, for a test, but the probability of making at least one Type I error in a study,

P(at least one Type I error) = 1 - (1 - α)^{n},

is a function of n, the number of independent comparisons made, as well as α. The direct application to a GWAS is that, with a significance level typical of small studies and candidate gene studies (e.g. α = 0.05, α = 0.01, α = 0.001), the probability of not committing a GWAS-wide Type I error is very small.
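To make the dependence on n concrete, the following minimal sketch evaluates the probability of at least one Type I error for n independent tests. The function name `familywise_error_rate` is hypothetical, and the calculation assumes fully independent comparisons, which does not hold for SNPs in LD.

```python
import math


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one Type I error across n independent tests.

    Computes 1 - (1 - alpha)^n using log1p/expm1 so that very small
    alpha values do not lose precision.
    """
    return -math.expm1(n_tests * math.log1p(-alpha))


# Significance levels typical of small studies, applied to growing numbers
# of independent comparisons (700,078 is the number of SNPs in this GWAS).
for alpha in (0.05, 0.01, 0.001):
    for n in (1, 10, 100, 700_078):
        fwer = familywise_error_rate(alpha, n)
        print(f"alpha={alpha:<6} n={n:<8} P(at least one Type I error)={fwer:.4f}")
```

Even at α = 0.001, the probability of at least one false positive is effectively 1 once hundreds of thousands of independent tests are made.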

The standard for evidence of significance in GWAS to securely identify a genotype-phenotype association in European Americans is generally considered to be p < 5 × 10^{-8} or p < 1 × 10^{-8}, for α = 0.05 and 0.01, respectively

Several methods are commonly used to control the GWAS-wide Type I error rate: p-value adjustments for multiple comparisons have long been used when making multiple comparisons

A Bonferroni adjustment fits our problem particularly well because many comparisons are made and a GWAS is considered agnostic, with no prior hypotheses
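As a worked illustration of a strict Bonferroni adjustment, the sketch below divides α by the number of SNPs tested and, equivalently, multiplies each raw p-value by that number. The helper names and the example p-values are hypothetical; only the SNP count (700,078) comes from this study.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the overall level at alpha."""
    return alpha / n_tests


def bonferroni_adjust(p_values, n_tests: int):
    """Bonferroni-corrected p-values, capped at 1."""
    return [min(1.0, p * n_tests) for p in p_values]


N_SNPS = 700_078  # number of SNPs tested in this GWAS

threshold = bonferroni_threshold(0.05, N_SNPS)
print(f"Bonferroni threshold: {threshold:.3g}")

# Illustrative raw p-values only; these are not results from the study.
example_p = [1e-9, 1e-3]
print(bonferroni_adjust(example_p, N_SNPS))
```

With α = 0.05 this gives a threshold of roughly 7.1 × 10^{-8}, which agrees with the strict Bonferroni row of the summary table.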

One relevant question is then not how many SNPs are being tested, but how many independent statistical comparisons are being made. In the context of a principal components analysis (PCA) of the genotype data, the number of independent comparisons can be defined as the number of principal components accounting for a large portion (99.5% has been suggested) of the variance in the data
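The PCA-based definition above can be sketched as follows. This is a minimal illustration, not the published simpleℳ code: the function name `effective_number_of_tests` is hypothetical, plain Pearson correlation of 0/1/2 allele counts is assumed as the LD measure, and monomorphic SNPs are assumed to have been filtered out beforehand.

```python
import numpy as np


def effective_number_of_tests(genotypes: np.ndarray,
                              variance_explained: float = 0.995) -> int:
    """Count the principal components needed to reach the variance cutoff.

    genotypes: (n_individuals, n_snps) array of allele counts (0, 1, 2).
    """
    # SNP-by-SNP correlation matrix (columns are SNPs).
    corr = np.corrcoef(genotypes, rowvar=False)
    # Eigenvalues of a symmetric matrix, sorted from largest to smallest.
    eigvals = np.linalg.eigvalsh(corr)[::-1]
    # Clip tiny negative values caused by floating-point round-off.
    eigvals = np.clip(eigvals, 0.0, None)
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # First component index at which the cumulative share reaches the cutoff.
    return int(np.searchsorted(cumulative, variance_explained) + 1)


# Toy data: 5 independent SNPs, each copied 3 times (perfect LD within copies).
rng = np.random.default_rng(0)
base = rng.binomial(2, 0.3, size=(200, 5))
toy_genotypes = np.repeat(base, 3, axis=1)  # 15 SNPs in total

m_eff = effective_number_of_tests(toy_genotypes)
print(f"{toy_genotypes.shape[1]} SNPs, effective number of tests: {m_eff}")
```

In the toy data the 15 SNPs collapse to 5 effective tests, so a Bonferroni-style threshold would divide α by 5 rather than 15.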

What is not clear, however, is which SNPs fall into the informative set, so all SNPs are tested. The assumption is then made that the test statistics are distributed similarly to the test statistics from an analysis including only the informative SNPs. Based on the simulations done by Gao et al., this seems to be a reasonable assumption

Another relevant question is how to adjust the p-values directly, rather than relying on a significance threshold ^{-8 }and 4.1 × 10^{-10}, becomes much more tractable after a genome-wide correction, resulting in corrected p-values of 0.0291 and 0.0004, respectively.

There have been a number of studies attempting to provide an accurate picture of how SNPs, and/or statistical tests of SNPs, are correlated in genome-wide studies. These fall into three general categories: variations and alternatives to permutation testing

We have recently genotyped 1514 European Americans for 700,078 SNPs using the Affymetrix 6.0 platform in a GWAS to search for AIDS restriction genes. Here we compare traditional Bonferroni significance thresholds with methods from each of these statistical correction strategies to identify an appropriate measure of significance in our GWAS: 1) PRESTO, an optimized permutation algorithm

Our aim is to identify the most appropriate method for obtaining accurate GWAS-wide significance thresholds and/or corrected p-values among 700,000 linked SNPs, the best method being one that results in an accurate estimate of the number of comparisons and has reasonable computational requirements.

Methods

GWAS Data

After filtering for a 90% sample call rate, 1,514 European Americans were successfully genotyped on the Affymetrix 6.0 platform. These subjects consisted of 1,255 HIV-infected and 259 HIV-negative individuals at risk of HIV infection; clinical categories were distributed randomly across plates and batch effects were monitored. We chose 700,078 SNPs after removing SNPs that failed the >95% call rate requirement, deviated from Hardy-Weinberg equilibrium, showed Mendel errors, or had a minor allele frequency below 1%. After re-clustering and filtering bad SNPs, all sample call rates were >95% with an average call rate of 98.9%. Individuals were unrelated, with the exception of 8 CEPH trios used to check for Mendel errors in the genetic data. A principal components analysis of the genetic data using Eigensoft was used to identify population structure. No significant outliers were identified; however, since there is some stratification in European American populations, SNPs that contributed significantly to population structure were tagged in subsequent analyses

To address the concern that an excess number of cases to controls would lead to less generalizable results, we analyzed a random sample of 259 cases with all 259 controls. Other than the changes in case/control ratio and sample size, all other variables were left unchanged.

Variations and Alternatives to Permutation Testing

^{th }percentile of the uncorrected distribution. This distribution was used as the standard by which each method's accuracy is gauged, and corresponding significance levels for all other methods were estimated using this distribution. Results from PRESTO were compared with the results from PERMORY, another optimized permutation testing software package that was recently released
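For orientation, a naive permutation procedure for a genome-wide threshold can be sketched as below. This is not PRESTO's or PERMORY's optimized algorithm: the function names, the trend-style statistic (N × r², referred to a 1 df chi-square distribution), and the toy data are all assumptions made for illustration. Phenotype labels are shuffled, the largest statistic across SNPs is recorded for each permutation, and the 95th percentile of those maxima is converted to a p-value (equivalently, the 5th percentile of the minimum p-values).

```python
import math

import numpy as np


def trend_statistics(genotypes: np.ndarray, phenotype: np.ndarray) -> np.ndarray:
    """Trend-style statistic N * r^2 for every SNP (about chi-square, 1 df)."""
    g = genotypes - genotypes.mean(axis=0)
    y = phenotype - phenotype.mean()
    r = (y @ g) / np.sqrt((g ** 2).sum(axis=0) * (y ** 2).sum())
    return len(phenotype) * r ** 2


def permutation_threshold(genotypes, phenotype, n_perm=500, alpha=0.05, seed=0):
    """Per-SNP p-value threshold giving a study-wide level of about alpha."""
    rng = np.random.default_rng(seed)
    max_stats = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = rng.permutation(phenotype)  # breaks genotype-phenotype links
        max_stats[i] = trend_statistics(genotypes, shuffled).max()
    critical = np.quantile(max_stats, 1.0 - alpha)
    # Upper-tail probability of a 1 df chi-square at the critical value.
    return math.erfc(math.sqrt(critical / 2.0))


# Toy data: 300 individuals, 50 independent SNPs, 100 cases and 200 controls.
rng = np.random.default_rng(1)
toy_genotypes = rng.binomial(2, 0.3, size=(300, 50))
toy_phenotype = np.array([1] * 100 + [0] * 200)

perm_threshold = permutation_threshold(toy_genotypes, toy_phenotype)
print(f"Permutation threshold: {perm_threshold:.2e}")
print(f"Bonferroni threshold:  {0.05 / toy_genotypes.shape[1]:.2e}")
```

Because SNP correlation is preserved within each permutation, the resulting threshold accounts for LD without an explicit estimate of the number of independent tests; the cost is one full scan of the data per permutation.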

Principal Components Analysis

Analysis of Underlying LD

LD blocks were inferred in our GWAS data using the three methods available in Haploview

The Gabriel protocol, the default method for Haploview, was used with an upper D' confidence interval bound of 0.98, a lower D' confidence interval bound of 0.70, and with 5% of informative markers required to be in strong LD

Results and Discussion

Variations and Alternatives to Permutation Testing


Summary of Analysis Results

| **Method** | **Significance Threshold** | **Corresponding α level** |
|---|---|---|
| Bonferroni | 0.71 × 10^{-7} | 0.046 |
| PRESTO | 0.76 × 10^{-7} | 0.05 |
| simpleℳ | 0.82 × 10^{-7} | 0.053 |
| SLIDE | 1.09 × 10^{-7} | 0.068 |
| Gabriel | 2.72 × 10^{-7} | 0.151 |
| 4-Gamete | 3.06 × 10^{-7} | 0.166 |
| Solid spine | 3.71 × 10^{-7} | 0.195 |

The significance threshold for each method is shown (α_{GWAS} = 0.05), as well as the corresponding genome-wide α level when compared with the PRESTO method. A strict Bonferroni significance threshold is also given.

Difference in Significance Threshold in a Subset of the Data

| **Method** | **Δ Significance Threshold** |
|---|---|
| simpleℳ | -8 × 10^{-11} |
| 4-Gamete | -8 × 10^{-10} |
| SLIDE | -5 × 10^{-9} |
| Gabriel | -6 × 10^{-8} |
| PRESTO | 7 × 10^{-7} |
| Solid spine | -8 × 10^{-7} |

The difference in significance threshold is given, comparing an analysis of the full data set to a subset of the data with an equal number of cases and controls (1,514 and 518 individuals, respectively).

^{-8}, which corresponded to a genome-wide significance level of α ≈ 0.05 when compared with PRESTO (see Table

^{-7}, which corresponded to a genome-wide significance level of α = 0.07 when compared with PRESTO (see Table ^{-9 }(see Table

Principal Components Analysis

^{-7}, corresponding to a genome-wide significance level of α ≈ 0.05 when compared with the PRESTO results. As with SLIDE, the analysis of the smaller sample was remarkably similar, differing only by 8 × 10^{-10}. These results indicate that simpleℳ is also an excellent alternative to a full permutation test. However, because of the concern of how variations in region size would affect the accuracy of the simpleℳ analysis, regions with as many SNPs as we had computational resources to analyze (some regions included nearly 30,000 SNPs, others consisted of entire chromosomes) were compared to the results in Table ^{-9}. It is important to note, however, that since this is an O(n^{2}) problem, the memory and serial time required to analyze these larger regions increase rapidly with the size of the regions analyzed. Regions containing more than a few thousand SNPs, however, seem to result in very similar significance thresholds in this data set, and the computational resources required are reasonable for regions of a few thousand SNPs (see Figure


**Change in computation time and significance threshold for varying region sizes**. The change in serial computation time (solid black line) and significance threshold (dotted blue line) are plotted as a function of the mean number of SNPs in each region in a GWAS-wide analysis using the simpleℳ method.

The simpleℳ method is currently the fastest way to calculate the effective number of independent tests in a GWAS. However, due to the O(n^{2}) nature of this algorithm, the genome needs to be broken up into small regions to maintain this computational speed. This adds complexity to the analysis and requires a significant amount of pre-analysis. Considering the many examples of long range LD across the genome, simpleℳ could also lead to a slightly more conservative estimate in some studies
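The quadratic growth in pairwise SNP correlations is easy to quantify. The sketch below estimates the memory needed to hold a dense SNP-by-SNP correlation matrix for regions of different sizes; it assumes double-precision (8-byte) storage of the full matrix, which may not match how any particular implementation stores its data, and the function name is hypothetical.

```python
def correlation_matrix_gib(n_snps: int, bytes_per_value: int = 8) -> float:
    """Memory (GiB) for a dense n_snps x n_snps matrix of floating-point values."""
    return n_snps ** 2 * bytes_per_value / 2 ** 30


# Region sizes from a few thousand SNPs up to the full 700,078-SNP data set.
for n in (1_000, 5_000, 30_000, 700_078):
    print(f"{n:>8} SNPs: {correlation_matrix_gib(n):12.2f} GiB")
```

Doubling the region size quadruples the memory requirement, which is why regions of a few thousand SNPs remain practical while a single genome-wide matrix does not.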

Analysis of Underlying LD

The three LD-based methods using Haploview are the least conservative, with significance thresholds between 2.72 × 10^{-7} and 3.71 × 10^{-7}, corresponding to α levels between 0.15 and 0.20 as compared to permutation testing using PRESTO (see Table

Comparison of α levels when restricting the definition of a haplotype

| **Method** | **Parameters** | **Significance Threshold** | **Corresponding α level** |
|---|---|---|---|
| Gabriel | D'_{U} > 0.98, D'_{L} > 0.70 | 2.72 × 10^{-7} | 0.151 |
| | D'_{U} > 0.98, D'_{L} > 0.85 | 2.11 × 10^{-7} | 0.12 |
| 4-Gamete | Cutoff = 1% | 3.06 × 10^{-7} | 0.166 |
| | Cutoff = 0.5% | 2.50 × 10^{-7} | 0.139 |
| Solid spine | D' = 0.80 | 3.71 × 10^{-7} | 0.195 |
| | D' = 0.95 | 2.79 × 10^{-7} | 0.155 |

Significance thresholds with corresponding α levels are given for each haplotype calling method with the standard parameters and a set of more restricted parameters. D'_{U} and D'_{L} represent the upper and lower confidence limits of D', respectively.

Nicodemus et al.

Conclusions

A one-size-fits-all Bonferroni correction, although conservative, may not result in a large Type II error rate with a sample size in the tens of thousands, but as the sample size drops, so does statistical power. In studies where gathering large numbers of cases is prohibitive (e.g. when disease prevalence is low), a Bonferroni correction becomes overly conservative by detrimentally inflating the Type II error rate. The methods considered here can ameliorate this loss of power and make interpretation of study results less enigmatic.

The results from the PRESTO, SLIDE and simpleℳ methods appear to be equally good in population data genotyped on the Affymetrix 6.0 platform in European Americans (α = 0.05 thresholds between 1 × 10^{-7} and 8 × 10^{-8}), and each presents a modest gain in power over the strict Bonferroni thresholds advocated by some

Authors' contributions

RCJ conceived and carried out the analysis. GWN, CAW, and SJO contributed to the study design. JLT, JAL, BDK, RCJ, CAW, GWN, and SJO contributed to the GWAS data. RCJ wrote the manuscript with contributions from GWN, CAW, JAL, and SJO. All authors read and approved the final manuscript.

Acknowledgements

This study utilized the high-performance computational capabilities of the Biowulf Linux cluster at the National Institutes of Health, Bethesda, MD

This project has been funded in whole or in part with federal funds from the National Cancer Institute, National Institutes of Health, under contract HHSN261200800001E. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the US Government. This research was supported [in part] by the Intramural Research Program of NIH, National Cancer Institute, Center for Cancer Research.