Program in Cancer Biology and Genetics, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, New York, NY, USA

Abstract

Background

Genome-wide association studies are a promising new tool for deciphering the genetics of complex diseases. To choose the proper sample size and genotyping platform for such studies, power calculations that take into account genetic model, tag SNP selection, and the population of interest are required.

Results

The power of genome-wide association studies can be computed using a set of tag SNPs and a large number of genotyped SNPs in a representative population, such as that available through the HapMap project. As expected, power increases with increasing sample size and effect size. Power also depends on the tag SNPs selected. In some cases, more power is obtained by genotyping more individuals at fewer SNPs than by genotyping fewer individuals at more SNPs.

Conclusion

Genome-wide association studies should be designed thoughtfully, with the choice of genotyping platform and sample size being determined from careful power calculations.

Background

One goal of modern human genetics is to identify the genetic variants that predispose individuals to develop common, complex diseases. It has been proposed that population-based association studies will be more powerful than traditional family-based linkage methods in identifying such high-frequency, low-penetrance alleles. A large number of SNPs (10^{5}–10^{6}) would need to be genotyped to capture the common variation in the genome.

One key question in designing such studies is the choice of tag SNPs. Numerous methods for choosing the best set of tagging SNPs have been developed and compared. Many of these methods are based on the linkage disequilibrium (LD), measured as r^{2}, between the tag SNPs and all other SNPs. The quantity r^{2} is the squared correlation between two SNPs. It is a useful measure because, if N individuals are needed for a specific power with a direct test of association, N/r^{2} individuals would be needed for an indirect test of association. A set of tag SNPs is commonly evaluated by its coverage, the fraction of SNPs that are in LD (r^{2} above some threshold) with at least one tag.
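The N/r^{2} relationship can be illustrated with a few lines of Python. This is only a sketch: the function name `indirect_sample_size` and the example numbers are hypothetical, and rounding up to whole individuals is an added convention.

```python
import math


def indirect_sample_size(n_direct: int, r2: float) -> int:
    """Sample size for an indirect test, given the size needed for a direct test.

    Applies the N / r^2 relationship and rounds up to whole individuals.
    """
    if not 0.0 < r2 <= 1.0:
        raise ValueError("r2 must lie in (0, 1]")
    return math.ceil(n_direct / r2)


# Hypothetical case: 1000 individuals suffice when the causal SNP is typed directly.
for r2 in (1.0, 0.75, 0.5):
    print(r2, indirect_sample_size(1000, r2))
```

Under these assumed numbers, a tag with r^{2} = 0.5 doubles the required sample size, while a tag with r^{2} = 0.75 raises it by about a third.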

There are two related problems with this measure of coverage. First, the binary decision of whether r^{2} is above or below a threshold does not capture the continuous decrease in power as r^{2} decreases. If the cutoff value of r^{2} is 0.8, a SNP that shows LD of r^{2} = 0.75 with a tag would be called undetectable since the measure of LD is below the threshold. In truth, association would be detectable, albeit with reduced power. Second, knowledge of the coverage of a set of tag SNPs says nothing about the number of individuals needed for a well-powered study. A better measure to evaluate tag SNPs would be an explicit calculation of the probability that a genome-wide association study will find a statistically significant association given that such an association exists (that is, the power). Jorgenson and Witte introduced a "cumulative r^{2} adjusted power" that integrates LD and tag SNP information to provide the overall power of a study.
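The first problem can be made concrete with a small sketch. The list `best_r2` holds hypothetical values of the highest r^{2} between each SNP and any tag, and both function names are illustrative. Interpreting the mean r^{2} as the average fraction of sample size retained follows from the N/r^{2} relationship and is an added interpretation.

```python
def coverage(best_r2, threshold=0.8):
    """Fraction of SNPs whose best tag reaches the r^2 threshold (binary measure)."""
    return sum(r2 >= threshold for r2 in best_r2) / len(best_r2)


def mean_retained_fraction(best_r2):
    """Mean r^2, i.e. the average fraction of the sample size retained by indirect testing."""
    return sum(best_r2) / len(best_r2)


# Hypothetical best-tag r^2 values for four SNPs.
best_r2 = [1.0, 0.75, 0.75, 0.5]
print(coverage(best_r2))                # only the first SNP counts as covered
print(mean_retained_fraction(best_r2))  # yet most of the information is retained
```

With a threshold of 0.8, the two SNPs at r^{2} = 0.75 count as uncovered even though an indirect test through their tags would retain three quarters of the effective sample size.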

Realistically, one does not have an unlimited choice of SNPs but rather chooses among several competing commercial products with fixed sets of tag SNPs. Therefore, instead of choosing a set of tag SNPs, a more common problem now is how to evaluate which of several fixed sets of tag SNPs is better for a particular study. Several papers have looked at power for hypothetical and commercial sets of tag SNPs through empirical simulations on a subset of chromosomal regions.

Here, I present a method for computing the power of a genome-wide association study when a genetic model and sample size are specified and LD information is available for the population being studied. This method is equivalent to the cumulative r^{2} adjusted power of Jorgenson and Witte.

Results and discussion

The power calculations require genotype data on a large representative sample of common SNPs from the population as well as a list of which of these representative SNPs are the tag SNPs (SNPs to be genotyped). Power is computed in three steps. First, the best tag SNP for each of the representative SNPs is found. Then, the power for detecting association for each of the representative SNPs, assuming that SNP directly influences the phenotype, is computed. For this computation, it is assumed that the study will be performed by testing for genotype frequency differences between cases and controls using a two-degree-of-freedom χ^{2} test in which multiple tests are corrected for using the Bonferroni correction. This test explicitly assumes a codominant model. I use this test because it is the most general, at the cost of reduced power relative to a model-specific test. While a multimarker tagging approach could be taken, only single-SNP tests are considered here. Finally, the power is averaged over all the representative SNPs.

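Taking the average power over all the SNPs is justified using probability theory. Assume there are $n$ SNPs, $S_{i}$. Let $C_{i}$ represent SNP $S_{i}$ being causative and $D_{i}$ represent SNP $S_{i}$ being detected. The power for SNP $S_{i}$ is given as $P_{i} = \Pr(D_{i} \mid C_{i})$. Thus, if each $P_{i}$ is multiplied by $\Pr(C_{i})$ and the products are summed over all SNPs, we get

$$\text{Power} = \sum_{i=1}^{n} \Pr(D_{i} \mid C_{i}) \Pr(C_{i}) = \sum_{i=1}^{n} P_{i} \Pr(C_{i}).$$

The added assumption that each SNP is equally likely to be causative, $\Pr(C_{i}) = 1/n$, yields

$$\text{Power} = \frac{1}{n} \sum_{i=1}^{n} P_{i}.$$

This final equation is the same as taking the average power over all the SNPs.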
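The averaging argument can be sketched in code as follows. The function name `overall_power` and the example values are hypothetical. The per-SNP powers are weighted by the probability that each SNP is the causative one, and a uniform prior reduces the result to a plain average.

```python
def overall_power(per_snp_power, causal_prior=None):
    """Overall power as sum_i P_i * Pr(C_i); a uniform prior gives the simple mean."""
    n = len(per_snp_power)
    if causal_prior is None:
        causal_prior = [1.0 / n] * n  # each SNP equally likely to be causative
    if len(causal_prior) != n or abs(sum(causal_prior) - 1.0) > 1e-9:
        raise ValueError("causal_prior must have one entry per SNP and sum to 1")
    return sum(p * c for p, c in zip(per_snp_power, causal_prior))


# Hypothetical per-SNP powers.
powers = [0.5, 0.25, 0.75, 0.5]
print(overall_power(powers))                        # uniform prior: the mean
print(overall_power(powers, [0.5, 0.5, 0.0, 0.0]))  # non-uniform prior
```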

This method was applied to examine the power of genome-wide association studies in the four populations studied in the International HapMap Project.

The number of SNPs present in each population and present in each commercial genotyping system

| Population | CEU | JPT+CHB | YRI |
|---|---|---|---|
| SNPs in HapMap | 3868157 | 3890416 | 3796934 |
| SNPs w/MAF >= 0.05 (%) | 2230515 (58%) | 2046163 (53%) | 2477182 (65%) |
| Common SNPs on Affy 100 K chip (%) | 91400 (79%) | 82995 (72%) | 91363 (79%) |
| Common SNPs on Affy 500 K chip (%) | 378415 (77%) | 346887 (70%) | 409849 (83%) |
| Common SNPs on Illumina 300 K chip (%) | 313265 (99%) | 251560 (79%) | 252678 (80%) |
| Common SNPs on Illumina 550 K chip (%) | 506543 (91%) | 425631 (77%) | 441884 (80%) |

The percentages given are the fraction of SNPs from the overall SNP set, and from each of the genotyping platforms, that are present with a MAF of at least 0.05 in each population.
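As a quick arithmetic check, the percentages in the MAF row can be recomputed from the counts in the table. The dictionary and function names below are illustrative only. The chip rows cannot be checked the same way because the total number of SNPs on each chip is not listed.

```python
# Counts copied from the table above.
snps_in_hapmap = {"CEU": 3868157, "JPT+CHB": 3890416, "YRI": 3796934}
common_snps = {"CEU": 2230515, "JPT+CHB": 2046163, "YRI": 2477182}


def percent_common(population: str) -> int:
    """Percentage of HapMap SNPs with MAF >= 0.05, rounded to a whole percent."""
    return round(100 * common_snps[population] / snps_in_hapmap[population])


for population in snps_in_hapmap:
    print(population, percent_common(population))
```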

I next asked how power changes with increasing sample size for the various genotyping platforms (Figure


Power for the test of genotypic association as a function of sample size at different genotype relative risks (GRR). All panels are for the CEU HapMap population when the number of cases equals the number of controls and a multiplicative model is used. **(A)** Power for the Affymetrix 100 K system. **(B)** Power for the Illumina 300 K system. **(C)** Power for the Affymetrix 500 K system. **(D)** Power for the Illumina 550 K system.

Power of genome-wide association studies with various parameters. Each line of the file contains the power of a genome-wide association study conducted with the specified HapMap population, genetic model, and sample size (N) based on the SNPs present in a variety of commercially available genotyping products.


One critique of this approach is that the non-specific test used may not be the most powerful approach if we know the genetic model the disease follows. For instance, to study a trait that we believe follows a multiplicative model, a 2 × 2 contingency table to test for allelic association may be more appropriate. Power calculations for this test (Figure


Power for genotypic and allelic tests. Data are shown for a GRR of 1.5 under a multiplicative model, the CEU HapMap population, and the specified genotyping system.

Another possible criticism of this method is that the SNPs genotyped as part of the International HapMap Project may not be a representative subset of the common SNPs in the genome as a whole. To investigate this possibility, I compared the coverage of the various SNPs in the ENCODE and non-ENCODE regions from the HapMap project (Figure ^{2 }with the tag SNPs and therefore could slightly inflate the power estimation. As the fraction of SNPs with an r^{2 }greater than the cutoff differs between the ENCODE and non-ENCODE regions by at most ten percentage points, and an average of three percentage points, this overestimation is not likely to be extreme.


Coverage of tag SNPs. Fraction of non-tag SNPs in LD with a tag SNP with r^{2} above the specified threshold for the ENCODE and non-ENCODE regions of the HapMap project for the CEU and YRI populations. Results are shown for the Illumina 550 K **(A)** and Affymetrix 500 K **(B)** chips. The JPT+CHB population was not included because the curves generally overlap with the CEU curves and would make the graph harder to read. Results for the JPT+CHB population and for the other chips are qualitatively similar to the curves shown here.

An easy and useful way to compare the power of different tag SNP sets in different populations is the sample size needed to achieve 80% power. The Illumina 550 K clearly performs best in all three populations (Figure


Total individuals required for 80% power. The computations assume the number of cases equals the number of controls and a GRR of 1.75. CEU, JPT+CHB, and YRI are the HapMap populations. Affy 250 K Nsp and Affy 250 K Sty represent the two chips that make up the Affymetrix 500 K genotyping system.
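Finding the sample size needed for 80% power amounts to inverting a power curve. A minimal sketch follows, assuming only that power does not decrease with sample size. The name `min_sample_size` and the toy power curve used to exercise it are hypothetical stand-ins, not the power function used for the figure.

```python
import math


def min_sample_size(power_fn, target=0.8, upper=1_000_000):
    """Smallest n in [1, upper] with power_fn(n) >= target, found by bisection.

    Assumes power_fn is non-decreasing in n.
    """
    if power_fn(upper) < target:
        raise ValueError("target power is not reached within the search range")
    lo, hi = 1, upper
    while lo < hi:
        mid = (lo + hi) // 2
        if power_fn(mid) >= target:
            hi = mid       # mid is large enough; try smaller
        else:
            lo = mid + 1   # mid is too small
    return lo


def toy_power(n):
    """Hypothetical monotone power curve, used only to exercise the search."""
    return 1.0 - math.exp(-n / 1000.0)


print(min_sample_size(toy_power))
```

Bisection keeps the number of power evaluations logarithmic in the search range, which matters when each evaluation averages over millions of SNPs.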


Power as a function of number of chips needed for the Affymetrix 500 K system and its two components. Calculations are done for a GRR of **(A)** 1.5 and **(B)** 2.0.

I have presented a method to compute the power of a genome-wide association study in which a fixed set of tag SNPs will be genotyped. For the sake of simplicity, I only considered one straightforward single-SNP analysis scheme. While this approach has been used successfully

Conclusion

Proper design of a genome-wide association study requires careful calculation of the power. These calculations will be invaluable to anyone who is planning a genome-wide association study. Using these calculations, the proper sample size to get adequate power in a given study can be computed. Furthermore, the performance of different genotyping platforms can be compared, allowing an investigator to choose whatever is best for his or her study. By performing such calculations, genome-wide association studies can be optimized to get the maximal power possible for a given set of resources.

Methods

Genotype data and populations

I used genotype data from release 21 (phase II) of the International HapMap Project.

Calculation of power

To compute the overall power of an association study, I use three steps. First, I find the best tag SNP for each genotyped SNP in the data set. Then, I compute the power for each SNP assuming the specified GRR and sample size. Finally, I take an average power over all the SNPs to get the overall power.

To find the best tag SNP for each genotyped SNP, I look at the linkage disequilibrium between each SNP and all tag SNPs within 300 kb of it. For each pair of SNPs, I infer the two-locus haplotype frequencies between them using expectation maximization and compute r^{2} between the two SNPs from the inferred haplotype frequencies. The best tag SNP is the tag SNP with the highest r^{2}.
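The expectation-maximization step can be sketched as follows for a single pair of biallelic SNPs. This is one textbook formulation, not necessarily the one used in the C program. The input layout (a 3 × 3 table of genotype counts indexed by the number of copies of allele 1 at each SNP), the starting point at linkage equilibrium, and all function names are assumptions made for illustration. Only double heterozygotes have ambiguous phase, so the E-step splits them between the two possible haplotype pairs.

```python
def em_haplotype_freqs(g, max_iter=100, tol=1e-10):
    """Estimate haplotype frequencies [f00, f01, f10, f11] from unphased genotypes.

    g[a][b] is the number of individuals carrying a copies of allele 1 at SNP A
    and b copies of allele 1 at SNP B. Haplotype "xy" has allele x at A, y at B.
    """
    n_hap = 2 * sum(sum(row) for row in g)
    # Haplotype counts from genotypes whose phase is unambiguous.
    base = [
        2 * g[0][0] + g[0][1] + g[1][0],  # 00
        2 * g[0][2] + g[0][1] + g[1][2],  # 01
        2 * g[2][0] + g[1][0] + g[2][1],  # 10
        2 * g[2][2] + g[1][2] + g[2][1],  # 11
    ]
    dh = g[1][1]  # double heterozygotes: either 00/11 or 01/10
    p_a = (base[2] + base[3] + dh) / n_hap
    p_b = (base[1] + base[3] + dh) / n_hap
    # Start from linkage equilibrium.
    f = [(1 - p_a) * (1 - p_b), (1 - p_a) * p_b, p_a * (1 - p_b), p_a * p_b]
    for _ in range(max_iter):
        cis, trans = f[0] * f[3], f[1] * f[2]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5  # E-step: Pr(00/11 phase)
        new = [  # M-step: re-estimate frequencies from expected counts
            (base[0] + dh * w) / n_hap,
            (base[1] + dh * (1 - w)) / n_hap,
            (base[2] + dh * (1 - w)) / n_hap,
            (base[3] + dh * w) / n_hap,
        ]
        converged = max(abs(x - y) for x, y in zip(new, f)) < tol
        f = new
        if converged:
            break
    return f


def r_squared(f):
    """r^2 = D^2 / (pA (1 - pA) pB (1 - pB)) from haplotype frequencies."""
    f00, f01, f10, f11 = f
    p_a, p_b = f10 + f11, f01 + f11
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom <= 0:
        return 0.0  # a monomorphic SNP carries no LD information
    d = f11 - p_a * p_b
    return d * d / denom


# Hypothetical genotype tables: perfectly correlated SNPs and independent SNPs.
print(r_squared(em_haplotype_freqs([[25, 0, 0], [0, 50, 0], [0, 0, 25]])))
print(r_squared(em_haplotype_freqs([[1, 2, 1], [2, 4, 2], [1, 2, 1]])))
```

Choosing the best tag for a SNP would then be a matter of taking the maximum of `r_squared` over the tag SNPs within the 300 kb window.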

To compute the power for a SNP, I assume that we are looking at genotype frequency differences using a two-degree-of-freedom χ^{2} test. The power of this test is computed using a non-central χ^{2} distribution with non-centrality parameter $\lambda$. For the 3 × 2 χ^{2} test,

$$\lambda = N_{A} N_{U} \sum_{j=0}^{2} \frac{(p_{0j} - p_{1j})^{2}}{N_{A} p_{0j} + N_{U} p_{1j}},$$

where $N_{A}$ and $N_{U}$ are the number of case (affected) and control (unaffected) individuals, respectively; $p_{00}$, $p_{01}$, and $p_{02}$ are the genotype frequencies in the cases; and $p_{10}$, $p_{11}$, and $p_{12}$ are the genotype frequencies in the controls. If, instead of a 3 × 2 table, we use a 2 × 2 table for a one-degree-of-freedom test of allelic association, the non-centrality parameter is given by

$$\lambda = 2 N_{A} N_{U} (p_{A} - p_{U})^{2} \left[ \frac{1}{N_{A} p_{A} + N_{U} p_{U}} + \frac{1}{N_{A} (1 - p_{A}) + N_{U} (1 - p_{U})} \right],$$

where $p_{A}$ and $p_{U}$ are the frequencies of allele 0 in the cases and controls, respectively.
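The two non-centrality parameters can be computed with one helper, assuming the standard expression for a two-sample χ^{2} test of homogeneity: the product of the group sizes times the sum over categories of the squared frequency difference divided by the pooled expected count. The function names are illustrative. The allelic version reuses the same expression with 2N alleles per group in place of N individuals.

```python
def ncp_genotypic(n_cases, n_controls, case_freqs, control_freqs):
    """Non-centrality parameter for a (categories x 2) chi-square test.

    case_freqs and control_freqs are category frequencies (e.g. three genotypes).
    """
    total = 0.0
    for p_case, p_control in zip(case_freqs, control_freqs):
        pooled = n_cases * p_case + n_controls * p_control
        if pooled > 0:  # skip categories that are empty in both groups
            total += (p_case - p_control) ** 2 / pooled
    return n_cases * n_controls * total


def ncp_allelic(n_cases, n_controls, p_case, p_control):
    """Non-centrality parameter for the 2 x 2 allelic test (2N alleles per group)."""
    return ncp_genotypic(
        2 * n_cases, 2 * n_controls,
        [p_case, 1 - p_case], [p_control, 1 - p_control],
    )


# Hypothetical study: 1000 cases, 1000 controls, allele frequency 0.6 vs 0.5.
print(ncp_allelic(1000, 1000, 0.6, 0.5))
```

In this hypothetical example the value equals the Pearson χ^{2} statistic computed from the expected allele counts (1200/800 in cases, 1000/1000 in controls), which is a convenient sanity check.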

I use the Bonferroni correction for multiple testing and require a ^{2 }for the power computation
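Given a non-centrality parameter, the power at a Bonferroni-corrected significance level can be sketched with the standard library alone. Several choices here are assumptions for illustration: a family-wise level of 0.05, one test per tag SNP, and all function names. For two degrees of freedom the central χ^{2} survival function is exp(-x/2), and the non-central distribution is a Poisson mixture of central χ^{2} distributions with even degrees of freedom, so both have closed forms. For one degree of freedom the statistic is the square of a shifted standard normal.

```python
import math
from statistics import NormalDist


def bonferroni_alpha(family_alpha: float, n_tests: int) -> float:
    """Per-test significance level under the Bonferroni correction."""
    return family_alpha / n_tests


def power_2df(ncp: float, alpha: float) -> float:
    """Power of a 2-df chi-square test with non-centrality parameter ncp."""
    y = -math.log(alpha)  # half the critical value, since exp(-crit / 2) = alpha
    h = ncp / 2.0         # Poisson mean of the mixture
    if h == 0.0:
        return alpha
    total, q_cum = 0.0, 0.0
    j_max = int(h + 10.0 * math.sqrt(h) + 50.0)  # Poisson mass beyond this is negligible
    for j in range(j_max + 1):
        # q_cum becomes the survival function of a central chi-square with 2 + 2j df.
        q_cum += math.exp(-y + j * math.log(y) - math.lgamma(j + 1))
        pois = math.exp(-h + j * math.log(h) - math.lgamma(j + 1))
        total += pois * q_cum
    return min(total, 1.0)


def power_1df(ncp: float, alpha: float) -> float:
    """Power of a 1-df chi-square test with non-centrality parameter ncp."""
    nd = NormalDist()
    z = -nd.inv_cdf(alpha / 2.0)  # two-sided critical value on the normal scale
    d = math.sqrt(ncp)
    return nd.cdf(d - z) + nd.cdf(-d - z)


# Hypothetical study: 500,000 tests and a non-centrality parameter of 40.
alpha = bonferroni_alpha(0.05, 500_000)
print(power_2df(40.0, alpha), power_1df(40.0, alpha))
```

For the same non-centrality parameter the one-degree-of-freedom test is more powerful, which is the trade-off between the general genotypic test and a model-specific allelic test.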

I assume that the disease has a low enough prevalence in the population that the risk allele frequency in those without the disease approximates the risk allele frequency in the population. I can set the disease to follow a multiplicative, additive, dominant, or recessive model with a specified genotype relative risk (GRR) for the SNP of interest. Taking the control genotype frequencies $p_{10}$, $p_{11}$, and $p_{12}$ from the observed genotype frequencies in the population, the case genotype frequencies $p_{00}$, $p_{01}$, and $p_{02}$ are computed as follows:

Multiplicative

Additive

Dominant

Recessive
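One common way to obtain case genotype frequencies under these four models is to weight each control genotype frequency by its relative risk and renormalize. The sketch below does this under loudly stated assumptions: genotypes are indexed by the number of risk alleles (0, 1, 2), the relative-risk vectors are (1, γ, γ²), (1, γ, 2γ − 1), (1, γ, γ), and (1, 1, γ) for a GRR of γ, and the function names are hypothetical. Other conventions exist, particularly for the additive model, so these definitions may differ from the ones used for the results here.

```python
def relative_risks(model: str, grr: float):
    """Relative risks for 0, 1, and 2 risk alleles (assumed conventions)."""
    if model == "multiplicative":
        return (1.0, grr, grr * grr)
    if model == "additive":
        return (1.0, grr, 2.0 * grr - 1.0)  # assumption; some authors use 2 * grr
    if model == "dominant":
        return (1.0, grr, grr)
    if model == "recessive":
        return (1.0, 1.0, grr)
    raise ValueError(f"unknown model: {model}")


def case_genotype_freqs(control_freqs, model: str, grr: float):
    """Case genotype frequencies: control frequencies weighted by relative risk."""
    weights = [rr * p for rr, p in zip(relative_risks(model, grr), control_freqs)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical control genotype frequencies (risk allele frequency 0.5 under HWE).
controls = [0.25, 0.5, 0.25]
for model in ("multiplicative", "additive", "dominant", "recessive"):
    print(model, case_genotype_freqs(controls, model, 2.0))
```

With these conventions every model reduces to identical case and control frequencies at a GRR of 1, which is why the additive vector was written as 2γ − 1 in this sketch.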

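After the power is computed for each SNP, I take the overall power to be the average power over all the SNPs. In taking the average power over all SNPs, I give less weight to the tag SNPs since they are over-represented in the set of SNPs being analyzed. Assume that of the $M$ common SNPs in the genome, $n$ have genotype data, and $t$ of these are tag SNPs. Let $P_{i}$ be the power for SNP $i$. The overall power is then

$$\text{Power} = \frac{1}{M} \left[ \sum_{i \in \text{tag}} P_{i} + \frac{M - t}{n - t} \sum_{i \notin \text{tag}} P_{i} \right].$$

In this manner, the tag SNPs are only considered representative of themselves, while the non-tag SNPs for which we have LD data are considered representative of all common non-tag SNPs. For these calculations, I use $M = 10^{7}$.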
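The weighting described above can be sketched as follows. The exact form is an assumption reconstructed from the description: each tag SNP counts once, and the mean power of the genotyped non-tag SNPs is scaled up to stand for every common non-tag SNP in the genome. The function name and the parameter `total_common_snps` are hypothetical.

```python
def weighted_overall_power(tag_powers, nontag_powers, total_common_snps):
    """Average power in which tags represent only themselves.

    tag_powers: per-SNP power for the tag SNPs.
    nontag_powers: per-SNP power for genotyped SNPs that are not tags.
    total_common_snps: assumed number of common SNPs in the genome.
    """
    n_tags = len(tag_powers)
    if total_common_snps < n_tags + len(nontag_powers):
        raise ValueError("total_common_snps is smaller than the number of SNPs supplied")
    mean_nontag = sum(nontag_powers) / len(nontag_powers)
    # Non-tag SNPs with data stand in for all common non-tag SNPs.
    return (sum(tag_powers) + (total_common_snps - n_tags) * mean_nontag) / total_common_snps


# Hypothetical toy genome: 10 common SNPs, 2 tags, 4 non-tags with data.
print(weighted_overall_power([1.0, 1.0], [0.5, 0.5, 0.5, 0.5], 10))
```

An unweighted mean over the same six SNPs would give about 0.67, which shows how over-represented tag SNPs would inflate the estimate.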

Implementation

A computer program to implement these calculations was written in C. The source code is available upon request from the author.

Authors' contributions

RJK conceived of the experiments, implemented them, analyzed the data, and wrote the manuscript.

Acknowledgements

I am grateful to Jurg Ott, in whose lab the bulk of this work was performed; Joe Garsetti from Illumina for help in obtaining the list of SNPs on the Illumina chips; and Sara Hamon for critical comments on the manuscript. This work was performed while RJK was a postdoctoral fellow funded by F32HG003681 from NIH.