Abstract
Background
In designing genomewide association (GWA) studies it is important to calculate statistical power. General statistical power calculation procedures for quantitative measures often require information concerning summary statistics of distributions such as mean and variance. However, with genetic studies, the effect size of quantitative traits is traditionally expressed as heritability, a quantity defined as the amount of phenotypic variation in the population that can be ascribed to the genetic variants among individuals. Heritability is hard to transform into summary statistics. Therefore, general power calculation procedures cannot be used directly in GWA studies. The development of appropriate statistical methods and a userfriendly software package to address this problem would be welcomed.
Results
This paper presents GWAPower, a statistical software package of power calculation designed for GWA studies with quantitative traits, where genetic effect is defined as heritability. Based on several popular onedegreeoffreedom genetic models, this method avoids the need to specify the noncentrality parameter of the Fdistribution under the alternative hypothesis. Therefore, it can use heritability information directly without approximation. In GWAPower, the power calculation can be easily adjusted for adding covariates and linkage disequilibrium information. An example is provided to illustrate GWAPower, followed by discussions.
Conclusions
GWAPower is a userfriendly free software package for calculating statistical power based on heritability in GWA studies with quantitative traits. The software is freely available at: http://dl.dropbox.com/u/10502931/GWAPower.zip webcite
Background
Statistical power and sample size calculation is an important step during experiment design in genomewide association (GWA) studies. It estimates the probability that a true genetic effect can be detected under the experimental constraints and other reasonable assumptions. To date, more than 230 GWA studies have been conducted and reported. Among these, case control design (>100 studies) and cohort studies with unrelated subjects (>70 studies) are the two most popular designs. In the majority of cohort studies, one or more phenotypic variables are quantitative traits. A number of statistical power calculation softwares have been developed for casecontrol GWA studies [15]. This report presents a simple and userfriendly statistical software, GWAPower, for power calculation of GWA studies with quantitative traits.
General statistical power calculation procedures for quantitative measures often assume normality and require information of the first two distribution moments (means and variances) of the quantitative measure. These power calculation procedures cannot be applied directly in genetic studies. One challenge is that a key parameter, i.e. the size of the hypothesized genetic effect, is represented in a genetic term, "heritability". In its broad sense, heritability is defined as the amount of phenotypic variation in the population that can be ascribed to the genetic variation among individuals. If the total variability of the phenotype is V(t) and the variations that can be explained by genetic variants and error terms are V(g) and V(e), respectively, where V(t) = V(g) + V(e), then the heritability (H) is estimated as: H = V(g)/V(t). Although H is intrinsically related to the distribution of the quantitative measure, the exact mathematical transformation formula is not clear; given the value of H, the first two moments of the quantitative measure cannot be determined directly.
However, a power calculation procedure based on heritability is possible for GWA studies, since most applied genetic models in GWA analysis are simple and have only one degree of freedom. Using straightforward algebra, a sound statistical procedure and convenient graphical user interface software have been developed to estimate the statistical power for quantitative traits in GWA studies.
Results and Discussion
Methods for calculating power with heritability in onedegreeoffreedom models
The heritability (H) can be estimated by the Analysis of Variance (ANOVA) approach. For independent random samples drawn from natural populations, H is the coefficient of determination, or R^{2}. In general, the ANOVA Ftest can be used as the base of power calculations. However, this approach requires the noncentrality parameter λ of the Fdistribution, which is defined as:
where τ_{i }is the phenotype mean of genotype group i; σ^{2 }is the error variation; and r_{i }is the number of individuals in each genotype group i. Therefore, distribution parameters such as means and variances are still necessary for this approach.
We provide a solution that avoids assuming the noncentrality parameter in power calculations. In GWA data analysis, some popular genetic models have one degree of freedom including the dominant model, the recessive model and the additive model. When these genetic models are assumed, the oneway ANOVA model (dominant and recessive models) and the simple linear regression model (additive models) are used in the majority of GWA data analyses for quantitative traits. In these studies, the Fstatistic is related to Student's t statistic, i.e. if a random variable X has a t distribution with d degrees of freedom, then X^{2 }has F distribution (1, d). This is true for central and noncentral distributions [6]. This suggests that so long as the genetic model contains only one degree of freedom, then an F distribution can be used to perform the square root transformation to obtain the t statistics for the power calculation.
For dominant and recessive models, the oneway ANOVA table and the terminologies are described in Table 1. The heritability is defined as H = SSM/SST. Given H and the total sample size N, SSE = (1H)SST, MSE = (1H)SST/(N2), and F = H(N2)/(1H), which is now independent of SSM and SSE. The F statistics can be converted into t statistics using the square root. The (relative) standard deviation of the ttest is std = square root (MSE). The (relative) numerator of the tstatistic is: Δm = std*t. For the additive model, assuming a linear regression model is used, the variance can be similarly decomposed. For all models, the Fstatistic is a function only of H and N. The power under the twosided ttest at level α is expressed as
Table 1. The ANOVA table for the dominate and the recessive model
It should be noted that (1) the absolute values of SST and SSM are unknown, as are the absolute values of numerator and denominator of the tstatistic. However, in this approach, only the relative values are necessary to calculate power; (2) the ttest is associated with the ANOVA table. In this study, an ANOVA (and the associated Ftest) is used owing to its ability to accommodate the concept of heritability and to adjust for covariate effects. Therefore, if a dominant or a recessive genetic model is considered, the ttest compares two sample means within an ANOVA framework, although an ANOVA ttest is different from a "two sample ttest".
The Software
GWAPower calculates statistical power using the following input parameters: (1) The heritability (a number in the range 0~1); (2) Total sample size; (3) The number of SNPs in the GWA study; (4) The type 1 error rate: The default value is calculated using a Bonferroni correction. A snapshot of the main page of the software is presented in Figure 1.
Figure 1. A snapshot of the GUI interface of GWAPower.
GWAPower considers other design factors so that users have the options to explore the power of the study under various situations. The factors include:
(1) Covariates: users can input the percentage of the variation explained by the covariates, in addition to the genetic effects and the number of total covariates. The information is included in the ANOVA table and the statistical power is adjusted accordingly, i.e. the degree of freedom of the ttest is N2number of covariates;
(2) Linkage Disequilibrium (LD): users can input the twolocus LD (r^{2}) between the SNP marker and the hypothesized causal variant. When the LD between the genotyped and "causative" marker is r^{2}, the sample size is increased by 1/r^{2}, because in order to have the same power to detect a causative variant when using a marker that is r^{2 }away, an increase in sample size of 1/r^{2 }is required [7,8].
The output information include: (1) the statistical power; (2) a family of power curves outputted with different heritability and sample size combinations, as demonstrated in Figure 2.
Figure 2. GWAPower outputs a family of power curves. The calculated power with parameters inputted from the main page is also identified in the graph.
Implementation
The GWAPower package was developed using Microsoft Visual C++ 2008 and consists a main dialog graphical user interface (GUI). Users can download and install the compiled versions of the GUI that run as standalone applications under Windows operating systems. A demonstration of the GUI for power calculations with the corresponding parameters setting is presented in Figure 1. By clicking the button "SNPs Power Curves", the GUI displays power curves of various sample size with H = 1% ~ 10%. The graphs can be copied into a Windows clipboard by using Menu  Copy to Clipboard. The graphs can be pasted into other Windows applications.
Example for HIV studies
The betaversion of GWAPower has been used to support the designs of several GWA studies [911]. A GWA study was conducted to search for host genetic variants related to HIV setpoint variation. In the first stage of the study, 486 HIV infected Caucasian subjects were genotyped, and the associations between the SNPs and one quantitative phenotype, i.e. the viral load at the setpoint, were tested [10]. In the second stage, 2068 subjects were genotyped from the same population, to increase the total sample size to 2554 [11]. Using GWAPower, the statistical powers of the two stages of the studies were calculated (Table 2). The results indicated that assuming H = 7%, a sample size of 486 was adequate to provide 80% statistical power to detect the associated SNP; with the sample size increased to 2554, the smallest H that could be detected was 1.4%. In real data analyses, the largest H was estimated at 9.6%.
Table 2. The statistical power estimated by GWAPower for the HIV studies
Conclusions
The GWAPower package provides a simple and useful statistical power calculation procedure for GWAS with quantitative traits. It is designed specifically to allow genetic researchers to use the genetic term, heritability, instead of the general statistical term, 'phenotype means of each genotype', in power calculations. Current users have reported that the tool is very userfriendly and easy to operate.
Although GWAPower is developed for GWA studies, it can be used for general genetic studies if researchers wish to use heritability as the parameter for genetic effect size. However, this approach should be used with caution. Some complicating factors are not considered such as population stratification, various interactions among genetic variants, and environmental factors. Furthermore, Bonferroni correction for multiple tests is known to be overconservative. Therefore, for complicated designs with multiple confounding factors, researchers are encouraged to consult a statistical geneticist.
One reviewer has reported that another software "genetic power calculator" (GPC) [1] can be used to estimate statistical power in GWA studies with quantitative traits, with some simple adjustments. A major difference between GPC and GWAPower is that GPC is based on a comprehensive mixed model approach [12], making it amenable for use in various experimental designs. GWAPower takes advantage of the fact that most GWA study designs are reasonably simple, and much simpler models such as ANOVA or regression models are often adequate. In such cases, the approximations of estimating distribution parameters are not necessary.
Authors' contributions
SF and LL derived the method. SF and SW developed the software. SF, SW, CCC and LL wrote the manuscript. All authors read and approved the final manuscript.
Acknowledgements
This study was supported by funding from the Measurement to Understand ReClassification of Disease of Cabarrus and Kannapolis (MURDOCK) Study and by a grant from the NIH CTSA (Clinical and Translational Science Award) 1UL1RR02412801 to the Duke University.
References

Purcell S, Cherny SS, Sham PC: Genetic Power Calculator: design of linkage and association genetic mapping studies of complex traits.
Bioinformatics 2003, 19(1):149150. PubMed Abstract  Publisher Full Text

Gordon D, Haynes C, Blumenfeld J, Finch SJ: PAWE3D: visualizing power for association with error in casecontrol genetic studies of complex traits.
Bioinformatics 2005, 21(20):39353937. PubMed Abstract  Publisher Full Text

Skol AD, Scott LJ, Abecasis GR, Boehnke M: Joint analysis is more efficient than replicationbased analysis for twostage genomewide association studies.
Nat Genet 2006, 38(2):209213. PubMed Abstract  Publisher Full Text

Menashe I, Rosenberg PS, Chen BE: PGA: power calculator for casecontrol genetic association analyses.

Edwards BJ, Haynes C, Levenstien MA, Finch SJ, Gordon D: Power and sample size calculations in the presence of phenotype errors for case/control genetic association studies.

Shao J: Mathematical statistics: exercises and solutions. 2nd edition. Springer; 2005.

Risch N, Teng J: The relative power of familybased and casecontrol designs for linkage disequilibrium studies of complex human diseases I. DNA pooling.
Genome Res 1998, 8(12):12731288. PubMed Abstract  Publisher Full Text

Pritchard JK, Przeworski M: Linkage disequilibrium in humans: models and data.
Am J Hum Genet 2001, 69(1):114. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Weedon MN, Lango H, Lindgren CM, Wallace C, Evans DM, Mangino M, et al.: Genomewide association analysis identifies 20 loci that influence adult height.
Nat Genet 2008, 40(5):575583. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Fellay J, Shianna KV, Ge D, Colombo S, Ledergerber B, Weale M, Zhang K, Gumbs C, Castagna A, Cossarizza A, CozziLepri A, Luca AD, Easterbrook P, Francioli P, Mallal S, MartinezPicado J, Miro JM, Obel N, Smith JP, Wyniger J, Descombes P, Antonarakis SE, Letvin NL, McMichael AJ, Haynes BF, Telenti A, Goldstein DB: A wholegenome association study of major determinants for host control of HIV1.
Science 2007, 317(5840):944947. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Fellay J, Ge D, Shianna KV, Colombo S, Ledergerber B, Cirulli ET, Urban TJ, Zhang K, Gumbs CE, Smith JP, Castagna A, CozziLepri A, De Luca A, Easterbrook P, Günthard HF, Mallal S, Mussini C, Dalmau J, MartinezPicado J, Miro JM, Obel N, Wolinsky SM, Martinson JJ, Detels R, Margolick JB, Jacobson LP, Descombes P, Antonarakis SE, Beckmann JS, O'Brien SJ, Letvin NL, McMichael AJ, Haynes BF, Carrington M, Feng S, Telenti A, Goldstein DB: Common genetic variation and the control of HIV1 in humans.
PLoS Genet 2009, 5(12):e1000791. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Sham PC, Cherny SS, Purcell S, Hewitt JK: Power of linkage versus association analysis of quantitative traits, by use of variancecomponents models, for sibship data.
Am J Hum Genet 2000, 66:16161630. PubMed Abstract  Publisher Full Text  PubMed Central Full Text