Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Department of Health and Human Services, Rockville, MD, USA

Department of Mathematics and Statistics, Concordia University, Montréal, Québec, Canada

Abstract

Background

Statistical power calculations inform the design and interpretation of genetic association studies, but few programs are tailored to case-control studies of single nucleotide polymorphisms (SNPs) in unrelated subjects.

Results

We have developed the "Power for Genetic Association analyses" (PGA) package, which comprises algorithms and graphical user interfaces for sample size and minimum detectable risk calculations using SNP or haplotype effects under different genetic models and study constraints. The software accounts for linkage disequilibrium and multiple comparisons. The results are presented in graphs or tables and can be printed or exported in standard file formats.

Conclusion

PGA is user-friendly software that can facilitate decision making for association studies of candidate genes, fine-mapping studies, and whole-genome scans. Stand-alone executable files and a Matlab toolbox are available for download at:

Background

Case-control genetic association studies are increasingly used to study the genetic basis of complex human traits.

The principles of power calculation can be found in standard statistical textbooks. Moreover, the scientific literature describes the mathematics of power analysis for a variety of specialized experimental designs.
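For orientation, the textbook normal approximation for the power of a two-sided test at significance level α is shown below. It is a generic formula, not necessarily the exact formulation implemented in PGA. Here Δ is the expected case-control difference in the quantity being tested (for example, risk-allele frequency), σ_0 and σ_1 are the standard errors of that difference under the null and alternative hypotheses, z_{1-α/2} is the standard normal quantile, and Φ is the standard normal cumulative distribution function.

```latex
\text{Power} \approx
  \Phi\!\left(\frac{|\Delta| - z_{1-\alpha/2}\,\sigma_0}{\sigma_1}\right)
  + \Phi\!\left(\frac{-|\Delta| - z_{1-\alpha/2}\,\sigma_0}{\sigma_1}\right)
```

Because σ_0 and σ_1 shrink roughly in proportion to 1/√n, power rises with sample size; with m tests under a Bonferroni correction, α is replaced by α/m.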

Implementation

The "Power for Genetic Association Analyses" (PGA) package was developed in Matlab and consists of a toolbox of command-line functions and three unifying graphical user interfaces (GUIs). Users with a Matlab license can run the three GUIs or the command-line functions in the Matlab environment. Users without a Matlab license can download and install compiled versions of the three GUIs, which run as stand-alone applications under the Windows XP or Vista operating systems.

The program assumes that SNPs are biallelic and in Hardy-Weinberg equilibrium. All statistical tests are two-sided. The GUIs PGA1 and PGA2 can display up to 9 scenarios simultaneously, so users can compare alternative assumptions and choose a sample size that remains adequate across them. The graphs produced by each GUI can be printed or exported as TIF files, and tables of numerical results can be exported as HTML or CSV files.

Results

The GUI called PGA1 provides a computational and graphical interface for the relationship between statistical power and sample size for dominant, co-dominant and recessive SNP or haplotype effects (Figure …). … r² = 1.0 is 800 cases and 800 controls (Figure …).
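A minimal Python sketch of the kind of power-versus-sample-size calculation PGA1 performs is shown below. It is not PGA's code. It assumes Hardy-Weinberg equilibrium, a rare disease (so controls reflect population genotype frequencies), the genotype relative-risk parameterisations in the hypothetical helper `genotype_relative_risks`, and a genotype-score z-test (a trend-type test) under a normal approximation. The `r2` argument uses the common approximation that LD between marker and causal SNP shrinks the effective sample size to n·r², and `m` applies a Bonferroni-style α/m threshold. All names are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)

# Score assigned to genotypes carrying 0, 1 or 2 copies of the risk allele.
SCORES = {
    "dominant": (0, 1, 1),
    "recessive": (0, 0, 1),
    "codominant": (0, 1, 2),
}


def genotype_relative_risks(model, rr):
    """Relative risks for 0, 1, 2 risk alleles (assumed parameterisation)."""
    if model == "dominant":
        return (1.0, rr, rr)
    if model == "recessive":
        return (1.0, 1.0, rr)
    if model == "codominant":
        # Multiplicative (log-additive) effect: rr per copy of the risk allele.
        return (1.0, rr, rr * rr)
    raise ValueError(f"unknown genetic model: {model!r}")


def _mean_var(freqs, scores):
    """Mean and variance of the genotype score under genotype frequencies."""
    mean = sum(f * x for f, x in zip(freqs, scores))
    var = sum(f * (x - mean) ** 2 for f, x in zip(freqs, scores))
    return mean, var


def power(n_cases, n_controls, p, rr, model="codominant",
          alpha=0.05, m=1, r2=1.0):
    """Approximate two-sided power of a genotype-score z-test.

    p  : risk-allele frequency (Hardy-Weinberg equilibrium assumed)
    m  : number of tests; the per-test level is alpha / m
    r2 : marker-causal LD r^2, modelled as an effective sample size n * r2
    """
    # Control genotype frequencies under HWE (rare-disease approximation).
    f_controls = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    risks = genotype_relative_risks(model, rr)
    weights = [f * r for f, r in zip(f_controls, risks)]
    total = sum(weights)
    f_cases = [w / total for w in weights]  # case genotype frequencies

    scores = SCORES[model]
    n1, n0 = n_cases * r2, n_controls * r2
    mu1, v1 = _mean_var(f_cases, scores)
    mu0, v0 = _mean_var(f_controls, scores)
    # Null variance from the pooled case + control genotype distribution.
    pooled = [(n1 * a + n0 * b) / (n1 + n0) for a, b in zip(f_cases, f_controls)]
    _, v_null = _mean_var(pooled, scores)

    z = STD_NORMAL.inv_cdf(1 - alpha / (2 * m))
    se_null = (v_null * (1 / n1 + 1 / n0)) ** 0.5
    se_alt = (v1 / n1 + v0 / n0) ** 0.5
    delta = abs(mu1 - mu0)
    # Probability that the z statistic lands in either rejection region.
    return (STD_NORMAL.cdf((delta - z * se_null) / se_alt)
            + STD_NORMAL.cdf((-delta - z * se_null) / se_alt))
```

Looping `power(n, n, 0.2, 1.3, model="dominant")` over a range of `n` traces a power curve analogous to the ones PGA1 plots; with `rr=1.0` the function returns the per-test false positive rate α/m, a useful sanity check.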


**Graphical user interfaces for statistical power calculations.** (A) PGA1 – statistical power is calculated and plotted for different sample sizes and various genetic and statistical parameters. Input variables (e.g. 'Genetic mode of inheritance', 'disease allele frequency', 'relative risk (RR)', etc.) can be specified using slider controls or by typing specific values in the corresponding text boxes. Pressing the 'Run' button executes the calculations and plots the relationship between power and sample size for the specified study parameters. A keyed legend listing the corresponding parameters is shown on the graph. Up to eight different analyses (color-coded) can be displayed simultaneously, allowing different scenarios to be compared. (B) PGA2 – minimum detectable relative risk (MDRR) is calculated and plotted for various minor allele frequencies (MAFs) of potential genotyped loci. Input and output are similar to those of PGA1.

The GUI PGA2 has an interface similar to that of PGA1, but it is designed to calculate and plot the minimum detectable relative risk (MDRR) for genetic loci according to their minor allele frequencies (MAFs), given a fixed number of cases and controls. The MDRR is the smallest relative risk that can be detected, with the sample in hand, at the target level of power. Hence, PGA2 can assist in designing fine-mapping studies of promising genomic loci identified from familial linkage analyses or genome-wide association studies. For example, multiple markers along a 600-kb segment on human chromosome 8q24 have recently been associated with prostate cancer susceptibility.
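The MDRR can be framed as a root-finding problem: increase the relative risk until power reaches the target. The Python below is an illustration under stated assumptions, not PGA2's algorithm. It approximates the relative risk by a per-allele odds ratio under a multiplicative model, uses an allele-count two-proportion z-test under HWE, and locates the MDRR by bisection, which relies on power increasing with the odds ratio over the search interval. `allelic_power` and `mdrr` are hypothetical names.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def allelic_power(n_cases, n_controls, maf, odds_ratio, alpha=0.05):
    """Approximate two-sided power of an allele-count two-proportion z-test."""
    p0 = maf                                               # control allele frequency
    p1 = maf * odds_ratio / (maf * odds_ratio + 1 - maf)   # case allele frequency
    a1, a0 = 2 * n_cases, 2 * n_controls                   # alleles per group
    pbar = (a1 * p1 + a0 * p0) / (a1 + a0)                 # pooled frequency under H0
    se_null = (pbar * (1 - pbar) * (1 / a1 + 1 / a0)) ** 0.5
    se_alt = (p1 * (1 - p1) / a1 + p0 * (1 - p0) / a0) ** 0.5
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    delta = abs(p1 - p0)
    return (STD_NORMAL.cdf((delta - z * se_null) / se_alt)
            + STD_NORMAL.cdf((-delta - z * se_null) / se_alt))


def mdrr(n_cases, n_controls, maf, target_power=0.8, alpha=0.05,
         upper=50.0, tol=1e-6):
    """Smallest odds ratio > 1 whose power reaches target_power (bisection)."""
    lower = 1.0
    if allelic_power(n_cases, n_controls, maf, upper, alpha) < target_power:
        return float("inf")  # not detectable within the search interval
    while upper - lower > tol:
        mid = (lower + upper) / 2
        if allelic_power(n_cases, n_controls, maf, mid, alpha) >= target_power:
            upper = mid  # target reached: MDRR is at or below mid
        else:
            lower = mid  # target missed: MDRR is above mid
    return upper
```

Evaluating `mdrr(1000, 1000, maf)` over a grid of MAFs traces a curve of the kind PGA2 plots; under these assumptions rarer alleles require larger effects to reach the same power.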

An important companion to PGA1 and PGA2 is the EDF GUI, which calculates the effective degrees of freedom (EDF) for a particular set of SNP genotypes in linkage disequilibrium (LD). This tool lets the user assess the effective number of independent tests, which naive power analyses often overestimate or underestimate. The EDF calculator accepts as input genotype data files from HapMap … (r²) among the SNPs in the dataset, and from these data computes a summary measure of the EDF.

Supplementary Methods.

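The text above does not state which summary measure PGA uses for the EDF (details are in the Supplementary Methods). As a stand-in, the sketch below implements Nyholt's (2004) eigenvalue-variance estimator of the effective number of independent tests, M_eff = 1 + (M - 1)(1 - Var(λ)/M), where λ are the eigenvalues of the M × M SNP correlation matrix. Because those eigenvalues sum to M, Var(λ) equals the sum of squared off-diagonal correlations divided by M - 1, so no eigen-decomposition is needed. Pairwise r² is taken here as the squared correlation of 0/1/2 genotype dosages, an assumption that only approximates haplotype-based LD r². Function names are hypothetical.

```python
from itertools import combinations


def dosage_r2(x, y):
    """Squared Pearson correlation between two 0/1/2 genotype dosage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("monomorphic SNP: correlation is undefined")
    return sxy * sxy / (sxx * syy)


def effective_tests(genotypes):
    """Nyholt-style effective number of independent tests (M_eff).

    genotypes: list of SNPs, each a list of dosages for the same individuals.
    Var(eigenvalues) = sum_{i != j} r_ij^2 / (M - 1), since the eigenvalues
    of a correlation matrix have mean 1.
    """
    m = len(genotypes)
    if m == 1:
        return 1.0
    # Each unordered pair appears twice among the off-diagonal entries.
    off_diag = 2 * sum(dosage_r2(a, b) for a, b in combinations(genotypes, 2))
    var_eigen = off_diag / (m - 1)
    return 1 + (m - 1) * (1 - var_eigen / m)
```

Perfectly correlated SNPs collapse to M_eff = 1 and uncorrelated SNPs give M_eff = M; a per-test significance level can then be set to α/M_eff, e.g. `0.05 / effective_tests(snps)`.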


**Effective degrees of freedom calculator.** (A) HapMap SNP genotype data from human chromosome 8q24 (chr8:128100000-128700000) is used as an input. The calculated EDF for SNPs with MAF > 0.05 in this dataset is 608. (B) LD map for the selected SNPs is also displayed in the output.

All the procedures included in the PGA GUIs are available in a single Matlab toolbox and can be executed at the Matlab command line. This allows Matlab users to call the toolbox functions from their own Matlab scripts. For example, calculating the EDF for 100 different regions of 80 SNPs each took about 176 s on a Windows XP workstation with dual 3.19 GHz Intel Xeon processors.

Discussion

The PGA package is well suited for power calculations where relatively small genomic regions are scanned for disease susceptibility loci. However, it can also be used to assess larger regions and even genome-wide association studies, via appropriate specification of the false positive rate, i.e., a per-marker significance level of α/m, where m is the number of genotyped markers in the study. Similarly to other popular software in this field
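To illustrate how the per-marker level α/m drives sample size in genome-wide settings, the Python sketch below applies the standard two-proportion sample-size formula (without continuity correction) to allele counts. Assumptions: equal numbers of cases and controls, HWE, a multiplicative per-allele odds ratio, and a Bonferroni threshold α/m. `cases_needed` is a hypothetical name, and the result is not meant to reproduce PGA's output.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def cases_needed(maf, odds_ratio, power=0.8, alpha=0.05, m=1):
    """Cases (= controls) for a two-sided allelic z-test at level alpha / m."""
    p0 = maf                                               # control allele frequency
    p1 = maf * odds_ratio / (maf * odds_ratio + 1 - maf)   # case allele frequency
    pbar = (p0 + p1) / 2
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / (2 * m))      # Bonferroni-adjusted quantile
    z_beta = STD_NORMAL.inv_cdf(power)
    alleles = ((z_alpha * math.sqrt(2 * pbar * (1 - pbar))
                + z_beta * math.sqrt(p1 * (1 - p1) + p0 * (1 - p0))) ** 2
               / (p1 - p0) ** 2)
    return math.ceil(alleles / 2)  # two alleles per individual
```

Comparing `cases_needed(0.2, 1.3)` with `cases_needed(0.2, 1.3, m=500_000)` shows how much larger a study must be when the same effect is sought at a genome-wide threshold.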

Other freely available software packages have features that are complementary to PGA (see Additional file

Table 1. Major features of four commonly used power calculation programs for case-control association studies.


Conclusion

The PGA package assembles a broad spectrum of statistical power calculations for genetic association studies in a single Matlab toolbox and three stand-alone GUIs. The software offers user-friendly tools for advanced calculations of statistical power and sample size and presents the results 'on the fly' in graphs and tables. Hence, PGA may significantly facilitate decision making and interpretation of association studies of candidate genes, fine-mapping studies, and genome-wide scans.

Availability and requirements

• **Project name**: Power for genetic association analyses (PGA).

• **Project home page**:

• **Operating system(s)**: Windows XP & Vista.

• **Programming language**: Matlab.

• **Other requirements**: To run the stand-alone GUIs, users without Matlab licenses should first install the MATLAB Component Runtime (MCR), which is available from the PGA home page.

• **Any restrictions to use by non-academics**: None.

• **Reviewers' access to the software**: reviewers can download the software anonymously through the following links:

Readme file:

PGA.exe file:

MCRinstaller file:

Authors' contributions

IM drafted the manuscript and assisted in the design and implementation of the software. PSR conceived of the study, assisted in the design and implementation of the software and in drafting the manuscript. BEC developed the software and helped draft the manuscript.

Acknowledgements

This research was supported by the Intramural Research Program of the NIH, National Cancer Institute, Division of Cancer Epidemiology and Genetics.