Estimates of array and pool-construction variance for planning efficient DNA-pooling genome wide association studies
1 Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, BC, Canada
2 Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
3 Department of Biomedical Physiology and Kinesiology, Simon Fraser University, Burnaby, BC, Canada
BMC Medical Genomics 2011, 4:81 doi:10.1186/1755-8794-4-81Published: 28 November 2011
Until recently, genome-wide association studies (GWAS) have been restricted to research groups with the budget necessary to genotype hundreds, if not thousands, of samples. Replacing individual genotyping with genotyping of DNA pools in Phase I of a GWAS has proven successful, and dramatically altered the financial feasibility of this approach. When conducting a pool-based GWAS, how well SNP allele frequency is estimated from a DNA pool will influence a study's power to detect associations. Here we address how to control the variance in allele frequency estimation when DNAs are pooled, and how to plan and conduct the most efficient well-powered pool-based GWAS.
By examining the variation in allele frequency estimation on SNP arrays between and within DNA pools we determine how array variance [var(earray)] and pool-construction variance [var(econstruction)] contribute to the total variance of allele frequency estimation. This information is useful in deciding whether replicate arrays or replicate pools are most useful in reducing variance. Our analysis is based on 27 DNA pools ranging in size from 74 to 446 individual samples, genotyped on a collective total of 128 Illumina beadarrays: 24 1M-Single, 32 1M-Duo, and 72 660-Quad.
For all three Illumina SNP array types our estimates of var(earray) were similar, between 3-4 × 10-4 for normalized data. Var(econstruction) accounted for between 20-40% of pooling variance across 27 pools in normalized data.
We conclude that relative to var(earray), var(econstruction) is of less importance in reducing the variance in allele frequency estimation from DNA pools; however, our data suggests that on average it may be more important than previously thought. We have prepared a simple online tool, PoolingPlanner (available at http://www.kchew.ca/PoolingPlanner/ webcite), which calculates the effective sample size (ESS) of a DNA pool given a range of replicate array values. ESS can be used in a power calculator to perform pool-adjusted calculations. This allows one to quickly calculate the loss of power associated with a pooling experiment to make an informed decision on whether a pool-based GWAS is worth pursuing.