Testing rare variants directly is possible with next-generation sequencing technology. In this article, we propose a sliding-window-based optimal-weighted approach to test for the effects of both rare and common variants across the whole genome. We measured the genetic association between a disease and a combination of variants of a single-nucleotide polymorphism window using the newly developed tests TOW and VW-TOW and performed a sliding-window technique to detect disease-susceptible windows. By applying the new approach to unrelated individuals of Genetic Analysis Workshop 18 on replicate 1 chromosome 3, we detected 3 highly susceptible windows across chromosome 3 for diastolic blood pressure and identified 10 of 48,176 windows as the most promising for both diastolic and systolic blood pressure. Seven of 9 top variants influencing diastolic blood pressure and 8 of 9 top variants influencing systolic blood pressure were found in or close to our top 10 windows.
Hypertension is a common chronic destructive disease with unknown complex etiology. More than 1 billion people worldwide have hypertension, defined as blood pressure (BP) ≥140 mm Hg systolic (SBP) or ≥90 mm Hg diastolic (DBP), which is a major risk factor for stroke, myocardial infarction, and heart failure, and a cause of chronic kidney disease [3-5]. Both genetic and environmental factors are likely to contribute to this disease. Ehret et al. conducted a large-scale genome-wide association study of hypertension in 2011 and identified 10 novel loci related to BP physiology. Although numerous common genetic variants with small effects on BP have been identified [6-8], the identified variants account for only a small fraction of disease heritability. One potential source of missing heritability is the contribution of rare variants. Recently, next-generation sequencing technology has enabled the sequencing of the whole genome of large groups of individuals, which makes directly testing rare variants feasible. The Genetic Analysis Workshop 18 (GAW18) data, which consist of a whole genome sequencing data set, come from a large-scale pedigree-based sample of 959 individuals, 464 directly sequenced and the rest imputed.
Several statistical methods have been proposed to detect associations of rare variants, including the combined multivariate and collapsing (CMC) method and the weighted sum statistic (WSS). We have proposed a novel test for measuring the effect of an optimally weighted combination of variants (TOW). In addition, based on the TOW, we proposed a variable weight-TOW (VW-TOW) aiming to test effects of both rare and common variants. Both TOW and VW-TOW are applicable to quantitative and qualitative traits, allow covariates, and are robust to directions of effects of causal variants.
In this article, we report a novel whole genome sliding-window approach to detect genetic association between a trait and single-nucleotide polymorphism (SNP) regions across the entire genome. This approach integrates TOW and VW-TOW with the concept of a sliding window. Applied to the GAW18 replicate 1, chromosome 3 data set, our approach yielded results consistent with the top genes influencing simulated SBP and DBP, which were generated from the GAW18 simulation model.
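Consider a sample of $n$ individuals. Each individual has been genotyped at $M$ variants in a genomic region. Denote $y_i$ as the quantitative trait value of the $i$th individual. Denote $X_i = (x_{i1}, \dots, x_{iM})^T$ as the genotypic score of the $i$th individual, where $x_{im} \in \{0, 1, 2\}$ is the number of minor alleles that the $i$th individual has at the $m$th variant.
Using the generalized linear model (GLM) to model the relationship between trait values and genotypes is equivalent to modeling the relationship between the residuals of trait values and the residuals of genotypes through GLM (1),
$$g(E(y_i \mid X_i)) = \beta_0 + \beta^T X_i, \qquad (1)$$
where $g(\cdot)$ is a monotone "link" function.
Under the GLM, the score test statistic to test the null hypothesis $H_0: \beta = 0$ is given by $S = U^T V^{-1} U$, where $U = \sum_{i=1}^{n} (y_i - \bar{y})(X_i - \bar{X})$ and $V = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2 \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})^T$. The statistic $S$ asymptotically follows a chi-square distribution with $k = \mathrm{rank}(V)$ degrees of freedom (DF). For rare variants, however, the score test may lose power as a result of the sparse data and a large DF $k$. In rare variant association studies, to test for the effect of the weighted combination of variants, $x_i^w = \sum_{m=1}^{M} w_m x_{im}$, the score test statistic becomes
$$S(w_1, \dots, w_M) = n \, \frac{\left( \sum_{i=1}^{n} (y_i - \bar{y})(x_i^w - \bar{x}^w) \right)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2 \, \sum_{i=1}^{n} (x_i^w - \bar{x}^w)^2}.$$
Because rare variants are essentially independent, we have
$$\sum_{i=1}^{n} (x_i^w - \bar{x}^w)^2 \approx \sum_{m=1}^{M} w_m^2 \sum_{i=1}^{n} (x_{im} - \bar{x}_m)^2, \quad \text{so that} \quad S(w_1, \dots, w_M) \approx n \, \frac{\left( \sum_{i=1}^{n} (y_i - \bar{y})(x_i^w - \bar{x}^w) \right)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2 \, \sum_{m=1}^{M} w_m^2 \sum_{i=1}^{n} (x_{im} - \bar{x}_m)^2} =: S_0(w_1, \dots, w_M).$$
As a function of $w = (w_1, \dots, w_M)$, $S_0$ reaches its maximum when $w = w^o$ or $w = c\,w^o$ for any nonzero constant $c$. We denote $w^o = (w_1^o, \dots, w_M^o)$ as the optimal weight, which is given by $w_m^o = \sum_{i=1}^{n} (y_i - \bar{y})(x_{im} - \bar{x}_m) \big/ \sum_{i=1}^{n} (x_{im} - \bar{x}_m)^2$. Let $x_i^o = \sum_{m=1}^{M} w_m^o x_{im}$. Then $S_0(w^o) = n \sum_{i=1}^{n} (y_i - \bar{y})(x_i^o - \bar{x}^o) \big/ \sum_{i=1}^{n} (y_i - \bar{y})^2$. We propose the new test statistic TOW to test the effect of the optimally weighted combination of variants as $T_T = \sum_{i=1}^{n} (y_i - \bar{y})(x_i^o - \bar{x}^o)$. $T_T$ is equivalent to $S_0(w^o)$ since $\sum_{i=1}^{n} (y_i - \bar{y})^2$ is a constant. The optimal weight $w^o$ puts large weights on the variants that have strong associations with the trait of interest and adjusts for the direction of the association. Because its denominator is small when the minor allele is rare, $w^o$ also puts large weights on rare variants. TOW therefore targets rare variants and will lose power when testing for the effect of both rare and common variants. For testing the effects of both rare and common variants, we propose a new statistic, VW-TOW. We divide variants into rare (minor allele frequency [MAF] < the rare variant threshold [RVT]) and common (MAF > RVT), and apply TOW to the rare and common variants separately.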
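The TOW computation described above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: it assumes the trait and genotype inputs have already been adjusted for covariates, takes each variant's optimal weight as the trait-genotype cross-product divided by the genotype sum of squares, and assigns weight 0 to variants with no variation in the sample (an assumption made here to avoid division by zero). The function name `tow_statistic` is hypothetical.

```python
import numpy as np

def tow_statistic(y, X):
    """Return (T, w): the TOW statistic and the per-variant optimal weights.

    y : (n,) trait residuals; X : (n, M) genotype residuals or minor-allele counts.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    yc = y - y.mean()                 # centered trait values
    Xc = X - X.mean(axis=0)           # centered genotypes, variant by variant
    num = Xc.T @ yc                   # cross-product of trait and each variant
    den = (Xc ** 2).sum(axis=0)       # genotype sum of squares for each variant
    # Optimal weight = num / den; monomorphic variants get weight 0 (assumption).
    w = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    # Sum over individuals of centered trait times centered weighted genotype score.
    t = float(num @ w)
    return t, w
```

Because each variant contributes its squared cross-product divided by its sum of squares, recoding a variant so that the other allele is counted flips the sign of its weight but leaves the statistic unchanged, which is the sense in which the test is robust to the direction of effects.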
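Define the test statistic of VW-TOW as $T_{VW\text{-}T} = \min_{0 \le \lambda \le 1} p_\lambda$, where $p_\lambda$ is the p value of $T_\lambda = \lambda \, T_r / \sqrt{\mathrm{var}(T_r)} + (1 - \lambda) \, T_c / \sqrt{\mathrm{var}(T_c)}$, and $T_r$ and $T_c$ denote the test statistics of TOW for rare and common variants, respectively. Here, we evaluate the minimization by dividing the interval [0, 1] into $K$ subintervals of equal length. Let $\lambda_k = k/K$ for $k = 0, 1, \dots, K$. Then, $T_{VW\text{-}T} = \min_{0 \le k \le K} p_{\lambda_k}$.
We use permutation tests to evaluate p values of both $T_T$ and $T_{VW\text{-}T}$. To evaluate the p value of the test $T_T$, let $T_T^{(0)}$ denote the value of the test statistic based on the original data set. For each permutation, we randomly permute the residuals of trait values and denote the value of the test statistic based on the $b$th permuted data set by $T_T^{(b)}$. We perform the permutation procedure many times. Then the p value of the test is the proportion of permutations with $T_T^{(b)} \ge T_T^{(0)}$. We perform $B$ permutations to evaluate the p value of $T_{VW\text{-}T}$. Let $T_r^{(b)}$ and $T_c^{(b)}$ denote the values of $T_r$ and $T_c$ based on the $b$th permuted data set ($b = 0, 1, \dots, B$), where $b = 0$ represents the original data. Based on $T_r^{(b)}$ and $T_c^{(b)}$, we can calculate $T_{\lambda_k}^{(b)} = \lambda_k \, T_r^{(b)} / \sqrt{\mathrm{var}(T_r)} + (1 - \lambda_k) \, T_c^{(b)} / \sqrt{\mathrm{var}(T_c)}$ for $b = 0, 1, \dots, B$ and $k = 0, 1, \dots, K$, where $\mathrm{var}(T_r)$ and $\mathrm{var}(T_c)$ are estimated using $T_r^{(b)}$ and $T_c^{(b)}$ ($b = 1, \dots, B$). Then, we transfer $T_{\lambda_k}^{(b)}$ to $p_{\lambda_k}^{(b)}$ by $p_{\lambda_k}^{(b)} = \frac{1}{B} \sum_{d=0}^{B} I\left( T_{\lambda_k}^{(d)} > T_{\lambda_k}^{(b)} \right)$, where $I(\cdot)$ is the indicator function. Let $p^{(b)} = \min_{0 \le k \le K} p_{\lambda_k}^{(b)}$. Then the p value of $T_{VW\text{-}T}$ is given by $\frac{1}{B} \sum_{b=1}^{B} I\left( p^{(b)} \le p^{(0)} \right)$.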
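The permutation procedure above can be sketched as follows. This is an illustrative reading of the description rather than the authors' code, and several details are assumptions: the inputs are taken to be already adjusted for covariates, a variant with MAF exactly equal to the RVT is treated as common, a permuted statistic that ties the observed one counts toward the TOW p value, and the names `tow_vwtow_pvalues`, `rvt`, `n_perm`, and `K` are hypothetical. The same set of permutations is reused for every grid value of the mixing weight.

```python
import numpy as np

def _tow(yc, Xc):
    """TOW statistic for centered trait residuals yc (n,) and centered genotypes Xc (n, M)."""
    num = Xc.T @ yc
    den = (Xc ** 2).sum(axis=0)
    w = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return float(num @ w)

def _perm_p(row):
    """For each entry of row (length B + 1): number of strictly greater entries, divided by B."""
    s = np.sort(row)
    greater = row.size - np.searchsorted(s, row, side="right")
    return greater / (row.size - 1)

def tow_vwtow_pvalues(y, X, maf, rvt=0.01, n_perm=1000, K=10, seed=0):
    """Return (p_tow, p_vw): permutation p values of TOW and VW-TOW for one window."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    rare = np.asarray(maf, dtype=float) < rvt   # MAF == RVT counted as common (assumption)
    if rare.all() or not rare.any():
        raise ValueError("VW-TOW needs at least one rare and one common variant")
    rng = np.random.default_rng(seed)
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    T = np.empty((3, n_perm + 1))               # rows: all variants, rare only, common only
    for b in range(n_perm + 1):                 # column 0 holds the original data
        yb = yc if b == 0 else rng.permutation(yc)
        T[:, b] = _tow(yb, Xc), _tow(yb, Xc[:, rare]), _tow(yb, Xc[:, ~rare])
    # TOW: proportion of permuted statistics at least as large as the observed one.
    p_tow = float(np.mean(T[0, 1:] >= T[0, 0]))
    # VW-TOW: standardize rare and common statistics by their permutation SDs.
    sd = T[1:, 1:].std(axis=1, ddof=1)
    sd[sd == 0] = 1.0                           # guard against a degenerate group
    lam = np.linspace(0.0, 1.0, K + 1)[:, None]
    Z = lam * T[1] / sd[0] + (1.0 - lam) * T[2] / sd[1]   # shape (K + 1, n_perm + 1)
    P = np.apply_along_axis(_perm_p, 1, Z)      # p value of every data set at every grid point
    p_min = P.min(axis=0)                       # minimum over the grid, per data set
    p_vw = float(np.mean(p_min[1:] <= p_min[0]))
    return p_tow, p_vw
```

Taking the minimum over the grid for the permuted data sets as well as the original one is what keeps the second-stage p value valid, since the observed minimum is compared with minima obtained in the same way.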
We use TOW and VW-TOW to analyze the data set of unrelated individuals of GAW18 replicate 1 on chromosome 3. To apply TOW and VW-TOW to the entire chromosome 3, we propose a sliding-window approach. To use sliding windows, we divide all SNPs into contiguous windows and apply TOW and VW-TOW in each window. Suppose that the window size is S; then all the SNPs are divided into windows: 1 to S, S+1 to 2S, 2S+1 to 3S, and so on.
To analyze the data set of GAW18 replicate 1, chromosome 3 for unrelated individuals, we set the window size to 20. First, we performed quality control tests for the genotype data with the PLINK toolset. We used 10,000,000 permutations to evaluate the empirical p values of TOW for DBP and SBP data, and 100,000 permutations to evaluate the empirical p values of VW-TOW for DBP and SBP data. Because the sample of unrelated individuals in GAW18 is relatively small, it is not reasonable to claim significance by either the false-discovery rate or the Bonferroni-corrected threshold. Therefore, we recommend the 10 windows with the smallest p values as the most promising for follow-up studies.
We applied TOW and VW-TOW with the sliding-window approach to analyze the GAW18 hypertension data set of unrelated individuals. To facilitate comparisons among GAW18 contributions, we analyzed only replicate 1 on chromosome 3. To evaluate type I error rates of TOW and VW-TOW, we used all 200 replicates of simulated phenotype data. There are 157 unrelated individuals in the GAW18 pedigree-based sample. Among the 157 individuals, 142 have observations for SBP, DBP, and other demographic/clinical variables at exam 1. Our analysis was based on the 142 individuals and their genotypes, quantitative traits SBP and DBP, and other characteristics at exam 1.
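The window layout just described can be written down directly. The sketch below uses 0-based, half-open index ranges and keeps a final shorter window when the number of SNPs is not a multiple of the window size; how the last partial window was handled in the analysis is not stated, so that choice is an assumption, and the function name `contiguous_windows` is hypothetical.

```python
def contiguous_windows(n_snps, size=20):
    """Return (start, stop) index pairs for non-overlapping windows of `size` SNPs.

    Indices are 0-based and half-open, so window j covers SNPs start..stop-1.
    The last window may hold fewer than `size` SNPs (assumption).
    """
    if size <= 0:
        raise ValueError("window size must be positive")
    return [(start, min(start + size, n_snps)) for start in range(0, n_snps, size)]
```

For example, `contiguous_windows(45, 20)` gives three windows, the last of which holds the remaining 5 SNPs.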
The total genotyping rate in the 142 individuals is 0.9997. We did not find any duplicated samples or sample contamination. No individual was filtered out from the multidimensional scaling (MDS) analysis. Of the 1,215,399 SNPs on chromosome 3, we removed 251,892 completely missing SNPs and retained 963,507 SNPs for final analysis. Because SBP and DBP varied by sex and increased with age, age and sex were considered as covariates in this study.
We listed the top 10 most promising windows out of 48,176 windows across the entire chromosome 3. The top 8 windows all reside in gene MAP4, which is the most susceptible gene on chromosome 3 for hypertension. Seven of 9 top variants influencing DBP and 8 of 9 top variants influencing SBP on chromosome 3 were found in or close to our top windows. Tables 1 and 2 show the top 10 most promising windows by TOW that are associated with DBP and SBP, respectively. The p values of TOW in the top 3 windows of Table 1 are very small. SNPs 3_47957996, 3_47956424, and 3_47957741 are the third, fourth, and ninth variants in Table 2 of the GAW18 answer sheet. They all fell into our third window in Table 1 and the first window in Table 2.
Table 1. Top 10 most promising windows associated with DBP
Table 2. Top 10 most promising windows associated with SBP
To evaluate the type I error rates of the proposed sliding-window approach, we chose 100 blocks (20 variants in each block) from chromosome 3 that are far from causal variants. In each block, we applied TOW and VW-TOW to each of the 200 replicates to test association between genotypes and the trait DBP. We obtained 1 p value for each replicate and each block. Figure 1 shows the histograms of the p values of TOW and VW-TOW. The histograms indicate that the type I error rates of both TOW and VW-TOW are under control.
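A check of this kind can be summarized numerically as well as graphically. The sketch below, which is an illustration and not the procedure used to produce Figure 1, pools the p values from all blocks and replicates, reports the fraction at or below a nominal level, and bins them for a histogram; under the null hypothesis the fraction should be close to the nominal level and the bins roughly equal. The level 0.05, the 10 bins, and the function names are assumptions.

```python
import numpy as np

def empirical_rate(pvalues, alpha=0.05):
    """Fraction of p values at or below alpha (empirical type I error under the null)."""
    p = np.asarray(pvalues, dtype=float).ravel()
    return float(np.mean(p <= alpha))

def pvalue_histogram(pvalues, bins=10):
    """Counts of p values in equal-width bins on [0, 1], plus the bin edges."""
    p = np.asarray(pvalues, dtype=float).ravel()
    counts, edges = np.histogram(p, bins=bins, range=(0.0, 1.0))
    return counts, edges
```

With 100 blocks and 200 replicates, the input would be a 100 by 200 array of p values for each test; the same `empirical_rate` function applied to windows that contain causal variants gives an estimate of power instead.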
Figure 1. Histograms of p values for TOW and VW-TOW.
In this article, we proposed a sliding-window-based optimal weighted approach to test for the effects of both rare and common variants across the whole genome. In each window, our recently developed TOW and VW-TOW were applied to test genetic association between a disease and a combination of variants. Then, we applied the method to unrelated individuals of GAW18 on replicate 1, chromosome 3. We detected 3 susceptible windows across chromosome 3 for DBP and identified 10 out of 48,176 windows as the most promising windows for DBP and SBP. Because this is a simulated data set, it is possible that other genes that were not listed in the top 10 windows are actually related to SBP or DBP.
In this study, we used windows of size 20 across the entire chromosome 3. How to choose an appropriate window size is a critical question. We evaluated the effect of window size by also running the analysis with window sizes of 30, 40, and 50. However, the power of TOW did not increase with a larger window size. Although the power of VW-TOW increased slightly with a larger window size, no window passed the Bonferroni-corrected threshold for the entire chromosome 3.
TOW and VW-TOW can be made robust to population stratification by adjusting for the first principal components (PCs) of genotypes at genomic markers as covariates when calculating the residuals of the trait and of the genotype matrix. In this GAW18 data analysis, we did not adjust for PCs because we believed that population stratification was not severe in these data based on our MDS analysis.
To further assess our new approach, we compared the power of TOW, VW-TOW, CMC, and WSS to detect association between gene MAP4 and DBP. MAP4 was split into 44 windows (blocks) with 20 variants in each window. In each window, we calculated the power of each method based on 200 replicates. The power comparisons based on phenotype measurement DBP are given in Figure 2. This figure shows that in most of the windows, TOW is the most powerful test and VW-TOW is the second most powerful test.
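For scale, the chromosome-wide Bonferroni threshold referred to above can be worked out from the number of windows. The nominal level $\alpha = 0.05$ is an assumption made here for illustration; 48,176 is the number of windows at window size 20.

```latex
\alpha_{\mathrm{Bonf}} = \frac{\alpha}{N_{\mathrm{windows}}}
                       = \frac{0.05}{48{,}176}
                       \approx 1.04 \times 10^{-6}
```

An empirical p value based on $B$ permutations is resolved only in steps of $1/B$, so 10,000,000 permutations give a resolution of $10^{-7}$, finer than this threshold, whereas 100,000 permutations give a resolution of $10^{-5}$.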
Figure 2. Power comparisons of TOW, CMC, VW-TOW, and WSS using DBP as phenotype measurement. The numbers on the x axis refer to the 44 blocks of gene MAP4.
The authors declare that they have no competing interests.
XW designed the overall study. XZ and XW conducted statistical analysis. XZ, QS, and SZ drafted the manuscript. All authors read and approved the final manuscript.
We thank Dr. Claire L. Simpson (funded by the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health) for helpful PLINK format GAW18 genotype data. QS and SZ were supported by the National Human Genome Research Institute of the National Institutes of Health under Award Number R03HG006155. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
The GAW18 whole genome sequence data were provided by the T2D-GENES Consortium, which is supported by NIH grants U01 DK085524, U01 DK085584, U01 DK085501, U01 DK085526, and U01 DK085545. The other genetic and phenotypic data for GAW18 were provided by the San Antonio Family Heart Study and San Antonio Family Diabetes/Gallbladder Study, which are supported by NIH grants P01 HL045222, R01 DK047482, and R01 DK053889. The Genetic Analysis Workshop is supported by NIH grant R01 GM031575.
This article has been published as part of BMC Proceedings Volume 8 Supplement 1, 2014: Genetic Analysis Workshop 18. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcproc/supplements/8/S1. Publication charges for this supplement were funded by the Texas Biomedical Research Institute.
Lewington S, Clarke R, Qizilbash N, Peto R, Collins R, Prospective Studies Collaboration: Age-specific relevance of usual blood pressure to vascular mortality: a meta-analysis of individual data for one million adults in 61 prospective studies.
International Consortium for Blood Pressure Genome-Wide Association Studies, Ehret GB, Munroe PB, Rice KM, Bochud M, Johnson AD, Chasman DI, Smith AV, Tobin MD, et al.: Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk.
Newton-Cheh C, Johnson T, Gateva V, Tobin MD, Bochud M, Coin L, Najjar SS, Zhao JH, Heath SC, Eyheramendy S, et al.: Genome-wide association study identifies eight loci associated with blood pressure.