Department of Epidemiology, Erasmus MC, Rotterdam, 3000 CA, The Netherlands

Quantitative Integrative Genomics Group, Institute of Cytology and Genetics SD RAS, Novosibirsk, 630090, Russia

Abstract

Background

In the presence of an interaction between a genotype and a certain factor in the determination of a trait's value, the trait's variance is expected to increase in the group of subjects having this genotype. Thus, a test of heterogeneity of variances can be used to screen for potentially interacting single-nucleotide polymorphisms (SNPs). In this work, we evaluated the statistical properties of variance heterogeneity analysis with respect to the detection of potentially interacting SNPs in the case when the interacting variable is unknown.

Results

Through simulations, we investigated the type I error of Bartlett's test, Bartlett's test with prior rank transformation of the trait to normality, and Levene's test for different genetic models. Additionally, we derived an analytical expression for power estimation. We showed that Bartlett's test has acceptable type I error when the trait follows a normal distribution, whereas Levene's test kept the nominal type I error under all scenarios investigated. For the power of the variance homogeneity test, we showed that (as opposed to the power of the direct test, which uses information about the known interacting factor), given the same interaction effect, the power can vary widely depending on the non-estimable direct effect of the unobserved interacting variable. Thus, for a given interaction effect, only very wide limits of power of the variance homogeneity test can be estimated. We also applied Levene's approach to test genome-wide homogeneity of variances of C-reactive protein in the Rotterdam Study population (…).

Conclusions

Screening for differences in variances among genotypes of a SNP is a promising approach, as a number of biologically interesting models may lead to heterogeneity of variances. However, it should be kept in mind that the absence of variance heterogeneity for a SNP cannot be interpreted as the absence of involvement of the SNP in the interaction network.

Background

Genome-wide association (GWA) studies have become the tool of choice for the identification of loci associated with complex traits. In GWA analysis, the association between a trait of interest and genetic variation is studied using thousands of subjects typed for hundreds of thousands of polymorphisms. Several hundred loci for dozens of complex human diseases and quantitative traits have been discovered using this method.

However, it has become clear that for most complex traits, the loci discovered by GWA studies currently explain only a small portion of the trait's total heritability and are not likely to explain all of it, even with additional loci discovered using progressively larger sample sizes.

A method that detects SNPs potentially involved in interaction networks using only SNP and trait information (without information about the interacting factor(s)) would be a substantial advance for the field. With such a method, we could first screen for potentially interacting SNPs and then restrict the search for the other interacting factor (genetic or environmental) to these SNPs only, dramatically decreasing the search space.

It has been suggested that analysis of equality and heterogeneity of variances of the trait between different genotypes may become such a tool


**Distribution of a hypothetical trait with expectation determined by genotype and its interaction with a binary trait**. A, B, C: distribution of the trait for genotypes AA, AB, BB, respectively, when the interacting factor is present. D, E, F: distribution of the trait for genotypes AA, AB, BB, respectively, when the factor is absent. G, H, I: distribution of the trait for genotypes AA, AB, BB, respectively, when the factor is unknown. In this case the distributions are mixtures of the two upper ones.

In this work, we assume an underlying model in which the trait is generated from the SNP genotype and the interacting factor, using fixed model parameters. The analysis of variances of the trait is based on SNP information only, as the interacting factor is assumed to be unknown in an analysis aimed at identifying potentially interacting SNPs. Within this framework, we first evaluate the type I error of different variance heterogeneity tests using simulated data. Second, assuming a known interaction model involving a SNP and an interacting factor, we relate the power of the variance heterogeneity test to the parameters of the underlying model.

Underlying model of the trait

We assumed the following linear model:

$$Y_i = \beta_g g_i + \beta_F F_i + \beta_{gF}\, g_i F_i + \epsilon_i, \tag{1}$$

where $Y_i$ is the value of the trait for the $i$-th individual, $\beta_g$ is the effect of the SNP, $\beta_F$ is the effect of the interacting factor, $\beta_{gF}$ is the effect of the interaction between the SNP and the factor, $g_i \sim \mathrm{Bin}(n_g, p_B)$ is the SNP genotype, assumed to be binomially distributed with $n_g = 2$ (number of alleles in the genotype) and $p_B \in [0; 1]$ (frequency of the interacting allele $B$), $F_i \sim N(\mu_F, \sigma_F^2)$ is the interacting factor, normally distributed with mean $\mu_F$ and variance $\sigma_F^2$, and $\epsilon_i$ is the residual random error. Since many traits are not normally distributed, we studied seven types of distribution of $\epsilon_i$: the normal distribution, three $\chi^2$ distributions and three $t$ distributions (with different degrees of freedom). Each $\epsilon_i$ was standardized to have zero mean and variance of one. We assumed that the distributions of $g_i$, $F_i$, and $\epsilon_i$ are independent.
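To make the linear model concrete, here is a minimal Python sketch that draws genotypes, factor values and traits under it. It is an illustration under our own assumptions, not the simulation code used in the study: only normal residuals are implemented (the six non-normal residual distributions are omitted), and the defaults $\mu_F = 0$, $\sigma_F^2 = 1$ as well as the function name `simulate_trait` are hypothetical.

```python
import math
import random


def simulate_trait(n, p_b, beta_g, beta_f, beta_gf, mu_f=0.0, var_f=1.0, seed=None):
    """Draw genotypes g, factor values F and traits Y under
    Y = beta_g*g + beta_F*F + beta_gF*g*F + eps.

    Assumptions (ours): eps ~ N(0, 1); F ~ N(mu_f, var_f) with illustrative
    defaults mu_f = 0 and var_f = 1.
    """
    rng = random.Random(seed)
    sd_f = math.sqrt(var_f)
    g, f, y = [], [], []
    for _ in range(n):
        # Binomial(2, p_B): number of B alleles carried (0, 1 or 2)
        gi = int(rng.random() < p_b) + int(rng.random() < p_b)
        fi = rng.gauss(mu_f, sd_f)   # interacting factor F_i
        eps = rng.gauss(0.0, 1.0)    # standardized residual error
        g.append(gi)
        f.append(fi)
        y.append(beta_g * gi + beta_f * fi + beta_gf * gi * fi + eps)
    return g, f, y


# Example: strong interaction, no main effects of the SNP or the factor.
g, f, y = simulate_trait(n=5000, p_b=0.3, beta_g=0.0, beta_f=0.0, beta_gf=1.0, seed=42)
```

With $\beta_F = 0$ and $\beta_{gF} = 1$, the within-genotype variance grows from about 1 (AA) to about 5 (BB), which is the kind of heterogeneity the variance tests are meant to pick up.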

Without loss of generality we can assume that $\mu_F = 0$, and …

Homogeneity of variance tests

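Bartlett's test is defined as:

$$X_B^2 = \frac{(N-k)\ln S_p^2 - \sum_{j}(N_j - 1)\ln S_j^2}{1 + \frac{1}{3(k-1)}\left(\sum_{j}\frac{1}{N_j - 1} - \frac{1}{N-k}\right)}, \tag{2}$$

where $N_j = \sum_{i=1}^{N}\delta_{g_i = j}$ is the sample size of the $j$-th group, $\delta_{a=b}$ is an indicator variable taking value one if $a = b$ and zero otherwise, $Y_i$ is the value of the trait for the $i$-th individual, $g_i$ is the SNP genotype of the $i$-th individual, $S_j^2 = \frac{1}{N_j - 1}\sum_{i=1}^{N}\delta_{g_i=j}(Y_i - \bar Y_j)^2$ is the trait variance in the $j$-th group with group mean $\bar Y_j$, $S_p^2 = \frac{1}{N-k}\sum_{j}(N_j - 1)S_j^2$ is the pooled variance, $N$ is the total sample size, and the sums run over the $k$ genotype groups. Under the null hypothesis of variance homogeneity, the value of the test, $X_B^2$, is distributed as $\chi^2_{k-1}$.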
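A stdlib-only Python sketch of Bartlett's statistic in its standard form, using genotypes as group labels. The p-value relies on the closed-form upper tails of $\chi^2_1$ and $\chi^2_2$, which covers the one- and two-degree-of-freedom tests discussed here; the helper names are hypothetical, and in practice a library routine would normally be used instead.

```python
import math


def chi2_sf(x, df):
    """Upper tail P(chi2_df > x); closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("closed form implemented only for df = 1 or 2")


def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def bartlett(y, g):
    """Bartlett's test of equal trait variances across genotype groups.

    Returns (statistic, p_value); the statistic is referred to chi2 with k - 1 df.
    """
    groups = {}
    for yi, gi in zip(y, g):
        groups.setdefault(gi, []).append(yi)
    n_j = [len(vals) for vals in groups.values()]
    s2_j = [sample_variance(vals) for vals in groups.values()]
    k, n = len(n_j), sum(n_j)
    s2_pooled = sum((nj - 1) * s2 for nj, s2 in zip(n_j, s2_j)) / (n - k)
    numerator = (n - k) * math.log(s2_pooled) - sum(
        (nj - 1) * math.log(s2) for nj, s2 in zip(n_j, s2_j))
    correction = 1.0 + (sum(1.0 / (nj - 1) for nj in n_j) - 1.0 / (n - k)) / (3.0 * (k - 1))
    # The statistic is non-negative in exact arithmetic; clip rounding noise.
    stat = max(numerator / correction, 0.0)
    return stat, chi2_sf(stat, k - 1)
```

Two groups with identical spread but different means give a statistic of (numerically) zero, since the test ignores location.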

Bartlett's test with prior rank transformation to normality was done by applying Bartlett's test to the transformed trait. Rank transformation to normality is a transformation that (in the absence of ties) preserves the ranks of the observations while making their distribution perfectly normal.
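A minimal sketch of a rank-based inverse normal transformation. The offset $(r - 0.5)/n$ is one common convention and an assumption here; the text does not state the exact formula used. Ties are not handled, matching the "absence of ties" caveat, and `rank_to_normal` is a hypothetical name.

```python
from statistics import NormalDist


def rank_to_normal(values):
    """Replace each value by the standard normal quantile of its rank (no ties)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    std_normal = NormalDist()
    # (rank - 0.5) / n keeps the probabilities strictly inside (0, 1)
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]
```

Because the transformation is applied to the pooled sample, it normalizes the overall trait distribution but not the distribution within each genotype, which is relevant when a SNP has a main effect.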

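Levene's (Brown-Forsythe) test is defined as:

$$W = \frac{N-k}{k-1}\,\frac{\sum_{j} N_j\,(\bar Z_j - \bar Z)^2}{\sum_{j}\sum_{i=1}^{N}\delta_{g_i=j}\,(Z_i - \bar Z_j)^2},$$

where $Z_i = |Y_i - \tilde Y_{g_i}|$ is the absolute deviation of the trait from the median $\tilde Y_j$ of its genotype group, $\bar Z_j$ is the mean of $Z_i$ in the $j$-th group, and $\bar Z$ is the overall mean of $Z_i$.

Under the null hypothesis of variance homogeneity, $W$ follows an $F$ distribution with $df_1 = (k-1)$ and $df_2 = (N-k)$ degrees of freedom; for large $N$, the distribution of $(k-1)W$ is excellently approximated by $\chi^2_{k-1}$.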
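A stdlib-only sketch of the Brown-Forsythe version of Levene's test as described above, with the large-sample $\chi^2_{k-1}$ approximation for $(k-1)W$. Restricting it to two or three genotype groups and the name `brown_forsythe` are our assumptions for illustration.

```python
import math
from statistics import median


def brown_forsythe(y, g):
    """Levene's test with group medians (Brown-Forsythe variant).

    Returns (W, p_value); the p-value uses the large-sample approximation
    (k - 1) * W ~ chi2 with k - 1 df, for k = 2 or 3 genotype groups.
    """
    groups = {}
    for yi, gi in zip(y, g):
        groups.setdefault(gi, []).append(yi)
    z = []
    for vals in groups.values():
        med = median(vals)
        z.append([abs(v - med) for v in vals])   # Z_i = |Y_i - group median|
    k = len(z)
    n = sum(len(zj) for zj in z)
    zbar_j = [sum(zj) / len(zj) for zj in z]
    zbar = sum(sum(zj) for zj in z) / n
    between = sum(len(zj) * (m - zbar) ** 2 for zj, m in zip(z, zbar_j))
    within = sum((v - m) ** 2 for zj, m in zip(z, zbar_j) for v in zj)
    w = (n - k) / (k - 1) * between / within
    x = (k - 1) * w
    if k == 2:
        p = math.erfc(math.sqrt(x / 2.0))
    elif k == 3:
        p = math.exp(-x / 2.0)
    else:
        raise ValueError("expected 2 or 3 genotype groups")
    return w, p
```

Working on absolute deviations from medians is what makes this test less sensitive to non-normal residuals than Bartlett's test.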

The number of genotype groups is at most three ($k \le 3$), corresponding to genotypes AA, AB and BB.

Simulations

To study type I error, simulations were performed. The effects of the factor and of the interaction term were set to zero ($\beta_F = \beta_{gF} = 0$). The interacting allele frequencies studied were 5%, 10%, 25%, and 50%. For each fixed allele frequency, we set the effect of the SNP, $\beta_g$, so that it explains 0%, 1%, or 5% of the total variance of the trait. Denoting this proportion as $h^2$, the corresponding SNP effect was computed as

$$\beta_g = \sqrt{\frac{h^2}{(1-h^2)\,\sigma_g^2}},$$

where $\sigma_g^2 = n_g p_B (1-p_B)$ is the variance of the SNP genotype.
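The conversion from explained variance to SNP effect follows from the residual variance being one and $\beta_F = \beta_{gF} = 0$: then $h^2 = \beta_g^2\sigma_g^2/(\beta_g^2\sigma_g^2 + 1)$ with $\sigma_g^2 = n_g p_B(1-p_B)$. A small sketch (the function name is hypothetical):

```python
import math


def snp_effect(h2, p_b, n_g=2):
    """Per-allele SNP effect explaining a fraction h2 of the trait variance,
    assuming beta_F = beta_gF = 0 and unit residual variance."""
    var_g = n_g * p_b * (1 - p_b)   # variance of a Binomial(n_g, p_B) genotype
    return math.sqrt(h2 / ((1 - h2) * var_g))


# Example: SNP explaining 5% of the variance at interacting allele frequency 10%
beta_g = snp_effect(0.05, 0.10)
```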

Under the alternative hypothesis, assuming normally distributed residual error, we have developed an analytical expression for power (see subsection **Power** in section **Results**). To check the correctness of our analytical solutions, we studied several points of the model space by simulation. The parameters studied were allele frequency $p_B = \{0.05, 0.5\}$, SNP effect $\beta_g = \{0, 0.3\}$, and effect of factor $\beta_F = \{0, 1\}$.

Power of direct test for interactions

The difference in power between the direct method and the variance homogeneity tests was also studied. The direct test was defined as regression analysis in which all variables, including the interacting factor, are known and the relationships between the dependent and independent variables are estimated.

Power is a function of the non-centrality parameter. The analytical expression for the non-centrality parameter (NCP) for the detection of $\beta_{gF}$ by the direct test is
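The authors' expression is not reproduced here. As a hedged stand-in, the following is our own large-sample derivation under $\mu_F = 0$ and independent $g_i$, $F_i$, $\epsilon_i$: the part of $g_iF_i$ not linearly explained by $g_i$ and $F_i$ is $(g_i - E g_i)F_i$, with variance $n_g p_B(1-p_B)\sigma_F^2$, which gives $\lambda \approx N\beta_{gF}^2\, n_g p_B(1-p_B)\,\sigma_F^2/\sigma_\epsilon^2$. Its form may differ from the published one, but it agrees with the statement that this NCP does not involve $\beta_F$. The 1-df power uses the fact that $\chi^2_1(\lambda)$ is the square of $N(\sqrt{\lambda}, 1)$.

```python
from statistics import NormalDist


def direct_ncp(n, beta_gf, p_b, var_f=1.0, n_g=2, var_eps=1.0):
    """Large-sample NCP of the 1-df test of beta_gF in the full regression.

    Our derivation, assuming mu_F = 0 and independent g, F and eps: the
    interaction regressor adjusted for g and F is (g - E g) * F, whose variance
    is Var(g) * var_f. Note that beta_F does not enter.
    """
    var_g = n_g * p_b * (1 - p_b)
    return n * beta_gf ** 2 * var_g * var_f / var_eps


def power_1df(ncp, alpha):
    """P(chi2_1(ncp) > critical value): a 1-df noncentral chi-square is the
    square of N(sqrt(ncp), 1)."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - alpha / 2)
    shift = ncp ** 0.5
    return std_normal.cdf(shift - z) + std_normal.cdf(-shift - z)


# Example: N = 2000, p_B = 0.1, beta_gF = 0.2, genome-wide alpha = 5e-8
example_power = power_1df(direct_ncp(2000, 0.2, 0.1), 5e-8)
```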

Results

Type I error

Figure … shows the type I error for interacting allele frequency $p_B$ = 10%.


**Type I error at the threshold corresponding to α = 5% for interacting allele frequency 10%**. A: SNP effect is absent, B: SNP effect explains 5% of total trait's variance.

From Figure

Bartlett's test with prior rank transformation to normality keeps the nominal type I error of 5% only when the SNP effect is absent. Only the type I error of Levene's test does not depend on the model parameters. When a SNP effect is present, rank transformation to normality of a trait with a non-normal distribution yields a perfectly normally distributed trait overall, whereas the distribution of the trait within each genotype becomes distorted. Additional file

**Supplementary figures**. The file contains the following figures: Figure S1: Distribution of a trait for each genotypic group and for all groups together, before and after transformation of the trait to normality. Figure S2: Dependence of power on interaction effect for the direct test and different variance homogeneity tests. Figure S3: Dependence of the non-centrality parameter of the variance homogeneity test on the effect of a factor when group AA is tested against AB and BB. Figure S4: Dependence of the non-centrality parameter of the variance homogeneity test on the effect of a factor when group AB is tested against AA and BB. Figure S5: Dependence of the non-centrality parameter of the variance homogeneity test on the effect of a factor when group BB is tested against AA and AB. Figure S6: Dependence of power of the variance homogeneity test on interaction effect for thresholds α = 5·10⁻⁸ and 0.01. Figure S7: Genome-wide −log₁₀(P value) plot and Q-Q plot for Levene's variance homogeneity test applied to the Rotterdam Study.


Results for type I error for other frequencies of the interacting allele are similar to those shown in Figure … $p_B$ = 5%, 10%, 25%, and 50%

**Type I error for a case when all three genotypes are tested against each other**. Type I error of variance homogeneity tests when there is a SNP effect explaining 0%, 1%, or 5% of the total trait variance, for different frequencies of the interacting allele (5%, 10%, 25% and 50%) and different distributions of the residual error (normal, three types of t and χ² distributions).


Results for type I error for one degree of freedom tests are presented in the tables of Additional file

**Type I error for a case when genotype AA is tested against AB and BB**. Type I error of 1-df variance homogeneity tests when AA is tested against AB and BB and there is a SNP effect explaining 0%, 1%, or 5% of the total trait variance, for different frequencies of the interacting allele (5%, 10%, 25% and 50%) and different distributions of the residual error (normal, three types of t and χ² distributions).


**Type I error for a case when genotype AB is tested against AA and BB**. Type I error of 1-df variance homogeneity tests when AB is tested against AA and BB and there is a SNP effect explaining 0%, 1%, or 5% of the total trait variance, for different frequencies of the interacting allele (5%, 10%, 25% and 50%) and different distributions of the residual error (normal, three types of t and χ² distributions).


**Type I error for a case when genotype BB is tested against AB and AA**. Type I error of 1-df variance homogeneity tests when BB is tested against AA and AB and there is a SNP effect explaining 0%, 1%, or 5% of the total trait variance, for different frequencies of the interacting allele (5%, 10%, 25% and 50%) and different distributions of the residual error (normal, three types of t and χ² distributions).


Power

We have derived an expression for the dependence of the trait's variance on the model parameters for each genotype of a SNP.

where

These expressions can be substituted into expression (2) to obtain the expected non-centrality parameter of the variance homogeneity test. … the non-centrality parameter for detecting $\beta_{gF}$ by the direct test does not depend on the effect of the factor ($\beta_F$) … $p_B$ = {0.05, 0.4, 0.6, 0.95} and different effects of interaction: the top curve on each plot shows results for interaction effect $\beta_{gF}$ = 1, the middle curve is for $\beta_{gF}$ = 0.5, and the bottom curve is for $\beta_{gF}$ = 0.1.
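Under the linear model with independent $g_i$, $F_i$ and $\epsilon_i$, conditioning on $g_i = j$ gives $Y_i = \beta_g j + (\beta_F + j\beta_{gF})F_i + \epsilon_i$, so $\mathrm{Var}(Y \mid g = j) = (\beta_F + j\beta_{gF})^2\sigma_F^2 + 1$; $\beta_g$ only shifts the mean. The sketch below plugs these variances into Bartlett's statistic with Hardy-Weinberg group sizes $N_j \approx N w_j$ and drops the small-sample correction, giving $\lambda \approx N\left[\ln\sum_j w_j\sigma_j^2 - \sum_j w_j\ln\sigma_j^2\right]$. This is our large-sample approximation, not necessarily the authors' exact expression; the function names are hypothetical.

```python
import math


def genotype_variances(beta_f, beta_gf, var_f=1.0, var_eps=1.0):
    """Var(Y | g = j) for j = 0, 1, 2 copies of allele B (AA, AB, BB)."""
    return [(beta_f + j * beta_gf) ** 2 * var_f + var_eps for j in range(3)]


def hwe_weights(p_b):
    """Hardy-Weinberg genotype frequencies for AA, AB, BB."""
    q = 1 - p_b
    return [q * q, 2 * p_b * q, p_b * p_b]


def bartlett_ncp(n, p_b, beta_f, beta_gf, var_f=1.0):
    """Approximate expected 2-df Bartlett statistic (non-negative by Jensen)."""
    s2 = genotype_variances(beta_f, beta_gf, var_f)
    w = hwe_weights(p_b)
    pooled = sum(wj * s for wj, s in zip(w, s2))
    return n * (math.log(pooled) - sum(wj * math.log(s) for wj, s in zip(w, s2)))
```

Unlike the direct-test NCP, this quantity depends on $\beta_F$ through every genotype-specific variance.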


**Dependence of non-centrality parameter of variance homogeneity test on main effect of a factor**. The top curve on each plot shows results for interaction effect $\beta_{gF}$ = 1, the middle curve is for $\beta_{gF}$ = 0.5, and the bottom curve is for $\beta_{gF}$ = 0.1. Each subplot shows a different frequency of the interacting allele (A - 0.05, B - 0.4, C - 0.6, D - 0.95).

One can see that the non-centrality parameter grows with increasing interaction effect and minor allele frequency. The dependence on the effect of the factor is not monotonic, and there are certain optimal effects of the factor.
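The non-monotonic dependence can be seen numerically with a simple grid search. This sketch restates a large-sample approximation of the expected 2-df Bartlett statistic (genotype variances $(\beta_F + j\beta_{gF})^2\sigma_F^2 + 1$, Hardy-Weinberg weights, no small-sample correction) so that it stands alone; the approximation and the illustrative settings ($N = 1000$, $p_B = 0.05$, $\beta_{gF} = 1$, $\beta_F \in [-5, 5]$) are our assumptions, not values from the study.

```python
import math


def bartlett_ncp(n, p_b, beta_f, beta_gf, var_f=1.0):
    # genotype-specific variances, j = number of B alleles (AA, AB, BB)
    s2 = [(beta_f + j * beta_gf) ** 2 * var_f + 1.0 for j in range(3)]
    w = [(1 - p_b) ** 2, 2 * p_b * (1 - p_b), p_b ** 2]   # Hardy-Weinberg weights
    pooled = sum(wj * s for wj, s in zip(w, s2))
    return n * (math.log(pooled) - sum(wj * math.log(s) for wj, s in zip(w, s2)))


def ncp_profile(n, p_b, beta_gf, lo=-5.0, hi=5.0, steps=2001):
    """Evaluate the approximate NCP on a grid of main effects beta_F."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return [(bf, bartlett_ncp(n, p_b, bf, beta_gf)) for bf in grid]


profile = ncp_profile(n=1000, p_b=0.05, beta_gf=1.0)
best_bf, best_ncp = max(profile, key=lambda t: t[1])    # most favourable beta_F
worst_bf, worst_ncp = min(profile, key=lambda t: t[1])  # least favourable beta_F
```

In this rare-allele setting the minimum lies near $\beta_F = -0.5$, where the AA and AB variances coincide, while the NCP also decays towards zero for very large $|\beta_F|$ because all variance ratios then approach one.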

The plots of this dependence for one-degree-of-freedom tests are similar; they are shown in Additional file

It is of interest to note that the plots for complementary frequencies $p_B$ (say 0.05 and 0.95) may look like mirror images at first glance; however, this symmetry is not complete. The asymmetry between plots for complementary frequencies can be explained by taking into account that heterogeneity of variances for a case

whereas in the opposite case, when genotype

The optimal effect of the factor in the first case is given by

Similarly, in the second case,

Figure


**Dependence of power to detect interaction (left plot) with threshold corresponding to α = 0.05 and non-centrality parameter (right plot) on effect of interaction**. Thin curve on each subplot corresponds to direct test, bold curve corresponds to upper limit of variance homogeneity test. Each subplot corresponds to different frequency of interacting allele (A - 0.05, B - 0.4, C - 0.6, D - 0.95).

Such a dependence, but for thresholds corresponding to α = 5·10⁻⁸ and α = 0.01, is shown in Figure S6 of the supplementary figures.

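The table below presents the power of the variance homogeneity test under the optimal effect of the factor when the power of the direct test is 80%, for thresholds α = 0.05, 0.01 and 5·10⁻⁸.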

Power of variance homogeneity test under optimal effect of factor when power of direct test is 80%.

| Threshold α | 5% | 40% | 60% | 95% |
|---|---|---|---|---|
| 0.05 | 0.414 | 0.409 | 0.409 | 0.414 |
| 0.01 | 0.342 | 0.334 | 0.334 | 0.342 |
| 5·10⁻⁸ | 0.125 | 0.107 | 0.107 | 0.125 |

Each column presents the frequency of the interacting allele; each row presents the threshold α.
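Converting a non-centrality parameter of the 2-df variance test into power at a given threshold can be sketched with the Poisson-mixture representation of the noncentral $\chi^2$: $\chi^2_2(\lambda)$ mixes central $\chi^2_{2+2m}$ laws with Poisson($\lambda/2$) weights, and each of these has the closed-form upper tail $e^{-x/2}\sum_{i=0}^{m}(x/2)^i/i!$. The function name and the series truncation are our choices; the sketch does not reproduce the table's values, which depend on the authors' analytical expressions.

```python
import math


def power_2df(ncp, alpha, max_terms=2000):
    """P(noncentral chi2 with 2 df and given ncp > critical value at level alpha).

    Suitable for moderate ncp (roughly below 1000, where exp(-ncp/2) does not
    underflow).
    """
    x = -2.0 * math.log(alpha)         # 2-df critical value: exp(-x/2) = alpha
    half_ncp, half_x = ncp / 2.0, x / 2.0
    pois = math.exp(-half_ncp)         # Poisson(ncp/2) weight for m = 0
    term = math.exp(-half_x)           # exp(-x/2) * (x/2)^m / m!, m = 0
    central_sf = term                  # upper tail of chi2 with 2 + 2m df
    power = pois * central_sf
    for m in range(1, max_terms):
        pois *= half_ncp / m
        term *= half_x / m
        central_sf += term
        power += pois * central_sf
    return power
```

For example, a 2-df test needs an NCP of roughly 9.6 to reach 80% power at α = 0.05, versus about 7.85 for a 1-df test.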

Performance of proposed method on real data

In order to measure the performance of the proposed method using clinical data, we applied Levene's variance homogeneity test to genome-wide data for C-reactive protein (CRP), an inflammatory marker, in the Rotterdam Study.

The Rotterdam Study (RS) … P value = 4.77·10⁻⁶ corresponded to SNP rs2399332, which is located on chromosome 3.

In the work of Guillaume Paré et al … P value = 1.6·10⁻²⁹. We tested the same SNP in the Rotterdam Study and found a P value of 0.011, with a minor allele frequency of 0.385 for the risk allele "G". The trait variances (and sample sizes) for genotypes

Discussion

If a genotype interacts with some factor in the determination of a trait's value, the trait's variance is expected to increase in the group of subjects having this genotype. Thus, a test of heterogeneity of variances can be proposed as a screen for potentially interacting SNPs. In this work, we evaluated the type I error and power of variance heterogeneity analysis with respect to the detection of potentially interacting SNPs under the scenario in which the interacting variable is unknown.

Three different tests of variance homogeneity were chosen in order to investigate their type I error performance. They are Bartlett's, Bartlett's with prior rank-transformation to normality of a trait and Levene's (Brown-Forsythe) tests. Not surprisingly, our results were in agreement with what is known from standard statistical theory

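We showed that even if a large interaction effect is present, the power of the "screening" variance heterogeneity test depends strongly on the main effect of the interacting factor and may be quite limited. This result may at first seem surprising and counter-intuitive. To help understand this phenomenon, we provide here a simple example of a situation in which there is an interaction effect, but the variances for all genotypes are equal, so that the variance test has no power. Consider a binary factor $F \in \{-1, 1\}$ with effect on the trait (in accordance with our previous notation) equal to $\beta_F$, and frequency of "1" denoted as $q$. Consider also a SNP with genotypes "0" and "1" and no main effect ($\beta_g = 0$). Let us denote the effect of the genotype-by-factor interaction as $\beta_{gF}$, and let the residual variance be $\sigma^2$. For genotype "0", the expectations of the trait are $E(Y \mid g = 0, F = -1) = -\beta_F$ (when the value of the factor is -1) and $E(Y \mid g = 0, F = 1) = \beta_F$. For genotype "1", the expectations are $E(Y \mid g = 1, F = -1) = -\beta_F - \beta_{gF}$ and $E(Y \mid g = 1, F = 1) = \beta_F + \beta_{gF}$. Because a variable taking two values that differ by $d$, with probabilities $1-q$ and $q$, has variance $d^2 q(1-q)$, the conditional variances are $\mathrm{Var}(Y \mid g = 0) = 4\beta_F^2 q(1-q) + \sigma^2$ and $\mathrm{Var}(Y \mid g = 1) = 4(\beta_F + \beta_{gF})^2 q(1-q) + \sigma^2$. It is easy to see that these are equal when either $\beta_{gF} = 0$ (absence of interaction) or $\beta_F = -\beta_{gF}/2$. Taking a simple example with $\beta_F = -\beta_{gF}/2$, the conditional variances of both genotypes equal $\beta_{gF}^2 q(1-q) + \sigma^2$, and the variance test has no power despite the interaction. As $\beta_F$ deviates from $-\beta_{gF}/2$ in either direction, the conditional variances start to differ; however, their ratio tends to one again as $|\beta_F| \to \infty$, so the power of the variance test is a non-monotonic function of $\beta_F$.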
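The binary-factor argument can be checked with the two-point variance formula. In this sketch the values $\beta_{gF} = 1$, $q = 0.3$ and unit residual variance are illustrative assumptions, and `conditional_variance` is a hypothetical helper.

```python
def conditional_variance(beta_f, beta_gf, genotype, q, var_res=1.0):
    """Var(Y | g) for binary F in {-1, 1} with P(F = 1) = q and no SNP main effect.

    Given g, the trait mean is -(beta_F + g*beta_gF) or +(beta_F + g*beta_gF),
    so the two means differ by 2*half_gap and contribute (2*half_gap)**2 * q * (1 - q).
    """
    half_gap = beta_f + genotype * beta_gf
    return 4 * half_gap ** 2 * q * (1 - q) + var_res


beta_gf, q = 1.0, 0.3
beta_f = -beta_gf / 2  # the special main effect discussed in the text
v0 = conditional_variance(beta_f, beta_gf, 0, q)
v1 = conditional_variance(beta_f, beta_gf, 1, q)


def variance_ratio(bf):
    """Ratio Var(Y | g = 1) / Var(Y | g = 0) for a given main effect beta_F."""
    return conditional_variance(bf, beta_gf, 1, q) / conditional_variance(bf, beta_gf, 0, q)
```

Here `v0` and `v1` are identical despite the interaction, and `variance_ratio` is far from one at moderate $\beta_F$ (about 2.4 at $\beta_F = 1$) but close to one again for very large $\beta_F$.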

Although in this work we consider a model in which the SNP has an additive effect and follows Hardy-Weinberg equilibrium (HWE), and the interacting factor follows a normal distribution, the same principal result (non-monotonic dependence of the power of the variance test on the main effect of the interacting variable) should hold for other models and other types of interacting factor (e.g. binary, as shown above, or three-level, such as other SNPs); also, a deviation from HWE will not affect our major conclusions.

Our analysis of power was performed using Bartlett's test. Bartlett's test has the highest power when the trait is normally distributed, but it is not robust to non-normality of the trait distribution. Levene's test performs better under deviations from normality, but has lower power than Bartlett's test. Therefore, our principal findings do not depend on whether Bartlett's or Levene's test is used: the particular figures provided estimate the maximal power, but the relation of the power to the underlying model parameters will be the same for both tests.

We considered testing for heterogeneity of variances as a screening tool for potentially interacting SNPs in the context of population-based design. It has been proposed that this testing can be more effectively done in the context of monozygotic twins or migrant studies

Thus, for a wide range of designs, models and tests used, we can conclude that the absence of significant heterogeneity of variances cannot be interpreted as the absence of a strong interaction, because the power of the variance test depends strongly on the main effect of the (unobserved) interacting factor.

It is interesting to consider whether the presence of significant variance heterogeneity tells us that a SNP indeed interacts with some factor. First, variance heterogeneity will be detected for a SNP having a main effect when the distribution of the trait is heteroscedastic, i.e. when the variance increases with the mean, a situation rather common in biology. This suggests that a test for heteroscedasticity should be performed before running the variance heterogeneity test as an "interaction screening" test. Another, biological, possibility is that a genotype indeed affects the variance of the trait without any specific interaction. We can speculate that there may be genotypes that affect the stability of development or homeostasis, leading to a wider trait variance.
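A small worked example of the heteroscedasticity caveat, under an illustrative assumption of ours: if the trait is log-normal, $\log Y = \beta_g g + e$ with $e \sim N(0, \sigma^2)$, then $\mathrm{Var}(Y \mid g) = (e^{\sigma^2} - 1)\,e^{2\beta_g g + \sigma^2}$. A pure main effect, with no interaction at all, therefore multiplies the variance by $e^{2\beta_g}$ per allele.

```python
import math


def lognormal_genotype_variances(beta_g, sigma2=1.0):
    """Var(Y | g) for g = 0, 1, 2 when log Y = beta_g * g + e, e ~ N(0, sigma2).

    There is no interaction here: the variance differs between genotypes only
    because the log-normal variance scales with the squared mean.
    """
    return [(math.exp(sigma2) - 1) * math.exp(2 * beta_g * g + sigma2) for g in range(3)]


variances = lognormal_genotype_variances(beta_g=0.2)
ratio_ab_aa = variances[1] / variances[0]   # equals exp(2 * 0.2)
```

On the log scale the three genotype variances are equal, which is one reason to check for a mean-variance relationship before reading variance heterogeneity as a sign of interaction.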

Detection of variance heterogeneity for a given SNP does not necessarily indicate that a single factor is interacting with the studied SNP. Moreover, it may suggest the presence of a complex network with many other SNPs and factors involved. The variance heterogeneity test may be especially effective in detecting such SNPs: in the case of multiple interacting factors, it is very unlikely that the cumulative effect of the interacting factors will fall at the point at which the power of the variance test is minimal.

Further dissection of the SNPs demonstrating strong heterogeneity of variances may be a challenging task, requiring a search for the interactors through phenomic screening. Whether an identified interactor does explain the heterogeneity of variances can be tested straightforwardly by applying the variance homogeneity test to the residuals from the regression involving the identified factor.
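A hedged sketch of that residual check on simulated data. We assume that "the regression involving the identified factor" includes genotype, factor and genotype-by-factor terms, which in this toy setting is equivalent to regressing the trait on the factor separately within each genotype (the hypothetical helper `residuals_within_genotype`). All parameter values are illustrative.

```python
import random
from statistics import median


def bf_statistic(y, g):
    """Brown-Forsythe (median-centred Levene) statistic W."""
    groups = {}
    for yi, gi in zip(y, g):
        groups.setdefault(gi, []).append(yi)
    z = []
    for vals in groups.values():
        med = median(vals)
        z.append([abs(v - med) for v in vals])
    k, n = len(z), sum(len(zj) for zj in z)
    means = [sum(zj) / len(zj) for zj in z]
    grand = sum(sum(zj) for zj in z) / n
    between = sum(len(zj) * (m - grand) ** 2 for zj, m in zip(z, means))
    within = sum((v - m) ** 2 for zj, m in zip(z, means) for v in zj)
    return (n - k) / (k - 1) * between / within


def residuals_within_genotype(y, g, f):
    """Residuals of Y on F fitted by least squares separately in each genotype."""
    res = [0.0] * len(y)
    for geno in set(g):
        idx = [i for i, gi in enumerate(g) if gi == geno]
        fm = sum(f[i] for i in idx) / len(idx)
        ym = sum(y[i] for i in idx) / len(idx)
        sxx = sum((f[i] - fm) ** 2 for i in idx)
        sxy = sum((f[i] - fm) * (y[i] - ym) for i in idx)
        slope = sxy / sxx
        for i in idx:
            res[i] = y[i] - ym - slope * (f[i] - fm)
    return res


rng = random.Random(2010)
n, p_b, beta_f, beta_gf = 3000, 0.3, 0.5, 1.0
g = [int(rng.random() < p_b) + int(rng.random() < p_b) for _ in range(n)]
f = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [beta_f * fi + beta_gf * gi * fi + rng.gauss(0.0, 1.0) for gi, fi in zip(g, f)]

w_raw = bf_statistic(y, g)                                     # trait itself
w_res = bf_statistic(residuals_within_genotype(y, g, f), g)    # after adjustment
```

If the identified factor accounts for the heterogeneity, the statistic computed on the residuals should fall back to values compatible with the null.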

A number of genetic interaction models may lead to variance heterogeneity. These include straightforward interaction models, as discussed above, in which an environmental or other genetic factor changes the expectation of the trait value in concert with the SNP studied. Another interesting model, leading to a specific increase of the variance of the heterozygous genotype, is the parent-of-origin model, in which the expectation of the trait in heterozygous individuals (…

We showed that when one interacting factor is considered, the power of the direct test, which exploits knowledge of the interacting factor, is always greater than the power of the variance heterogeneity test. An interesting scenario in which the power of the variance heterogeneity test may be greater than that of the direct test occurs when multiple interacting factors induce variance heterogeneity. In that case, the power to identify any single one of them (or all together) may be lower than the power of the variance heterogeneity test, owing to the small effects associated with each particular interacting factor and to the increased number of degrees of freedom.

In current GWAS, the association between a SNP and a trait is studied by detecting differences between the mean values of the genotypes of a given SNP. We conclude that screening for differences in variances is a promising approach, as a number of biologically interesting models may lead to heterogeneity of variances. However, it should be clearly kept in mind that the absence of variance heterogeneity for a SNP cannot be interpreted as the absence of involvement of the SNP in the interaction network, while the presence of significant heterogeneity may be explained not only by plain interaction with some factor, but also by other biological mechanisms and statistical artifacts.

Conclusion

The method has been proposed for the genome-wide search for interactions between a SNP and a factor. The method is based on testing the homogeneity of variances of the trait distributions across genotypes, without any knowledge of the factor. We investigated the type I error and power of three variance homogeneity tests (Bartlett's, Bartlett's with prior rank transformation of the trait to normality, and Levene's). Under variation of model parameters and distribution of residual errors, only Levene's test kept an acceptable type I error. We obtained analytical expressions for the power of the direct test and of the variance homogeneity test to detect interaction. We also showed that the variance homogeneity test has lower power than the direct test under any model parameters when a single interacting variable is considered. As opposed to the direct test, the power of the variance homogeneity test depends on the main effect of the factor. This dependence is non-monotonic: for a given interaction effect, it has its own maxima and minima. By replicating the results of a previous study

Authors' contributions

MS planned and carried out the simulation study, obtained analytical expressions, wrote the manuscript, and analyzed the data. AD and JW provided data for the real example. CvD participated in planning and discussion of the study. YA planned the simulation study, obtained analytical expressions and wrote the manuscript. All authors read and approved the final manuscript.

Additional Files

Additional file … χ² distribution with different degrees of freedom). The data are presented for a model in which the SNP effect explains 0%, 1%, or 5% of the total trait variance.

Acknowledgements

The authors would like to thank Prof. Tatiana Axenovich, Prof. David Balding, Ayse Demirkan, Prof. Paul Eilers, Prof. Ben Oostra, Dr. Samuli Ripatti, Natalia V Rivera for their help in conducting this work. This work was supported by grants from Center for Medical Systems Biology (CMSB), Netherlands Genomics Initiative (NGI), and Netherlands Organization for Scientific Research (NWO). Genome-wide genotyping of the Rotterdam Study was supported by NWO (175.010.2005.011).