Genetics Dept., Rosetta Inpharmatics, LLC, wholly owned subsidiary of Merck & Co., Inc., 401 Terry Avenue North, Seattle Washington, 98109, USA

Abstract

Background

There has been intense effort over the past couple of decades to identify loci underlying quantitative traits as a key step in the process of elucidating the etiology of complex diseases. Recently there has been some effort to coalesce non-biased high-throughput data, e.g. high density genotyping and genome wide RNA expression, to drive understanding of the molecular basis of disease. However, a stumbling block has been the difficult question of how to leverage this information to identify molecular mechanisms that explain quantitative trait loci (QTL). We have developed a formal statistical hypothesis test, resulting in a p-value, to quantify uncertainty in a causal inference pertaining to a measured factor, e.g. a molecular species, which potentially mediates a known causal association between a locus and a quantitative trait.

Results

We treat the causal inference as a 'chain' of mathematical conditions that must be satisfied to conclude that the potential mediator is causal for the trait, where the inference is only as good as the weakest link in the chain. P-values are computed for the component conditions, which include tests of linkage and conditional independence. The Intersection-Union Test, in which a series of statistical tests are combined to form an omnibus test, is then employed to generate the overall test result. Using computer simulated mouse crosses, we show that type I error is low under a variety of conditions that include hidden variables and reactive pathways. We show that power under a simple causal model is comparable to other model selection techniques as well as Bayesian network reconstruction methods. We further show empirically that this method compares favorably to Bayesian network reconstruction methods for reconstructing transcriptional regulatory networks in yeast, recovering 7 out of 8 experimentally validated regulators.

Conclusion

Here we propose a novel statistical framework in which existing notions of causal mediation are formalized into a hypothesis test, thus providing a standard quantitative measure of uncertainty in the form of a p-value. The method is theoretically and computationally accessible and with the provided software may prove a useful tool in disentangling molecular relationships.

Background

It has become increasingly appreciated in recent years that empirical evidence of causal links between genotype and multiple quantitative traits such as transcript abundances and clinical phenotypes can provide information on causal relationships between those quantitative traits

Causal effect estimates often considered in 'Mendelian randomization' approaches

It's hard to overstate the importance of this problem as emerging technology and increased understanding of molecular biology allows cost-effective high-throughput quantification of multiple classes of molecular traits such as gene expression, CNV, miRNA, metabolite levels, splice variants, methylation, and protein expression. Designing and orchestrating classical studies to learn about causal pathways by perturbing individual factors would undoubtedly prove to be glacially slow given the extremely high dimensional nature of the problem.

Here we propose a statistical test to infer causal status for a potential mediator between a locus and a quantitative trait based on a set of mathematical conditions, thus providing a causal call and a quantitative measure of uncertainty in the causal inference in the form of a formal p-value. By 'causal', we mean that variance in the mediator determines some proportion of variance in the trait, even if that proportion is small. While the proposed approach does rely on some distributional and linear assumptions, generalizations to nonparametric and nonlinear models are straightforward.

Results

Conditions for causality

A genotype marker at a specific locus is denoted by L, transcript abundance for a specific transcript by G, and a measured clinical trait by T. Schadt

Causal inference test (CIT)

Chen

Component statistical tests

Each mathematical condition is assessed with a corresponding statistical test. Biallelic markers were considered here and fully modeled using two indicator covariates, L_{1 }and L_{2}, for one and two variant alleles, respectively, in a co-dominant coding. The four conditions are tested in the parameters of the following three linear regression models,

where the _{ji }represent independent random noise with variance

For conditions 1 through 3, we use standard F tests for linear model coefficients, where the null hypotheses correspond to each set of parameters equaling zero. The F tests are conditional on the remaining terms in the model, that is, partial F tests are conducted for tests of parameters in equations (2) and (3). For the forth condition, the alternative hypothesis is {_{7}, _{8}} = 0 (conditional on the _{6 }term), which is an equivalence testing problem, that is, a non-significant test of the null that {_{7}, _{8}} = 0 does not correspond to statistically significant equivalence. Generally, in setting up an equivalence test, it is necessary to specify critical bounds that define a region in which the parameter is sufficiently close to the target value as to make any difference practically irrelevant. Chen et al.

More specifically, the null distribution is estimated using a bootstrap type approach that consists of the following steps: 1) A random variable _{1}_{1 }+ _{2}_{2 }+

are fitted and a single realization of the test statistic,

where _{1 }and _{2 }denote the degrees of freedom. However, conditional on the observed T, the distribution has less spread than the unconditional distribution, so a standard

where ^{2}, from the empirical distribution of

That is, the transformed observed statistic

Estimated distributions of the test statistic,

**Estimated distributions of the test statistic, Z*, under the null for the equivalence test of conditional independence between the locus and the trait**. An additive effect for a single biallelic locus under a simple independence model was simulated for both the gene and the trait under normally distributed errors. Minor allele frequency = .2. Sample size = 1000. B = 500.

Negative log 10 p-values for the semi-parametric and non-parametric versions of the CIT applied to 10,000 replicate data sets simulated under the independence model

**Negative log 10 p-values for the semi-parametric and non-parametric versions of the CIT applied to 10,000 replicate data sets simulated under the independence model**. For each replicate of 100 observations the genetic variance for the gene and the trait were each randomly sampled from a uniform distribution ranging from 8 to 32 percent. The gene and trait were normally distributed and a biallelic locus was simulated with allele frequency of .5.

Omnibus test

The CIT can be thought of as testing the strength of a chain of mathematical conditions that as a set are consistent with causal mediation. Thus, the strength of the chain is only as strong as the weakest link, and correspondingly, the rejection region of the omnibus test is the intersection of rejection regions of the component tests. Cassela and Berger _{γ }and with rejection region _{γ}, then the 'intersection-union' test with rejection region ∩_{γ }is a level sup(_{γ}) test. Thus, the p-value for the CIT corresponds to the p-value for an intersection-union test, which is the supremum of the four p-values for the component tests.

A typical experiment where one might attempt to quantify causal associations between genes and clinical traits is an F2 cross setting where clinical traits, high density genotyping and genome wide RNA expression are collected. Our fundamental goal is to link all of this information in a meaningful way such that we can predict the genes that drive the traits of interest. More specifically, our goal here was to construct a robust statistical test to assess the hypothesis that a potential mediator between an initial randomized variable and an outcome variable is causal for that outcome. To achieve this objective we first identified a suite of mathematical conditions that could serve as a working mathematical definition of causality, could be formally statistically tested, and could support currently held views on minimum requirements necessary for causal inference. Second, we choose a method to compute a p-value for each condition. And third, we used the Intersection-Union Test framework

Schematic diagram of the CIT

**Schematic diagram of the CIT**. A) Study subjects are sampled for i) a trait of interest, T (for example, cholesterol or fat mass), ii) a potential mediating factor, G (for example, an mRNA or protein concentration), and iii) genotype at a polymorphic locus, L, that is thought to affect both G and T. B) The four component tests of the CIT are conducted yielding four corresponding p-values. (Plots are shown as a conceptual device, see text for details of the actual tests.) Associations in 1 through 3 but not 4 are consistent with causal mediation. C) The largest of the four p-values becomes the omnibus p-value, the final result of the CIT.

CIT algorithm testing

Performance of the CIT was evaluated against three model selection approaches using simulated data from an F2 mouse cross measured on a genome wide high density panel of 2310 SNPs. The genotype data were simulated using RQTL software

Model selection approaches for comparison

CC

The first model selection method, denoted by CC, is based on the conditional correlation approach _{4}, _{5}}, equation 2, is less than .05 and the p-value for the partial F for {_{7}, _{8}}, equation 3, is greater than .05, then the causal model is selected; if the converse is true, the reactive model is selected; if both p-values are below .05, the independence model is selected; and if both p-values are above .05 then no call is made.

AIC

The second approach, which depends on AIC values computed for regression models corresponding to the independence, reactive, and causal models, implemented by Chen et al.

BNC

The third method, denoted by BNC, employs the Bayesian Score, which is estimated for three networks, each including exactly two arcs, corresponding to the independent, the reactive, and the causal models (Figures

One thousand replicate crosses were simulated under each of six causality scenarios using equations 1–3 (Figure

Four causal inference strategies, CC, CIT, BNC, and AIC were applied to simulated data under five distinct causal models, A-E, shown above

**Four causal inference strategies, CC, CIT, BNC, and AIC were applied to simulated data under five distinct causal models, A-E, shown above**. Here a genotype marker at a specific locus is denoted by L, a gene corresponding to measured transcript abundance is denoted by G, and a measured clinical trait is denoted by T. H denotes an unmeasured molecular trait.

Marginal effect sizes (sample R^{2 }values) for all six causality scenarios

**Marginal effect sizes (sample R ^{2 }values) for all six causality scenarios**. Causal models are, causal (C), reactive (R), independent (I), hidden variable affecting both traits (H), and no associations between genotypes and traits (Null). R

Causal scenarios

A) Null. Trait data were sampled from a normal distribution and were completely independent of genotype data.

B) Independent. Data were simulated according to equations 2 and 3, with {_{3}, _{6}} fixed at zero.

C) Independent/hidden variable. This scenario is like B but with the inclusion of an additional normal covariate term to simulate a hidden factor.

D) Causal. Data were simulated according to equations 2 and 3, with {_{3}, _{7}, _{8}} fixed at zero. Thus, the association between L and T is entirely mediated by G.

E) Causal/independent. This scenario is like D but only _{3 }is fixed at zero. Parameters {_{7}, _{8}} were randomly assigned small positive or negative non-zero values.

F) Causal/hidden. This scenario is like D but with the inclusion of an additional normal covariate term to simulate a hidden variable.

Simulation results (Figure

Type I error and power comparison between causality methods derived from computer simulated F2 mouse crosses

**Type I error and power comparison between causality methods derived from computer simulated F2 mouse crosses**. For each autosome of each replicate cross of N = 1000 total crosses, a clinical trait and potential mediating trait were simulated under a variety of true causal scenarios. For each scenario, a wide range of positive and negative effect sizes were randomly selected for each chromosome of each cross. 'Neighbors' denote chromosome-specific QTL peak pairs. Causal models are, causal (C), reactive (R), independent (I), hidden variable affecting both traits (H), and no associations between genotypes and traits (Null).

The CC and AIC methods performed similarly under the null scenario, both yielding approximately 7 percent of false causal and reactive calls. While slightly elevated, this is closer to the target 0.05 type I error rate. The CIT clearly demonstrated the lowest type I error under the null scenario, yielding zero false positives. This result highlights one of the positive attributes of the CIT, the formal integration of diverse criteria into the final causal inference. Thus, the CIT is robust to the choice of prior filters applied to the data. Unlike more standard statistical tests that are designed to function under a single null distribution, the CIT must be robust to an array of null conditions that include the null, independent, and independent-hidden, scenarios. It is understandable then, that depending on conditions the CIT may over or undershoot the target type I error.

Under the simple independence model, type I error for the CIT and the other methods is low, and all four methods are conservative under these conditions. The CIT and AIC are significantly more conservative under this scenario than the CC and BNC methods, with about 3 percent fewer false positives. All methods have more false causal calls than false reactive calls, an indication that a potential mediator is more likely to be called 'causal' if the eQTL is more significant than the cQTL.

The independent-hidden scenario revealed clearly challenging conditions, with all four methods surpassing the .05 type I error target by a significant amount. The CIT had the lowest type I error, 15 percent, which was less than half that of the AIC, 41 percent, and the BNC, 48 percent. The CC fell between these two groups, 21 percent. As discussed above, type I error for all methods could be improved by applying various filtering criteria, however, it is not trivial to determine the global optimal thresholds.

Qualitatively similar differences in power between methods were observed for the causal, causal-independent, and causal-hidden scenarios. The CC had more power than the CIT, and the AIC and BNC had more power than the CC. The performances of the AIC and BNC were almost identical under these conditions. The largest spread in power was observed for the causal-independent scenario, which resulted in approximately 69 percent power achieved for AIC and BNC methods compared with 29 percent for the CIT and 38 percent for the CC. This spread reflects the challenge for the CC and CIT methods of satisfying the conditional independence criterion.

In terms of sensitivity and specificity, we observed that the methods with higher sensitivity had lower specificity and vice versa. There is a trade-off between sensitivity and specificity, and it is often possible to adjust the mix using tuning parameters or filtering devices. If type I error is approximately adjusted by filtering out QTL with p-values greater than 0.001 and requiring a bootstrap consistency of greater than .7 for the AIC and BNC methods, then power is similar for the CIT, AIC, and BNC methods for the causal and causal-hidden scenarios. Under the causal-independent scenario, the AIC and BNC method are more powerful than the CC and CIT methods (Figure

Type I error and power comparison between causality methods derived from computer simulated F2 mouse crosses

Thus, by creating a model selection method from the CIT, we were able to compare it to existing methods and demonstrate the conservative character of the test under most conditions without the need for heuristic filtering procedures. The intent here was not to supplant existing causal inference model selection methods with the CIT but rather to provide a complimentary statistic to quantify uncertainty in the causal call.

Application – predictive power of the causal network

We used the CIT to construct a causal transcriptional regulatory network for previously published genotypic and expression data from yeast by defining a directed edge between two genes if one of the genes tested significantly as a causal mediator for the other but not vice versa. One objective way to assess the relative predictive power of several networks is to use the networks to infer the causal regulators for previously identified eQTL hot spots

Schematic of an eQTL hotspot, a locus identified to affect transcript abundances for many genes

**Schematic of an eQTL hotspot, a locus identified to affect transcript abundances for many genes**. Directly affected are genes in cis, some of which, the 'cis regulators', propagate the 'perturbation' to other genes. Among cis regulated genes are the putative cis regulators identified by BN.full (yellow) as well as targeted in vivo experimentation (red).

The yeast cross dataset (see Brem et al. for full methods

CIT reconstructed causal transcriptional regulatory network

**CIT reconstructed causal transcriptional regulatory network**. Yellow circles indicate putative hotspot regulators from Table 1, and red circles indicate those that have been experimentally validated.

We compared the CIT to the CC, and Bayesian networks

We assessed the predictive power of causal networks to identify the known causal regulators for the 13 eQTL hot spots. Table

P-values for overlap of putative transcriptional regulators of eQTL hotspots identified in yeast

Bayesian

Bayesian

Hot spot

Gene Symbol

Gene Chrom.

Full

Network

CIT

CC

Chr 2 560000

SUP45

2

0

0

2.04E-99

Chr 2 560000

ARA1

2

0

0

0

Chr 2 560000

TBS1

2

0

0

0

Chr 2 560000

CNS1

2

0

0

0

Chr 2 560000

AMN1

2

7.74E-73

1.52E-99

0

0

Chr 2 560000

CSH1

2

0

7.16E-21

7.46E-106

Chr 2 560000

TOS1

2

0

8.12E-287

0

Chr 3 1e+05

ILV6

3

1.57E-182

0

6.01E-147

Chr 3 1e+05

NFS1

3

2.04E-40

0

0

Chr 3 1e+05

LEU2

3

0

0

0

0

Chr 3 1e+05

CIT2

3

7.67E-43

0

0

Chr 3 1e+05

MATALPHA1

3

4.66E-63

9.18E-91

0

0

Chr 5 130000

URA3

5

2.04E-145

1.01E-157

3.74E-49

Chr 8 130000

GPA1

8

0

9.00E-307

1.69E-11

6.41E-07

Chr 12 680000

HAP1

12

0

0

3.63E-45

4.43E-83

Chr 12 107000

YRF1-4

12

0

0

Chr 12 107000

YRF1-5

12

0

0

Chr 14 503000

MSK1

14

1.39E-08

9.65E-09

2.96E-108

1.46E-141

Chr 14 503000

SAL1

14

0

0

0

Chr 14 503000

TOP2

14

1.51E-30

2.74E-73

5.56E-169

1.73E-174

Chr 15 180000

PHM7

15

0

4.48E-289

0

0

Discussion

The CIT is the first method that we know of for computing a p-value for a potential causal mediator. By quantifying the evidence of causal mediation status it provides a tool for 1) ranking a large number of potential mediators, 2) communicating the strength of evidence to other researchers as well as the greater scientific community, and 3) making go-no-go decisions regarding future research. A user-friendly R script, implementing the CIT as a function (see Additional file

**CIT R script**. This file includes a script to implement the CIT as a function in R.

Click here for file

We have shown that by using the CIT as a model selection method, the results are conservative yet comparable to other diverse model selection approaches when the type I error rate is controlled. We have also shown that type I error for the CIT is well controlled under the independence model and lower than other methods when a hidden variable affects both the gene and the trait. The introduction of hidden variables is challenging for most statistical techniques, however, in the case of transcriptional regulatory networks, this issue is particularly important in view of known and possibly unmeasured molecular species that can affect transcript abundance. We have shown that the CIT is robust to a variety of conditions without the need for additional heuristic filters.

When eQTL and cQTL peaks are close, but not occurring at precisely the same marker, the investigator is faced with the problem of choosing a test marker between those peaks, inclusive. We have decided that evidence and logic favors the trait peak marker. Schadt et al.

It has been demonstrated that additional information such as TFBS and protein-protein interaction data can increase the predictive power of the network

There are several possible reasons for the strong performance of the CC and CIT relative to the BN. First, the BN is optimized based on a global objective function yet all methods are evaluated on a local level. It is still possible that the BN out-performed the other methods on a global scale that was not evaluated (due to the difficulty of such an evaluation). Secondly, the BN approach requires a limit to the number of parents each node can have (five), as well as the requirement of no feedback loops, which are not required for the other methods. Third, it is possible that an adjustment of priors could improve the performance of the BN approach.

The main caveat to all causal inference approaches considered here, is that there are limitations in our ability to make causal inferences for any variable that is not experimentally perturbed. For instance, a potential mediator that tests positive may be acting as a surrogate for a tightly linked unmeasured variable that is truly causal. However, even in such a case the evidence of causality may help direct the research to the true unmeasured factor.

We emphasized a semi-parametric approach here, which is useful for high throughput data where computational resources are often limiting, but the approach does require the reliance on some distributional assumptions. However, each component p-value of the CIT could be computed using permutation tests, which would yield a completely non-parametric approach. Alternatively, the probit transformation could be used for the gene and the trait to approximate normality, which is a valid approach under large-sample conditions

It's important to note that this is a flexible framework that can be generalized to handle complex eQTL methods that require the use of covariates and complex data structures

Conclusion

As microarray gene expression profiling and other high-throughput technologies for measuring intracellular molecular traits become more commonly used, there is increasing demand for statistical tools that distinguish between competing causal models such as pleiotropic (independence) and reactive transcriptional control. The CIT is unique in that it provides a highly interpretable quantitative measure of uncertainty in the form of a formal p-value that can be computed for a trio of variables when association between two implies causation and the third is a potential mediator. The CIT can be conducted as building block for a network reconstruction problem or as an isolated test apart from a larger network.

Authors' contributions

JM designed and implemented the CIT, performed the computer simulations, and wrote the manuscript. BZ performed all statistical analyses for the Application section, wrote the Application section, and contributed to the concept and editing of the manuscript. EES and JZ contributed to the concept, writing, and editing of the manuscript. All authors read and approved the final manuscript.

Acknowledgements

We would like to thank Manikandan Narayanan, Jeff Sachs, and David Nickle for their careful evaluation of this work and their insightful comments.