Assessing statistical significance in causal graphs

Chindelevitch, Leonid; Loh, Po-Ru; Enayetallah, Ahmed; Berger, Bonnie; Ziemek, Daniel

doi:10.1186/1471-2105-13-35

Methodology article
Open access
Published: 20 February 2012

BMC Bioinformatics volume 13, Article number: 35 (2012)

Abstract

Background

Causal graphs are an increasingly popular tool for the analysis of biological datasets. In particular, signed causal graphs--directed graphs whose edges additionally have a sign denoting upregulation or downregulation--can be used to model regulatory networks within a cell. Such models allow prediction of downstream effects of regulation of biological entities; conversely, they also enable inference of causative agents behind observed expression changes. However, due to their complex nature, signed causal graph models present special challenges with respect to assessing statistical significance. In this paper we frame and solve two fundamental computational problems that arise in practice when computing appropriate null distributions for hypothesis testing.

Results

First, we show how to compute a p-value for agreement between observed and model-predicted classifications of gene transcripts as upregulated, downregulated, or neither. Specifically, how likely are the classifications to agree to the same extent under the null distribution of the observed classification being randomized? This problem, which we call "Ternary Dot Product Distribution" owing to its mathematical form, can be viewed as a generalization of Fisher's exact test to ternary variables. We present two computationally efficient algorithms for computing the Ternary Dot Product Distribution and investigate its combinatorial structure analytically and numerically to establish computational complexity bounds.

Second, we develop an algorithm for efficiently performing random sampling of causal graphs. This enables p-value computation under a different, equally important null distribution obtained by randomizing the graph topology but keeping fixed its basic structure: connectedness and the positive and negative in- and out-degrees of each vertex. We provide an algorithm for sampling a graph from this distribution uniformly at random. We also highlight theoretical challenges unique to signed causal graphs; previous work on graph randomization has studied undirected graphs and directed but unsigned graphs.

Conclusion

We present algorithmic solutions to two statistical significance questions necessary to apply the causal graph methodology, a powerful tool for biological network analysis. The algorithms we present are both fast and provably correct. Our work may be of independent interest in non-biological contexts as well, as it generalizes mathematical results that have been studied extensively in other fields.

Background

Causal graphs are a convenient representation of causal relationships between variables in a complex system: variables are represented by nodes in the graph and relationships by directed edges. In many applications the edges are also signed, with the sign indicating whether a change in the causal variable positively or negatively affects the second variable. Causal graphs can serve as predictive models, and conclusions can be drawn from comparing the models' predictions to experimental measurements of these variables. Pollard et al. [1] pioneered the use of large-scale causal graphs to interpret gene expression data and the approach has been used successfully in several contexts [2–4]. We present our own causal reasoning approach in our companion paper [5]; here we give a brief overview.

Published research in biology provides a wealth of regulatory relationships within the cell that we mine to produce a causal network. The edges in this network are directed (by the flow of causality among the corresponding variables) and signed (by the sign of the correlation between the variables). Directed paths within the network thus predict putative upregulation and downregulation that would be effected downstream by changes in the level of a given entity (i.e., vertex in the graph). Our companion paper [5] shows that this reasoning can be applied to the inverse problem: given data from a gene expression assay, our causal network enables us to infer potential upstream causes for the measured gene expression changes. The key output of the method is a list of upstream hypotheses that explain a large fraction of the observed changes in a statistically significant manner. As hypotheses are based on existing literature, they are easily interpretable by biological experts and can provide building blocks for a more comprehensive understanding of causal drivers of the processes under consideration. Figure 1 provides a schematic of the approach.

In this paper, we study the problem of evaluating statistical significance of the conclusions drawn from a causal graph-based model given a particular gene expression dataset. To form a null distribution, either the correspondence between gene transcripts and experimental expression values or the connectivity of the graph can be randomized. Thus, the statistical significance question splits into two subproblems. First, how likely is it for the same level of agreement between predicted and observed regulation to be achieved when the classification of gene transcripts (as upregulated, downregulated, or neither) is randomly drawn from a family of all classifications with similar characteristics? Second, how likely is it to occur when the causal graph is randomly drawn from a family of all causal graphs with similar characteristics?

Answering the first question amounts to computing the distribution of the dot product of two vectors with components in {-1, 0, 1}, each drawn randomly from the family containing all such vectors with a fixed number of components of each value. This problem, which we call Ternary Dot Product Distribution, generalizes Fisher's exact test [6] to ternary variables and we thus believe it is of independent interest. Fisher's exact test is ubiquitously used in gene set enrichment analysis and many other areas of computational biology [7]. This test is appropriate to assess statistical significance of enrichment in many settings but neglects the sign of differential regulation. In many cases, the sign of the regulation is available and could be harnessed to obtain additional insights. One example where our proposed extension is directly applicable is as an alternative scoring mechanism for the well-known Connectivity Map approach [8].

Answering the second statistical significance question analytically does not appear to be possible, but the desired likelihood may be approximated by sampling uniformly at random from the family of all causal graphs with the same basic structure as the original causal graph: namely, the same positive and negative in- and out-degrees of each vertex. Because of the structure of the problem, even drawing one causal graph from this family is challenging. We call this the Causal Graph Randomization problem. Previous work on the problem of graph randomization has focused on undirected graphs [9–11]; the context of directed graphs is less well-studied theoretically [12–17] despite finding many uses in bioinformatics [18–20].

The rest of this paper is organized as follows. We begin by describing the regulatory network model based on causal graphs and discuss the way conclusions are drawn from it and the importance and subtleties of computing their statistical significance. We then describe the Ternary Dot Product Distribution problem and present two efficient algorithms to solve it: an algorithm with complexity cubic in the number of variables (i.e., vertices) in the graph but requiring computation in exact arithmetic, and an algorithm with a weaker complexity guarantee but numerically stable and efficient in practice. Finally, we discuss the challenges of the Causal Graph Randomization problem and present a practical algorithm for it using local graph operations, and conclude by describing future work.

Model Description

The two fundamental properties of causal relationships between biological entities are (1) the direction of causality between them; and (2) the qualitative response (i.e., upregulation or downregulation) of the second entity when the first one is upregulated or downregulated. This information can be encapsulated in a signed directed graph G = (V, E) whose nodes V are genes, transcripts, compounds, or biological processes, and where a directed edge from node a to node b means that the abundance or activity of b is regulated by the abundance of a. The edge (a, b) is labeled with a "+" sign if the regulation is positive (i.e., an increase in a leads to an increase in b), and it is labeled with a "-" sign if the regulation is negative. We call G a causal graph.

For any two nodes a and z not necessarily connected by an edge, the causal graph G models the effects of a change in the abundance of a on the abundance of z by tracing the shortest directed path from a to z in G and then evaluating its sign, given by the product of the signs of the edges along the path. If this overall sign turns out to be a plus sign, it is expected that a upregulates z, and if it is a minus sign, that a downregulates z [1].
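The path-based prediction rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the adjacency format, the function name `predicted_sign`, and the toy graph are hypothetical, and the sketch assumes that when several shortest paths exist, the first one found by breadth-first search is used (the text does not specify a tie-breaking rule).

```python
from collections import deque

def predicted_sign(edges, source, target):
    """Sign (+1 or -1) of a shortest directed path from source to target.

    edges maps each node to a list of (neighbor, sign) pairs, sign in {+1, -1}.
    Returns 0 if target is unreachable from source.
    Assumption: ties between shortest paths are broken by BFS visiting order.
    """
    sign_of = {source: 1}          # product of edge signs along the BFS path
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return sign_of[node]
        for neighbor, sign in edges.get(node, []):
            if neighbor not in sign_of:
                sign_of[neighbor] = sign_of[node] * sign
                queue.append(neighbor)
    return 0

# Hypothetical toy graph: a -(+)-> b -(-)-> c and a -(-)-> d
toy = {"a": [("b", +1), ("d", -1)], "b": [("c", -1)]}
print(predicted_sign(toy, "a", "c"))  # -1: a is predicted to downregulate c
```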

Hypothesis scoring

Given a gene expression dataset, we may classify gene transcripts into three families: significantly upregulated, significantly downregulated, and not significantly regulated. We refer to this classification as the experimental classification. We wish to understand what perturbations may have led to these observations.

Given a particular entity v ∈ V in our causal graph, we can examine the predicted effects of upregulating or downregulating it. We call v together with the direction of perturbation a hypothesis. This hypothesis also classifies the gene transcript nodes in the graph into three families: those predicted to be upregulated by the perturbation of v, those predicted to be downregulated by the perturbation of v, and those not predicted to be regulated by v. We refer to this classification as the predicted classification.

In order to evaluate the goodness-of-fit of a particular hypothesis to the observed gene expression dataset, we declare a prediction to be correct if the predicted sign matches the experimental sign and the regulation was significant: i.e., both signs are + or both are -. In case of a mismatch (a + and a -), we declare the prediction to be incorrect. In all other cases, we declare the prediction to be ambiguous. We may now score a hypothesis by awarding 1 point for each correct prediction, -1 for each incorrect prediction, and 0 for each ambiguous prediction.
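Encoding upregulated, downregulated, and unregulated (or unpredicted) transcripts as 1, -1, and 0, the scoring rule above is simply a dot product of two ternary vectors. The following sketch illustrates this; the function name `score` and the example vectors are hypothetical.

```python
def score(predicted, observed):
    """Dot product of two ternary vectors with entries in {-1, 0, 1}.

    Correct predictions contribute +1, incorrect ones -1, ambiguous ones 0.
    """
    if len(predicted) != len(observed):
        raise ValueError("classifications must cover the same transcripts")
    return sum(p * o for p, o in zip(predicted, observed))

# Hypothetical example with five transcripts:
# positions 0 and 2 are correct, position 1 is incorrect, 3 and 4 are ambiguous.
predicted = [1, 1, -1, 0, -1]
observed = [1, -1, -1, 1, 0]
print(score(predicted, observed))  # 1
```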

Statistical significance

The scores computed for each putative hypothesis provide us with an overall ranking of all hypotheses. However, a good score does not necessarily imply good explanatory power, because of possible connectivity differences between the transcript nodes of G. In particular, "hubs" with high degree are more likely to have higher scores regardless of which genes are experimentally observed to be significantly regulated. Therefore, we also need to look at the statistical significance of each score when the gene expression data is randomized, preserving the number of upregulated and downregulated gene transcript nodes, but not the nodes themselves.

In addition, we need to understand how significant the rank of a hypothesis is with respect to another null model, in which the gene expression data remains fixed but the causal graph is allowed to vary, only keeping basic connectivity properties. More specifically, we examine the rank of a hypothesis of interest in the family of graphs with the same sequence of positive and negative in-degrees and out-degrees as G, but randomly connected otherwise. If these degrees rather than the full structure of G suffice to give a hypothesis of interest a good rank, this hypothesis should not be deemed statistically significant.

Illustrative Example

To build intuition for the proposed method we outline an example application based on previously published experimental data (GEO accession GSE7683 [21]) and a large-scale causal network containing approximately 250,000 unique relationships licensed from Ingenuity, Inc. and Selventa, Inc. The original study was devised to study the effect of dexamethasone on the differentiation and development of primary mouse chondrocytes using gene expression microarrays. Interestingly, the authors report difficulties in drawing clear conclusions about the pathways and biological categories affected by dexamethasone using traditional microarray analysis methods and Gene Ontology annotations. The authors suggest that the difficulty may be due to modest response to dexamethasone (i.e., weak signal compared to background noise) that limited the ability of traditional approaches to make inference [21].

Our approach provides a statistical framework for causal inference that may be particularly valuable in such a situation. As outlined above, we consider each entity in our causal graph together with a direction of perturbation as a hypothesis; based on the network model, perturbing the entity should effect changes downstream, and we assess significance of the concordance between the predicted and experimentally measured changes by computing p-values based on the Ternary Dot Product and Causal Graph randomized null models. For simplicity, in this example we only consider predicted downstream effects one step downstream of each entity. Figure 2 illustrates the scoring for one particular hypothesis, KLF4+ (i.e., upregulation of KLF4). Note that graph entities are not limited to genes or transcripts but may include more abstract concepts tied to expression changes in the literature; an example we will encounter below is Response to hypoxia. In this case, the "direction of perturbation" included in a hypothesis is also to be understood more abstractly: e.g., Response to hypoxia+ corresponds to an increase in the effects of hypoxia (as opposed to a concrete "upregulation").

Table 1 shows the top ten hypotheses obtained from the dexamethasone treatment data (specifically, the 24 hr time point) along with corresponding computed p-values. Five of the top hypotheses directly reflect the primary experimental perturbation: the perturbation itself (Dexamethasone+), the target receptor (NR3C1+), its drug family (Glucocorticoid+) and two other glucocorticoids (Hydrocortisone+ and Triamcinolone acetonide+). Other top hypotheses describe major players in chondrocyte development and differentiation. For example, Response to hypoxia+ may reflect the central role of hypoxia response factors in the development and survival of avascular tissues such as the chondrocytes being studied here [22]. In fact, examination of the biological context of the evidence supporting Response to hypoxia+ revealed corresponding results in the literature such as the promotion of chondrocyte differentiation by hypoxia [23]. Similarly, KLF4 (shown with supporting transcriptional evidence in Figure 2) is an important gene in cell differentiation and chondrogenesis [24] and has been shown to be upregulated during hypoxia-induced mesenchymal stem cell differentiation [25].

Table 1 Top hypotheses by score and corresponding p-values on an example dataset

Importantly, hypotheses are based on overlapping but different sets of regulated transcripts. Thus, while we assess significance of each hypothesis in isolation, the evidence shared among hypotheses should be helpful in building a more global understanding. For instance, 50% of the KLF4+ transcriptional evidence is also part of the Response to hypoxia+ evidence. This supports a major role of hypoxia in chondrogenesis which is partially mediated through KLF4.

Only 23 of the top 50 hypotheses by score pass a significance cutoff of 0.001 for both metrics, indicating the utility of significance assessment--not just score--in discerning hypotheses worthy of further investigation. For example, NRF2+, ranked 17th by score, is not deemed statistically significant according to our metrics; this is consistent with current knowledge as NRF2 negatively regulates chondrocyte differentiation contrary to the reported effect of dexamethasone. In contrast to our significance tests, a standard test for enrichment based on Fisher's Exact Test would have given a p-value < 10^-5, a result that is probably spurious.

This example is not meant as a comprehensive discussion of the affected biology but should provide some intuition for how the proposed measures can be used. For complex biological phenotypes, many hypotheses may be reported as significant, and these may include overlapping but distinct sets of transcriptional changes as supporting evidence. While our proposed metrics judge significance of single hypotheses independently, the results provide a statistically well-founded substrate on which to form a more comprehensive picture of potential drivers of the observed expression changes.

Results

We divide this section into two parts corresponding to the two statistical significance questions we address: Ternary Dot Product Distribution and Causal Graph Randomization.

Ternary Dot Product Distribution

We begin by establishing notation and phrasing the problem in a slightly more abstract setting which we find helpful for investigating its mathematical structure.

Problem definition

A ternary classification of a ground set $T$ (such as the gene transcript nodes of the causal graph G in our motivating example) is a function from $T$ to {-1, 0, 1}. Given an arbitrary but fixed ordering of the elements of $T$, we can naturally represent a ternary classification C of $T$ as a ternary vector u(C) whose i-th component is the value of C on the i-th element of $T$. Then, for two ternary classifications C and C' of $T$, the agreement between C and C' (corresponding to the goodness-of-fit in our motivating example) is computed as the dot product u(C) · u(C').

We are interested in understanding the distribution of the agreement between the fixed experimental classification C and a random classification whose parameters (numbers of -1, 0 and 1 components) are taken from the predicted classification C'. In other words, given two classifications C and C' of $T$, we are interested in the distribution of the agreement between C and a randomized version of C' over all possible randomizations, where a randomization of C' is a classification $C_{R}^{'}$ of $T$ with the same parameters as C'.

Denote the parameters of C and C' by

q_{\sigma} := \#\{i \mid u(C)_{i} = \sigma\}, \qquad n_{\sigma} := \#\{i \mid u(C')_{i} = \sigma\},

where σ ∈ {-1, 0, 1}. Also let

n_{\sigma\tau} := \#\{i \mid u(C)_{i} = \sigma,\ u(C')_{i} = \tau\}

for σ, τ ∈ {-1, 0, 1}, corresponding to the nine ways in which the classifications C and C' can overlap. This gives us the 3 × 3 contingency table for the joint classification (C, C') shown in Table 2. (For notational convenience we write {-, 0, +} instead of {-1, 0, 1} when indexing variables.)

Table 2 Contingency table comparing predicted and experimental classifications

The same 3 × 3 contingency table will arise from a large number of randomized classifications $C_{R}^{'}$, and the number of such classifications, which we denote by D[n₊₊, n_+-, n_-+, n_--], depends only on the top left 2 × 2 corner of the table since the other entries are determined by the constraints on row and column sums. Using multinomial coefficients, we can write

D[n_{++}, n_{+-}, n_{-+}, n_{--}] = \binom{q_{+}}{n_{++},\, n_{+-},\, n_{+0}} \binom{q_{-}}{n_{-+},\, n_{--},\, n_{-0}} \binom{q_{0}}{n_{0+},\, n_{0-},\, n_{00}}.

We will write D[n_±±] as shorthand for this quantity.

The score for a classification $C_{R}^{'}$ yielding this table is simply

S[n_{++}, n_{+-}, n_{-+}, n_{--}] := n_{++} + n_{--} - n_{+-} - n_{-+}.

We also know that the total number of possible randomized classifications is

D_{tot} := \sum_{n_{++}, n_{+-}, n_{-+}, n_{--}} D[n_{\pm\pm}] = \binom{|T|}{n_{+},\, n_{-},\, n_{0}}.

Thus, the distribution we are seeking is a sum of the D[n₊₊, n_+-, n_-+, n_--] aggregated by the score S[n₊₊, n_+-, n_-+, n_--] and normalized by D_tot. Explicitly, the probability of a score S is given by

p(S) = \sum_{(n_{++} + n_{--}) - (n_{+-} + n_{-+}) = S} \frac{D[n_{\pm\pm}]}{D_{tot}},

and the p-value of a score can be computed by summing the right tail of the distribution.
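As a concrete illustration of these definitions, the distribution can be computed by direct enumeration of the upper-left 2 × 2 corner of the contingency table. The sketch below is a straightforward, unoptimized reading of the formulas above in exact rational arithmetic, not the authors' code; the function names and the tuple ordering `(plus, minus, zero)` are assumptions made for illustration.

```python
from fractions import Fraction
from math import factorial

def multinomial(parts):
    """Multinomial coefficient (sum(parts))! / prod(part!)."""
    result = factorial(sum(parts))
    for part in parts:
        result //= factorial(part)  # each successive division is exact
    return result

def ternary_dot_distribution(q, n):
    """Exact score distribution; q and n are (plus, minus, zero) counts of C and C'."""
    (q_plus, q_minus, q_zero), (n_plus, n_minus, n_zero) = q, n
    if sum(q) != sum(n):
        raise ValueError("q and n must describe the same ground set")
    d_tot = multinomial(n)
    dist = {}
    for n_pp in range(min(q_plus, n_plus) + 1):
        for n_pm in range(min(q_plus - n_pp, n_minus) + 1):
            for n_mp in range(min(q_minus, n_plus - n_pp) + 1):
                for n_mm in range(min(q_minus - n_mp, n_minus - n_pm) + 1):
                    # The remaining five entries follow from the row and column sums.
                    n_p0 = q_plus - n_pp - n_pm
                    n_m0 = q_minus - n_mp - n_mm
                    n_0p = n_plus - n_pp - n_mp
                    n_0m = n_minus - n_pm - n_mm
                    n_00 = q_zero - n_0p - n_0m
                    if n_00 < 0:
                        continue
                    d = (multinomial((n_pp, n_pm, n_p0))
                         * multinomial((n_mp, n_mm, n_m0))
                         * multinomial((n_0p, n_0m, n_00)))
                    s = n_pp + n_mm - n_pm - n_mp
                    dist[s] = dist.get(s, 0) + Fraction(d, d_tot)
    return dist

def p_value(dist, observed_score):
    """Right-tail probability: P(score >= observed_score)."""
    return sum(p for s, p in dist.items() if s >= observed_score)

# Smallest non-trivial case: C = (+, 0) and C' has one + and one 0.
print(ternary_dot_distribution((1, 0, 1), (1, 0, 1)))  # scores 0 and 1, each with probability 1/2
```

The four nested loops make the O(N⁴) cost of the naive approach visible, which is what motivates the faster algorithms described next.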

In the context of our illustrative example, these are the p-values given for hypotheses of interest in the "Ternary Dot Product p" column of Table 1. Computing these p-values naïvely is computationally intensive, however; to perform the calculations efficiently, we developed and applied an algorithm we now describe.

Algorithm

The Ternary Dot Product Distribution problem can be solved by computing each D-value individually in constant time (see Methods), giving a total running time that scales as the product n₊ n_- q₊ q_-, i.e., O(N⁴) where N := max(n₊, n_-, q₊, q_-). While this complexity is acceptable for moderate values of N (say up to 100), it becomes prohibitively slow for the larger values of N, typically between 100 and 1000, that often arise in applications. Hence, faster alternatives are necessary; we give two improvements below.

Instead of computing all the D-values individually, we can aggregate them by the value of n₊₊ + n_--. This still makes it possible to group them by the score S, as S only depends on n₊₊ + n_-- and n_-+ + n_+-. We can write the sum of all the D-values with a fixed n := n_+- + n_-+ in the form of a constant times

F [n] : = \sum_{k} (\begin{matrix} n \\ k \end{matrix}) (\begin{matrix} v - n \\ w - k \end{matrix}) (\begin{matrix} x - n \\ y - k \end{matrix}),

where k = n_+-, v = q₊ + q_- - n₊₊ - n_--, w = q₊ - n₊₊, x = n₊ + n_- - n₊₊ - n_--, and y = n_- - n_--. It turns out that F[n] satisfies a three-term linear recursion obtained by using the WZ algorithm [26]. With this recursion, each F[n] can be computed in average constant time. Since there are only O(N³) values of F[n] to compute, we get an O(N³) algorithm for our problem. (See Methods for a full description.)
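For reference, F[n] can be evaluated directly from its defining sum. The sketch below does exactly that; it does not implement the three-term recursion, which is given in Methods and not reproduced here. The function name `F` and its argument order are assumptions, and Python's arbitrary-precision integers keep the result exact.

```python
from math import comb

def F(n, v, w, x, y):
    """Direct evaluation of F[n] = sum_k C(n, k) C(v - n, w - k) C(x - n, y - k).

    Terms whose binomial arguments fall outside the valid range are taken to be zero.
    """
    total = 0
    for k in range(n + 1):
        if 0 <= w - k <= v - n and 0 <= y - k <= x - n:
            total += comb(n, k) * comb(v - n, w - k) * comb(x - n, y - k)
    return total

# With v = 20, w = 10, x = 10, y = 5, the n = 0 term is C(20, 10) * C(10, 5).
print(F(0, 20, 10, 10, 5))  # 46558512
```

Each call costs O(n) operations, so this direct form is useful mainly as a reference against which a recursion-based implementation could be checked.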

This cubic algorithm is of theoretical interest but in practice requires exact arithmetic to obtain correct answers due to numerical instability (see Testing). We therefore developed a second algorithm that is both fast and practical, having the important advantage of working in floating-point arithmetic.

The key observation underlying our algorithm is that the vast majority of contingency tables are highly improbable (i.e., D[n₊₊, n_+-, n_-+, n_--]/D_tot ≪ 1) and thus may be safely ignored if we:

(a) need only carry out the computation to fixed precision; and
(b) do not care about the precise values of tail probabilities: it is enough to know that they are small.

Moreover, the quantities D[n_±±] follow an easily described law on certain families of contingency tables, thus allowing us to identify entire families of tables that can be discarded after a constant amount of computation.

Consider families of configurations in which the row and column sums of the upper-left 2 × 2 submatrix (n_±±) are fixed. Denote these sums by r₊, r_-, c₊, c_-, noting that as before, one constraint is redundant as r₊ + r_- = c₊ + c_- =: t is the total of the entries in the submatrix. Thus, in each family, one degree of freedom remains, which we may parameterize by the value of n₊₊. It turns out that within each such family, D[n_±±] is maximized when n_±± are distributed in proportion to the 2 × 2 row and column sums, i.e.,

n_{\sigma\tau} \approx r_{\sigma} c_{\tau} / t \quad \text{for } \sigma, \tau \in \{+, -\}

(with appropriate rounding), and moreover, the probability decreases monotonically as n₊₊ is varied in either direction from the optimum. (See Methods for details and a proof.)

Our algorithm thus proceeds as follows (Figure 3, Algorithm 1a). First, compute the global maximum D-value D_max over all 3 × 3 contingency tables with row and column sums q_σ, n_τ. As in the 2 × 2 case just discussed, D_max is achieved when $n_{σ τ} \approx q_{σ} n_{τ} / |T|$ for σ, τ ∈ {+, -, 0}. Now iterate through the O(N³) families of contingency tables with fixed upper-left 2 × 2 row and column sums r_σ, c_τ. For each such family, compute its maximum D-value D_fam by setting n_στ ≈ r_σ c_τ / t for σ, τ ∈ {+, -} (and inferring the remaining five n_στ with σ = 0 or τ = 0). If D_fam is less than D_max times a chosen threshold factor ϵ (perhaps machine epsilon--i.e., the maximum relative error of rounding in floating point arithmetic--divided by N³, though machine epsilon itself is likely sufficient for practical purposes), discard this family and proceed to the next one. Otherwise, the maximum probability for the family is non-negligible; in this case, iterate through the family upward and downward from the maximizing n₊₊, updating the aggregate probabilities of the scores S[n₊₊, n_+-, n_-+, n_--] obtained, until the D-value of the current contingency table drops below ϵD_max.
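The thresholding procedure can be sketched as follows. This is a simplified illustration under stated assumptions rather than a reproduction of Algorithm 1a: the function names are hypothetical; D_max is taken as the largest family maximum found in a first pass instead of being computed from the proportional formula; and each family's maximizer is located by starting from the rounded proportional guess and hill-climbing, which relies on the monotonicity property stated above. All D-values are handled in log space via `lgamma`.

```python
from math import exp, lgamma, log

def log_multinomial(parts):
    """log of the multinomial coefficient (sum(parts))! / prod(part!)."""
    return lgamma(sum(parts) + 1) - sum(lgamma(p + 1) for p in parts)

def thresholded_distribution(q, n, eps=1e-16):
    """Approximate score distribution; q and n are (plus, minus, zero) counts."""
    (q_plus, q_minus, q_zero), (n_plus, n_minus, n_zero) = q, n
    if sum(q) != sum(n):
        raise ValueError("q and n must describe the same ground set")
    log_d_tot = log_multinomial(n)

    def log_d(n_pp, family):
        # A family fixes the 2 x 2 row sums and one column sum; n_pp is the free cell.
        r_plus, r_minus, c_plus = family
        n_pm, n_mp = r_plus - n_pp, c_plus - n_pp
        n_mm = r_minus - n_mp
        n_0p = n_plus - c_plus
        n_0m = n_minus - n_pm - n_mm
        return (log_multinomial((n_pp, n_pm, q_plus - r_plus))
                + log_multinomial((n_mp, n_mm, q_minus - r_minus))
                + log_multinomial((n_0p, n_0m, q_zero - n_0p - n_0m)))

    def family_mode(family):
        r_plus, r_minus, c_plus = family
        lo, hi = max(0, c_plus - r_minus), min(r_plus, c_plus)
        t = r_plus + r_minus
        best = min(max(round(r_plus * c_plus / t) if t else 0, lo), hi)
        while best < hi and log_d(best + 1, family) > log_d(best, family):
            best += 1
        while best > lo and log_d(best - 1, family) > log_d(best, family):
            best -= 1
        return best, lo, hi

    families = []
    for r_plus in range(q_plus + 1):
        for r_minus in range(q_minus + 1):
            t = r_plus + r_minus
            if q_zero - n_plus - n_minus + t < 0:  # n_00 would be negative
                continue
            for c_plus in range(max(0, t - n_minus), min(t, n_plus) + 1):
                family = (r_plus, r_minus, c_plus)
                mode, lo, hi = family_mode(family)
                families.append((log_d(mode, family), mode, lo, hi, family))

    log_cut = max(f[0] for f in families) + log(eps)  # log(eps * D_max)
    dist = {}
    for log_fam, mode, lo, hi, family in families:
        if log_fam < log_cut:
            continue  # the whole family is negligible
        r_plus, r_minus, c_plus = family
        # Walk upward, then downward, from the family maximizer.
        for cells in (range(mode, hi + 1), range(mode - 1, lo - 1, -1)):
            for n_pp in cells:
                value = log_d(n_pp, family)
                if value < log_cut:
                    break
                score = 4 * n_pp + r_minus - r_plus - 2 * c_plus
                dist[score] = dist.get(score, 0.0) + exp(value - log_d_tot)
    return dist
```

The expression for `score` is S = n₊₊ + n_-- - n_+- - n_-+ rewritten in terms of n₊₊ and the family sums. On small inputs nothing falls below threshold, so the result should agree with exhaustive enumeration up to floating-point error.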

In practice, very few 2 × 2 families are within threshold. In fact, the computation time is often governed by the O(N³) initial threshold tests for each family (with fewer than N³ additional D-value computations). This observation allows us to obtain further speedup by considering superfamilies in which only the row sums r_σ of the upper-left 2 × 2 submatrix are fixed, leaving two degrees of freedom. Each such superfamily is the union of a set of families we considered above, and as before, the maximal D-value achieved by any contingency table within the superfamily is obtained by assigning counts to the left 3 × 2 submatrix proportionally to its row and column sums. We can thus apply the algorithm described above to the O(N²) families of 3 × 2 left submatrices with fixed row sums. When the maximal D-value of the 3 × 2 family is below threshold, we may eliminate an entire one-parameter family of 2 × 2 families, achieving further efficiency (Figure 3, Algorithm 1b).

Testing

We tested our algorithms on a wide range of problem parameters and found that our thresholded algorithm achieves substantial speed gains across parameter distributions. Table 3 compares the scaling of run times of the simple quartic algorithm (computing all D-values) and Algorithm 1b, the version thresholded on 3 × 2 families, for a parameter distribution representative of typical use cases. For large cases, the thresholded algorithm reduces run times from days to minutes.

Table 3 Run times for Ternary Dot Product Distribution algorithm

To further investigate the efficiency attained by thresholding, we computed counts of the numbers of D-values computed by the quartic algorithm and during 2 × 2 and 3 × 2 thresholding; we compare these counts to the actual numbers of contingency tables and families that pass threshold (Figure 4). We performed these computations for two parameter distributions: one with n₀ = 5n₊ and one with n₀ = 50n₊. The first case is relatively dense, i.e., a sizeable portion (around 30%) of the gene transcripts are significantly upregulated or downregulated. The second case is sparser; here, there are many more genes but only a few percent of them are found to be regulated. This latter case is typical in practice.

The solid black curve in Figure 4 indicates the amount of work performed by the simple quartic algorithm while the dotted black curve indicates the number of D-values that exceed ϵD_max, thus placing a lower bound on the amount of work that any thresholding-based algorithm must perform. The disparity between these two curves immediately demonstrates the reason our thresholding algorithms achieve speedup: only a tiny fraction of the D-values are non-negligible. The comparison between the left and right panels of Figure 4 also makes clear the relative effects of 2 × 2 versus 3 × 2 thresholding in different parameter settings. In the dense case n₀ = 5n₊, we see that 2 × 2 thresholding (Algorithm 1a) is probably already close to optimally efficient: the amount of work required to do the threshold checks (solid blue curve) is comparable to the total amount of work required to compute all relevant D-values (dotted black line). On the other hand, in the sparse case n₀ = 50n₊, even performing 2 × 2 threshold checks leaves much room for improvement because the number of relevant D-values is far smaller. In this situation it is much more efficient to only compute O(N²) 3 × 2 threshold checks (solid red line). For an analytical discussion of these phenomena and a proof that the 2 × 2 thresholding algorithm has complexity O(N^3.5), see Methods.

We have left our cubic algorithm out of the previous figures and discussion because unfortunately, our tests showed that it is numerically unstable, at least in the form stated; we now briefly discuss this issue. While the cubic algorithm does yield the correct distribution when implemented in arbitrary-precision exact arithmetic, it fails when implemented in floating-point arithmetic because the range of values in the recurrence F[n] is extremely large and subject to cancelation error. For instance, when the parameters are set to the relatively small values v = 20, w = 10, x = 10, y = 5, the values of F[n] already go from 46558512 for n = 0 to 6006 for n = 15, which means that each term is approximately a factor of 2 smaller than the previous one. We consider some alternatives in Discussion.

Implementation

We implemented all of our algorithms in R [27], vectorizing computations when possible. A few remarks are in order about implementation details necessary to make the thresholding algorithm numerically stable. The large factorials in the D-value formula require us to perform all computations in log-transformed space so as to stay within floating point range. This causes no difficulty; multiplication simply becomes addition and addition can be implemented by exponentiating the difference of two log-transformed values, adding 1, taking the log, and adding a shift. Numerically, there is no risk of cancelation error because D-values are only summed and never subtracted; thus, all rounding error is additive and well-controlled. The number of summands per score value S is O(N³), and using a stochastic model of rounding error, the total accumulated relative error is thus bounded by O(N^{3/2}) times machine epsilon. In practice N is typically not more than 1000 while machine precision is 10^-16 so there is no concern.
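The log-space addition described above can be sketched as follows. The authors' implementation is in R; this Python version is only an illustration, and the helper names `log_add` and `log_factorial` are hypothetical.

```python
from math import exp, lgamma, log, log1p

def log_add(log_a, log_b):
    """Return log(a + b) given log(a) and log(b), without leaving log space.

    Implements: shift + log(1 + exp(difference)), with the larger value as the shift.
    """
    if log_a < log_b:
        log_a, log_b = log_b, log_a
    if log_b == float("-inf"):  # adding zero
        return log_a
    return log_a + log1p(exp(log_b - log_a))

def log_factorial(m):
    """log(m!) computed via the log-gamma function."""
    return lgamma(m + 1)

# 1000! overflows double precision, but its logarithm is unproblematic:
# log(1000! + 1000!) = log(1000!) + log(2)
total = log_add(log_factorial(1000), log_factorial(1000))
print(abs(total - (log_factorial(1000) + log(2))) < 1e-9)  # True
```

Using the larger of the two values as the shift keeps the exponentiated difference in [0, 1], so the intermediate `exp` can never overflow.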

The only caveat, as we noted initially, is that our algorithm guarantees precision relative to the maximum probability of all score values--not the probability of each particular score. In other words, very small tail probabilities are known only to the extent that they are understood to be negligible compared to probabilities from the bulk distribution; their precise values are not computed.

Causal Graph Randomization

We now turn to our second computational problem arising from statistical significance evaluation in causal graph models, that of graph randomization. We begin by defining the Causal Graph Randomization problem and placing it in context with previous work on graph randomization. We then explain the special challenges of randomizing a signed causal graph and present an algorithm that successfully overcomes these challenges in practice.