Philip Morris International R&D, Philip Morris Products S.A., Quai Jeanrenaud 5, Neuchâtel, 2000, Switzerland

Philip Morris International R&D, Philip Morris Research Laboratories GmbH, Fuggerstr.3, Cologne, 51149, Germany

Selventa, One Alewife Center, Cambridge, MA, 02140, USA

Abstract

Background

High-throughput measurement technologies produce data sets that have the potential to elucidate the biological impact of disease, drug treatment, and environmental agents on humans. The scientific community faces an ongoing challenge in the analysis of these rich data sources to more accurately characterize biological processes that have been perturbed at the mechanistic level. Here, a new approach is built on previous methodologies in which high-throughput data was interpreted using prior biological knowledge of cause and effect relationships. These relationships are structured into network models that describe specific biological processes, such as inflammatory signaling or cell cycle progression. This enables quantitative assessment of network perturbation in response to a given stimulus.

Results

Four complementary methods were devised to quantify treatment-induced activity changes in processes described by network models. In addition, companion statistics were developed to qualify significance and specificity of the results. This approach is called Network Perturbation Amplitude (NPA) scoring because the amplitudes of treatment-induced perturbations are computed for biological network models. The NPA methods were tested on two transcriptomic data sets: normal human bronchial epithelial (NHBE) cells treated with the pro-inflammatory signaling mediator TNFα, and HCT116 colon cancer cells treated with the CDK cell cycle inhibitor R547. Each data set was scored against network models representing different aspects of inflammatory signaling and cell cycle progression, and these scores were compared with independent measures of pathway activity in NHBE cells to verify the approach. The NPA scoring method successfully quantified the amplitude of TNFα-induced perturbation for each network model when compared against NF-κB nuclear localization and cell number. In addition, the degree and specificity to which CDK-inhibition affected cell cycle and inflammatory signaling were meaningfully determined.

Conclusions

The NPA scoring method leverages high-throughput measurements and a priori literature-derived knowledge in the form of network models to characterize the activity change for a broad collection of biological processes at high-resolution. Applications of this framework include comparative assessment of the biological impact caused by environmental factors, toxic substances, or drug treatments.

Background

Acquisition of large-scale data sets representing a variety of data modalities has become a crucial aspect of experimental system characterization. This strategy enables the broad capture of biological information in a short time and with a relative small investment of effort, in the hope that valuable biological insights might be gained. However, the amount of information collected can be overwhelming, making interpretation of the data difficult and subsequent detailed biological understanding elusive. Reducing the complexity of such data by evaluating it in a relevant biological context is required in order to gain meaningful insight.

High-throughput measurements can be evaluated against literature-curated “cause and effect” relationships extracted from the Selventa Knowledgebase (see Methods). As illustrated in Figure

Constructing and scoring causal network models.

**Constructing and scoring causal network models. (a)** The Selventa Knowledgebase is composed of individual causal relationships that are curated from available sources (generally published scientific literature). For example, in the statement “A → exp(X)” entity A may represent the activity of a particular kinase that has been shown to lead to increased expression of gene X. A HYP is a set of causal relationships derived from the Selventa Knowledgebase that relate an upstream entity to the downstream measurable entities that it regulates. In the HYP example entity A regulates the expressions of genes W, X, Y, and Z following the specific regulation signs “→” (positive regulation) and “--|” (negative regulation). **(b)** An aggregated HYP can be generated from a causal network model, which is a collection of causally linked entities derived from the Selventa Knowledgebase (grey edges). This causal network model can be augmented with the causal relationships to the downstream measureable entities of its nodes (black edges), also derived from the Selventa Knowledgebase. As a consequence, a causal network model can be equivalently viewed as a collection of causally linked HYPs. Next, all the downstream measurable entities of the network model nodes are combined by adjusting their signs based on the causal relationships in the network. This essential step toward the construction of the aggregated HYP is well-defined as long as the network model is causally consistent (see Methods). Because these nodes have a negative causal relationship with the reference node (node A), the regulation signs “→” or “--|” of the downstream measureable entities from nodes C and D are inverted (red edges) when constructing the aggregated HYP from the causal network model. **(c)** NPA scores are calculated from high-throughput data, such as differential gene expression data obtained from treatment versus control comparisons, and applied to an NPA scoring algorithm in the context of a specific causal network model represented by its aggregated HYP.

The HYP can describe causal relationships between an upstream biological activity and any type of high-throughput data. However, the work described here focuses on the evaluation of whole-genome mRNA expression changes; thus, the HYP is the equivalent of a causal “gene expression signature” for a given entity or process, for example, the activity of a particular kinase. Previous work has explored the importance of uncovering a characteristic signature of gene expression changes that results from one or more perturbations to a biological process, and the subsequent scoring of that signature’s presence in additional data sets as a measure of that process’s specific activity amplitude. Most work in this field has involved identifying and scoring signatures that are correlated with a disease phenotype

Alternatively, a number of studies have focused on measuring causal signatures based on very specific upstream perturbations, either performed directly in the system of interest

As a consequence, coupling specific causal HYPs captured in the Selventa Knowledgebase with a measure of perturbed activity would be a means to further realize this clinical potential, as well as the potential to increase basic biological understanding that is harbored within high-throughput data. However, the HYP infrastructure has been previously exclusively employed as an exploratory tool for identifying relevant perturbed biology by drawing qualitative mechanistic inferences based on statistical enrichments

To assess HYP amplitude, four complementary scoring algorithms were developed: Strength, Geometric Perturbation Index (GPI), Measured Abundance Signal Score (MASS), and Expected Perturbation Index (EPI). NPA scoring was then applied to different inflammatory and cell cycle-related HYPs using two transcriptomic data sets: a TNFα dose and time series in normal human bronchial epithelial (NHBE) cells and a CDK inhibitor R547 dose and time series in HCT116 colon cancer cells

Results

The HYP is the foundation for scoring network models

The HYP represents the relationships between a set of measured data, here gene expression data, and a biological entity that is a known controller of those genes. Additionally, these relationships include the sign (positive or negative) of influence between the upstream entity and the differential expression of the downstream genes. The downstream genes of a HYP are drawn from a database of literature-curated causal biological knowledge (Figure

A more general causal network model can be constructed from a set of HYPs that are themselves causally connected by literature-derived edges (see Methods). Such a network model can be thought of as providing higher-level connections between HYPs by linking the upstream controllers of these HYPs, based on the pathway’s graph structure. Complex biological processes such as cell proliferation or cellular stress can be efficiently described by causal network models

A complex causal network model of biological entities can be transformed into a single HYP by collecting the individual HYPs representing entities in the model and regrouping the connections of all the downstream genes to a single upstream process representing the whole complex causal network model; this in essence is a flattening of the underlying graph structure (Figure

Scoring HYPs with four NPA methods

NPA scoring applies a defined algorithm to an experimental data set consisting in a series of treatment versus control comparisons, where the experimental data is filtered down to a particular scope of biology (and thus a particular set of gene expression relationships) by the context of a defined biological network model (Figure

In this study, gene expression data was used to demonstrate the NPA approach using four different scoring methods: Strength, GPI, MASS, and EPI (see Methods). Briefly, Strength is the mean treatment-induced differential expressions of the HYP’s downstream genes, adjusted for the sign of their causal connection to the upstream entity of the HYP. GPI is similar to Strength, except that the contribution of each gene is additionally weighted by taking into account the statistical significance of its differential expression. MASS is the change in absolute downstream quantities in a direction supporting an increase in the upstream entity (i.e., the sum of the measurements corrected for their causal connection to the upstream entity), divided by an average of a total absolute quantity of the downstreams. Finally, EPI is a “smoothed” version of GPI, obtained by averaging a slightly modified form of GPI over all possible values of a threshold for statistically significant differential gene expressions. Each method has specific yet complementary advantages (Table

**Method**

**Main features**

**Assumptions**

**Pros**

**Cons**

Strength

Linear, unbiased.

Based on log_{2} differential measurements (e.g., log_{2} fold changes of measurements).

The contributions from the noisy downstream measurables sum up to zero.

Intuitive.

Noisy/biased signals can artificially decrease/increase the results.

GPI

Based on log_{2} differential measurements.

Down-weights weak differential measurements using false non-discovery rates.

The noisy downstream measurables have low false non-discovery rates which can be used to minimize their contributions.

Intuitive.

False non-discovery rate depends on the number of experimental replicates.

False non-discovery rate depends on the number of experimental replicates.

MASS

Linear and unbiased in absolute non-log_{2} scale.

Dependent on absolute changes in measurements.

Absolute changes in measurements are more important than relative changes.

Intuitive.

Measurements must be directly comparable across all downstream measurables.

EPI

Based on log_{2} differential measurements.

Up-weights strong differential measurements without using false non-discovery rates.

The downstream measurables with higher differential values should have stronger contributions than those with lower differential values.

More robust to noisy signals than Strength.

Highest sensitivity to strong differential measurements.

Less intuitive.

Bootstrapping is needed for calculating Uncertainty.

Statistical annotation of NPA scores

Each NPA score, regardless of algorithm, represents an abstracted view of a set of biological measurements in the context of a particular HYP. As such, dozens, hundreds, or even thousands of measurements may be aggregated into a single score. In order to better characterize an NPA score and derive value from its use, additional statistics that qualify the score are required. Two such statistics, Uncertainty and Specificity, were developed. Uncertainty is a confidence interval for a particular NPA score, while Specificity tests whether an NPA score is specific to the downstream genes represented by a particular HYP, and not due to a general trend of the data (see Methods).

NPA scoring of an NF-κB HYP accurately assesses NF-κB activity

In order to evaluate the NPA approach, an NF-κB HYP was scored for a well-understood and controlled experimental system – TNFα-treated NHBE cells. The NPA results were then compared with an explicit measure of the NF-κB complex activity provided by its nuclear translocation.

Activation of the stress- and immune-response transcription factor NF-κB (

- n

- f

- kappa

- B

Each amplitude scoring method was investigated using a HYP created to be a specific measure of NF-κB activation, the NF-κB-direct HYP (Additional file

**The NF-κB-direct HYP.** Each row of the table contains a causal statement describing the connection between NF-κB and one of its direct target genes.

Click here for file

HYP scores for TNFα-treated NHBE cells.

**HYP scores for TNFα-treated NHBE cells.** Transcriptomic data from TNFα-treated NHBE cells was scored using each scoring method (Strength, GPI, EPI, and MASS) for **(a)** the NF-κB-direct HYP, **(b)** the IKK/NF-κB signaling HYP, **(c)** the TNF HYP, and **(d)** the E2F1-direct HYP. Error bars represent the 95 % confidence interval as determined by the Uncertainty statistic, and scores that failed the Specificity criterion (Specificity

HYP scores for CDK inhibitor-treated HCT116 cells.

**HYP scores for CDK inhibitor-treated HCT116 cells.** Transcriptomic data from HCT116 colon cancer cells treated with the CDK inhibitor R547 was scored using each scoring method (Strength, GPI, EPI, and MASS) for **(a)** the NF-κB-direct HYP, **(b)** the IKK/NF-κB signaling HYP, **(c)** the TNF HYP, and **(d)** the E2F1-direct HYP. Error bars represent the 95 % confidence interval as determined by the Uncertainty statistic. Scores that failed the Specificity criterion (Specificity

Next, NF-κB-direct HYP scores were compared to NF-κB nuclear translocation. Upon activation, NF-κB is transported into the nucleus where it acts to regulate the expression of many genes

**TNFα dose-dependent induction of NF-κB nuclear translocation.****(a)** Non-treated NHBE cells show a diffuse cytoplasmic staining of NF-κB (left) while after adding TNFα to the culture medium (right), the nucleus of stimulated cells is strongly labeled (magnification 10X/0.3); **(b)** NF-κB nuclear fluorescence intensity (FI) per dose of TNFα. For each group, the nuclear fluorescence intensity was measured in 500 cells per well of three replicates. Across-group comparisons by one-way ANOVA test were all p < 0.001.

Click here for file

**HYP scores versus NF-κB nuclear translocation.** The NF-κB-direct HYP scores for each amplitude scoring method (Strength, GPI, EPI and MASS) and each time point (30 minutes, 2 hours, 4 hours, 24 hours) plotted against NF-κB nuclear translocation at 30 minutes. Score error bars represent the 95 % confidence interval as determined by the Uncertainty statistic. Error bars in NF-κB nuclear translocation represent the standard deviation of the mean nuclear translocation for three different fields of view of the same population of cells.

Click here for file

HYP scores versus NF-κB nuclear translocation.

**HYP scores versus NF-κB nuclear translocation.** The NF-κB-direct HYP scores at 30 minutes for each amplitude scoring method (Strength, GPI, EPI and MASS) plotted against NF-κB nuclear translocation at 30 minutes. Score error bars represent the 95 % confidence interval as determined by the Uncertainty statistic. Error bars in NF-κB nuclear translocation represent the standard deviation of the mean nuclear translocation for three different fields of view of the same population of cells. The Pearson correlation between nuclear translocation and NPA score was 0.98 for Strength, 0.92 for GPI, 0.85 for EPI, and 0.97 for MASS.

NPA scoring of additional HYPs can quantitatively assess response to TNFα

The effects of HYP size and composition were investigated. First, the effect of hand-selecting a set of measurements that are known to be modulated by NF-κB specifically in a TNFα-dependent manner was assessed. A HYP was constructed from a set of 20 genes that were previously measured via RT-PCR to assess NF-κB activity in response to TNFα treatment in 3T3 mouse fibroblast cells (omitting 2 genes that have no direct human ortholog)

**Comparison of NF-κB-direct HYP scores with 20-gene NF-κB HYP scores.** Transcriptomic data from TNFα-treated NHBE cells was scored using each scoring method (Strength, GPI, EPI and MASS) for **(a)** the NF-κB-direct HYP, **(b)** a HYP composed of 20 NF-κB-regulated genes reported to be TNFα-responsive in mouse 3 T3 fibroblast cells (NFKBIA, CASP4, CCL5, TNFAIP3, CCL2, ZFP36, RIPK2, TNFSF10, NFKBIE, IL6, CCL20, ICAM1, TNFRSF1A, TNFRSF1B, SQSTM1, NRG1, SOD1, IL1RL1, HIF1A, ERBB2)

Click here for file

Next, the effects of using HYPs derived from upstream biological entities that are less proximal to the measurement were investigated. To do so, two additional HYPs were constructed: the IKK/NF-κB signaling HYP (Additional file

**The aggregated IKK/NF-κB signaling HYP.** Each row of the table contains a causal statement extracted from the Selventa Knowledgebase and obtained from the aggregation of the individual HYPs of the IKK/NF-κB signaling causal network model (Additional file 6).

Click here for file

**The IKK/NF-κB signaling causal network model.** The full causal model is given (top), along with a schematic of the basic model architecture (middle). CHUK, IKBKB, and IKBKG act as inhibitors of NFKBIA, NFKBIB, and NFKBIE, which are in turn inhibitors of NFKB1, NFKB2, and RELA. The nodes used in the model are listed under each section. The nodes in bold represent nodes that have downstream gene expression measurables in the knowledgebase, and the number of measurables is given in the square brackets (because the same downstream may be found under multiple nodes, these 1227 downstream measurables correspond to 992 unique measurables). The notations used in the knowledgebase are as follows: “CHUK P@S” represents CHUK phosphorylated at serine (where the residue is given if known), “CHUK P@ST” represents CHUK phosphorylated at serine or threonine (the exact residue is unknown), “kaof(CHUK)” represents the kinase activity of CHUK, “CHUK:IKBKB” represents the complex of CHUK and IKBKB proteins, “IkappaB kinase complex Hs” represents an aggregate of the various IκB kinases (CHUK, IKBKB, and IKBKG) in Homo sapiens (Hs), “degradationof(NFKBIA)” represents the process of NFKBIA degradation, and “taof(NFKB1)” represents the transcriptional activity of NFKB1.

Click here for file

**The TNF HYP.** Each row of the table contains a causal statement extracted from the Selventa Knowledgebase and describes a gene known to be modulated by the TNFα treatment of cells.

Click here for file

The IKK/NF-κB signaling HYP and TNF HYP give insight into the behaviors of HYPs at different levels of proximity to the measurements. The IKK/NF-κB signaling HYP is primarily composed of genes that are regulated (either directly or indirectly) by NF-κB (Additional file

NPA scoring detects limited cross-talk between NF-κB and cell cycle signaling

To assess the ability of HYPs to respond specifically to relevant TNFα signaling perturbations, a HYP was constructed for a key cell-cycle component, the transcription factor E2F1 (Additional file

**The E2F1-direct HYP.** Each row of the table contains a causal statement extracted from the Selventa Knowledgebase and describes the connection between E2F1 and one of its direct target genes.

Click here for file

**CellTiter-Glo® measurements of cell numbers.** CellTiter-Glo® fluorescence measurements of cell number after 24 hours of NHBE cell treatment with various doses of TNFα. Fluorescence intensity is reported as a percentage of the mock-treated sample, and error bars represent the standard deviation of the fluorescence intensity for three different fields of view of the same population of cells.

Click here for file

In order to provide a comparison of NPA results for biology not directly related to NF-κB signaling, the NPA response of the four HYPs introduced above (NF-κB-direct, IKK/NF-κB signaling, TNF, and E2F1-direct) were assessed in response to inhibition of cell cycle progression via a CDK inhibitor. Specifically, a publicly available microarray data set was used for treatment of HCT116 colon cancer cells with three different concentrations of the CDK inhibitor R547 (GSE15395)

Discussion

Previous work has demonstrated the utility of exploring the reverse causal interpretation of large scale data sets using HYPs as opposed to reasoning downstream from the data

Causal directionality is key for the HYP framework

For all NPA metrics, the proper scoring of a HYP is dependent upon the directionality (signs) of the causal influences linking the upstream biological entity represented by the HYP to the downstream genes whose expression it regulates. The knowledgebase harbors information about the specific signs (positive or negative) of the regulation exerted by the entity represented by the HYP on the expression of each of the downstream genes. The logic for incorporating differential gene expression measurements into an NPA score based on a knowledgebase-specified directional blueprint can be made via arguments against two specific alternative scoring schemes. The case of scoring an activity without taking into account the sign of causal influence in the HYP can make sense if the HYP represents a transcription factor that always activates or represses genes. However, if there are downstream genes in a HYP that are controlled in an opposite manner to the others, the error of an activity score based on an assumption of a single sign becomes apparent: the score contribution of genes that are known to be negatively regulated within a HYP might cancel, instead of add to, the score contribution of genes that are known to be positively regulated within a HYP. An alternative tactic would be to incorporate the absolute values of the differential expression for each gene. This has the problem of always producing positive scores for a HYP as well as artificially inflating scores: genes that change in a manner opposite of how a HYP is known to control a gene would add to a HYP activity score rather than detract from it. Standard gene enrichment analyses usually ignore regulation signs when scoring pathways

HYPs can be rapidly constructed from an appropriate knowledgebase

HYPs were constructed from the Selventa Knowledgebase, a database of causal biological knowledge that allows rapid creation of HYPs for any biological process, entity, or causally consistent network model that is adequately connected to gene expression changes (see Methods). The TNF and E2F1-direct HYPs were created from this knowledgebase without any additional literature or experimental investigation. For the NF-κB-direct HYP, because the content of the knowledgebase was biologically too limited in this context, additional genes that are directly regulated by NF-κB were mined from the literature and added to the knowledgebase as causal relationships. The additional knowledge concerning the direct effects of NF-κB was necessary to ensure a broadest representation of NF-κB biology. To construct the IKK/NF-κB signaling HYP, a network model was built by assembling causal relationships between relevant entities that were represented in the Selventa Knowledgebase. Similarly, HYPs and network models can integrate information from other sources besides the Selventa Knowledgebase, including literature articles and curated databases, as long as this information can be interpreted as signed causal relationships. The boundaries of HYPs and network models are defined during their construction (see Methods). For this study the model boundaries were chosen based on the biology known to be associated with the TNFα treatment of NHBE cells. In a case where the expected biology is unknown, a process of identifying biology is required to determine the most appropriate network models and HYPs to score. Such an exploratory perspective can be provided by evaluating the resulting HYPs using the RCR approach

Building the IKK/NF-κB signaling network model afforded the ability to aggregate the gene expression measurements that underlie all the individual HYPS of a specific NF-κB network and provide a single score for that network. However, in condensing a complex model into a single score, there are caveats to consider. Gene expressions that have an ambiguous relationship to the network (both causally positively and negatively regulated) must effectively be removed for scoring purposes. These ambiguities affect approximately 6 % of the downstream genes of the aggregated IKK/NF-κB signaling HYP, which is similar to the case of single HYPs: the NF-κB-direct and the TNF HYPs contains approximately 4 % and 7 % ambiguous downstream genes, respectively. Their actual impact on the NPA results is expected to remain limited (see below).

Additionally, resolution with regard to which individual entities of the network model are being perturbed is also diluted in the overall network score. Thus, when generating an aggregated HYP for a network model, key information about the network is not explicitly available, and it is important to keep these features in mind when interpreting NPA scores (see below for a further discussion of the methodological perspectives). However, despite these caveats, the IKK/NF-κB signaling HYP produced a near-identical pattern of response as the NF-κB-direct HYP, and thus is also correlated with the measured physiological endpoint, NF-κB activation.

This finding that similar NPA results were obtained using HYPs featuring different characteristics highlights an essential aspect of this work: using the same network model for calculating the NPA scores of the various experimental conditions to be compared (e.g., for all TNFα doses and post-treatment times) provides results that are robust against having exhaustively captured the perturbed biology in the network model used for NPA scoring. This aspect is fundamental, especially given the practical impossibility of constructing networks models capturing all of the biological processes potentially perturbed in a given experiment. It is also exploited when constructing network models describing processes that are sufficiently generic, e.g., cell proliferation or cellular stress

The robustness of the results also preserves the validity of the NPA approach against the possibility of HYPs and network models evolving slightly due to the constantly improving understanding of the biological processes they describe. This property was concretely tested with a simple step-back calculation consisting of randomly removing edges to the HYPs and comparing the corresponding NPA results to the original ones. The results demonstrated a remarkable robustness: typically, after removal of 20 % of their downstream genes, the four HYPs used in this work returned GPI profiles that correlated extremely well with their original values shown on Figure

NPA scoring methods accurately assess biological activation

Four different algorithms were developed to quantify the amplitude of perturbation of a HYP. Each method employs a different approach to evaluate the degree of perturbation between two experimental measures for a given HYP (see Table

Future work will confirm the circumstances in which each method is expected to be the most appropriate. For example, the MASS algorithm was developed to use absolute measurement technologies such as an absolute transcript count offered by quantitative next generation sequencing. Given more appropriate measurement technology, the MASS method may be more applicable in circumstances where small differential expressions in a set of highly expressed genes have a dominant effect over large differential expressions in a set of weakly expressed genes. On the other hand, the GPI algorithm, and to an even larger extent the EPI algorithm, down-weight the contributions from genes with poor statistical significance, favoring small sets of strongly differentially expressed genes rather than large sets of weakly differentially expressed ones. From this point of view, the Strength algorithm is unbiased since it contains no weighting factors. However, because an NPA score represents a condensed view of the biology underlying a HYP, the ability to assess the amplitude of its perturbation with complementary NPA methods also highlights which conclusions are robust versus which conclusions may be specific to a particular NPA method. For example, the four NPA methods supported the same time- and dose- dependent NF-κB activation in response to TNFα (Figure

There are some important considerations when using NPA scoring methods to evaluate HYPs. Scores are meant to be directly compared between different treatment versus control contrasts when using the identical HYP. Scores cannot be quantitatively compared between two different HYPs without first verifying that the relationship between a change in the activity of the HYP’s upstream entity and the resulting change in the expression of downstream genes is conserved between the two HYPs. In general, this relationship is not expected to be conserved due to differences in the dynamic range of expression of individual genes that compose each HYP. Additionally, a HYP with a higher number of downstream gene expressions may be expected to represent broader biology than a smaller HYP, and thus in any given experiment, a smaller fraction of genes in the larger HYP may be perturbed, resulting in a lower score than a smaller HYP. However, additional statistical power is gained in the Uncertainty and Specificity statistics with increasing number of downstream gene expressions in a HYP, such that the weaker scores of larger HYPs can be just as significant and meaningful as higher scores from smaller HYPs.

Although scores cannot be directly compared between two different HYPs, the pattern of scores across a set of contrasts can be qualitatively compared. Likewise, the absolute magnitude of the NPA scores should not be directly compared between two amplitude scoring methods, but the pattern of scores across NPA scoring methods can be qualitatively compared, keeping in mind that the scoring methods may be assessing different aspects of the contrasts.

The NPA score represents an abstracted measure of the data represented in the HYP. The score captures the amplitude of the perturbation of a HYP, but does not capture which genes in the HYP most strongly contribute to the score. For example, of the 20 genes that contribute most to the IKK/NF-κB signaling HYP score upon TNFα treatment (100 ng/mL, 24 hour), only one is in common with the 20 genes that contribute most to the IKK/NF-κB signaling HYP upon CDK inhibition (0.6 μM, 24 hour). Given that the NF-κB-direct HYP consists of only 155 genes, this suggests that there is a significant difference in the biology represented by the NF-κB-direct HYP scores in these two cases.

Uncertainty and specificity of NPA scores

Uncertainty estimates the confidence interval of each NPA statistic, and therefore also tests the nullity of the score accounting for the experimental error. The Specificity statistic gives a measure of whether the score is dependent on the expression of specific genes in the HYP, or is instead dependent on a particular property (the likelihood of modulation) of the ensemble of gene expressions in the HYP. Although this definition of Specificity is useful, there are some important points to ensure that Specificity is interpreted appropriately. First, a weak Specificity does not mean that the score fails to accurately characterize the amplitude of the process described by the HYP. Rather, it means that many other

Together, the Uncertainty and Specificity statistics enable the identification of non-specific and non-significant scores in HYPs when scored against unrelated perturbations. These statistics demonstrate that TNFα treatment of NHBE cells only has a significant effect on cell cycle progression when the dose is above 0.1 ng/mL, and that this effect takes two-to-four hours to appear. Also, these statistics support the conclusion that some NF-κB-regulated genes are upregulated at 6 and 24 hours after CDK-inhibition in HCT116 cells, but likely not at 4 hours or earlier.

Potential applications beyond the comparative assessment of biological impact

The NPA approach developed in this work aims at quantifying the treatment-induced perturbations of the biological processes described by causal network models. It enables the comparative assessment of the biological impact from high-throughput data in response to given stimuli. However the NPA approach could be also used in more exploratory perspectives. For instance, by applying NPA scoring to each HYP in a causal network model, rather than constructing and scoring a single aggregated HYP for the model (Figure

Finally, NPA scores could be used as a supplementary source of information in studying different types of mathematical models of regulatory networks. The fact that NPA scores provide quantitative measurements for the response of entities that are not explicitly measured or measureable can be exploited in the construction, calibration, or evaluation phases of such models. For instance, in the case of TNFα-treated NHBE cells considered in this study, the NPA scores provide direct quantitative measurements of the inflammatory response of the system, a quantity that would be difficult to access in the absence of the NF-κB nuclear translocation measurements performed in this work.

Conclusions

NPA is an integrated approach that combines high-throughput experimental data and a knowledge-driven HYP, which provides measurable quantities causally affected by a targeted biological process, to quantify the activity changes of that process relative to a control (non-perturbed) state of the system. The utility of the NPA method lies in the synergy of on-demand HYP generation from an extensive causal knowledgebase with a continuous measure of its activity change. Four NPA scoring methods with complementary strengths performed similarly in this study, but individually have the potential to wield distinct advantages for specific circumstances. Additionally, qualifying NPA statistics enabled effective use of these scoring metrics and can be applied to similar methods developed elsewhere. When applied to TNFα- and CDK inhibitor-treated cell microarray data, NPA scores for NF-κB and cell cycle networks correlated with expected dose response relationships and specific measured pathway outputs. NPA scoring also suggested possible cross-talk between NF-κB activation and the cell cycle that could be investigated experimentally. With a broad spectrum of biology available to score within the Selventa Knowledgebase, NPA metrics and statistics can be used to assess amplitude of perturbation on many orders, from a single molecule to that of a complex, higher-order causal network model representing complex biological processes. This approach enables a quantitative, systems-wide understanding of the biological mechanisms leading to diseases. This is the first step towards the development of computational tools designed to comparatively measure any perturbation, including exposure to toxic substances, the effects of drug treatment, or patient stratification by individual biology.

Methods

Experimental procedure

Normal human bronchial epithelial cells (Lonza Walkersville, Inc.) were cultured in standard growth medium (Clonetics medium, Lonza Walkersville, Inc.). Cells were either treated with TNFα (Sigma) or a vehicle control (HBSS), and then harvested after the desired treatment length (30 minutes, 2 hours, 4 hours, or 24 hours). Cells were immediately put on ice and split into three technical replicates from which total RNA was extracted using RNeasy™ Microkit (Qiagen). The processed RNA samples are then hybridized to Affymetrix U133 Plus 2.0 microarrays. Cell viability and cell counts were controlled for all conditions after 24 hours with CellTiter-Glo® assay (Promega). NF-κB nuclear translocation was measured using Cellomics NF-κB Activation HCS Reagent Kit (Thermo Scientific).

Data processing and algorithm implementation

Data processing and NPA methods were implemented in the R statistical environment ^{a}. An overall linear model was fit to the data for all groups of replicates, and specific contrasts of interest (comparisons of “treated” and “control” conditions) were evaluated to generate raw

Probe sets were matched to RNA Abundance nodes in the Selventa Knowledgebase (see below) using the

The Selventa Knowledgebase

The Selventa Knowledgebase is a comprehensive repository containing over 1.5 million nodes (biological concepts and entities) and over 7.5 million edges (assertions about causal and non-causal relationships between nodes). The assertions in the knowledgebase are derived from peer-reviewed scientific literature as well as other public and proprietary databases (Figure

Constructing HYPs and causal networks models

In addition to edges that enable the construction of HYPs (i.e., causal relationships between one upstream entity and its downstream measurables; Figure

The principles guiding the construction of the HYPs used in this work were pragmatic (e.g., the NF-κB-direct, TNF, and E2F1-direct HYPs, see Results). The downstream measurable nodes derived from the Selventa Knowledgebase in an automated manner are gene expressions (“exp(…)” nodes on Figure

Causal network models (e.g. the IKK/NF-κB signaling network model, see Results) are constructed by manually assembling causal relationships connecting HYP upstream entities derived from the Selventa Knowledgebase (Figure

It is important to note the difference between the knowledge-based causal network models used in this work and the networks constructed from expression data using network inference approaches

Constructing a HYP from a causal network model

A causal network model is composed of multiple causally-linked nodes that are biologically related, including HYPs

For a network model to be causally consistent, for an increase in any node in the model, it must be possible to unambiguously map a sign of “positive regulation” or “negative regulation” on every other node in the model by following the causal relationships that connect the nodes. For example, any model with negative feedback loops cannot be used to construct an aggregated HYP. Biological interpretation can be used to resolve ambiguities to construct causally consistent models by considering what process is being scored by the HYP, and in what sign each node is effectively related to the reference node. For example, the node where a negative feedback connects back to the model has a particular relationship with the process being scored, and although the negative feedback may regulate this node, it should not change this relationship. Thus, the connection between the negative feedback loop and this node can be removed from the model to obtain causal consistency in a manner that is congruent with our biological expectations. Such a biology-driven procedure to build causally consistent network models implies a careful definition of the network model boundaries. Resolving the potential sign ambiguities in its nodes involves additional refinements in delimiting the biological processes described by the model.

NPA scoring algorithms

Strength

The Strength NPA scoring method was developed as the simplest measure of HYP amplitude – the weighted mean of the measurement differences (here, gene log_{2} differential expressions) where the weighting factors are the signs of the measurable downstream entities in the HYP:

where _{i} is the log_{2} differential expression (i.e., the log_{2} fold change) of the ^{th} gene in the HYP, and _{i} is the sign (+1 for positive regulation and −1 for negative regulation) of the ^{th} gene in the HYP, and _{2} differential gene expressions with positive regulation in the HYP minus the sum of the log_{2} differential gene expressions with negative regulation in the HYP, divided by the total number of gene expressions in the HYP. Thus, a positive Strength score indicates that the HYP’s downstream gene expressions and their signs of regulation are matched within the data, and the process described by the HYP is upregulated in the treated condition compared to the control condition. A zero Strength score indicates that the process is unchanged, and a negative Strength score indicates that the process is downregulated.

The Strength scoring method uses data for all measured downstream gene expressions in the HYP, regardless of data quality. However, the differential expression values used by the Strength method may be dominated by noise for low absolute measures of the control condition, and thus may provide unreliable data (which should be evidenced by high uncertainty). The Strength method assumes that this noise is evenly distributed and that there are a sufficient number of measurable downstreams in the HYP such that measurement noise is averaged out across all downstreams.

Geometric perturbation index

A HYP can be seen as a unit sign vector _{2} expression vector _{i} _{i}) _{i} are obtained from the raw ** ŝ|β**›

Note that Strength and GPI are closely related – GPI is normalized by _{2} expression with false non-discovery rate, individual differential expression values for which there is little confidence are moved closer to zero (no change), while values for which there is stronger confidence are minimally decreased. A positive GPI score indicates an upregulation of the process described by the HYP, a zero GPI score indicates that the process is unchanged along the direction

Measured abundance signal score

Both Strength and GPI quantify log_{2} differential values (i.e., log_{2} fold changes) of measurements in the HYP. However, there may be cases where an absolute change in mRNA, protein, or some other measurable physical quantity is a better measure of the biological effect on the process represented by a HYP. For example, an increase from 1 to 10 copies of an mRNA transcript (an absolute increase of 9 transcripts) may be less significant than an increase from 10 to 100 copies of the same transcript (an absolute increase of 90 transcripts). An NPA scoring method was devised to quantify absolute changes in entities that represent physical quantities, Measured Abundance Signal Score (MASS):

where _{i} is the measurement (not in log_{2} scale) for the ^{th} downstream measurable (here, gene expression) in the treated sample, and _{i} is the measurement (not in log_{2} scale) for the ^{th} downstream gene expression in the control sample. The numerator of MASS represents the change in the absolute downstream gene expression quantities in a direction supporting an increase in the process described by the HYP. The denominator of MASS represents the average of the total absolute quantity of the downstream gene expressions. Thus, MASS can be thought of as quantifying the absolute change in the downstream gene expressions (corrected for the predicted sign _{i} of each downstream in the HYP) compared to the total quantity of the downstreams. Rather than use the total quantity of the downstreams in either the treated or control condition alone, the average of the treated and control conditions was used to ensure that the MASS scoring method is symmetric about the experimental contrast (i.e., MASS(treated versus control) = −MASS(control versus treated)).

The MASS method is applicable to any measurement technique that quantifies physical measurables in a manner such that measurements are proportional to absolute quantities across all measurable entities (i.e., the measurements for different entities can be compared directly). It should be noted that for microarrays, this implementation is a proof of concept as there is only a loose correlation between the signal for different probe sets and the absolute amount of the transcripts

Expected perturbation index

The Expected Perturbation Index (EPI) can be thought of as a smoothed version of the GPI. The rationale is to consider a HYP as a random variable over a compact interval [−_{2} scale, typically _{HYP} of the HYP: for any _{i}_{i} above

The normalization of the measure _{HYP}_{HYP} between [−_{HYP} defined above:

The EPI expresses the fact that high correctly predicted differential expressions _{i}_{i} provide stronger evidence for the HYP than low ones and will therefore receive higher weights. The implementation is done by using the method of rectangle on the differential expression values, leading to the following formula to compute the EPI:

where _{i} is _{i}_{i}, _{+} and _{−} are the number of differential log_{2} expressions positively and negatively predicted by the HYP, respectively, subscripts _{i}_{i}_{,}, and _{0}

Note that the EPI is, in absolute value, bounded by

NPA scoring statistics

Uncertainty

Each NPA score is a random variable. As such, the statistical significance of the NPA scores can be assessed by estimating confidence intervals around the score, or, equivalently, by evaluating the null hypothesis of the score equaling zero at a given type-I error risk

The theoretical distribution of the differential log_{2} expression is deduced from the estimation and test procedure used. In the case of t-statistics or moderated t-statistics (e.g., produced from a linear model by the _{i}’s is assumed to be normal with variance _{i}^{2} (having _{i} degrees of freedom).

In this context, Strength becomes a random variable consisting of a weighted sum of independent (approximately) normal distributions. As a consequence, the distribution of the Strength statistic is (approximately) a normal distribution itself, with variance _{Strength}^{2} ^{2}_{i}^{2}. Hence, a t-statistic _{Strength} can be derived, whose degrees of freedom

In the case of the GPI, the situation is almost identical to the Strength, except for the additional dependence on the weighting factors _{i}. In turn, _{i} depends on the non-adjusted _{i}, _{i} _{i}/_{i}. The key step toward a test statistic for the GPI is the estimation of the variance of GPI. To this end, the variance is estimated through a first-order Taylor expansion to obtain:

where:

Using the central limit theorem, an approximate t-statistic is derived as _{GPI}, whose degrees of freedom

In the partial derivatives of the GPI with respect to _{i}, the derivative of _{i} = 1- _{i} involves two terms: one involving the derivative of the Benjamini-Hochberg adjustment factor (_{i}_{i}), multiplied by _{i} (the ^{th} gene), and the other one involving the derivative of _{i}. The former is assumed to be zero, because the adjustment factor (_{i}_{i}) is important only when the _{i} are small. The latter is computed analytically. Hence the (1-

In the case of MASS and EPI, a parametric bootstrapping of the distributions of _{i} was used, taking advantage of the normality of the estimators _{i} deduced from the statistical approach used to compute the differential gene expressions (see above). As the assumption for the application of percentile bootstrapping seems to be violated for MASS and EPI, the confidence interval was estimated using the bias-corrected percentile method

Specificity

It is important to consider whether the computed NPA score is specific to the HYP of interest, or is a general property of the entire data set. For example, a score that indicated a two-fold increase in a given process holds less meaning if all measurements in the entire data set also increased two-fold. Thus, the Specificity statistic is computed as a means to identify scores that can be attributed with high probability to the specific biological entity or process represented by the HYP. Specificity is computed by assessing the likelihood of the following null hypothesis: “The amplitude score is not representative of the specific HYP, but instead is representative of a general trend in the data set that can be measured by any HYP that is comparable to the HYP of interest.” The first step to computing Specificity is to construct a set of comparable HYPs (see below). Next, an amplitude score is computed for each of these HYPs using the same data set. Finally, Specificity is computed as a two-tailed

**Computing Specificity statistics.** The histogram of MASS scores for HYPs comparable to the NF-κB-direct HYP (1 ng/mL TNFα treatment of NHBE cells for 0.5 hour). The top histogram resulted from selecting measurables at random from all measurables (Specificity

Click here for file

The key to computing the Specificity statistic is constructing relevant comparable HYPs. A simple comparable HYP could be composed of downstream measurable entities selected at random from the set of all measurable entities to produce a HYP of the same size as the HYP of interest. However, for some data sets including gene expression microarrays, many of the measurable entities are highly unlikely to change under any given circumstance (for example, genes whose expression are measured with ineffective probe sets). The HYP of interest will likely contain fewer of these entities than any “comparable” HYP because these entities are less likely to be affected by the process of interest, and are thus unlikely to be included in the HYP of interest. Therefore, a comparable HYP constructed by random sampling from all measurable entities will be biased towards showing a weaker (or no) change. This makes the amplitude score for the HYP of interest appear more unlikely to have occurred by chance given the null hypothesis – and thus be more specific – than might have otherwise been expected (Additional file

The large body of causal knowledge available was used to construct more relevant comparable HYPs for transcriptomic data by first identifying the number of upstream controllers (distinct entities upstream of a gene in causal statements in the knowledgebase) for each gene in the entire data set, including the genes in the HYP. The number of upstream controllers reflects the number of different experiments or perturbations that caused the gene to be modulated, and acts as a naïve estimate for the likelihood of each gene being modulated in the current experiment. For example, a gene that is only regulated under very specific circumstances is unlikely to be modulated in many data sets that are curated in the knowledgebase. Thus, only a few entities that causally regulate the gene will exist in the knowledgebase. In contrast, a gene whose expression is modulated by a large number of experimental perturbations is likely to be modulated in many data sets that are represented in the knowledgebase, and thus the knowledgebase will likely contain knowledge of many entities that causally regulate the gene. Therefore, a comparable HYP is constructed by replacing each gene in the original HYP with another measured gene with a similar number of upstream controllers.

In order to accomplish this, all measured genes were ranked based on their number of upstream controllers in the knowledgebase, and divided into cadres of a fixed number of measurables. To avoid having a cadre containing only a few measurables (for example, when the number of measurables is not divisible by the desired cadre size), the cadre that contained measurables with the fewest number of upstream controllers was allowed to have more measurables than the other cadres. For example, a cadre size of 100 measurables was used, so the cadre that contained measurables with the fewest number of upstream controllers had between 100 and 199 members. Comparable HYPs were constructed by swapping each measurable in the original HYP for another measurable from within the same cadre. In this manner, the comparable HYPs were the same size as the original HYP and contained the same distribution of frequently and infrequently modulated measurables. This process enabled the construction of alternative HYPs that were as comparable as possible to the HYP of interest, given the knowledge available in the knowledgebase.

NPA scores were computed for each comparable HYP, and these scores were sorted into ascending order. The fraction of scores that were greater than and less than the score for the original HYP were counted, and the lesser of these two values was doubled to arrive at a two-tailed

Comparable HYPs constructed in this manner provide a more stringent test to assess the Specificity of amplitude scores. Given the benefits offered by this method of computing Specificity, it was carried forward for this study.

Endnotes

^{a} The gene expression data used in this publication have been deposited in ArrayExpress and are accessible through accession number E-MTAB-1027

Competing interests

The authors declare that they have no competing financial or other interests in relation to the work described in this manuscript.

Authors’ contributions

FM, TMT, AS, and DAD lead the development, implementation, and application of the methodology, the interpretation of the results, and the preparation of the manuscript. CM and DW designed and supervised the experiment, and interpreted the experimental results. DP conceived the NPA approach and developed the methodology. JH developed the overarching strategy and supervised the project. MCP conceived the NPA approach and developed the overarching strategy. All authors read and approved the final manuscript.

Acknowledgements

The authors would like to thank Stephan Gebel and Nadine Schracke and their laboratory technicians Christina Berger, Marek Franitza, Torsten Wilms and Jessica Buechel for the excellent execution of the nucleic acid research experiments, Andrea L. Matthews and Michael J. Maria for their role in project management and support with preparation of the manuscript, Renee D. Kenney for critical discussions and support with preparation of the manuscript, Dmitry Vasilyev for verifying the uncertainty calculations, Svenja Diehl, Thomas Müller, and Vincenzo Belcastro for reviewing the manuscript, Sam Ansari for reviewing the manuscript and generating the MAGE-TAB files for the ArrayExpress submission, and Lynda Conroy for editorial support.