Division Medical Biometry, Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center Johannes Gutenberg University Mainz, Obere Zahlbacher Str. 69, 55131 Mainz, Germany

Abstract

Background

In network meta-analyses, several treatments can be compared by connecting evidence from clinical trials that have investigated two or more treatments. The resulting trial network allows estimating the relative effects of all pairs of treatments taking indirect evidence into account. For a valid analysis of the network, consistent information from different pathways is assumed. Consistency can be checked by contrasting effect estimates from direct comparisons with the evidence of the remaining network. Unfortunately, one deviating direct comparison may have side effects on the network estimates of others, thus producing hot spots of inconsistency.

Methods

We provide a tool, the net heat plot, that renders transparent which direct comparisons drive each network estimate and displays hot spots of inconsistency: this permits singling out which of the suspicious direct comparisons are sufficient to explain the presence of inconsistency. We base our methods on fixed-effects models. To disclose potential drivers, the plot shows the contribution of each direct estimate to each network estimate, derived from regression diagnostics. In combination, we show heat colors corresponding to the change in agreement between direct and indirect estimates when relaxing the assumption of consistency for one direct comparison. A clustering procedure is applied to the heat matrix in order to find hot spots of inconsistency.

Results

The method is shown to work with several examples, which are constructed by perturbing the effect of single study designs, and with two published network meta-analyses. Once the possible sources of inconsistencies are identified, our method also reveals which network estimates they affect.

Conclusion

Our proposal is seen to be useful for identifying sources of inconsistencies in the network together with the interrelatedness of effect estimates. It opens the way for a further analysis based on subject matter considerations.

Background

Evidence from various treatment comparisons in different randomized trials can be combined by a network meta-analysis. This method not only aggregates evidence from direct comparisons, but also involves indirect comparisons, i.e. inferences on relative effects for contrasts that may or may not have been observed directly.

In this context, inconsistency means disagreement between direct and indirect evidence, which can occur in addition to heterogeneity between studies with the same treatment arms. A network meta-analysis can be visualized by a graph, whereby the set of nodes corresponds to the considered treatments and the edges display the treatment comparisons of all included trials. If the treatment effect estimates obtained along different connections between two treatments, so-called paths, differ, there is inconsistency. Since alternative paths between the same two treatments share their start and end points, they form closed loops, and inconsistency can only be detected in such network loops.

In the following, we therefore provide methods for identifying such hot spots, which might consist of loops, parts of loops or even just single comparisons. We also investigate the influence of individual comparisons on the network estimates that might drive further perturbation and invalid network estimates due to the network design.

Different approaches to assess inconsistency have been discussed. The series of Technical Support Documents produced by the NICE Decision Support Unit

Finally, consistency can be assessed by comparing a model that satisfies only some consistency restrictions (or no restrictions at all) with the consistency model. The node-splitting method

Lu and Ades

In this paper, we will define another global chi-squared test for inconsistency that results from comparing a fixed-effects model for inconsistency with a consistency model. It will emerge as part of the decomposition of Cochran’s Q

Once inconsistency has been assessed globally, means are needed to find its sources. Senn et al.

In the following, we systematically develop a graphical tool for highlighting hot spots of inconsistency by considering the detailed change in inconsistency when detaching the effect of studies with the same treatment arms. Furthermore, we identify drivers for the network estimates. Highlighting of inconsistency will provide more information than just singling out inconsistent loops. We provide a matrix display that summarizes network drivers and inconsistency in two dimensions, such that it may be possible to trace inconsistency back to single deviating direct comparisons. Naturally, it is difficult to display detailed network properties in just two dimensions, but we propose a clustering approach that automatically groups comparisons for highlighting hot spots.

Section “Methods” provides a detailed description of the different building blocks of our proposal: We present a fixed-effects model for network meta-analyses within the framework of general linear models with known variances in Section “Parameterization and two-stage analysis of a fixed-effects model in network meta-analysis”. Based on this model, we discuss the resulting hat matrix in Section “Identifying drivers via the hat matrix”, which we use as an instrument for identifying drivers. We suggest using a chi-squared statistic for the heterogeneity in the network, which we decompose into a test statistic for the inconsistency and a test statistic for the heterogeneity within groups of studies, classified according to which treatments are involved. A graphical tool that visualizes the network drivers and inconsistency hot spots is given in Section “Identifying hot spots of inconsistency”. Specifically, we use the inconsistency information along with detaching of single component meta-analyses to locate inconsistency hot spots. All the steps in Section “Methods” are illustrated using artificial examples. Section “Results” then provides results for two published network analyses. Finally, we discuss our methods and results in Section “Discussion”, and we provide concluding remarks in Section “Conclusions”.

Methods

In the following, we provide a fixed-effects model for network meta-analyses, on which we base our further analysis. We present tools to identify hot spots of inconsistency in the network and drivers with a high impact on network estimates. Using these two tools, we provide a graphical display to locate potential sources of inconsistency.

Parameterization and two-stage analysis of a fixed-effects model in network meta-analysis

We consider a network meta-analysis with _{0},…,_{
T
}, under which _{0} represents a reference treatment. A total of _{
s
} and by a design index _{
d
} different treatments.


**Network design and hat matrix of an illustrative network meta-analysis.** In **a**), the network design of an illustrative example is given: six treatments and eight different observed designs based on two-armed studies. The nodes correspond to the treatments, and the edges show which treatments are directly compared. The thickness of an edge represents the inverse standard error
In **b**), the resulting hat matrix at the design level is given in percent, which indicates the contribution of the direct estimate in design

For a fixed-effects analysis, this network can be written in matrix notation as the following general linear model with heteroscedastic sampling variances:

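$$Y = X\theta^{net} + \varepsilon, \qquad \varepsilon \sim N(0, V),$$
where $\theta^{net}$ denotes the vector of basic parameters (in terminology of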

For exemplifying the model components, we consider a simple example of a network meta-analysis with three treatments _{0},_{1},_{2} (_{1} versus _{0} (_{2} versus _{0} (_{2} versus _{1} (_{1} versus _{0} (_{2} versus _{0} (_{
s
} be the observed effect and _{
s
} is the corresponding sampling variance in study

The vector of the basic parameters $\theta^{net}$ can be estimated in a classical frequentist manner by generalized least squares as follows:

$$\hat\theta^{net} = (X^\top V^{-1} X)^{-1} X^\top V^{-1} Y,$$

which is sometimes referred to as the Aitken estimator
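To make the estimator concrete, here is a minimal sketch in Python with NumPy. The software choice is an assumption; the text does not prescribe one for this step. The function name `aitken_estimate` and all numbers are hypothetical. The three rows mirror the comparisons of the three-treatment example, with treatment 0 as reference.

```python
import numpy as np

def aitken_estimate(X, y, V):
    """GLS (Aitken) estimate (X' V^-1 X)^-1 X' V^-1 y and its covariance."""
    V_inv = np.linalg.inv(V)
    cov = np.linalg.inv(X.T @ V_inv @ X)  # covariance of the estimate
    theta = cov @ X.T @ V_inv @ y
    return theta, cov

# Hypothetical network with treatments 0, 1, 2 (0 is the reference).
# Basic parameters: effects of 1 vs 0 and 2 vs 0. The comparison 2 vs 1
# is their difference, hence the row (-1, 1).
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, 1.0]])
y = np.array([0.5, 1.0, 0.4])     # hypothetical observed effects
V = np.diag([0.04, 0.09, 0.05])   # hypothetical known sampling variances

theta_hat, cov_theta = aitken_estimate(X, y, V)
network_estimates = X @ theta_hat  # consistent estimates of all 3 contrasts
print(theta_hat, network_estimates)
```

If the observed effects are already consistent (for example `y = X @ [0.5, 1.0]`), the estimator returns exactly those basic parameters.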

This estimation can equivalently be performed in two steps (as discussed in

Thus, evidence of all studies with the same treatment arms

with $E(\varepsilon_a)=0$ and $\mathrm{Cov}(\varepsilon_a)=:V_a$. The covariance matrix $V_a$ is block-diagonal with one block per design, and $X_a$ is the compressed design matrix containing one set of rows for each design. In the case of two-armed studies, $X_a$ is formed by stacking, for each type of design, one row of the form used in $X$. In the example above we have:

$$X_a = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ -1 & 1 \end{pmatrix}.$$
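For two-armed studies, the one-step and two-step analyses give the same estimate, and this can be checked numerically. The sketch below uses Python with NumPy; the study data and helper names are hypothetical. It pools the studies of each design by inverse-variance weighting and then fits the compressed model, which reproduces the one-step GLS estimate.

```python
import numpy as np

def gls(X, y, V):
    """Aitken/GLS estimate (X' V^-1 X)^-1 X' V^-1 y."""
    W = np.linalg.inv(V)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Hypothetical two-armed studies: (design, observed effect, sampling variance).
# Design "0:1" means treatment 1 versus reference treatment 0.
studies = [
    ("0:1", 0.40, 0.10), ("0:1", 0.60, 0.20),
    ("0:2", 1.10, 0.15),
    ("1:2", 0.30, 0.10), ("1:2", 0.55, 0.25),
]
design_row = {"0:1": [1.0, 0.0], "0:2": [0.0, 1.0], "1:2": [-1.0, 1.0]}

# One step: every study is a separate observation.
X = np.array([design_row[d] for d, _, _ in studies])
y = np.array([e for _, e, _ in studies])
V = np.diag([v for _, _, v in studies])
theta_one_stage = gls(X, y, V)

# Step 1: inverse-variance (fixed-effect) pooling within each design.
designs = sorted(design_row)
y_a, v_a = [], []
for d in designs:
    w = np.array([1.0 / v for dd, _, v in studies if dd == d])
    e = np.array([eff for dd, eff, _ in studies if dd == d])
    y_a.append(np.sum(w * e) / np.sum(w))  # pooled direct estimate
    v_a.append(1.0 / np.sum(w))            # its variance
# Step 2: GLS on the compressed model with one row per design.
X_a = np.array([design_row[d] for d in designs])
theta_two_stage = gls(X_a, np.array(y_a), np.diag(v_a))

print(theta_one_stage, theta_two_stage)
```

The two estimates agree because, for two-armed studies under the fixed-effects model, the pooled direct estimates and their variances carry all the information the network fit needs.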

Multi-armed studies

We distinguish each set of multi-armed studies sharing the same set of treatments as a different design. That means that if we add a three-armed study for _{2} versus _{1} versus _{0} to the example above, we consider a further design (

Since the effects observed in one multi-armed study cannot be inconsistent, we use one design-specific treatment as a study reference for each multi-armed study, e.g. _{0} in all studies comparing _{2} versus _{1} versus _{0}. Then, a study with _{
s
} of _{
s
}=(_{0:1},_{0:2})’ of comparison _{1} versus _{0} and comparison _{2} versus _{0}. Furthermore, the multi-armed study gives _{
s
} of size _{
a
} contains then
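The sketch below shows one way a multi-armed study can enter the model. It forms the contrasts of each arm against the study reference arm, together with their covariance matrix. It assumes independent arm-level estimates with known variances, so all contrasts of a study share the variance of the reference arm as their covariance. The arm-level inputs and the function name are hypothetical (Python with NumPy).

```python
import numpy as np

def multiarm_contrasts(arm_means, arm_vars):
    """Contrasts of every arm against the first (reference) arm of one study.

    Assumes independent arm-level estimates; returns the n_s - 1 contrasts
    and their (n_s - 1) x (n_s - 1) covariance matrix.
    """
    arm_means = np.asarray(arm_means, dtype=float)
    arm_vars = np.asarray(arm_vars, dtype=float)
    y = arm_means[1:] - arm_means[0]
    k = len(arm_means) - 1
    # Var(y_j) = var_j + var_ref and Cov(y_j, y_k) = var_ref for j != k,
    # because all contrasts share the same reference arm.
    cov = np.full((k, k), arm_vars[0]) + np.diag(arm_vars[1:])
    return y, cov

# Hypothetical three-armed study of treatments 0, 1 and 2 (0 = study reference).
y_s, V_s = multiarm_contrasts([1.0, 1.5, 2.2], [0.04, 0.05, 0.06])
print(y_s)  # contrasts 1 vs 0 and 2 vs 0
print(V_s)
```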

Identifying drivers via the hat matrix

In linear models, the hat matrix contains the linear coefficients that present each predicted outcome as a function of all observations. Its diagonal elements are known as leverages. They summarize the importance of the respective observation for the whole estimation. Observations with both high leverage and large residual are recognized as being highly influential

In the context of network meta-analyses and model (5), the hat matrix is:

$$H = X_a (X_a^\top V_a^{-1} X_a)^{-1} X_a^\top V_a^{-1}.$$

Its rows are the linear coefficients of

In network meta-analyses, the diagonal elements of

As an illustration of the hat matrix, we use an example of a network meta-analysis with six treatments and eight two-armed designs ($n_d=2$ for all designs), in which the inverse standard error of every direct estimate is equal to one ($V_a=I_8$, where $I_8$ is the identity matrix of size eight). For one design there might, for example, be one study with

The diagonal squares indicate that the network estimates are predominantly driven by their corresponding direct estimates, all by more than 50%. The diagonal squares are largest for the edges 1:6 and 3:4, which connect the two triangles: their direct estimates drive 70% of their network estimates. The smallest diagonal squares are seen for the edges 1:3 and 4:6 (direct estimates drive 53%), since each of these is paralleled by two independent indirect paths, whereas 1:6 and 3:4 are paralleled by only one. Inspecting the off-diagonal squares, we learn that aside from its direct estimates, the network estimates
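The percentages above can be reproduced with a minimal sketch in Python with NumPy. It computes the design-level hat matrix $H = X_a (X_a^\top V_a^{-1} X_a)^{-1} X_a^\top V_a^{-1}$ for the six-treatment example with unit variances. The design list (triangles 1-2-3 and 4-5-6 joined by 1:6 and 3:4) is read off the description of the example. The helper `design_matrix` and the choice of treatment 1 as reference are illustrative assumptions.

```python
import numpy as np

# Two triangles (1, 2, 3) and (4, 5, 6) joined by the designs 1:6 and 3:4.
designs = [(1, 2), (1, 3), (1, 6), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
n_treat = 6

def design_matrix(designs, n_treat):
    """Compressed design matrix X_a with treatment 1 as reference.

    The row for design i:j codes theta_j - theta_i, where theta_1 = 0, so the
    columns are the basic parameters of treatments 2..n_treat.
    """
    X = np.zeros((len(designs), n_treat - 1))
    for row, (i, j) in enumerate(designs):
        if i > 1:
            X[row, i - 2] -= 1.0
        if j > 1:
            X[row, j - 2] += 1.0
    return X

X_a = design_matrix(designs, n_treat)
V_inv = np.eye(len(designs))  # inverse of V_a = I_8
H = X_a @ np.linalg.inv(X_a.T @ V_inv @ X_a) @ X_a.T @ V_inv

for (i, j), h in zip(designs, np.diag(H)):
    print(f"{i}:{j}  {100 * h:.0f}%")
```

It prints 63% for 1:2, 2:3, 4:5 and 5:6, 53% for 1:3 and 4:6, and 70% for 1:6 and 3:4. The diagonal elements sum to the number of basic parameters (five).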

Identifying hot spots of inconsistency

Decomposition of Cochran’s Q

An important aspect of meta-analysis is the investigation of statistical heterogeneity. In network meta-analysis, inconsistency arises as another aspect of heterogeneity. In a classical meta-analysis comparing two treatments, Cochran’s Q

To examine the heterogeneity of the whole network in more detail, particularly considering the inconsistency in the model, we decompose the $Q^{net}$ statistic into two parts (similar to

The first is a sum of within-design Q statistics

The second is a between-designs Q statistic

The heterogeneity of the whole network can thus be attributed to the heterogeneity between studies with the same design, $Q^{het}$ (composed of the design-specific statistics $Q^{het}_d$), and to the inconsistency between designs, $Q^{inc}$. Under the null hypothesis of both homogeneity and consistency, all Q statistics ((7), (9), (10), (11)) are approximately chi-squared distributed, with the respective degrees of freedom given in Table
$Q^{inc}$ are identical to those defined in
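The decomposition can be verified numerically. This sketch uses Python with NumPy; the study data and the function name are hypothetical, and all studies are two-armed. It computes $Q^{net}$ from the study-level fit, $Q^{het}$ as the sum of the within-design statistics around the pooled direct estimates, and $Q^{inc}$ from the design-level fit. The first equals the sum of the other two.

```python
import numpy as np

def q_decomposition(study_design, y, v, design_rows):
    """Split Q^net into Q^het (within designs) and Q^inc (between designs).

    Fixed-effects model with known variances v; two-armed studies only.
    study_design: design label of each study; design_rows: X row per design.
    """
    y, v = np.asarray(y, float), np.asarray(v, float)
    X = np.array([design_rows[d] for d in study_design])
    W = np.diag(1.0 / v)
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    q_net = float(np.sum((y - X @ theta) ** 2 / v))

    designs = sorted(set(study_design))
    q_het, y_a, v_a = 0.0, [], []
    for d in designs:
        idx = [k for k, dd in enumerate(study_design) if dd == d]
        w = 1.0 / v[idx]
        pooled = np.sum(w * y[idx]) / np.sum(w)           # direct estimate
        q_het += float(np.sum(w * (y[idx] - pooled) ** 2))  # Q^het_d
        y_a.append(pooled)
        v_a.append(1.0 / np.sum(w))
    y_a, v_a = np.array(y_a), np.array(v_a)
    X_a = np.array([design_rows[d] for d in designs])
    W_a = np.diag(1.0 / v_a)
    theta_a = np.linalg.solve(X_a.T @ W_a @ X_a, X_a.T @ W_a @ y_a)
    q_inc = float(np.sum((y_a - X_a @ theta_a) ** 2 / v_a))
    return q_net, q_het, q_inc

rows = {"0:1": [1.0, 0.0], "0:2": [0.0, 1.0], "1:2": [-1.0, 1.0]}
labels = ["0:1", "0:1", "0:2", "0:2", "1:2"]
q_net, q_het, q_inc = q_decomposition(
    labels, [0.4, 0.9, 1.1, 1.3, 0.2], [0.1, 0.2, 0.15, 0.1, 0.25], rows)
print(q_net, q_het, q_inc)
```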

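In a network with $T+1$ treatments, $n_s$ and $n_d$ denote the number of treatments in study $s$ and in design $d$, respectively.

| Null hypothesis | Q statistic | Degrees of freedom |
|---|---|---|
| Homogeneity in the whole network | $Q^{net}$ | $\sum_s (n_s-1) - T$ |
| Homogeneity within designs | $Q^{het}$ | $\sum_s (n_s-1) - \sum_d (n_d-1)$ |
| Homogeneity within design $d$ | $Q^{het}_d$ | $\sum_{s \in d} (n_s-1) - (n_d-1)$ |
| Consistency between designs | $Q^{inc}$ | $\sum_d (n_d-1) - T$ |
| Consistency between designs after detaching the effect of design $d$ | $Q^{inc}$ of model (12) | $\sum_{d'} (n_{d'}-1) - T - (n_d-1)$ |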
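The degrees of freedom can be counted directly from the numbers of arms. The small sketch below is plain Python; the function name is hypothetical. It reproduces the values 18, 7 and 11 reported for the diabetes example, using only counts given in the Results: 25 two-armed studies and one three-armed study, 14 two-armed designs plus one three-armed design, and ten treatments.

```python
def q_degrees_of_freedom(study_arms, design_arms, n_treatments):
    """Degrees of freedom of Q^net, Q^inc and Q^het.

    study_arms:   number of treatments n_s of every study
    design_arms:  number of treatments n_d of every distinct design
    n_treatments: T + 1, so there are T basic parameters
    """
    T = n_treatments - 1
    study_contrasts = sum(n - 1 for n in study_arms)
    design_contrasts = sum(n - 1 for n in design_arms)
    return {
        "net": study_contrasts - T,
        "inc": design_contrasts - T,
        "het": study_contrasts - design_contrasts,
    }

# Diabetes example: 25 two-armed studies plus one three-armed study,
# 14 two-armed designs plus one three-armed design, ten treatments.
print(q_degrees_of_freedom([2] * 25 + [3], [2] * 14 + [3], 10))
```

With the antidepressant counts (109 two-armed and two three-armed studies, 42 two-armed designs plus one three-armed design, twelve treatments, i.e. $T=11$), the same function gives 102, 33 and 69.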

For example, for the network design in Figure

In real applications, the power may be small
$Q^{inc}$ or $Q^{het}$. That is why inconsistency and heterogeneity must be considered jointly.

As network estimates, we obtain in the example
$Q^{inc}=3.36+1.78+0.25+3.36+0.25+0.03+0.11+0.03=9.17$ results, which is chi-squared distributed with $8-5=3$ degrees of freedom. Since there cannot be heterogeneity between studies, in this example $Q^{inc}$ and
$Q^{net}$ and
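The summands above can be reproduced with a few lines of Python and NumPy. The data are an assumption: all designs have unit variance and effect zero, except design 1:2, which is shifted by 5. With unit variances, the shifted design contributes $5^2(1-h)^2$ with leverage $h=19/30$. This choice reproduces all eight summands and $Q^{inc}=9.17$ in the order 1:2, 1:3, 1:6, 2:3, 3:4, 4:5, 4:6, 5:6. The helper name is hypothetical.

```python
import numpy as np

# Six-treatment example (two triangles joined by 1:6 and 3:4), unit variances.
designs = [(1, 2), (1, 3), (1, 6), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]

def design_matrix(designs, n_treat=6):
    """Row for design i:j codes theta_j - theta_i; treatment 1 is the reference."""
    X = np.zeros((len(designs), n_treat - 1))
    for r, (i, j) in enumerate(designs):
        if i > 1:
            X[r, i - 2] -= 1.0
        if j > 1:
            X[r, j - 2] += 1.0
    return X

X_a = design_matrix(designs)
v_a = np.ones(len(designs))  # V_a = I_8
y_a = np.zeros(len(designs))
y_a[0] = 5.0                 # assumption: design 1:2 shifted by 5, all else 0

W = np.diag(1.0 / v_a)
theta = np.linalg.solve(X_a.T @ W @ X_a, X_a.T @ W @ y_a)
pearson = (y_a - X_a @ theta) / np.sqrt(v_a)  # Pearson residuals
contrib = pearson ** 2                         # summands of Q^inc
df = len(designs) - X_a.shape[1]               # 8 - 5 = 3
for (i, j), c in zip(designs, contrib):
    print(f"{i}:{j}  {c:.2f}")
print(f"Q_inc = {contrib.sum():.2f} on {df} df")
```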

If some of the component meta-analyses are heterogeneous, the others can still validly be tested by their
$Q^{inc}$ still has an interpretation in this case: the direct estimates are estimates of the inverse-variance-weighted averages of different, true but unknown, study-specific treatment effects. Then, $Q^{inc}$ with the same reference distribution provides a valid test of the hypothesis of consistency of these averaged treatment effects.

Detaching a single design

Once inconsistency is indicated by a large $Q^{inc}$, formula (11) can be used to assess the contribution of each component meta-analysis of design $d$, since $Q^{inc}$ is the sum of quadratic forms of residuals over all designs. For simple comparisons between two treatments, the summands are squared Pearson residuals. Unfortunately, a deviating effect of one component meta-analysis can simultaneously inflate several residuals. Therefore, we fit a set of extended models allowing for a deviating effect of each study design in turn and recalculate the

More formally, we modify model (5) by inserting $n_d-1$ new parameters for the detached design $d$, which corresponds to adding $n_d-1$ columns to the design matrix for a design with $n_d$ treatments. Each additional column corresponds to one of the non-reference treatments. We have the following model:

with the remaining terms as in model (5). In this model, the parameters $\theta^{net}$ capture all network evidence without the information from studies with design

that is chi-squared distributed with

of length

For illustration purposes, we successively introduce one new parameter for each of the eight possible detachments of one component meta-analysis into the inconsistent network example from Section “Decomposition of Cochran’s Q” corresponding to Figure
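The eight detachments can be sketched as follows, again assuming the unit-variance example with a shift of 5 in design 1:2 (Python with NumPy; helper names are hypothetical). Detaching a design adds an indicator column, so that this design receives its own effect. Using `numpy.linalg.lstsq` on the whitened system keeps the fitted values well defined even if the extended design matrix loses rank.

```python
import numpy as np

designs = [(1, 2), (1, 3), (1, 6), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]

def design_matrix(designs, n_treat=6):
    """Row for design i:j codes theta_j - theta_i; treatment 1 is the reference."""
    X = np.zeros((len(designs), n_treat - 1))
    for r, (i, j) in enumerate(designs):
        if i > 1:
            X[r, i - 2] -= 1.0
        if j > 1:
            X[r, j - 2] += 1.0
    return X

def squared_pearson(X, y, v):
    """Squared Pearson residuals of the weighted least-squares fit of y on X."""
    s = np.sqrt(v)
    # lstsq yields unique fitted values even if X is rank deficient
    beta, *_ = np.linalg.lstsq(X / s[:, None], y / s, rcond=None)
    return ((y - X @ beta) / s) ** 2

X_a = design_matrix(designs)
v_a = np.ones(len(designs))
y_a = np.zeros(len(designs))
y_a[0] = 5.0  # assumed shift in design 1:2

detached = {}
for k, d in enumerate(designs):
    # Extra column: design k gets its own (deviating) effect.
    indicator = np.zeros((len(designs), 1))
    indicator[k, 0] = 1.0
    X_ext = np.hstack([X_a, indicator])
    q = squared_pearson(X_ext, y_a, v_a).sum()
    df = len(designs) - np.linalg.matrix_rank(X_ext)
    detached[d] = (q, df)
    print(f"detach {d[0]}:{d[1]}  Q_inc = {q:.2f}  df = {df}")
```

The printed values (0.00, 5.36, 8.33, 0.00, 8.33, 9.09, 8.93, 9.09, each with 2 degrees of freedom) agree with column a) of the table of detached Q statistics.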

Finally, to locate the inconsistency in the network, we compare the remaining inconsistency after exclusion of design ^{′}=1,⋯,

Here,

is the summand in ^{inc} belonging to design ^{′}=

In the example, holding out design 1:2 results in a perfect fit of model (12) and we obtain

The net heat plot

For a graphical inspection of network inconsistency, we use a color visualization of the square matrix
$Q^{inc}$ statistic, the corresponding diagonal elements of the plot have non-blue colors. Warm colors on the off-diagonal of the plot indicate that a detachment of the component meta-analysis with design
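Here is a minimal sketch of the matrix behind the net heat plot, under the same assumed data as in the unit-variance example (shift of 5 in design 1:2). The entry in row $d'$ and column $d$ is the summand of design $d'$ in $Q^{inc}$ minus the same summand after detaching design $d$. Positive values (warm colors) therefore mean a reduction, and negative values an increase. This sign convention and the helper names are assumptions of the sketch (Python with NumPy).

```python
import numpy as np

designs = [(1, 2), (1, 3), (1, 6), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]

def design_matrix(designs, n_treat=6):
    """Row for design i:j codes theta_j - theta_i; treatment 1 is the reference."""
    X = np.zeros((len(designs), n_treat - 1))
    for r, (i, j) in enumerate(designs):
        if i > 1:
            X[r, i - 2] -= 1.0
        if j > 1:
            X[r, j - 2] += 1.0
    return X

def squared_pearson(X, y, v):
    """Squared Pearson residuals of the weighted least-squares fit of y on X."""
    s = np.sqrt(v)
    beta, *_ = np.linalg.lstsq(X / s[:, None], y / s, rcond=None)
    return ((y - X @ beta) / s) ** 2

X_a = design_matrix(designs)
v_a = np.ones(len(designs))
y_a = np.zeros(len(designs))
y_a[0] = 5.0  # assumed shift in design 1:2 (scenario a)

n = len(designs)
before = squared_pearson(X_a, y_a, v_a)  # summands of Q^inc
Q_diff = np.zeros((n, n))
for col in range(n):                     # column: detached design
    indicator = np.zeros((n, 1))
    indicator[col, 0] = 1.0
    after = squared_pearson(np.hstack([X_a, indicator]), y_a, v_a)
    Q_diff[:, col] = before - after      # row: design whose summand changes

print([f"{i}:{j}" for i, j in designs])
print(np.round(Q_diff, 2))
```

The diagonal reproduces the summands of $Q^{inc}$. The columns for designs 1:2 and 2:3 equal the diagonal, i.e. detaching either design removes all inconsistency.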

Designs where only one treatment is involved in other designs of the network (for example design 6:7 in Figure


**Network design of an illustrative network meta-analysis.** The nodes correspond to eight treatments and the edges display observed treatment comparisons. Designs 6:7 and 3:4 do not contribute to the inconsistency assessment and are not incorporated into a net heat plot.

For the arrangement of the rows and columns of the plotted matrix, we use the sum of the absolute distances between the rows and the absolute distances between the columns of
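The ordering step can be sketched as below. As described above, the distance between two designs is the sum of the absolute differences between their rows and between their columns of the heat matrix. The clustering method itself cannot be recovered from this text, so complete linkage is an assumption here. It is implemented as a plain agglomerative loop in Python with NumPy, and the heat matrix is hypothetical.

```python
import numpy as np

def heat_distance(M):
    """Row L1 distance plus column L1 distance between every pair of designs."""
    M = np.asarray(M, float)
    rows = np.abs(M[:, None, :] - M[None, :, :]).sum(axis=2)
    cols = np.abs(M.T[:, None, :] - M.T[None, :, :]).sum(axis=2)
    return rows + cols

def complete_linkage_order(D):
    """Leaf order of a simple agglomerative complete-linkage clustering."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # complete linkage: largest distance between members
                d = max(D[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]  # members stay contiguous
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters[0]

M = np.array([[4.0, 0.0, 4.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [4.0, 0.0, 4.1, 0.0],
              [0.0, 0.0, 0.0, 2.0]])  # hypothetical heat matrix
order = complete_linkage_order(heat_distance(M))
M_sorted = M[np.ix_(order, order)]   # rows and columns in the same order
print(order)
```

Designs 0 and 2, which have nearly identical rows and columns, end up next to each other. This mimics how interrelated residuals form a diagonal block in the plot.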

In the plot we also draw gray squares, as shown in Figure

Further illustrative examples

To illustrate the application of the net heat plot, we consider the network example from the previous sections and Figure
$n_d=2$). These networks are displayed as graphs in Figures


**Five illustrative network meta-analyses with net heat plot.** In **a**) to **e**), the network design is shown on the left: six treatments and six, eight or fifteen different observed designs based on two-armed studies. The nodes are placed on the circumcircle and are labeled according to the treatments. The edges show which treatments are directly compared. The thickness of an edge represents the inverse standard error

Because the network structures and the assumed precisions of the direct effects are the same in scenarios a) to c), they share the same hat matrix, which is discussed in Section “Identifying drivers via the hat matrix” and illustrated in Figure

In scenario a), inconsistency is introduced through the treatment effect in design 1:2. The overall inconsistency statistic is $Q^{inc}=9.17$ ($p=0.027$), to which designs 1:3 (1.78), 1:2 (3.36) and 2:3 (3.36) contribute most. The latter two have higher residuals, although their direct estimates drive their network estimates more strongly, with 63% in contrast to 53% in the case of design 1:3. This can be seen in the hat matrix elements, which are displayed here by the area of the squares. The warm-colored off-diagonal elements in the column of design 1:2 or 2:3 are equal to the colors on the diagonal, which indicates a complete elimination of inconsistency in the whole network after relaxing design 1:2 or 2:3. This is also recognizable by

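The Q statistic for inconsistency ($Q^{inc}$) as well as the Q statistic after detaching the effect of a single design, with degrees of freedom (df) and p values, for scenarios a) to e). Empty cells: the design is not part of the network in that scenario.

| | a) Q | a) df | a) p | b) Q | b) df | b) p | c) Q | c) df | c) p | d) Q | d) df | d) p | e) Q | e) df | e) p |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $Q^{inc}$ | 9.17 | 3 | 0.027 | 7.50 | 3 | 0.058 | 11.67 | 3 | 0.009 | 4.17 | 1 | 0.041 | 16.67 | 10 | 0.082 |
| Detaching 1:2 | 0.00 | 2 | 1 | 6.82 | 2 | 0.033 | 6.82 | 2 | 0.033 | 0.00 | 0 | 1 | 0.00 | 9 | 1 |
| Detaching 1:3 | 5.36 | 2 | 0.069 | 5.36 | 2 | 0.069 | 0.00 | 2 | 1 | | | | 15.62 | 9 | 0.075 |
| Detaching 1:4 | | | | | | | | | | | | | 15.62 | 9 | 0.075 |
| Detaching 1:5 | | | | | | | | | | | | | 15.62 | 9 | 0.075 |
| Detaching 1:6 | 8.33 | 2 | 0.016 | 0.00 | 2 | 1 | 8.33 | 2 | 0.016 | 0.00 | 0 | 1 | 15.62 | 9 | 0.075 |
| Detaching 2:3 | 0.00 | 2 | 1 | 6.82 | 2 | 0.033 | 6.82 | 2 | 0.033 | 0.00 | 0 | 1 | 15.62 | 9 | 0.075 |
| Detaching 2:4 | | | | | | | | | | | | | 15.62 | 9 | 0.075 |
| Detaching 2:5 | | | | | | | | | | | | | 15.62 | 9 | 0.075 |
| Detaching 2:6 | | | | | | | | | | | | | 15.62 | 9 | 0.075 |
| Detaching 3:4 | 8.33 | 2 | 0.016 | 0.00 | 2 | 1 | 8.33 | 2 | 0.016 | 0.00 | 0 | 1 | 16.67 | 9 | 0.054 |
| Detaching 3:5 | | | | | | | | | | | | | 16.67 | 9 | 0.054 |
| Detaching 3:6 | | | | | | | | | | | | | 16.67 | 9 | 0.054 |
| Detaching 4:5 | 9.09 | 2 | 0.011 | 6.82 | 2 | 0.033 | 11.36 | 2 | 0.003 | 0.00 | 0 | 1 | 16.67 | 9 | 0.054 |
| Detaching 4:6 | 8.93 | 2 | 0.012 | 5.36 | 2 | 0.069 | 10.71 | 2 | 0.005 | | | | 16.67 | 9 | 0.054 |
| Detaching 5:6 | 9.09 | 2 | 0.011 | 6.82 | 2 | 0.033 | 11.36 | 2 | 0.003 | 0.00 | 0 | 1 | 16.67 | 9 | 0.054 |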

9

0.054

In scenario b), we shifted the effect in design 1:6 analogously to scenario a), which results in a $Q^{inc}$ of only 7.50 with a

In scenario c), we changed the effect in design 1:3, which yields $Q^{inc}=11.67$ ($p=0.009$). Design 1:3 itself has the largest squared Pearson residual and thus contributes most to the $Q^{inc}$ statistic. Smaller residuals are observed for the adjacent edges 1:2, 2:3, 1:6, and 3:4. A detachment of the effect in design 1:3 eliminates the inconsistency of the network. Relaxing other designs causes only little change to the squared Pearson residuals and increases residuals for some designs. A hot spot of inconsistency can be seen between the effects in designs 1:3, 1:2, and 2:3. However, the effect in design 1:2 is supported by the effects in designs 1:6, 3:4, and 4:6, and vice versa, the latter are supported by the effect in design 1:2. The same holds for the effect in design 2:3 and the effects in these three designs. Altogether, edge 1:3 can be distinctly identified as a plausible source of inconsistency, since it is part of two loops. The squared Pearson residual for this design is higher than the residuals for the inconsistency-generating designs in the previous two scenarios, although an equally strong perturbation is introduced in all scenarios. This is because 1:3 is the least self-driving design. Since the effect of design 1:3 strongly drives the network estimates of the designs 1:2, 2:3, 1:6, and 3:4, these are also influenced by the perturbation.

In scenario d), we analyze a sparsely connected network that forms one loop. In such a network, with the same inverse standard error for each direct estimate, every network estimate is composed of 83% of its own direct estimate and 17% of the remaining direct estimates in equal shares. So, in the net heat plot we see only large squares on the diagonal. A perturbation of the effect in design 1:2 results in a network inconsistency statistic of $Q^{inc}=4.17$ (

In network scenario e), all fifteen possible pairwise comparisons are observed with the same precision. Because of this tight linkage, each network estimate is driven one-third by its corresponding direct estimate. The remaining two-thirds, the indirect evidence, are based on all eight adjacent edges in a balanced way. The disturbance of the network consistency by adding a shift to the effect in design 1:2 results in $Q^{inc}=16.67$ with
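The percentages quoted for scenarios d) and e) follow from the leverages under unit variances. This sketch (Python with NumPy; helper names are hypothetical) computes them for the single six-treatment loop and for the complete network of all 15 comparisons. With unit variances, shifting one design by δ gives $Q^{inc}=\delta^2(1-h)$, where $h$ is that design's leverage. The value δ = 5 is an assumption that reproduces the reported 4.17 and 16.67.

```python
import numpy as np
from itertools import combinations

def leverages(designs, n_treat=6):
    """Diagonal of the design-level hat matrix when all variances are one."""
    X = np.zeros((len(designs), n_treat - 1))
    for r, (i, j) in enumerate(designs):
        if i > 1:
            X[r, i - 2] -= 1.0  # treatment 1 is the reference
        if j > 1:
            X[r, j - 2] += 1.0
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    return np.diag(H)

loop = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (1, 6)]  # scenario d)
complete = list(combinations(range(1, 7), 2))            # scenario e), 15 designs
h_loop = leverages(loop)
h_complete = leverages(complete)

delta = 5.0  # assumed shift of design 1:2 (first design in both lists)
q_loop = delta ** 2 * (1 - h_loop[0])
q_complete = delta ** 2 * (1 - h_complete[0])
print(h_loop.round(3), h_complete.round(3))
print(round(q_loop, 2), round(q_complete, 2))
```

Every design of the loop has leverage 5/6 (about 83%), and every design of the complete network has leverage 1/3.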

The examples show that perturbation of a single design may have side effects on residuals that are more or less spread out in the network. Our clustering proved successful in grouping together designs with interrelated residuals that were simultaneously affected by one perturbation. The resulting hot spots facilitate the identification of sources of inconsistency, which may or may not be uniquely identifiable. As expected, related large residuals are grouped together, but large residuals emerging from two independent perturbations may also be placed in proximity. In this case we expect to find two diagonal blocks, each signaling the local side effects of one perturbation and each representing one hot spot of inconsistency.

Software

We implemented our methods in the open-source statistical environment

Results

An example of a network meta-analysis in diabetes

We applied our methods to a network meta-analysis example by Senn et al.

The ten different treatment groups are abbreviated as follows: acar: Acarbose, benf: Benfluorex, metf: Metformin, migl: Miglitol, plac: Placebo, piog: Pioglitazone, rosi: Rosiglitazone, sita: Sitagliptin, SUal: Sulfonylurea alone, vild: Vildagliptin. This network meta-analysis involved 26 randomized controlled trials, including one three-armed trial for plac:acar:metf, and 15 different designs, of which ten are used in only one study. In the network, 15 out of 45 possible different pairwise contrasts are observed, of which eight involve a placebo (see Figure


**Network design in the diabetes example.** The nodes are placed on the circumcircle and are labeled according to the treatments. The edges display the observed treatment comparisons. The thickness of the edges is proportional to the inverse standard error of the treatment effects, aggregated over all studies including the two respective treatments. The network includes 25 two-armed studies on fourteen different designs and one three-armed study of design plac:acar:metf.

Across the entire network (analogous to the result of Senn et al.

The decomposition of the Q statistics as well as the degrees of freedom of the corresponding chi-squared distributions and the p values are shown. In addition, the number of studies considered is displayed. Only one study is observed for the following designs: plac:acar, plac:piog, plac:sita, plac:vild, acar:SUal, metf:piog, metf:SUal, piog:rosi, rosi:SUal, plac:acar:metf. For this reason, the corresponding

| | Q statistic | Number of studies | Degrees of freedom | p value |
|---|---|---|---|---|
| $Q^{net}$ | 96.98 | 26 | 27-9=18 | <0.001 |
| $Q^{inc}$ | 22.53 | 26 | 16-9=7 | 0.002 |
| $Q^{het}$ | 74.45 | 26 | 27-16=11 | <0.001 |
| $Q^{het}_d$ | 4.38 | 2 | 2-1=1 | 0.036 |
| $Q^{het}_d$ | 42.16 | 3 | 3-1=2 | <0.001 |
| $Q^{het}_d$ | 6.45 | 3 | 3-1=2 | 0.040 |
| $Q^{het}_d$ | 21.27 | 6 | 6-1=5 | 0.001 |
| $Q^{het}_d$ | 0.19 | 2 | 2-1=1 | 0.665 |

To have a closer look at the inconsistency of the network, we use the net heat plot in Figure
$Q^{inc}$ statistic with a p value of 0.002, which is composed of the squared Pearson residuals for the designs metf:SUal, rosi:SUal, plac:piog, metf:piog, and plac:rosi. The first two have higher residuals than plac:piog, although their direct estimates drive their network estimates more strongly, with 56% and 41% in contrast to 36% in the case of design plac:piog. We can observe a hot spot of inconsistency between the effects in designs metf:SUal, rosi:SUal, plac:piog, and metf:piog, each of which is represented by only one study. The effects in designs plac:piog and metf:piog and, in particular, in designs metf:SUal and rosi:SUal are especially inconsistent. Although the direct estimate in design plac:rosi is hampered by large heterogeneity (


**Net heat plot in the diabetes example.** The area of the gray squares displays the contribution of the direct estimate in design ^{∗}.

The strongest reduction in the whole network inconsistency is achieved with a detachment of the effect in design rosi:SUal. In this case, the net heat plot in Figure
$Q^{inc}$ statistic, there is no longer strong evidence for inconsistency. The detected hot spot of inconsistency comprised designs with only one study each. Indeed, one or a few biased studies may either cause heterogeneity when paralleled by other studies of the same design (as observed within the plac:rosi studies) or cause inconsistency when solely representing a design.


**Net heat plot in the diabetes example after exclusion of the study with design rosi:SUal.** The area of the gray squares displays the contribution of the direct estimate in design ^{∗}.

An example of a network meta-analysis in antidepressants

Cipriani et al.


**Network design in the antidepressants example.** The nodes are placed on the circumcircle and are labeled according to the treatments. The edges display the observed treatment comparisons. The thickness of the lines is proportional to the inverse standard error of the treatment effect, aggregated over all studies including these two respective treatments. The network includes 109 two-armed studies with 42 different designs and two three-armed studies, both with design fluo:paro:sert.

Analogous to

The Q statistics for the whole network, for inconsistency and for heterogeneity within designs are shown. In addition, the number of studies on which they are based, the degrees of freedom of the corresponding chi-squared distributions and the corresponding p values are displayed.

| | Q statistic | Number of studies | Degrees of freedom | p value |
|---|---|---|---|---|
| $Q^{net}$ | 119.6 | 111 | 113-11=102 | 0.113 |
| $Q^{inc}$ | 36.9 | 111 | 44-11=33 | 0.293 |
| $Q^{het}$ | 82.7 | 111 | 113-44=69 | 0.125 |

The net heat plot presented in Figure
$Q^{inc}$. There is a small hot spot of inconsistency between the effects in designs cita:esci and cita:paro as well as between the effects in fluo:bupr and bupr:sert. The largest squared Pearson residual is observed for design cita:esci, although the direct estimate in this design drives the corresponding network estimate comparatively strongly, with 51% (maximum self-driving is observed in design dulo:esci with 61%). In contrast to the other four designs mentioned, the direct estimate of cita:esci also strongly drives network estimates for some other designs in the network, which can be seen from the square sizes in the corresponding column. A detachment of the effect in design cita:esci results in the strongest reduction of the inconsistency in the whole network (resulting in


**Net heat plot in the antidepressants example.** The area of the gray squares displays the contribution of the direct estimate in design ^{∗}.

Discussion

To ensure the validity and robustness of the conclusions from a network meta-analysis, it is important to assess the consistency of the network and the contribution of each component meta-analysis to the estimates. Our intention was to develop a sensitivity analysis tool that identifies which component meta-analyses drive which network estimates and locates the drivers that may have generated a hot spot of inconsistency. The net heat plot serves both purposes simultaneously: the former by graphically showing elements of the hat matrix and the latter by colored block structures in the plot. We have shown that the net heat plot allows the identification of a single deviating design that induces inconsistency in artificial examples. In the case of stronger network connectivity, increased location specificity might be possible. In networks that only include one loop, it is not possible to trace inconsistency back to a single design, but designs that are part of several loops may be identifiable as a unique source for a hot spot of inconsistency. We also demonstrated the applicability of the plot in two published network meta-analyses.

It is well known in regression diagnostics (see for example

Overall, inconsistency testing has also been discussed in large complex networks by comparing a consistency model with an unrestricted inconsistency model

The recently published design-by-treatment interaction model by

However, failure to detect heterogeneity does not constitute proof of homogeneity, and this must be taken into account in the assessment of inconsistency in network meta-analyses. This already holds for a simple meta-analysis and is even more relevant for network meta-analyses. In a network without loops, inconsistency cannot be detected at all. In this context, we point to the importance of the hat matrix: it allows for the assessment of the contribution of each component meta-analysis to a network estimate and directs attention to the crucial components. We have illustrated that often only a few components are important.

Often, when inconsistency is observed, some component meta-analyses are heterogeneous, too. We point out that the inconsistency assessment is still valid in this context. However, the direct effect estimates are then no longer estimates of a single parameter, but rather weighted averages of estimates of different parameters: the study-specific treatment effects. Nevertheless, inconsistency assessment and the investigation of heterogeneity within component meta-analyses may interfere in this case, and it may be necessary to exclude single studies and repeat the net heat plot in order to find satisfactory explanations of overall heterogeneity. In fact, inspection of both coefficients (entries of the hat matrix) and residuals was proposed by Senn et al.

Heterogeneity and inconsistency can be broadly viewed as different aspects of heterogeneity, the latter being understood as any discrepancy between results of single studies and predictions based on a consistency model for a network. This fact is not only reflected in the decomposition of the Q statistic, but also underlines that our tools can be applied either at an aggregate level or at a study level. We presented the aggregate level approach here for its parsimony. The study level approach may be more appropriate, particularly if component meta-analyses are strongly heterogeneous. In fact, a visual display of the hat matrix at study level has been proposed and discussed in

Some caution is due when interpreting a net heat plot. Unlike in usual regression diagnostics, a single component meta-analysis may stand for a large body of evidence in network meta-analyses. If a component meta-analysis is recognized as deviating from the rest or is identified as a major source of heterogeneity, it may or may not provide the more reliable part of the whole body of evidence. Song et al.

More than in classical regression diagnostics, there are model diagnostic challenges in network meta-analyses: Masking, a phenomenon already known, may be more pronounced here because we have inherently small numbers of observations: the component meta-analyses. Masking may occur if more than one observation deviates from the true model. In this case, parameter estimates are affected by outliers even after holding out one observation, and outliers may be obscured, i.e. masked

Searching for influential component meta-analyses or influential studies is not the only way for responding to inconsistency and heterogeneity. As mentioned in

One core component of our approach is to allow component meta-analyses to have deviating treatment effects. This idea of relaxing parameter constraints can readily be transferred to generalized linear models for binary outcomes as well as to random-effects models. The approach is not confined to withholding the effects of one design, but is naturally applicable to allowing an arbitrary number of designs to have specific deviating effects, e.g. all designs containing a specific treatment. In all types of generalization, the challenge remains to perform these model relaxations in a systematic way and to provide tools that transparently display the multitude of results, for which the net heat plot presented here can be a useful starting point.

Conclusions

We have illustrated the importance of assessing consistency in network meta-analyses, where, for example, one deviating component meta-analysis may induce a hot spot of inconsistency. As a tool for this task, we have developed the net heat plot that displays drivers of the network estimates, plausible sources for inconsistency, and possible disturbed network estimates, illustrating its usefulness in several artificial and real data examples.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

UK, HB and JK developed the method. UK produced the results and wrote the first draft of the manuscript. HB and JK contributed to the writing. All authors read and approved the final manuscript.

Acknowledgements

This work contains part of the PhD thesis of UK. A grant from the Mainzer Forschungsförderungsprogramm (MAIFOR) supported UK.

We thank Katherine Taylor for proofreading and Nadine Binder for pointing us to the halo visualization.

We thank all reviewers for their numerous comments and suggestions that greatly helped to improve the paper.
