Abstract
Background
The DerSimonian and Laird approach (DL) is widely used for random effects metaanalysis, but this often results in inappropriate type I error rates. The method described by Hartung, Knapp, Sidik and Jonkman (HKSJ) is known to perform better when trials of similar size are combined. However evidence in realistic situations, where one trial might be much larger than the other trials, is lacking. We aimed to evaluate the relative performance of the DL and HKSJ methods when studies of different sizes are combined and to develop a simple method to convert DL results to HKSJ results.
Methods
We evaluated the performance of the HKSJ versus DL approach in simulated metaanalyses of 2–20 trials with varying sample sizes and betweenstudy heterogeneity, and allowing trials to have various sizes, e.g. 25% of the trials being 10times larger than the smaller trials. We also compared the number of “positive” (statistically significant at p < 0.05) findings using empirical data of recent metaanalyses with > = 3 studies of interventions from the Cochrane Database of Systematic Reviews.
Results
The simulations showed that the HKSJ method consistently resulted in more adequate error rates than the DL method. When the significance level was 5%, the HKSJ error rates at most doubled, whereas for DL they could be over 30%. DL, and, far less so, HKSJ had more inflated error rates when the combined studies had unequal sizes and betweenstudy heterogeneity. The empirical data from 689 metaanalyses showed that 25.1% of the significant findings for the DL method were nonsignificant with the HKSJ method. DL results can be easily converted into HKSJ results.
Conclusions
Our simulations showed that the HKSJ method consistently results in more adequate error rates than the DL method, especially when the number of studies is small, and can easily be applied routinely in metaanalyses. Even with the HKSJ method, extra caution is needed when there are = <5 studies of very unequal sizes.
Keywords:
Metaanalysis; Clinical trial; Trial size; Heterogeneity; Type I error; Random effects; Cochrane Database of Systematic ReviewsBackground
The commonly used method for a random effects metaanalysis is the DerSimonian and Laird approach (DL method) [1]. It is used by popular statistical programs for metaanalysis, such as Review Manager (RevMan [2]) and Comprehensive Metaanalysis [3]. However, it is well known that the method is suboptimal and may lead to too many statistically significant results when the number of studies is small and there is moderate or substantial heterogeneity [410]. If a treatment is inefficacious and testing is done at a significance level of 0.05, the error rate should be 5%, i.e. only one in 20 tests should result in a statistically significant result. For the DL method, the error rate can be substantially higher, unless the number of studies is large (≫ 20) and there is no or only minimal heterogeneity [410].
Given this deficiency, alternative methods for random effects metaanalysis have been proposed. In particular, the method described by Hartung and Knapp [46] and by Sidik and Jonkman [11,12] (HKSJ method) is claimed to be simple and robust [13]. Simulations have shown that the HKSJ method performs better than DL, especially when there is heterogeneity and the number of studies in the metaanalysis is small [414]. This means that for most metaanalyses the HKSJ method might be more appropriate than the conventional DL method. In a sample of 22453 metaanalyses, Davey et al. show that the number of studies in a metaanalysis is often relatively small, with a median of 3 studies (Q1Q3: 2–6), and only 1% of metaanalyses containing 28 studies or more [15]. Some detectable heterogeneity is present in about half of metaanalyses of clinical studies [1518].
Based on earlier results that showed that the results of a single large trial were unreliable [19], we hypothesized that the metaanalyses methods, including HKSJ, would perform less adequately when the metaanalysis is carried out on a mixture of very unequalsized studies, e.g. one large and several small trials. Such a situation is not uncommon. In a random sample of 186 systematic reviews of the Cochrane Database [18] the ratio between large and small trial sizes ranged between 1 and 1650, with a median of 5 and an interquartile range from 3 to 10. Sixty per cent of the reviews contained no large trials, but 40% had one trial that was at least twice as large as the median trial size, 25% had one trial that was at least five times larger, and 10% had one trial that was even 10 times larger.
Although several simulations have shown that the HKSJ method performs better than the DL method, the focus in these studies was not on a systematic evaluation of the effects of specific trial size mixtures in combination with low trial numbers. They either only reported the overall results of various mixtures combined or they studied only a limited number of combinations. In order to investigate the impact of unequal study sizes, we used simulations, mimicking such realistic conditions rather than situations where trials have implausibly similar sample sizes. We focused on metaanalyses with small numbers of studies (up to 20) with a dichotomous outcome (odds ratio, relative risk) or a continuous outcome. To mimic the variation in trial sizes, we explicitly varied the sample sizes of the trials within the simulated metaanalyses, varying from scenarios where all trials in a metaanalysis were of equal size, to scenarios with only one large trial, 10 times as large as the other trials, or one small trial, 10 times smaller than the other trials.
In order to complement the simulations, empirical data, based on recent metaanalyses  added or updated in 2012  from the Cochrane Database of Systematic Reviews (CDSR) of interventions were used to assess the number of nominally statistically significant findings (with p < 0.05) of both methods in practice. This allows to examine whether inferences would be very different based on these two models.
Currently not all standard software packages like Review Manager provide an option to perform an HKSJ analysis, although the HKSJ method is computationally not complicated and the importance of suitable methods for metaanalyses with small numbers of trials is apparent. Version 3.0 of Comprehensive Metaanalysis [3] will contain the HKSJ method (personal communication by Julio SánchezMeca, September 2013). Also the R package metafor [20] and the metareg command in Stata [21] include the HKSJ method. However, not everybody will be acquainted with the use of R or Stata. Moreover, use of these packages is not straightforward when a posthoc conversion is desired, i.e. when the results of a DL random effects analysis must be converted to the HKSJ approach. In order to fill this gap, we show step by step how the HKSJ analysis can be performed without the use of these packages, when the results of a common random effects (DL) metaanalysis are available, e.g. from a systematic review. This conversion is applicable for continuous outcomes and for outcomes where metrics are logtransformed, like the risk ratio (RR), odds ratio (OR), hazard ratio (HR) or Poisson rate. This simple modification of the common random effects analysis will improve the summary results, and it can be done through some basic calculations or a few statements in Excel. An Excel file is available as Additional file 1 web material. R code for the metafor package is provided in Appendix 3.
Additional file 1. Excel template for the conversion of DL to HKSJ results (Web material).
Format: XLSX Size: 23KB Download file
The simulations, the selection of empirical data and the statistical analysis are described in the Methods section. In the Results section the error rates for the DL and HKSJ methods for several realistic simulated scenarios are provided. For the Cochrane metaanalyses, we present the number of nominally statistically significant findings with the DL and HKSJ methods. The conversion of DL results into HKSJ results is illustrated, including examples from systematic reviews as presented in the Cochrane Library.
Methods
We used simulated data as well as empirical data of the Cochrane 2012 Issues to evaluate the DL and HKSJ approaches. The pooled effect estimate is equal for both approaches, but the methods differ with respect to the calculation of the confidence interval and the statistical test. For DL, these are based on the normal distribution, whereas for the HKSJ method, they are based on the tdistribution with the degrees of freedom equal to the number of trials minus one, and a weighted version of the DL standard error. Detailed statistical methods are presented in Appendix 1.
Methods  simulations
Our first aim was to investigate the error rates of the HKSJ metaanalysis method in comparison to the common (DL) method for various realistic scenarios, i.e. combinations of study sizes, study size mixtures and heterogeneity in series of just a few trials. Therefore we simulated series of trials with two up to 20 studies, where each series provided the data for one metaanalysis. First, we considered series that consisted of equally sized trials, each with two groups of 25, 50, 100, 250, 500 or 1000 subjects. Second, we looked into series of trials with different trial sizes, i.e. the percentage of large trials was 25%, 50% or 75%, e.g. a series of one large trial and three small trials. Average group sizes were 100, 250, 500 or 1000 subjects, and the large trials had 10 times more subjects than the small trials. For example, a series of six small (normal) and two large trials, with an average group size of 100, has group sizes of 31 and 308 in the small and large trials, respectively. Third, we simulated extreme scenarios, in which a series had only small trials, except for one large one, or only large trials, except for one small one. Both continuous and dichotomous outcomes were evaluated. For continuous outcomes, a normally distributed overall mean difference between the group means was simulated. In the trials with a dichotomous outcome, the event rates in the groups varied between scenarios and ranged from 0.1 to 0.9, in steps of 0.2. The heterogeneity was superimposed and set at I^{2} = 0, 0.25, 0.50, 0.75 and 0.9. I^{2} represents the heterogeneity, i.e. the degree of inconsistency in the studies’ results, in comparison to the total amount of variation [16,22]. The levels correspond to no, low, moderate, high and very high heterogeneity, respectively [16].
Our aim was to evaluate the error rate, i.e. the percentage of statistically significant metaanalyses when the overall mean treatment difference was zero. Hence we simulated series with an overall treatment difference equal to zero and performed on each series a DL [1] and an HKSJ [11] random effects metaanalysis. The twosided significance level was 0.05. For each scenario, we simulated 10,000 series of trials. In the ideal situation, 5% of the 10,000 metaanalyses should have a statistically significant result when the significance level is 0.05. For the scenarios with the dichotomous outcome we determined the error rate when the OR was evaluated (logistic model) and when the RR was estimated. In these cases, metaanalysis was done on the logarithmic scale, and the error rates were determined for OR = 1 or RR = 1. More details can be found in Appendices 1 and 2.
Methods  empirical data from the 2012 Cochrane Database of Systematic Reviews
Cochrane Reviews are systematic reviews of primary research in human health care and health policy, and are internationally recognised as the highest standard in evidencebased health care [23]. The aim of the Cochrane collaboration is to provide accessible and credible evidence to guide decision making in medicine and public health. We were very fortunate that the UK Cochrane Editorial Unit provided us with the statistical data added to the CDSR in 2012, which allowed us to assess the number of statistically significant results in real data.
Many Cochrane reviews include multiple metaanalyses. Many of those overlap or are based on correlated data. Usually, the first analysis is the primary analysis. Hence, we decided to use per review only the first metaanalysis that was based on at least three studies. In order to maximize the number of metaanalyses, we used both the first continuous and the first binary outcome metaanalysis, whenever possible. Thus some systematic reviews provided none, and some provided one or two metaanalyses for our research. We always performed a random effects metaanalysis, even when the authors originally performed a fixedeffects analysis. Details can be found in Appendix 1.
It is impossible to determine which of the Cochrane reviews compared treatments that truly had equal efficacy. It is thus unknown which of the statistically significant results were in fact false positive findings, so we could not determine the false positive error rate. Hence we decided to present the total number of significant findings of the DL and HKSJ methods instead of the error rates. This provides an indication of the impact a change from DL to HKSJ would have in practice.
Results
Error rates for continuous outcomes
The left side of Figure 1 shows the error rates for the DL method for the simulated mixtures of trial sizes. In general with unequalsized trials, the type I error of DL was substantially inflated even with minimal heterogeneity, while with equalsized trials minimal or modest heterogeneity did not inflate the type I error substantially. Figure 1A shows the error rates for a setting with studies of equal size, Figure 1B for one small trial, Figure 1C for equal numbers of large and small trials, and Figure 1D for a setting with one large trial, 10 times as large as the other trials. The heterogeneity levels are I^{2} = 0, 0.25, 0.5, 0.75 and 0.9, and the average study group sizes range between 25 and 1000. Vertical bars refer to the minimum and maximum error rates over the group sizes. The lines connect the means of these error rates. The error rates should all have been 5% (0.05), but for I^{2} ≥ 0.25, DL error rates were too large, even for series of 20 trials. For example, DL error rates for metaanalyses of five studies ranged between 5.7% for equally sized trials and 14.7% for mixtures of trial sizes (Table 1). In contrast, the error rates were too low (about 34%) when the I^{2} was 0. DL results for other, less extreme, mixtures of trial sizes were in between the results shown.In Figure 1 on the right side results for the HKSJ approach are presented. For equal trial sizes, the error rates of the HKSJ method were very appropriate. When the series contained only one small trial, the HKSJ error rates were approximately correct if the series consisted of more than five studies (Figure 1B). For series containing fewer trials, the error rates were higher, but not as high as the respective DL values. They were also too high when the percentage of small trials increased (Figure 1C). When there was only one large trial, the HKSJ error rates sometimes almost doubled (Figure 1D). When there was no heterogeneity, HKSJ error rates were roughly 5%. As expected, the group sizes had no impact on the error rates.Figure 1 shows that the HKSJ method always outperformed the common random effects DL method. The HKSJ error rate was usually roughly 5%. However, some mixtures of sizes, especially when there is only one large trial, lead to a doubling of the error rate to 10%. This occurred especially when heterogeneity was only moderate.
Figure 1. DerSimonianLaird and HartungKnappSidikJonkman error rates for continuous outcomes, for various I^{2 }and mixtures of trial sizes. Legend: A: Equally sized trials; B: One small trial, 1/10th of other trials; C: 50–50 small and large trials (ratio 1:10); D: one large trial (10 times larger than other trials). Vertical bars refer to the minimum and maximum error rates over the group sizes. The lines connect the means of these error rates. DL: DerSimonianLaird metaanalysis method. HKSJ: HartungKnappSidikJonkman metaanalysis method.
Table 1. Minimum and maximum error rates of DerSimonianLaird and HartungKnappSidikJonkman methods for mixtures of trial sizes
Error rates for risk ratio outcomes
The results of the simulations for studies with a risk ratio outcome were quite similar to the error rates for the continuous outcomes, but there was more variation in the error rates: they depended on the group sizes and the risks (from 0.1 to 0.9). For low heterogeneity (I^{2} = 0.25), the DL error rates ranged from 2.2% to 15.5%, whereas the HKSJ rates were slightly better: 2.8–10.6%. However for I^{2} = 0.9 the DL rates ranged from 6.4% to 33.7%, compared to HKSJ rates of 2.7% to 10.2%. When there was no heterogeneity (I^{2} = 0), the DL error rates ranged between 0.9% and 4.3%, and the HKSJ rates between 2.1% and 6.9%. For odds ratios, the results were again quite similar. See Table 1 for a selection of results, and the Additional file 2: Figure S1 and Additional file 3: Figure S2.
Additional file 2: Figure S1. DerSimonianLaird and HartungKnappSidikJonkman error rates for Risk Ratios, for various I^{2} and mixtures of trial sizes. A: Equally sized trials; B: One small trial, 1/10^{th} of other trials; C: 50–50 small and large trials (ratio 1:10); D: one large trial (10 times larger than other trials). Vertical bars refer to the minimum and maximum error rates over the group sizes. The lines connect the means of these error rates. DL: DerSimonian & Laird metaanalysis method. SJ: HartungKnappSidikJonkman metaanalysis method.
Format: DOCX Size: 184KB Download file
Additional file 3: Figure S2. DerSimonianLaird and HartungKnappSidikJonkman error rates for Odds Ratios, for various I^{2} and mixtures of trial sizes. A: Equally sized trials; B: One small trial, 1/10^{th} of other trials; C: 50–50 small and large trials (ratio 1:10); D: one large trial (10 times larger than other trials). Vertical bars refer to the minimum and maximum error rates over the group sizes. The lines connect the means of these error rates. DL: DerSimonian & Laird metaanalysis method. HKSJ: HartungKnappSidikJonkman metaanalysis method.
Format: DOCX Size: 182KB Download file
Empirical results for CDSR 2012
Selection of the first metaanalyses in the systematic reviews added in 2012 to the CDSR and based on at least three studies resulted in 689 metaanalyses (255 metaanalyses with a continuous outcome and 434 metaanalyses with a dichotomous outcome).
The continuous outcome metaanalyses were based on a median of five trials (Q1Q3: 3–9) with a median ratio between the largest and the smallest trial of 5 (Q1Q3: 3–10). Using the DL method, 130 (51.0%) of the 255 metaanalyses were nominally statistically significant compared to 102 (40.0%) when the HKSJ method was used (Table 2). Of the 130 metaanalyses that were significant with the DL method, 31 (23.8%) were not significant with the HKSJ method, while three metaanalyses were significant with the HKSJ method but not with the DL method. In the selection of metaanalyses based on at most five studies and with large ratios between the study sizes (ratio > 5) 13 (59.1%) of the 22 metaanalyses significant with the DL method were not significant with the HKSJ method and none of the metaanalyses was only significant with the HKSJ method.
Table 2. Number (%) of statistically significant Cochrane metaanalyses according to the DerSimonianLaird and HartungKnappSidikJonkman methods
The 434 dichotomous metaanalyses were based on a median of six trials (Q1Q3: 4–10) with a median ratio between the largest and the smallest trial of 6 (Q1Q3: 3–16). Of the 434 metaanalyses, 185 (42.6%) were nominally statistically significant with DL and 147 (33.9%) with HKSJ (Table 2). Of the 185 metaanalyses that were significant with the DL method, 48 (25.9%) were not significant with the HKSJ method, while the opposite scenario was seen in 10 cases. In the selection of small metaanalyses with large ratios between the study sizes 14 (50.0%) of the 28 metaanalyses significant with the DL method were not significant with the HKSJ method, while the opposite scenario occurred once.
Summarizing, the DL method resulted in statistically significant results in 315/689 (45.7%) of the metaanalyses; 79 of these 315 “positive” DL results (25.1%) were not significant with the HKSJ method, while the opposite scenario (significant only by HKSJ) was rarely seen (14 metaanalyses). In the selection of small metaanalyses (< = 5 studies) with large ratios between the study sizes (ratio > 5), the difference between the DL and HKSJ results was even larger.
Easy method for the conversion of DL into HKSJ results
We present two examples to illustrate how DL results can be used to carry out an HKSJ analysis, resulting in HKSJconfidence intervals and pvalues. An Excel file is available as Additional file 1 (web material). The results can also be created with R, Appendix 3.
Example 1: conversion to HKSJ for a continuous outcome
The first three columns of Table 3 show the results of a metaanalysis on the effect of zinc for the treatment of a common cold, published in a Cochrane review [24]. The outcome was severity of cold symptoms scoring, and was based on a total of 513 participants. The first column shows the identifiers of the studies, the second column the results y_{i} of the individual studies and the third column contains the weights w_{i} from the DL analysis, copied from the review. Only these three columns are needed for the posthoc calculations.
Table 3. Conversion of DerSimonianLaird results into HartungKnappSidikJonkman results for a continuous outcome: severity of cold symptoms
The following steps carry out an HKSJ analysis:
1. Determination of the standard error:
a. Based on the overall summary difference y = −0.39, calculate the HKSJ factors
w_{i}×(y_{i}y)^{2} for each of the studies (see the fifth column for the results).
b. Add the HKSJ factors and divide them by the sum of the weights. This results in 20.31/100 = 0.2031.
c. Divide by k1, whereby k is the number of studies. In this situation k = 5 and 0.2031/4 = 0.0508. This is the weighted variance of the pooled treatment effect according to the HKSJ approach.
d. Taking the square root leads to the standard error: SE = √0.0508 = 0.225.
2. Determination of the 95% confidence interval (CI):
a. To determine the halfwidth of the 95% CI, the SE must be multiplied with the 97.5%quantile of the tdistribution with k  1 degrees of freedom. Its value can be obtained through Excel: TINV(0.05, k1), where k is the number of studies. This results in 2.78, so the halfwidth of the 95% CI is 2.78*0.225 = 0.63. The tvalue can also be found on the internet, for example at http://www.danielsoper.com/statcalc3/calc.aspx?id=10 webcite.
The quantiles of the tdistribution can be found through statistical packages as well. In SPSS: select ‘compute variable’, function group ‘Inverse DF’, function IDF.T(.975,k1), or in SAS: tinv(.975,k1).
b. The HKSJ 95% CI then is y ± halfwidth of the CI, i.e. 0.39 ± 0.63 or [1.02; 0.24].
3. Determination of the pvalue:
a. Calculate the tstatistic: t = y/SE = −0.39/0.225 = −1.73. If the result is negative, as in this situation, simply change the sign, so t = 1.73.
b. Determine the corresponding twosided pvalue with Excel: TDIST(1.73,4,2), or with the internet site http://www.danielsoper.com/statcalc3/calc.aspx?id=8 webcite. The twosided Pvalue according to the HKSJ method then is 0.16.
This pvalue can also be obtained through SPSS: ‘compute variable’, function group ‘CDF & noncentral CDF’, function ‘CDF.T’. This yields CDF.T(1.73, 4), similar to SAS, cdf(‘T’, 1.73, 4) = 0.92066. The twosided HKSJ pvalue then is 2×(1–0.92066) ~0.16.
In this example on the efficacy of zinc, based on only five trials and high heterogeneity (I^{2} = 75%), the results of the DL and HKSJ analyses differ substantially.
Example 2: conversion to HKSJ for outcomes that require a log transformation
When the outcome of the metaanalysis is a risk ratio (RR), odds ratio (OR), hazard ratio (HR) or Poisson rate, the analysis has to be conducted on the natural logarithm (ln) of the treatment effect. In all other aspects the procedure is exactly the same as for a continuous outcome. As an example we show the overall survival for postremission therapy for adult acute lymphoblastic leukemia, comparing patients with and without a donor, as presented in a Cochrane Review [25]. The first three columns of Table 4 show the results of a metaanalysis with the HR as outcome.
Table 4. Conversion of DerSimonianLaird results into HartungKnappSidikJonkman results for a logarithm based outcome: hazard ratios
1. Determination of the standard error:
a. Calculate the natural logarithm of the pooled estimate: ln(y) = ln(0.86) = −0.15. Calculate the natural logarithms of the study outcomes (column 4) and use these to calculate the HKSJ factors w_{i}×(ln(y_{i})ln(y))^{2} for each of the studies (column 6).
b. Add the HKSJ factors and divide them by the sum of the weights. This leads to 1.99/100 = 0.0199.
c. As there are 10 studies, divide by k1 = 9: 0.0199/9 = 0.0022.
d. Taking the square root leads to the standard error: SE = √0.0022 = 0.047.
2. Determination of the 95% CI:
a. On the ln scale, the halfwidth of the 95% CI is TINV(0.05, 9) × 0.047 = 2.26 × 0.047 = 0.106 (Excel).
b. The 95% CI for the ln HR is −0.15 ± 0.106, i.e. [−0.26; 0.04].
c. The HKSJ 95% CI for the HR is [e^{0.26}; e^{0.04}], i.e. [0.77; 0.96].
3. Determination of the pvalue:
a. Calculate the tstatistic: t = ln(y)/SE = −0.15/0.047 = −3.19. Neglecting the negative sign, we obtain t = 3.19.
b. Use Excel, Internet or a statistical package to calculate the twosided pvalue according to the HKSJ method, see Example 1. Excel: pvalue = TDIST(3.19,9,2) = 0.011; SPSS: CDF.T(3.19, 9) = 0.995, so that the pvalue is 2×(1–0.995) = 0.011.
In this example, results of the DL and HKSJ analyses hardly differ.
Discussion
The DL approach to random effects metaanalysis is still the standard method, almost to the exclusion of all other methods. This might be considered remarkable, bearing in mind the high false positive rates of the DL method which have been shown repeatedly with simulations [414] and also an empirical study suggesting that results are sensitive to the choice of random effects analysis method [26]. Thorlund et al. did an empirical assessment in 920 Cochrane primary outcome metaanalyses of > = 3 studies of methodrelated discrepancies [26]. In total, 326 (35.4%) metaanalyses were statistically significant when the analysis was based on a tdistribution – as in the HKSJ method – and 414 (45%) when it was based on the normal distribution as in the DL method. Our evaluation of Cochrane metaanalyses of interventions resulted in a similar result: a substantially larger amount of significant findings with the DL method than with the HKSJ method. Our simulations suggest that among the DL significant findings in the Cochrane reviews there may be a considerable number of false positives.
DL results can easily be converted into HKSJ results, which have a much better performance. We confirmed this with simulations, for mixtures of trial size distributions in settings with up to 20 trials per metaanalysis. When there was heterogeneity, the mean error rates of the DL approach were consistently higher than those of the HKSJ approach, although also the latter doubled to 10% in scenarios with only one large trial. When there was no heterogeneity, the DL error rates were lower than 5%, and the HKSJ rates were approximately 5%.
However, there are some limitations with respect to the HKSJ analysis method. Although the error rates of the HKSJ method were closer to the 5% level than those of the DL method, our simulations showed that in some scenarios the HKSJ error rates more or less doubled, although the DL error rates could be more than four times too high in these same settings. Hence, the results of the HKSJ analysis are also not perfect. Like we hypothesized, the error rates were maximal if one of the trials in the metaanalysis was substantially larger than the other ones.
Further, when study numbers are small, the distribution of the treatment effects is unknown and does not necessarily follow the normal or tdistribution. Kontopantelis and Reeves [27] showed that with slight heterogeneity the coverage of the HKSJ method was consistently 94% when the true effects were not distributed according to the normal or tdistribution, but with larger heterogeneity the nonparametric permutation (PE) method of Follmann and Proschan [7] performed better than the HKSJ method. However, the PE method can only be performed when the number of studies is larger than five, whereas many metaanalyses are smaller [15]. Several other methods have been developed, like the Quantile Approximation (QA) method [28], the Profile Likelihood approach [29], natural weighting instead of empirically based weighting of studies [30], use of fixed effects estimates with a random effects approach to heterogeneity [31] and more recently, higherorder likelihood inference methods [32]. However, most of these methods are based on asymptotic statistics and they may therefore be less robust in case of a limited number of trials, or they remain difficult to use in practice, because no statistical packages are available to perform them and it is very difficult to carry out the calculations with standard software. Regarding the nonasymptotic, computationally straightforward QA method, SánchezMeca and MarínMartínez [13] have already shown that it was outperformed by the HKSJ method. It would require a very extensive evaluation to investigate the performance of all of these methods. We restricted ourselves to the HKSJ method, because of its computational simplicity and we show that HKSJ results can easily be derived from DL results.
As far as we know, we are the first to present systematically the error rates in relation to explicit trial size mixtures when the numbers of trials range from 2 to 20. Follmann and Proschan [7] show that for certain trial size mixtures and low numbers of trials the DL error rates can be highly increased, however, they did not evaluate the HKSJ method. The results reported by Hartung, Knapp and Makambi [46,8,9] imply that for metaanalyses of three, six or twelve studies the DL error rates for studies with similar sizes were closer to 5% than for studies of different sizes, and that the HKSJ method performed much better than DL in the latter situation. However they did not report the explicit relationship between the trial size mixtures and error rates as we do (Table 1). SánchezMeca and MarínMartínez [13] also varied the sample size ratios in their simulations. They concluded that the average sample size scarcely affected the performance of the different methods, but this was based on the combined results of 5–100 studies and they presented no results of particular trial size mixtures.
As all studies show that in settings with few studies the HKSJ method always resulted in error rates closer to 5% than the DL method, the latter method should not be used and the HKSJ method should be the standard approach. To facilitate its more widespread application, the conversion of DL results into HKSJ results is presented step by step. At the same time, we urge caution when any random effects model, including HKSJ, is applied to situations where there are very few studies, and even more so when the sample sizes of the combined studies are very different. Even the HKSJ confidence intervals may be conservatively narrow in these situations and inferences may be spurious, if the confidence intervals are taken at face value.
Conclusions
Our simulations showed that the HKSJ method for random effects metaanalysis consistently results in more adequate error rates than the common DL method, especially when the number of studies is small. The HKSJ method can easily be applied routinely in metaanalyses. However, even with the HKSJ method, extra caution is needed when there are = < 5 studies of very unequal sizes.
Appendix 1: Statistical details
Random effects metaanalysis model
For k studies, let the random variable y_{i} be the effect size estimate from the i^{th} study. The random effect model can be defined as follows:
for i =1, . . ., k, where δ_{i} = δ + d_{i}; e_{i} and d_{i} independent, and d_{i} ~ N(0, τ^{2}). is the withinstudy variance, describing the extent of estimation error of δ_{i}, and the parameter τ^{2} represents the heterogeneity of the effect size between the studies.
For studies with dichotomous outcomes where no events were observed in one or both arms, the computation of the random effects model yields a computational error. In these cases, before performing any metaanalysis, we added 0.5 to all cells of such a study.
Random effects analysis
Let w_{i} be the fixed effects weights, i.e. the inverse of the withinstudy variance , and let be the fixed effects estimate of δ.
Let Q be the heterogeneity statistic .Then
is an estimate of the variance τ^{2}.
The random effects estimate for the average effect size δ is
where
The DerSimonian and Laird method estimates the variance of by
and uses the normal distribution to derive Pvalues and confidence intervals.
In contrast, the Hartung, Knapp, Sidik and Jonkman method estimates the variance of by
and uses the tdistribution with k1 degrees of freedom to derive Pvalues and confidence intervals, with k the number of studies in the metaanalysis.
Heterogeneity estimates
Although or Q can be used as measures of the heterogeneity, Higgins and Thompson [16] propose
I^{2} is a relative measure. It compares the variation due to heterogeneity (τ^{2}) to the total amount of variation in a ‘typical’ study (τ^{2} +∈^{2}), where ∈ is the standard error of a typical study of the review [33]:
Appendix 2: The simulations
The parameters in the scenarios for the simulations
– The number of trials per series k = 2 – 20;
– The average group size in a series of trials: 25, 50, 100, 250, 500 or 1000 subjects per group per trial;
– The trial size mixtures: we simulated series with 25, 50 or 75% large trials, series with exactly one large or one small trial, and series where all trials were of equal size;
– The ratio of the study sizes: for the series with small and large studies, the large study was 10 times the size of a small study.
The simulations were programmed in SAS, version 9.2. The scenarios were evaluated 10,000 times, for heterogeneity levels I^{2} = 0, 0.25, 0.5, 0.75, and 0.9, and at a nominal significance level α = 0.05 (twosided).
A. The simulation for normally distributed outcomes
1. For each scenario, and each value of I^{2}, we used eq. (1) to calculate the variance τ^{2}. So
where , with n_{i} the groupsize of trial i (i = 1…k) and σ the standard deviation of the outcome variable of the trials. As σ is only a scaling factor and the results only depend on the ratio τ/σ, we have set σ = 1 in the simulations.
2. For each trial i:
a. We determined the ‘true’ trial effect size δ_{i}, where δ_{i} was a random draw from the normal distribution with mean 0 and variance τ^{2}.
b. We generated the trial outcome based on a normal distribution with mean δ_{i} and variance 2σ^{2}/n_{i} = 2/n_{i}.
c. We generated the variance of the trial outcome based on a χ^{2} distribution with 2n_{i}2 degrees of freedom, divided by n_{i}1.
3. For each series:
A DerSimonian and Laird analysis and an HKSJ analysis were carried out.
4. For each scenario, I^{2} and each metaanalysis method, we calculated the error rate, i.e. the percentage of series that had a statistically significant (p<0.05) outcome.
B. The simulations for the odds ratio
1. When the outcome was dichotomous, we had to choose an additional parameter: the overall event rate p_{0}. We varied the p_{0} between 0.1 and 0.9 and for each value we used (2) to calculate τ^{2}, with
2. For each trial i:
a. We determined the ‘true’ trial effect size ln(odds ratio_{i}) = δ_{i}, where δ_{i} was a random draw from the normal distribution with mean 0 and variance τ^{2}.
b. We calculated the event rates p_{a} and p_{b} in the two groups, such that: ln(p_{a} /(1p_{a})) = ln(p_{0} /(1p_{0}))  δ_{i}/2, and ln(p_{b} /(1p_{b})) = ln(p_{0} /(1p_{0})) + δ_{i}/2.
c. We generated the observed event rates P_{a} and P_{b} in each group based on Bernouilli distributions with event rates p_{a} and p_{b}, respectively.
d. Based on P_{a} and P_{b}, we calculated the natural log of the odds ratio and its variance (1/P_{a} +1/(1 P_{a}) +1/P_{b} +1/(1 P_{b}))/n_{i}.
Steps 3 and 4 were the same as for a continuous outcome.
C. The simulations for the risk ratio
The risk ratio simulation was similar to the odds ratio simulation, but the variance was different:
Furthermore, for each trial:
a. We determined the ‘true’ trial risk ratio ln(risk ratio_{i}) = δ_{i} , where δ_{i} was a random draw from the normal distribution with mean 0 and variance τ^{2}.
b. We calculated the event rates p_{a} and p_{b} in the two groups, such that:
ln(p_{a}) = ln(p_{0})  δ_{i}/2 and ln(p_{b}) = ln(p_{0}) + δ_{i}/2. Event rates below 0.01 or above 0.99 were replaced by 0.01 or 0.99, respectively.
c. We generated the observed event rates P_{a} and P_{b} in each group based on Bernouilli distributions with event rates p_{a} and p_{b}, respectively.
d. This led to the natural log of the risk ratio and its variance (1/P_{a} +1/P_{b}  2)/n_{i}.
Appendix 3: R code for the conversion of DL to HKSJ results
The R package metafor [20] can also be used to perform an HKSJ analysis. The implementation is based on the metaregression paper by Knapp and Hartung [34]: when no covariates or moderator variables are used, the metaregression reduces to a random effects metaanalysis as proposed by Hartung/Knapp and Sidik/Jonkman.
The usual approach to perform an HKSJ analysis with metafor is based on study effects combined with fixed effects weights or standard errors. In our examples the HKSJ method must be applied on random effects weights instead of fixed effects weights. This can be done by choosing a fixed effects analysis (method=“FE”) in combination with the HKSJ method. This will result in warnings, because in general the HKSJ adjustment is not meant to be used in combination with a fixed effects analysis. In this case, the warnings can be neglected. The code is kindly provided by G Knapp.
Code for HKSJ conversion in R
library(metafor)
First example
y <− c(−0.04, 0.07, 0.31, 1.36, 0.54)
w <− c( 24.0, 22.2, 21.3, 15.5, 17.0)
rma.uni(y, vi = 1/w, method="FE", knha=TRUE)
Output is presented in Table 5.
Table 5. R output for first example (HartungKnappSidikJonkman method)
Second example (ln HR)
y <− c(0.81, 0.67, 0.80, 0.91, 0.56, 0.98, 1.24, 0.75, 0.95, 0.66)
w <− c( 5.0, 2.1, 11.5, 46.7, 2.9, 9.3, 3.9, 12.7, 3.9, 2.0)
# metaanalysis on log scale (ln HR). Note the brackets around the following syntax!
(hr <− rma.uni(log(y), vi=1/w, method="FE", knha=TRUE))
# backtransformation:
exp(hr$b)
exp(c(hr$ci.lb, hr$ci.ub)) (Table 6).
Table 6. R output for second example (HartungKnappSidikJonkman method)
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
GB conceived the idea. JIH contributed substantially to the study design, developed the software and performed the statistical analyses. JIH, GB and JPAI drafted the manuscript, and read and approved the final manuscript.
Acknowledgements
We would like to thank The Cochrane Library, London, UK for supplying the statistical data of intervention studies added in 2012 to the Cochrane Database of Systematic Reviews. In addition we would like to thank the reviewers Guido Knapp and Julio SánchezMeca for their helpful comments that improved this manuscript.
References

DerSimonian R, Laird N: Metaanalysis in clinical trials.
Control Clin Trials 1986, 7(3):177188. PubMed Abstract  Publisher Full Text

The Cochrane Collaboration: Review Manager (RevMan) 5.1.4. Copenhagen: The Nordic Cochrane Centre; 2011.

Borenstein M, Hedges L, Higgins J, Rothstein H: Comprehensive Metaanalysis Version 2. Englewood NJ: Biostat; 2005.

Hartung J, Knapp G: A refined method for the meta analysis of controlled clinical trials with binary outcome.
Stat Med 2001, 20(24):38753889. PubMed Abstract  Publisher Full Text

Hartung J, Knapp G: On tests of the overall treatment effect in meta analysis with normally distributed responses.
Stat Med 2001, 20(12):17711782. PubMed Abstract  Publisher Full Text

Follmann DA, Proschan MA: Valid inference in random effects metaanalysis.
Biometrics 1999, 55(3):732737. PubMed Abstract  Publisher Full Text

Hartung J, Makambi KH: Reducing the number of unjustified significant results in metaanalysis.
Commun Stat Simul Comput 2003, 32(4):11791190. Publisher Full Text

Makambi KH: The effect of the heterogeneity variance estimator on some tests of treatment efficacy.
J Biopharm Stat 2004, 14(2):439449. PubMed Abstract  Publisher Full Text

Sidik K, Jonkman JN: Robust variance estimation for random effects metaanalysis.
Comput Stat Data Anal 2006, 50(12):36813701. Publisher Full Text

Sidik K, Jonkman JN: A simple confidence interval for metaanalysis.
Stat Med 2002, 21(21):31533159. PubMed Abstract  Publisher Full Text

Sidik K, Jonkman JN: On constructing confidence intervals for a standardized mean difference in metaanalysis.
Commun Stat Simul Comput 2003, 32(4):11911203. Publisher Full Text

SánchezMeca J, MarínMartínez F: Confidence intervals for the overall effect size in randomeffects metaanalysis.

Sidik K, Jonkman JN: Simple heterogeneity variance estimation for meta analysis.
J Roy Stat Soc 2005, 54(2):367384. Publisher Full Text

Davey J, Turner R, Clarke M, Higgins J: Characteristics of metaanalyses and their component studies in the Cochrane Database of Systematic Reviews: a crosssectional, descriptive analysis.
BMC Med Res Methodol 2011, 11(1):160. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Higgins J, Thompson SG, Deeks JJ, Altman DG: Measuring inconsistency in metaanalyses.
Bmj 2003, 327(7414):557. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Ioannidis JP, Patsopoulos NA, Evangelou E: Uncertainty in heterogeneity estimates in metaanalyses.
Bmj 2007, 335(7626):914916. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

IntHout J, Ioannidis JP, Borm GF: Obtaining evidence by a single wellpowered trial or several modestly powered trials.
Stat Methods Med Res 2012.
[Epub ahead of print]

Borm GF, Lemmers O, Fransen J, Donders R: The evidence provided by a single trial is less reliable than its statistical analysis suggests.
J Clin Epidemiol 2009, 62(7):711715.
e711
PubMed Abstract  Publisher Full Text 
Viechtbauer W: Conducting metaanalyses in R with the metafor package.

Higgins JPT, Thompson SG: Quantifying heterogeneity in a metaanalysis.
Stat Med 2002, 21(11):15391558. PubMed Abstract  Publisher Full Text

The Cochrane Collaboration.
http://www.cochrane.org/cochranereviews webcite

Singh M, Das RR: Zinc for the common cold.
Cochrane Database Syst Rev 2011, 2:CD001364. PubMed Abstract  Publisher Full Text

Pidala J, Djulbegovic B, Anasetti C, Kharfan‒Dabaja M, Kumar A: Allogeneic hematopoietic cell transplantation for acute lymphoblastic leukemia (ALL) in first complete remission.

Thorlund K, Wetterslev J, Awad T, Thabane L, Gluud C: Comparison of statistical inferences from the DerSimonian–Laird and alternative randomeffects model metaanalyses – an empirical assessment of 920 Cochrane primary outcome metaanalyses.
Res Synth Meth 2011, 2(4):238253. Publisher Full Text

Kontopantelis E, Reeves D: Performance of statistical methods for metaanalysis when true study effects are nonnormally distributed: a simulation study.
Stat Methods Med Res 2012, 21(4):409426. PubMed Abstract  Publisher Full Text

Brockwell SE, Gordon IR: A simple method for inference on an overall effect in metaanalysis.
Stat Med 2007, 26(25):45314543. PubMed Abstract  Publisher Full Text

Hardy RJ, Thompson SG: A likelihood approach to metaanalysis with random effects.
Stat Med 1996, 15(6):619629. PubMed Abstract  Publisher Full Text

Shuster JJ: Empirical vs natural weighting in random effects metaanalysis.
Stat Med 2010, 29(12):12591265. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Henmi M, Copas JB: Confidence intervals for random effects meta analysis and robustness to publication bias.
Stat Med 2010, 29(29):29692983. PubMed Abstract  Publisher Full Text

Guolo A: Higherorder likelihood inference in metaanalysis and metaregression.
Stat Med 2012, 31(4):313327. PubMed Abstract  Publisher Full Text

Borenstein M, Hedges LV, Higgins JPT, Rothstein HR: Introduction to MetaAnalysis. Chichester, UK: Wiley; 2009.

Knapp G, Hartung J: Improved tests for a random effects metaregression with a single covariate.
Stat Med 2003, 22:26932710. PubMed Abstract  Publisher Full Text
Prepublication history
The prepublication history for this paper can be accessed here: