Abstract
Background
When multiple endpoints are of interest in evidence synthesis, a multivariate metaanalysis can jointly synthesise those endpoints and utilise their correlation. A multivariate randomeffects metaanalysis must incorporate and estimate the betweenstudy correlation (ρ_{B}).
Methods
In this paper we assess maximum likelihood estimation of a general normal model and a generalised model for bivariate randomeffects metaanalysis (BRMA). We consider two applied examples, one involving a diagnostic marker and the other a surrogate outcome. These motivate a simulation study where estimation properties from BRMA are compared with those from two separate univariate randomeffects metaanalyses (URMAs), the traditional approach.
Results
The normal BRMA model estimates ρ_{B }as 1 in both applied examples. Analytically we show this is due to the maximum likelihood estimator sensibly truncating the betweenstudy covariance matrix on the boundary of its parameter space. Our simulations reveal this commonly occurs when the number of studies is small or the withinstudy variation is relatively large; it also causes upwardly biased betweenstudy variance estimates, which are inflated to compensate for the restriction on _{B}. Importantly, this does not induce any systematic bias in the pooled estimates and produces conservative standard errors and meansquare errors. Furthermore, the normal BRMA is preferable to two normal URMAs; the meansquare error and standard error of pooled estimates is generally smaller in the BRMA, especially given data missing at random. For metaanalysis of proportions we then show that a generalised BRMA model is better still. This correctly uses a binomial rather than normal distribution, and produces better estimates than the normal BRMA and also two generalised URMAs; however the model may sometimes not converge due to difficulties estimating ρ_{B}.
Conclusion
A BRMA model offers numerous advantages over separate univariate synthesises; this paper highlights some of these benefits in both a normal and generalised modelling framework, and examines the estimation of betweenstudy correlation to aid practitioners.
Background
Traditionally, metaanalysis models combine summary measures of a single quantitative endpoint, taken from different studies, to produce a single pooled result. However, multiple pooled results are required whenever there are multiple outcomes [1] or multiple treatment groups [2]. A multivariate metaanalysis model uses the correlation between the endpoints and obtains the multiple pooled results collectively [3,4]. For example, Reitsma et al. [5] have suggested a bivariate randomeffects metaanalysis (BRMA) to jointly synthesise logitsensitivity and logitspecificity values from diagnostic studies. Multivariate metaanalysis has been quite widely used, including application to genetic associations [6], surrogate endpoints [7,8], psychological findings [9] and prognostic markers [10].
Often the advantage of a multivariate randomeffects metaanalysis lies in its ability to use the betweenstudy correlation of the multiple endpoints of interest. For example, in diagnostic studies the sensitivity and specificity are usually negatively correlated across studies due to the use of different thresholds [5]. Van Houwelingen et al. [4] use the betweenstudy correlation to describe the shape of the bivariate relationship between the true logodds in a treatment group and the true logodds in a control group (baseline risk). Riley et al. [10] algebraically assess BRMA and show that the correlation allows a 'borrowing of strength' across endpoints. This leads to pooled estimates that have a smaller standard error than those from corresponding univariate randomeffects metaanalyses (URMAs), especially when some endpoints are missing at random across studies.
Some recent articles have indicated the betweenstudy correlation may often be estimated at the end of its parameter space, as +1 or 1. Thompson et al. [6] apply a normal BRMA model to genetic studies of coronary heart disease and report that the betweenstudy correlation was 'poorly estimated' with the likelihood peaking at +1, and an estimate of +1 has also been reported in other applications [4,11]. To aid practitioners, in this paper we analytically consider why this occurs, and then explore the impact, if any, this has on the pooled estimates and betweenstudy variance estimates. This investigation also allows us to examine the general benefits of BRMA over URMA, to build on a number of other recent articles [5,10], and encourage a greater use and appreciation of BRMA in practice.
The outline of the paper is as follows. We begin by introducing the general normal model for BRMA and discuss analytically why the betweenstudy correlation can be estimated at the edge of its parameter space. We then apply the model to two real examples from the literature, one involving a diagnostic marker and one involving a surrogate outcome, and these both give a betweenstudy correlation estimate of 1. This then motivates a realistic simulation study to examine how the estimate of betweenstudy correlation affects the statistical properties of the pooled and betweenstudy variance estimates. It also allows us to compare the performance of the normal BRMA model to two separate URMAs, the more common approach in practice [10]. We then extend our work to consider metaanalysis of proportions, and highlight why a generalised model for BRMA is preferred to the general normal BRMA model [12], and also two separate generalised URMA models. We conclude by summarising the broad benefits of BRMA for practitioners, and discuss future research priorities.
Methods
In this section we introduce a hierarchical normal framework for BRMA and URMA. We describe how the BRMA normal model is estimated, and analytically consider the estimation of betweenstudy correlation. We then describe the rationale and methodology for our simulation study of BRMA versus URMA. To ensure a real world context, this section also includes two motivating examples from the medical literature where a BRMA is potentially important.
Motivating example 1 – the telomerase data
Glas et al. [13] systematically review the sensitivity and specificity of tumour markers used for diagnosing primary bladder cancer. One of these markers was telomerase, a ribonucleoprotein enzyme, evaluated in 10 studies as shown in Table 1. Rather than applying a URMA independently for sensitivity and specificity, the authors jointly synthesise the logittransformed sensitivity and the logittransformed specificity in a normal BRMA model as described below and recently proposed elsewhere [5].
Table 1. The telomerase data taken from the bladder cancer review of Glas et al. [13]
A general normal model for bivariate randomeffects metaanalysis (BRMA)
Suppose that i = 1 to n studies are identified by a systematic review, and that two endpoints (j = 1 or 2) are available from each study. Each study supplies summary measures, Y_{ij}, and associated standard errors, s_{ij}, for each endpoint. For instance, for diagnostic studies Reitsma et al. [5] suggest the logitsensitivity is Y_{i1 }and the logitspecificity is Y_{i2}. Each summary statistic (Y_{ij}) is assumed to be an estimate of a true value (θ_{ij}) in each study, and in a hierarchical structure each θ_{ij }is assumed to be drawn from a distribution with mean (or 'pooled') value β_{j }and betweenstudy variance . If Y_{ij}/θ_{ij }and θ_{ij }are both assumed normally distributed then the BRMA can be specified as:
This model is the general normal model for BRMA [4], where δ_{i }and Ω are the withinstudy and the betweenstudy covariance matrices respectively. Usually of key interest from the analysis are the pooled estimates of β_{1 }and β_{2}, although sometimes an estimated function of these may be desired; for instance in the telomerase example an estimate of the log of the diagnostic odds ratio is given by _{1 }+ _{2}. The BRMA differs from two independent URMAs by the inclusion of the withinstudy correlations (i.e. the ρ_{Wi }s) and the betweenstudy correlation (ρ_{B}). Equation (2) is equivalent to two independent URMAs when ρ_{Wi }= ρ_{B }= 0 for all i, i.e. there is zero correlation:
In equation (1) it is common to assume the s and ρ_{Wi }s are known [1,4]. Including the uncertainty of the s is unnecessary for URMA [14], but whether the uncertainty of the s and ρ_{Wi }s should be incorporated in BRMA is yet to be examined. This issue is outside the scope of this paper but we note that a Bayesian framework is particularly flexible for incorporating such uncertainty [15].
Within and betweenstudy correlation
The withinstudy correlation, ρ_{Wi}, indicates whether Y_{i1 }and Y_{i2 }are correlated within a study, and these ρ_{Wi }s are usually assumed known. For the telomerase data the ρ_{Wi }s might be assumed to be zero because sensitivity and specificity values are calculated independently in a study using different sets of patients. In other BRMA applications the ρ_{Wi }s can be nonzero, for example where the two endpoints are overall and diseasefree survival [10]. In practice it may be difficult to obtain the value of nonzero ρ_{Wi }s, although it can be done as evident in Berkey et al. [1] and the 'motivating example 2' below [7]. Suggestions for limiting the problem of unavailable ρ_{Wi }s have been proposed [1517], and this issue is considered further in the discussion.
The betweenstudy correlation, ρ_{B}, is not generally known and has to be estimated when fitting the BRMA. It indicates how the underlying true values, i.e. the θ_{i1}s and the θ_{i2 }s, are related across studies. It may be caused by differences across studies in patientlevel characteristics, such as age and stage of disease, or changes in studylevel characteristics, such as the threshold level in diagnostic studies
Motivating example 2 – the CD4 data
Daniels and Hughes [7] assess whether the change in CD4 cell count is a surrogate for time to either development of AIDS or death in drug trials of patients with HIV. They consider betweentreatmentarm loghazard ratios of time to onset of AIDS or death (Y_{i1}), and betweentreatmentarm differences in mean changes in CD4 count (Y_{i2}) from pretreatment baseline to about six months. Fifteen relevant trials were identified. Some of the trials involved three or four treatment arms, but to enable application to BRMA here we only consider outcome differences between the control arm and the first treatment arm in the reported dataset [7]. All fifteen trials provided complete data, including the withinstudy correlations which were quite small, varying between 0.22 and 0.17 with a mean of 0.08.
Estimation
In our analyses of equation (1) in this paper the betweenstudy parameters (i.e. , and ρ_{B}) and the two pooled values (β_{1 }and β_{2}) are estimated iteratively using restricted maximum likelihood (REML) in SAS Proc Mixed, as described elsewhere [4]. Unless otherwise stated, we also use Cholesky decomposition [18] of Ω to ensure that this matrix is estimated to be positive semidefinite and therefore that the betweenstudy correlation estimate, _{B}, is in the range [1,1]. Cholesky decomposition of Ω also helps ensure convergence when _{B }is very close to 1 or 1.
Analytic consideration of the betweenstudy covariance parameters
Estimation and inference in classical linear mixed models are based on the marginal model, which for equation (1) is the bivariate normal model with variancecovariance δ_{i }+ Ω. Assume for the sake of simplicity that the withinstudy covariance matrix δ_{i }= δ for all studies. Then the covariance matrix of the observed Y_{i1 }s and Y_{i2 }s is given by V = δ + Ω, where δ is known. Now, this puts (severe) restrictions on V, namely that V  δ is a covariance matrix, that is nonnegative definite. If the estimated V does not satisfy this restriction, the maximum likelihood estimate of Ω will be truncated on the boundary of its parameter space. This means that if Ω is diagonal (as for URMA) the maximum likelihood estimate on the boundary will have one or both of the betweenstudy variances equal to zero; else if Ω is nondiagonal (as for BRMA) then either one or both of the betweenstudy variances equals zero or else the betweenstudy correlation equals 1 or +1. In metaanalysis a betweenstudy variance estimate of zero is wellunderstood, but a betweenstudy correlation estimate of 1 or +1 is likely to be less familiar to practitioners. Thompson et al. [6] refer to this issue as 'poor estimation' but it perhaps should rather be considered a natural consequence of the sensible restrictions imposed on V, and one that prevents a variance estimate < 0 or a correlation estimate > +1 or < 1, as might otherwise be obtained.
Rationale for the research in this paper
In this paper, to aid practitioners we will further assess the normal BRMA model and the role of betweenstudy correlation. In particular, we aim to identify the situations when _{B }is likely to be +1 or 1 and examine the impact this has on the pooled and betweenstudy variance estimates. We will also evaluate the benefits of BRMA over two separate URMAs, the more common approach in practice, and explore extensions to a generalised BRMA model for metaanalysis of proportions. To achieve these goals we firstly apply the normal BRMA of equation (1) to the two motivating examples. We then perform a simulation study of the normal BRMA model, as described below. The generalised BRMA model is then introduced and assessed in relation to the normal BRMA model and two separate generalised URMAs (see Results).
A simulation study to assess BRMA and the betweenstudy correlation
We carried out a simulation study of the general normal model for BRMA in 11 scenarios, labelled (i) to (xi) (Table 2). Each of scenarios (i) to (xi) relates to a different but realistic specification of equation (1). Scenarios (i) to (vi) consider complete data, as in the telomerase example, whereas scenarios (vii) to (vi) consider when some data are missing at random across studies, as assumed in the BRMA of Thompson et al. [6]. The scenarios also vary in the relative sizes of the within and betweenstudy correlations, and also the within and betweenstudy variation. For example, scenarios (i) to (iv) involve withinstudy variances similar in size to the betweenstudy variances, as observed in prognostic studies [10], and in scenario (vi) there is one relatively low and one relatively high betweenstudy variance, as for the CD4 dataset. The sizes of the metaanalysis were either n = 5 or n = 50 studies for complete data, and either n = 10 or n = 50 for missing data. Our method of simulation was deliberately chosen to be similar to that previously used by Berkey et al. [1] and Sohn [19]. As an example, we now describe the simulation procedure for scenario (i) with n = 50.
Table 2. Scenarios used in the simulations based on equation (1)
Description of the simulation procedure for scenario (i) with n = 50
Generation of a dataset of 1000 metaanalyses
We chose β_{1 }= 0 in order to reflect little clinical benefit (e.g. a sensitivity of 50%) and in contrast β_{2 }= 2 (e.g. a specificity of 88%). The within study variances required, i.e. the 50 s and 50 s, were each found by sampling from a N(0.25,0.50) distribution and squaring the value obtained. This produced a median of 0.25 and an interquartile range of 0.7, values similar to those for the telomerase data (median = 0.21) and for the BRMA of Thompson et al. [6] (median = 0.28, interquartile range = 0.76). The betweenstudy variances, and , were chosen to be 0.25 in scenario (i), which meant that they were similar in size to the median withinstudy variances. The within and betweenstudy correlations were both set to zero for this scenario. All these choices were substituted into equation (1) and 1000 metaanalyses each of 50 studies were generated. Calculations were performed in SPlus using the 'rmvnorm' function for generating bivariate normal values (code available upon request).
Estimation using the dataset of 1000 metaanalyses
Each of the 1000 metaanalyses in scenario (i) were analysed separately by:
• fitting two separate URMAs as in equation (2) (where ρ_{B }= 0) using REML to estimate β_{1}, β_{2}, and
• fitting a BRMA as in equation (1) using REML to estimate β_{1}, β_{2}, , , and ρ_{B}
The 1000 BRMA estimates and the corresponding 1000 URMA estimates from scenario (i) were then compared by calculating:
• average parameter estimates across all the simulations (to check for bias)
• coverage of the 95% confidence intervals for β_{1 }and β_{2}
• average standard error and meansquare error (MSE) of β_{1 }and β_{2}
• the number of occasions _{B }was equal to +1 or 1 in the BRMA
To assess coverage, the 95% confidence intervals for β_{j }were calculated using:
with n_{j }the number of studies providing endpoint j. This tdistribution is commonly used in the metaanalysis literature, although it is only an approximation [20].
Description of the simulation procedure for scenarios (ii) to (xi)
Simulations in the other scenarios followed in the same manner as described above but with the data generated from the parameter values specific to each scenario as given in Table 2. For those missing data simulations of scenarios (viii) to (xi) we simulated data as described for complete data, except that for each generated metaanalysis we removed the data for the second endpoint in a randomly selected 50% of studies. So, for example, with n = 50 in scenario (viii) each of the 1000 simulated metaanalyses contained 25 studies with complete data and 25 studies with data for the first endpoint only.
Results
Application to the telomerase and CD4 data
The normal model for BRMA (equation (1)) and then two separate URMAs (equation (2)) were applied to the telomerase data. Both approaches gave a pooled sensitivity of about 76% and a pooled specificity of about 88% (Table 3). The BRMA gave a betweenstudy correlation estimate of 1 but the profile likelihood reveals that there is little information regarding ρ_{B }with the loglikelihood gradually increasing as _{B }approaches 1 (Figure 1), the end of its parameter space. Interestingly, the betweenstudy variances were estimated to be somewhat larger in the BRMA than the URMA, and the standard errors of the pooled estimates were also slightly larger in the BRMA; just the opposite of what one might expect from a bivariate analysis utilising large correlation [10]. A similar finding was observed upon application of BRMA and URMA to the CD4 data. The BRMA again gave a betweenstudy correlation estimate of 1 and both betweenstudy variances were estimated somewhat larger in the BRMA than the URMA, as were the standard errors of the pooled estimates.
Table 3. URMA and BRMA results for the telomerase and CD4 datasets
Figure 1. Profile loglikelihood for the betweenstudy correlation from the general normal BRMA of the telomerase data.
The question is thus posed: is the estimation of ρ_{B }at the boundary of its parameter space adversely influencing the other BRMA parameter estimates and, if so, how (e.g. does it introduce bias)? Also, in terms of the individual pooled estimates, the telomerase and CD4 examples indicate little benefit of BRMA over two separate URMAs, despite the utilisation of correlation; but is this generally true and in what situations should a BRMA be preferred? To understand the answers to these important questions, it is helpful to now consider the results from our simulation study.
We note at this point that, for both telomerase and CD4, we also tried estimation of BRMA using an unstructured form of Ω, rather than using Cholesky decomposition of Ω as previously. Interestingly, this approach produced nonsensical betweenstudy correlation estimates of 1.12 and 1.074 for the telomerase and CD4 datasets respectively. This emphasises the importance of a boundary constraint on ρ_{B }as imposed by the Cholesky decomposition.
Simulation results
Table 4 includes the simulation results for scenarios (i), (ii), (viii) and (ix). The results for all other scenarios are provided in Appendix 1 (see 1). We now discuss the key findings.
Table 4. Simulation results of the normal BRMA and URMA models for scenarios (i), (ii), (viii), and (ix)
Additional file 1. Appendix 1. Simulation results of the normal BRMA and URMA models for scenarios (iii) to (vii), (x) and (xi)
Format: DOC Size: 151KB Download file
This file can be viewed with: Microsoft Word Viewer
Betweenstudy correlation
One can see from Table 4 that the normal model for BRMA often estimates the betweenstudy correlation, ρ_{B}, as either +1 or 1, especially when the number of studies in the metaanalysis is small. For example, with n = 5 in scenario (ii), where the true ρ_{B }was 0.8, 605 of the 1000 simulations (60.5%) gave _{B }equal to +1 and 103 simulations (10.3%) gave _{B }equal to 1. This led to a mean value of _{B }equal to 0.639, a downward bias of about 20%. However, this downward bias is not in itself a concern because it is simply caused by the maximum likelihood estimator sensibly truncating _{B }at +1 and 1, which improves the meansquare error of _{B}. Furthermore, _{B }is clearly asymptotically unbiased, with the occurrence of _{B }equal to 1 or +1 and thus the bias in mean _{B }becoming increasingly less as the number of studies in the metaanalysis increases (Table 4). Interestingly though, the number of studies required to reduce the occurrence of _{B }equal to 1 or +1 was far greater when the withinstudy variation was large relative to the betweenstudy variation. For example, even with a large n = 50 studies in scenario (v), where the withinstudy variation was relatively large, 58% of the simulations gave a betweenstudy correlation of 1 or +1.
These findings indicate why _{B }equals 1 in the BRMAs of the telomerase and CD4 datasets. For the telomerase data there are 10 studies; the simulations show that this magnitude of studies will often provide little information about ρ_{B}, causing _{B }to often be constrained at 1 or +1 so that the restrictions imposed on V are met. For the CD4 data, even though there are five more studies than telomerase, the mean withinstudy variance for endpoint j = 1 is 0.15 and this is large relative to the betweenstudy variation ( = 0.048). In such situations where the withinstudy variation dominates, the simulations again show that _{B }will often require truncation at the end of its parameter space to ensure V is nonnegative definite.
Betweenstudy variance estimates
Our simulation results show that the betweenstudy variance estimates were less frequently truncated at zero in the BRMA than the URMA (Table 4). For example, in scenario (ii) with n = 5 was zero for 104 of the URMA simulations and none of the BRMA simulations. Furthermore, in those scenarios where the betweenstudy correlation was often +1 or 1 (e.g. scenario (ii) with n = 5), the normal BRMA model produces a noticeable upward bias in the betweenstudy variance estimates. For example, in scenario (ii) with n = 5 there was an upward bias of 0.024 in and , about 10% above their true value. To understand analytically why this occurs, we need to consider that the betweenstudy covariance (τ_{12}) is formulated by τ_{12 }= ρ_{B }τ_{1 }τ_{2}. Now, if _{B }is constrained at 1 or +1, then to obtain the necessary solution for _{12 }the maximum likelihood estimator can only increase the _{j }s, which do not have an upper bound constraint. Thus the betweenstudy variance estimates are inflated to compensate for the constraint on _{B}. This explains why the BRMAs of the telomerase and CD4 data, where _{B }was truncated at 1, give _{j }s that are noticeably larger than those from two separate URMAs. Practitioners need to be aware of this issue; however, we do not consider it a major concern as the maximum likelihood estimator for is still asymptotically unbiased (the bias decreases as the number of studies increases) and the inflation is simply caused by the sensible and necessary constraint on _{B}. Furthermore, the inflation is essentially conservative, leading to a larger standard error and meansquare error of pooled estimates as now discussed.
Pooled estimates
For all complete and missing data scenarios, the pooled estimates were approximately unbiased for both BRMA and URMA. Even in those scenarios where _{B }was often + 1 or 1 it is encouraging that, despite the upward bias in betweenstudy variances, there was no systematic bias in the pooled estimates from the BRMA (Table 4 and Appendix 1 – see 1). The main affect on the pooled estimates was an inflated standard error and meansquare error. This is a conservative property, but meant that in some complete data scenarios the BRMA performed slightly worse than URMA, despite the utilisation of correlation in the BRMA that might be expected to improve efficiency [10]. For example, in the n = 5 results for scenario (iii), where about 56% of simulations gave _{B }equal to +1 or 1 and there was an upward bias in betweenstudy variances, the standard error/meansquare error of _{1 }was larger in the BRMA (0.272/0.083) than the URMA (0.268/0.081). This explains the BRMA results for telomerase and CD4, where the standard errors of pooled estimates were larger in the BRMA than the URMA due to the inflated betweenstudy variances.
The coverage of β_{1 }and β_{2 }was between 93% and 98% in most scenarios assessed, and was often similar in URMA and BRMA. It is hard, though, to make general conclusions regarding coverage, as those situations where _{B }is often +1 or 1 are the same situations where the tdistribution with n1 degrees of freedom is a poor approximation to the true sampling distribution. The true degrees of freedom to use here are complex and account for the withinstudy variances [12]; however this is rarely done in metaanalysis and is beyond the scope of this paper.
BRMA versus URMA for estimating the pooled values
For complete data, in most scenarios the BRMA was marginally superior to URMA as the pooled estimates had slightly smaller standard errors and meansquare errors, especially given large correlations (Table 4 and Appendix 1 – see 1). However, the URMA sometimes performed equally well, and occasionally even better in those scenarios where _{B }was often +1 or 1 as discussed above. We also compared the subset of BRMA results where _{B }did not equal +1 or 1 with the corresponding URMA results, and again found that BRMA was generally slightly superior to URMA. This finding agrees with previous algebraic results [10], that given complete data there is generally a very small benefit of BRMA over URMA for estimating β_{1 }and β_{2 }themselves. Our focus here is on the individual pooled estimates, but we note that there are also broader reasons why a BRMA may be preferred over URMA for complete data. These are summarised in the Discussion to ensure a more complete picture for practitioners considering BRMA.
For the missing data simulations, the pooled estimate for endpoint j = 2 was of particular interest because of the missing data for this endpoint. Encouragingly, the meansquare error and mean standard error of _{2 }were much smaller in the BRMA than the URMA, although the coverage was comparable (Table 4 and Appendix 1 – see 1). For example, in the n = 10 simulations of scenario (xi) the mean standard error was 0.225 in the BRMA compared to 0.262 in the URMA, and the MSE was 0.0708 in the BRMA compared to 0.0921 in the URMA. The reduction in standard error and MSE was larger when both the within and betweenstudy correlations were high. Even when ρ_{Wi }was zero there was still a reasonable benefit if ρ_{B }was high; for example, in the n = 10 simulations of scenario (viii) the mean standard error of _{2 }was 0.174 in the BRMA compared to 0.209 in the URMA. This finding agrees with algebraic work regarding the benefits of BRMA for when there are data missing at random [10]. Practitioners should again consider this benefit alongside the other broader reasons for using BRMA rather than URMA (see Discussion).
Extended simulations of the normal BRMA model
In our above simulations of the normal BRMA model we used nonnegative within and betweenstudy correlations; however, in reality negative correlations may arise as in the telomerase and CD4 examples. Also, our simulations took the withinstudy correlations to be the same in each study, while in reality their value may vary. Further simulations were thus performed to assess negative correlation and discrepant withinstudy correlations. These gave findings consistent with those identified previously (Appendix 2 – see 2); the BRMA was still beneficial over URMA for estimating the pooled endpoints, and where the betweenstudy correlation estimate was often +1 or 1 there was again an upward bias in the BRMA betweenstudy variance estimates.
Additional file 2. Appendix 2. Simulation results for the normal BRMA and URMA models for some scenarios involving negative correlation
Format: DOC Size: 87KB Download file
This file can be viewed with: Microsoft Word Viewer
For simplicity, in all our simulations and were generated independently but in reality they are likely to be correlated due to the sample size being similar for both endpoints. We also generated the s independent to the Y_{ij }s, yet in many situations, such as the synthesis of logodds ratios, the size of may be related to the size of Y_{ij}. To address this, we also performed further simulations where we firstly generated individual binary data for diagnostic studies, using simulation code kindly provided by Chu and Cole [12]. From this realistic raw data we then calculated the Y_{ij }s and their s, before then fitting the normal BRMA model as before. The results again show that the betweenstudy correlation estimate is often +1 or 1 and the BRMA is still preferable to URMA, with improved meansquare error, coverage and, especially, bias of estimates (Table 5). However, the results also revealed some severe limitations of a normal model for metaanalysis of binary data, as now discussed.
Table 5. simulation results for metaanalysis of proportions
A generalised model for BRMA of proportions
So far in this paper we have modelled the summary statistics across studies, i.e. the Y_{ij }s, and assumed they are normally distributed. Indeed, in our main simulations we generated the Y_{ij }s directly from the normal BRMA model of equation (1); thus, our conclusions are only valid for Y_{ij }s that can truly be assumed normally distributed. This normality assumption is common in the metaanalysis field, and will often be suitable (see Discussion). However, in our two motivating examples it is more plausible for the CD4 data than the telomerase data as the latter involves a metaanalysis of proportions, for which the normality assumption is not appropriate when some studies have a small number of patients or the proportions are close to 0 or 1. For this reason recent articles [12] suggest that, rather than modelling the logitproportions using the normal distribution, one should directly model the binary data using a binomial distribution. This approach also avoids the use of ad hoc continuity corrections in those studies which have zero cells. In terms of diagnostic studies, this generalised model for BRMA of sensitivity and specificity can be written as follows [12]:
no. testing positive_{i}~Binomial (total no. true positives_{i}, sensitivity_{i}) logit(sensitivity_{i}) = β_{1 }+ u_{1}
no. testing negative_{i}~Binomial (total no. true negatives_{i}, specificity_{i}) logit(specificity_{i}) = β_{2 }+ u_{2}
Equation (3) can be fitted using maximum likelihood estimation in SAS NLMIXED. Chu and Cole [12] show that where the true sensitivity and specificity are large, this generalised BRMA model produces close to unbiased pooled and betweenstudy correlation estimates, whereas the general normal BRMA model produces somewhat biased estimates. This can also be seen in our simulations results for metaanalysis of proportions in Table 5. The meansquare error, coverage, and, most noticeably, bias of estimates are far superior in the generalised BRMA than the normal BRMA. Furthermore, the generalised BRMA is also marginally superior in terms of bias to two separate generalised URMAs (i.e. equation (3) where ρ_{B }= 0), emphasising that the BRMA is also beneficial over URMA in the generalised model framework. Note though that, although it is the best method, the generalised BRMA model is itself not without bias (Table 5); to rectify this, extension to REML or other estimation techniques is potentially important.
To conclude our research we applied the generalised BRMA model to the telomerase data. Unfortunately the model would not converge appropriately; different starting values all produced a betweenstudy correlation estimate of 1 but gave markedly different parameter estimates and caused spurious standard errors. For example, for one set of starting values the model gave the standard error of _{1 }as 30.5, whereas for another set the standard error was close to zero. Indeed SAS provides the following warning: 'the final Hessian matrix is not positive definite, and therefore the estimated covariance matrix is not full rank and may be unreliable'. The problem here is again due to the betweenstudy correlation of estimate 1 in Ω, as this causes the determinant of to be zero. This has greater implications in equation (3) than for the normal BRMA model of equation (1). The maximum likelihood estimator for equation (1) involves the determinant of δ_{i }+ in each study, which will not be zero unless the withinstudy correlations are also +1 or 1. However, in equation (3) the maximum likelihood estimator involves the determinant of itself, which causes problems akin to dividing by zero, which is why spurious estimates and standard errors are produced. It is clear that there is simply little information to estimate the betweenstudy correlation for the telomerase dataset, due to the small number of studies. This issue is also evident in our n = 10 simulations of the generalised BRMA model (Table 5), where 397 of the 1000 simulations did not converge appropriately. In such situations where estimating ρ_{B }is difficult, application of two generalised URMAs may be the most appropriate option available (Table 3), although specifically for diagnostic studies other methods may also be valuable [21].
Discussion
Multivariate metaanalysis models are increasingly used to synthesise multiple, correlated endpoints of interest, especially in studies of diagnosis [5,21] and surrogate outcomes [7,8]. The Campbell Collaboration suggests that metaanalysts 'should not ignore the dependence among study outcomes'; however, they also note that 'the consequences of accounting for (modelling) dependence or ignoring it are not well understood' [22]. To therefore aid practitioners considering the approach, in this paper we have examined two models for BRMA and compared them to separate univariate syntheses, the traditional approach. We now discuss the main conclusions from our work and suggest future research priorities.
The general normal model for BRMA
A normal metaanalysis model is appropriate when the Y_{ij }s can be assumed normally distributed; this assumption is commonly used, for example where the Y_{ij }s are logodds ratios [15], loghazard ratios [10], mean differences [1] and logevent rates [11].
Betweenstudy covariance parameters
It is clear from our work that maximum likelihood estimation of a normal randomeffects metaanalysis model will often truncate the betweenstudy covariance matrix, Ω, on the boundary of its parameter space. For URMA this is observed by a betweenstudy variance estimate of zero, whilst in BRMA it is more likely observed by a betweenstudy correlation of +1 or 1. Practitioners are likely to be familiar with the concept of zero variance, but are perhaps less likely to appreciate why a correlation is estimated at unity. However, both arise for the same reason, namely that Ω must be a nonnegative definite matrix such that is not < 0 and ρ_{B }is not > +1 or < 1. Our simulations show that, especially when the number of studies is small and/or the withinstudy variance is large relative to the betweenstudy variance, such truncation is often necessary to ensure the sensible restrictions are met. In the normal BRMA, we have also shown that a consequence of _{B }being truncated at +1 or 1 is an upward bias in betweenstudy variance estimates, which are inflated upwards to compensate for the restriction on _{B}. Practitioners should not, though, be overly concerned by this. We have shown it does not cause any systematic bias in the pooled estimates from BRMA, and it leads to conservative standard errors and meansquare errors.
The benefits over URMA for the pooled estimates
Our simulation results highlight that a normal model for BRMA is preferable to two separate URMAs for estimating the pooled endpoints, and our results are consistent with previous findings that show how the inclusion of correlation allows the 'borrowing of strength' across endpoints [1,10,19]. We thus recommend practitioners use a BRMA rather than two separate URMAs where possible. In particular, when some data are missing at random the BRMA is likely to produce a much smaller standard error and meansquare error of pooled estimates than URMA, even for moderate correlations. Riley et al. [10] give an applied example that shows this. For complete data, practitioners should not expect to see much gain in statistical efficiency over URMA; the meansquare error and standard error of pooled estimates are generally only marginally smaller in BRMA than URMA, and on the occasion of _{B }= +1 or 1 they may even be slightly worse in BRMA (due to the inflated betweenstudy variances, as in the telomerase and CD4 examples). However, there are broader reasons why BRMA may still be preferable in this situation (see below).
The generalised model for BRMA
In equation (3) we extended our work to a generalised BRMA model for metaanalysis of two proportions. For synthesis of two proportions, like sensitivity and specificity, this approach is preferable to the general normal BRMA model (Table 5) because the normality assumption breaks down when the proportions are close to 0 or 1 and when there are small patient numbers [12]. It also avoids the use of adhoc continuity corrections when there are zero cells in some studies. Practitioners synthesising diagnostic studies are thus encouraged to use the generalised BRMA model, rather than the normal model or indeed two separate generalised URMAs. However, they should also be aware that a betweenstudy correlation estimate of +1 or 1 in the generalised BRMA model is likely to be associated with nonconvergence and unstable pooled estimates, as discussed for the telomerase data. In such situations there may be little information to estimate the correlation, and so practitioners may wish to consider other methods for synthesising diagnostic studies, such as the hierarchical summary receiver operating characteristic (HSROC) method [21]; if this is also not possible then the best option may be a generalised URMA for sensitivity and specificity separately (Hamza et al., personal communication).
The broader benefits of BRMA
Our simulations focused mainly on the benefits of BRMA over URMA for estimating the pooled endpoints. However, there are also broader reasons why a BRMA may be preferable to URMA, for either complete or missing data. For example, BRMA allows one to describe the bivariate relationship between endpoints [4,5], model, test or make predictions from their association [7], and estimate some function of the two pooled endpoints, like β_{1 } β_{2 }[10], β_{1 }+ β_{2 }[5], or β_{1}/β_{2 }[6]. For instance, for the telomerase data, a BRMA enables a single framework to estimate the pooled sensitivity, pooled specificity, and the pooled diagnostic odds ratio (exp(β_{1 }+ β_{2})) [21]. Furthermore, Reitsma et al. [5] show that a BRMA of diagnostic studies enables the correlation between pooled endpoints to be estimated, which allows one to measure the shape of their bivariate relation and construct confidence ellipses. It also allows calculation of the conditional variance in one parameter given a fixed value of the other parameter, and allows drawing of the summary ROC curve. In terms of the CD4 data, the estimated correlation between pooled endpoints from a BRMA enables one to predict the time of onset of AIDS or death from a future patient's CD4 level. Thus the BRMA can help establish whether CD4 should be used as a surrogate of diseasefree survival [7,8], and further research of multivariate metaanalysis in this context is recommended. A BRMA may also be extended to a bivariate metaregression by including additional studylevel covariates that explain the betweenstudy heterogeneity. For example, one may wish to include a covariate for study quality in metaanalysis of diagnostic studies [23]. Berkey et al [1] show that a bivariate metaregression is more efficient than separate univariate metaregressions for assessing such studylevel covariates, again due to the inclusion of correlation.
Further research suggestions
For the general normal BRMA model, the role of estimation techniques other than REML would be interesting to consider, especially as other potentially better options have just been proposed [24]. For the generalised BRMA model, SAS NLMIXED currently only allows maximum likelihood estimation and so extension to REML is required, especially as the maximum likelihood estimates are not without bias (Table 5). Further research of multivariate metaanalysis within a Bayesian framework is also potentially important, as it would enable the incorporation of prior knowledge about the parameters, which may be valuable when the number of studies is small [15,25]. It would also allow the uncertainty of the s, s and ρ_{Wi }s to be taken into account, as in practice they will only be estimates themselves as mentioned by Daniels and Hughes [7]. Further assessment of the role of the withinstudy correlations is also required, in particular what should we do when they are nonzero but unavailable? For the metaanalysis of surrogate endpoints it has been suggested that the withinstudy correlations are likely to be small (between 0 and 0.2) and can plausibly be considered constant across studies [7], or even zero [26]. However, this is not necessarily true in other fields; for example, in a multivariate metaanalysis of longitudinal data the withinstudy correlations varied between 0.48 and 0.97 (Jones et al., personal communication).
The use of individual patient data (IPD) in multivariate metaanalysis should also be considered, especially as IPD is the goldstandard for metaanalysis [27] and it would allow any unavailable withinstudy correlations to be calculated directly [7]. In practice though, IPD may only be available for a proportion of studies, and so methods for multivariate metaanalysis are required that combine IPD and aggregate data [27,28]. There has also be little consideration of how to assess dissemination bias using a multivariate metaanalysis framework [29,30], and this warrants attention as metaanalysis datasets are often fraught with such issues as publication bias [31] and withinstudy selective reporting [32,33]. In such scenarios some of the missing endpoints may not be missing at random, and so sensitivity analyses to assess how the metaanalysis results change under a variety of missing data assumptions would be potentially valuable [34].
Conclusion
In this paper we have used analytic reasoning, two applied examples and a realistic simulation study to highlight the benefits of a normal model for BRMA over two separate URMAs, and explain why the betweenstudy correlation is often estimated as +1 or 1. For metaanalysis of proportions, we also extended our work to a generalised model for BRMA, to ensure the binary data is modelled correctly. Our work adds to a growing body of literature indicating the rationale and benefits of a multivariate approach to metaanalysis, and we encourage metaanalysts to consider the approach in practice.
Abbreviations
BRMA – Bivariate randomeffects metaanalysis
IPD – Individual patient data
URMA – Univariate randomeffects metaanalysis
REML – Restricted maximum likelihood
Competing interests
The author(s) declare that they have no competing interests.
Authors' contributions
RDR, KRA, AJS and PCL developed the study. RDR performed analysis of the telomerase and CD4 data. RDR, PL and JRT performed the normal model simulations; RDR performed the generalized model simulations. All authors examined the simulation results and discussed their interpretations. RDR wrote the paper with contributions from all authors.
Acknowledgements
Richard Riley is funded by the Department of Health National Coordinating Centre for Research Capacity Development as a Postdoctoral Research Scientist in Evidence Synthesis. We would like to thank the three reviewers of our manuscript, whose comments have substantially improved the content of this paper. We also greatly thank Haitao Chu, who kindly provided SAS simulation code that enabled us to assess the generalised bivariate model.
References

Berkey CS, Hoaglin DC, AntczakBouckoms A, Mosteller F, Colditz GA: Metaanalysis of multiple outcomes by regression with random effects.
Stat Med 1998, 17(22):25372550. PubMed Abstract  Publisher Full Text

Hasselblad V: Metaanalysis of multitreatment studies.
Med Decis Making 1998, 18(1):3743. PubMed Abstract  Publisher Full Text

Becker BJ, Tinsley HEA, Brown S: . In Multivariate Metaanalysis. San Diego , Academic Press; 2000.

Van Houwelingen HC, Arends LR, Stijnen T: Advanced methods in metaanalysis: multivariate approach and metaregression.
Stat Med 2002, 21(4):589624. PubMed Abstract  Publisher Full Text

Reitsma JB, Glas AS, Rutjes AW, Scholten RJ, Bossuyt PM, Zwinderman AH: Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews.
J Clin Epidemiol 2005, 58(10):982990. PubMed Abstract  Publisher Full Text

Thompson JR, Minelli C, Abrams KR, Tobin MD, Riley RD: Metaanalysis of genetic studies using Mendelian randomizationa multivariate approach.
Stat Med 2005, 24(14):22412254. PubMed Abstract  Publisher Full Text

Daniels MJ, Hughes MD: Metaanalysis for the evaluation of potential surrogate markers.
Stat Med 1997, 16(17):19651982. PubMed Abstract  Publisher Full Text

Gail MH, Pfeiffer R, Van Houwelingen HC, Carroll RJ: On metaanalytic assessment of surrogate outcomes.
Biostatistics 2000, 1(3):231246. PubMed Abstract  Publisher Full Text

Kalaian HA, Raudenbush SW: A Multivariate Mixed Linear Model for MetaAnalysis.
Psychological Methods 1996, 1(3):227235. Publisher Full Text

Riley RD, Abrams KR, Lambert PC, Sutton AJ, Thompson JR: An evaluation of bivariate randomeffects metaanalysis for the joint synthesis of two correlated outcomes.
Statistics in Medicine 2007, 26:7897. PubMed Abstract  Publisher Full Text

Arends LR, Voko Z, Stijnen T: Combining multiple outcome measures in a metaanalysis: an application.
Stat Med 2003, 22(8):13351353. PubMed Abstract  Publisher Full Text

Chu H, Cole SR: Bivariate metaanalysis for sensitivity and specificity with sparse data: a generalized linear mixed model approach (letter to the Editor).

Glas AS, Roos D, Deutekom M, Zwinderman AH, Bossuyt PM, Kurth KH: Tumor markers in the diagnosis of primary bladder cancer. A systematic review.
J Urol 2003, 169(6):19751982. PubMed Abstract  Publisher Full Text

Hardy RJ, Thompson SG: A likelihood approach to metaanalysis with random effects.
Stat Med 1996, 15(6):619629. PubMed Abstract  Publisher Full Text

Nam IS, Mengersen K, Garthwaite P: Multivariate metaanalysis.
Stat Med 2003, 22(14):23092333. PubMed Abstract  Publisher Full Text

Berrington A, Cox DR: Generalized least squares for the synthesis of correlated information.
Biostatistics 2003, 4(3):423431. PubMed Abstract  Publisher Full Text

Raudenbush SW, Becker BJ, Kalaian H: Modeling multivariate effect sizes.
Psychological Bulletin 1988, 103(1):111120. Publisher Full Text

Gentle JE, Gentle JE: Cholesky Factorization. In Numerical Linear Algebra for Applications in Statistics. Berlin , SpringerVerlag; 1998:9395.

Sohn SY: Multivariate metaanalysis with potentially correlated marketing study results.
Naval Research Logistics 2000, 47:500510. Publisher Full Text

Follmann DA, Proschan MA: Valid inference in random effects metaanalysis.
Biometrics 1999, 55(3):732737. PubMed Abstract  Publisher Full Text

Harbord RM, Deeks JJ, Egger M, Whiting P, Sterne JA: A unification of models for metaanalysis of diagnostic accuracy studies.
Biostatistics 2006. PubMed Abstract  Publisher Full Text

Becker BJ, Hedges LV, Pigott TD: Campbell Collaboration Statistical Analysis Policy Brief.
A Campbell Collaboration resource document (available at http://wwwcampbellcollaborationorg/ECG/policy_statasp) 2004.

Westwood ME, Whiting PF, Kleijnen J: How does study quality affect the results of a diagnostic metaanalysis?
BMC Med Res Methodol 2005, 5(1):20. PubMed Abstract  BioMed Central Full Text  PubMed Central Full Text

Sidik K, Jonkman JN: A comparison of heterogeneity variance estimators in combining results of studies.

Abrams KR, Sutton AJ, Cooper NJ, Sculpher MPS, Ginlley L, Robinson M: Populating economic decision models using metaanalyses of heterogenously reported studies augmented with expert beliefs.
Technical Report 0506, Centre for Biostatistics and Genetic Epidemiology, University of Leicester 2005.

Korn EL, Albert PS, McShane LM: Assessing surrogates as trial endpoints using mixed models.
Stat Med 2005, 24(2):163182. PubMed Abstract  Publisher Full Text

Riley RD, Look MP, Simmonds MC: Combining individual patient data and aggregate data in evidence synthesis: a systematic review identified current practice and possible methods.

Goldstein H, Yang M, Omar RZ, Turner RM, Thompson SG: Metaanalysis using multilevel models with an application to the study of class size effects.

Riley RD, Sutton AJ, Abrams KR, Lambert PC: Sensitivity analyses allowed more appropriate and reliable metaanalysis conclusions for multiple outcomes when missing data was present.
Journal of Clinical Epidemiology 2004, 57(9):911924. Publisher Full Text

Jackson D, Copas J, Sutton AJ: Modelling reporting bias: the operative mortality rate for ruptured abdominal aortic aneurysm repair .
Journal of the Royal Statistical Society, Series A 2005, 168:737752.

Sterne JA, Egger M, Smith GD: Systematic reviews in health care: Investigating and dealing with publication and other biases in metaanalysis.
BMJ 2001, 323(7304):101105. PubMed Abstract  Publisher Full Text  PubMed Central Full Text

Hahn S, Williamson PR, Hutton JL, Garner P, Flynn EV: Assessing the potential for bias in metaanalysis due to selective reporting of subgroup analyses within studies.
Stat Med 2000, 19(24):33253336. PubMed Abstract  Publisher Full Text

Hutton JL, Williamson PR: Bias in metaanalysis due to outcome variable selection within studies.

Copas J, Shi JQ: Metaanalysis, funnel plots and sensitivity analysis.
Biostatistics 2000, 1(3):247262. PubMed Abstract  Publisher Full Text
Prepublication history
The prepublication history for this paper can be accessed here: