Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada

McMaster University Evidence-based Practice Center, Hamilton, Ontario, Canada

Biostatistics Unit/FSORC 3rd Floor Martha, Room H325, St. Joseph's Healthcare Hamilton, 50 Charlton Avenue East, Hamilton, Ontario, L8N 4A6, Canada

Centre for Evaluation of Medicines, St Joseph’s Healthcare, Hamilton, Ontario, Canada

Population Health Research Institute, Hamilton Health Sciences, Hamilton, Ontario, Canada

Abstract

Background

The objective of this simulation study is to compare the accuracy and efficiency of population-averaged (i.e. generalized estimating equations (GEE)) and cluster-specific (i.e. random-effects logistic regression (RELR)) models for analyzing data from cluster randomized trials (CRTs) with missing binary responses.

Methods

In this simulation study, clustered responses were generated from a beta-binomial distribution. The number of clusters per trial arm, the number of subjects per cluster, intra-cluster correlation coefficient, and the percentage of missing data were allowed to vary. Under the assumption of covariate dependent missingness, missing outcomes were handled by complete case analysis, standard multiple imputation (MI) and within-cluster MI strategies. Data were analyzed using GEE and RELR. Performance of the methods was assessed using standardized bias, empirical standard error, root mean squared error (RMSE), and coverage probability.

Results

GEE performs well on all four measures — provided the downward bias of the standard error (when the number of clusters per arm is small) is adjusted appropriately — under the following scenarios: complete case analysis for CRTs with a small amount of missing data; standard MI for CRTs with variance inflation factor (VIF) <3; within-cluster MI for CRTs with VIF≥3 and cluster size>50. RELR performs well only when a small amount of data was missing, and complete case analysis was applied.

Conclusion

GEE performs well as long as appropriate missing data strategies are adopted based on the design of CRTs and the percentage of missing data. In contrast, RELR does not perform well when either standard or within-cluster MI strategy is applied prior to the analysis.

Background

Cluster randomized trials (CRTs) are randomized controlled trials in which clusters of subjects, rather than independent subjects, are randomly allocated to trial arms and outcomes are measured for individual subjects or clusters. CRTs are increasingly being used in health services research and primary care. Reasons for adopting cluster randomization include: 1) administrative convenience; 2) ethical considerations; 3) the intervention is naturally applied at the cluster level; 4) enhancement of subject compliance; and 5) minimization of potential treatment “contamination” between the intervention and control subjects.

The total variance $\sigma^{2}$ in a CRT can be expressed as the sum of the between-cluster variance $\sigma_{B}^{2}$ and the within-cluster variance $\sigma_{W}^{2}$. Correspondingly, the intracluster correlation coefficient (ICC) is defined as $\sigma_{B}^{2}/(\sigma_{B}^{2} + \sigma_{W}^{2})$, which is interpreted as the proportion of the total variation that can be explained by variation between clusters. The reduction in efficiency is a function of the variance inflation due to clustering, also known as the design effect or variance inflation factor (VIF), given by $1 + (n - 1)\rho$, where $n$ is the number of subjects per cluster and $\rho$ is the ICC.
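As a quick numerical check of the design effect, the sketch below computes the VIF as 1 + (n − 1)ρ for the cluster sizes and ICC values used in this simulation study, assuming equal cluster sizes. The function names are illustrative and are not taken from the original analysis code.

```python
def icc_from_variances(between_var: float, within_var: float) -> float:
    """ICC as the share of total variance that lies between clusters."""
    return between_var / (between_var + within_var)


def variance_inflation_factor(cluster_size: int, icc: float) -> float:
    """Design effect for equal-sized clusters: 1 + (n - 1) * rho."""
    return 1.0 + (cluster_size - 1) * icc


# (subjects per cluster, ICC) pairs taken from the simulated designs.
scenarios = [
    (500, 0.001), (500, 0.01), (500, 0.05),
    (50, 0.01), (50, 0.05), (50, 0.1),
    (30, 0.05), (30, 0.1), (30, 0.2),
]

for cluster_size, icc in scenarios:
    vif = variance_inflation_factor(cluster_size, icc)
    print(f"n={cluster_size:>3}  rho={icc:<5}  VIF={vif:.3f}")
```

The printed values can be compared against the VIF column of the results tables; a VIF below 3 corresponds to what the paper calls a small design effect.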

Missing data may be a serious problem in some CRTs due to the lack of direct contact with individual subjects and lengthy follow-up

A key property of CRTs is that inferences are frequently intended to apply at the individual level while randomization is at the cluster level; thus the unit of randomization may be different from the unit of inference or analysis. In this case, the lack of independence among individuals in the same cluster, i.e. the between-cluster variation, presents special methodological challenges that affect both the design and analysis of CRTs. Consequently, standard approaches for statistical analysis do not apply because they may result in severely underpowered studies and spuriously elevated Type I error rates.

Some attention has been paid in the literature to the performance of GEE approach and RELR in the analysis of binary outcomes in CRTs. Austin

Methods

The rest of this section is organized as follows: First, the statistical analysis methods (i.e. GEE and RELR) used to analyze binary outcomes in CRTs are described. Second, the missing data strategies used in this study for handling missing binary outcomes are briefly introduced. Third, the method for combining the results across multiply imputed datasets is described.

Statistical analysis methods

Generalized estimating equations

The GEE approach for fitting logistic regression, developed by Liang and Zeger, can be formulated as

$$\text{logit}\left(\Pr(Y_{ijl} = 1)\right) = X_{ijl}\beta_{marginal}$$

where $Y_{ijl}$ denotes the binary outcome of patient $l$ in cluster $j$ of trial arm $i$, $\Pr(Y_{ijl} = 1)$ denotes the corresponding probability of success, and $X_{ijl}$ denotes the corresponding vector of individual-level or cluster-level covariates. $\beta_{marginal}$ denotes the marginal regression coefficients.

To analyze the data from CRTs, an exchangeable correlation matrix is usually specified to account for potential within-cluster homogeneity in outcomes, and the robust standard error method is used to obtain the improved standard error for estimation of $\beta_{marginal}$. In this paper, we only include one covariate, treatment group, in the model fitting.

It has been recommended that at least 40 clusters need to be included in a study to ensure the GEE method produces reliable standard errors

Random-effects logistic regression

RELR incorporates cluster-specific random effects into the logistic regression and assumes that the random effects follow a normal distribution. The model can be formulated as

$$\text{logit}\left(\Pr(Y_{ijl} = 1 \mid U_{ij})\right) = X_{ijl}\beta_{conditional} + U_{ij}$$

where $U_{ij} \sim N(0, \sigma_{B}^{2})$ represent the random effects, which vary independently from one cluster to another according to a common normal distribution with a mean of zero and a variance of $\sigma_{B}^{2}$, the between-cluster variance. $\beta_{conditional}$ denotes the conditional regression coefficients. Model parameters can be estimated using maximum likelihood.

Both GEE and RELR are commonly used statistical analysis methods for analyzing binary outcomes in CRTs. $\beta_{marginal}$ = $\beta_{conditional}$(1 − …

Missing data strategies

In this paper, we consider three strategies to handle missing binary outcomes in CRTs: 1) complete case analysis; 2) standard MI using logistic regression; and 3) within-cluster MI using logistic regression. The performance of GEE method and RELR is compared after missing data are handled by the above strategies.

Complete case analysis has been an attractive method to handle the missing data due to its simplicity. In adopting this strategy, only subjects with complete data are included for analysis, while subjects with missing data are excluded.

MI is widely applied to missing data problems. Rubin

The standard MI method using logistic regression is now described in detail. The within-cluster MI strategy consists of applying the standard MI method to impute missing data for each cluster independently.

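Standard multiple imputation using logistic regression is implemented through the following steps:

First, fit a logistic regression using the observed outcome and covariates to obtain the posterior predictive distribution of the parameters:

$$\text{logit}\left(\Pr(Y_{obs} = 1)\right) = \beta_{0} + \beta_{1}X_{1} + \ldots + \beta_{k}X_{k}$$

where $Y_{obs}$ is the observed binary outcome of a subject, $X_{i}$, $i = 1, \ldots, k$, is the $i^{th}$ individual or cluster-level covariate of the corresponding subject (two covariates are included in this study: treatment group and the variable associated with the missingness), and $\beta = (\beta_{0}, \beta_{1}, \ldots, \beta_{k})$ denotes the regression coefficients. The regression parameter estimates $\hat{\beta}$ and the associated covariance matrix $\hat{V}$ are obtained from this fit.

Second, draw new parameters $\beta^{*} = \hat{\beta} + V_{h}^{\prime}Z$, where $V_{h}$ is the upper triangular matrix in the Cholesky decomposition $\hat{V} = V_{h}^{\prime}V_{h}$, and $Z$ is a vector of $k + 1$ independent standard normal variates.

Third, for each subject with a missing outcome $Y_{mis}$ and observed covariates $X_{1}, \ldots, X_{k}$, compute the predicted probability $p = \exp(\mu)/(1 + \exp(\mu))$, with $\mu = \beta_{0}^{*} + \beta_{1}^{*}X_{1} + \ldots + \beta_{k}^{*}X_{k}$, that $Y_{mis} = 1$.

Fourth, draw a random Uniform variate $u$ between 0 and 1. If $u < p$, impute $Y_{mis} = 1$; otherwise, impute $Y_{mis} = 0$.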

The above steps imply two assumptions: first, subjects are independent, which essentially ignores the similarity of subjects from the same cluster and; second, the missing data are imputed based on the PA treatment effect.
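The second to fourth steps above can be sketched in Python as follows. This is a minimal illustration and not the software used in the study: the coefficient estimates and their covariance matrix are assumed to be available already from the logistic regression fit in the first step, all numerical values are hypothetical, and the function names are illustrative.

```python
import math
import random


def cholesky_lower(matrix):
    """Lower-triangular factor L of a symmetric positive definite matrix (L L^T = matrix)."""
    size = len(matrix)
    lower = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def draw_coefficients(beta_hat, covariance, rng):
    """Step 2: beta* = beta_hat + L z, with z a vector of standard normal draws."""
    lower = cholesky_lower(covariance)
    z = [rng.gauss(0.0, 1.0) for _ in beta_hat]
    return [
        beta + sum(lower[i][k] * z[k] for k in range(i + 1))
        for i, beta in enumerate(beta_hat)
    ]


def impute_outcomes(beta_star, covariate_rows, rng):
    """Steps 3 and 4: predicted probability, then a Bernoulli draw via a uniform variate."""
    imputed = []
    for row in covariate_rows:
        linear = beta_star[0] + sum(b * x for b, x in zip(beta_star[1:], row))
        probability = 1.0 / (1.0 + math.exp(-linear))
        imputed.append(1 if rng.random() < probability else 0)
    return imputed


# Hypothetical fit: intercept, treatment group, covariate linked to missingness.
rng = random.Random(2013)
beta_hat = [-0.5, 0.4, 0.3]
covariance = [
    [0.04, -0.02, -0.01],
    [-0.02, 0.05, 0.00],
    [-0.01, 0.00, 0.03],
]
missing_rows = [(1, 0), (0, 1), (1, 1)]  # covariates of subjects with missing outcomes

beta_star = draw_coefficients(beta_hat, covariance, rng)
print(impute_outcomes(beta_star, missing_rows, rng))
```

Repeating the draw of the coefficients and the imputation M times gives M completed datasets. For within-cluster MI, the same procedure would be applied separately to the subjects of each cluster.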

Combination of results from different imputed data sets

Suppose treatment effect estimates $\hat{\beta}^{(1)}, \hat{\beta}^{(2)}, \ldots, \hat{\beta}^{(M)}$ with corresponding variance estimates $W^{(1)}, W^{(2)}, \ldots, W^{(M)}$ are obtained after GEE or RELR is applied to the $M$ multiply imputed datasets. The pooled treatment effect estimate from MI is calculated as $\bar{\beta} = \frac{1}{M}\sum_{i=1}^{M}\hat{\beta}^{(i)}$. … $v_{com}$ is the degrees of freedom for the complete data test; for example, if there are … $v_{com} = 2($…

Simulation study

The schematic overview of the simulation study is illustrated in Figure

**Schematic overview of the simulation study.** Abbreviations: MI, multiple imputation; GEE, generalized estimating equations; RELR, random-effects logistic regression; SB, standardized bias; ESE, empirical standard error; RMSE, root mean square error; obs., observations.

According to the review of CRTs in primary care by Eldridge

(1) For CRTs with 5 clusters per arm (S-design) and 500 subjects per cluster, ICC was set to be 0.001, 0.01 or 0.05.

(2) For CRTs with 20 clusters per arm (L-design) and 50 subjects per cluster, ICC was set to be 0.01, 0.05, or 0.1.

(3) For CRTs with 30 clusters per arm (L-design) and 30 subjects per cluster, ICC was set to be 0.05, 0.1, or 0.2.

Only two-arm, balanced, and completely randomized CRTs are considered in this study. The clustered binomial responses are generated using a beta-binomial distribution
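One way to generate clustered binary responses from a beta-binomial distribution is sketched below. Each cluster draws its own success probability from a beta distribution whose parameters are chosen so that the mean equals the arm-level probability and the ICC equals 1/(α + β + 1); subjects within the cluster are then independent Bernoulli draws. The success probabilities 0.3 and 0.4 are hypothetical placeholders, and the function names are illustrative.

```python
import random


def beta_parameters(mean_probability, icc):
    """Beta parameters with mean alpha/(alpha+beta) and ICC 1/(alpha+beta+1)."""
    scale = (1.0 - icc) / icc
    return mean_probability * scale, (1.0 - mean_probability) * scale


def simulate_arm(n_clusters, cluster_size, mean_probability, icc, rng):
    """Return one list of 0/1 outcomes per cluster for a single trial arm."""
    alpha, beta = beta_parameters(mean_probability, icc)
    clusters = []
    for _ in range(n_clusters):
        cluster_probability = rng.betavariate(alpha, beta)
        outcomes = [1 if rng.random() < cluster_probability else 0
                    for _ in range(cluster_size)]
        clusters.append(outcomes)
    return clusters


# Design with 20 clusters per arm, 50 subjects per cluster, ICC = 0.05.
rng = random.Random(42)
control = simulate_arm(20, 50, mean_probability=0.3, icc=0.05, rng=rng)       # hypothetical rate
intervention = simulate_arm(20, 50, mean_probability=0.4, icc=0.05, rng=rng)  # hypothetical rate

for name, arm in (("control", control), ("intervention", intervention)):
    successes = sum(sum(cluster) for cluster in arm)
    print(name, successes / (len(arm) * len(arm[0])))
```

Because the treatment effect enters through the arm-level mean probability, data generated this way carry a population-averaged treatment effect.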

Four quantities are chosen to evaluate the performance of GEE method and RELR: 1) standardized bias calculated as
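The four performance measures can be computed from the simulation replications as in the following sketch. It follows the definitions given with the results tables: standardized bias as the bias divided by the standard deviation of the estimates, empirical standard error as the average of the estimated standard errors, RMSE as the square root of the mean squared error, and coverage as the proportion of nominal 95% confidence intervals containing the true effect. The normal-approximation interval (estimate ± 1.96 × standard error) is an assumption made here for simplicity, and all names and values are illustrative.

```python
import math
from statistics import mean, stdev


def performance_measures(estimates, standard_errors, true_effect, z=1.96):
    """Summarize estimates and standard errors across simulation replications."""
    bias = mean(estimates) - true_effect
    standardized_bias = bias / stdev(estimates)
    empirical_se = mean(standard_errors)
    rmse = math.sqrt(mean([(b - true_effect) ** 2 for b in estimates]))
    # Assumed normal-approximation interval: estimate +/- z * standard error.
    covered = [abs(b - true_effect) <= z * se
               for b, se in zip(estimates, standard_errors)]
    return {
        "standardized_bias": standardized_bias,
        "empirical_se": empirical_se,
        "rmse": rmse,
        "coverage": sum(covered) / len(covered),
    }


# Hypothetical results from four replications with a true effect of 1.0.
summary = performance_measures(
    estimates=[0.9, 1.1, 1.0, 1.2],
    standard_errors=[0.1, 0.1, 0.1, 0.1],
    true_effect=1.0,
)
print(summary)
```

In the study itself each scenario is summarized over 1000 simulated datasets rather than the four replications used here.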

Results

Empirical standard error

The empirical standard errors from GEE method and RELR for different design scenarios are presented in Table

**Comparison of empirical standard error.**

| m^{1} | n^{2} | ρ^{3} | VIF^{4} | % of missing data | Complete case analysis: GEE^{7} | Complete case analysis: RELR^{8} | Standard MI^{5}: GEE | Standard MI^{5}: RELR | Within-cluster MI^{6}: GEE | Within-cluster MI^{6}: RELR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 5^{9} (S-Design) | 500 | 0.001 | 1.499 | 0% | 0.07 | 0.10 | | | | |
| | | | | 15% | 0.08 | 0.11 | 0.08 | 0.07 | 0.08 | 0.08 |
| | | | | 30% | 0.08 | 0.12 | 0.08 | 0.08 | 0.10 | 0.09 |
| | | 0.01 | 5.99 | 0% | 0.15 | 0.12 | | | | |
| | | | | 15% | 0.15 | 0.13 | 0.13 | 0.12 | 0.16 | 0.14 |
| | | | | 30% | 0.15 | 0.15 | 0.12 | 0.11 | 0.16 | 0.15 |
| | | 0.05 | 25.95 | 0% | 0.30 | 0.15 | | | | |
| | | | | 15% | 0.30 | 0.16 | 0.26 | 0.24 | 0.30 | 0.28 |
| | | | | 30% | 0.30 | 0.16 | 0.22 | 0.20 | 0.30 | 0.29 |
| 20 (L-Design) | 50 | 0.01 | 1.49 | 0% | 0.11 | 0.17 | | | | |
| | | | | 15% | 0.11 | 0.17 | 0.12 | 0.12 | 0.13 | 0.13 |
| | | | | 30% | 0.12 | 0.19 | 0.12 | 0.13 | 0.15 | 0.16 |
| | | 0.05 | 3.45 | 0% | 0.17 | 0.31 | | | | |
| | | | | 15% | 0.17 | 0.34 | 0.16 | 0.16 | 0.18 | 0.19 |
| | | | | 30% | 0.18 | 0.39 | 0.15 | 0.16 | 0.20 | 0.21 |
| | | 0.1 | 5.90 | 0% | 0.22 | 0.18 | | | | |
| | | | | 15% | 0.23 | 0.21 | 0.20 | 0.22 | 0.23 | 0.26 |
| | | | | 30% | 0.23 | 0.22 | 0.18 | 0.19 | NA | NA |
| 30 (L-Design) | 30 | 0.05 | 2.45 | 0% | 0.15 | 0.28 | | | | |
| | | | | 15% | 0.16 | 0.33 | 0.15 | 0.15 | 0.17 | 0.18 |
| | | | | 30% | 0.17 | 0.37 | 0.15 | 0.15 | NA | NA |
| | | 0.1 | 3.90 | 0% | 0.19 | 0.33 | | | | |
| | | | | 15% | 0.20 | 0.38 | 0.18 | 0.19 | NA | NA |
| | | | | 30% | 0.20 | 0.42 | 0.17 | 0.18 | NA | NA |
| | | 0.2 | 6.80 | 0% | 0.26 | 0.38 | | | | |
| | | | | 15% | 0.26 | 0.40 | 0.23 | 0.27 | NA | NA |
| | | | | 30% | 0.26 | 0.44 | 0.21 | 0.23 | NA | NA |

Empirical standard error is defined as the average of standard errors of the estimated treatment effects across all simulation replications. The empirical standard errors obtained when 0% data are missing are considered as references for comparing with those obtained when 15% or 30% data are missing.

Note: 1. m: Number of clusters per trial arm. 2. n: Number of subjects per cluster. 3. ρ: Intracluster correlation coefficient. 4. VIF: Variance inflation factor, i.e. 1+(n−1)ρ. 5. Standard MI: Standard multiple imputation using logistic regression method. 6. Within-cluster MI: Within-cluster multiple imputation using logistic regression method, which is not applicable (NA) for some L-design of cluster randomized trials. 7. GEE: Generalized estimating equations. 8. RELR: Random-effects logistic regression. 9. For CRTs with 5 clusters per arm, modified standard errors are provided.

When standard MI was used to impute missing data, empirical standard errors from the GEE method were acceptable for CRTs with VIF<3, in that they were similar to or slightly larger than those obtained from analyzing the complete data. However, they were underestimated for CRTs with VIF≥3. This is because the standard MI strategy assumes that data are independent, and the cluster effect may be safely ignored for CRTs with VIF<3 when imputing missing data. In contrast, empirical standard errors from RELR were not similar to those obtained from analyzing complete data. This is because the imputed datasets were obtained based on the estimated PA treatment effect and the corresponding underestimated standard error, which led to a difference between the standard error estimated from RELR based on the imputed datasets and that based on the complete data.

Within-cluster MI was not applicable for some L-design CRTs, which usually had a small cluster size, because either all outcomes in a cluster were missing or all observed outcomes had identical values, which caused the imputation procedure to fail. When within-cluster MI was applicable and used to impute the missing data, empirical standard errors from the GEE method were acceptable for CRTs with VIF≥3; however, for CRTs with VIF<3, empirical standard errors were inflated. This is because within-cluster MI accounts for the clustering effects by imputing missing data based on the observed information within the same cluster as the missing data. The empirical standard errors from RELR were acceptable only when the cluster size was large (>50) and the ICC was small (≤0.01).

Standardized bias

The standardized biases from GEE method and RELR for different design scenarios are presented in Table

**Comparison of standardized bias.**

| m^{1} | n^{2} | ρ^{3} | VIF^{4} | % of missing data | Complete case analysis: GEE^{7} | Complete case analysis: RELR^{8} | Standard MI^{5}: GEE | Standard MI^{5}: RELR | Within-cluster MI^{6}: GEE | Within-cluster MI^{6}: RELR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 5^{9} (S-Design) | 500 | 0.001 | 1.499 | 0% | 0.02 | 0.73 | | | | |
| | | | | 15% | 0.03 | 0.71 | 0.02 | 0.17 | 0.03 | 0.15 |
| | | | | 30% | 0.01 | 0.63 | 0.00 | 0.18 | 0.00 | 0.08 |
| | | 0.01 | 5.99 | 0% | 0.01 | 0.34 | | | | |
| | | | | 15% | 0.00 | 0.33 | 0.00 | 0.02 | 0.00 | 0.03 |
| | | | | 30% | 0.00 | 0.32 | 0.00 | 0.01 | 0.00 | 0.03 |
| | | 0.05 | 25.95 | 0% | 0.02 | 0.15 | | | | |
| | | | | 15% | 0.02 | 0.15 | 0.02 | 0.08 | 0.03 | 0.10 |
| | | | | 30% | 0.02 | 0.14 | 0.01 | 0.05 | 0.02 | 0.09 |
| 20 (L-Design) | 50 | 0.01 | 1.49 | 0% | 0.04 | 0.38 | | | | |
| | | | | 15% | 0.04 | 0.37 | 0.03 | 0.04 | 0.06 | 0.01 |
| | | | | 30% | 0.04 | 0.36 | 0.03 | 0.05 | 0.08 | 0.01 |
| | | 0.05 | 3.45 | 0% | 0.01 | 0.26 | | | | |
| | | | | 15% | 0.00 | 0.24 | 0.00 | 0.09 | 0.03 | 0.11 |
| | | | | 30% | 0.02 | 0.13 | 0.01 | 0.06 | 0.03 | 0.12 |
| | | 0.1 | 5.90 | 0% | 0.02 | 0.20 | | | | |
| | | | | 15% | 0.01 | 0.19 | 0.01 | 0.15 | 0.05 | 0.16 |
| | | | | 30% | 0.01 | 0.19 | 0.01 | 0.10 | NA | NA |
| 30 (L-Design) | 30 | 0.05 | 2.45 | 0% | 0.02 | 0.33 | | | | |
| | | | | 15% | 0.02 | 0.32 | 0.02 | 0.12 | 0.02 | 0.15 |
| | | | | 30% | 0.01 | 0.14 | 0.00 | 0.06 | NA | NA |
| | | 0.1 | 3.90 | 0% | 0.01 | 0.23 | | | | |
| | | | | 15% | 0.01 | 0.23 | 0.01 | 0.18 | NA | NA |
| | | | | 30% | 0.02 | 0.23 | 0.02 | 0.13 | NA | NA |
| | | 0.2 | 6.80 | 0% | 0.01 | 0.16 | | | | |
| | | | | 15% | 0.00 | 0.15 | 0.00 | 0.14 | NA | NA |
| | | | | 30% | 0.01 | 0.15 | 0.00 | 0.16 | NA | NA |

Standardized bias is defined as the difference between the expectation of the estimator and the parameter, divided by the standard deviation of the estimator. Standardized biases obtained when 0% data are missing are considered as references for comparing with those obtained when 15% or 30% data are missing.

Note: 1. m: Number of clusters per trial arm. 2. n: Number of subjects per cluster. 3. ρ: Intracluster correlation coefficient. 4. VIF: Variance inflation factor, i.e. 1+(n−1)ρ. 5. Standard MI: Standard multiple imputation using logistic regression method. 6. Within-cluster MI: Within-cluster multiple imputation using logistic regression method, which is not applicable (NA) for some L-design of cluster randomized trials. 7. GEE: Generalized estimating equations. 8. RELR: Random-effects logistic regression. 9. For CRTs with 5 clusters per arm, modified standard errors are provided.

The magnitude of standardized bias depended on the original data structure (i.e. how the data were generated), how the missing data were handled, and which statistical model was used for analysis. As described in the previous section, the clustered binary data were generated using a beta-binomial distribution, which assumed a PA treatment effect. Since complete case analysis did not change the original data structure under the assumption of covariate dependent missingness (CDM), the PA and CS treatment effects estimated from the GEE method and RELR were quite consistent with those estimated based on complete data (i.e. datasets without missing values), and the relationship between the PA and the CS treatment effects still held. However, when either standard MI or within-cluster MI was used, the imputed values were obtained based on the estimated PA treatment effect and the corresponding underestimated standard error, which largely distorted the CS treatment effects estimated from RELR compared with those estimated based on complete data.

Root mean squared error

The RMSE incorporates both the variance of the estimator and its bias, and measures the overall accuracy of the point estimator. RMSEs from GEE method and RELR for different design scenarios are presented in Table

**Comparison of root mean squared error.**

| m^{1} | n^{2} | ρ^{3} | VIF^{4} | % of missing data | Complete case analysis: GEE^{7} | Complete case analysis: RELR^{8} | Standard MI^{5}: GEE | Standard MI^{5}: RELR | Within-cluster MI^{6}: GEE | Within-cluster MI^{6}: RELR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 5^{9} (S-Design) | 500 | 0.001 | 1.499 | 0% | 0.07 | 0.10 | | | | |
| | | | | 15% | 0.08 | 0.10 | 0.08 | 0.06 | 0.08 | 0.06 |
| | | | | 30% | 0.08 | 0.11 | 0.08 | 0.07 | 0.09 | 0.08 |
| | | 0.01 | 5.99 | 0% | 0.14 | 0.17 | | | | |
| | | | | 15% | 0.14 | 0.17 | 0.15 | 0.15 | 0.15 | 0.15 |
| | | | | 30% | 0.15 | 0.17 | 0.15 | 0.15 | 0.15 | 0.15 |
| | | 0.05 | 25.95 | 0% | 0.31 | 0.34 | | | | |
| | | | | 15% | 0.31 | 0.34 | 0.31 | 0.32 | 0.31 | 0.33 |
| | | | | 30% | 0.31 | 0.34 | 0.31 | 0.32 | 0.31 | 0.33 |
| 20 (L-Design) | 50 | 0.01 | 1.49 | 0% | 0.11 | 0.13 | | | | |
| | | | | 15% | 0.11 | 0.13 | 0.12 | 0.12 | 0.12 | 0.12 |
| | | | | 30% | 0.12 | 0.14 | 0.14 | 0.12 | 0.13 | 0.13 |
| | | 0.05 | 3.45 | 0% | 0.18 | 0.20 | | | | |
| | | | | 15% | 0.18 | 0.21 | 0.18 | 0.19 | 0.18 | 0.19 |
| | | | | 30% | 0.19 | 0.20 | 0.19 | 0.20 | 0.19 | 0.20 |
| | | 0.1 | 5.90 | 0% | 0.24 | 0.26 | | | | |
| | | | | 15% | 0.24 | 0.27 | 0.24 | 0.26 | 0.24 | 0.27 |
| | | | | 30% | 0.25 | 0.27 | 0.25 | 0.26 | NA | NA |
| 30 (L-Design) | 30 | 0.05 | 2.45 | 0% | 0.15 | 0.17 | | | | |
| | | | | 15% | 0.16 | 0.18 | 0.16 | 0.16 | 0.15 | 0.17 |
| | | | | 30% | 0.16 | 0.17 | 0.16 | 0.17 | NA | NA |
| | | 0.1 | 3.90 | 0% | 0.20 | 0.21 | | | | |
| | | | | 15% | 0.20 | 0.22 | 0.20 | 0.22 | NA | NA |
| | | | | 30% | 0.20 | 0.23 | 0.21 | 0.22 | NA | NA |
| | | 0.2 | 6.80 | 0% | 0.27 | 0.30 | | | | |
| | | | | 15% | 0.27 | 0.30 | 0.28 | 0.33 | NA | NA |
| | | | | 30% | 0.28 | 0.30 | 0.28 | 0.31 | NA | NA |

Root mean squared error is defined as the square root of the mean squared error, which is the average squared difference between the estimated treatment effect and the true parameter. The root mean squared errors obtained when 0% data are missing are considered as references for comparing with those obtained when 15% or 30% data are missing.

Note: 1. m: Number of clusters per trial arm. 2. n: Number of subjects per cluster. 3. ρ: Intracluster correlation coefficient. 4. VIF: Variance inflation factor, i.e. 1+(n−1)ρ. 5. Standard MI: Standard multiple imputation using logistic regression method. 6. Within-cluster MI: Within-cluster multiple imputation using logistic regression method, which is not applicable (NA) for some L-design of cluster randomized trials. 7. GEE: Generalized estimating equations. 8. RELR: Random-effects logistic regression. 9. For CRTs with 5 clusters per arm, modified standard errors are provided.

When standard MI was used to impute missing data, RMSEs from the GEE method increased with the percentage of missing data. With no more than 15% missing data, the increase in RMSEs from GEE compared to those obtained based on complete data was not substantial. When the amount of missing values increased to 30%, RMSEs from the GEE method increased substantially for CRTs with a small design effect (VIF<3). In contrast, RMSEs from RELR were much smaller than those obtained from analyzing complete data for most of the design scenarios. We should note that the small RMSE for RELR here was not an indication of a more accurate or precise estimate of the treatment effect, but rather a result of biased CS treatment effects and the corresponding underestimated standard error.

When within-cluster MI was used to impute missing data, the same pattern for RMSEs from both GEE and RELR was observed as when standard MI was used to impute missing data.

Coverage probability

Table

**Comparison of coverage probability.**

| m^{1} | n^{2} | ρ^{3} | VIF^{4} | % of missing data | Complete case analysis: GEE^{7} | Complete case analysis: RELR^{8} | Standard MI^{5}: GEE | Standard MI^{5}: RELR | Within-cluster MI^{6}: GEE | Within-cluster MI^{6}: RELR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 5^{9} (S-Design) | 500 | 0.001 | 1.499 | 0% | 0.91 | 0.96 | | | | |
| | | | | 15% | 0.92 | 0.97 | 0.93 | 0.97 | 1.00 | 0.99 |
| | | | | 30% | 0.93 | 0.97 | 0.95 | 0.98 | 1.00 | 0.99 |
| | | 0.01 | 5.99 | 0% | 0.92 | 0.79 | | | | |
| | | | | 15% | 0.92 | 0.81 | 0.90 | 0.87 | 0.95 | 0.91 |
| | | | | 30% | 0.94 | 0.84 | 0.88 | 0.84 | 0.98 | 0.93 |
| | | 0.05 | 25.95 | 0% | 0.91 | 0.49 | | | | |
| | | | | 15% | 0.91 | 0.52 | 0.89 | 0.83 | 0.93 | 0.89 |
| | | | | 30% | 0.93 | 0.52 | 0.83 | 0.77 | 0.96 | 0.90 |
| 20 (L-Design) | 50 | 0.01 | 1.49 | 0% | 0.94 | 0.98 | | | | |
| | | | | 15% | 0.94 | 0.98 | 0.93 | 0.95 | 0.96 | 0.97 |
| | | | | 30% | 0.94 | 0.98 | 0.92 | 0.96 | 0.98 | 0.98 |
| | | 0.05 | 3.45 | 0% | 0.93 | 0.91 | | | | |
| | | | | 15% | 0.93 | 0.92 | 0.90 | 0.89 | 0.94 | 0.94 |
| | | | | 30% | 0.93 | 0.93 | 0.87 | 0.88 | 0.95 | 0.96 |
| | | 0.1 | 5.90 | 0% | 0.93 | 0.78 | | | | |
| | | | | 15% | 0.93 | 0.82 | 0.89 | 0.88 | 0.93 | 0.93 |
| | | | | 30% | 0.92 | 0.83 | 0.85 | 0.85 | NA | NA |
| 30 (L-Design) | 30 | 0.05 | 2.45 | 0% | 0.95 | 0.95 | | | | |
| | | | | 15% | 0.96 | 0.96 | 0.93 | 0.93 | 0.97 | 0.96 |
| | | | | 30% | 0.95 | 0.96 | 0.91 | 0.92 | NA | NA |
| | | 0.1 | 3.90 | 0% | 0.95 | 0.91 | | | | |
| | | | | 15% | 0.95 | 0.93 | 0.92 | 0.92 | NA | NA |
| | | | | 30% | 0.95 | 0.94 | 0.89 | 0.90 | NA | NA |
| | | 0.2 | 6.80 | 0% | 0.94 | 0.79 | | | | |
| | | | | 15% | 0.94 | 0.81 | 0.90 | 0.89 | NA | NA |
| | | | | 30% | 0.94 | 0.85 | 0.85 | 0.85 | NA | NA |

Coverage probability is defined as the proportion of times that the nominal 95% confidence interval contains the true treatment effect across all simulation replications. Coverage probabilities obtained when 0% data are missing are considered as references for comparing with those obtained when 15% or 30% data are missing.

Note: 1. m: Number of clusters per trial arm. 2. n: Number of subjects per cluster. 3. ρ: Intracluster correlation coefficient. 4. VIF: Variance inflation factor, i.e. 1+(n−1)ρ. 5. Standard MI: Standard multiple imputation using logistic regression method. 6. Within-cluster MI: Within-cluster multiple imputation using logistic regression method, which is not applicable (NA) for some L-design of cluster randomized trials. 7. GEE: Generalized estimating equations. 8. RELR: Random-effects logistic regression. 9. For CRTs with 5 clusters per arm, modified standard errors are provided.

When standard MI was used to impute missing data, coverage probabilities from the GEE method increased for CRTs with a small design effect but decreased for CRTs with a large design effect. Coverage probabilities from RELR increased for almost all designs of CRTs compared to those obtained by analyzing complete data using the same statistical analysis method. When within-cluster MI was used to impute missing data, the same pattern for the coverage probabilities from both GEE and RELR was observed as when standard MI was used to impute missing data. It should be noted that the higher coverage from RELR when either standard or within-cluster MI strategy was applied prior to the analysis was not an indication of high efficiency, but rather a result of biased CS treatment effects and the corresponding underestimated standard error.

We noticed that the coverage probabilities from GEE were larger than the nominal level when within-cluster MI is applied prior to the analysis for CRTs with a small design effect and a large percentage of missing data. This is because within-cluster MI tends to provide larger standard errors of the estimated treatment effects (i.e. wider 95% confidence interval).

Convergence problems

For the GEE method, at most 1 out of 1000 simulated datasets with S-design could not converge to a solution because they either encountered a non-positive definite matrix in the iterations or because there was no variation between the clusters in each arm. No convergence problems occurred for the simulated datasets based on the L-design. Lack of convergence was encountered more often for RELR than GEE. About 10 out of 1000 simulated datasets for some designs of CRTs could not converge for RELR due to negative estimates of between-cluster variance component during iteration.

Discussion

In this paper, we compared the accuracy and efficiency of PA and CS models through a simulation study, in particular, the GEE method and the RELR respectively, for analyzing binary outcomes in CRTs with missing data. Results from the present simulation study show that under the assumption of CDM, the GEE method performs well as long as an appropriate strategy is applied to handle missing data based on the percentage of missing data and the design of CRTs. The appropriate strategy in this instance is using complete case analysis for any CRTs with a small percentage of missing outcomes (<15%), using standard MI to impute missing outcomes for CRTs with a small design effect (VIF<3), or within-cluster MI to impute missing outcomes for CRTs with a large design effect (VIF≥3) and cluster size (>50). In contrast, the RELR performs poorly when either standard or within-cluster MI strategy is used to impute missing data prior to the analysis.
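The recommendations in the preceding paragraph can be summarized as a small decision helper. This is only a sketch of how the stated thresholds might be encoded: the order in which the rules are checked is an assumption, equal cluster sizes are assumed when computing the VIF, the function name is illustrative, and the function returns `None` for settings in which the study does not identify a well-performing strategy.

```python
def suggested_strategy(percent_missing, cluster_size, icc):
    """Map a CRT design and missing-data percentage to a strategy for use with GEE."""
    vif = 1.0 + (cluster_size - 1) * icc  # design effect for equal-sized clusters
    if percent_missing < 15:
        return "complete case analysis"
    if vif < 3:
        return "standard MI"
    if cluster_size > 50:
        return "within-cluster MI"
    return None  # large design effect with small clusters: no strategy identified


examples = [
    (10, 500, 0.05),  # small percentage of missing outcomes
    (30, 50, 0.01),   # VIF = 1.49
    (30, 500, 0.05),  # VIF = 25.95, large clusters
    (30, 30, 0.2),    # VIF = 6.8, small clusters
]
for percent_missing, cluster_size, icc in examples:
    print(percent_missing, cluster_size, icc,
          suggested_strategy(percent_missing, cluster_size, icc))
```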

Results from the present comprehensive simulation study also imply that MI using random-effects logistic regression may not be appropriate for imputing binary outcomes in CRTs. This is because, if the underlying data structure assumes a PA treatment effect, MI using random-effects logistic regression, which imputes missing data based on the CS treatment effect, may distort the original data structure and lead to invalid inference. Moreover, convergence problems would greatly hinder the application of this method for imputing missing binary data. This implication seems to contradict the current literature: for example, Taljaard

MI has been accepted as a solution for missing data problems in many settings. Both GEE and RELR are commonly used for analyzing binary data in CRTs

There are certain limitations to the current study. First, performance of the marginal model and cluster-specific model was assessed only for CRTs with a completely randomized design. Other designs such as the matched pairs design and stratified randomized design are also used for CRTs but were not considered in this study. Second, only CRTs with balanced design were considered; however, settings found more often in empirical situations, such as unequal numbers of subjects per cluster, or unequal number of clusters in each trial arm, were not considered in this study. These design restrictions were made to understand the performance of the methods in simple scenarios. Further research is required to assess whether our findings are relevant to more general settings. Third, there are two main approaches in handling missing data: likelihood based analyses and imputation

Conclusions

Under the assumption of CDM, the GEE method performs well as long as an appropriate missing data strategy is adopted based on the design of CRTs and the percentage of missing data. In contrast, RELR does not perform well when either standard or within-cluster MI strategy is applied to impute missing data prior to the analysis.

Abbreviations

CRTs: Cluster randomized trials; ICC: Intracluster correlation coefficient; VIF: Variance inflation factor; PA: Population-averaged; CS: Cluster-specific; GEE: Generalized estimating equations; RELR: Random-effects logistic regression; RMSE: Root mean squared error; CDM: Covariate dependent missingness.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

JM, PR, JB, and LT conceived the research question. JM conducted the literature review, designed and implemented the simulation study, composed the initial draft of the manuscript, and revised the manuscript. LT oversaw the design and implementation of the study and revised the manuscript. PR and JB provided assistance with the design of the simulation study. All authors read and approved the final manuscript.

Acknowledgement

This study was supported in part by funds from the Canadian Network and Centre for Trials Internationally (CANNeCTIN) program and the Drug Safety and Effectiveness Cross-Disciplinary Training (DSECT) program in the form of training awards for the first author. No additional external funding was received for this study. Dr. Lehana Thabane is a clinical trials mentor for the Canadian Institutes of Health Research (CIHR). Dr. Parminder Raina holds a Raymond and Margaret Labarge Chair in Research and Knowledge Application for Optimal Aging, and the Canada Research Chair in GeroScience.
