Department of Pediatrics, The Children's Hospital of Philadelphia and the University of Pennsylvania School of Medicine, 3535 Market St, Room 1531, Philadelphia, PA 19104, USA

Department of Biostatistics and Epidemiology, Center for Clinical Epidemiology and Biostatistics, Center for Education and Research on Therapeutics, University of Pennsylvania School of Medicine, Blockley Hall, Room 611, Philadelphia, PA 19104, USA

Department of Medical Education and Biomedical Informatics, University of Washington School of Medicine, 1959 NE Pacific St, Room E-312, Seattle, WA 98195, USA

Abstract

Background

Meta-analysis can be used to pool rate measures across studies, but challenges arise when follow-up duration varies. Our objective was to compare different statistical approaches for pooling count data of varying follow-up times in terms of estimates of effect, precision, and clinical interpretability.

Methods

We examined data from a published Cochrane Review of asthma self-management education in children. We selected two rate measures with the largest number of contributing studies: school absences and emergency room (ER) visits. We estimated fixed- and random-effects standardized weighted mean differences (SMD), stratified incidence rate differences (IRD), and stratified incidence rate ratios (IRR). We also fit Poisson regression models, which allowed for further adjustment for clustering by study.

Results

For both outcomes, all methods gave qualitatively similar estimates of effect in favor of the intervention. For school absences, SMD showed modest results in favor of the intervention (SMD -0.14, 95% CI -0.23 to -0.04). IRD implied that the intervention reduced school absences by 1.8 days per year (IRD -0.15 days/child-month, 95% CI -0.19 to -0.11), while IRR suggested a 14% reduction in absences (IRR 0.86, 95% CI 0.83 to 0.90). For ER visits, SMD showed a modest benefit in favor of the intervention (SMD -0.27, 95% CI -0.45 to -0.09). IRD implied that the intervention reduced ER visits by 1 visit every 2 years (IRD -0.04 visits/child-month, 95% CI -0.05 to -0.03), while IRR suggested a 34% reduction in ER visits (IRR 0.66, 95% CI 0.59 to 0.74). In Poisson models, adjustment for clustering lowered the precision of the estimates relative to stratified IRR results. For ER visits but not school absences, failure to incorporate study indicators resulted in a different estimate of effect (unadjusted IRR 0.77, 95% CI 0.59 to 0.99).

Conclusions

Choice of method among the ones presented had little effect on inference but affected the clinical interpretability of the findings. Incidence rate methods gave more clinically interpretable results than SMD. Poisson regression allowed for further adjustment for heterogeneity across studies. These data suggest that analysts who want to improve the clinical interpretability of their findings should consider incidence rate methods.

Background

Meta-analysis has become recognized as an objective means of summarizing evidence from disparate clinical trials.

At times, data from clinical trials may conform to continuous rate measures (events per person-time), in which the numerator represents a count of total events "x" and the denominator represents a given time duration multiplied by the number of subjects, e.g., health care visits per person-year. Such data are being reported more frequently in clinical trials, as evidenced by the inclusion of rate measures in recent Cochrane Systematic Reviews.

In this paper, we examined data from a recently published Cochrane Systematic Review that included continuous rate measures as outcomes. We compared different statistical approaches to pooling continuous rate measures when they were reported with varying follow-up time. Specifically, we compared the standardized mean difference (SMD), considered the standard approach, with two alternative methods: incidence rate differences and incidence rate ratios. We compared the results of the different approaches in terms of the point estimates of treatment effect, their precision, and their clinical interpretability. We are unaware of previously published studies that have attempted to address this problem.

Methods

Data were taken from a recently published Cochrane systematic review on the effects of asthma self-management education in children.

The standardized weighted mean difference (SMD) represents a weighted average of the per-study difference in mean events per person between treatment and control groups. We first calculated standardized effect sizes for each study by subtracting the reported mean number of events in the control group from the reported mean number of events in the treatment group and dividing by the pooled standard deviation.
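The per-study calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name is hypothetical, and it assumes the usual pooled standard deviation with n - 1 weights and no small-sample correction, which the text does not specify. The inputs are the Charlton row of the school-absence table.

```python
import math


def standardized_effect_size(n_trt, mean_trt, sd_trt, n_ctl, mean_ctl, sd_ctl):
    """(treatment mean - control mean) / pooled SD.

    Assumption: pooled SD uses (n - 1)-weighted variances; no
    small-sample (Hedges-type) correction is applied.
    """
    pooled_var = ((n_trt - 1) * sd_trt ** 2 + (n_ctl - 1) * sd_ctl ** 2) / (
        n_trt + n_ctl - 2
    )
    return (mean_trt - mean_ctl) / math.sqrt(pooled_var)


# Charlton, school absences: intervention 42 children (2.10 ± 11.40),
# control 37 children (4.70 ± 15.50).
d_charlton = standardized_effect_size(42, 2.10, 11.40, 37, 4.70, 15.50)
print(round(d_charlton, 2))  # prints -0.19
```

A negative value indicates fewer events in the intervention group than in the control group.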

To estimate stratified incidence rate differences (IRD) and stratified incidence rate ratios (IRR), we calculated incidence rates taking time explicitly into account. For each study, we knew the mean number of events (days absent or emergency room visits) and the number of months of observation according to the reported study design. We multiplied the mean by the sample size for each treatment arm to get the total number of events observed in each arm, e.g. the total number of days absent for all participants in the control group. We rounded this to the nearest whole number of events. To obtain the total person-time of follow-up, we assumed that there was no loss to follow-up during the study, i.e. all participants were observed for the entire length of the study. We multiplied the number of months of follow-up by the sample size for each arm to obtain the total number of person-months of follow-up. The study-specific rate of events per person-month for each arm was then the total number of events (days absent or emergency room visits) divided by the total number of person-months of observation for each arm.
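The steps in the preceding paragraph can be written out as a short Python sketch. The function name is hypothetical, and the sketch carries the same assumption stated above, namely that every participant was followed for the full study duration. The example uses the intervention arm of the Hill study (211 children, mean 5.43 days absent, 4 months of follow-up).

```python
def arm_rate(n, mean_events, months):
    """Reconstruct total events, person-months, and the event rate for one arm.

    Assumes no loss to follow-up, so person-time = n * months.
    Note: Python's round() rounds exact halves to the nearest even integer.
    """
    total_events = round(n * mean_events)  # nearest whole number of events
    person_months = n * months
    return total_events, person_months, total_events / person_months


events, person_months, rate = arm_rate(211, 5.43, 4)
print(events, person_months, round(rate, 2))  # prints 1146 844 1.36
```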

The analysis of the rates used stratified IRD and IRR methods estimated in STATA (version 7). To obtain a summary stratified IRD, we used a program, co-written by one of us (JAB), to implement a fixed-effects Mantel-Haenszel (M-H) procedure in STATA. Specifically, the program produced the estimates of IRD and its variance described in Rothman and Greenland's textbook.
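The STATA program itself is not reproduced here. As a rough illustration only, the Python sketch below implements a Mantel-Haenszel rate difference for person-time data. It assumes the commonly cited stratum weights T1·T0/(T1 + T0) and the matching variance estimator; the function name, the tuple layout, and the use of a normal-approximation 95% interval are illustrative assumptions rather than a description of the original program. The input is the pair of hypothetical studies from the Appendix.

```python
import math


def mh_rate_difference(strata, z=1.96):
    """Mantel-Haenszel summary rate difference for person-time data.

    strata: iterable of (events_trt, persontime_trt, events_ctl, persontime_ctl).
    Assumed weights: w_i = T1_i * T0_i / (T1_i + T0_i).
    Returns (ird, standard_error, (lower, upper)).
    """
    sum_w = sum_w_rd = sum_w2_var = 0.0
    for a, t1, b, t0 in strata:
        w = t1 * t0 / (t1 + t0)
        sum_w += w
        sum_w_rd += w * (a / t1 - b / t0)                   # weighted stratum difference
        sum_w2_var += w ** 2 * (a / t1 ** 2 + b / t0 ** 2)  # Poisson variance terms
    ird = sum_w_rd / sum_w
    se = math.sqrt(sum_w2_var) / sum_w
    return ird, se, (ird - z * se, ird + z * se)


# Hypothetical two-study example from the Appendix table.
appendix_strata = [(5, 100, 10, 100), (5, 25, 40, 100)]
ird, se, ci = mh_rate_difference(appendix_strata)
print(round(ird, 3), round(se, 3))
```

With these inputs the stratum weights are 50 and 20, so the summary difference is a weighted average of -0.05 and -0.20 events per unit of person-time.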

To obtain a summary stratified IRR, we used a fixed-effects M-H type procedure as implemented in the "ir" command in STATA, which should give results similar to fitting a Poisson regression model with indicator variables for "study." This M-H approach produces a summary estimate stratified on study. To take study-to-study variability into account, we also fit Poisson regression models allowing for clustering of the data by study, both with and without study indicator variables. The inclusion of indicator variables forces the comparison between treatments to be made within study, thereby mimicking the stratified analysis. In STATA, clustering was specified with the "cluster" option, which uses a robust (Huber-White "sandwich") estimator of the variance.
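For readers without STATA, the point estimate from a Mantel-Haenszel type rate ratio can be sketched directly. This is a minimal Python illustration, assuming the standard person-time form of the estimator; it is not the "ir" command and does not reproduce its confidence interval or test statistics. The function name and tuple layout are hypothetical, and the input is the pair of hypothetical studies from the Appendix.

```python
def mh_rate_ratio(strata):
    """Mantel-Haenszel summary rate ratio (treated vs control), stratified on study.

    strata: iterable of (events_trt, persontime_trt, events_ctl, persontime_ctl).
    """
    numerator = sum(a * t0 / (t1 + t0) for a, t1, b, t0 in strata)
    denominator = sum(b * t1 / (t1 + t0) for a, t1, b, t0 in strata)
    return numerator / denominator


# Hypothetical two-study example from the Appendix table.
appendix_strata = [(5, 100, 10, 100), (5, 25, 40, 100)]
irr_mh = mh_rate_ratio(appendix_strata)
print(irr_mh)  # prints 0.5
```

Because both hypothetical studies have a within-study rate ratio of 0.5, the stratified summary is also 0.5.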

Our interest was in comparing the qualitative and, where possible, the quantitative results across the different methods. We were interested in differences in inference that could be made from the various models, which integrate information about the point estimates of treatment effects and the precision of their estimation but may vary in their assumptions. We also compared conclusions as to the heterogeneity of effects across studies. The methods based on weighted averages use a test of heterogeneity similar in principle to the Cochran Q-statistic. The test for heterogeneity in the Poisson regression models is based on the interactions between the treatment variable and the study indicator variables. Most importantly, we were concerned with the clinical interpretability of the results. All p-values reported are two-sided and all confidence intervals are calculated at the 95% level.
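The Q-type heterogeneity statistic mentioned above can be illustrated with a small Python sketch. It assumes inverse-variance weights and uses made-up effect estimates and variances, not values from this analysis; the function name is hypothetical. The resulting Q would ordinarily be compared against a chi-square distribution with k - 1 degrees of freedom, a step omitted here because the Python standard library has no chi-square distribution.

```python
def cochran_q(effects, variances):
    """Inverse-variance pooled estimate and Cochran's Q statistic.

    Returns (pooled_estimate, q, degrees_of_freedom).
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    return pooled, q, len(effects) - 1


# Hypothetical inputs: two studies with effects -0.2 and -0.4.
pooled, q, df = cochran_q([-0.2, -0.4], [0.01, 0.04])
print(round(pooled, 2), round(q, 2), df)  # prints -0.24 0.8 1
```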

Results

We illustrate the use of SMD, IRD, and IRR methods for pooling continuous rate measures using data from a published Cochrane systematic review and meta-analysis that examined the effect of self-management education on morbidity and health services outcomes in children and adolescents with asthma.

Table

Characteristics of Studies Reporting on School Absences.*

| Study | Intervention N | Intervention Mean ± SD | Intervention Rate | Control N | Control Mean ± SD | Control Rate | Duration (Months) | Standardized Effect Size** |
|---|---|---|---|---|---|---|---|---|
| Charlton | 42 | 2.10 ± 11.40 | 0.18 | 37 | 4.70 ± 15.50 | 0.39 | 12 | -0.19 |
| Christiansen | 27 | 2.39 ± 2.90 | 0.20 | 15 | 2.98 ± 3.29 | 0.25 | 12 | -0.19 |
| Colland | 45 | 0.98 ± 1.56 | 0.16 | 34 | 0.53 ± 1.08 | 0.09 | 6 | 0.32 |
| Dahl | 9 | 0.80 ± 0.32 | 0.8 | 10 | 0.90 ± 0.32 | 0.9 | 1 | -0.30 |
| Deaves | 32 | 3.69 ± 4.80 | 0.31 | 31 | 5.19 ± 4.80 | 0.43 | 12 | -0.31 |
| Evans | 117 | 19.40 ± 13.90 | 1.62 | 87 | 19.70 ± 12.60 | 1.64 | 12 | -0.02 |
| Fireman | 13 | 0.50 ± 5.06 | 0.04 | 13 | 4.60 ± 5.06 | 0.38 | 12 | -0.78 |
| Hill | 211 | 5.43 ± 4.07 | 1.36 | 193 | 6.23 ± 4.72 | 1.56 | 4 | -0.18 |
| Hughes | 44 | 10.70 ± 6.90 | 0.89 | 45 | 16.00 ± 15.40 | 1.33 | 12 | -0.44 |
| Mitchell | 133 | 7.92 ± 16.48 | 1.32 | 126 | 8.48 ± 26.69 | 1.41 | 6 | -0.03 |
| Perrin | 29 | 0.24 ± 0.90 | 0.24 | 27 | 0.22 ± 1.00 | 0.22 | 1 | 0.02 |
| Persaud | 18 | 6.40 ± 4.60 | 1.28 | 18 | 7.60 ± 5.30 | 1.52 | 5 | -0.24 |
| Rubin | 29 | 11.90 ± 7.80 | 0.99 | 25 | 15.40 ± 15.00 | 1.28 | 12 | -0.30 |
| Talabere | 25 | 1.36 ± 2.52 | 0.45 | 25 | 2.60 ± 3.75 | 0.87 | 3 | -0.38 |
| Toelle | 63 | 2.62 ± 3.28 | 0.44 | 51 | 2.67 ± 3.21 | 0.45 | 6 | -0.02 |
| Wilson | 30 | 0.80 ± 2.29 | 0.80 | 29 | 1.40 ± 3.23 | 1.40 | 1 | -0.21 |

* N refers to the sample size, Mean ± SD refers to the mean number of events ± standard deviation, and rate refers to the total events per person-month. ** Standardized effect size was calculated for each study by subtracting control group mean from intervention group mean and dividing by the pooled SD.

Similarly, table

Characteristics of Studies Reporting on Emergency Room Visits.*

| Study | Intervention N | Intervention Mean ± SD | Intervention Rate | Control N | Control Mean ± SD | Control Rate | Duration (Months) | Standardized Effect Size** |
|---|---|---|---|---|---|---|---|---|
| Alexander | 11 | 0.60 ± 0.90 | 0.05 | 10 | 2.40 ± 2.10 | 0.20 | 12 | -1.09 |
| Christiansen | 27 | 0.30 ± 1.20 | 0.03 | 15 | 0.20 ± 0.43 | 0.02 | 12 | 0.10 |
| Clark | 159 | 1.72 ± 4.20 | 0.14 | 73 | 2.49 ± 6.26 | 0.21 | 12 | -0.16 |
| Fireman | 13 | 0.08 ± 1.14 | 0.01 | 13 | 1.00 ± 1.14 | 0.08 | 12 | -0.78 |
| Hughes | 44 | 0.45 ± 1.05 | 0.04 | 45 | 0.60 ± 1.05 | 0.05 | 12 | -0.14 |
| Lewis | 48 | 2.30 ± 2.98 | 0.19 | 28 | 3.71 ± 2.98 | 0.31 | 12 | -0.47 |
| McNabb | 7 | 1.90 ± 4.72 | 0.16 | 7 | 7.40 ± 4.72 | 0.62 | 12 | -1.09 |
| Persaud | 18 | 0.27 ± 0.57 | 0.05 | 18 | 1.00 ± 1.20 | 0.20 | 5 | -0.76 |
| Ronchetti | 114 | 0.07 ± 0.32 | 0.01 | 95 | 0.23 ± 0.78 | 0.02 | 12 | -0.28 |
| Shields | 101 | 0.54 ± 1.68 | 0.05 | 104 | 0.38 ± 1.68 | 0.03 | 12 | 0.09 |
| Talabere | 25 | 0.44 ± 0.77 | 0.15 | 25 | 1.08 ± 1.32 | 0.36 | 3 | -0.58 |
| Toelle | 63 | 1.51 ± 2.31 | 0.25 | 51 | 1.67 ± 2.40 | 0.28 | 6 | -0.07 |

* N refers to the sample size, Mean ± SD refers to the mean number of events ± standard deviation, and rate refers to the total events per person-month. ** Standardized effect size was calculated for each study by subtracting control group mean from intervention group mean and dividing by the pooled SD.

Table

Summary Outcome Measures for Days of School Absence.

| Measure | Effect Size | Confidence Interval | Effect Size P-value | Homogeneity Test P-value |
|---|---|---|---|---|
| SMD^{a} | | | | |
| Fixed-effects | -0.14 | -0.23, -0.04 | 0.006 | 0.61 |
| Random-effects | -0.14 | -0.23, -0.04 | 0.006 | 0.61 |
| IRD^{b} | | | | |
| Fixed-effects M-H | -0.15 | -0.19, -0.11 | <0.001 | <0.001 |
| Random-effects | -0.17 | -0.25, -0.08 | <0.001 | <0.001 |
| IRR^{c} | | | | |
| Fixed-effects M-H | 0.86 | 0.83, 0.90 | <0.001 | <0.001 |
| PR + study indicators | 0.86 | 0.77, 0.97 | 0.011 | <0.001 |
| PR - study indicators | 0.86 | 0.75, 0.99 | 0.044 | N/A |

^{a} SMD refers to the standardized mean difference and was obtained using both fixed-effects and random-effects models. ^{b} IRD refers to the incidence rate difference and was obtained using a Mantel-Haenszel (M-H) procedure to estimate a fixed-effects model and an inverse-variance method to estimate a random-effects model. ^{c} IRR refers to the incidence rate ratio and was obtained using a Mantel-Haenszel procedure to estimate a fixed-effects model and Poisson regression (PR) models with Huber-White sandwich estimators, with and without study indicators, which is equivalent to a random-effects model.
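The loss of precision under clustering can be made concrete by back-calculating approximate standard errors from the reported intervals. The Python sketch below is an approximation only: it assumes the 95% confidence intervals are symmetric on the log scale (Wald-type), which the text does not state, and the function name is hypothetical. The inputs are the school-absence IRR intervals for the fixed-effects M-H estimate and the Poisson model with study indicators.

```python
import math


def log_se_from_ci(lower, upper, z=1.96):
    """Approximate SE of log(IRR), assuming a Wald-type interval on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


se_mh = log_se_from_ci(0.83, 0.90)          # fixed-effects M-H
se_clustered = log_se_from_ci(0.77, 0.97)   # Poisson, clustered, with study indicators
print(round(se_mh, 3), round(se_clustered, 3))  # prints 0.021 0.059
```

Under this assumption the clustered standard error is nearly three times the M-H standard error, even though both point estimates are 0.86.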

Table

Summary Outcome Measures for Emergency Room Visits.

| Measure | Effect Size | Confidence Interval | Effect Size P-value | Homogeneity Test P-value |
|---|---|---|---|---|
| SMD^{a} | | | | |
| Fixed-effects | -0.21 | -0.33, -0.09 | <0.001 | 0.05 |
| Random-effects | -0.27 | -0.45, -0.09 | 0.003 | 0.05 |
| IRD^{b} | | | | |
| Fixed-effects M-H | -0.04 | -0.05, -0.03 | <0.001 | <0.001 |
| Random-effects | -0.05 | -0.08, -0.03 | <0.001 | <0.001 |
| IRR^{c} | | | | |
| Fixed-effects M-H | 0.66 | 0.59, 0.74 | <0.001 | <0.001 |
| PR + study indicators | 0.66 | 0.54, 0.81 | <0.001 | <0.001 |
| PR - study indicators | 0.77 | 0.59, 0.99 | 0.039 | N/A |

^{a} SMD refers to the standardized mean difference and was obtained using both fixed-effects and random-effects models. ^{b} IRD refers to the incidence rate difference and was obtained using a Mantel-Haenszel (M-H) procedure to estimate a fixed-effects model and an inverse-variance method to estimate a random-effects model. ^{c} IRR refers to the incidence rate ratio and was obtained using a Mantel-Haenszel procedure to estimate a fixed-effects model and Poisson regression (PR) models with Huber-White sandwich estimators, with and without study indicators, which is equivalent to a random-effects model.

Discussion

This paper presented three statistical methods of pooling continuous rate measures in which the denominator reflects varying duration of observation. All methods were fairly easy to implement using standard statistical software. Results were statistically consistent regardless of the method employed and suggested a significant treatment effect on average. All methods allowed for explicit adjustment for individual studies. Failure to take stratification by study into account, as illustrated in the Poisson models without study indicators, resulted in a different estimate for one outcome, ER visits, but not the other, school absences.

IRD methods gave clinically interpretable results on an absolute scale. These results suggest that treatment results in an average reduction of 0.15 school absences per person-month or roughly 2 days per person-year. These results also suggest that treatment results in an average of 0.04 fewer ER visits per person-month or roughly 1 fewer visit per person every 2 years. IRR methods gave clinically interpretable results on a relative scale. These results suggest that treatment results in a 14% reduction in school absences and a 34% reduction in ER visits.
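The conversions behind these statements are simple enough to check by hand, and the Python sketch below spells them out. The helper names are hypothetical; the inputs are the fixed-effects M-H estimates quoted above.

```python
MONTHS_PER_YEAR = 12


def per_person_year(rate_per_person_month):
    """Rescale a rate (or rate difference) from per person-month to per person-year."""
    return rate_per_person_month * MONTHS_PER_YEAR


def percent_reduction(irr):
    """Express an incidence rate ratio below 1 as a percentage reduction."""
    return (1.0 - irr) * 100.0


absences_per_year = per_person_year(-0.15)        # school absences, IRD
er_visits_per_2_years = 2 * per_person_year(-0.04)  # ER visits, IRD over 2 years
print(round(absences_per_year, 2), round(er_visits_per_2_years, 2))  # prints -1.8 -0.96
print(round(percent_reduction(0.86)), round(percent_reduction(0.66)))  # prints 14 34
```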

The SMD results were not immediately clinically interpretable. On a standard deviation scale, these results suggest that treatment results in a modest reduction in school absences and ER visits. Conversion back to the original scale would allow for more clinically interpretable results but would require making an assumption about the size of the standard deviation and the event rate in the control group across studies. For standard deviations, it is not clear whether one should use a study-specific estimate of the standard deviation or an estimate pooled across studies. Additionally, the data can be skewed, in which case mean events might not appropriately represent the central tendency of the data.

Heterogeneity was statistically present for both outcomes, suggesting variability in treatment effects across studies, when incidence rate-based methods were used, and for ER visits but not school absences when SMD was used. It should be kept in mind that, although all of these analyses are attempting to address the same underlying substantive question (i.e., whether asthma education "works"), the SMD analyses address this question on a fundamentally different scale by converting measurements into standard deviation units. This difference in scale could well account for the different results of the heterogeneity tests.

Another alternative that we tried but abandoned because of its non-standard nature was simply to convert the time units from the various studies into a common scale and pool the data using the weighted mean difference (WMD). We found (data not shown) slight but noticeable differences depending on whether we multiplied up for the shorter studies or down for the longer studies to achieve the common scale. For example, studies with 6-month and 12-month follow-up could be put on a common scale either by multiplying the 6-month study means and standard deviations by 2 or by dividing the 12-month study means and standard deviations by 2. These different approaches changed the per-study weights and produced slight differences in summary measures. We believe that the fundamental problem with this approach is that it rests on the assumption that the event rates stay constant over the entire period of observation. This is also true for the rate models we did use, but unlike those models, multiplying up essentially imputes data beyond the actual period of observation. This has implications not only for the mean number of events, but possibly also for the variance estimates. For these reasons, we chose not to consider this approach any further.

There are limitations to these findings. First, we explored differences among the three approaches using data from only a single systematic review. However, the outcomes we chose had enough contributing studies to detect small differences among the approaches. Second, in calculating event rates for the incidence rate-based methods, we assumed complete follow-up of participants in each study. However, these methods are robust to incomplete follow-up if either (a) the number of events and the amount of time contributed by each participant are known, or (b) it can be assumed that individuals lost to follow-up contribute no events or follow-up time and that loss to follow-up is not differential between the treatment groups.

Conclusions

In this study, we demonstrated that choice of method among the ones presented here for continuous rate measures had little effect on inference. SMD, IRD, and IRR methods all gave qualitatively similar estimates of effect and suggest that the intervention was effective for both outcomes. However, choice of method clearly affected clinical interpretability. SMD, reportedly the standard method employed for analysis of rate measures of varying time duration, was not immediately interpretable. Stratified IRD allowed for clinical interpretability on an absolute scale. Stratified IRR or Poisson models allowed for clinical interpretability on a relative scale. For further discussion of the merits of absolute versus relative effects, we recommend that the reader consult additional references.

Appendix

Table

Example of Confounding by Study.

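| Study | Control Events | Control Person-time | Control Rate | Treated Events | Treated Person-time | Treated Rate | Relative Rate |
|---|---|---|---|---|---|---|---|
| 1 | 10 | 100 | 0.10 | 5 | 100 | 0.05 | 0.50 |
| 2 | 40 | 100 | 0.40 | 5 | 25 | 0.20 | 0.50 |
| Total (ignoring "study") | 50 | 200 | 0.25 | 10 | 125 | 0.08 | 0.32 |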
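The arithmetic in this table can be reproduced with a short Python sketch. The data structure and variable names are illustrative only; the counts are the hypothetical ones shown in the table.

```python
# (events, person-time) per arm for the two hypothetical studies in the table.
studies = {
    1: {"control": (10, 100), "treated": (5, 100)},
    2: {"control": (40, 100), "treated": (5, 25)},
}


def rate(events, person_time):
    return events / person_time


# Relative rate within each study (treated vs control).
within_study = {
    study: rate(*arms["treated"]) / rate(*arms["control"])
    for study, arms in studies.items()
}


def collapsed(arm):
    """Sum events and person-time over studies, ignoring the study factor."""
    events = sum(arms[arm][0] for arms in studies.values())
    person_time = sum(arms[arm][1] for arms in studies.values())
    return events, person_time


# Crude relative rate from the collapsed totals.
crude = rate(*collapsed("treated")) / rate(*collapsed("control"))

print({s: round(r, 2) for s, r in within_study.items()}, round(crude, 2))
# prints {1: 0.5, 2: 0.5} 0.32
```

The crude ratio differs from both within-study ratios because the treated arm contributes most of its person-time in the study with the lower baseline rate.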

Within each study, the estimate of the relative rate is 0.5. Thus, any reasonable analysis that takes stratification by study into account (and averages the within-study treatment effects) would necessarily produce an average treatment effect of 0.5. Because of the associations noted above, the analysis ignoring study produces an estimated treatment effect of 0.32. This result clearly is not at all representative of the results within either of the individual studies. Note that this concept is

Competing interests

None declared.

Authors' Contributions

JG conceived of the study, participated in the design and analysis of the study, and wrote the manuscript. JB participated in the design of the study, performed the main statistical analysis, and participated in writing the manuscript. FW participated in the design and analysis of the study. All authors read and approved the final manuscript.

Acknowledgments

We would like to thank Doug Altman for his critical review of the manuscript. We would also like to acknowledge Russell Localio for sharing his STATA program on implementing stratified incidence rate differences for fixed- and random-effects models. This paper was presented at the XI Cochrane Colloquium, Barcelona, Spain, on October 31, 2003.
