Department of Mathematics and Statistics, Smith College, Northampton, MA, USA

Clinical Addiction Research and Education (CARE) Unit, Section of General Internal Medicine, Boston Medical Center and Boston University School of Medicine, Boston, MA, USA

Youth Alcohol Prevention Center, Boston University School of Public Health, Boston, MA, USA

Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA

Abstract

Background

Alcohol consumption is commonly used as a primary outcome in randomized alcohol treatment studies. The distribution of alcohol consumption is highly skewed, particularly in subjects with alcohol dependence.

Methods

In this paper, we will consider the use of count models for outcomes in a randomized clinical trial setting. These include the Poisson, over-dispersed Poisson, negative binomial, zero-inflated Poisson and zero-inflated negative binomial. We compare the Type-I error rate of these methods in a series of simulation studies of a randomized clinical trial, and apply the methods to the ASAP (Addressing the Spectrum of Alcohol Problems) trial.

Results

Standard Poisson models provided a poor fit for alcohol consumption data from our motivating example, and did not preserve Type-I error rates for the randomized group comparison when the true distribution was over-dispersed Poisson. For the ASAP trial, where the distribution of alcohol consumption featured extensive over-dispersion, there was little indication of significant randomization group differences, except when the standard Poisson model was fit.

Conclusion

As with any analysis, it is important to choose appropriate statistical models. In simulation studies and in the motivating example, the standard Poisson was not robust when fit to over-dispersed count data, and did not maintain the appropriate Type-I error rate. To appropriately model alcohol consumption, more flexible count models should be routinely employed.

Background

Count outcomes are common in randomized studies of alcohol treatment. Subjects may be queried about their daily consumption of alcohol, measured as a number of drinks over a recent period

A challenge in modeling consumption outcomes is to appropriately account for the distribution of drinking. These distributions are characterized by a large number of zeros (abstinent subjects) along with a long right tail (heavy drinking subjects). An extensive literature describes models for counts

Our methods are motivated by the analysis of the ASAP (Addressing the Spectrum of Alcohol Problems) study, a randomized clinical trial comparing a brief motivational interview to usual care for a sample of inpatients with unhealthy alcohol use at an urban hospital

In this paper, we demonstrate the limitations of the standard Poisson model in the presence of over-dispersion. We describe several count models for alcohol outcomes, compare their performance in a series of simulated randomized trials, apply them to the ASAP study, and conclude with some general recommendations.

Methods

Statistical methods for the analysis of count outcomes

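We begin by introducing notation to be used throughout. Let $Y_{ij}$ denote the number of events for the $j$th subject ($j = 1, \ldots, n_i$) in the $i$th group ($i = 1, 2$), where $n_i$ is the number of subjects in the $i$th group and $n_1$ and $n_2$ are approximately equal.

The Poisson distribution is one of the simplest models for count data. Let $\lambda_{ij}$ indicate the average number of events (in this case drinks consumed) in a given time interval for subject $j$ in group $i$. Then $P(Y_{ij} = y_{ij})$ is the probability of observing $y_{ij}$ events:

$$P(Y_{ij} = y_{ij}) = \frac{e^{-\lambda_{ij}} \lambda_{ij}^{y_{ij}}}{y_{ij}!}$$

for $j = 1, \ldots, n_i$, where $\lambda_{ij} > 0$ and we assume that $\lambda_{ij} = \lambda_i$ for all $j$. Under this model, $E[Y_{ij}] = \mathrm{Var}(Y_{ij}) = \lambda_{ij}$ for all $i$ and $j$, and the maximum likelihood estimate of $\lambda_i$ is given by the group sample mean $\bar{y}_i$. In this setting, the test of randomized group effects for the Poisson model is a test of the null hypothesis that $\lambda_1 = \lambda_2$.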

One limitation of this model is that it may be overly simplistic and may not provide an adequate fit to consumption data of the type that we consider. The constraint that the variance is equal to the mean may lead to incorrect test results.
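To make the Poisson group comparison concrete, the following is a minimal sketch of a model-based Wald test of $\lambda_1 = \lambda_2$ on the log scale. It is not the Stata code used for the paper; the function name `poisson_wald_test` and the two small data vectors are hypothetical and serve only as an illustration.

```python
import math

def poisson_wald_test(group1, group2):
    """Model-based Wald test of H0: lambda_1 = lambda_2 on the log scale."""
    mean1 = sum(group1) / len(group1)  # MLE of lambda_1 is the sample mean
    mean2 = sum(group2) / len(group2)  # MLE of lambda_2 is the sample mean
    # If Var = mean holds, Var(log(mean_i)) is approximately 1 / (total count in group i).
    se = math.sqrt(1.0 / sum(group1) + 1.0 / sum(group2))
    z = (math.log(mean1) - math.log(mean2)) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return mean1, mean2, z, p_value

# Hypothetical drinks-per-day counts, for illustration only (not ASAP data).
control = [0, 0, 1, 2, 2, 3, 5, 8, 12, 30]
treatment = [0, 0, 0, 1, 2, 2, 4, 6, 9, 20]
mean1, mean2, z, p_value = poisson_wald_test(control, treatment)
print(mean1, mean2, z, p_value)
```

The standard error here depends only on the total counts, which is exactly where the variance-equals-mean constraint enters: if the data are more variable than the Poisson allows, this standard error is too small.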

Consider as an example the data from the ASAP study control group at 3 months. For this dataset, non-integer count values are possible. These arise when subjects consume a number of drinks not divisible by 30 (in the case of 30-day assessments). One approach in this situation would be to model the number of drinks consumed in a 30-day period, or to utilize the non-integer values. Sometimes even the 30-day value is non-integer because people report a drink size that is then translated into standard drinks. The maximum likelihood estimates of the probability distributions remain the same for non-integer values, though each non-integer observed value must be moved to the next integer (using a ceiling function) to be plotted. For the models that we discuss, non-integer values can be supplied to the software and still yield sensible results.

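The figure that follows displays the observed values along with the estimated Poisson fit ($\hat{\lambda}_1 = \bar{y}_1 = 4.98$), obtained using the prcounts routine in Stata.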

Observed value of drinks per day for the control group of the ASAP study at 3 months, plus the estimated Poisson fit to these data ($\hat{\lambda}_1 = 4.98$)

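One approach to loosen the restrictive variance assumption involves use of an empirical (or "robust") variance estimator with the Poisson model; we refer to this as the over-dispersed Poisson model.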
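The effect of an empirical variance estimator can be sketched for the two-group case. With only a group indicator in the model, the sandwich variance of each estimated log mean reduces to the sum of squared deviations divided by the squared group total; this closed form is our simplification for illustration and is not taken from the paper. The function names and data below are hypothetical.

```python
import math

def log_mean_variances(counts):
    """Return (model-based, empirical) variance estimates for log(sample mean)."""
    n = len(counts)
    mean = sum(counts) / n
    model_based = 1.0 / (n * mean)  # relies on the Poisson assumption Var = mean
    sum_sq = sum((y - mean) ** 2 for y in counts)
    empirical = sum_sq / (n * mean) ** 2  # sandwich form for a group-indicator model
    return model_based, empirical

def wald_z(group1, group2, robust):
    """Wald z statistic for the log rate ratio with either variance estimate."""
    idx = 1 if robust else 0
    v1 = log_mean_variances(group1)[idx]
    v2 = log_mean_variances(group2)[idx]
    m1 = sum(group1) / len(group1)
    m2 = sum(group2) / len(group2)
    return (math.log(m1) - math.log(m2)) / math.sqrt(v1 + v2)

# Hypothetical over-dispersed counts, for illustration only (not ASAP data).
control = [0, 0, 1, 2, 2, 3, 5, 8, 12, 30]
treatment = [0, 0, 0, 1, 2, 2, 4, 6, 9, 20]
z_model = wald_z(control, treatment, robust=False)
z_robust = wald_z(control, treatment, robust=True)
print(z_model, z_robust)
```

The point estimate of the rate ratio is identical under both approaches; only the standard error changes, so over-dispersed data give a smaller test statistic under the empirical variance.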

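Another approach is to fit a negative binomial (two parameter) count model (NB):

$$P(Y_{ij} = y_{ij}) = \frac{\Gamma(y_{ij} + 1/\alpha_i)}{\Gamma(y_{ij} + 1)\,\Gamma(1/\alpha_i)} \left(\frac{1}{1 + \alpha_i \lambda_i}\right)^{1/\alpha_i} \left(\frac{\alpha_i \lambda_i}{1 + \alpha_i \lambda_i}\right)^{y_{ij}}$$

where Γ(·) denotes the Gamma function, $\alpha_i > 0$ and $\lambda_i > 0$. We note that $E[Y_{ij}] = \lambda_i$ and $\mathrm{Var}(Y_{ij}) = \lambda_i + \alpha_i \lambda_i^2 = \lambda_i (1 + \alpha_i \lambda_i)$ for all $j$, so that $\mathrm{Var}(Y_{ij}) > E[Y_{ij}]$. It can be shown that the negative binomial can be derived in terms of a Poisson random variable where the parameter $\lambda_i$ varies according to a gamma distribution.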
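The mean and variance relationships for the negative binomial can be checked numerically. The sketch below assumes the mean-dispersion parameterization (mean $\lambda$, variance $\lambda(1 + \alpha\lambda)$) and sums the probability mass function directly. The dispersion values 0.32, 1.4 and 2.6 are our own back-calculation from a mean of 5 and variances of 13, 40 and 70; they are not stated in the paper.

```python
import math

def nb_pmf(y, lam, alpha):
    """Negative binomial probability with mean lam and variance lam * (1 + alpha * lam)."""
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
             + r * math.log(1.0 / (1.0 + alpha * lam))
             + y * math.log(alpha * lam / (1.0 + alpha * lam)))
    return math.exp(log_p)

def nb_moments(lam, alpha, upper=5000):
    """Total probability, mean and variance by direct summation over 0..upper-1."""
    probs = [nb_pmf(y, lam, alpha) for y in range(upper)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var

# Dispersion values back-calculated (an assumption) from mean 5 and Var = 13, 40, 70.
for alpha in (0.32, 1.4, 2.6):
    total, mean, var = nb_moments(5.0, alpha)
    print(alpha, round(total, 6), round(mean, 4), round(var, 4))
```

Working on the log scale with `math.lgamma` avoids overflow in the Gamma function for large counts.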

The negative binomial model is attractive because it allows the relaxation of strong assumptions regarding the relationship between the mean and the variance. This flexibility comes at some cost, since a two-parameter model is inherently more complicated to interpret.

Other models have been proposed that allow for an extra abundance of subjects with no consumption. In alcohol consumption outcomes, there may be subjects who are "non-susceptible" (e.g. abstinent). These "zero-inflation" (or "hurdle") models account for subjects who are structural zeros (e.g., abstinent subjects thought of as "non-susceptible")

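Zero-inflated Poisson (ZIP) models include an additional parameter $p_i$ that governs the proportion of non-susceptible subjects in the $i$th group:

$$P(Y_{ij} = y_{ij}) = \begin{cases} p_i + (1 - p_i)\, e^{-\lambda_i} & y_{ij} = 0 \\ (1 - p_i)\, \dfrac{e^{-\lambda_i} \lambda_i^{y_{ij}}}{y_{ij}!} & y_{ij} = 1, 2, \ldots \end{cases}$$

for $0 < p_i < 1$ and $\lambda_i > 0$. Because zeros arise with probabilities $p_i$ and $(1 - p_i) \exp(-\lambda_i)$ for abstainers and drinkers who didn't drink during the reporting period, respectively, the model can incorporate an overabundance of zeros.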
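As a numerical check on the zero-inflated Poisson, the sketch below evaluates its probability mass function for a rate of 5 and an inflation proportion of 0.2, the values shown for the ZIP distribution in the simulation figure caption. The function names are hypothetical, and the moments are obtained by direct summation rather than from a closed form.

```python
import math

def zip_pmf(y, lam, p):
    """Zero-inflated Poisson probability with inflation proportion p and rate lam."""
    poisson = math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
    if y == 0:
        return p + (1.0 - p) * poisson  # structural zeros plus Poisson zeros
    return (1.0 - p) * poisson

def zip_moments(lam, p, upper=200):
    """Probability of zero, mean and variance by direct summation over 0..upper-1."""
    probs = [zip_pmf(y, lam, p) for y in range(upper)]
    mean = sum(y * q for y, q in enumerate(probs))
    var = sum((y - mean) ** 2 * q for y, q in enumerate(probs))
    return probs[0], mean, var

p_zero, zip_mean, zip_var = zip_moments(lam=5.0, p=0.2)
print(p_zero, zip_mean, zip_var)
```

The computed mean and variance agree with the values of 4 and 8 reported for the ZIP distribution in the simulation study.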

In many settings, the assumption that after accounting for the zeros the remaining counts are Poisson may not be tenable. The zero-inflated negative binomial (ZINB) allows for over-dispersion in this manner, though at the cost of more parameters.

Another approach to the modeling of count data involves use of a linear model (assuming that the observations are approximately Gaussian). While this is an extremely flexible model that is typically robust to misspecification (since the mean and variance are not linked), the linear model is less attractive because it may predict negative values of drinking given the skewness of the distribution. Use of a linear model is also inefficient if the variance is a function of the mean.

Simulation study

To better understand the behavior of these methods in a known situation, we conducted a series of simulation studies with parameters derived from the motivating example. These simulation studies were designed to address the question of whether or not the models were robust to misspecification of the underlying count distribution. More formally, we wanted to assess whether these models preserved the appropriate Type-I error rate (the probability of rejecting the null hypothesis when it is true) when there are no true differences between groups (i.e., do they reject the null at the appropriate $\alpha$ level?).

For each set of parameters within a simulation, 100 observations were generated in each of two groups, to mimic a randomized clinical trial setting. The amount of alcohol consumption, in drinks per day, was the outcome. For each simulated dataset a series of models (Poisson, negative binomial and zero-inflated Poisson) was fit. This process was repeated 2500 times for each set of parameters, where $E[Y_i] = 5$ for every distribution except the zero-inflated Poisson, for which $E[Y_i] = 0.8 \times 5 = 4$.
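The simulation design can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' simulation code: it uses 1000 replicates rather than 2500 to keep the run short, generates negative binomial data as a gamma mixture of Poissons, uses a dispersion of 0.32 (our back-calculation for mean 5 and variance 13), and applies only the model-based Poisson Wald test. All function names and the seed are hypothetical.

```python
import math
import random

def draw_poisson(rng, rate):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-rate)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def draw_nb(rng, lam, alpha):
    """Negative binomial draw as a Poisson whose rate follows a gamma distribution."""
    rate = rng.gammavariate(1.0 / alpha, alpha * lam)  # mean lam, variance alpha * lam**2
    return draw_poisson(rng, rate)

def poisson_wald_rejects(g1, g2, z_crit=1.959964):
    """True if the model-based Poisson Wald test rejects at the two-sided 0.05 level."""
    s1, s2 = sum(g1), sum(g2)
    if s1 == 0 or s2 == 0:
        return False
    z = (math.log(s1 / len(g1)) - math.log(s2 / len(g2))) / math.sqrt(1.0 / s1 + 1.0 / s2)
    return abs(z) > z_crit

def type1_error(draw, n_per_group=100, n_sims=1000, seed=2007):
    """Proportion of null datasets (both groups from the same law) that are rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        g1 = [draw(rng) for _ in range(n_per_group)]
        g2 = [draw(rng) for _ in range(n_per_group)]
        rejections += poisson_wald_rejects(g1, g2)
    return rejections / n_sims

poisson_rate = type1_error(lambda rng: draw_poisson(rng, 5.0))
nb13_rate = type1_error(lambda rng: draw_nb(rng, 5.0, 0.32))
print(poisson_rate, nb13_rate)
```

Because both groups are always drawn from the same distribution, every rejection is a Type-I error; comparing the two printed rates shows how the same test behaves under correctly specified and over-dispersed data.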

Graphical display of the five distributions, all with rate parameter 5, used in the simulations (Poisson [Var = 5], negative binomial [NB13, Var = 13], negative binomial [NB40, Var = 40], negative binomial [NB70, Var = 70] and zero-inflated Poisson [ZIP, p = 0.2, Var = 8])

ASAP study

The ASAP study was a randomized clinical trial of the effectiveness of a brief motivational intervention

Follow-up was planned at 3-month and 12-month timepoints. Because the subjects came from a transient and hard-to-reach population, the researchers employed exhaustive techniques to track subjects over the follow-up period. The two primary alcohol-related outcomes were measures of alcohol consumption and linkage to appropriate alcohol treatment; for these secondary analyses we focus solely on treatment differences in alcohol consumption. The outcome of interest was the average number of standard drinks consumed per day in the past thirty days as reported using the Timeline Followback method

Eight models were fit comparing treatment to control for the ASAP study:

**Poisson** standard Poisson model,

**Over-dispersed Poisson** Poisson model with empirical ("robust") variance estimator,

**NB** negative binomial,

**ZIP** zero-inflated Poisson, shared inflation parameter estimated for both randomized groups ($p_1 = p_2$),

**ZINB** zero-inflated negative binomial, shared inflation parameter estimated for both randomized groups ($p_1 = p_2$),

**TTEST** two-sample unequal variance t-test,

**WILCOXON** Wilcoxon-Mann-Whitney, a non-parametric two-sample comparison procedure suitable for ordinal data, and

**PERMUTE** two-sample permutation test.
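Of the procedures listed above, the permutation test is the simplest to write out in full. The sketch below is one possible version, assuming the absolute difference in group means as the test statistic and a Monte Carlo sample of relabelings; the paper does not specify either choice, and the function name, seed and data are hypothetical.

```python
import random

def permutation_test(group1, group2, n_perm=5000, seed=1):
    """Monte Carlo permutation p-value for the absolute difference in means."""
    rng = random.Random(seed)
    n1 = len(group1)
    pooled = list(group1) + list(group2)
    n2 = len(pooled) - n1
    observed = abs(sum(group1) / n1 - sum(group2) / n2)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of group labels
        diff = abs(sum(pooled[:n1]) / n1 - sum(pooled[n1:]) / n2)
        if diff >= observed - 1e-12:  # tolerance guards against float round-off
            extreme += 1
    return (extreme + 1) / (n_perm + 1)  # add-one correction keeps p above zero

# Hypothetical drinks-per-day counts, for illustration only (not ASAP data).
control = [0, 0, 1, 2, 2, 3, 5, 8, 12, 30]
treatment = [0, 0, 0, 1, 2, 2, 4, 6, 9, 20]
p_perm = permutation_test(control, treatment)
print(p_perm)
```

The test makes no distributional assumption about the counts, which is why it serves as a useful reference alongside the parametric count models.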

Results

Simulation studies

In the simulation studies we assessed the behavior of models when the null hypothesis was true (there were no differences between alcohol consumption for groups 1 and 2). We note that the ZIP model failed to converge for more than a quarter of the simulations from the standard Poisson distribution. This is likely due to the fact that many datasets had no zeros whatsoever (for the Poisson distribution with $\lambda = 5$, $P(Y > 0)^{100} = (1 - e^{-5})^{100} = 0.51$).
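The figure of 0.51 follows directly from the Poisson probability of a zero. A short check, assuming a rate of 5 and 100 independent observations as in the simulation design:

```python
import math

rate = 5.0
n_obs = 100
p_zero = math.exp(-rate)               # probability that a single Poisson(5) draw is 0
p_no_zeros = (1.0 - p_zero) ** n_obs   # probability that none of the 100 draws is 0
print(round(p_zero, 4), round(p_no_zeros, 2))
```

With roughly half of such samples containing no zeros at all, there is often no information with which to estimate a zero-inflation parameter.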

As the table of simulation results shows, when the variance was more than twice the mean ($\mathrm{Var}(Y_i) > 2 \times E[Y_i]$), the Poisson model rejected more than 22% of the time. When the over-dispersion was more extreme (factor of 8 and 14), the Type I error rate was 47% and 58%, respectively. The severe lack of robustness of the Poisson model in this setting is a serious concern.

Estimated probability (and 99% CI) of rejecting the null hypothesis when there is no true difference between groups for a variety of statistical models and underlying distributions (results that do not include the alpha level of 0.05 are bolded)

Rows give the true distribution; columns give the analysis model fit.

| True distribution | Poisson | ODP | NB | ZIP |
|---|---|---|---|---|
| Poisson (Var = 5) | .053 (.041,.064) | .054 (.042,.066) | .047 (.036,.058) | .055* (.043,.067) |
| NB (Var = 13) | **.225** (.204,.247) | .049 (.038,.060) | .049 (.038,.060) | .050 (.039,.061) |
| NB (Var = 40) | **.467** (.441,.493) | .047 (.036,.058) | .044 (.033,.055) | .046 (.036,.057) |
| NB (Var = 70) | **.584** (.558,.609) | .052 (.041,.063) | .048 (.037,.059) | .062 (.049,.074) |
| ZIP (Var = 8) | **.179** (.159,.199) | .058 (.046,.070) | **.031** (.022,.040) | .051 (.040,.063) |

All distributions except ZIP have $E[Y_i] = 5$; for the ZIP, $E[Y_i] = 0.8 \times 5 = 4$.

ODP (over-dispersed Poisson); NB (negative binomial); ZIP (zero-inflated Poisson)

* For the true distribution under the Poisson, the ZIP model failed to converge for n = 672 of the simulations.
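The size of the Type-I error inflation for the Poisson model can be anticipated with a rough large-sample argument, which is our own approximation and not part of the paper's analysis. If the true variance is $\phi$ times the mean, the model-based standard error is too small by a factor of about $\sqrt{\phi}$, so a nominal 0.05 test rejects with probability near $2(1 - \Phi(1.96 / \sqrt{\phi}))$. The sketch below evaluates this for the three negative binomial scenarios.

```python
import math

def approx_type1_error(dispersion_factor, z_crit=1.959964):
    """Approximate rejection rate of a nominal 0.05 Poisson Wald test
    when the true variance is dispersion_factor times the mean."""
    effective_z = z_crit / math.sqrt(dispersion_factor)
    return math.erfc(effective_z / math.sqrt(2.0))  # two-sided normal tail area

# Variance-to-mean ratios for the NB scenarios: 13/5, 40/5 and 70/5.
approx = {factor: approx_type1_error(factor) for factor in (2.6, 8.0, 14.0)}
for factor, rate in approx.items():
    print(factor, round(rate, 3))
```

The approximation gives rates of roughly 0.22, 0.49 and 0.60, in the same range as the simulated values of .225, .467 and .584; the remaining differences are to be expected from a normal approximation applied to skewed data with 100 subjects per group.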

ASAP study

Of 341 subjects enrolled in the clinical trial, 169 subjects were randomized to the control group and the other 172 into the intervention group. The mean age of the subjects was 44.3 (SD = 10.7). Twenty-nine percent were women, 45% were Black, 39% White, 9% Hispanic, and 7% Other. Sixty-three percent were unemployed during the past three months and 25% of the subjects were homeless at one point during the past three months. Four percent of the subjects met criteria for current (past year) alcohol abuse and 77% were alcohol dependent.

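We analyze the 3-month follow-up data for which 271 subjects were observed (141 control, 130 treatment), for an overall response rate of 79%. The table that follows gives the distribution of the drinking outcome by timepoint and randomization group.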

Distribution of drinking outcome by timepoint and randomization group

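C = control group; T = treatment group.

| | Baseline, C (n = 169) | Baseline, T (n = 172) | 3 months, C (n = 141) | 3 months, T (n = 130) |
|---|---|---|---|---|
| Minimum | 0.17 | 0 | 0 | 0 |
| 25th percentile | 1.14 | 1.32 | 0.17 | 0.13 |
| Median | 3.47 | 3.85 | 1.8 | 1.6 |
| 75th percentile | 8.23 | 9.12 | 6.1 | 5.7 |
| Maximum | 61.77 | 60 | 48.6 | 38.43 |
| Mean (SD) | 6.95 (9.58) | 6.68 (8.44) | 4.98 (8.47) | 4.36 (6.47) |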
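The degree of over-dispersion in these data can be read off the table by comparing each variance (the squared SD) with its mean; under a Poisson model the ratio would be close to 1. The short calculation below uses only the means and SDs reported in the table; the dictionary layout is our own.

```python
# (mean, SD) of drinks per day, copied from the summary table.
summary = {
    ("baseline", "control"): (6.95, 9.58),
    ("baseline", "treatment"): (6.68, 8.44),
    ("3 months", "control"): (4.98, 8.47),
    ("3 months", "treatment"): (4.36, 6.47),
}

# Variance-to-mean ratio for each timepoint and group.
ratios = {key: sd ** 2 / mean for key, (mean, sd) in summary.items()}
for key, ratio in ratios.items():
    print(key, round(ratio, 1))
```

Every ratio is far above 1, with the variance roughly 10 to 14 times the mean at 3 months, which is comparable to the most extreme over-dispersion scenarios considered in the simulation study.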

The table that follows reports the p-value for the randomization group effect under each model.

p-values for the ASAP randomization group effect at 3 months for a variety of count models

| Model | p-value |
|---|---|
| Poisson | .018 |
| Over-dispersed Poisson | .489 |
| Negative binomial | .458 |
| Zero-inflated Poisson | .542 |
| Zero-inflated negative binomial | .489 |
| t-test | .495 |
| Wilcoxon | .805 |
| Permutation | .746 |

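The figures that follow display the observed and predicted values, and the observed minus expected values, for the fitted models.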

Observed and predicted values from the ASAP study at 3 months for control and treatment groups for each of four models: Wilcoxon, Poisson, negative binomial and zero-inflated Poisson

Observed minus expected values from the ASAP study at 3 months as a function of count for the Poisson, negative binomial and zero-inflated Poisson

In this setting, there was little indication from the observed plots that there were significant group differences. As seen in the simulation studies, the Poisson may not have preserved the appropriate Type I error rate due to the extremely large values of drinking for some subjects. The Appendix includes the Stata commands to fit these models and the output, along with the code to generate observed and predicted plots using the prcounts routine.

Discussion and conclusion

A number of models have been proposed for the analysis of count data, and these models are now available in general purpose statistical packages. We have described these methods in the context of modeling reports of alcohol consumption, where a large proportion of respondents report no drinking, and a small number of respondents typically account for an extreme amount of drinking.

For the analysis of the ASAP study, we found that the standard Poisson had an extremely poor fit, and yielded a statistically significant p-value (in contrast to all of the other models, which had highly non-significant results). The unrealistic assumption that the expected rate of drinking is the same for all subjects may partially account for the poor fit of the Poisson distribution. We caution against use of the Poisson for this analysis. The negative binomial fit particularly well, and we saw no evidence for zero-inflation.

In settings where there are excess zeros, zero-inflation models are attractive. One advantage of these models is that they can estimate the probability of being a zero as a function of covariates, as well as allowing the rate parameter to be a function of covariates. In an alcohol study, the intervention may be hypothesized to affect the abstinence proportion as well as the rate parameter for drinkers. Ad-hoc methods in this setting might involve estimating the proportion of drinkers at follow-up, and in a separate model, estimating the amount of drinking amongst the subset of subjects who reported any drinking. A more principled approach involves the simultaneous estimation of the zero-inflation factor (testing $p_1 = p_2$) and the rate parameter (testing $\lambda_1 = \lambda_2$). Slymen and colleagues

The results of the simulation studies and the secondary analyses of the ASAP study demonstrated the importance of appropriately modeling count outcomes. We caution against the use of the standard Poisson model when the mean and variance are not equal. Extensions of the Poisson (incorporating an over-dispersion parameter or use of the negative binomial distribution and/or zero-inflated models) are now available in general purpose statistical software, and address many of the shortcomings of the overly simplistic Poisson model.

As always, analysts are obliged to look at their data and utilize models that provide an appropriate fit in their situation. In particular, for models of alcohol consumption, attention should be paid to the functional form of the outcome to ensure that underlying assumptions of the methods utilized are met.

Authors' contributions

NH conceived of the project and provided overall guidance, in addition to reviewing and interpreting analyses, and drafting the manuscript. EK participated in the drafting of the manuscript, and carried out analyses and simulations. RS led the ASAP study and participated in the drafting of the manuscript. All authors read and approved the final version of the manuscript.

Appendix. Stata code and results for count models.

Acknowledgements

This research was supported in part by the National Institute on Alcohol Abuse and Alcoholism R01-AA12617, the Smith College Summer Research Program and the Howard Hughes Medical Institute. Thanks to Jessica Richardson for editorial assistance, Emily Shapiro and Min Zheng for assistance with simulations and Joseph Hilbe and Jeffrey Samet for helpful comments on an earlier draft.