Abstract
Background
Healthrelated quality of life (HRQL) has become an increasingly important outcome parameter in clinical trials and epidemiological research. HRQL scores are typically bounded at both ends of the scale and often highly skewed. Several regression techniques have been proposed to model such data in crosssectional studies, however, methods applicable in longitudinal research are less well researched. This study examined the use of beta regression models for analyzing longitudinal HRQL data using two empirical examples with distributional features typically encountered in practice.
Methods
We used SF6D utility data from a German older age cohort study and strokespecific HRQL data from a randomized controlled trial. We described the conceptual differences between mixed and marginal beta regression models and compared both models to the commonly used linear mixed model in terms of overall fit and predictive accuracy.
Results
At any measurement time, the beta distribution fitted the SF6D utility data and strokespecific HRQL data better than the normal distribution. The mixed beta model showed better likelihoodbased fit statistics than the linear mixed model and respected the boundedness of the outcome variable. However, it tended to underestimate the true mean at the upper part of the distribution. Adjusted group means from marginal beta model and linear mixed model were nearly identical but differences could be observed with respect to standard errors.
Conclusions
Understanding the conceptual differences between mixed and marginal beta regression models is important for their proper use in the analysis of longitudinal HRQL data. Beta regression fits the typical distribution of HRQL data better than linear mixed models, however, if focus is on estimating group mean scores rather than making individual predictions, the two methods might not differ substantially.
Keywords:
Healthrelated quality of life; Beta regression; Longitudinal study; Mixed model; Marginal modelBackground
Healthrelated quality of life (HRQL) has become an increasingly important outcome parameter in clinical trials and epidemiological research to support clinical and policy decision making or to monitor population health [1,2]. Treatment effects on HRQL and population values are commonly estimated using regression techniques, however, HRQL scores typically exhibit specific properties that make the use of ordinary least square (OLS) regression at least doubtful for such kind of data [3,4]. In particular, they are continuous variables bounded at both ends of the distribution (e.g. at 0 and 1) and are often highly skewed. As a consequence, several alternative regression methods have been suggested such as censored least absolute deviation models [5], Tobit models [4,5] and median regression [5,6]. A regression technique that is gaining increasing attention in the analysis of doubly bounded outcome measures is the beta regression as introduced by Ferrari and CribariNeto [7]. Beta regression was first mainly used in economic and psychological applications [8,9], but has recently also been proposed to analyze generic HRQL [3,10]. In these contributions, it was shown that beta regression can have substantial advantages over OLS regression, especially in estimating covariate effects when the true incremental effect is large [3]. However, they also revealed that beta regression may perform poorly in handling observations on the boundary points [10].
While several regression models have been suggested to address the idiosyncrasies of HRQL data in a crosssectional design, research on longitudinal regression models is less well developed. This is both surprising and unfortunate given that change in HRQL over time is often the primary interest in applied work. Currently, longitudinal quality of life data are mostly analyzed using change scores [11], repeated measures ANCOVA [12,13], and linear mixed models (LMM) [14,15].
Beta regression has recently been expanded to deal with longitudinal data by introducing a betadistributed generalized linear mixed model (GLMM) [16,17]. However, a moreindepthcomparison with traditionally employed methods, especially the LMM, is still lacking. Also, to date no study has examined the applicability of longitudinal beta regression models to analyze HRQL scores over time.
An elaborate comparison between beta regression and linear regression in a longitudinal design is not only important with respect to model fit and predictive ability. It is also important to realize that in longitudinal models with nonidentity link such as beta regression, the interpretation of parameter coefficients depends on how the correlation between observations is accounted for [18]. Basically, two different approaches can be distinguished: A subjectspecific approach as implemented by the GLMM, and a populationaveraged approach using marginal models [19].
The purpose of this study was to examine the use of beta regression methods to analyze longitudinal HRQL data. We describe the conceptual differences between mixed effect models and marginal models researchers should be aware of when extending beta regression to the longitudinal case. Using two empirical datasets with both generic and diseasespecific HRQL scores, we compare estimated effects and predictive accuracy of the beta regression methods to those of the commonly used LMM.
Methods
Empirical data sets
We fitted longitudinal regression models to two empirical data sets representing different distributional features typically encountered when analyzing HRQL scores in practice. Data in the first example come from a cohort study, while data in the second example were collected alongside a randomized controlled trial (RCT). In both cases, we examined HRQL scores over time with respect to two groups of individuals.
In the first application, we examined how the generic SF6D health utility index changed over a 7year period in an older general population sample. Data come from the populationbased KORA S4/F4 cohort study conducted in the region of Augsburg in Southern Germany. The sample used in our analyses involved 1225 subjects aged 60 years and above recruited for the S4 survey in 1999. In 2006–2008, 812 of these 1225 subjects took part in the followup study F4. A detailed description of study design, sampling method and data collection can be found elsewhere [10,20]. Besides other questions, individuals were asked at both time points if they have diabetes mellitus. Also, subjects answered the 12item ShortForm Health Survey (SF12), from which the SF6D utility index was derived [21]. Health utilities can be used to calculate qualityadjusted lifeyears (QALYs) and usually range between 0 (health state similar to death) and 1 (‘perfect health’). However, due to the specific health state classification behind the SF6D, possible values only lie between 0.345 and 1 [21]. Focus in this analyses was on the question how diabetes mellitus is associated with HRQL over time.
The second application investigated diseasespecific HRQL in stroke patients over time, measured by the Stroke Impact Scale (SIS) [22]. Data were collected alongside an RCT evaluating a patient education programme for stroke survivors in neurological rehabilitation based on the conceptual framework of the International Classification of Functioning, Disability and Health (ICF). The study sample comprised 212 patients in the age range of 22 to 83 years recruited between 2008 and 2009 in seven rehabilitation clinics in Germany. Details on clinical characteristics and data collection methods can be found elsewhere [23]. Patients answered selfreport questionnaires before and after the education programme (median difference 10 days) as well as at a postal followup conducted 6 months later. At postintervention and followup, questionnaire data were available for 183, and 171 patients, respectively. Patients in the sample were assigned to two different rehabilitation phases (C and D), following the sixphase model of the German Federal Rehabilitation Council. The distinction between phase C and D contrasts patients still dependent on a high degree of nursing and medical care to those having mostly gained independence in the activities of daily life [24]. Since regaining mobility is a major goal of poststroke rehabilitation, the objective of the analyses was to analyze SIS mobility subscale (SISMob) scores over time. SISMob scores range from 0 to 100, with higher values indicating better HRQL. We divided scores by 100 in order to make them fit to the support of the beta distribution. In this analysis, we focused on the comparison of time trends between patients in phase D and those in phase C but ignored whether patients were assigned to the intervention or to the control group.
Both studies were approved by the local ethic committee.
Beta regression
Beta regression for crosssectional data
The beta distribution is a continuous probability distribution defined over the unit interval with density function
where Γ (.) denotes the gamma function [7]. The parameter μ denotes the expected value of Y, i.e. E(Y) = μ. The parameter ϕ fulfils the definition of a precision parameter since – for fixed μ – the greater the value of ϕ, the smaller the variance of the dependent variable. More specifically,
The beta distribution is part of the exponential family, but not of canonical form [18,25]. In beta regression models, the mean parameter μ ∈ (0,1) of the beta distribution is expressed as a function of covariates, while the precision parameter ϕ ∈ ℝ^{+} is treated as nuisance. To map the linear predictor into the space of observed values on the unit interval, the logit link
is commonly used as the link of choice where denotes a vector of covariates, and β refers to the vector of regression coefficients, i = 1,…,N[8,26]. The beta distribution is defined on the open unit interval only. If ones and zeros are observed, these values need to be transformed in order to fall into the open unit interval (0,1). This can be achieved by either minimally compressing the entire range of observed values, or by only transforming the boundary points to slightly smaller or greater values, respectively. The most frequently applied transformation is given by
where Y^{*} is the transformed and Y is the untransformed dependent variable [8,17]. Alternatively, it has been suggested to add a small amount ε, e.g. 0.005 or 0.01 to the lower bound, and to subtract the same amount from the upper bound [8,16]. A reasonable choice involves the following tradeoff: On the one hand, large values for ε shrink the data more toward 0.5 and may bias the estimates toward no effect; on the other hand, moving zero and onevalued observations an insufficient distance away from the boundary may lead to instable estimates because this can cause the likelihood to have a local or even global mode in this area [16,27]. Hunger et al. also observed that when the resulting values are too close to the boundary points, precision of the estimates may appreciably decrease [10]. Therefore, it has been recommended to use sensitivity analyses in order to check whether different endpoint handling methods affect parameter estimates [8,16].
Beta GLMM for longitudinal data
In longitudinal analyses or in the case that subjects are clustered within sampling units or geographical entities, measurements within the same person or unit are typically correlated, violating the assumption of conditionally independent observations in regression models [18]. One possibility to account for these dependencies is to add random cluster or subject effects into the linear predictor. Without loss of generalizability, consider the case of longitudinal designs where j = 1,…,n_{i} observations are nested within i = 1,…,N subjects. Let b_{i} denote a vector of subjectspecific random effects for individual i.
In the linear regression model, the inclusion of random effects leads to the LMM given by
Similarly, adding random effects to the beta regression model in (1) yields the beta GLMM [16,17] given by
In both cases, is a vector of covariates, and G denotes the positive definite covariance matrix of the random effects. Note that although the assumption of normality for the random effects is common and statistically convenient, other distribution assumptions are possible in principle [17]. In a longitudinal design, b_{i} typically is a scalar (for random intercept only models) or a bivariate vector (for models with random intercept and random slope). In the first case, z_{ij} = 1, while in the second case, where t_{ij} is the time of measurement j for subject i. Models with random slope allow the linear effect of time to vary across subjects. Model parameters are estimated by maximizing the marginal likelihood which is obtained by integrating out the unobserved random effects b_{i} from the likelihood function [16].
Although the inclusion of random effects in the beta GLMM is conceptually the same as in the LMM, there are important implications with regard to the interpretation of regression parameters: In the LMM, the fixed effects have both a subjectspecific (together with the random effects b_{i}) and a populationaverage interpretation. This follows directly from (3) because
In the beta GLMM, however, the regression parameters only have a subjectspecific interpretation and no longer describe the effect of the respective variable on the population in general [18]. This is due to the nonlinear transformation of the mean response (i.e. the logit link) since it can be deduced from (4) that
This individualspecific interpretation means, for example, that the parameter coefficient of the covariate ‘diabetes’ in the first empirical application refers to the difference in mean SF6D scores on the logit scale between an individual with diabetes and the same individual supposed not to have diabetes [28].
Beta GEE
If a populationaveraged interpretation of the regression coefficients is desired, for example the mean difference between the groups of individuals with and without diabetes, an alternative to the beta GLMM is the marginal model. The term ‘marginal’ means that the mean response modeled is conditional only on covariates and not on other responses or random effects [18].
Marginal models do not specify the full joint distribution of the data, but only specify a mean function, a variance function, and a correlation structure between observations within one individual. Mean and variance function (in some models together with an additional scaling factor ϕ) are often suggested by the canonical form of the exponential family [29]. For the beta distribution, it is convenient to specify following (1), and using the variance function .
Note that this specification of mean and variance structure is also commonly used in GEE models to analyze binary data. The only difference is the additional scaling parameter ϕ which is usually not used in a GEE for binary data. The inclusion of the scaling parameter in the beta GEE has no impact on the estimation of the mean model parameters, however, it has the advantage that large estimates for ϕ can indicate heterogeneity in the data that is not accounted for by the model [3]. Similarities also exist to the inclusion of an additional dispersion parameter in quasibinomial models for crosssectional data and such methods have already been used in literature to model HRQL scores [3032]. For the working correlation matrix, several choices are possible. Among them, compound symmetry, autoregressive structure, and unstructured correlation are most commonly used in longitudinal analyses [18]. Variance function and correlation matrix can then be combined into a ‘working’ covariance matrix V_{i}. Parameter estimates in the marginal model are obtained by solving the Generalized Estimating Equations (GEE) introduced by Liang and Zeger [33,34].
In general, there are no closedform solutions, so that iterative algorithms are used. Specific types of GEEs can further be distinguished according to how the covariance parameters are estimated. While the early contributions on GEEs mainly used the methods of moments, other approaches using pseudolikelihood techniques and quadratic estimation equations methods have also been suggested [35]. The latter approach, for example, is implemented in the SAS GLIMMIX procedure. Parameter coefficients in the GEE are estimated consistently even if the covariance structure is misspecified, however, a careful choice of the working correlation may improve efficiency of the estimates. Valid standard errors for can be calculated by using the so called sandwich estimator [18]. Since the full likelihood of the data is not specified in GEE models, likelihoodbased criteria to assess model fit are not available.
Missing data
Missing data are an important issue in many quality of life studies. Whether inference remains valid in the case of incomplete data depends on the underlying missing data mechanism and the statistical methods used. Estimates from the beta GLMM remain valid if the data are missing at random (MAR), i.e. that given the observed data, the probability of a missing observation does not depend on the unobserved data [36,37]. However, this requires maximum likelihood estimation based on adaptive Gaussian quadrature to be used; other estimation methods such as penalized quasi likelihood (PQL) can lead to biased estimates of the covariate effects [18]. In contrast, inferences with the beta GEE are only valid under the stronger assumption that data are missing completely at random (MCAR), i.e. that missingness is independent of both, unobserved and observed data [33,38]. Extensions of the GEE have been proposed to allow the data to be MAR, however, these methods either focus on monotone missing patterns or require the correct specification of the working correlation matrix [39,40].
Model comparison and residuals: current state of research
Model comparison and model checking in the GLMM and GEE framework is not straightforward and suitable methods are sparse [41]. In general, if GLMMs are estimated using a full likelihood approach, models can be compared using information criteria such as Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC) [16]. AIC and BIC are measures of the likelihood, penalized for the complexity of the model. Zimprich suggested comparing beta GLMM and LMM on the basis of a pseudoR^{2} which is motivated from the pseudoR^{2} suggested by Cox and Snell for model comparison in logistic regression models [17,42]. It is defined as
where L_{Intercept} is the likelihood of a simple interceptonly linear model fit to the data, L_{Full} is the likelihood of the considered beta GLMM or LMM, and N is the total number of observations. The pseudoR^{2} compares the likelihood of the observed data in the beta GLMM and LMM with that of a simple interceptonly linear regression model. Thus, it reflects the improvement each model has over a model without explanatory variables and can be interpreted as the geometric mean squared improvement per observation [42].
There are two types of residuals in the GLMM. Depending on the level on which fitted values are produced, one can distinguish average Pearson residuals (related to the unconditional mean ) and individualspecific residuals (relating to the conditional means ) where g denotes the link function of the regression model (i.e. the logit in the beta GLMM). Diagnostic plots typically use individualspecific residuals. Several different residuals have been proposed for use in beta regression with independent observations, namely standardized residuals, deviance residuals, weighted residuals, and standardized weighted residuals [7,43]. However, none of these residuals has yet been extended to be applicable in the mixed regression context.
Basu and Manca used a beta regression model to analyze QALY data and examined raw scale residuals to evaluate goodness of fit [3]. In particular, they calculated mean residuals across deciles of the linear predictor in order to identify systematic patterns of misfit in the predictions.
Model specification
In both empirical examples, we compared the performance of LMM, beta GLMM, and beta GEE model. Response variables were the SF6D score and the SIS mobility subscale (SISMob) score, respectively.
We transformed the zero and onevalued responses in our empirical datasets to 0.005 and 0.995, respectively [8,10,16]. This is because transformation (2) depends on the number of observations, and its use in the large KORA data would move the onevalued observations to 0.9997 which is extremely close to the upper bound. However, to ensure that estimates are not affected by this choice, we also used other values for ε between 0.002 and 0.01 to move observations away from the boundary points.
Covariates in the regression models were age at baseline, sex, and time point. In the KORA data, we additionally included diabetes and its interaction with time. Thus, the respective beta GLMM was given by
where timeII_{ij} is the dummy variable for the second measurement time.
In the ICF stroke data, we additionally included rehabilitation phase and its interaction with time:
where timeII_{ij} and timeIII_{ij} are the dummy variables for the second and third measurement times, respectively. The same (fixed effects) covariate structure was specified for the beta GEE models.
Since we only had two to three time points, our mixed models only contained a random intercept but no random slope component. In accordance, we chose a compound symmetry correlation structure in the GEE models, assuming that all measurements on the same unit are equally correlated. In the case of two measurements only, this structure is identical to more complicated structures such as autoregressive correlation.
Taking an individualspecific perspective, we compared model fit of LMM and beta GLMM using AIC, BIC, and pseudoR^{2}. However, in contrast to Zimprich, we did not calculate the pseudoR^{2} by comparing the likelihood of the models specified above to that of an interceptonly linear model, but to the likelihood of a simple LMM with random intercept only. This is because for longitudinal data, the correlation between observations within the same individual should also be accounted for in the basic model used for comparison.
To further examine whether the two models provide a good fit to all parts of the data, we calculated mean raw residuals on the individualspecific level across deciles of the corresponding linear predictor [3]. If these means are not randomly scattered around 0, this indicates a systematic misfit of the model.
Taking the population average perspective, we compared the unconditional predictions from the LMM with the corresponding predictions from the marginal beta GEE model. Fore each time point we calculated adjusted mean HRQL scores stratified by diabetes (in the KORA data) or rehabilitation phase (in the ICF stroke data).
If an individual had a missing quality of life score or missing covariates at a certain time point, we deleted the respective observation but did not exclude the entire individual from the analysis.
All models were estimated using the GLIMMIX procedure in SAS. We approximated the marginal likelihood in the beta GLMM through Gaussian quadrature which is implemented in the SAS GLIMMIX procedure (from version 9.2) by the method = quad option. The code used to fit LMM, beta GLMM and beta GEE to the KORA data is provided in Additional file 1.
Additional file 1. SAS Code used to fit LMM, beta GLMM and beta GEE to the KORA data.
Format: PDF Size: 24KB Download file
This file can be viewed with: Adobe Acrobat Reader
Results
In the KORA data, 91 observations were deleted due to missing values in the response variable. One additional observation was removed due to missing information on the diabetes status. This reduced the final sample size from 2037 to 1945. In the stroke data, the observations from 15 participants were deleted because they had no information on the rehabilitation phase. Nine further observations were removed due to missing values in the response variable. This reduced the final size from 566 to 517. In the KORA data, mean age at baseline was 66.2 years (SD 4.3), and 592 (50.9%) participants were male. The percentage of individuals with diabetes was 8.9% at baseline and 16.2% at followup. Mean age in the ICF stroke data was 57.2 years (SD 12.8), and 115 (54.3%) individuals were male. About two third (67.5%) of the participants were assigned to rehabilitation phase D, and one third to phase C.
When single density curves were fitted to the univariate data, in both examples, the beta distribution reproduced the shape of the observed HRQL score (SF6D and SISMob) distributions clearly better than the normal distribution (Figure 1). It accommodated the leftskew of the observed data and respected the boundary points while large parts of the fitted normal density function were lying outside the theoretically possible range of HRQL values, especially in the ICF stroke data.
Figure 1. Distribution of SF6D utility scores by time in the KORA data (upper part) and distribution of the SIS Mobility scores by time in the ICF stroke data (lower part). The curves represent estimated single density functions of the beta (solid) and the normal (dashed) distribution fitted to the univariate data.
The parameter estimates of the regression analyses fitted to the KORA data, are shown in Table 1. Comparing LMM and beta GLMM, one observes that age, sex and diabetes had a significant effect on the mean SF6D utility score in both models, however, with an AIC of −2723 and a BIC of −2682, the beta GLMM fitted the observed data better than the LMM (AIC −2441; BIC −2401). This is also reflected by the pseudoR^{2} statistics. (0.054 in the LMM, 0.181 in the beta GLMM).
Table 1. Parameter estimates of LMM, beta GLMM and beta GEE in the KORA data (N = 1945)
Interpretation of parameter estimates in the beta regression model is similar to logistic regression where exponentiated coefficients can be interpreted in terms of odds ratios. For example, the parameter coefficient of male sex in the beta GLMM means that for a man, the ratio between the expected quality of life score μ and the difference to perfect health (1μ) is about exp(0.3483) = 1.42 times higher than for a woman with the same set of covariates (and random effect).
The interaction between diabetes and time suggests that the decline in HRQL over time was slightly larger in individuals with diabetes, however, the interaction term was only borderline significant.
Figure 2 shows the mean residuals across deciles of the linear predictors for the KORA data. One observes a strong correlation between residuals and predicted means for both LMM and beta GLMM, suggesting that both models overestimated the mean at the lower, and underestimated the mean at the upper part of the distribution. Probably, this results from the fact that generic HRQL scores are usually highly dispersed and that we only included very few covariates in our model.
Figure 2. Mean residuals across deciles of linear predictors for beta GLMM and LMM in the KORA data.
Parameter coefficients from beta GLMM and beta GEE are difficult to compare, however, one recognizes that parameters in the GEE are estimated with less precision. The adjusted mean SF6D scores from LMM and beta GEE model together with their 95% confidence intervals are shown in Table 2. It shows that both models produce very similar estimates but that for individuals with diabetes, standard errors from the LMM were slightly smaller than those from the beta GEE.
Table 2. Adjusted marginal mean SF6D scores with 95% confidence intervals for time and diabetes in the KORA data (N = 1945)
The regression models fitted to the ICF stroke data are shown in Table 3. Likewise, the table shows that the beta GLMM fitted the data better than the corresponding LMM. It achieved better AIC and BIC values and had a higher pseudoR^{2}. Furthermore, the beta GLMM respected the restricted range of the SISMob scores, whereas 6 individual predictions based on the LMM estimates were lying outside the theoretically possible range. The significant interaction term between time and phase indicates that individuals in phase C showed greater improvement over time than individuals in phase D.
Table 3. Parameter estimates of LMM, beta GLMM and beta GEE in the ICF stroke data (N = 517)
Looking at the mean residuals across deciles in Figure 3, one recognizes that, compared to the LMM, the beta GLMM underestimated the mean at the upper part of the distribution.
Figure 3. Mean residuals across deciles of linear predictors for beta GLMM and LMM in the ICF stroke data.
The adjusted mean SISMob scores from LMM and beta GEE are shown in Table 4, suggesting that, again, both methods lead to nearly identical mean estimates. For patients in phase C, standard errors from the LMM were smaller than those from the beta GEE model, however, the opposite was true for patients in phase D.
Table 4. Adjusted marginal mean SISMob scores with 95% confidence intervals for time and rehabilitation phase in the ICF stroke data (N = 517)
The use of different values ε to move observations away from the boundary points in the beta GLMM did not appreciably affect parameter estimates; solely transformation (2) decreased the precision of estimates by about 20%.
Discussion
Beta regression is a promising method for modeling HRQL data in cross sectional research [3,10], and recent methodological work has extended the beta regression model to deal with dependent observations [8,16]. In this paper, we examined the potential of beta regression methods in the analysis of longitudinal HRQL data. We highlighted the need to distinguish between mixed and marginal models, namely beta GLMM and beta GEE, when beta regression is extended to the longitudinal case. Using two empirical applications with data distributions typically encountered in practice, we compared the performance of the beta regression methods to that of the commonly used LMM.
Data collected in longitudinal designs typically have correlated observations, violating a basic assumption of ordinary regression methods. Longitudinal analyses require regression techniques that account for this dependence. In general, the correlation among repeated measures can be modeled implicitly, i.e. by including random effects as in the mixed model, or explicitly, i.e. by specifying a covariance structure between observations as in the marginal model. Through the inclusion of random effects, mixed models assume natural heterogeneity across individuals in some regression coefficients [18]. Random effects can also be motivated as an omitted subjectvarying covariate, thus they give a potential explanation for the sources of correlation [19]. In contrast, marginal models treat the dependence between observations as nuisance and account for its effects by specifying a working correlation.
For linear longitudinal models, regression coefficients have the same interpretation regardless of how the correlation is modeled. For regression models with nonidentity link such as beta regression, however, interpretation depends on whether a mixed model (i.e. a GLMM) or a marginal model is fitted. In the GLMM, estimated effects are adjusted for individual difference and thus only refer to withinindividual change. In the marginal model, in contrast, the mean response is conditional only on covariates and not on other responses or random effects [18].
The choice between the two depends mainly on the specific scientific question of interest. GLMMs are most useful for making inferences about individuals and tracking individual trajectories, while the marginal model is more useful for inferences about population or subpopulation averages. No model is a priori more suitable for the analysis of HRQL data than the other. It has been argued that mixed models may be more appropriate in epidemiological research as they allow a better understanding of the underlying mechanisms [28]. Also, they have a close relationship to matchedpair design methods often used in epidemiologic and public health research [19]. Due to the individualspecific interpretation of regression coefficients, the GLMM is also most meaningful for timevarying covariates. In contrast, the interpretation of timeinvariant or betweensubject covariates in the GLMM is less intuitive or even misleading since they also only allow a withinsubject interpretation which is difficult to imagine. For example, if a beta GLMM is used to estimate treatment effects on HRQL in clinical trials, the respective treatment arm coefficient is interpreted as the difference in outcomes between two individuals with the same covariate values and the same random effects b_{i}, differing only in their treatment arm. It does not describe the average treatment effect which is usually of major interest in intervention studies, especially if preferencebased HRQL measures are used in economic evaluation studies [4]. Therefore, the marginal model may be more suitable in many applications in public health research. Also, it has been argued that many epidemiologic methods such as stratified methods are essentially populationaveraged methods [19]. For our empirical applications this means that the change in SF6D index scores associated with diabetes in the KORA data may be better described by a beta GLMM, while the difference in mean SIS scores between rehabilitation phases in the ICF stroke data may be better assessed using a beta GEE. Differences between beta GLMM and beta GEE also exist with respect to the handling of missing data: In practice, the beta GLMM may be more convenient since it remains valid under the MAR assumption which is usually more plausible in quality of life studies than the MCAR assumption made by the beta GEE.
A common approach to compare regression models and assess goodness of fit is to consider likelihoodbased statistics which evaluate the probability of the observed data under the model. In both of our empirical examples, beta GLMM had better fit statistics (such as AIC, BIC or pseudoR^{2}) than the commonly used LMM, indicating that the beta distribution better accounted for the bounded support of the observed HRQL scores and their highly skewed distributions. However, an important question is whether better likelihood statistics make the beta GLMM more suitable than the LMM in practice. A similar issue has also been addressed previously: Zimprich used a beta GLMM to analyze longitudinal data on complex choice reaction time and concluded from better likelihoodbased fit statistics that beta GLMM fitted the data much better than a LMM did [17]. However, given a fairly close similarity between parameter estimates, he also raised the question whether apart from these statistical considerations, beta GLMM is worth the effort to apply in practical data analyses. We even go one step further arguing that the likelihood may not be the most relevant criterion when comparing models to analyze HRQL data. Distributional fit and predicted densities may be important in applications with focus on individual density forecasts, such as in the reaction time example. However, when analyzing HRQL data in RCTs or cohort studies, conditional means rather than predictive densities are commonly of major interest [4]. Against this background, more attention should be attached to the question whether the mean structure is appropriately reproduced by the model. Figure 2 showed that the beta GLMM reproduced the observed values at the upper end of the distribution less satisfactorily than the LMM. This may be explained by the fact that beta regression fits both means and variances to the data. Since in the beta distribution the variance is a function of the mean, the estimated mean function may be biased. This phenomenon has already been observed in a crosssectional design and suggests that full likelihoodbased beta regression methods should be used with care when analyzing HRQL [3].
In the marginal perspective, beta GEE produced nearly identical estimates to the LMM, however, differences could be observed with respect to standard errors, especially in the ICF stroke data (Table 4). The larger standard errors of the beta GEE for patients in phase C are probably due to the fact that beta GEE provides robust standard errors using the sandwich formula. The smaller standard errors for patients in phase D, however, indicate that beta regression accounts for heteroscedasticity related to the bounded nature of the response variable [8]. This is because the predicted means of individuals in phase D were rather high, and for outcomes bounded on the unit interval, the variability of scores declines as the mean approaches one.
An important limitation of the beta regression is that it does not contain the boundary points 0 and 1 so that quite arbitrary transformation methods need to be applied. However, our sensitivity analyses support previous research in that parameter estimates are robust to the choice of transformation, provided that the values are moved far enough away from the boundaries.
The two empirical data sets used in this study were chosen to cover different types of studies commonly encountered in HRQL research. In particular, we addressed both diseasespecific and generic HRQL scores and used data both from a cohort study and from a clinical setting. The two illustrative examples tackle clinically relevant research questions that have also been addressed in other studies [4446]. However, since this paper focused on the comparison between different methodological approaches, we did not deal in detail with interpreting results in the healthcare context. For the purpose of this paper we have also made some simplifications, e.g. we did not consider model building but preferred using a rather lean model with only a few covariates. Also, we treated the precision parameter in the beta GLMM as constant instead of modeling it in terms of covariates, although such an approach may have improved model fit [10,17,26].
Another limitation of our study is that our empirical data only provided up to three measurements per individual. Further research is needed to examine the use of beta regression in more complex study designs. Also, we did not consider random slopes which are commonly used to model heterogeneity in the effect of time on the response variable [15]. However, for reasons of model convergence, it is not recommended to fit anything more complex than a single random intercept model to nonnormal data with only a few time points per person. Similarly, we did not consider working correlation structures other than compound symmetry in the beta GEE. However, compound symmetry assumes the same correlation for all observations within a person which we think is reasonable in the case of only a few time points per person. Furthermore, it corresponds to the correlation structure implicitly modeled by the mixed model with single random intercept.
Conclusions
In conclusion, longitudinal beta regression models are a natural candidate to analyze HRQL over time since they account for the bounded range and the skewed distribution of the response variable. However, depending on whether a populationaveraged or a subjectspecific approach is preferred, researchers should distinguish between a mixed (beta GLMM) and a marginal (beta GEE) model. The mixed model may be more appropriate in cohort studies in order to track individual HRQL trajectories, while the marginal model is more suitable to estimate average treatment effects in intervention studies. Although beta regression addresses the specific idiosyncrasies of bounded HRQL data, empirical estimates only slightly differed from those of the commonly applied linear mixed model.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
MH devised the concept for the paper, performed the statistical analysis, interpreted the data and drafted the manuscript. AD was involved in the coordination of the KORA study and commented on drafts of paper. RH was involved in the conception of the KORA study and assisted in writing the manuscript. All authors read and approved the final manuscript.
Acknowledgment
The KORA research platform (KORA, Cooperative Research in the Region of Augsburg) was initiated and financed by the Helmholtz Zentrum München  German Research Center for Environmental Health, which is funded by the German Federal Ministry of Education and Research and by the State of Bavaria. We acknowledge with thanks PD Dr. Alarcos Cieza and Dr. Carla Sabariego from the Institute for Public Health and Health Care Research at the LudwigMaximiliansUniversity Munich for the kind provision of the ICF stroke data. Special thanks go to Nora Fenske from the Department of Statistics of the LudwigMaximiliansUniversity Munich for providing valuable inputs to improve the quality of the paper.
References

Calvert M, Blazeby J, Revicki D, Moher D, Brundage M: Reporting quality of life in clinical trials: a CONSORT extension.

Konig HH, Bernert S, Angermeyer MC, Matschinger H, Martinez M, Vilagut G, Haro JM, de Girolamo G, de Graaf R, Kovess V, Alonso J: Comparison of population health status in six european countries: results of a representative survey using the EQ5D questionnaire.

Basu A, Manca A: Regression estimators for generic healthrelated quality of life and qualityadjusted life years.

Pullenayegum EM, Tarride JE, Xie F, Goeree R, Gerstein HC, O'Reilly D: Analysis of health utility data when some subjects attain the upper bound of 1: are Tobit and CLAD models appropriate?

Austin PC: A comparison of methods for analyzing healthrelated qualityoflife measures.

Li L, Fu AZ: Some methodological issues with the analysis of preferencebased EQ5D index score.

Ferrari SLP, CribariNeto F: Beta Regression or Modeling Rates and Proportions.

Smithson M, Verkuilen J: A better lemon squeezer? Maximumlikelihood regression with betadistributed dependent variables.

Bruche M, GonzálezAguado C: Recovery rates, default probabilities and the credit cycle.

Hunger M, Baumert J, Holle R: Analysis of SF6D index data: is beta regression appropriate?

Koch CG, Khandwala F, Blackstone EH: Healthrelated quality of life after cardiac surgery.

Schaafsma FG, Kurrle SE, Quine S, Lockwood K, Cameron ID: Wearing hip protectors does not reduce healthrelated quality of life in older people.

Barrera M, Atenafu E, Hancock K: Longitudinal healthrelated quality of life outcomes and related factors after pediatric SCT.

Chen H, Cohen P: Using individual growth model to analyze the change in quality of life from adolescence to adulthood.

Verkuilen J, Smithson M: Mixed and mixture regression models for continuous bounded responses using the beta distribution.

Zimprich D: Modeling change in skewed variables using mixed beta regression models.

Fitzmaurice GM, Laird NM, Ware JH: Applied Longitudinal Analysis. 2nd edition. Wiley, New York; 2011.

Hu FB, Goldberg J, Hedeker D, Flay BR, Pentz MA: Comparison of populationaveraged and subjectspecific approaches for analyzing repeated binary outcomes.

Holle R, Happich M, Lowel H, Wichmann HE: KORA–a research platform for population based health research.

Brazier JE, Roberts J: The estimation of a preferencebased measure of health from the SF12.

Duncan PW, Wallace D, Lai SM, Johnson D, Embretson S, Laster LJ: The stroke impact scale version 2.0. evaluation of reliability, validity, and sensitivity to change.

Hunger M, Sabariego C, Stollenwerk B, Cieza A, Leidl R: Validity, reliability and responsiveness of the EQ5D in German stroke patients undergoing rehabilitation.

Rollnik JD, Janosch U: Current trends in the length of stay in neurological early rehabilitation.

Brown LD: Fundamentals of statistical exponential families: with applications in statistical decision theory. Volume 9. Hayward, California: Institute of Mathematical Statistics  Lecture Notes; 1986.

Aitchison J: The statistical analysis of compositional data.

Carriere I, Bouyer J: Choosing marginal or randomeffects models for longitudinal binary responses: application to selfreported disability among older persons.

Gardiner JC, Luo Z, Roman LA: Fixed effects, random effects and GEE: what are the differences?

Hahn EA, Cella D, Dobrez DG, Weiss BD, Du H, Lai JS, Victorson D, Garcia SF: The impact of literacy on healthrelated quality of life measurement and outcomes in cancer outpatients.

Levy AR, Christensen TL, Johnson JA: Utility values for symptomatic nonsevere hypoglycaemia elicited from persons with and without diabetes in Canada and the United Kingdom.

Papke LE, Wooldridge JM: Econometric methods for fractional response variables with an application to 401(k) plan participation rates.

Liang KY, Zeger S: Longitudinal Data Analysis Using Generalized Linear Models.

Zeger SL, Liang KY: Longitudinal data analysis for discrete and continuous outcomes.

Molenberghs G, Verbeke G: Models for Discrete Longitudinal Data. New York: Springer; 2005.

Ibrahim JG, Molenberghs G: Missing data methods in longitudinal studies: a review.

Little RJA, Rubin DB: Statistical analysis with missing data. 2nd edition. New York: Wiley; 2002.

Horton NJ, Lipsitz SR: Review of software to fit generalized estimating equation (gee) regression models.

Lipsitz SR, Molenberghs G, Fitzmaurice GM, Ibrahim J: GEE with Gaussian estimation of the correlations when data are incomplete.

Robins JM, Rotnitzky A, Zhao LP: Analysis of semiparametric regression models for repeated outcomes in the presence of missing data.

Dean CB, Nielsen JD: Generalized linear mixed models: a review and some extensions.

Cox DR, Snell EJ: Analysis of Binary Data. 2nd edition. Boca Raton: Chapman & Hall; 1989.

Espinheira PL, Ferrari SLP, CribariNeto F: On beta regression residuals.

Ferrucci L, Bandinelli S, Guralnik JM, Lamponi M, Bertini C, Falchini M, Baroni A: Recovery of functional status after stroke. a postrehabilitation followup study.

Frank B, Schlote A, Hasenbein U, Wallesch CW: Prognosis and prognostic factors in ADLdependent stroke patients during their first inpatient rehabilitation–a prospective multicentre study.

Johnson JA, Pickard AS, ConnerSpady B: Longitudinal changes in healthrelated quality of life of people with diabetes compared to those without chronic conditions. In Proceedings of the 17th EuroQol Plenary Meeting. Edited by Badia X. Pamplona, Spain; 2010:193206.
Prepublication history
The prepublication history for this paper can be accessed here: