Email updates

Keep up to date with the latest news and content from BMC Medical Research Methodology and BioMed Central.

Open Access Research article

Joint modelling rationale for chained equations

Rachael A Hughes1*, Ian R White2, Shaun R Seaman2, James R Carpenter34, Kate Tilling1 and Jonathan AC Sterne1

Author Affiliations

1 School of Social and Community Medicine, University of Bristol, Bristol, UK

2 MRC Biostatistics Unit, Institute of Public Health, Cambridge, UK

3 London School of Hygiene and Tropical Medicine, London, UK

4 MRC Clinical Trials Unit, London, UK

For all author emails, please log on.

BMC Medical Research Methodology 2014, 14:28  doi:10.1186/1471-2288-14-28


The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2288/14/28


Received:12 September 2013
Accepted:13 February 2014
Published:21 February 2014

© 2014 Hughes et al.; licensee BioMed Central Ltd.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

Chained equations imputation is widely used in medical research. It uses a set of conditional models, so is more flexible than joint modelling imputation for the imputation of different types of variables (e.g. binary, ordinal or unordered categorical). However, chained equations imputation does not correspond to drawing from a joint distribution when the conditional models are incompatible. Concurrently with our work, other authors have shown the equivalence of the two imputation methods in finite samples.

Methods

Taking a different approach, we prove, in finite samples, sufficient conditions for chained equations and joint modelling to yield imputations from the same predictive distribution. Further, we apply this proof in four specific cases and conduct a simulation study which explores the consequences when the conditional models are compatible but the conditions otherwise are not satisfied.

Results

We provide an additional “non-informative margins” condition which, together with compatibility, is sufficient. We show that the non-informative margins condition is not satisfied, despite compatible conditional models, in a situation as simple as two continuous variables and one binary variable. Our simulation study demonstrates that as a consequence of this violation order effects can occur; that is, systematic differences depending upon the ordering of the variables in the chained equations algorithm. However, the order effects appear to be small, especially when associations between variables are weak.

Conclusions

Since chained equations is typically used in medical research for datasets with different types of variables, researchers must be aware that order effects are likely to be ubiquitous, but our results suggest they may be small enough to be negligible.

Keywords:
Chained equations imputation; Gibbs sampling; Joint modelling imputation; Multiple imputation; Multivariate missing data

Background

Multiple imputation [1] has become a popular approach for the analysis of incomplete data, with several mainstream statistical packages now incorporating multiple imputation tools. It involves making several draws of the missing data from their posterior predictive distribution given the observed data and an imputation model. For multivariate, non-monotone missing data there are two main approaches for constructing an imputation model: joint modelling and chained equations. Joint modelling imputation requires the specification of a parametric joint model for the complete data: current implementations impute under the multivariate normal model, the log linear model and the general location model [2]. However for datasets containing different types of variables the current classes of joint models [3-5] may not be appropriate for the joint distribution of the data. The alternative method, chained equations imputation [4,6], is more flexible as it specifies a separate imputation model, typically a univariate regression model, for each incomplete variable and updates the missing data for each variable in turn.

Chained equations imputation has been proposed under several different names including: fully conditional specification, stochastic relaxation, variable-by-variable imputation, regression switching, sequential regressions, ordered pseudo-Gibbs sampler, partially incompatible MCMC and iterated univariate imputation [3]. In addition to handling variables of varying types, the chained equations approach has other flexible features such as incorporating restrictions, logistical and consistency bounds (for example, to handle imputation of gender specific variables or impute only questions that were not intentionally skipped in a questionnaire [4]). van Buuren and Groothuis-Oudshoorn [7] discuss the wide range of medical fields that have used chained equations imputation (e.g. addiction [8], epidemiology [9], infectious diseases [10], genetics [11], cancer [12], obesity and physical activity [13]), and a brief review of available software that have implemented chained equations imputation is given by [14]. Given the popularity of chained equations, among users of varying degrees of expertise, there is now guidance in its practical use (e.g. [14] and [15]).

Despite its widespread use, a known theoretical weakness of the chained equations method is that the implicit joint distribution underlying the separate models may not always exist: that is, the conditional models may be incompatible [4,5,16-18]. In such situations, the results after chained equations imputation may systematically differ according to the order in which the missing variables are updated in the chained equations algorithm. We shall refer to this phenomenon as an “order effect”.

Previous authors [3-5] have stated that chained equations imputation under a set of normal linear regression models, with all other variables as covariates and no interactions, is equivalent to a Gibbs sampler that draws from a multivariate normal distribution. van Buuren [3] also states for a dataset of three partially observed binary variables that chained equations under a set of logistic regression models, with all other variables included as main effects only, is equivalent to a joint modelling imputation under a log linear model with the three-way factor term set to zero. However, none of these papers provides a proof beyond stating that the set of conditional models is compatible [16] and are all derived from the specified joint distribution.

Independently and concurrently with our work, Liu et al. [19] have given sufficient conditions (which include compatibility of the conditionals) under which, as the sample size tends to infinity, the stationary distribution of the Markov chain generated by the chained equations algorithm (assuming that this stationary distribution exists and that the chain converges to it) converges to the posterior predictive distribution of the missing data implied by a joint Bayesian model. That is, under these sufficient conditions, the total variation of the distance between the chained equations stationary distribution and the posterior predictive distribution tends to zero as the sample size tends to infinity. As a corollary, Liu et al show the equivalence of the two imputation methods in finite samples under a condition we have independently identified and named the “non-informative margins” condition.

Our work is complementary to that of Liu et al. Firstly, we have taken a different approach to prove the equivalence of the two imputation methods in finite samples. Additionally, in specific examples, we prove whether the non-informative margins condition is satisfied or not, and in a simulation study we demonstrate the consequences when the conditional models are compatible but do not satisfy the non-informative margins condition.

In this paper, we provide a “non-informative margins” condition that, together with compatibility of the conditionals (and assuming that the Markov chain generated by the chained equations converges to a stationary distribution), guarantees that the imputed values obtained using chained equations (at convergence) are drawn from the posterior predictive distribution of the missing data implied by a Bayesian joint model. We give examples of chained equations algorithms that satisfy the non-informative margins condition when the joint model is the multivariate normal model and the saturated multinomial model, and examples where this condition is not satisfied when the joint model is an unsaturated multinomial model and the general location model. A simulation study considers a simple chained equations algorithm in which the conditional models are compatible but do not satisfy the non-informative margins condition, and shows that it is not equivalent to any joint model procedure.

Methods

Notation

Suppose K random variables <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M1','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M1">View MathML</a> are intended to be observed on N subjects. We use subscripts i and j to index subjects and variables respectively (i = 1,…,N; j = 1,…,K). Let x = (xij) denote an (N × K) matrix, whose i,j element is xij. Column j of matrix x is denoted by <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M2','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M2">View MathML</a>. It is assumed that the rows of matrix x are independent and identically distributed draws from a probability distribution with probability distribution function p (X ∣ θ), where θ is an unknown parameter.

In practice some subjects have missing observations on up to K - 1 variables and we write xj = (xjobs,xjmis) for any j, x = (xobs,xmis) and p (x ∣ θ) = p(xobs,xmis ∣ θ), with superscript obs and mis denoting the observed and missing data respectively. In keeping with the assumptions of joint modelling imputation and chained equations imputation, the missing data mechanism is assumed to be ignorable for Bayesian inference [20] p. 120, so that inferences about θ can be based on the marginal observed data posterior p (θ ∣ xobs).

Joint modelling imputation

Joint modelling imputation requires the specification of a parametric joint model p(xobs,xmis∣ θ) for the complete data and a prior distribution p(θ) for parameter θ. Imputations are independent draws from the posterior predictive distribution of the missing data given the observed data p(xmis∣ xobs) [2] p. 105, which under the ignorability assumption is

<a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M3','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M3">View MathML</a>

Therefore, to draw from this posterior predictive distribution, first draw θ ∼ p (θ ∣ xobs) followed by xmis ∼ p (xmis ∣ xobs,θ) [2] p. 105. When it is difficult to draw from the observed data posterior p(θxobs), Markov chain Monte Carlo methods can be used. For example, the data augmentation algorithm of Tanner and Wong [21] draws missing values from the posterior predictive distribution xmis ∼ p (xmis∣ xobs,θ) and then draws θ from the complete data posterior θ ∼ p (θ ∣ xobs,xmis), where ∗ denotes the last drawn values of θ or xmis. Upon convergence this produces a draw from the joint posterior distribution p(θ,xmis ∣ xobs).

Chained equations imputation

For every incomplete variable the chained equations algorithm requires an imputation model, typically a univariate regression model, and an accompanying prior distribution for the model’s parameter. Let X-j = (X1,…,Xj-1,Xj+1,…,XK)T denote the vector of random variables excluding variable Xj and <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M4','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M4">View MathML</a> the submatrix of x corresponding to variables X-j. We write p (xj ∣ x-j,ψj) for the probability distribution function of the imputation model for variable Xj and p (ψj) for the prior distribution of the unknown parameter ψj.

Chained equations draws the imputations using an iterative algorithm, typically with 10 to 20 iterations [15]. To start off, the missing values of each incomplete variable are replaced by its mean or a random sample of its observed values. Suppose, without loss of generality, that variables X1,…,XR (R ≤ K) are incomplete and variables XR+1,…,XK are fully observed. Given the imputations from the last iteration <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M5','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M5">View MathML</a>, iteration t of the chained equations algorithm consists of the following draws [18]

<a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M6','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M6">View MathML</a>

During each iteration the following two steps are applied to each incomplete variable Xj in turn: <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M7','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M7">View MathML</a> is drawn from the posterior distribution proportional to <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M8','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M8">View MathML</a> and missing values <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M9','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M9">View MathML</a> are drawn from the predictive posterior <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M10','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M10">View MathML</a>. The imputations from the last iteration form the imputed dataset. The whole iterative algorithm is repeated to obtain further imputed datasets.

Equivalence of joint modelling and chained equations imputation

We investigated, in finite samples, sufficient conditions under which a chained equations algorithm with compatible conditional models imputes missing data from the predictive distribution of the missing data implied by the joint model and its accompanying prior. We provide examples of chained equations algorithms (with compatible conditional models) where our identified condition is satisfied and examples where it is not satisfied.

Simulation study

We conducted a simulation study to explore the consequences for chained equations imputation when the conditional models were compatible with the same joint model but the non-informative margins condition of Proposition 1 was not satisfied. In particular, we looked for evidence of “order effects”, where the distribution from which the final imputed values of the variables were drawn differed according to the order in which the variables were updated in the chained equations sampler. If the chained equations algorithm imputes all variables from the predictive distribution of the missing data implied by a specific joint model, then order effects cannot occur [22]. Thus, the existence of order effects implies that the chained equations algorithm is not equivalent to imputing from any joint model.

The simulation study was based on a general location model, discussed in the Theoretical results section below, with one incomplete binary variable Y and two continuous variables W1 and W2, where W1 was also incompletely observed. We compared joint modelling imputation under the general location model, considered as a gold standard, with the chained equations algorithm that imputes the binary variable Y under a logistic regression model and the continuous variable W1 under a normal linear regression model.

We generated 500 datasets, each with a sample size of 100. For each dataset, the rows were independent, identically distributed realizations of the general location model Y ∼ Bernoulli (3/10), W1 ∣ Y ∼ N (10 + βY,9) and W2 ∣ W1,Y ∼ N (9 + 8/9 + 1/9W1 + βY,8 + 8/9). The data model was a simplified version of data that can occur in the medical literature [23]. The simulation study was repeated when β, the regression coefficient for covariate Y, was set to 1 and 3. The analysis of interest was the normal linear regression of W2 on W1 and Y. To ensure that any observed order effects could only be due to the failure of the non-informative margins condition we considered the simplest setting, that of data missing completely at random [20] p. 16, and set the values of Y and W1 to be missing for the first 50 individuals in the dataset. Below we describe the joint modelling imputation procedure and the chained equations algorithm that were separately applied to the same 500 datasets.

We used the data augmentation algorithm (as described under the heading “Joint modelling imputation”) to perform joint modelling imputation under the general location model and the joint prior given in the general location example (see example 4 of the Results), setting hyperparameters τ = ν = 1/2 and κ = 3/2. The number of imputed datasets generated, the burn-in period and the number of iterations between imputed datasets was 100. The analysis model was applied to each dataset separately and the mean of the multiple estimates of β, the coefficient for Y, was calculated.

In the (standard) chained equations algorithm, a logistic regression model for Y given W1 and W2 was first fitted to those rows of the dataset in which Y was observed. Let <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M11','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M11">View MathML</a> denote the maximum likelihood estimate of the parameters of this model and <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M12','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M12">View MathML</a> denote its associated estimated variance-covariance matrix. A draw <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M13','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M13">View MathML</a> was then made from the multivariate normal approximation <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M14','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M14">View MathML</a> and used to impute the missing Y values. The continuous variable W1 was imputed using the linear regression model W1 ∣ Y,W2 ∼ N (λ + ξY + ϕW2,ω) and prior distribution p (λ,ξ,ϕ,ω) ∝ ω-3/2.

To start off the chained equations algorithm the missing values of Y and W1 were replaced with a random sample of their observed values. We augmented the chained equations algorithm such that, within each iteration we fitted the analysis model immediately after updating the binary variable Y and also immediately after updating the continuous variable W1. The simulation study focused on systematic differences between the two resulting estimates of β. Given the imputations from the last iteration <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M15','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M15">View MathML</a>, iteration t of the augmented chained equations algorithm consisted of the following steps:

1. Generate y(t) by imputing values for the missing binary observations, conditioning on <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M16','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M16">View MathML</a> and w2.

2. Linearly regress w2 on <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M17','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M17">View MathML</a> and y(t) and store the estimate for the coefficient of Y, denoted by <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M18','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M18">View MathML</a>.

3. Generate <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M19','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M19">View MathML</a> by imputing values for the missing continuous observations, conditioning on y(t) and w2.

4. Linearly regress w2 on <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M20','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M20">View MathML</a> and y(t) and store the estimate for the coefficient of Y, denoted by <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M21','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M21">View MathML</a>.

The chained equations algorithm was implemented with 10010 iterations. The first 10 iterations were regarded as burn-in and the estimates from these iterations discarded. The remaining 10000 estimates of <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M22','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M22">View MathML</a> were averaged, and likewise for <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M23','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M23">View MathML</a>. We denote these means as <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M24','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M24">View MathML</a> and <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M25','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M25">View MathML</a> and their difference by <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M26','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M26">View MathML</a>. The quantity <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M27','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M27">View MathML</a> can be interpreted as an estimate of the order effect for imputation in one dataset. We estimated the (Monte Carlo) standard error of <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M28','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M28">View MathML</a> using the batch-means method, a method for computing standard errors for correlated output [24] p. 124, and calculated a 95% confidence interval from this.

Linear discriminant analysis is an alternative way to estimate a logistic regression [25,26]. A modified chained equations algorithm using linear discriminant analysis on all individuals with observed Y has been proposed as an alternative way to impute the binary variable Y[27]. Because the linear discriminant likelihood is the joint distribution of Y, W1 and W2, this model has the advantage of recovering some information about ψY in the W margin. We repeated the simulation study using this modified chained equations algorithm.

We assessed the sensitivity of our results by repeating the simulation study using different specifications. For joint modelling imputation we increased the number of imputed datasets generated, the burn-in period and the number of iterations between imputed datasets to 250. For the standard and modified chained equations procedures we (1) increased the burn-in period of the chained equations sampler to 1000 iterations and (2) sampled every 50th iteration thereby reducing serial correlation (with a burn-in period of 10 iterations). To check that our results were not dependent upon our choice of prior distributions we repeated the simulation study with improper imputation procedures; that is, using maximum likelihood estimates of ψj instead of Bayesian draws ψj from its posterior distribution <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M29','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M29">View MathML</a>. Lastly, we also repeated the simulation study with a sample size of 1000 observations.

Results

Theoretical results

Equivalence of joint modelling and chained equations imputation

In this section, we give our key result Proposition 1, which shows that, in finite samples, compatibility of the conditionals and our proposed non-informative margins condition are sufficient for chained equations and joint modelling to yield imputations from the same predictive distribution.

Consider a joint model p (x ∣ θ) and prior p (θ). From here onwards we shall use p (.∣.) to refer to any probability distribution derived from this joint model. In particular, for each j = 1,…,R, p (xj ∣ x-j,θ) is the conditional distribution of xj given x-j and θ implied by the joint model, and p(x-j ∣ θ) is the conditional distribution of x-j given θ. The distribution of x given θ factorizes as

<a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M30','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M30">View MathML</a>

Let ψj and <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M31','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M31">View MathML</a> be functions of θ such that p (xj ∣ x-j,θ) = p(xj ∣ x-j,ψj) and <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M32','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M32">View MathML</a>. That is, the distribution of xj given x-j and the distribution of x-j depend on θ only through the functions ψj and <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M33','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M33">View MathML</a>, respectively, of θ. Let <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M34','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M34">View MathML</a> denote the joint prior for ψj and <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M35','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M35">View MathML</a> implied by p (θ), and let p (ψj) and <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M36','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M36">View MathML</a> denote the corresponding marginal priors.

The chained equations algorithm applies the following two steps for each xj in turn: Step CE1 Draw <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M37','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M37">View MathML</a> from the distribution proportional to <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M38','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M38">View MathML</a>. Step CE2 Draw <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M39','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M39">View MathML</a> from <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M40','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M40">View MathML</a>.

The choice of the parameterizations ψj and <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M41','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M41">View MathML</a> does not affect the output of step CE2, but a parsimonious choice will help to make the condition of Proposition 1 hold.

Proposition 1. Upon convergence, the chained equations algorithm defined by CE1 and CE2 and the joint model p (x ∣ θ) and prior p (θ) draw from the same predictive distribution of xmis if, for each incomplete variable xj, <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M42','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M42">View MathML</a>, i.e. if the joint prior distribution for ψj and <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M43','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M43">View MathML</a> factorizes into independent priors for ψj and <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M44','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M44">View MathML</a>.

Proof. Using the condition of Proposition 1,

<a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M45','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M45">View MathML</a>

Now, integrating out <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M46','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M46">View MathML</a>,

<a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M47','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M47">View MathML</a>

Therefore, step CE1 yields a draw from <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M48','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M48">View MathML</a>.

Next, <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M49','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M49">View MathML</a> and <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M50','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M50">View MathML</a> are conditionally independent given <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M51','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M51">View MathML</a> and <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M52','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M52">View MathML</a> so <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M53','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M53">View MathML</a> and step CE2 yields a draw from <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M54','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M54">View MathML</a>.

Hence steps CE1 and CE2 together yield a draw from <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M55','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M55">View MathML</a>, and in particular they draw <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M56','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M56">View MathML</a> from <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M57','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M57">View MathML</a>. The latter is a full-conditional distribution corresponding to the joint density p (x) implied by the joint model. Once <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M58','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M58">View MathML</a> has been sampled, <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M59','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M59">View MathML</a>, the sampled value of ψj, is not used again in the chained equations algorithm. So, the application of steps CE1 and CE2 to each j in turn and then iterating is a Gibbs sampler which, at convergence, yields a draw from p(xmis ∣ xobs), the predictive distribution implied by the joint model and its accompanying prior.

Comment 1. The condition of Proposition 1 does not hold if the conditional and marginal parameters ψj and <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M60','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M60">View MathML</a> are not distinct (i.e. if their joint parameter space is not the product of their separate parameter spaces), and in particular if the combined dimension of ψj and <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M61','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M61">View MathML</a> is greater than that of θ. Distinctness of parameter spaces is a property of the model p (x ∣ θ) and not of the prior p (θ). It will be used in the examples of the unsaturated multinomial distribution and the general location model to identify joint models where the condition of Proposition 1 does not hold. Comment 2. Heuristically, the condition of Proposition 1 says that there is no information about ψj  in the marginal distribution of x-j, so we call it the “non-informative margins” condition. When such information does exist, it is used by the joint modelling sampler but not by the chained equations algorithm, so they may draw from different distributions. Our simulation study will show that this occurs in an example by demonstrating order effects. Comment 3. As the non-informative margins condition has only been shown to be sufficient, then potentially if this condition is not satisfied the chained equations algorithm defined by CE1 and CE2 and the joint model p (xθ) and prior p (θ) could still draw from the same predictive distribution of xmis. Comment 4. This proof holds for improper prior distributions provided the posterior distributions are proper. Comment 5. When the non-informative margins condition holds true the chained equations algorithm is a Gibbs sampler, and so order effects cannot occur [22].

Example 1: Multivariate normal

Consider the multivariate normal joint model X ∼ N (μ,Σ) with parameter θ = (μ,Σ) and prior distribution p (θ) ∝ |Σ | -κ<a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M62','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M62">View MathML</a>. We show that the corresponding chained equations algorithm, which imputes under a set of normal linear regression models, satisfies the non-informative margins condition of Proposition 1 (and hence draws from the same joint model as joint modelling imputation).

For each j we partition the mean vector μ as <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M63','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M63">View MathML</a> and the covariance matrix Σ as

<a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M64','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M64">View MathML</a>

such that Xj ∼ N (μj,σj) and <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M65','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M65">View MathML</a>. The conditional distribution of Xj given X-j is the normal linear regression model <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M66','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M66">View MathML</a> where <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M67','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M67">View MathML</a>, <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M68','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M68">View MathML</a> and <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M69','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M69">View MathML</a>[2] p. 157. Using our notation for chained equations imputation (see under the subsection “Chained equations imputation” of the Methods section) ψj = (αj,βj,ωj) and <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M70','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M70">View MathML</a>.

The joint prior for ψj and <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M71','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M71">View MathML</a> derived from p (θ) is <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M72','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M72">View MathML</a>[2] p. 158-159. So, using a standard result from matrix algebra for the determinant of a partitioned matrix,

<a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M73','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M73">View MathML</a>

Therefore, the non-informative margins condition of Proposition 1 is satisfied.

Example 2: Saturated multinomial distribution

We now show for joint modelling imputation under a saturated multinomial model and a Dirichlet prior for θ, that the corresponding chained equations algorithm satisfies the non-informative margins condition of Proposition 1.

Consider K categorical variables X = (X1,…,XK), where each Xj takes possible values 1,…,Ij (j = 1,…,K). Variables X define a K-way contingency table. Let c = (c1,…,cK) denote a generic cell of the contingency table, θc denote the cell probability pr (X = c) and  denote the set of all cells of the contingency table. The joint distribution of X is a multinomial distribution with parameter θ = (θc : c ∈ ) and index equal to 1.

Summing the table counts over variable Xj produces a collapsed contingency table defined by variables X-j. Let c-j = (c1,…,cj-1,cj+1,…,cK) denote a generic cell of the collapsed table, <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M74','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M74">View MathML</a> denote the cell probability pr (X-j = c-j) and <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M75','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M75">View MathML</a> denote the set of all cells of the collapsed table. The marginal distribution of X-j  is multinomial with parameter <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M76','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M76">View MathML</a>, where <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M77','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M77">View MathML</a>. The conditional distribution of Xj given X-j = c-j is the multinomial distribution with parameters <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M78','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M78">View MathML</a>. So, the full set of parameters for the conditional distribution of Xj given X-j is <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M79','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M79">View MathML</a>.

If the prior distribution for θ is Dirichlet with hyperparameter α = {αc : c ∈ }, then the implied prior distributions for <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M80','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M80">View MathML</a> and ψj are independent: the prior for <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M81','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M81">View MathML</a> is Dirichlet with hyperparameter <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M82','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M82">View MathML</a>, where <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M83','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M83">View MathML</a>, and the prior distribution for ψj is the product of the set of independent Dirichlet distributions <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M84','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M84">View MathML</a>, where <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M85','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M85">View MathML</a> is a subset of α[2] p. 256. Since the prior for <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M86','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M86">View MathML</a> can be factored into independent distributions for ψj and <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M87','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M87">View MathML</a>, the non-informative margins condition of Proposition 1 is satisfied.

Example 3: Unsaturated multinomial distribution

When the joint model is an unsaturated multinomial model, we give an example where the conditional and marginal parameters ψj and <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M88','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M88">View MathML</a> are not distinct (see comment 1 of Proposition 1). Consequently the non-informative margins condition of Proposition 1 is not satisfied.

Consider K categorical variables X = (X1,…,XK) as described in the saturated multinomial example. Assume that all cell probabilities are positive, θ (c) gt; 0 for all c, to ensure that every multinomial distribution considered can be written as a log linear model and that all possible conditional distributions exist [28] p. 202. Let the joint model be the hierarchical log linear model that contains all two-way factors between the K variables and no higher order factors. We shall refer to this as the all two-way factor hierarchical model. Under this model, for any j the conditional distribution of Xj given X-j follows a multinomial logistic regression model (or a logistic regression model when Xj is binary) where the regression model includes variables X-j as main effects only (i.e. no interaction terms).

In generating class notation [29] the all two-way factor hierarchical model is written as [ 1 2]…[ 1 K]… [ (K - 1) K]. Any hierarchical log linear model for X can be represented by an undirected graph in which the graph vertices are the variables X1,…,XK and two vertices Xg and Xh are connected by an edge if and only if the log linear model contains two-factor or higher order terms involving variables Xg and Xh. Any model containing all two-factor terms is therefore represented by the complete graph. Asmussen and Edwards [29] state that two vertices in a graph are adjacent if there is an edge between them. For any subset a of the vertices, Asmussen and Edwards define the boundary of a to be the set of vertices that are not in a but that are adjacent to one or more vertices in a. So, for the complete graph, the boundary of Xj is X-j. Theorem 2.3 of [29] states that a hierarchical log linear model is collapsible onto X-j if and only if the boundary of Xj is contained in a generator of the hierarchical log linear model. Further, Theorem 4.1 of [29] states that if a hierarchical log linear model is not collapsible onto X-j, then parameters ψj and <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M89','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M89">View MathML</a> are not distinct. For any j, X-j, the boundary of Xj in the complete graph, contains all of the remaining K - 1 variables. When K ≥ 4, X-j is not contained in any of the generators, [ 1 2]…[ 1 K]… [ (K - 1) K], of the all two-way factor hierarchical model, and hence the log linear model is not collapsible onto X-j. Therefore, parameters ψj and <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M90','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M90">View MathML</a> are not distinct, and so the non-informative margins condition of Proposition 1 is not satisfied when K ≥ 4.

Example 4: General location model

We give an example of a chained equations algorithm derived from joint modelling under a general location model that does not satisfy the non-informative margins condition of Proposition 1. Our simulation study is based on this example.

Suppose that the data on each individual consist of one incomplete binary variable Y and K - 1 continuous variables W = (W1,…,WK - 1)T, where one or more of the continuous variables are also incomplete. Let the joint distribution of X = (Y,W)T be the general location model Y ∼ Bernoulli (γ) and W ∣ Y ∼ N (μ0 + μ1Y,Σ), for unknown parameters θ = (γ,μ0,μ1,Σ) [30]. Let the joint prior for θ be p (θ) = γτ-1 (1-γ)ν-1 ∣ Σ ∣ -κ with hyperparameters τ,ν gt; 0 and <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M91','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M91">View MathML</a>.

From the multivariate normal example above it is straightforward to show that the non-informative margins condition of Proposition 1 holds for imputing any Wj.

The conditional distribution of Y given W, p(Y ∣ W, ψY), is the logistic regression model with covariates W[25]. The marginal distribution of W, <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M92','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M92">View MathML</a>, can be written as a mixture of normal distributions

<a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M93','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M93">View MathML</a>

This cannot be parameterized more parsimoniously than <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M94','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M94">View MathML</a>. As ψY is a function of θ, it is determined by <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M95','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M95">View MathML</a>.

Consequently, given <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M96','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M96">View MathML</a> the parameters of the logistic regression model ψY are fully determined, and so <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M97','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M97">View MathML</a> and ψY are not distinct. Therefore, as discussed in comment 1 above, the non-informative margins condition of Proposition 1 does not hold.

Using the same argument, the non-informative margins condition of Proposition 1 does not hold for a chained equations algorithm derived from joint modelling under the restricted general location model, with cell probabilities restricted by the all two-factor hierarchical model (discussed in the unsaturated multinomial example) and cell means restricted to be a linear function of the categorical variables.

Simulation study results

This section reports the results of the simulation study, where the chained equations conditional models were compatible with the same joint model but the non-informative margins condition of Proposition 1 was not satisfied.

Figures 1a and 1b show, for the first 30 of the 500 datasets, the value of <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M98','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M98">View MathML</a> (estimate of the order effect in one dataset) along with its 95% confidence interval, for the (standard) chained equations procedure (i.e., binary variable Y imputed under the logistic regression model). In a number of the datasets the 95% confidence intervals did not cross zero. Thus, there was clear evidence of order effects, with the magnitude of such effects varying between datasets. Such statistically significant evidence of an order effect occurred in 164 and 386 of the 500 datasets for β = 1 and β = 3 respectively. The range of absolute values of <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M99','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M99">View MathML</a> was larger for β = 3 than for β = 1. These results confirm that the chained equations procedure was not equivalent to any joint model procedure.

thumbnailFigure 1. Four forest plots of the posterior mean differences <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M101','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M101">View MathML</a>. Each panel is a forest plot of <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M102','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M102">View MathML</a> for the first 30 datasets, 95% confidence intervals calculated using the Monte Carlo standard error. Panels (a) and (b)  correspond to binary variable Y imputed under the logistic regression model, with β = 1 and β = 3 respectively. Panels (c) and (d) correspond to Y imputed under the linear discriminant model, with β = 1 and β = 3 respectively.

The average magnitudes of the order effects <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M103','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M103">View MathML</a>, <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M104','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M104">View MathML</a> and <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M105','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M105">View MathML</a> over the 500 datasets are shown in Table 1. Consider the results for the chained equations algorithm (labelled LR). In keeping with Figure 1, the average magnitude of the order effect was larger for β = 3 than β = 1. The means of <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M106','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M106">View MathML</a> and <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M107','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M107">View MathML</a> did not differ systematically, consistent with the direction of the order effect being arbitrary and dataset dependent. Estimates <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M108','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M108">View MathML</a> and <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M109','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M109">View MathML</a> appeared to be equally good estimates of β.

Table 1. Over 500 datasets, average of the complete case estimates, joint modelling imputation estimates and values of <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M110','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M110">View MathML</a>, <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M111','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M111">View MathML</a> and <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M112','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M112">View MathML</a> for the chained equations algorithm and the modified chained equations algorithm, with confidence intervals [mean ± 1 . 96 × (standard deviation ÷ 500)]

The forest plots of <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M118','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M118">View MathML</a> with 95% confidence intervals corresponding to the modified chained equations algorithm (i.e Y imputed under the linear discriminant model) are shown in Figures 1c and 1d. Order effects were smaller than in Figures 1a and b, but were still present because the modified algorithm did not use information about W1 and W2 when Y was missing. Out of the 500 datasets the number of datasets that showed statistically significant evidence of an order effect was 113 for β = 1 and 330 for β = 3.

From Table 1, the complete case estimates of β were unbiased. When β = 3 the linear discriminant analysis values <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M119','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M119">View MathML</a> and <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M120','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M120">View MathML</a>, and the joint modelling imputation estimate <a onClick="popup('http://www.biomedcentral.com/1471-2288/14/28/mathml/M121','MathML',630,470);return false;" target="_blank" href="http://www.biomedcentral.com/1471-2288/14/28/mathml/M121">View MathML</a> were slightly biased towards the null. This bias was due to the prior (which was the same for imputation under the linear discriminant analysis model and joint modelling imputation under the general location model) and it disappeared when the sample size was increased to 1000 observations and when the simulation study was conducted using improper imputation; i.e using maximum likelihood estimates instead of Bayesian draws of parameters. For joint modelling imputation, and the standard and modified chained equations procedures changing their specifications (e.g., larger number of iterations, burn-in period) gave the same pattern of results as above (results not shown but available on request from the authors).

In preliminary simulation studies, when the non-informative margins condition was satisfied the results were consistent with a zero order effect (results not shown but available on request from the authors).

Discussion

We have defined a non-informative margins condition which, together with compatibility of the conditional models, we have proved is sufficient for a chained equations algorithm to impute missing data from the predictive distribution of the missing data implied by the joint model and its prior distribution. Also, we have shown that compatibility of the conditional models is not alone a sufficient condition. In a scenario where the conditionals models were compatible but the non-informative margins condition failed, our simulation study showed that the distribution from which the final imputed values of the variables were drawn differed, in a dataset-dependent manner, according to the order in which the variables were updated in the chained equations sampler.

In work that is complementary to the finite-sample results presented in this paper, Liu et al. [19] identified sufficient conditions for chained equations imputation and imputation under a fully Bayesian model to be asymptotically equivalent; that is, for the supremum of the difference between the two imputation distributions to converge to zero as the sample size tends to infinity. This implies that when the non-informative margins condition is not satisfied but the conditional models are compatible with the same joint model, the order effects identified in our simulation study will disappear as the sample size tends to infinity.

In our simulation study the average magnitude of the order effects was small and did not induce bias. Given that chained equations imputation is a widely used approach to imputation, these results are somewhat reassuring. However, the scope of these simulations was limited and it remains possible that chained equations imputation could lead to more substantial bias in different situations; for example, when there are many partially observed variables.

When the non-informative margins condition does not hold we expect some loss of efficiency in general (because some information is discarded). However, in our simulation study we did not detect any sizable loss of efficiency. The issue of variance estimation for chained equations imputation is beyond the scope of this paper.

The advantage of chained equations imputation is that we do not need to specify the joint distribution of the variables. In cases where it is not known that there is a joint distribution, several methods for checking compatibility have been proposed (e.g., [16,31-35]). In practice, these methods are either limited to discrete distributions or are difficult to apply for multivariate distributions of more than 2 or 3 dimensions. This means that it may not be possible to check that the conditionals are compatible with the same joint model or that our non-informative margins condition holds true. van Buuren and other authors [3,18], in the examples they considered, concluded that chained equations is a robust approach even when the set of conditionals are not compatible with the same joint model. The findings of our simulation study support this body of work. Other studies [3,4,36,37] have compared chained equations and joint modelling, when missingness is multivariate, nonmonotone and ignorable, in settings which reflect real data (e.g. mixture of different types of variables, non-linear relationships and interactions between variables, semi-continuous variables). None of these studies has reported substantial differences in the performances of joint modelling imputation and chained equations imputation. Nonetheless many authors emphasize the need for further understanding of the theoretical underpinnings of the chained equations approach and the establishment of the robustness of the chained equations method (e.g. [3,7,14,38]).

Conclusions

In finite samples, compatibility of the conditionals and our non-informative margins condition are sufficient for chained equations and joint modelling to yield imputations from the same predictive distribution. Furthermore, our simulation study demonstrated that, even in a simple setting, a chained equations procedure that does not satisfy the non-informative margins condition is not necessarily equivalent to a joint model procedure, even though when its conditional models are compatible.

When conditionals are incompatible or the non-informative margins condition is not satisfied, the distribution from which the imputed values are drawn can differ according to the order in which the variables are updated in the chained equations sampler, thereby introducing order effects.

Given the widespread use of chained equations imputation in medical research for datasets with different types of variables, researchers must be aware that order effects are likely to be ubiquitous. As noted by van Buuren [3], further work is needed to verify the robustness of chained equations to incompatibility of the conditional models in more general and realistic settings. Equally, future work could evaluate the robustness of chained equations imputation when the sample size is small-to-moderate, the conditionals are compatible and the non-informative margins condition is not satisfied.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

IRW and JRC proposed the project. All authors contributed to the examples and/or the simulation study. RAH carried out the programming for the simulation study. RAH drafted the manuscript, and RAH, IRW and SRS edited the manuscript. All authors read and approved the final manuscript.

Acknowledgements

This work was supported by the Medical Research Council [Grant G0900724, Unit Programme number U105260558].

References

  1. Rubin DB: Multiple Imputation for Nonresponse in Surveys. New York: John Wiley & Sons, Inc; 1987. OpenURL

  2. Schafer JL: Analysis of Incomplete Multivariate Data. London: Chapman & Hall; 1997. OpenURL

  3. van Buuren S: Multiple imputation of discrete and continuous data by fully conditional specification.

    Stat Methods Med Res 2007, 16:219-242. PubMed Abstract | Publisher Full Text OpenURL

  4. Raghunathan TE, Lepkowski JM, van Hoewyk J, Solenberger P: A multivariate technique for multiply imputing missing values using a sequence of regression models.

    Surv Methodol 2001, 27:85-95. OpenURL

  5. Gelman A, Raghunathan TE: [Conditionally specified distributions: An introduction]: comment.

    Stat Sci 2001, 16:268-269. OpenURL

  6. van Buuren S, Boshuizen HC, Knook DL: Multiple imputation of missing blood pressure covariates in survival analysis.

    Statist Med 1999, 18:681-694. Publisher Full Text OpenURL

  7. van Buuren S, Groothuis-Oudshoorn K: mice: Multivariate Imputation by Chained Equations in R.

    J Stat Softw 2011, 45:1-67. OpenURL

  8. Morgenstern M, Wiborg G, Isensee B, Hanewinkel R: School-based alcohol education: Results of a cluster-randomized controlled trial.

    Addiction 2009, 104:402-412. PubMed Abstract | Publisher Full Text OpenURL

  9. Mueller B, Cummings P, Rivara F, Brooks M, Terasaki R: Injuries of the head, face, and neck in relation to ski helmet use.

    Epidemiology 2008, 19:270-276. PubMed Abstract | Publisher Full Text OpenURL

  10. Nash D, Katyal M, Brinkhof M, Keiser O, May M, Hughes R, Dabis F, Wood R, Sprinz E, Schechter M, Egger M: Long-term immunologic response to antiretroviral therapy in low-income countries: A collaborative analysis of prospective studies.

    AIDS 2008, 22:2291-2302. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  11. Souverein O, Zwinderman A, Tanck T: Multiple imputation of missing genotype data for unrelated individuals.

    Ann Hum Genet 2006, 70:372-381. PubMed Abstract | Publisher Full Text OpenURL

  12. Huo D, Adebamowo C, Ogundiran T, Akang E, Campbell O, Adenipekun A, Cummings S, Fackenthal J, Ademuyiwa F, Ahsan H, Olopade O: Parity and breastfeeding are protective against breast cancer in nigerian women.

    Br J Cancer 2008, 98:992-996. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  13. Wiles N, Jones G, Haase A, Lawlor D, Macfarlane G, Lewis G: Physical activity and emotional problems amongst adolescents.

    Soc Psychiatry Psychiatric Epidemiol 2008, 43:765-772. Publisher Full Text OpenURL

  14. Azur M, Stuart E, Frangakis C, Leaf P: Multiple imputation by chained equations: what is it and how does it work?

    Int J Methods Psychiatric Res 2011, 20:40-49. Publisher Full Text OpenURL

  15. White IR, Royston P, Wood A: Multiple imputation using chained equations: Issues and guidance for practice.

    Stat Med 2011, 30:377-399. PubMed Abstract | Publisher Full Text OpenURL

  16. Arnold BC, Press SJ: Compatible conditional distributions.

    J Am Statist Assoc 1989, 84:152-156. Publisher Full Text OpenURL

  17. Heckerman D, Chickering DM, Meek C, Rounthwaite R, Kadie C: Dependency networks for inference, collaborative filtering, and data visualization.

    J Mach Learn Res 2000, 1:49-75. OpenURL

  18. van Buuren S, Brand JPL, Groothuis-Oudshoorn CGM, Rubin DB: Fully conditional specification in multivariate imputation.

    J Stat Comput Simulat 2006, 76:1049-1064. Publisher Full Text OpenURL

  19. Liu J, Gelman A, Hill J, Su Y, Kropko J: On the stationary distribution of iterative imputations.

    Biometrika 2013.

    doi: 10.1093/biomet/ast044

    OpenURL

  20. Little RJA, Rubin DB: Statistical Analysis with Missing Data. New York: John Wiley & Sons, Inc; 2002. OpenURL

  21. Tanner MA, Wong WH: The calculation of posterior distributions by data augmentation.

    J Am Statist Assoc 1987, 82:528-540. Publisher Full Text OpenURL

  22. Arnold B, Castillo E, Sarabia J: Conditionally specified distributions an introduction.

    Stat Sci 2001, 16:249-265. Publisher Full Text OpenURL

  23. Kirkwood B, Sterne J: Essential Medical Statistics. Hoboken, New Jersey, US: Wiley-Blackwell; 2003. OpenURL

  24. Albert J: Bayesian computation with R. Dordrecht, Heidelberg, London, New York: Springer; 2009. OpenURL

  25. Efron B: The efficiency of logistic regression compared to normal discriminant analysis.

    J Am Statist Assoc 1975, 70:892-898. Publisher Full Text OpenURL

  26. Cox D, Snell E: Analysis of Binary Data. second edition. London, UK: Chapman and Hall; 1989. OpenURL

  27. van Buuren S, Oudshoorn C:

    Multivariate Imputation by Chained Equations (Mice V1.0 User’s Manual). 2000.

    http://www.stefvanbuuren.nl/publications/MICE\%20V1.0\%20Manual\%20TNO00038\%202000.pdf webcite

    OpenURL

  28. Whittaker J: Graphical models in applied multivariate statistics. New York: John Wiley & Sons, Inc; 1990. OpenURL

  29. Asmussen S, Edwards D: Collapsibility and response variables in contingency tables.

    Biometrika 1983, 70:567-578. Publisher Full Text OpenURL

  30. Olkin I, Tate RF: Multivariate correlation models with mixed discrete and continuous variables.

    Ann Math Stat 1961, 32:448-465. Publisher Full Text OpenURL

  31. Arnold B, Castillo E, Sarabia J: Compatibility of partial or complete conditional probability specifications.

    J Stat Plann Inference 2004, 123:133-159. Publisher Full Text OpenURL

  32. Ip E, Wang Y: Canonical representation of conditionally specified multivariate discrete distributions.

    J Multivariate Anal 2009, 100:1282-1290. Publisher Full Text OpenURL

  33. Tian G, Tan M, Ng K, Tang M: A unified method for checking compatibility and uniqueness for discrete conditional distributions.

    Commun Stat: Theory Methods 2009, 38:115-129. OpenURL

  34. Chen H: Compatibility of conditionally specified models.

    Stat Probability Lett 2010, 80:670-677. Publisher Full Text OpenURL

  35. Kuo K, Wang Y: A simple algorithm for checking compatibility among discrete distributions.

    Comput Stat Data Anal 2011, 55:2457-2462. Publisher Full Text OpenURL

  36. Horton N, Kleinman K: Much ado about nothing: A comparison of missing data methods and software to fit incomplete data regression models.

    Am Stat 2007, 61:79-90. PubMed Abstract | Publisher Full Text | PubMed Central Full Text OpenURL

  37. Yu LM, Burton A, Rivero-Arias O: Evaluation of software for multiple imputation of semi-continuous data.

    Stat Methods Med Res 2007, 16:243-258. PubMed Abstract | Publisher Full Text OpenURL

  38. Kenward M, Carpenter J: Multiple imputation: current perspectives.

    Stat Methods Med Res 2007, 16:199-218. PubMed Abstract | Publisher Full Text OpenURL

Pre-publication history

The pre-publication history for this paper can be accessed here:

http://www.biomedcentral.com/1471-2288/14/28/prepub