Open Access Research article

Auxiliary variables in multiple imputation in regression with missing X: a warning against including too many in small sample research

Jochen Hardt*, Max Herke and Rainer Leonhart

BMC Medical Research Methodology 2012, 12:184  doi:10.1186/1471-2288-12-184


Ridge parameter

Stef van Buuren   (2013-02-15 14:56)  Netherlands Organization for Applied Scientific Research TNO

The paper by Hardt, Herke and Leonhart is a welcome addition to the literature. It warns against simplistic approaches that throw just anything into the imputation model. While the imputation model is generally robust against the inclusion of junk variables, the paper clearly demonstrates that we should not push this to the edge. In general, building the imputation model requires appropriate care. My personal experience is that it is rarely beneficial to include more than, say, 25 well-chosen variables in the imputation model.

In their simulations the authors investigate cases where the number of variables specified in the imputation model exceeds the number of cases. Many programs break down in this situation, but MICE will run because it uses ridge regression instead of the usual OLS estimate. The price for this increased computational stability is, as confirmed by Hardt et al., that the parameter estimates will be biased towards zero. It is therefore likely that some of the bias observed by the authors is not intrinsic to PMM, but rather due to the setting of the ridge parameter (the default value 1E-5 can easily be changed, e.g. mice(..., ridge = 1E-6)). Would a tighter ridge setting (e.g., 1E-6 or 1E-7) appreciably reduce the bias?
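As a minimal sketch of the point above (assuming the nhanes example dataset shipped with the mice package; the variable names in the analysis model are chosen only for illustration), the ridge parameter can be varied directly in the call to mice():

```r
# Load the mice package; nhanes is its built-in example dataset with missings.
library(mice)

# Default ridge penalty (1e-5), passed through to the imputation routines.
imp_default <- mice(nhanes, method = "pmm", ridge = 1e-5,
                    seed = 1, printFlag = FALSE)

# Tighter ridge penalty, as suggested in the comment: less shrinkage of the
# regression coefficients, at the cost of reduced numerical stability when
# predictors are (nearly) collinear.
imp_tight <- mice(nhanes, method = "pmm", ridge = 1e-6,
                  seed = 1, printFlag = FALSE)

# Fit the analysis model on each imputed dataset and pool the estimates;
# comparing the pooled coefficients across ridge settings shows how much of
# the shrinkage is attributable to the ridge parameter.
fit <- with(imp_tight, lm(bmi ~ age + chl))
summary(pool(fit))
```

Whether the difference between the two settings is appreciable in any given dataset depends on how close the imputation model is to being over-parameterised, which is exactly the regime the paper studies.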

The '1 out of 3 of the complete cases' rule is interesting and easily remembered. However, a complication in practice is that real data often contain no complete cases at all, especially in merged datasets. What would the authors think of the slightly more liberal rule of 'n/3 variables'?

Stef van Buuren

Competing interests

Author of the mice package.
