Research article (Open Access)

Auxiliary variables in multiple imputation in regression with missing X: a warning against including too many in small sample research

Jochen Hardt1*, Max Herke1 and Rainer Leonhart2

Author Affiliations

1 Medical Psychology and Medical Sociology, Clinic for Psychosomatic Medicine and Psychotherapy, University of Mainz, Duesbergweg 6, Mainz 55128, Germany

2 Social Psychology and Methods, University of Freiburg, Engelberger Straße 41, Freiburg, 79106, Germany

BMC Medical Research Methodology 2012, 12:184. doi:10.1186/1471-2288-12-184

Published: 5 December 2012

Abstract

Background

Multiple imputation is becoming increasingly popular. Theoretical considerations as well as simulation studies have shown that the inclusion of auxiliary variables is generally of benefit.

Methods

A simulation study of a linear regression with a response Y and two predictors X1 and X2 was performed on data with n = 50, 100 and 200, using either complete-case analysis or multiple imputation with 0, 10, 20, 40 or 80 auxiliary variables. The missingness mechanism was either 100% MCAR (missing completely at random) or 50% MAR (missing at random) + 50% MCAR. Auxiliary variables had either low (r = .10) or moderate (r = .50) correlations with the X's and Y.
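The data-generating side of such a simulation can be sketched as follows. This is a minimal illustration, not the authors' code: the single-factor correlation structure, the 30% missingness rate, the form of the MAR mechanism and all function names are assumptions, because the abstract does not specify them. Only the complete-case analysis is shown; the multiple imputation step is left out.

```python
import math
import random


def simulate_rows(n, n_aux, r, rng):
    """Return n rows [y, x1, x2, aux_1, ..., aux_k] of standard-normal variables.

    Assumption (not from the paper): one shared latent factor gives every
    pair of variables the same correlation r.
    """
    load, resid = math.sqrt(r), math.sqrt(1.0 - r)
    rows = []
    for _ in range(n):
        factor = rng.gauss(0.0, 1.0)
        rows.append([load * factor + resid * rng.gauss(0.0, 1.0)
                     for _ in range(3 + n_aux)])
    return rows


def delete_values(rows, rate, mar_share, rng):
    """Return a copy of rows with x1 and x2 set to None at random.

    rate is a hypothetical overall missingness probability. A fraction
    mar_share of it depends on the always-observed y (MAR); the rest is
    independent of the data (MCAR).
    """
    out = []
    for row in rows:
        new_row = list(row)
        p_mcar = rate * (1.0 - mar_share)
        # MAR part: rows with y > 0 lose values more often than rows with y <= 0.
        p_mar = rate * mar_share * (1.5 if row[0] > 0 else 0.5)
        for j in (1, 2):
            if rng.random() < p_mcar + p_mar:
                new_row[j] = None
        out.append(new_row)
    return out


def complete_case_ols(rows):
    """Regress y on x1 and x2 using only rows with both predictors observed.

    Returns (b1, b2, n_complete).
    """
    cc = [r[:3] for r in rows if r[1] is not None and r[2] is not None]
    n = len(cc)
    if n < 3:
        raise ValueError("too few complete cases for a two-predictor regression")
    my = sum(r[0] for r in cc) / n
    m1 = sum(r[1] for r in cc) / n
    m2 = sum(r[2] for r in cc) / n
    s11 = sum((r[1] - m1) ** 2 for r in cc)
    s22 = sum((r[2] - m2) ** 2 for r in cc)
    s12 = sum((r[1] - m1) * (r[2] - m2) for r in cc)
    s1y = sum((r[1] - m1) * (r[0] - my) for r in cc)
    s2y = sum((r[2] - m2) * (r[0] - my) for r in cc)
    det = s11 * s22 - s12 ** 2
    # Solve the 2x2 normal equations for the centered variables.
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2, n


if __name__ == "__main__":
    rng = random.Random(2012)
    for n in (50, 100, 200):
        full = simulate_rows(n, n_aux=10, r=0.5, rng=rng)
        holed = delete_values(full, rate=0.3, mar_share=0.5, rng=rng)
        print(n, complete_case_ols(holed))
```

Under the equal-correlation assumption used here, both population regression coefficients equal r / (1 + r), which gives a reference value against which estimates from repeated runs could be compared.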

Results

The inclusion of auxiliary variables can improve a multiple imputation model. However, including too many variables biases the regression coefficients downward and decreases precision. When the correlations between the auxiliary variables and the X's and Y are low, including auxiliary variables is not useful.

Conclusion

More research on auxiliary variables in multiple imputation should be performed. A preliminary rule of thumb could be that the ratio of variables to cases with complete data should not go below 1 : 3, that is, there should be at least three complete cases for every variable in the imputation model.
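The rule of thumb can be written as a small check. The sketch below assumes one reading of the 1 : 3 ratio, namely at least three complete cases per variable in the imputation model; whether Y, X1 and X2 count towards the number of variables is not stated in the abstract, so the caller decides what to pass in. The function names are hypothetical.

```python
def max_variables(n_complete_cases):
    """Largest variable count allowed under the assumed 1 : 3 reading."""
    return n_complete_cases // 3


def satisfies_rule_of_thumb(n_variables, n_complete_cases):
    """True if there are at least three complete cases per variable."""
    return n_variables * 3 <= n_complete_cases


if __name__ == "__main__":
    # Example: with 60 complete cases, up to 20 variables pass the check.
    print(max_variables(60), satisfies_rule_of_thumb(20, 60))
```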

Keywords:
Multiple imputation; Auxiliary variables; Simulation study; Small and medium size samples