Open Access Research article

Dealing with missing data in the Center for Epidemiologic Studies Depression self-report scale: a study based on the French E3N cohort

Noémie Resseguier1,2, Hélène Verdoux3,4, Roch Giorgi2, Françoise Clavel-Chapelon5 and Xavier Paoletti1,6*

Author Affiliations

1 Institut Curie, Biostatistics Department, Paris, France

2 Inserm, UMR 912, Aix-Marseille Univ, UMR 912, SESSTIM, Marseille, France

3 University Victor Segalen Bordeaux 2, Bordeaux, France

4 Inserm U657, Bordeaux, France

5 Centre for Research in Epidemiology and Population Health, UMRS 1018, Team 9, Institut Gustave Roussy, Villejuif, France

6 Inserm U900 Institut Curie, Paris, France

BMC Medical Research Methodology 2013, 13:28  doi:10.1186/1471-2288-13-28

Published: 21 February 2013

Abstract

Background

The Center for Epidemiologic Studies - Depression scale (CES-D) is a validated tool commonly used to screen for depressive symptoms. As with any self-administered questionnaire, missing data are frequently observed and can strongly bias inference. The objective of this study was to investigate the best approach for handling missing data in the CES-D scale.

Methods

Among the 71,412 women from the French E3N prospective cohort (Etude Epidémiologique auprès des femmes de la Mutuelle Générale de l’Education Nationale) who returned the questionnaire comprising the CES-D scale in 2005, 45% had missing values in the scale. The reasons for failure to complete certain items were investigated through semi-directive interviews with a random sample of 204 participants. The prevalence of high depressive symptoms (hDS, score ≥16) was estimated after applying various methods for ignorable missing data, including multiple imputation with imputation models based on the CES-D items, with or without covariates. The accuracy of the imputation models was investigated. Various scenarios of nonignorable missing data mechanisms were investigated in a sensitivity analysis based on the mixture modelling approach.
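As an illustration of the general idea behind multiple imputation of item-level data, the following minimal sketch fills missing items several times, computes the prevalence of hDS (score ≥16) in each completed dataset, and pools the estimates with Rubin's rules. It is not the imputation model used in the study: drawing each missing item from that item's observed values is a deliberately simple stand-in that ignores the other items, the covariates and parameter uncertainty. The function names are hypothetical, and the items are assumed to be already scored 0 to 3 (with positively worded items recoded) and to have at least one observed value each.

```python
import random
from statistics import mean, variance


def impute_once(data, rng):
    """Fill each missing item (None) with a random draw from the values
    observed for that same item. Simplistic stand-in imputation model."""
    n_items = len(data[0])
    observed = [[row[j] for row in data if row[j] is not None]
                for j in range(n_items)]
    return [[rng.choice(observed[j]) if v is None else v
             for j, v in enumerate(row)]
            for row in data]


def pooled_prevalence(data, m=20, cutoff=16, seed=0):
    """Estimate the prevalence of scores >= cutoff over m imputed datasets
    and pool the estimates with Rubin's rules.
    Returns (pooled estimate, total variance)."""
    rng = random.Random(seed)
    n = len(data)
    estimates, within = [], []
    for _ in range(m):
        completed = impute_once(data, rng)
        p = sum(sum(row) >= cutoff for row in completed) / n
        estimates.append(p)
        within.append(p * (1 - p) / n)  # binomial variance of a proportion
    q_bar = mean(estimates)
    between = variance(estimates) if m > 1 else 0.0
    total_var = mean(within) + (1 + 1 / m) * between
    return q_bar, total_var
```

With fully observed data the between-imputation variance is zero and the pooled estimate reduces to the complete-data prevalence; with missing items the total variance grows with the disagreement between imputed datasets.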

Results

The interviews showed that participants were not reluctant to answer the CES-D scale. Possible reasons for nonresponse were identified. The prevalence of hDS among complete responders was 26.1%. After multiple imputation, the prevalence was 28.6%, 29.8% and 31.7% for women presenting up to 4, 10 and 20 missing values, respectively. The estimates were robust to the various imputation models investigated and to the scenarios of nonignorable missing data.
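To give a sense of what robustness to nonignorable missingness can mean in practice, the sketch below applies a simple delta adjustment in the spirit of pattern-mixture sensitivity analysis: values imputed under an ignorable mechanism are shifted by a fixed amount `delta` before scoring, and the prevalence is recomputed over a range of shifts. This is an assumption-laden illustration, not the mixture modelling approach used in the study; the function name, the single imputation per scenario and the choice of an item-level shift are all hypothetical, and items are assumed to be already scored 0 to 3.

```python
import random


def prevalence_under_shift(data, delta, cutoff=16, seed=0):
    """Prevalence of scores >= cutoff when each imputed item is shifted
    by delta (clipped to the 0..3 item range).
    delta = 0 corresponds to an ignorable mechanism; delta > 0 assumes
    that nonrespondents have systematically higher item scores."""
    rng = random.Random(seed)
    n_items = len(data[0])
    observed = [[row[j] for row in data if row[j] is not None]
                for j in range(n_items)]
    cases = 0
    for row in data:
        total = 0
        for j, v in enumerate(row):
            if v is None:
                # Draw from the observed values of the item, then shift.
                v = min(3, max(0, rng.choice(observed[j]) + delta))
            total += v
        cases += total >= cutoff
    return cases / len(data)


# Hypothetical toy data: three respondents, one with five missing items.
toy = [[1] * 20, [1] * 15 + [None] * 5, [0] * 20]
scenarios = {d: prevalence_under_shift(toy, d) for d in (-3, -1, 0, 1, 3)}
```

If the prevalence changes little across plausible values of `delta`, the estimate can be considered robust to that particular departure from ignorability.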

Conclusions

The CES-D scale can easily be used in large cohorts even in the presence of missing data. Based on the results of both a qualitative study and a sensitivity analysis under various scenarios of missing data mechanism in a population of women, there is no indication that the missing data mechanism is nonignorable, and the estimates are robust to departures from ignorability. Multiple imputation is recommended to reliably handle missing data in the CES-D scale.

Keywords:
CES-D; Cohort; Missing data; Multiple imputation; Non ignorable; Sensitivity analysis