Research article

Estimation methods with ordered exposure subject to measurement error and missingness in semi-ecological design

Hyang-Mi Kim1*, Chul Gyu Park2, Martie van Tongeren3 and Igor Burstyn4

Author Affiliations

1 Department of Mathematics and Statistics, University of Calgary, Calgary, Canada

2 School of Mathematics and Statistics, Carleton University, Ottawa, Canada

3 Centre for Human Exposure Science, Institute of Occupational Medicine, Edinburgh, UK

4 Department of Environmental and Occupational Health, Drexel University, Philadelphia, USA

BMC Medical Research Methodology 2012, 12:135  doi:10.1186/1471-2288-12-135

Published: 4 September 2012



In epidemiological studies, it is often not possible to measure participants' exposures accurately even if their response variable can be measured without error. When there are several groups of subjects, occupational epidemiologists employ a group-based strategy (GBS) for exposure assessment to reduce bias due to measurement error: when evaluating the effect of exposure on the response, every individual in a group/job within the study sample is assigned the sample mean of the exposure measurements from that group. Exposure is thus estimated at the ecological (group) level while health outcomes are ascertained for each subject. Such a study design leads to negligible bias in risk estimates when group means are estimated from ‘large’ samples. However, in many cases only a small number of observations are available to estimate the group means, and this causes bias in the observed exposure-disease association. In addition, the analysis in a semi-ecological design may involve exposure data of which the majority are missing and the rest are observed with measurement error, together with response data that are completely ascertained.
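
The group-based assignment described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not the authors' implementation: the function name `assign_group_means`, the dictionary layout, and the toy values are all hypothetical.

```python
from statistics import mean


def assign_group_means(measured, groups):
    """Group-based strategy (GBS) sketch.

    measured: dict subject_id -> exposure measurement (only measured subjects)
    groups:   dict subject_id -> group/job label (all subjects)
    Returns (assigned, group_means), where `assigned` maps every subject
    whose group has at least one measurement to that group's sample mean.
    """
    # Collect the available measurements for each group.
    by_group = {}
    for subject, value in measured.items():
        by_group.setdefault(groups[subject], []).append(value)

    # The sample mean of the measurements stands in for each group's exposure.
    group_means = {g: mean(values) for g, values in by_group.items()}

    # Every subject in a group receives the same (ecological) exposure value.
    assigned = {s: group_means[g] for s, g in groups.items() if g in group_means}
    return assigned, group_means


# Toy data (hypothetical): five subjects, three of whom have a measurement.
groups = {"a1": "low", "a2": "low", "a3": "low", "b1": "high", "b2": "high"}
measured = {"a1": 1.0, "a2": 3.0, "b1": 5.0}
assigned, group_means = assign_group_means(measured, groups)
```

Subjects without a measurement of their own (here `a3` and `b2`) still receive an exposure value. With only one or two measurements per group, the group means are themselves noisy, which is the small-sample problem noted above.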


In workplaces, groups/jobs are naturally ordered, and this ordering can be incorporated into the estimation procedure by combining constrained estimation methods with the expectation-maximization (EM) algorithm for regression models with measurement error and missing values. Four methods were compared in a simulation study: naive complete-case analysis, GBS, constrained GBS (CGBS), and constrained expectation-maximization (CEM). We illustrated the methods in an analysis of the decline in lung function due to exposure to carbon black.
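
One standard way to impose an order constraint on group means is weighted isotonic regression computed with the pool-adjacent-violators algorithm. The sketch below illustrates that general technique only. Weighting by the number of measurements per group is an assumption, and the function name `isotonic_group_means` is hypothetical rather than taken from the paper.

```python
def isotonic_group_means(means, weights):
    """Weighted isotonic (non-decreasing) regression by pool-adjacent-violators.

    means:   sample means of the groups, listed in their assumed true order
    weights: positive weights, e.g. number of measurements per group (assumption)
    Returns the order-constrained means, one per group.
    """
    blocks = []  # each block: [pooled mean, total weight, number of groups pooled]
    for m, w in zip(means, weights):
        blocks.append([m, w, 1])
        # Pool adjacent blocks for as long as they violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])

    # Expand the pooled blocks back to one fitted value per group.
    fitted = []
    for m, _, count in blocks:
        fitted.extend([m] * count)
    return fitted


# Toy example (hypothetical values): the second and third sample means
# violate the assumed ordering, so they are replaced by their weighted average.
constrained = isotonic_group_means([1.0, 3.0, 2.0, 4.0], [2, 1, 3, 2])
```

The constrained means can then be assigned to subjects in place of the raw group means, exactly as in the unconstrained group-based strategy.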


The naive and GBS approaches were shown to be inadequate when the number of exposure measurements is too small to estimate group means accurately. The CEM method appears to be the best of the four when at least a ‘moderate’ number of individuals within each exposure group have their exposures observed with error. However, compared with CEM, CGBS is easier to implement and has more desirable bias-reducing properties in the presence of substantial proportions of missing exposure data.


The CGBS approach could be useful for estimating exposure-disease associations in semi-ecological studies when the true group means are ordered and the number of measured exposures in each group is small. These findings have important implications for the cost-effective design of semi-ecological studies because they enable investigators to estimate exposure-disease associations more reliably, and with a smaller exposure measurement campaign, than with the analytical methods that were historically employed.

Constrained estimation; EM algorithm; Group-based strategy; Isotonic regression; Measurement errors