Department of Statistics, University of British Columbia, Vancouver, BC, Canada

Department of Environmental and Occupational Health, School of Public Health, Drexel University, Philadelphia, PA, USA

Abstract

Background

In epidemiological studies explanatory variables are frequently subject to measurement error. The aim of this paper is to develop a Bayesian method to correct for measurement error in multiple continuous exposures in individually matched case-control studies. This is a topic that has not been widely investigated. The new method is illustrated using data from an individually matched case-control study of the association between thyroid hormone levels during pregnancy and exposure to perfluorinated acids. The objective of the motivating study was to examine the risk of maternal hypothyroxinemia due to exposure to three perfluorinated acids measured on a continuous scale. Results from the proposed method are compared with those obtained from an analysis that ignores measurement error.

Methods

Using a Bayesian approach, the developed method considers a classical measurement error model for the exposures, as well as the conditional logistic regression likelihood as the disease model, together with a random-effect exposure model. Proper and diffuse prior distributions are assigned, and results from a quality control experiment are used to estimate the perfluorinated acids' measurement error variability. As a result, posterior distributions and 95% credible intervals of the odds ratios are computed. A sensitivity analysis of the method's performance in this particular application with different measurement error variability was performed.

Results

The proposed Bayesian method to correct for measurement error is feasible and can be implemented using statistical software. For the study on perfluorinated acids, a comparison of the inferences which are corrected for measurement error to those which ignore it indicates that little adjustment is manifested for the level of measurement error actually exhibited in the exposures. Nevertheless, a sensitivity analysis shows that more substantial adjustments arise if larger measurement errors are assumed.

Conclusions

In individually matched case-control studies, the use of conditional logistic regression likelihood as a disease model in the presence of measurement error in multiple continuous exposures can be justified by having a random-effect exposure model. The proposed method can be successfully implemented in WinBUGS to correct individually matched case-control studies for several mismeasured continuous exposures under a classical measurement error model.

Background

Measurement error refers to the variation of the observed measurement from the true value, and consists of two components, random error and systematic error. The first component, the random error, is caused by any factors that randomly affect the measurement across a sample, and usually arises from inaccuracy in a measuring laboratory instrument or random fluctuations in the environmental conditions. The second error component, the systematic error, is caused by any factors that systematically affect the measurement across a sample, and can be attributed to non-random problems in the system of measurement (e.g. wrong use or improper calibration of the measurement instrument).

In many scientific areas where statistical analysis is performed, the problem of dealing with explanatory variables subject to measurement error is present. In particular, in epidemiologic studies, the explanatory variables (or 'exposures') that reflect exposure to suspected risk factors associated with a disease (the outcome variable) are commonly measured with error. These errors can be either

Thus, in this paper, we develop a Bayesian method to correct for measurement error in multiple continuous exposures in individually matched case-control studies that may be generalized to different settings, where information regarding the measurement error variability is available from additional experiments. The methodology is illustrated using data from a study of association of perfluorinated acids (PFAs) with disruption of thyroid homeostasis in pregnant women

We start this paper by describing the data in the motivating example in detail, followed by derivation of an estimate of the random error variability from

Methods

Data

The developed Bayesian method is illustrated using individually matched case-control data from a study of Chan et al.

Chan et al. […]th percentile (less than 8.8 pmol/L). Meanwhile, the controls correspond to women with normal TSH concentrations but having free T4 concentrations between the 50th and 90th percentiles (between 12.0 and 14.1 pmol/L). Each case was matched to between one and three controls on the basis of two matching factors: maternal age at blood draw (± 3 years) and referring physician (a total of 29 physicians). Further details on the construction of the data can be found in Chan et al.

In summary, the matched case-control data used to illustrate the Bayesian method to correct for measurement error contain information from 96 cases and 175 individually matched controls. For the purpose of this paper, it is assumed there is no misclassification of control/case status. In addition, the data contain, for each subject, the corresponding exposure to PFOA, PFOS and PFHxS, which are reported on a continuous scale in log-molar units and are assumed to be subject only to random measurement error. Moreover, four potential confounders which are precisely measured are reported: maternal age (years), maternal weight (pounds), maternal race (Caucasian and non-Caucasian) and gestational age (days). All potential confounders except for maternal race were reported on a continuous scale. The maternal age variable is retained despite its use as a matching factor, in case the matching is too coarse to fully eliminate confounding.

Measurement model

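Generally, in observational studies, the vector of imprecise surrogate exposures $\mathbf{W}$ is observed in place of the vector of true exposures $\mathbf{X}$, so a measurement model is needed to describe how $\mathbf{W}$ arises from $\mathbf{X}$.

Assume the vector of independent surrogates $\mathbf{W}$ follows the classical measurement error model

$$\mathbf{W} = \mathbf{X} + \mathbf{U}, \qquad (1)$$

where $\mathbf{U}$ is the vector of random measurement errors. Under this model $\mathbf{W}$ is an unbiased measurement of $\mathbf{X}$, that is, $E(\mathbf{W} \mid \mathbf{X}) = \mathbf{X}$. The error $\mathbf{U}$ is assumed to be independent of $\mathbf{X}$, with $E(\mathbf{U} \mid \mathbf{X}) = \mathbf{0}$, and to be normally distributed with a covariance matrix $\Sigma$ that does not depend on $\mathbf{X}$, so that $\mathbf{U} \mid \mathbf{X} \sim N(\mathbf{0}, \Sigma)$.

Under the stated assumptions, $\mathbf{W} \mid \mathbf{X} \sim N(\mathbf{X}, \Sigma)$, with density

$$f(\mathbf{w} \mid \mathbf{x}) \propto \exp\left\{-\tfrac{1}{2}(\mathbf{w}-\mathbf{x})^{T}\Sigma^{-1}(\mathbf{w}-\mathbf{x})\right\}, \qquad (2)$$

where $\mathbf{w}$ and $\mathbf{x}$ denote realizations of $\mathbf{W}$ and $\mathbf{X}$.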

For the particular case of data used in the study on PFAs, the surrogate variables are measured concentrations of PFOA, PFOS, and PFHxS, which correspond to the exposures to the compounds reported on a continuous scale in log-molar units. Consequently, an additive measurement error model for the exposures in log-molar units translates into a multiplicative error structure, in which the corresponding error term is proportional to the true exposure in molar scale. In many epidemiological studies, positive explanatory variables are subject to this sort of measurement error. Using available validation data from the quality control procedure performed by Chan et al.
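The link between an additive error on the log scale and a multiplicative error on the original scale can be checked numerically. The following is a minimal sketch in Python with simulated values; the error standard deviation, the exposure distribution and all variable names are assumptions made for illustration and are not taken from the study data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values, chosen only for illustration.
sigma_u = 0.2                                  # SD of the additive error on the log scale
x_log = rng.normal(1.0, 0.5, 10_000)           # true log-scale exposures X
u = rng.normal(0.0, sigma_u, x_log.size)       # classical error U, independent of X
w_log = x_log + u                              # surrogate on the log scale: W = X + U

# On the original scale the error acts as a multiplicative factor exp(U).
x_raw = np.exp(x_log)
w_raw = np.exp(w_log)
ratio = w_raw / x_raw                          # equals exp(U) whatever the true exposure is

# The absolute error on the original scale therefore grows with the true exposure.
abs_err = np.abs(w_raw - x_raw)
median_x = np.median(x_raw)
low = abs_err[x_raw < median_x].mean()         # mean absolute error, lower half of exposures
high = abs_err[x_raw >= median_x].mean()       # mean absolute error, upper half of exposures
```

Here `ratio` does not depend on the true exposure, while `high` exceeds `low`, which is the sense in which the error term is proportional to the true exposure on the original scale.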

Disease model

In order to describe a relationship between the true exposures and the probability associated to the response variable, it is necessary to specify a disease model. Since the study analysed in this paper involves matched sets, the conditional logistic regression likelihood is adopted.

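Consider a study having $I$ matched sets, where the $i$-th set contains $n_i$ subjects ($i = 1, \ldots, I$). For the $j$-th subject of the $i$-th matched set, let $Y_{ij}$ denote the disease status (1 for a case, 0 for a control) and $\mathbf{X}_{ij}$ the vector of true exposures.

The conditional likelihood is obtained by conditioning on the number of cases in each matched set, i.e. conditioning on $\sum_{j=1}^{n_i} Y_{ij}$. With one case per matched set, indexed by $j = 1$, it takes the form

$$L(\boldsymbol{\beta}) = \prod_{i=1}^{I} \frac{\exp(\boldsymbol{\beta}^{T}\mathbf{x}_{i1})}{\sum_{j=1}^{n_i}\exp(\boldsymbol{\beta}^{T}\mathbf{x}_{ij})}, \qquad (3)$$

where $\boldsymbol{\beta}$ is the vector of regression coefficients of the exposures.

The parameter $\boldsymbol{\beta}$ is the vector of log odds ratios, so that the exponential of each component is the OR associated with a unit increase in the corresponding exposure.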
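As a minimal sketch of the conditional logistic regression likelihood described above, the Python function below evaluates its logarithm under the assumption of exactly one case per matched set, as in the study data. The function name, argument layout and toy data are hypothetical and serve only as an illustration.

```python
import numpy as np

def conditional_loglik(beta, x, set_id, is_case):
    """Conditional logistic log-likelihood with exactly one case per matched set.

    beta    : (P,) log odds ratios
    x       : (N, P) exposures
    set_id  : (N,) matched-set labels
    is_case : (N,) booleans, True for the case of each set
    """
    beta = np.asarray(beta, dtype=float)
    x = np.asarray(x, dtype=float)
    set_id = np.asarray(set_id)
    is_case = np.asarray(is_case, dtype=bool)
    eta = x @ beta                               # linear predictor beta' x_ij
    total = 0.0
    for s in np.unique(set_id):
        in_set = set_id == s
        case_eta = eta[in_set & is_case]
        if case_eta.size != 1:
            raise ValueError(f"set {s!r} must contain exactly one case")
        eta_s = eta[in_set]
        m = eta_s.max()
        # log of exp(eta_case) / sum_j exp(eta_j), computed stably
        total += case_eta[0] - (m + np.log(np.exp(eta_s - m).sum()))
    return total

# Toy data: one set of size 2 and one set of size 3.
x = np.array([[0.2], [1.0], [0.5], [0.1], [0.4]])
set_id = np.array([0, 0, 1, 1, 1])
is_case = np.array([True, False, False, True, False])
ll_null = conditional_loglik([0.0], x, set_id, is_case)
```

With all coefficients equal to zero, every member of a set is equally likely to be the case, so `ll_null` equals log(1/2) + log(1/3) for these two sets.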

Bayesian model

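Consider retrospectively collected matched case-control data where each case is matched to one or more controls based on suspected confounders as matching factors. Let $\mathbf{X}$ denote the true exposures and $\mathbf{W}$ the corresponding surrogates.

The aim of this subsection is to develop a Bayesian method to understand the association between the vector of continuous exposures $\mathbf{X}$ and the disease status $\mathbf{Y}$ when only the surrogates $\mathbf{W}$ are observed.

Under the Bayesian paradigm, the posterior density of the unknown quantities is given by

$$f(\mathbf{x}, \theta \mid \mathbf{w}, \mathbf{y}) \propto f(\mathbf{x}, \mathbf{w} \mid \mathbf{y}, \theta)\, f(\theta), \qquad (4)$$

where θ refers to the vector of unknown parameters. The first term of the right hand side of (4) refers to the joint posterior distribution of the true exposures $\mathbf{X}$ and the surrogates $\mathbf{W}$.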

Exposure model

The conditional logistic regression model has been successfully applied in matched retrospective case-control studies, and the use of this procedure has been statistically justified using Bayesian (see for example

We justify the use of conditional logistic regression likelihood as a disease model when adjusting for measurement error in an individually matched case-control study via a random-effect exposure model; details are presented in Appendix II. A different approach that does not involve a random-effect exposure model is provided by Guolo et al.

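In order to describe the random-effect exposure model, we assume that the vector of exposures for the $j$-th subject in the $i$-th matched set depends on a random effect shared by all members of that set, as specified in equation (5), where $\gamma$ and $\lambda$ are parameters of the model.

It has been assumed that the vector of true exposures follows a multivariate normal distribution.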

Joint posterior density

For the particular case of this paper, the data considered consist of matched sets, the $i$-th of which has $n_i$ subjects, with $n_i \in \{2,3,4\}$. Thus one subject is the case per set and the remaining $n_i - 1$ subjects are the controls. Let

It is commonly assumed that the unknown parameters are independent of each other

where $N_P$ denotes the $P$-variate normal distribution and $I_P$ is an identity matrix of size $P$, with $P$ the number of exposures.
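To make the structure of the joint posterior concrete, the following Python sketch evaluates an unnormalised log posterior for a single exposure. It is a simplified illustration and not the WinBUGS implementation used in this paper: the normal forms of the random-effect exposure model, the fixed variance components `lam2` and `gam2`, the known error variance `sigma2`, the prior variance and all names are assumptions introduced here.

```python
import numpy as np

def log_normal(v, mean, var):
    """Sum of univariate normal log-densities."""
    v = np.asarray(v, dtype=float)
    return float(np.sum(-0.5 * np.log(2 * np.pi * var) - 0.5 * (v - mean) ** 2 / var))

def log_posterior(beta, mu, alpha, x, w, set_id, is_case,
                  sigma2, lam2, gam2, prior_var=100.0):
    """Unnormalised log posterior for one exposure (illustrative assumptions).

    measurement model : w_ij | x_ij    ~ N(x_ij, sigma2), sigma2 treated as known
    disease model     : conditional logistic likelihood, one case per set
    exposure model    : x_ij | alpha_i ~ N(alpha_i, lam2), alpha_i ~ N(mu, gam2)
    priors            : beta, mu ~ N(0, prior_var), proper but diffuse
    set_id must hold integer labels 0, ..., I-1 so that it can index alpha.
    """
    lp = log_normal(w, x, sigma2)                 # measurement model
    lp += log_normal(x, alpha[set_id], lam2)      # exposure model, within sets
    lp += log_normal(alpha, mu, gam2)             # exposure model, between sets
    lp += log_normal([beta, mu], 0.0, prior_var)  # priors
    eta = beta * x
    for s in np.unique(set_id):                   # disease model
        idx = set_id == s
        lp += eta[idx & is_case].sum() - np.log(np.exp(eta[idx]).sum())
    return lp

# Toy data: two matched sets, one case each.
set_id = np.array([0, 0, 1, 1, 1])
is_case = np.array([True, False, True, False, False])
w = np.array([0.1, 0.4, -0.2, 0.3, 0.0])
alpha = np.zeros(2)
kwargs = dict(sigma2=0.04, lam2=1.0, gam2=1.0)
lp_near = log_posterior(0.5, 0.0, alpha, w.copy(), w, set_id, is_case, **kwargs)
lp_far = log_posterior(0.5, 0.0, alpha, w + 5.0, w, set_id, is_case, **kwargs)
```

In a full analysis the true exposures, random effects and parameters would all be sampled by MCMC; the sketch only shows how the three sub-models and the priors combine additively on the log scale. True exposures far from their surrogates (`lp_far`) receive much lower posterior support than exposures close to them (`lp_near`).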

Adjustment for additional confounders

Considering the possibility that confounding is only partially addressed by matching, further potential confounders can be introduced in the disease model. In general, potential confounders should also be included in the exposure model; however, for simplicity these confounders are not considered in our random-effect exposure model, keeping it as presented in equation (5). For the case of the PFA data, this simplification might be justified by the fact that the exposures and the confounders exhibit small correlations (less than 0.18), so we do not expect the potential confounders to be very helpful in reconstructing the true exposures. In addition, due to the assumption of

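Consider the situation where the vector of potential confounders $\mathbf{Z}$ is added to the linear predictor of the disease model, so that each term $\exp(\boldsymbol{\beta}^{T}\mathbf{x}_{ij})$ of the conditional likelihood becomes $\exp(\boldsymbol{\beta}^{T}\mathbf{x}_{ij} + \boldsymbol{\delta}^{T}\mathbf{z}_{ij})$, where $\boldsymbol{\delta}$ is the vector of regression coefficients of the confounders.

Thus, the posterior density of the unknown quantities can be rewritten as

$$f(\mathbf{x}, \theta, \boldsymbol{\delta} \mid \mathbf{w}, \mathbf{y}, \mathbf{z}) \propto f(\mathbf{x}, \mathbf{w} \mid \mathbf{y}, \mathbf{z}, \theta, \boldsymbol{\delta})\, f(\theta)\, f(\boldsymbol{\delta}),$$

where $\boldsymbol{\delta}$ is assigned a proper and diffuse prior distribution centred at $\mathbf{0}$.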
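One consequence of adding confounders to the conditional likelihood is worth illustrating: a covariate that is constant within every matched set cancels out of the likelihood, whereas a covariate that is only coarsely matched, such as maternal age matched within ± 3 years, still carries information. The Python sketch below demonstrates this with simulated data; all names and values are hypothetical.

```python
import numpy as np

def cond_loglik(beta, delta, x, z, set_id, is_case):
    """Conditional log-likelihood with linear predictor beta'x + delta'z (one case per set)."""
    eta = x @ beta + z @ delta
    ll = 0.0
    for s in np.unique(set_id):
        idx = set_id == s
        ll += eta[idx & is_case].sum() - np.log(np.exp(eta[idx]).sum())
    return ll

rng = np.random.default_rng(1)
set_id = np.repeat(np.arange(4), 3)            # 4 matched sets of size 3
is_case = np.tile([True, False, False], 4)     # first member of each set is the case
x = rng.normal(size=(12, 2))                   # two hypothetical exposures
beta = np.array([0.3, -0.2])

# Covariate matched exactly: constant within each set.
z_exact = np.repeat(rng.normal(size=4), 3)[:, None]
# Covariate matched only coarsely: varies within each set.
z_coarse = z_exact + rng.normal(scale=0.5, size=(12, 1))

ll_exact_0 = cond_loglik(beta, np.array([0.0]), x, z_exact, set_id, is_case)
ll_exact_1 = cond_loglik(beta, np.array([1.0]), x, z_exact, set_id, is_case)
ll_coarse_0 = cond_loglik(beta, np.array([0.0]), x, z_coarse, set_id, is_case)
ll_coarse_1 = cond_loglik(beta, np.array([1.0]), x, z_coarse, set_id, is_case)
```

The likelihood is unchanged by the coefficient of `z_exact`, so that coefficient cannot be estimated, while it does change with the coefficient of `z_coarse`. This is why a matching factor can reasonably be retained as a covariate when the matching is coarse.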

Results and Discussion

In this section, the proposed Bayesian method to correct for measurement error is illustrated using data from the study of Chan et al.

The models are implemented in WinBUGS software, version 1.4.3

**WinBUGS Code**. Code used to perform the Bayesian adjustment for measurement error in a matched case-control study with multiple continuous covariates

Posterior means and 95% equal-tailed credible intervals of the ORs obtained for the models under the

Comparison of posterior means and credible intervals of the ORs

| Exposure | OR | 95% Cred. Int. | Adjusted OR | 95% Cred. Int. |
|---|---|---|---|---|
| **PFOA** | **0.905** | (0.661, 1.209) | **0.828** | (0.584, 1.127) |
| **PFOS** | **0.802** | (0.495, 1.214) | **0.752** | (0.445, 1.181) |
| **PFHxS** | **1.315** | (0.964, 1.755) | **1.302** | (0.934, 1.779) |
| | | | | |
| **PFOA** | **0.904** | (0.656, 1.212) | **0.821** | (0.568, 1.131) |
| **PFOS** | **0.794** | (0.482, 1.221) | **0.743** | (0.431, 1.191) |
| **PFHxS** | **1.333** | (0.960, 1.816) | **1.329** | (0.938, 1.856) |

Posterior means and 95% equal-tailed credible intervals of the ORs for the simple model and the model adjusted for confounding variables. Results are presented for the Bayesian

Figure

- δ

**Posterior distributions of ORs and credible intervals**. Posterior distributions (curves), and corresponding posterior means and 95% equal-tailed credible intervals (vertical lines) of ORs based on the two models under the measurement error analysis. The solid curves/lines correspond to the simple model (ME-S) and the dashed curves/lines correspond to the model adjusted for confounding variables (ME-A).
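The posterior summaries reported for the ORs can be computed directly from MCMC output. The following Python sketch shows one way to obtain a posterior mean and an equal-tailed credible interval from draws of a log odds ratio; the draws are simulated stand-ins and the function name is hypothetical, so this is an illustration and not the code used for the analysis.

```python
import numpy as np

def summarise_or(beta_draws, level=0.95):
    """Posterior mean and equal-tailed credible interval of OR = exp(beta)."""
    or_draws = np.exp(np.asarray(beta_draws, dtype=float))
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(or_draws, [tail, 1.0 - tail])
    return or_draws.mean(), lower, upper

rng = np.random.default_rng(42)
# Stand-in for MCMC draws of a log odds ratio (not real study output).
fake_draws = rng.normal(loc=-0.1, scale=0.15, size=20_000)
mean_or, lo, hi = summarise_or(fake_draws)
```

The draws are exponentiated before averaging, so `mean_or` is the posterior mean of the OR itself and not the exponential of the posterior mean of the log odds ratio. The equal-tailed interval places 2.5% of the posterior probability in each tail.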

A sensitivity analysis of the measurement error variability

**Sensitivity analysis of measurement error variability**. Posterior means and 95% equal-tailed credible intervals of the ORs for different scenarios of measurement error variability. The solid lines correspond to the simple model (ME-S) and the dashed lines to the model adjusted for confounding variables (ME-A).
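Some intuition for why larger assumed measurement errors lead to larger adjustments comes from the familiar attenuation result for a single exposure in a linear model under classical error. The Python sketch below computes that attenuation factor for a few hypothetical error variances and checks it by simulation. It is offered only as intuition: it is a linear-model analogue with assumed values, not the conditional logistic model with several correlated exposures analysed in this paper.

```python
import numpy as np

def reliability_ratio(var_x, var_u):
    """Attenuation factor var_x / (var_x + var_u) under classical error W = X + U."""
    return var_x / (var_x + var_u)

var_x = 1.0                                     # hypothetical variance of the true exposure
error_vars = np.array([0.0, 0.05, 0.25, 1.0])   # hypothetical error variances
ratios = reliability_ratio(var_x, error_vars)

# Empirical check with a linear outcome: the slope on W is about ratio * slope on X.
rng = np.random.default_rng(7)
n, true_slope = 200_000, 1.0
x = rng.normal(0.0, np.sqrt(var_x), n)
y = true_slope * x + rng.normal(0.0, 1.0, n)
w = x + rng.normal(0.0, np.sqrt(0.25), n)       # error variance 0.25
naive_slope = np.cov(w, y)[0, 1] / np.var(w, ddof=1)
```

When the error variance is small relative to the exposure variance the factor is close to one and little correction is needed, while the factor falls towards one half as the two variances become equal.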

Conclusions

We propose a Bayesian method to correct for measurement error in multiple continuous exposures for individually matched case-control studies. This method assumes a classical measurement model in order to account for random error in the exposures. It uses the conditional logistic regression likelihood as a disease model. We justify the use of this model in the presence of measurement error in the exposures by having a random-effect exposure model.

The proposed method can be implemented in WinBUGS software, which manages the computational complexity associated with likelihood-based approaches, to which Guolo et al.

For the particular case of the study on PFAs, Bayesian inference of ORs indicates that little adjustment for exposure measurement error is needed for the magnitude of error determined from the quality control experiment. However, bigger adjustments arise if larger measurement errors are assumed.

Some avenues for future research are suggested by our results. First, the method assumes a multivariate normal distribution on the exposures. However, it is important to keep in mind that a model misspecification may lead to biased estimates. In this context some authors have proposed the use of parametric and non-parametric flexible models. Nevertheless, some complications are involved in their implementations. For instance, Richardson et al.

Second, we have not made explicit comparisons between our method and other methods. We have, however, considered implementation issues for our method versus others. Particularly, we considered regression calibration techniques which impute best-guess exposure values and then plug these into the disease model. While this is a simple procedure with some data formats, it would be no simpler than our method in the present format. The imputation involves estimating $\mathbf{X}$ given $\mathbf{W}$.

Finally, using available information from the quality control experiment performed on the PFA concentrations and the multivariate version of the delta method, we present a statistical approach to estimate the measurement error variability. However, different assumptions and estimation methods can be developed in the presence of additional validation data or a different structure of quality control data. For instance, the complicated structure of the percent recovery experiments necessitated a 'plug-in' approach to dealing with the measurement error covariance matrix. Simpler data structures for informing the measurement error variance, such as a validation subsample, replicates, or an instrumental variable, would much more easily lend themselves to incorporating uncertainty about this covariance matrix as part of the overall Bayesian analysis.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

GEH developed and implemented the Bayesian method under the supervision of PG. IB designed the epidemiological study, oversaw its conduct and identified the need to better understand the impact of measurement error on

Appendix I. Measurement error variability estimation

In reference to the epidemiological matched case-control study on PFAs, Chan et al.

Let

Let $c_p$ be the factor used to convert molar units to ppb concentrations, corresponding to exposure $p$

where $W_p$, $W_{p,\mathrm{spiked}}$ and $W_{p,\mathrm{gold}}$ correspond to the unspiked serum, spiked serum and gold standard samples in log-molar concentrations, respectively.

Using the normality and homogeneity assumptions of the error component, the corresponding equation (1) for a particular exposure

Therefore, it follows that

By substituting these three equations into (AI.1), it is possible to see that

Moreover, according to the description of the samples in the quality control procedure, $X_{p,\mathrm{spiked}}$ and $X_{p,\mathrm{gold}}$, which correspond to the true spiked serum and gold standard samples in log-molar concentrations, have the following underlying structures

Substitution of (AI.3) into equation (AI.2) yields

where $k_p = c_p \exp(X_p)$. Therefore, the percentage of recovery corresponding to exposure $p$ is a function of $\boldsymbol{\varepsilon}_p = (\varepsilon_p, \varepsilon_{p,\mathrm{spiked}}, \varepsilon_{p,\mathrm{gold}})^{T}$. Assuming $(\varepsilon_p, \varepsilon_{p,\mathrm{spiked}}, \varepsilon_{p,\mathrm{gold}})$ are independent, $\boldsymbol{\varepsilon}_p$ follows a trivariate normal distribution with a mean vector of zeros and a covariance matrix equal to the identity matrix. Thus, based on the multivariate delta method, the variance of the percent recovery is given by

where $\nabla$ denotes the gradient of the percent recovery with respect to $\boldsymbol{\varepsilon}_p$. Using the results of the quality control procedure (standard deviations of the percentages of recovery for PFAs in ppb concentrations), and by taking $k_p$ as the sample average of the ppb concentrations recorded for exposure $p$ across samples, estimates for the measurement error variability for each exposure can be obtained as follows

Using that information, the estimate of covariance matrix
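The multivariate delta method step used in this appendix can be sketched numerically. In the Python example below, the form of the percent recovery function, 100 × (spiked − unspiked) / gold on the original scale, and all numerical values are assumptions made for illustration; they stand in for the expressions of this appendix and are not the quantities estimated from the quality control data.

```python
import numpy as np

def percent_recovery(eps, sigma, x_unspiked, x_spiked, x_gold):
    """Hypothetical percent recovery as a function of standardised errors.

    eps = (eps_unspiked, eps_spiked, eps_gold); each measured log concentration
    is the true log concentration plus sigma * eps.
    """
    unspiked = np.exp(x_unspiked + sigma * eps[0])
    spiked = np.exp(x_spiked + sigma * eps[1])
    gold = np.exp(x_gold + sigma * eps[2])
    return 100.0 * (spiked - unspiked) / gold

def delta_variance(g, cov, point, h=1e-6):
    """First-order delta method: grad(g)' cov grad(g), with a central-difference gradient."""
    point = np.asarray(point, dtype=float)
    grad = np.zeros_like(point)
    for k in range(point.size):
        step = np.zeros_like(point)
        step[k] = h
        grad[k] = (g(point + step) - g(point - step)) / (2.0 * h)
    return float(grad @ cov @ grad)

sigma = 0.05                                   # hypothetical error SD on the log scale
x_unspiked, x_gold = np.log(2.0), np.log(10.0)
x_spiked = np.log(12.0)                        # assumed: spiked = unspiked + gold on the original scale

def g(e):
    return percent_recovery(e, sigma, x_unspiked, x_spiked, x_gold)

# Errors are standardised and independent, so their covariance is the identity.
var_delta = delta_variance(g, np.eye(3), np.zeros(3))

# Monte Carlo check of the first-order approximation.
rng = np.random.default_rng(3)
eps = rng.standard_normal((200_000, 3))
var_mc = percent_recovery(eps.T, sigma, x_unspiked, x_spiked, x_gold).var()
```

In practice the calculation runs in the opposite direction: the standard deviation of the percent recovery is observed in the quality control experiment, and the delta-method expression is solved for the error variance on the log scale. The sketch only verifies that the first-order approximation is accurate when that variance is small.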

Appendix II. Justification for conditional likelihood in matched case-control studies with measurement error in continuous exposures

Bayesian justifications for using conditional likelihood when actual exposure is observed are given by Rice

Under the Bayesian paradigm, for individually matched case-control data retrospectively collected and subject to measurement error, the joint posterior model of the true exposure and surrogate variables for a specific stratum (matched set)

The first term of the right hand side of (AII.1) is obtained under the assumption of

- Y

- X

- Y

Notice the distribution of

- Y

Since

Thus, the joint posterior density of the true exposure and surrogate variables for a specific stratum

where the conditional density of (

- W

- X

- X

- X

- S

Acknowledgements

The authors thank JW Martin, E Chan, F Bamforth and NM Cherry of the University of Alberta (Edmonton, Canada) for their contribution to generating data that motivated our work. This research was financially supported by the Canadian Institutes for Health Research (Funding Reference Number 62863).
