Abstract
Background
Reliable exposure data are a vital concern in medical epidemiology and intervention studies. The present study addresses the medical researcher's need to spend the monetary resources devoted to exposure assessment with optimal cost-efficiency, i.e. to obtain the best possible statistical performance at a specified budget. A few previous studies have suggested mathematical optimization procedures based on very simple cost models; this study extends the methodology to also cover nonlinear cost scenarios.
Methods
Statistical performance, i.e. efficiency, was assessed in terms of the precision of an exposure mean value, as determined in a hierarchical, nested measurement model with three stages. Total costs were assessed using a corresponding three-stage cost model, allowing costs at each stage to vary nonlinearly with the number of measurements according to a power function. Using these models, procedures for identifying the optimally cost-efficient allocation of measurements under a constrained budget were developed and applied to 225 scenarios combining different sizes of unit costs, cost function exponents, and exposure variance components.
Results
Explicit mathematical rules for identifying the optimal allocation could be developed when cost functions were linear, while nonlinear cost functions implied that part of, or the entire, optimization procedure had to be carried out using numerical methods.
For many of the 225 scenarios, the optimal strategy consisted of measuring on only one occasion from each of as many subjects as allowed by the budget. Significant deviations from this principle occurred if costs for recruiting subjects were large compared to costs for setting up measurement occasions and, at the same time, the between-subjects to within-subject variance ratio was small. In these cases, nonlinearities had a profound influence on the optimal allocation and on the eventual size of the exposure data set.
Conclusions
The analysis procedures developed in the present study can be used for informed design of exposure assessment strategies, provided that data are available on exposure variability and on the costs of collecting and processing data. However, the present shortage of empirical evidence on costs and appropriate cost functions impedes general conclusions on optimal exposure measurement strategies in different epidemiologic scenarios.
Background
Reliable exposure assessment is a vital concern in medical epidemiology and intervention research. In occupational as well as public health studies, exposure is often monitored using equipment that allows data to be collected at a high resolution for long periods and on repeated occasions (e.g. [1-4]). Considerable emphasis has been put on developing and applying methods for analyzing sources of exposure variability in such data, in terms of so-called variance components [5-8]. As an example, variance components pertaining to, e.g., companies, occupations, subjects, days within subjects, and exposure samples within days have been determined for a large number of airborne, dermal, and biomechanical exposures in working life (e.g. [2,3,9-15]). These variance components have been utilized as a means of identifying targets for surveillance, intervention and prevention [6,16,17], as well as for designing effective exposure assessment strategies producing information at a desired level of precision. While an extensive literature deals with the consequences of random exposure variability for bias and precision in exposure-outcome relationships [18-22], some attention has also been paid to the use of variance components for estimating sampling needs in studies examining compliance with exposure limits [6], and in studies comparing groups [12] or conditions [13] as in an intervention scenario. In the latter case, the requirement for reliable exposure data can be expressed as a need to obtain estimates of the mean exposure of individuals or groups with sufficient precision to arrive at a confidence interval of acceptable size, or to secure an acceptable statistical power in a specified hypothesis test. Generalized formulae are available for estimating statistical efficiency, i.e. the relationship between the precision of a mean exposure estimate, on the one hand, and, on the other, the size of relevant variance components and the number of measurements at the corresponding sampling stages [23,24]. The most frequently applied measurement model is hierarchical and random with two or three nested stages, for instance subjects and days within subjects [2,25,26]; subjects, days within subjects and samples within days [12,27]; or groups, subjects within groups, and days within subjects [28]. A few attempts have been made to apply more complicated models, e.g. including crossed (non-nested) components related to the distribution of measurement days among subjects [29] or associated with methodological variance [11]. Also, mixed models including fixed determinants of exposure in addition to random effects are in increasing use [13,30-33].
Some studies have been particularly devoted to understanding how the precision of an estimated group mean exposure is affected by allocating measurement efforts in different ways between and within subjects [12], between occupational recordings and data processing [11], or across time within a measurement day [34,35]. This has led to a number of principles for statistically efficient exposure assessment, i.e. measurement strategies that perform well at a specified investment of measurement resources, or, equivalently, yield a specified performance with comparatively small measurement efforts [12,34]. As one trivial conclusion, more data generally lead to better statistical performance; furthermore, efficiency increases if measurements are allocated to higher sampling stages in the hierarchical model [23].
At the same time, more measurements inevitably imply larger monetary costs. While budget constraints are the pragmatic reality in most exposure assessments, surprisingly few studies have addressed the issue of how to design a measurement strategy so as to obtain the best possible statistical efficiency with the available monetary resources [36]. This endeavor is not equivalent to addressing statistical efficiency per se, as introduced above, since measurements at different stages may entail different costs. For instance, increasing the number of groups may be considerably more expensive than collecting data from more subjects in an existing group, and the process of identifying and approaching a new subject may be more expensive than obtaining more measurements from a subject already in the sample population. Also, different measurement instruments yielding the same exposure variables may imply different costs, in particular if the risk of measurement failures is acknowledged [37]. Of the limited literature devoted to efficiency and cost in data collection, some studies compare a selection of measurement strategies in order to identify the one superior in cost-efficiency [38-41]. A few studies take on the more challenging task of determining the optimally cost-efficient strategy at a certain budget, on the basis of specified costs for collecting data at different stages and specified sizes of the corresponding variance components. The general significance of examining cost-efficiency in data collection is illustrated by previous studies appearing in a variety of research areas, including occupational hygiene [38], environmental medicine [39,42,43], clinical chemistry [44], and nutrition [45].
Basically, optimization in the case of exposure assessment strives to identify data collection strategies at the frontier of possible relationships between cost and statistical efficiency (figure 1).
Figure 1. The notion of optimal cost-efficiency. The horizontal axis illustrates the total cost associated with an exposure measurement strategy, and the vertical axis shows the variance of the resulting mean exposure. The frontier curve illustrates the minimal obtainable variance at each level of spending, i.e. the best possible statistical performance, e.g. s^{2}_{μ}*, at a particular total cost, e.g. c*. Strategies above the frontier are, in principle, possible, but do not yield an optimal performance. No strategies occur below the frontier.
Previous optimization studies have addressed hierarchical models with two [45-47] or three [43,44,47] stages, as well as the optimal allocation of measurements between two alternative yet correlated instruments for data collection [42,48,49]. All of these studies have, however, assumed that the price of one measurement unit at each stage is constant, implying that costs increase in a linear fashion at that stage, proportionally to the number of samples. Only in an appendix of the paper by Duan and Mage [42] does an empirical example appear of the quite likely case that unit costs vary with the number of measurements; for instance, subjects recruited late in a study may require more time for persuasion, and thus entail larger labor costs, than subjects signing up immediately. Also, in his textbook on sampling strategies, Cochran [47] reports some nonlinear cost functions in other areas of data collection, and additional examples appear in Groves [50]. In addition, the cited cost-efficiency studies do not, in general, consider whether the identified optimal strategies are feasible under the constraints dictated by a specified, yet limited budget.
Thus, the present paper is devoted to deriving methods for optimizing exposure assessment strategies, in terms of offering the best possible trade-off between total costs and statistical efficiency. In contrast to previous literature, this study explores optimal cost-efficiency also when cost functions are not linear and budget constraints apply, and it identifies alternative optimization procedures in those cases where analytical closed-form solutions cannot be developed.
First, the paper presents a general theoretical model of cost and efficiency when assessing exposure mean values in occupational groups, including some theoretical results based on that model. Then, the general model is simplified, and procedures are derived for identifying optimally cost-efficient exposure assessment strategies, depending on the shapes of the cost functions. These results are illustrated by numerical examples. A general discussion of the representativeness and sensitivity of the suggested optimization procedures concludes the paper.
Methods
A framework for cost-efficient exposure assessment
Exploring cost-efficiency at an ordinal level only requires a specification of the properties of the mathematical function associating each exposure assessment strategy with its stated statistical objective. If, however, the goal of the cost-efficiency analysis is to compare or optimize strategies in explicit, quantitative terms, specific functional forms need to be identified that parameterize objectives and costs. This is a necessary requirement when searching for the (occasionally more than one) strategy that maximizes efficiency among the large set of possible assessment strategies entailing a particular cost.
Thus, three major issues must be considered as part of a quantitative analysis of cost-efficient resource consumption: (1) why resources are used, i.e. the objective of collecting data, (2) how many resources are required to fulfil the objective, expressed in terms of unit costs, and (3) whether the intended strategy for resource consumption is feasible. When examining cost-efficient assessments of group mean exposure we thus need to know (1) the relationship between the group mean and the assessment strategy, as reflected by what is usually referred to as the objective function, (2) the amount of monetary resources required to realise a particular assessment strategy, expressed by the cost function, and (3) the amount of monetary resources at our disposal, as reflected by the budget constraint.
The objective function: precision of the mean
For a hierarchical three-stage balanced data set (subjects, occasions within subject, samples within occasion), the group mean exposure, μ, can be estimated using a "mean of means" approach [23] as:

μ̂ = (1/n_{s}) Σ_{i} [(1/n_{d}) Σ_{j} ((1/n_{q}) Σ_{k} x_{k(ij)})]

where x_{k(ij)} is an individual exposure sample, collected from subject i on occasion j; n_{s} is the number of subjects included in the data set; n_{d} is the number of distinct measurement occasions, for instance days, per subject; and n_{q} is the number of samples, or quanta, per measurement occasion. Accordingly, averaging is made across quanta within each occasion, then across occasions within each subject, and finally across subjects.
A general formula for determining the variance of this group mean exposure estimate, s^{2}_{μ}, has been proposed and applied by several authors [12,23,44,47]. This objective function takes the form:

s^{2}_{μ} = s^{2}_{BS}/n_{s} + s^{2}_{WS}/(n_{s}n_{d}) + s^{2}_{WD}/(n_{s}n_{d}n_{q})     (1)

where s^{2}_{BS}, s^{2}_{WS}, and s^{2}_{WD} are the variances between subjects, between measurement occasions within each subject, and between quanta within occasions, respectively. The size of a quantum can be defined as convenient, and previous studies have used quanta of, for instance, one minute [34,51], one work cycle [11,13,52,53], several consecutive work cycles [12,54], and one hour [55]. Thus, equation (1) gives an estimate of the precision of a group mean exposure resulting from a particular measurement strategy in terms of subjects, occasions and quanta, in a setting with known components of exposure variability.
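As a minimal numerical sketch of equation (1), the variance of the group mean can be computed from the three variance components and the three sample numbers; the component values below are hypothetical, chosen only for illustration:

```python
def mean_variance(s2_bs, s2_ws, s2_wd, n_s, n_d, n_q):
    """Variance of the estimated group mean exposure, equation (1):
    contributions from between subjects, between occasions within
    subjects, and between quanta within occasions."""
    return s2_bs / n_s + s2_ws / (n_s * n_d) + s2_wd / (n_s * n_d * n_q)

# Hypothetical components (1.0, 0.5, 0.25): note that only n_s divides the
# between-subjects component, which is why extra subjects pay off most.
variance = mean_variance(1.0, 0.5, 0.25, 10, 3, 4)
```

With these illustrative values the result is 1/10 + 0.5/30 + 0.25/120 = 0.11875, showing that the between-subjects term dominates the sum.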
The cost function
While all cost functions suggested in the literature have been linear, the cost associated with collecting n_{q} quanta on each of n_{d} occasions for each of n_{s} subjects can be assessed also in a nonlinear case, provided that information is available on the "capability" to recruit subjects, that is, the amount of resources needed for recruiting any specific number of subjects, and on the equivalent capabilities for setting up measurement occasions within each subject and collecting quanta within each occasion.
Assume first that these three capabilities are all homogeneous of degree k, in the sense that if all resources are multiplied by a certain factor, x (x > 1), output will increase by x^{k}. This is a common assumption in economics addressing nonlinear production capabilities. For example, if k = 1 and the resources allocated to the process of recruiting subjects are doubled, then the number of subjects recruited will also double; this is simple proportional linearity. In the case of k = 0.5, doubled recruitment resources would lead to an increase in the number of recruited subjects by a factor of 2^{0.5} ≈ 1.4. Assume further that the resources needed for setting up n_{d} measurement occasions, each containing n_{q} quanta, do not depend on the subject from whom data are collected, and that the resources needed to collect n_{q} quanta on a particular measurement occasion for a particular subject are independent of occasion and subject.
The first of these two assumed capability properties allows cost functions for recruiting subjects, c_{s}, setting up measurement occasions within each subject, c_{d}, and collecting measurement quanta within each occasion, c_{q}, to be expressed as:

c_{s}(n_{s}) = π_{s}n_{s}^{α}; c_{d}(n_{d}) = π_{d}n_{d}^{β}; and c_{q}(n_{q}) = π_{q}n_{q}^{γ},

where the π-values are the costs for obtaining one measurement unit at each stage of data collection, so-called unit costs, and α, β and γ are parameters, all larger than 0, describing the shape of a power relationship between the number of measurement units and costs.
The relationship between the value(s) of π and the exponents α, β and γ can be illustrated by examining the cost functions. If, for instance, α = 1, the cost of recruiting n_{s} subjects is c_{s}(n_{s}) = π_{s}⋅n_{s}, i.e. the cost increases in direct proportion to the number of subjects. In this case, π_{s} is the one-unit cost (c_{s}(1) = π_{s}), as well as the marginal cost of recruiting any additional subject (∂c_{s}/∂n_{s} = π_{s}). If α ≠ 1, π_{s} is still the one-unit cost, but the marginal cost is now ∂c_{s}/∂n_{s} = απ_{s}n_{s}^{α-1}. Thus, if α > 1, the marginal cost of including an additional subject increases with the number of subjects, while it decreases when 0 < α < 1.
The second capability property assumed above implies that the total cost of collecting a data set including n_{s} subjects, each observed on n_{d} occasions, each containing n_{q} quanta, can be stated as c_{s}(n_{s}) + n_{s}c_{d}(n_{d}) + n_{s}n_{d}c_{q}(n_{q}), which equals:

c(n_{s}, n_{d}, n_{q}) = π_{s}n_{s}^{α} + n_{s}π_{d}n_{d}^{β} + n_{s}n_{d}π_{q}n_{q}^{γ}     (2)
This cost function presents a generalisation of previously suggested linear cost functions [43,44,46] by permitting both linear and nonlinear relationships between the sample size at different stages of data collection and the cost of obtaining data. With (α, β, γ) = (1,1,1), equation (2) takes the customary linear form used in previous studies. Notably, equation (2) only expresses the variable costs associated with measurement; possible fixed costs, which do not depend on the number of samples, need to be added to give the total cost of collecting the data set, but will not affect the optimization procedures developed below [41,43].
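Equation (2) can be sketched directly in code, with the exponents defaulting to the linear case; the unit costs in the example call are the illustrative values that also appear in figure 3:

```python
def total_cost(n_s, n_d, n_q, pi_s, pi_d, pi_q, alpha=1.0, beta=1.0, gamma=1.0):
    """Variable cost of data collection, equation (2): recruiting subjects,
    setting up occasions within subjects, and collecting quanta within
    occasions, each stage following a power cost function."""
    return (pi_s * n_s**alpha
            + n_s * pi_d * n_d**beta
            + n_s * n_d * pi_q * n_q**gamma)

# Linear case (alpha = beta = gamma = 1) with unit costs (2, 10, 10):
linear = total_cost(5, 2, 1, 2, 10, 10)          # 2*5 + 5*10*2 + 5*2*10 = 210
# Decreasing marginal occasion costs (beta = 0.5) make extra occasions cheaper:
concave = total_cost(5, 2, 1, 2, 10, 10, beta=0.5)
```

Setting (α, β, γ) = (1, 1, 1) reproduces the customary linear cost model of the earlier literature, so the function can be checked against hand calculations in that case.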
The general optimization problem
If a data collection is allowed to consume a total budget R (after possible reduction by fixed costs), combinations of n_{s}, n_{d} and n_{q} that optimize the output, i.e. minimize the resulting variance of the estimated mean exposure, can be retrieved by solving the following optimization problem:

minimize s^{2}_{μ} = s^{2}_{BS}/n_{s} + s^{2}_{WS}/(n_{s}n_{d}) + s^{2}_{WD}/(n_{s}n_{d}n_{q})

with respect to n_{s}, n_{d}, n_{q}; subject to the constraint:

π_{s}n_{s}^{α} + n_{s}π_{d}n_{d}^{β} + n_{s}n_{d}π_{q}n_{q}^{γ} ≤ R
Due to the nonlinear property of this three-variable equation system, explicit solutions for optimization can be derived only in exceptional cases. Moreover, solutions to a three-variable problem are difficult to illustrate graphically. Therefore, the following analysis will be limited to cases in which the number of quanta, n_{q}, within each measurement occasion is not a choice variable. This situation occurs, for instance, when exposure is assessed for complete days, or when the within-day schedule of data sampling cannot or should not be manipulated for reasons of logistics or feasibility.
The two-variable reduction
Given a predetermined number of sampled quanta within each measurement occasion, the general optimization problem above is reduced to the two-variable problem of identifying optimal values of n_{s} and n_{d}. This allows graphical illustrations of the problem and its solutions. It also opens the way for further simplification into one-variable optimisation problems, which in many cases can be solved explicitly, as shown in the results section.
The two-variable problem takes the form:

minimize s^{2}_{μ} = s^{2}_{BS}/n_{s} + s^{2}_{μWD}/(n_{s}n_{d})     (3)

with respect to n_{s}, n_{d}; subject to the constraint:

π_{s}n_{s}^{α} + n_{s}π_{d}n_{d}^{β} + n_{s}n_{d}c_{q} ≤ R     (4)

In these equations, the terms s^{2}_{μWD} = s^{2}_{WS} + s^{2}_{WD}/n_{q} and c_{q} = π_{q}n_{q}^{γ} have been substituted into the three-variable expressions of mean exposure variance (equation (1)) and cost (equation (2)), respectively. This notation emphasizes that the specific variance of an exposure estimate obtained at one measurement occasion, s^{2}_{μWD}, and the cost of collecting data within each occasion, c_{q}, are no longer allowed to vary.
In principle, the two-variable problem can be solved by applying constrained optimization techniques, i.e. by employing the problem's Lagrange function (e.g. [56]). As an alternative, the budget constraint, equation (4), can be substituted into the objective function, equation (3), so as to get a new objective function, which expresses the variance as a function of only one variable, be it either n_{s} or n_{d}. This approach relies on the prerequisite that any solution to the optimization problem entails that the entire budget R is consumed. In that case, the budget constraint (equation (4)) can be replaced by an equality:

π_{s}n_{s}^{α} + n_{s}π_{d}n_{d}^{β} + n_{s}n_{d}c_{q} = R     (4a)
Isolating n_{s} or n_{d} from equation (4a), followed by substitution into equation (3), yields a one-variable objective function, s^{2}_{μ}(n_{i}), with i = s or i = d. This function can be examined using standard methodologies for identifying and illustrating possible local minima within a specified choice set. The resulting optimal value of either n_{s} or n_{d} can then be entered into the budget constraint to get the optimal value of the other variable.
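The substitution step can be sketched for the special case β = 1, in which n_{d} can be isolated from the budget equality (4a); the parameter values in the test of use are illustrative, and the variance components are hypothetical:

```python
def substituted_objective(n_s, R, pi_s, pi_d, c_q, s2_bs, s2_mwd, alpha=1.0):
    """One-variable objective s2_mu(n_s): the budget equality (4a) is solved
    for n_d (possible in closed form here because beta = 1) and substituted
    into the two-variable variance expression, equation (3)."""
    n_d = (R - pi_s * n_s**alpha) / (n_s * (pi_d + c_q))  # from equation (4a)
    if n_d <= 0:
        raise ValueError("n_s subjects are not affordable within the budget R")
    return s2_bs / n_s + s2_mwd / (n_s * n_d)
```

For example, with R = 500, unit costs (π_{s}, π_{d}, c_{q}) = (2, 10, 10) and both variance components set to 1, choosing n_{s} = 10 leaves budget for n_{d} = 2.4 occasions per subject, and the substituted objective returns the corresponding variance.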
The one-variable substitution approach
The core challenge in the substitution approach outlined in the previous section is to identify the exposure assessment strategy, within the choice set defined by the budget constraint, for which the objective function, i.e. equation (3) with substituted n_{s} or n_{d}, has its minimal value. This can, in principle, be accomplished by determining the derivative of the objective function and finding its roots.
Figure 2 illustrates four principally different cases of how the objective variance function may look as a function of invested resources. At the lower boundary of the choice set, all resources are spent on one unit of n_{i}, and at the upper boundary on as many n_{i }as allowed by the budget, n_{i,max}. Thus, if i = s, these two boundaries correspond to allocating as many measurement occasions as possible to one subject, and obtaining measurements at one occasion from as many subjects as possible.
Figure 2. Principally different cases of local extremes of the onevariable objective variance function. The boundaries of possible resource investment, i.e. the choice set, are given by n_{i }= 1 and n_{i }= n_{i,max}. In I1, the variance function has a local minimum at n_{i }= n_{i}*; this is an interior optimal solution with minimal variance. For I2, the variance function also has an interior zero derivative, at n_{i }= n_{i}°, but this solution maximizes the variance and is therefore not useful. In cases E1 and E2, the local extreme of the variance function lies below and above the choice set, respectively. In these cases, minimal variance is obtained at the lower (E1) and upper (E2) choice set boundary.
As a general procedure, the optimal n_{i} for a given budget can be found by comparing the performance obtained: (1) at the lower boundary of the choice set, i.e. using n_{i} = 1, (2) at the upper boundary of the choice set, i.e. with n_{i} = n_{i,max}, and (3) at any values of n_{i} in the interior of the choice set, 1 < n_{i} < n_{i,max}, for which ds^{2}_{μ}/dn_{i} = 0.
Thus, examining the properties of the objective function, s^{2}_{μ}(n_{i}), at the boundaries of the choice set is an appropriate first step for identifying the optimal allocation of resources. Provided that the objective function has one unique minimum, i.e. that the objective function is convex (I1, E1 and E2 in figure 2), a necessary, but also sufficient, condition for the optimum to be internal (case I1) is that ds^{2}_{μ}/dn_{i} < 0 at n_{i} = 1 and ds^{2}_{μ}/dn_{i} > 0 at n_{i} = n_{i,max}. The exact location of the internal minimum can then be retrieved in a second step. The basic shape of the objective function can be determined by examining its second-order derivative. If this derivative is positive, the function is convex; if not, it is concave (case I2), and the optimal strategy will be at one of the choice set boundaries.

If a convex objective function does not have an internal minimum, as in cases E1 and E2 in figure 2, the optimal strategy is found at a boundary of the choice set. In case E1, which occurs if ds^{2}_{μ}/dn_{i} > 0 already at n_{i} = 1, the optimal strategy is to set n_{i} = 1, that is, collect data from only one subject (if i = s), or have only one measurement occasion per subject (if i = d). Case E2 is characterized by a decreasing objective function at n_{i} = n_{i,max}, i.e. ds^{2}_{μ}/dn_{i} < 0 at n_{i} = n_{i,max}. In this case, if i = s, the best choice will be to measure as many subjects as possible and hence only one occasion per subject, or, if i = d, to collect data on as many occasions as possible from only one subject.
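The boundary examination above can be sketched as a small classifier, assuming a convex objective whose derivative d_obj can be evaluated at any point (the function name is illustrative, not from the original text):

```python
def classify(d_obj, n_i_max):
    """Locate the minimum of a convex one-variable objective from the sign of
    its derivative at the choice-set boundaries (cases I1, E1, E2 of figure 2)."""
    if d_obj(1.0) > 0:
        return "E1"   # already increasing at n_i = 1: optimum at lower boundary
    if d_obj(float(n_i_max)) < 0:
        return "E2"   # still decreasing at n_i,max: optimum at upper boundary
    return "I1"       # derivative changes sign: interior minimum in (1, n_i,max)
```

For instance, the derivative d(n) = -1/n^2 + 0.01 changes sign at n = 10, so with an upper boundary of 50 the classifier reports the interior case I1.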
Results
Below, procedures for determining optimal sampling strategies are developed using the one-variable substitution approach described above. The procedures are stratified according to the sizes of α and β, which determine the shape of the cost function (equation (4a)) and hence the form of the substituted objective function, s^{2}_{μ}(n_{i}). For each combination of α and β, the objective function is examined, and the boundaries of the choice set are determined. Procedures for determining whether the objective function is convex (cases I1, E1 and E2 in figure 2) or concave (case I2) are described where needed. For convex functions, explicit rules are, if possible, developed for when (case I1) and when not (cases E1, E2) the optimal measurement allocation occurs within the choice set. Finally, procedures for identifying an optimal sampling strategy inside the choice set (case I1) are described.
Case A: α = 1, β = 1
In this case, the marginal costs of including another subject or measurement occasion are both independent of the number of previously included subjects and occasions. Thus, the cost function is linear at both of these stages.
Case A; substitution and objective function
With α = β = 1, the budget constraint (equation (4a)) can be expressed as:

n_{d} = (R - π_{s}n_{s})/(n_{s}(π_{d} + c_{q}))     (5)

Substituting this expression for n_{d} in equation (3) gives the corresponding objective function:

s^{2}_{μ}(n_{s}) = s^{2}_{BS}/n_{s} + s^{2}_{μWD}(π_{d} + c_{q})/(R - π_{s}n_{s})     (6)

Taking the derivative with respect to n_{s} yields:

ds^{2}_{μ}/dn_{s} = -s^{2}_{BS}/n_{s}^{2} + s^{2}_{μWD}π_{s}(π_{d} + c_{q})/(R - π_{s}n_{s})^{2}     (7)

Setting V = s^{2}_{BS}/s^{2}_{μWD}, equation (7) can be expressed as:

ds^{2}_{μ}/dn_{s} = s^{2}_{μWD}[-V/n_{s}^{2} + π_{s}(π_{d} + c_{q})/(R - π_{s}n_{s})^{2}]     (7a)

This one-variable objective function is convex in n_{s}, since the derivative of equation (7a) is positive for all n_{s} in the choice set.
Case A; boundaries of the choice set
With α = β = 1, the choice set boundaries in terms of n_{s} are n_{s} = 1 and n_{s,max} = R/(π_{s} + π_{d} + c_{q}); the latter is obtained by setting n_{d} = 1 in the budget constraint, equation (4a), and solving for n_{s}.

At n_{s} = 1, equation (7a) takes the form: ds^{2}_{μ}/dn_{s} = s^{2}_{μWD}[-V + π_{s}(π_{d} + c_{q})/(R - π_{s})^{2}]. Thus, a positive derivative at n_{s} = 1 occurs when:

s^{2}_{BS}/s^{2}_{μWD} < π_{s}(π_{d} + c_{q})/(R - π_{s})^{2}     (8)

This gives a necessary and sufficient condition that the optimal allocation of measurements is obtained with n_{s} = 1, and hence with n_{d} = (R - π_{s})/(π_{d} + c_{q}) measurement occasions per subject (cf. equation (5)).

At the other boundary, n_{s,max} = R/(π_{s} + π_{d} + c_{q}), the derivative of the objective function is:

ds^{2}_{μ}/dn_{s} = [(π_{s} + π_{d} + c_{q})^{2}/R^{2}][π_{s}s^{2}_{μWD}/(π_{d} + c_{q}) - s^{2}_{BS}]

This derivative is negative only when the bracketed factor is negative, i.e. π_{s}s^{2}_{μWD}/(π_{d} + c_{q}) < s^{2}_{BS}, or rearranged:

s^{2}_{BS}/s^{2}_{μWD} > π_{s}/(π_{d} + c_{q})     (9)

This is the necessary and sufficient condition for the optimal allocation being to choose the maximal affordable number of subjects, n_{s,max} = R/(π_{s} + π_{d} + c_{q}), and measure on one occasion for each of these. Notably, condition (9) is independent of the budget R. Also, unless s^{2}_{BS} is zero, the condition is always valid if π_{s} = 0, that is, if the recruitment of subjects does not lead to any costs. Under case A, this implies that all measurement occasions entail the same cost, π_{d} + c_{q}, irrespective of how they are allocated between subjects. Thus, in this highly simplified case [38,39], the optimal strategy is always to measure on one occasion from each of as many subjects as allowed by the budget.
Case A; optimization inside the choice set
Setting the derivative of the variance function (7a) equal to zero yields:

n_{s}* = R/(π_{s} + √(π_{s}(π_{d} + c_{q})s^{2}_{μWD}/s^{2}_{BS}))     (10)

If this optimal value of n_{s} is an interior solution, i.e. 1 < n_{s}* < n_{s,max}, the corresponding number of measurement occasions per subject can be obtained by substitution of equation (10) into equation (4a):

n_{d}* = √(π_{s}s^{2}_{μWD}/((π_{d} + c_{q})s^{2}_{BS}))     (11)

Thus, in this case the optimal number of measurement occasions per subject does not depend on the budget R.
The explicit solution derived above for the optimal set (n_{s}, n_{d}) can lead to non-integer values of one or both numbers. Since both are, by nature, discrete, a post-hoc procedure may be necessary in which integer sets of (n_{s}, n_{d}) close to the mathematically derived solution are entered into the budget constraint (equation (4)) to check that they are affordable, and into the objective function (equation (3)) to evaluate their statistical performance. For instance, if an interior n_{s} determined by equation (10) is not an integer, the nearest larger and smaller integers are identified, and for each of those, at least two associated integer values of n_{d} are determined that are larger and smaller than the value of n_{d} derived by equation (11). The resulting affordable sets of (n_{s}, n_{d}) are then examined to identify the one resulting in the smallest mean exposure variance.
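The closed-form optimum of case A and the integer post-hoc check can be sketched together; the variance components in the example are hypothetical:

```python
import math

def case_a_optimum(R, pi_s, pi_d, c_q, s2_bs, s2_mwd):
    """Case A (alpha = beta = 1): continuous optimum from equations (10)-(11),
    then comparison of the nearest affordable integer pairs, as described
    in the post-hoc procedure above."""
    n_s_star = R / (pi_s + math.sqrt(pi_s * (pi_d + c_q) * s2_mwd / s2_bs))
    n_d_star = math.sqrt(pi_s * s2_mwd / ((pi_d + c_q) * s2_bs))
    best = None
    for n_s in {math.floor(n_s_star), math.ceil(n_s_star)}:
        for n_d in {math.floor(n_d_star), math.ceil(n_d_star)}:
            if n_s < 1 or n_d < 1:
                continue
            if pi_s * n_s + n_s * n_d * (pi_d + c_q) > R:
                continue  # not affordable under the budget constraint (4)
            var = s2_bs / n_s + s2_mwd / (n_s * n_d)
            if best is None or var < best[2]:
                best = (n_s, n_d, var)
    return best  # (n_s, n_d, resulting variance), or None
```

With R = 500, the figure 3 unit costs (π_{s}, π_{d}, c_{q}) = (20, 1, 1), and a hypothetical variance ratio s^{2}_{BS}/s^{2}_{μWD} = 0.5, equations (10) and (11) give a continuous optimum near (17.3, 4.5), and the affordable integer pair (17, 4) is selected. Note that this neighbor check follows the text's procedure; an exhaustive integer search may occasionally find a marginally better pair further from the continuous optimum.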
Table 1 summarizes the derived procedures for optimizing cost efficiency in case A, together with procedures for the other cases, as derived below.
Table 1. Summary of equations, in terms of their numbers in the running text, for identifying the optimal exposure assessment strategy
Case B: α = 1, β ≠ 1
Case B entails constant marginal costs in the recruitment of new subjects but either increasing or decreasing marginal costs for organizing measurement occasions.
Case B; substitution and objective function
In case B, the one-variable problem is most easily solved if the objective function is rearranged so that n_{s} is expressed as a function of n_{d}. From the budget constraint, equation (4a), n_{s} is isolated as:

n_{s} = R/(π_{s} + π_{d}n_{d}^{β} + n_{d}c_{q})     (12)

The corresponding objective function is:

s^{2}_{μ}(n_{d}) = (s^{2}_{BS} + s^{2}_{μWD}/n_{d})(π_{s} + π_{d}n_{d}^{β} + n_{d}c_{q})/R     (13)

and its derivative:

ds^{2}_{μ}/dn_{d} = [(s^{2}_{BS} + s^{2}_{μWD}/n_{d})(βπ_{d}n_{d}^{β-1} + c_{q}) - (s^{2}_{μWD}/n_{d}^{2})(π_{s} + π_{d}n_{d}^{β} + n_{d}c_{q})]/R     (14)
The objective function (equation (13)) is always convex for β ≥ 2. For 1 < β < 2 and for β < 1, it is convex only if additional inequalities involving the unit costs and variance components are fulfilled (for proof and explicit conditions, see appendix).
If none of these inequalities are fulfilled, the optimal measurement strategy will correspond to one of the choice set boundaries.
Case B; boundaries of the choice set
The choice set boundaries in terms of n_{d} are n_{d} = 1 and n_{d} = n_{d,max}. The latter is found by setting n_{s} = 1 in the budget constraint, equation (4a), and rearranging to get: π_{s} + π_{d}n_{d}^{β} + n_{d}c_{q} = R. This equation does not have a closed-form solution for n_{d}. In this case, n_{d,max} can be determined numerically by calculating the cost, c(1, n_{d}), when entering increasing values of n_{d} in the cost function, equation (4), at n_{s} = 1, that is:

c(1, n_{d}) = π_{s} + π_{d}n_{d}^{β} + n_{d}c_{q}

n_{d,max} is then the largest value of n_{d} for which c(1, n_{d}) ≤ R. Figure 3 illustrates an example of this procedure, for three different combinations of (π_{s}, π_{d}, c_{q}) and two different levels of β, which will reappear in the collection of numerical examples.
Figure 3. Numerical determination of the upper boundary of the choice set in case B (α = 1, β ≠ 1). For six different combinations of unit costs and size of the exponent β, the maximal possible number of measurement occasions, i.e. n_{d,max}, for a single subject is identified under a budget constraint of 500 (arbitrary units). Squares, rhomboids, and triangles: (π_{s}, π_{d}, c_{q}) = (2, 10, 10), (11, 5.5, 5.5), and (20, 1, 1), respectively. Open and closed symbols: β = 0.50 and β = 1.50, respectively. The value of n_{d,max} in each scenario is indicated by an enlarged symbol.
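The scanning procedure illustrated in figure 3 can be sketched as a simple loop over increasing n_{d}:

```python
def n_d_max(R, pi_s, pi_d, c_q, beta):
    """Case B upper choice-set boundary: the largest n_d for which the cost
    of measuring a single subject, c(1, n_d) = pi_s + pi_d*n_d**beta + n_d*c_q,
    stays within the budget R (at least n_d = 1 is assumed affordable)."""
    n_d = 1
    while pi_s + pi_d * (n_d + 1)**beta + (n_d + 1) * c_q <= R:
        n_d += 1
    return n_d
```

For the scenario (π_{s}, π_{d}, c_{q}) = (2, 10, 10) with β = 0.50 and R = 500, the cost function used here gives n_{d,max} = 43, since c(1, 43) ≈ 498 while c(1, 44) exceeds the budget.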
At n_{d} = 1, the derivative of the objective function, i.e. equation (14), is equal to:

ds^{2}_{μ}/dn_{d} = [(s^{2}_{BS} + s^{2}_{μWD})(βπ_{d} + c_{q}) - s^{2}_{μWD}(π_{s} + π_{d} + c_{q})]/R

which is positive under the following condition:

s^{2}_{BS}/s^{2}_{μWD} > (π_{s} + (1 - β)π_{d})/(βπ_{d} + c_{q})     (15)

Thus, for parameter sets obeying this inequality, the optimal sample allocation is to measure for one occasion on each of n_{s} = R/(π_{s} + π_{d} + c_{q}) subjects (cf. equation (12) with n_{d} = 1).
At the other boundary, n_{d} = n_{d,max}, the sign of the derivative of the objective function must be obtained by entering the numerically determined value of n_{d,max} in equation (14). A negative derivative is then a necessary and sufficient condition for the optimal measurement strategy being to choose one subject and record from that subject on n_{d,max} occasions.
Case B; optimization inside the choice set
The objective function, equation (13), cannot be minimized using analytical methods, since the equation ds^{2}_{μ}/dn_{d} = 0 (cf. equation (14)) does not have a closed-form solution. Thus, a possible interior optimum must be located by entering all values of n_{d} in the interval [1, n_{d,max}] into the objective function and locating the minimal result. The corresponding optimal value of n_{s} can be found by entering the identified optimal value of n_{d} in equation (12).
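This exhaustive interior search can be sketched as follows; the example parameters are the illustrative figure 3 values with hypothetical variance components:

```python
def case_b_optimum(R, pi_s, pi_d, c_q, beta, s2_bs, s2_mwd, nd_max):
    """Case B interior search: evaluate the substituted objective,
    equation (13), for every integer n_d in [1, nd_max] and keep the
    variance-minimal choice; n_s follows from equation (12)."""
    best = None
    for n_d in range(1, nd_max + 1):
        n_s = R / (pi_s + pi_d * n_d**beta + n_d * c_q)   # equation (12)
        var = (s2_bs + s2_mwd / n_d) / n_s                # equation (13)
        if best is None or var < best[1]:
            best = (n_d, var, n_s)
    return best  # (n_d, variance, continuous n_s)
```

With R = 500, (π_{s}, π_{d}, c_{q}) = (2, 10, 10), β = 0.50 and both variance components equal to 1, condition (15) holds, and the scan accordingly confirms that n_{d} = 1 minimizes the variance; the returned n_{s} is continuous and would be truncated to an integer in practice.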
Case C: α ≠ 1, β = 1
In case C, all measurement occasions for a particular subject can be organized at the same cost, while the cost of recruiting additional subjects changes with their numbers.
Case C; substitution and objective function
In case C, the one-variable problem is most easily solved if the objective function is rearranged to express n_{d} as a function of n_{s}. Isolating n_{d} in the budget constraint, equation (4a), gives:

n_{d} = (R - π_{s}n_{s}^{α})/(n_{s}(π_{d} + c_{q}))     (16)

and hence the objective variance function in terms of n_{s} is:

s^{2}_{μ}(n_{s}) = s^{2}_{BS}/n_{s} + s^{2}_{μWD}(π_{d} + c_{q})/(R - π_{s}n_{s}^{α})     (17)

Taking the derivative with respect to n_{s} yields:

ds^{2}_{μ}/dn_{s} = -s^{2}_{BS}/n_{s}^{2} + s^{2}_{μWD}απ_{s}(π_{d} + c_{q})n_{s}^{α-1}/(R - π_{s}n_{s}^{α})^{2}     (18)

Setting V = s^{2}_{BS}/s^{2}_{μWD}, this can be expressed as:

ds^{2}_{μ}/dn_{s} = s^{2}_{μWD}[-V/n_{s}^{2} + απ_{s}(π_{d} + c_{q})n_{s}^{α-1}/(R - π_{s}n_{s}^{α})^{2}]     (18a)
It is straightforward to verify that this function is convex in n_{s }and, hence, has one unique minimum.
Case C; boundaries of the choice set
The choice set boundaries in this case are n_{s} = 1 and n_{s} = n_{s,max}. The latter is found by setting n_{d} = 1 in the budget constraint, equation (4a), and solving for n_{s}. This leads to an equation that does not have a closed-form solution. Thus, similar to the determination of n_{d,max} in case B above, n_{s,max} must be determined by entering increasing values of n_{s} into the cost function until reaching the largest value of n_{s} for which c(n_{s}, 1) ≤ R.
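The incremental search for n_{s,max} takes only a few lines. The cost expression used below, c(n_{s}, 1) = π_{s}·n_{s}^α + n_{s}(π_{d} + c_{q}), is an assumed specialization of equation (4) at n_{d} = 1, consistent with the unit-cost checks quoted in the numerical examples.

```python
def n_s_max(R, pi_s, pi_d, c_q, alpha):
    """Largest n_s for which c(n_s, 1) <= R, found by stepping n_s upward,
    since the defining equation has no closed-form solution when alpha != 1.
    Assumes at least one subject is affordable under the budget."""
    def cost_at_one_occasion(n_s):
        # Assumed specialization of equation (4) at n_d = 1
        return pi_s * n_s ** alpha + n_s * (pi_d + c_q)

    n_s = 1
    while cost_at_one_occasion(n_s + 1) <= R:
        n_s += 1
    return n_s

# Hypothetical case C scenario: alpha = 1.5, budget 500
print(n_s_max(500, 20, 1, 1, 1.5))  # → 8
```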
At the boundary n_{s} = 1, the derivative of the objective function, cf. equation (18a), is:
A necessary and sufficient condition for choosing n_{s} = 1, and hence the corresponding n_{d} (cf. equation (16); if necessary truncated to the nearest smaller integer), is derived by rearranging the inequality to give:
At the other boundary, n_{s,max}, the sign of the derivative of the objective function must be determined numerically by entering the n_{s,max} identified above into equation (18a). If the sign is negative, n_{s,max} is the optimal number of subjects, each measured on one occasion.
Case C; optimization inside the choice set
In case C, the equation obtained by setting the derivative (cf. equation (18a)) to zero has no closed-form solution. Thus, an interior solution to the optimization must be identified by entering all n_{s} in the interval [1, n_{s,max}] into the objective function, i.e. equation (17), and locating the minimal variance. After having identified the optimal n_{s}, the corresponding n_{d} is found by solving equation (16).
Case D: α≠1, β≠1
In case D, neither n_{s} nor n_{d} can be expressed as a function of the other on the basis of the budget constraint. Thus, a one-variable problem cannot be formulated in explicit terms, and consequently no analytical expressions can be developed for the derivative of the objective function, for the boundary conditions, or for possible interior solutions. Therefore, the optimal numbers of subjects and measurement occasions have to be identified by means of a numerical procedure, such as the following:
(1) For n_{s} = 1, the cost function, equation (4), reduces to c(1, n_{d}). Increasing values of n_{d} are entered into this function, up to the largest possible value, n_{d,max1}, for which c(1, n_{d}) ≤ R;
(2) The values (n_{s}, n_{d}) = (1, n_{d,max1}) are entered into the objective function, equation (3), and the resulting variance, s^{2}_{μ}(1, n_{d,max1}), is noted;
(3) These two steps are repeated for n_{s} = 2, using the corresponding cost function c(2, n_{d}), thus obtaining the value of s^{2}_{μ}(2, n_{d,max2});
(4) Subsequent values of s^{2}_{μ}(n_{s}, n_{d,maxns}) are derived using the same procedure for stepwise increasing n_{s}, until reaching the largest possible n_{s} allowed by the budget;
(5) By inspecting the resulting set of variances s^{2}_{μ}(n_{s}, n_{d,maxns}), all of which entail costs as close as possible to the budget constraint R, the combination of n_{s} and n_{d} offering the smallest variance can be identified.
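The five steps above can be condensed into a short routine. Since equations (3) and (4) are not reproduced in this excerpt, the sketch assumes a power-function cost c(n_{s}, n_{d}) = π_{s}·n_{s}^α + n_{s}(π_{d}·n_{d}^β + c_{q}·n_{d}) and a mean-exposure variance s²_{μ} = s²_{BS}/n_{s} + (s²_{BD} + s²_{μWD})/(n_{s}·n_{d}); these assumed forms reproduce the figures quoted for figure 4, i.e. the optimum (5, 12) and the 98.3% budget utilization.

```python
def cost(n_s, n_d, pi_s, pi_d, c_q, alpha, beta):
    # Assumed power-function form of equation (4)
    return pi_s * n_s ** alpha + n_s * (pi_d * n_d ** beta + c_q * n_d)

def variance(n_s, n_d, s2_bs, s2_bd, s2_wd):
    # Assumed form of equation (3): variance of the group exposure mean
    return s2_bs / n_s + (s2_bd + s2_wd) / (n_s * n_d)

def case_d_optimum(R, pi_s, pi_d, c_q, alpha, beta, s2_bs, s2_bd, s2_wd):
    best = None  # (variance, n_s, n_d)
    n_s = 1
    while cost(n_s, 1, pi_s, pi_d, c_q, alpha, beta) <= R:   # step (4) limit
        n_d = 1
        while cost(n_s, n_d + 1, pi_s, pi_d, c_q, alpha, beta) <= R:
            n_d += 1                                          # steps (1)/(3)
        var = variance(n_s, n_d, s2_bs, s2_bd, s2_wd)         # step (2)
        if best is None or var < best[0]:                     # step (5)
            best = (var, n_s, n_d)
        n_s += 1
    return best

# Scenario of figure 4: (2, 10, 10), (20, 1, 1), alpha = beta = 1.5, R = 500
var, n_s, n_d = case_d_optimum(500, 20, 1, 1, 1.5, 1.5, 2, 10, 10)
print((n_s, n_d), round(var, 2))                  # → (5, 12) 0.73
used = cost(n_s, n_d, 20, 1, 1, 1.5, 1.5) / 500
print(round(100 * used, 1))                       # → 98.3
```

Because n_{s} and n_{d} are integers, the identified optimum generally leaves a small fraction of the budget unspent, as the utilization printout illustrates.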
Figure 4 illustrates the numerical procedure for identifying the maximal possible value of n_{d} at increasing values of n_{s}, and the resulting variance of the exposure mean. Since n_{s} and n_{d} are discrete, and hence so is the corresponding total cost c(n_{s}, n_{d}), the optimal measurement strategy may not consume the entire budget R. For instance, the optimal strategy (n_{s}, n_{d}) = (5, 12) identified in figure 4 only utilizes 98.3% of the allowed resources.
Figure 4. Numerical procedure for determining the optimal exposure assessment strategy in case D (α≠1, β≠1). For increasing values of n_{s }as indicated inside the open symbols in each curve, the maximal number of measurement occasions, i.e. n_{d,maxns}, allowed by a budget of 500 (arbitrary units) is identified, as marked by open symbols. The resulting statistical performance, i.e. s^{2}_{μ}(n_{s}, n_{d,maxns}), is shown above each curve. In the illustrated case, (n_{s}, n_{d}) = (5, 12) was the optimal allocation. The illustration refers to a scenario with (s^{2}_{BS}, s^{2}_{BD}, s^{2}_{μWD}) = (2, 10, 10), (π_{s}, π_{d}, c_{q}) = (20, 1, 1), and (α, β) = (1.50, 1.50).
Numerical examples
Using the procedures developed above, optimal sampling strategies were identified for 225 scenarios representing different combinations of costs and variance components, and different marginal costs of recruiting new subjects and organizing more measurement occasions, as expressed through α and β (table 2). Unit costs π_{s}, π_{d} and c_{q} were selected to illustrate large, medium and small costs of recruiting subjects relative to obtaining measurements on each of them, and the sets of variance components represent large, medium and small between-subjects to within-subject variance ratios. Parameter values were chosen so that the total cost of assessing the exposure of one subject at one occasion (cf. equation (4)), as well as the resulting mean exposure variance (cf. equation (3)), takes the same numerical value (22) in all scenarios. In all scenarios, the budget R was constrained to 500 (arbitrary units). The median budget utilization of the 225 strategies was 97.9% (5th to 95th percentile range: 92.3% to 100.0%).
Table 2. Optimal sampling strategies (n_{s}, n_{d}) and the resulting mean exposure variance s^{2}_{μ} (cf. equation (3)) at different combinations of variance components (s^{2}_{BS}, s^{2}_{BD}, s^{2}_{μWD}; sections a-c), unit costs (π_{s}, π_{d}, c_{q}), and exponents α and β describing the shape of the relationship between costs and number of measurements (cf. equation (4))
As illustrated in table 2, the optimally cost-efficient strategy in many scenarios is to obtain data on one occasion from as many subjects as possible. In particular, this applies when s^{2}_{BS} is "large" relative to s^{2}_{BD} and s^{2}_{μWD} (table 2c), and even when s^{2}_{BS} is similar to (s^{2}_{BD} + s^{2}_{μWD}), provided that π_{s} is equal to or smaller than (π_{d} + c_{q}) (table 2b). In these cases, the principle of measuring from as many subjects as possible is valid irrespective of whether cost functions are linear or not, i.e. irrespective of the sizes of α and β.
Considerable deviations from the principle of collecting data from as many subjects as possible do, however, occur, with the most extreme examples appearing when s^{2}_{BS} is "small" relative to s^{2}_{BD} and s^{2}_{μWD}, π_{s} is "large" compared to (π_{d} + c_{q}), and α is "large" (bottom right corner of table 2a). The combination of a "small" variance between subjects and "large" costs associated with recruiting subjects also makes the optimal sampling strategy particularly sensitive to nonlinearities in costs. Thus, with (s^{2}_{BS}, s^{2}_{BD}, s^{2}_{μWD}) = (2, 10, 10) and (π_{s}, π_{d}, c_{q}) = (20, 1, 1), a linear cost function implies an optimal sampling strategy of (n_{s}, n_{d}) = (13, 9) (table 2a), while the deviations of α and β from 1 illustrated in table 2 result in optimal strategies (n_{s}, n_{d}) ranging from (5, 12) to (49, 5), with corresponding variances s^{2}_{μ} between 0.12 and 0.73. In contrast, with (s^{2}_{BS}, s^{2}_{BD}, s^{2}_{μWD}) = (20, 1, 1) and (π_{s}, π_{d}, c_{q}) = (2, 10, 10) (table 2c), the most extreme nonlinear cost functions lead to sampling strategies, (n_{s}, n_{d}) = (24, 1) and (n_{s}, n_{d}) = (17, 1), which do not deviate much from the optimal strategy in the linear case, (n_{s}, n_{d}) = (22, 1), and only result in moderate differences in variance.
While not illustrated in table 2, a larger total budget widens the range of scenarios in which the optimal strategy is to collect data on one occasion per subject. Thus, with a budget of 500, 135 of the 225 scenarios illustrated in table 2 imply that data should be collected according to this principle; if the budget is increased to 1000, this count increases to 139. In 3 cases, however, the optimal strategy changes in the opposite direction, i.e. into collecting data on more than one occasion per subject. This is caused by irregularities arising from n_{s} and n_{d} needing to be integers. With a decreasing budget, one-occasion-per-subject optima become rarer, as expected, but irregularities occur more often.
Even if nonlinearities in cost functions may not affect the principle of how to allocate measurements at many combinations of unit costs and variance components, the size of α is always important to the eventual size of the data set, and therefore to the precision of the eventual mean exposure estimate. In contrast, the size of β matters only if the optimal strategy implies, or is close to implying, measurements on more than one occasion per subject, that is, when s^{2}_{BS} is "small" relative to s^{2}_{BD} and s^{2}_{μWD} (table 2a), and even when s^{2}_{BS} is similar to (s^{2}_{BD} + s^{2}_{μWD}) if π_{s} is also larger than (π_{d} + c_{q}) (table 2b). This is an expected result, since the cost of setting up measurement occasions is independent of β at n_{d} = 1 (cf. equation (4)). Thus, when analyzing whether an intended exposure assessment strategy, constrained by budgets, will lead to a sufficient statistical performance, access to a valid estimate of α is generally more important than knowing the exact size of β.
While the size of β is not always important to the size of the optimal data set, the best statistical performance at any specific combination of (s^{2}_{BS}, s^{2}_{BD}, s^{2}_{μWD}) and (π_{s}, π_{d}, c_{q}) will always be obtained with small sizes of α and β, exemplified in table 2 by (α, β) = (0.50, 0.50). This is a reasonable result, since small α and β entail small marginal costs of including more subjects and more measurement occasions.
Although not illustrated in table 2, the effects on statistical performance of deviating from the optimal choice of (n_{s}, n_{d}), while still using the entire budget, were also investigated. In certain cases, deviations did not lead to any particular reduction in performance. For instance, with (s^{2}_{BS}, s^{2}_{BD}, s^{2}_{μWD}) = (2, 10, 10), (π_{s}, π_{d}, c_{q}) = (20, 1, 1), and (α, β) = (0.75, 0.75), the optimal strategy is to choose (n_{s}, n_{d}) = (21, 9), resulting in a variance of 0.20 (cf. table 2a). However, all strategies with n_{s} in the range between 15 and 32, and corresponding values of n_{d,maxns} ranging from 15 to 4 as allowed by the budget, resulted in variances of 0.22 or less, except for the strategy (30, 4), which gave a variance of 0.23 because it only managed to utilize 92% of the available budget. In other cases, performance was more sensitive to non-optimal choices of (n_{s}, n_{d}). Again using (s^{2}_{BS}, s^{2}_{BD}, s^{2}_{μWD}) = (2, 10, 10) and (α, β) = (0.75, 0.75), the optimal strategy with (π_{s}, π_{d}, c_{q}) = (2, 10, 10) is now (n_{s}, n_{d}) = (13, 2), resulting in a variance of 0.92 (table 2a). In this case, all strategies allowed by the budget besides the nearest neighbour, (n_{s}, n_{d}) = (12, 2), gave variances of 1.09 or more, i.e. at least 18% larger than the optimum.
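The sensitivity analysis described above can be reproduced with a small sweep over budget-exhausting strategies. The cost and variance functions below are assumed specializations of equations (3) and (4), namely c(n_{s}, n_{d}) = π_{s}·n_{s}^α + n_{s}(π_{d}·n_{d}^β + c_{q}·n_{d}) and s²_{μ} = s²_{BS}/n_{s} + (s²_{BD} + s²_{μWD})/(n_{s}·n_{d}); under these assumptions the sweep recovers the quoted optimum (21, 9) with variance 0.20 and the roughly 92% budget utilization of strategy (30, 4).

```python
def cost(n_s, n_d, pi_s=20.0, pi_d=1.0, c_q=1.0, alpha=0.75, beta=0.75):
    # Assumed power-function cost model (cf. equation (4))
    return pi_s * n_s ** alpha + n_s * (pi_d * n_d ** beta + c_q * n_d)

def variance(n_s, n_d, s2_bs=2.0, s2_within=20.0):
    # Assumed variance of the group mean (cf. equation (3));
    # s2_within stands for s2_bd + s2_wd
    return s2_bs / n_s + s2_within / (n_s * n_d)

R = 500.0
results = {}
for n_s in range(15, 33):
    n_d = 1
    while cost(n_s, n_d + 1) <= R:   # largest n_d the budget allows
        n_d += 1
    results[(n_s, n_d)] = variance(n_s, n_d)

best = min(results, key=results.get)
print(best, round(results[best], 2))       # → (21, 9) 0.2
print(round(100 * cost(30, 4) / R, 1))     # → 92.2
```

Inspecting the full `results` dictionary shows how flat the variance surface is around the optimum in this scenario, which is the point of the paragraph above.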
Discussion
As illustrated by the numerical examples in table 2, a large ratio of between-subjects to within-subject variance generally implies that the optimal allocation principle is to collect data on one occasion from as many subjects as allowed by the budget. This also applies when between-subjects and within-subject variances are of similar size, unless the unit cost of recruiting subjects is large relative to that of setting up measurement occasions. In these cases, nonlinearity in the cost functions does not influence the optimal allocation principle, only the eventual size of the data set allowed by the budget. However, at a large relative recruitment cost combined with a small between-subjects to within-subject variance ratio, and in particular if the total budget is also small, the optimal sampling strategy may consist in approaching only a few subjects on several occasions each, and the strategy is very sensitive to nonlinearities in cost functions. Nonlinearities in subject recruitment costs always have a clear influence on the size of the optimal data set, while nonlinearities in costs for setting up measurement occasions are important only in cases when the optimal strategy includes multiple measurements per subject.
Representativeness
Statistical model
The present study investigated a hierarchical, nested measurement model with three stages, as used in a majority of previous studies of the effects of random measurement error on statistical properties and efficiency in exposure assessment (e.g. [2,12,26-28]). Even though the application exemplified in the paper refers to subjects, measurement occasions within subjects, and measurement units within occasions, the generic results are applicable also to other sources of exposure variability that can be described by a hierarchical model. This includes the case of data processing and analysis adding "post-sampling" costs and also some methodological variance to each collected exposure sample, thus modifying the sizes of c_{q} (equation (4)) and the corresponding variance component (equation (3)), respectively. Also, the present study addressed, as have most other studies, the case of balanced data sampling, i.e. the same number of measurement units being collected during each of the same number of occasions from each subject [23]. While the assumption of a balanced, hierarchical model facilitates mathematical derivation of optimal measurement strategies, cost-efficiency needs to be investigated even for more complicated models, for instance designs including crossed components [11,29]. In particular, the effects of unbalancedness, which is probably a very frequent occurrence in epidemiologic research, need to be addressed in further studies. Unbalancedness has been shown both mathematically [23,57] and empirically [58] to reduce statistical efficiency, and will thus also influence cost-efficiency.
During the last decade, powerful statistical techniques have been developed to analyse exposure variability and its determinants using so-called mixed-effect modelling [30-33,59]. While mixed model analyses have predominantly been used to identify exposure targets for effective prevention and intervention, they also represent a challenging opportunity to develop exposure assessment strategies that are both "cheap" and statistically efficient. As an example, several occupational studies have proposed or implemented the idea of estimating full-shift job exposures by combining observed or self-reported time proportions of tasks in the job with task exposures from a database [60-65]. In some studies, the task-based estimates appeared easy to obtain and, at the same time, well correlated with "true" job exposures (e.g. [66]), while other studies indicate that task-based procedures can also be grossly inefficient [64,65]. Some attention has been given to developing mathematical principles for assessing the statistical performance of task-based exposure modelling [34,67], but no studies have so far, to our knowledge, addressed whether task-based assessment can, indeed, be cost-efficient as compared to direct measurement of job exposures, and if so, under which conditions. A similar concern can be raised with respect to other techniques for combining exposure information from different sources into a "hybrid" estimate of some exposure metric [68]. The approach can be statistically informative [68], but might also entail costs to the extent that the trade-off between efficiency and resource consumption is disadvantageous as compared to measuring "true" exposures directly.
Statistical performance criterion
The present study addressed the objective of obtaining a precise estimate of the exposure mean value in a group of subjects (cf. equation (3)), the reason being that precision of the mean is a decisive factor for the usefulness of exposure surveys, and for statistical power in studies comparing conditions and groups. Other measures of statistical performance will, however, be of interest in other types of epidemiologic research, and thus need attention in future cost-efficiency research. A particularly important example is the size of bias and/or precision in a regression of outcome on exposure [19-22]. Since both bias and precision can, under a number of assumptions, be expressed as mathematical functions of variance components and the number of measurements [18], it might be possible to develop closed-form solutions to the problem of finding optimally cost-efficient measurement strategies, but this has not so far been pursued. Another example of an exposure assessment strategy having a purpose other than producing a satisfactory group exposure mean is standard surveillance of compliance with occupational exposure limits (OELs). First, the assessment focuses on individuals rather than groups, and second, the strategy needs to ensure that both the individual mean and the probability that single exposure values exceed the OEL are determined with satisfying certainty [16,17]. Still another relevant measure of statistical performance for several purposes is the size of the standard reliability coefficient (ICC), i.e. the relationship between exposure variability in data sets with and without (random) measurement error [41].
Obviously, for regression metrics, exceedance, and ICCs alike, optimally cost-efficient exposure assessment strategies may deviate from those driven by the objective of obtaining precise exposure means, as illustrated by two studies on optimal measurement allocation in reliability studies [69,70].
A particularly challenging situation comes up if the exposure assessment strategy has two simultaneous, yet conflicting objectives. For instance, the researcher may, at the same time, wish to get a precise estimate of a group mean exposure, but also a good estimate of exposure variance components between and within workers. This is a likely scenario if the specific exposure variability of the addressed occupational group is a priori insufficiently known, and the exposure data collection is viewed as an opportunity to get updated data on this variability, together with a documentation of the group mean exposure. Determination of variance components requires, as a minimum, duplicate samples at each stage of the measurement model [5], and this may often not be an optimally cost-efficient strategy if the objective is to get a precise group mean (cf. table 2a-c; cases with n_{d} = 1). Thus, the researcher faces the decision of whether a certain loss in information on the group mean is an acceptable "price" for getting some information on exposure variability. While the numerical trade-off between these two types of information, conditional on a restricted budget, may be resolved in future research, the final decision of which sampling allocation to prefer is an issue beyond mathematical procedures.
Recruitment capabilities and cost functions
While presenting a novel approach in allowing recruitment capabilities and, as a consequence, the corresponding cost functions to be nonlinear, the present study only addressed the case when nonlinearities can be expressed using homogeneous functions. This type of nonlinear production capability is often assumed in economics research, but other types of mathematical relationships may, obviously, be appropriate. Even cost functions that do not follow monotonic mathematical rules may apply, as illustrated by the example in Duan and Mage [42], where the basic shape of the cost function changes with the number of measurements, and by some examples in Cochran's excellent textbook [47]. We see a strong need for more empirical evidence to suggest the appropriate shape of cost functions in exposure assessment, and, if power relationships are indeed supported, to indicate reasonable sizes of the exponents α and β. Hypothetically, the recruitment of subjects could entail increasing marginal costs (α > 1), for instance if additional time has to be devoted to persuading initially reluctant participants, but also decreasing costs (α < 1), for instance if the first subjects are hard to recruit but their sceptical colleagues, taking after them, will then readily participate. Also, both increasing and decreasing marginal costs for organizing measurement occasions can be envisaged, for instance if measurement equipment wears down over time and needs to be in place longer to provide a certain amount of data (β > 1), or if a subject gets more and more accustomed to measurement preparations and thus requires less and less time (β < 1). As a tentative conjecture, however, considerable deviations of α from 1 are more likely to occur than deviations of β.
In addition to the need for empirical data describing the shape of cost functions, information is also required concerning the size of unit costs for measuring at different stages; very little such data has been reported in occupational or environmental epidemiology [37,43]. This stands in striking contrast to the abundance of data on variance components for a multitude of occupational and environmental exposures, showing that the size of, and relationship between, exposure variabilities at different stages of measurement, e.g. subjects and occasions within subjects, differ widely between settings and exposure agents [3,9-11,25,71,72].
In the present study, optimization procedures were developed using a total cost model including only variable cost components (equation (4)). Other studies have also addressed fixed costs, i.e. costs that do not depend on the number of measurements [41,43]. While fixed costs are, under a constrained budget, decisive to the resources left for allocating measurements, they cancel out in the course of the mathematical differentiation associated with the optimization procedure, and thus do not affect the eventual optimal allocation strategy [43]. It is, however, important to note that the optimization procedures in the present paper all refer to budgets where possible fixed costs have already been accounted for.
Analytical vs. numerical optimization
A complete closed-form mathematical solution to cost-efficiency optimization was possible only when cost functions were linear, i.e. (α, β) = (1, 1), and in this case the allocation algorithms were consistent with previous studies [43,44,46,47]. When either α or β deviated from 1, neither the choice set boundaries nor an internal optimum could be explicitly determined, and if both deviated together, all optimization steps had to be performed using numerical methods. This suggests that explicit, formal expressions defining cost-efficient measurement allocations may only be obtainable if both cost functions and expressions of statistical performance are mathematically very simple. Thus, numerical optimization procedures might be the only alternative if, for instance, the objective (here, variance) function contains other than purely nested components [11,29], or if the cost model does not express a straightforward relationship with the number of measurements [42]. This points to the idea of basing all optimization on numerical methods and ignoring explicit solutions even in those cases where they do exist. However, we believe that mathematical expressions as developed in this paper may still be helpful as a screening tool for deciding whether the optimal strategy needs further (numerical) consideration, or whether it is merely situated at the boundary of the choice set, as in those frequent cases where as many subjects as possible should be measured on one occasion each (cf. table 2).
Sensitivity
The basic cost model
One important result of the present investigation was that for many combinations of unit costs and variance components, nonlinear cost functions did not change the general principle stated by a linear model: to measure from as many subjects as possible on one occasion each (cf. table 2). Thus, under these particular circumstances, the principle of how to optimize exposure assessment was not sensitive to the cost model, even if the eventual size of the data set allowed by budget constraints was influenced by nonlinearities in subject recruitment costs. At other combinations of variance components and unit costs, in particular when between-subject variability was small compared to within-subject variability and subject recruitment costs were at the same time large compared to costs for setting up measurement occasions, nonlinearities did, however, strongly affect both the optimal allocation principle and the eventual statistical performance. While, as mentioned above, examples of small between- to within-subject variance ratios are abundant in the literature, relative sizes of unit costs are largely unknown, and thus we do not consider it justified, so far, to form an opinion on the actual occurrence of such sensitive scenarios.
Uncertainties in input parameters
The procedures developed in the present study for identifying optimal exposure assessment strategies, whether analytical or numerical, rely on known values of unit costs, exponents in the cost function, and variance components. However, in a specific epidemiologic study, all of these inputs need to be based on estimates associated with some degree of uncertainty. Thus, the derived "optimal" exposure assessment strategy will, in itself, be uncertain. Similar to the issue of cost function sensitivity discussed above, the principle of how to optimize exposure assessment seems, however, to be very robust to changes in unit costs and variance components when between-subject variability is large compared to within-subject variability and subject recruitment costs are smaller than or similar to costs for setting up measurement occasions (table 2). Even the size of the eventual data set is robust to changes in exposure variability, as long as recruitment costs are small (table 2). If, however, recruitment unit costs are large, both the allocation and the size of the optimal strategy are highly sensitive to the size of variance components, especially if recruitment costs accelerate with the number of subjects (α > 1).
Even when closed-form solutions are available for estimating the optimal choice of subjects and measurement occasions (equations (10) and (11)), a corresponding analytical expression of the uncertainty of these estimates may not be readily available. Optimization using numerical procedures evidently precludes any explicit mathematical representation of uncertainty. Thus, systematic analyses of the stability of optimized strategies to fluctuations in input variables need to be performed by numerical methods. Different approaches may then be viable, including Monte Carlo procedures (e.g. [73]), which will, however, require estimates of the distributions of input variables; and large-scale resampling from empirical distributions, as in bootstrapping [74]. Bootstrapping has been used successfully to address uncertainty in several occupational studies addressing exposure sampling efficiency [27,53,75], and is especially useful in cases when analytical methods are unavailable [12] or when assumptions underlying the analytical models are probably violated [35,54]. Bootstrap-based analysis of uncertainty has also been used successfully in health economics [76]. However, bootstrapping requires access to preferably large empirical data sets that can be used to represent the distributions of necessary variables. In the case of cost-efficiency optimization, this implies that extensive data, not available at present, are needed on unit costs, exponents in the cost function, and exposure variance components.
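As a minimal illustration of how a Monte Carlo approach could probe this uncertainty, the sketch below perturbs assumed variance components and unit costs with lognormal noise and re-runs a grid optimization for each draw. The noise level, the lognormal distributions, and the cost/variance forms (power-function cost and nested-variance mean, as in the numerical examples) are all hypothetical choices made for the sketch, not values from the study.

```python
import random
from collections import Counter

def cost(n_s, n_d, pi_s, pi_d, c_q, alpha=1.5, beta=1.5):
    # Assumed power-function cost model (cf. equation (4))
    return pi_s * n_s ** alpha + n_s * (pi_d * n_d ** beta + c_q * n_d)

def variance(n_s, n_d, s2_bs, s2_within):
    # Assumed variance of the group mean (cf. equation (3))
    return s2_bs / n_s + s2_within / (n_s * n_d)

def optimize(R, pi_s, pi_d, c_q, s2_bs, s2_within):
    """Exhaustive grid search for the variance-minimal (n_s, n_d) within budget."""
    best = None
    n_s = 1
    while cost(n_s, 1, pi_s, pi_d, c_q) <= R:
        n_d = 1
        while cost(n_s, n_d + 1, pi_s, pi_d, c_q) <= R:
            n_d += 1
        v = variance(n_s, n_d, s2_bs, s2_within)
        if best is None or v < best[0]:
            best = (v, n_s, n_d)
        n_s += 1
    return best[1], best[2]

rng = random.Random(1)
R = 500.0
tally = Counter()
for _ in range(200):
    noise = lambda: rng.lognormvariate(0.0, 0.2)  # hypothetical ~20% uncertainty
    strategy = optimize(R, 20 * noise(), 1 * noise(), 1 * noise(),
                        2 * noise(), 20 * noise())
    tally[strategy] += 1

# The spread of 'tally' indicates how stable the optimal allocation is
print(sum(tally.values()), len(tally) >= 1)  # → 200 True
```

A concentrated tally suggests a robust recommendation; a widely spread tally flags a scenario where better input estimates are needed before committing to a design.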
Deviations from the optimal strategy
For pragmatic reasons, exposure assessments in working life will rarely be carried out exactly as planned (e.g. [37]). Thus, an intended optimal strategy may, in effect, be realized by collecting numbers of measurement units at different stages that deviate from the optimal choice, even if the total budget is still consumed. Presumably, the most likely deviations appear in the form of slight departures from a completely balanced data set, for instance when some measurement occasions fail for some subjects but are compensated by more occasions from others. As noted from the numerical examples (table 2), statistical performance seems to be considerably more sensitive to non-optimal strategies at some combinations of variance components, unit costs and cost function exponents than at others. However, this result concerns only non-optimal strategies that are still balanced. The effects of unbalanced reallocations of measurements, which still consume the allowed budget, need to be determined in future studies. When facing scenarios that will be sensitive to deviations from the optimal strategy, we suggest, however, preparing for likely departures by designing an intentional oversampling.
Comparing costefficiencies
Comparing measurement allocations
Some previous studies on cost-efficient data collection have been devoted to comparing two or more alternative measurement strategies with respect to cost and efficiency, rather than identifying an optimal strategy. Thus, Armstrong compared the properties of two different instruments for retrieving the same exposure data [40,41], while Lemasters et al. [38] and Shukla et al. [39] devoted their studies to comparing different allocations of measurements using the same instrument. In the two latter studies, probably none of the compared strategies were optimal, but they were meant to represent feasible strategies in terms of e.g. logistics and selection constraints. The comparison approach to cost-efficiency analysis is considerably easier to deal with from a mathematical viewpoint than optimization as addressed in the present paper. A mere comparison also allows for both cost and output variance functions that cannot be addressed by analytical optimization procedures. Abstaining from optimization may thus represent a pragmatic level of analysis in cases where the principal objective is to decide on one of a number of possible exposure assessment strategies rather than to determine an absolute optimum.
Comparing measurement instruments
While, as mentioned, some previous studies have addressed the issue of comparing the cost-efficiency of two alternative methods for obtaining the same exposure variable(s) [40,41], no attempts have been made at comparing two instruments in terms of their optimal performance under a constrained budget. This is an issue of obvious importance to a researcher or practitioner facing a decision on investments in new equipment or staff. For many occupational and environmental exposures, several alternative measurement instruments are available. For instance, working postures can be recorded using self-reports, observations and direct measurement tools [77,78], i.e. methods associated with different costs and different statistical performance [79,80]. The procedures developed in the present paper can be used to identify an optimal measurement strategy for each method separately, including the resulting statistical performance, on which basis a comparison can be made. In this case, it is particularly important to acknowledge the fixed costs associated with either method, since they determine the budget left for optimization.
Conclusion
In the present study, we demonstrated that nonlinearities in cost functions can have a significant influence on the principle of how to optimally allocate measurements between subjects and occasions within subjects. This happens if costs for recruiting subjects are large compared to costs for setting up measurement occasions, and, at the same time, the between-subjects to within-subject variance ratio is small. If, on the other hand, the between-subjects variance is larger than or similar to the within-subject variance, nonlinearities do not, in general, change the superiority of measuring on one occasion from each of as many subjects as allowed by the budget. This principle applies in particular if the budget is large. Irrespective of the extent of exposure variability, however, nonlinear subject recruitment costs will affect the eventual size of the exposure data sample, and hence the precision of the resulting exposure mean value.
We noted a remarkable scarcity of empirical data on appropriate approximations of cost functions in exposure assessment, as well as on the sizes of costs pertaining to different measurement stages, for instance subjects and occasions within subjects.
Thus, in epidemiologic research requiring reliable exposure mean values, we suggest that exposure assessment strategies be designed a priori, using the procedures developed in the present paper together with educated estimates of relevant variance components, unit costs, and cost function shapes. This should lead to informed decisions on measurement strategies that make optimal use of monetary resources, with due consideration of whether the obtainable statistical performance is sufficient.
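Such a priori exploration does not require closed-form solutions: given educated estimates, a brute-force search over candidate allocations works for any cost shape. The power-function cost form below is an illustrative stand-in for the cost model of the present paper, with hypothetical parameter values:

```python
import itertools

def best_allocation(budget, c1, c2, beta1, beta2, var_b, var_w,
                    k_max=300, n_max=30):
    """Search all (k subjects, n occasions per subject) pairs for the one
    minimizing Var(mean) = var_b/k + var_w/(k*n) subject to a power-function
    cost constraint: c1*k**beta1 + c2*(k*n)**beta2 <= budget. The cost form
    is an illustrative assumption, not the paper's exact model."""
    best = None
    for k, n in itertools.product(range(1, k_max + 1), range(1, n_max + 1)):
        cost = c1 * k**beta1 + c2 * (k * n)**beta2
        if cost <= budget:
            var = var_b / k + var_w / (k * n)
            if best is None or var < best[2]:
                best = (k, n, var)
    return best  # (subjects, occasions per subject, Var(mean)), or None

# With linear costs (beta1 = beta2 = 1) and subject recruitment ten times
# more expensive than an occasion, repeated measurements become worthwhile:
k, n, var = best_allocation(1000, c1=10, c2=1, beta1=1, beta2=1,
                            var_b=1.0, var_w=1.0)
```

Raising beta1 above 1 (nonlinearly increasing recruitment costs) shrinks the affordable number of subjects and shifts the optimum toward more occasions per subject, in line with the pattern reported above.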
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
SEM conceived of the study, derived some of the analytical procedures, performed all numerical simulations, and drafted major parts of the manuscript. KB derived most of the analytical procedures, and drafted significant parts of the manuscript. Both authors read and approved the final manuscript.
Appendix
The conditions for the objective function to be convex when β ≠ 1 (case B) can be derived as follows:
First, take the derivative of equation (14) with respect to n_{d}:
This expression is always positive for β ≥ 2, and hence the objective function (equation (13)) is convex. For β < 2, sufficient conditions for convexity follow from the inequality:
This last inequality is equivalent to:
, i.e.:
This inequality is true if β − 1 and the remaining factor are both positive or both negative.
Thus, to summarize, the objective function is always convex for β ≥ 2. For 1 < β < 2 and for β < 1, it is convex if inequalities A1 and A2 hold, respectively.
Acknowledgements
The present study was supported by a grant from the Swedish Council for Working Life and Social Research (FAS Dnr. 2005-0183). The funding body had no influence on study design, analysis and interpretation of data, writing of the manuscript, or the decision to submit the paper for publication.
References

Kromhout H: Design of measurement strategies for workplace exposures. Occup Environ Med 2002, 59:349-354.

Wahlström J, Mathiassen SE, Liv P, Hedlund P, Forsman M, Ahlgren C: Upper arm postures and movements in female hairdressers across four full working days. Ann Occup Hyg 2010, 54:584-594.

Symanski E, Maberti S, Chan W: A meta-analytic approach for characterizing the within-worker and between-worker sources of variation in occupational exposure. Ann Occup Hyg 2006, 50:343-357.

Hansson GÅ, Balogh I, Ohlsson K, Granqvist L, Nordander C, Arvidsson I, Åkesson I, Unge J, Rittner R, Strömberg U, Skerfving S: Physical workload in various types of work: part I. Wrist and forearm. Int J Ind Ergon 2009, 39:221-233.

Loomis D, Kromhout H: Exposure variability: concepts and applications in occupational epidemiology. Am J Ind Med 2004, 45:113-122.

Rappaport SM, Lyles RH, Kupper LL: An exposure-assessment strategy accounting for within- and between-worker sources of variability. Ann Occup Hyg 1995, 39:469-495.

Burdorf A, van Tongeren M: Variability in workplace exposures and the design of efficient measurement and control strategies. Ann Occup Hyg 2003, 47:95-99.

Searle SR, Casella G, McCulloch CE: Variance Components. New York: John Wiley & Sons; 1992.

Kromhout H, Symanski E, Rappaport SM: A comprehensive evaluation of within- and between-worker components of occupational exposure to chemical agents. Ann Occup Hyg 1993, 37:253-270.

Kromhout H, Vermeulen R: Temporal, personal and spatial variability in dermal exposure. Ann Occup Hyg 2001, 45:257-273.

Jackson JA, Mathiassen SE, Dempsey PG: Methodological variance associated with normalization of occupational upper trapezius EMG using submaximal reference contractions. J Electromyogr Kinesiol 2009, 19:416-427.

Mathiassen SE, Burdorf A, van der Beek AJ: Statistical power and measurement allocation in ergonomic intervention studies assessing upper trapezius EMG amplitude. A case study of assembly work. J Electromyogr Kinesiol 2002, 12:27-39.

Mathiassen SE, Möller T, Forsman M: Variability in mechanical exposure within and between individuals performing a highly constrained industrial work task. Ergonomics 2003, 46:800-824.

Kromhout H, Tielemans E, Preller L, Heederick D: Estimates of individual dose from current measurements of exposure.

Tak S, Paquet V, Woskie S, Buchholz B, Punnett L: Variability in risk factors for knee injury in construction. J Occup Environ Hyg 2009, 6:113-120.

Tornero-Velez R, Symanski E, Kromhout H, Yu RC, Rappaport SM: Compliance versus risk in assessing occupational exposures. Risk Anal 1997, 17:279-292.

Lyles RH, Kupper LL: On strategies for comparing occupational exposure data to limits. Am Ind Hyg Assoc J 1996, 57:6-15.

Tielemans E, Kupper LL, Kromhout H, Heederik D, Houba R: Individual-based and group-based occupational exposure assessment: some equations to evaluate different strategies. Ann Occup Hyg 1998, 42:115-119.

Burdorf A: Reducing random measurement error in assessing postural load on the back in epidemiological surveys. Scand J Work Environ Health 1995, 21:15-23.

Seixas NS, Sheppard L: Maximizing accuracy and precision using individual and grouped exposure assessments. Scand J Work Environ Health 1996, 22:94-101.

Reeves GK, Cox DR, Darby SC, Whitley E: Some aspects of measurement error in explanatory variables for continuous and binary regression models. Stat Med 1998, 17:2157-2177.

Ferrari P, Friedenreich C, Matthews CE: The role of measurement error in estimating levels of physical activity. Am J Epidemiol 2007, 166:832-840.

Samuels SJ, Lemasters GK, Carson A: Statistical methods for describing occupational exposure measurements. Am Ind Hyg Assoc J 1985, 46:427-433.

Chen CC, Chuang CL, Wu KY, Chan CC: Sampling strategies for occupational exposure assessment under generalized linear model. Ann Occup Hyg 2009, 53:509-521.

Nordander C, Balogh I, Mathiassen SE, Ohlsson K, Unge J, Skerfving S, Hansson GÅ: Precision of measurements of physical workload during standardised manual handling. Part I: Surface electromyography of m. trapezius, m. infraspinatus and the forearm extensors. J Electromyogr Kinesiol 2004, 14:443-454.

Symanski E, Rappaport SM: An investigation of the dependence of exposure variability on the interval between measurements. Ann Occup Hyg 1994, 38:361-372.

Burdorf A, van Riel M: Design of strategies to assess lumbar posture during work. Int J Ind Ergon 1996, 18:239-249.

Kromhout H, Heederick D: Occupational epidemiology in the rubber industry: implications of exposure variability. Am J Ind Med 1995, 27:171-185.

Lampa EG, Nilsson L, Liljelind IE, Bergdahl IA: Optimizing occupational exposure measurement strategies when estimating the log-scale arithmetic mean value: an example from the reinforced plastics industry. Ann Occup Hyg 2006, 50:371-377.

Peretz C, Goren A, Smid T, Kromhout H: Application of mixed-effects models for exposure assessment. Ann Occup Hyg 2002, 46:69-77.

Burdorf A: Identification of determinants of exposure: consequences for measurement and control strategies. Occup Environ Med 2005, 62:344-350.

Rappaport SM, Weaver M, Taylor D, Kupper L, Susi P: Application of mixed models to assess exposures monitored by construction workers during hot processes. Ann Occup Hyg 1999, 43:457-469.

Symanski E, Chan W, Chang CC: Mixed-effects models for the evaluation of long-term trends in exposure levels with an example from the nickel industry. Ann Occup Hyg 2001, 45:71-81.

Mathiassen SE, Burdorf A, van der Beek AJ, Hansson GÅ: Efficient one-day sampling of mechanical job exposure data: a study based on upper trapezius activity in cleaners and office workers. Am Ind Hyg Assoc J 2003, 64:196-211.

Liv P, Mathiassen SE, Svendsen SW: Theoretical and empirical efficiency of sampling strategies for estimating upper arm elevation. Ann Occup Hyg 2011, 55:436-449.

Rezagholi M, Mathiassen SE: Cost-efficient design of occupational exposure assessment strategies: a review. Ann Occup Hyg 2010, 54:858-868.

Trask C, Teschke K, Village J, Chow Y, Johnson P, Luong N, Koehoorn M: Measuring low back injury risk factors in challenging work environments: an evaluation of cost and feasibility. Am J Ind Med 2007, 50:687-696.

Lemasters GK, Shukla R, Li YD, Lockey JE: Balancing costs and precision in exposure assessment studies. J Occup Environ Med 1996, 38:39-45.

Shukla R, Luo J, LeMasters GK, Grinshpun SA, Martuzevicius D: Sampling over time: developing a cost effective and precise exposure assessment program. J Environ Monit 2005, 7:603-607.

Armstrong B: Study design for exposure assessment in epidemiological studies. Sci Total Environ 1995, 168:187-194.

Armstrong BG: Optimizing power in allocating resources to exposure assessment in an epidemiologic study. Am J Epidemiol 1996, 144:192-197.

Duan N, Mage DT: Combination of direct and indirect approaches for exposure assessment.

Whitmore RW, Pellizzari WD, Zelon HS, Michael LC, Quakenboss JJ: Cost/variance optimization for human exposure assessment studies. J Expo Anal Environ Epidemiol 2005, 15:464-472.

Foster TA, Asztalos BF: Improved allocation of costs through analysis of variation in data: planning of laboratory studies. Clin Chim Acta 2001, 314:55-66.

Stram DO, Longnecker MP, Shames L, Kolonel LN, Wilkens LR, Pike MC, Henderson BE: Cost-efficient design of a diet validation study. Am J Epidemiol 1995, 142:353-362.

Allison DB, Allison RL, Faith MS, Paultre F, Pi-Sunyer FX: Power and money: designing statistically powerful studies while minimizing financial costs.

Cochran WG: Sampling Techniques. 3rd edition. New York: John Wiley & Sons; 1977.

Spiegelman D, Gray R: Cost-efficient study designs for binary response data with Gaussian covariate measurement error. Biometrics 1991, 47:851-869.

Spiegelman D: Cost-efficient study designs for relative risk modeling with covariate measurement error. J Stat Plan Inference 1994, 42:187-208.

Groves RM: Survey Errors and Survey Costs. Hoboken, NJ: John Wiley & Sons; 2004.

Richter JM, Mathiassen SE, Slijper HP, Over EAB, Frens MA: Differences in muscle load between computer and non-computer work among office workers. Ergonomics 2009, 52:1540-1555.

Möller T, Mathiassen SE, Franzon H, Kihlberg S: Job enlargement and mechanical exposure variability in cyclic assembly work. Ergonomics 2004, 47:19-40.

Fethke NB, Anton D, Cavanaugh JE, Gerr F, Cook TM: Bootstrap exploration of the duration of surface electromyography sampling in relation to the precision of exposure estimation. Scand J Work Environ Health 2007, 33:358-367.

Mathiassen SE, Paquet V: The ability of limited exposure sampling to detect effects of interventions that reduce the occurrence of pronounced trunk inclination. Appl Ergon 2010, 41:295-304.

Westgaard RH, Vasseljen O, Holte KA: Trapezius muscle activity as a risk factor for shoulder and neck pain in female service workers with low biomechanical exposure. Ergonomics 2001, 44:339-353.

Sydsæter K, Hammond P, Seierstad A, Strøm A: Further Mathematics for Economic Analysis. Upper Saddle River, NJ: Prentice Hall; 2005.

Lyles RH, Kupper LL, Rappaport SM: A lognormal distribution-based exposure assessment method for unbalanced data. Ann Occup Hyg 1997, 41:63-76.

Hoozemans MJM, Burdorf A, van der Beek AJ, Frings-Dresen MHW, Mathiassen SE: Group-based measurement strategies in exposure assessment explored by bootstrapping. Scand J Work Environ Health 2001, 27:125-132.

Burstyn I, Cherry NM, Yasui Y, Kim HM: Relative performance of different exposure modeling approaches for sulfur dioxide concentrations in the air in rural western Canada. BMC Med Res Methodol 2008, 8:43.

Bernard TE, Joseph BS: Estimation of metabolic rate using qualitative job descriptors. Am Ind Hyg Assoc J 1994, 55:1021-1029.

Pernold G, Wigaeus Hjelm E, Wiktorin C, Mortimer M, Karlsson E, Kilbom Å, Vingård E, MUSIC-Norrtälje Study Group: Validity of occupational energy expenditure assessed by interview. Am Ind Hyg Assoc J 2002, 63:29-33.

Seixas NS, Sheppard L, Neitzel R: Comparison of task-based estimates with full-shift measurements of noise exposure.

Mathiassen SE, Nordander C, Svendsen SW, Wellman HM, Dempsey PG: Task-based estimation of mechanical job exposure in occupational groups. Scand J Work Environ Health 2005, 31:138-151.

Svendsen SW, Mathiassen SE, Bonde JP: Task-based exposure assessment in ergonomic epidemiology: a study of upper arm elevation in the jobs of machinists, car mechanics, and house painters. Occup Environ Med 2005, 62:18-26.

Chen JC, Chang WR, Shih TS, Chen CJ, Chang WP, Dennerlein JT, Ryan LM, Christiani DC: Using exposure prediction rules for exposure assessment: an example on whole-body vibration in taxi drivers. Epidemiology 2004, 15:293-299.

Nicas M, Spear RC: A task-based statistical model of a worker's exposure distribution: part I. Description of the model. Am Ind Hyg Assoc J 1993, 54:211-220.

Neitzel R, Daniell W, Sheppard L, Davies H, Seixas N: Improving exposure estimates by combining exposure information. Ann Occup Hyg 2011, 55:537-547.

Eliasziw M, Donner A: A cost-function approach to the design of reliability studies. Stat Med 1987, 6:647-655.

Shoukri MM, Asyali MH, Walter SD: Issues of cost and efficiency in the design of reliability studies. Biometrics 2003, 59:1107-1112.

Hansson GÅ, Arvidsson I, Ohlsson K, Nordander C, Mathiassen SE, Skerfving S, Balogh I: Precision of measurements of physical workload during standardised manual handling. Part II: Inclinometry of head, upper back, neck and upper arms. J Electromyogr Kinesiol 2006, 16:125-136.

Balogh I, Ohlsson K, Nordander C, Skerfving S, Hansson GÅ: Precision of measurements of physical workload during standardized manual handling. Part III: Goniometry of the wrists. J Electromyogr Kinesiol 2009, 19:1005-1012.

Semple SE, Proud LA, Cherrie JW: Use of Monte Carlo simulation to investigate uncertainty in exposure modeling. Scand J Work Environ Health 2003, 29:347-353.

Davison AC, Hinkley DV: Bootstrap Methods and their Applications. Cambridge: Cambridge University Press; 1997.

Paquet V, Punnett L, Woskie S, Buchholz B: Reliable exposure assessment strategies for physical ergonomics stressors in construction and other non-routinized work. Ergonomics 2005, 48:1200-1219.

Briggs AH, Wonderling DE, Mooney CZ: Pulling cost-effectiveness analysis up by its bootstraps: a nonparametric approach to confidence interval estimation. Health Econ 1997, 6:327-340.

Teschke K, Trask C, Johnson P, Chow Y, Village J, Koehoorn M: Measuring posture for epidemiology: comparing inclinometry, observations and self-reports. Ergonomics 2009, 52:1067-1078.

Spielholz P, Silverstein B, Morgan M, Checkoway H, Kaufman J: Comparison of self-report, video observation and direct measurement methods for upper extremity musculoskeletal disorder physical risk factors. Ergonomics 2001, 44:588-613.

Winkel J, Mathiassen SE: Assessment of physical work load in epidemiologic studies: concepts, issues and operational considerations. Ergonomics 1994, 37:979-988.

van der Beek AJ, Frings-Dresen MHW: Assessment of mechanical exposure in ergonomic epidemiology. Occup Environ Med 1998, 55:291-299.