Abstract
Background
Longitudinal studies often employ complex sample designs to optimize sample size, over-representing population groups of interest. The effect of sample design on parameter estimates is quite often ignored, particularly when fitting survival models. Another major problem in long-term cohort studies is the potential bias due to loss to follow-up.
Methods
In this paper we simulated a dataset with approximately 50,000 individuals as the target population and 15,000 participants to be followed up for 40 years, both based on real cohort studies of cardiovascular diseases. Two sample strategies (simple random, our gold standard, and stratified by professional group with non-proportional allocation) and two loss to follow-up scenarios (non-informative censoring and losses related to the professional group) were analyzed.
Results
Two modeling approaches were evaluated: weighted and non-weighted fit. Our results indicate that under the correctly specified model, ignoring the sample weights does not affect the results. However, the model ignoring the interaction of sample strata with the variable of interest and the crude estimates were highly biased.
Conclusions
In epidemiological studies misspecification should always be considered, as different sources of variability, related to the individuals and not captured by the covariates, are always present. Therefore, allowance must be made for the possibility of unknown confounders and interactions with the main variable of interest in our data. It is strongly recommended always to correct by sample weights.
Background
It is widely acknowledged, both theoretically and in practice, that incorporating design features into estimation of descriptive parameters, such as prevalence, can help avoid bias and reduce standard errors [1-4]. However, in spite of the consensus in the statistical environment, grounded in clear evidence and well-established procedures to deal with complex sample strategies in survival modeling [5], these principles are quite often ignored in applied settings. For instance, recently published studies [6,7] using well-known cohort data (MESA [8] and MONICA [9]) neither incorporate design weighting into the analysis, nor discuss its appropriateness.
This paper was motivated by discussion of the sample strategy used in a recent large multicenter cohort study, with approximately 50,000 people as the target population and 15,000 participants to be followed up for at least 20 years [10]. The participants were selected by non-proportional stratified sampling. The main aim here is to present, as clearly as possible for non-statistical researchers, the impact of ignoring sample design, and thus to contribute to improving data analysis practice in epidemiology. In this case study we evaluate the impact of sampling weights and loss to follow-up on estimation of the parameters of a Cox proportional hazards model, by evaluating bias and precision.
Stratified random sampling involves dividing the population members into non-overlapping groups called strata, defined by selected characteristics and each sampled separately. Varying sample fractions by stratum improves the efficiency of sample design and estimators for relatively small but important population subgroups. As the proportion of the samples in each stratum varies, the weight of each individual will be proportional to the inverse of the sample fraction in the respective group, as described in Kish (1965) [4]. Computing those weights gives each stratum the same relative importance as it displays in the population. In a stratified sample, as the association between exposure and the event may vary within each stratum, estimation of the marginal association (the average association in the entire population) should consider the individual, and varying, probability of being included in the sample.
Varying sample weights across the strata may induce a difference between the probability distributions for the outcome in the sample and in the population, because of the covariates included in the model. In such cases the design carries information about the outcome, and is therefore considered informative or non-ignorable.
In a survival model, where time-to-event T is the response variable, x the covariates vector and z the design factor, if z is not related to T given x, the design factor z is ignorable. Boudreau and Lawless [11] analyzed the impact of sampling design on the Cox proportional hazards model, considering both clustering and stratification. If the sampling design is ignorable, both weighted and unweighted procedures are asymptotically unbiased and should yield similar point estimates. However, if the sampling design is non-ignorable, consistent estimation can be achieved by introducing design weights into the estimating functions, as proposed by Binder (1992) [12] and Lin (2000) [13].
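For reference, the design-weighted estimating function mentioned above is usually written as a weighted version of the Cox partial-likelihood score. The notation below is assumed for illustration and is not taken from the paper: s is the sample, w_i the sampling weight, \delta_i the event indicator, t_i the observed time, Y_j(t) the at-risk indicator and \beta the vector of log hazard ratios.

```latex
U_w(\beta) \;=\; \sum_{i \in s} w_i \, \delta_i
\left[ x_i \;-\;
\frac{\sum_{j \in s} w_j \, Y_j(t_i) \, x_j \, e^{x_j^{\top}\beta}}
     {\sum_{j \in s} w_j \, Y_j(t_i) \, e^{x_j^{\top}\beta}}
\right] \;=\; 0
```

Setting every w_i to 1 recovers the ordinary (unweighted) partial-likelihood score, which is why the two fits agree when the design is ignorable.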
Another major problem in long-term cohort studies is potential bias due to loss to follow-up. This problem is widely recognized and several approaches deal with it [14]. The Cox model assumes non-informative censoring. However, this is an unwarranted assumption in long-term cohort studies, and differential losses related to the sampling strata may increase the bias. Lawless (2003) [5] discusses these issues further and considers the use of time-varying weights that deal at the same time with a non-ignorable sampling plan and non-ignorable censoring.
The next section presents the case study, describing the simulated population and two different scenarios of loss to follow-up. Next the sample plan strategies and model fitting are presented. The results section uses a graphical representation to make the discussion of the impact of ignoring sample design more accessible to non-mathematical readers.
Methods: Simulation exercise
The population
A population of 52,750 individuals belonging to three sampling strata was generated. As the focus of our motivating example was a study in a working population, we defined the strata by occupational category, which relates to socioeconomic status. The groups, in descending order of occupational category, were: professionals (50.5%), technicians (28%) and administrative staff (21.5%). The age and socioeconomic distributions were based on an epidemiological study with census data involving all employees at a Brazilian university [15]. The prevalence of the exposure variable, smoking, was based on the same data: 15.9% smokers among professionals, 20.9% among technicians and 25.3% among administrative staff. Myocardial infarction (MI) was the event of interest. To generate age at infarction, which defined time-to-event T_{i}, we used data from a Spanish study [16]. We considered only administrative censoring at 40 years of follow-up for all surviving subjects.
Smoking affected survival in interaction with the occupational position: hazard ratios of 1.5 among professionals, 2.0 among technicians and 3.0 among administrative staff. In addition, as the occupational strata are related to socioeconomic position, hazard increased by 50% and tripled in the technicians and administrative strata, respectively, as compared to the professionals stratum. Summarizing, the equation to generate the time-to-event data was:
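A form consistent with the hazard ratios stated above can be sketched as follows. This is an illustration under assumptions, not necessarily the authors' exact parameterization: g indexes the stratum (professionals, technicians, administrative), s is the smoking indicator (0 or 1), \theta_g and \phi_g are the stratum and within-stratum smoking hazard ratios from the text, and the Weibull shape k and scale \lambda are left symbolic because their values are not given here.

```latex
h(t \mid g, s) \;=\; h_0(t)\,\theta_g\,\phi_g^{\,s},
\qquad
h_0(t) \;=\; \frac{k}{\lambda}\left(\frac{t}{\lambda}\right)^{k-1},
\qquad
\theta = (1,\ 1.5,\ 3), \quad \phi = (1.5,\ 2.0,\ 3.0)
```

Under this form a smoking technician, for example, has 1.5 x 2.0 = 3 times the hazard of a non-smoking professional.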
The Weibull density equation and curves for time-to-event using the parameters above are presented in Figure 1.
Figure 1. Weibull distribution. Effect of changing the scale parameter on the time-to-event curve based on the simulated scenario.
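The generation of event times can be illustrated with a short sketch. It draws Weibull times under proportional hazards by inverse-transform sampling and applies administrative censoring at 40 years. The hazard ratios come from the text; the Weibull `SHAPE` and `SCALE` values are hypothetical placeholders, and the sketch draws time since baseline rather than age at infarction, so it simplifies the procedure described above.

```python
import math
import random

# Hazard ratios stated in the text (professionals are the reference stratum).
STRATUM_HR = {"professionals": 1.0, "technicians": 1.5, "administrative": 3.0}
SMOKING_HR = {"professionals": 1.5, "technicians": 2.0, "administrative": 3.0}

SHAPE = 2.0       # hypothetical Weibull shape, not from the paper
SCALE = 120.0     # hypothetical Weibull scale in years, not from the paper
FOLLOW_UP = 40.0  # administrative censoring time from the text


def hazard_ratio(stratum, smoker):
    """Hazard relative to a non-smoking professional."""
    hr = STRATUM_HR[stratum]
    if smoker:
        hr *= SMOKING_HR[stratum]
    return hr


def draw_time(stratum, smoker, rng):
    """Return (observed time, event indicator) for one individual.

    Under proportional hazards S(t) = exp(-hr * (t / SCALE) ** SHAPE),
    so solving S(t) = u for uniform u gives the event time.
    """
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    t = SCALE * (-math.log(u) / hazard_ratio(stratum, smoker)) ** (1.0 / SHAPE)
    return min(t, FOLLOW_UP), t <= FOLLOW_UP


if __name__ == "__main__":
    rng = random.Random(2024)
    for stratum in STRATUM_HR:
        for smoker in (False, True):
            events = sum(draw_time(stratum, smoker, rng)[1] for _ in range(5000))
            print(stratum, "smoker" if smoker else "non-smoker", events / 5000)
```

Changing `SCALE` shifts the whole curve in time, which is the effect Figure 1 is described as showing.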
The sample plans
The sample size estimated in our motivating example [10] was 15,000 people. In order to increase the power in the administrative and technicians strata, these groups were oversampled: 3,000 individuals in the professionals group, 4,500 in the technicians group and 7,500 in the administrative staff group. Therefore the weights (the inverse of the selection probability) in each group were 8.89, 3.29 and 1.50, respectively. A simple random sample was extracted for comparison.
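The weights quoted above can be checked with a few lines of arithmetic. The sketch below divides each stratum's population count by its sample size, using the population size and rounded percentages from the text; the function and variable names are illustrative only.

```python
# Population size and stratum shares as stated in the text.
POPULATION = 52_750
SHARE = {"professionals": 0.505, "technicians": 0.28, "administrative": 0.215}

# Stratified sample sizes as stated in the text.
SAMPLE = {"professionals": 3_000, "technicians": 4_500, "administrative": 7_500}


def design_weights(population, share, sample):
    """Inverse selection probability N_h / n_h for each stratum h."""
    return {h: population * share[h] / sample[h] for h in sample}


weights = design_weights(POPULATION, SHARE, SAMPLE)

if __name__ == "__main__":
    for stratum, w in weights.items():
        print(f"{stratum}: {w:.2f}")
```

With the rounded percentages this gives roughly 8.88, 3.28 and 1.51, close to the reported 8.89, 3.29 and 1.50; the small differences presumably reflect rounding of the published stratum shares.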
We generated 2,000 samples, with 15,000 individuals each, for both random and stratified sample plans. To evaluate the impact of loss to follow-up we used the same samples as already simulated, censoring individuals that had experienced the event. Two different scenarios were defined: a 15% random loss and a differential loss by sample strata (professionals with 8% loss, technicians with 12% and administrative staff with 20%).
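The two loss scenarios can be sketched as below. The loss proportions are those given in the text. How dropout times were assigned is not described, so the sketch assumes, purely for illustration, that a lost individual drops out at a time drawn uniformly over the 40 years and is censored there if that precedes the observed time.

```python
import random

FOLLOW_UP = 40.0

# Loss proportions from the text.
RANDOM_LOSS = {"professionals": 0.15, "technicians": 0.15, "administrative": 0.15}
DIFFERENTIAL_LOSS = {"professionals": 0.08, "technicians": 0.12, "administrative": 0.20}


def apply_loss(records, loss_prob, rng):
    """Censor a stratum-specific share of individuals.

    records: list of (stratum, time, event) tuples.
    Assumption (not from the paper): dropout time is uniform on (0, FOLLOW_UP).
    """
    out = []
    for stratum, time, event in records:
        if rng.random() < loss_prob[stratum]:
            dropout = rng.uniform(0.0, FOLLOW_UP)
            if dropout < time:
                time, event = dropout, False  # lost before the observed time
        out.append((stratum, time, event))
    return out


if __name__ == "__main__":
    rng = random.Random(1)
    toy = [("administrative", 25.0, True), ("professionals", 40.0, False)] * 5
    print(apply_loss(toy, DIFFERENTIAL_LOSS, rng))
```

Because the loss probability depends on the stratum in `DIFFERENTIAL_LOSS`, censoring becomes related to a variable that also drives the hazard, which is what makes that scenario informative when the stratum is left out of the model.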
Model fitting
Each sample was fitted using a Cox proportional hazards model. The first model, Full (Eq:1), used the same information that generated the population, except for the parametric Weibull curve. The second, Marginal (Eq:2), included the strata as independent terms, but not interacting with our variable of interest, smoking. The last model, without the design factor, is the Smoke-only (Eq:3) model.
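To make the difference between weighted and non-weighted fitting concrete, here is a minimal sketch of a Cox fit for a single binary covariate, as in the Smoke-only model. It maximizes the weighted partial likelihood by Newton-Raphson with Breslow handling of ties. It is not the software used in the paper, it returns only the point estimate (a design-based or robust variance would be needed in practice), and the toy data are invented for illustration.

```python
import math


def cox_one_covariate(times, events, x, weights=None, iterations=50):
    """Log hazard ratio for one covariate from a (weighted) Cox partial likelihood.

    Breslow convention for ties; weights=None gives the ordinary unweighted fit.
    """
    n = len(times)
    w = [1.0] * n if weights is None else list(weights)
    order = sorted(range(n), key=lambda i: -times[i])  # latest times first
    beta = 0.0
    for _ in range(iterations):
        s0 = s1 = s2 = 0.0      # weighted risk-set sums of 1, x and x^2
        score = info = 0.0
        k = 0
        while k < n:
            t = times[order[k]]
            m = k
            # Everyone observed at time t joins the risk set before events count.
            while m < n and times[order[m]] == t:
                i = order[m]
                r = w[i] * math.exp(beta * x[i])
                s0 += r
                s1 += r * x[i]
                s2 += r * x[i] * x[i]
                m += 1
            for i in order[k:m]:
                if events[i]:
                    mean = s1 / s0
                    score += w[i] * (x[i] - mean)
                    info += w[i] * (s2 / s0 - mean * mean)
            k = m
        if info <= 0.0:
            break
        step = score / info
        beta += step
        if abs(step) < 1e-10:
            break
    return beta


if __name__ == "__main__":
    # Invented toy data: time, event indicator, smoking indicator, design weight.
    times = [5, 8, 3, 10, 6, 2, 9, 4]
    events = [1, 1, 1, 0, 1, 1, 0, 1]
    smoker = [1, 0, 1, 0, 0, 1, 1, 0]
    weight = [8.89, 8.89, 3.29, 3.29, 1.50, 1.50, 1.50, 1.50]
    print("unweighted HR:", math.exp(cox_one_covariate(times, events, smoker)))
    print("weighted HR:  ", math.exp(cox_one_covariate(times, events, smoker, weight)))
```

Multiplying all weights by a constant leaves the estimate unchanged, so only the relative weights across strata matter for the point estimate.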
The population parameters for each model are given in Table 1. As we know the model and parameters that generated this population, both models 2 and 3 are incomplete. However, except in simulation studies, the complete model is never known and significant covariates are often ignored. Therefore, even under a misspecified model, it is important to compare the sample estimates with the "true" value. A Gaussian kernel was used to present the distribution of the 2,000 sample estimates for each model and strategy.
Table 1. Estimated Population Hazard Ratios for each fitted model
Results and Discussion
Comparison of sampling schemes under different models
Considering the Full model, both sample designs and fitting strategies give unbiased estimates. For the design-related variables, variance in parameter estimates is slightly smaller with simple random sampling than with weighted sampling. On the other hand, the variance in the samples for interaction of smoking with each professional category changes with the sample weighting: it is smaller for the professional stratum, larger for the technicians and much larger for the administrative group (Figure 2). Note that when the model is completely specified, whether or not the weights are included in stratified sampling, almost exactly the same point estimates are returned.
Figure 2. Simulated Hazard Ratios under the Full Model. A correctly specified model returns essentially the same results whether or not sample weights are considered.
The Marginal model, with just a common effect of smoking across all strata, presents similar and unbiased estimates for the design factor for both sample designs (Figure 3), when compared with the Marginal population parameters. The hazard ratios for smoking, whether with random sampling or in the model with sample design correction, were similar and unbiased. However, those estimates were strongly biased when the sampling weights were not included in the model: the central 95% of the distribution of the estimates did not include the true value of the parameter. The argument in favor of not including the sample weights is that it improves precision [17,18], but in our example the increased precision excluded the true value of the parameter.
Figure 3. Simulated Hazard Ratios under Marginal Model. A large difference is observed for the hazards associated with smoking when fitting without sample weights, if the model does not include the interaction with professional category.
The Smoke-only model returned very similar results (Figure 4), with smaller variance but strong bias. The average risk for smoking, ignoring the interaction with professional category, is actually 2.21 (Table 1). The misspecification of the model in this case caused an overestimation of the smoking effect, as it absorbed the effect of professional category. Most studies include the variable indicating the sampling strata in the model, even when ignoring the sample weights [6,7], considering that this, unfortunately insufficient, procedure will correct for the design effect. The crude estimated effect, usually used in exploratory analysis and to select the most important variables, is also misleading, as are the Kaplan-Meier estimates and Mantel-Haenszel (or log-rank) tests [19]. Although correcting for the sample weights is possible and simple, it is rarely done.
Figure 4. Simulated Hazard Ratios under Smoke-only Model. The pattern is similar to the Marginal model, with similar bias.
Comparison of modeling strategies in terms of loss to follow-up
Random loss is a non-informative censoring mechanism. Therefore it affects only precision, with results similar to those presented in the previous section (Figures 5 to 7, upper frames). If the model is well specified, the covariate associated with loss will absorb the loss to follow-up, as shown in Figure 5. Under differential loss, which is informative censoring, the larger losses in the administrative category decrease its estimated hazard in all models and all sample strategies, as expected.
Figure 5. Simulated Hazard Ratios with loss to follow-up under Full Model. The upper frames show the random loss to follow-up and the lower ones the non-random censoring.
Figure 6. Simulated Hazard Ratios with loss to follow-up under Marginal Model. The upper frames, with the random loss to follow-up, show the bias for the smoking hazard ratio for the non-weighted model. The lower frames with non-random censoring show the bias for all models.
Figure 7. Simulated Hazard Ratios with loss to follow-up under Smoke-only Model. The upper frame shows the bias for the non-weighted Smoke-only model and the lower one the bias for all approaches due to non-random loss.
The Marginal model (Figure 6), with non-weighted fit, displays a bias for smoking similar to the same model without losses (Figure 3). Attrition is a recognized problem in longitudinal studies [20]. Yang and Shoptaw (2005) [21] present a thorough discussion of conceptual and practical issues in analyzing incomplete longitudinal data. However, in our simulations, the impact of ignoring the sample weights is larger than the impact of dropout, which is not as large in our example as in some of the studies discussed. The bias for smoking in the Smoke-only model (Figure 7) points in two directions. When the sampling weights were not included, it overestimates the hazard for smoking. On the other hand, as losses were larger in the administrative stratum, the values of the estimates decrease in the random sample and in the weighted model. This feature was already present, although not as visible, in the Marginal model with non-random loss to follow-up. Analysis of the crude effect of smoking, using a Mantel-Haenszel test, should include the non-administrative censored group as a separate category.
Overall comparison
The average variance of the estimates for each covariate (Figure 8) is very similar in both weighted and random sampling models, both with and without loss to follow-up. As expected, with the smaller number of events due to the losses, the average variance shifted towards higher values. In the non-weighted model, the mean variance for the smoking variable, alone or in interactions, decreases, except for smoking among professionals. The variance, in the latter case, is very large because both the total number of observations and the hazard in this category are small.
Figure 8. Average Variance of Estimates according to two Scenarios: without loss and with non-random loss to follow-up. The upper frame, without loss, shows smaller variance than the lower one and a similar pattern.
Mean square error (MSE) is the sum of the variance and the squared bias of the estimates. This statistic is a good summary of the quality of a point estimate, as it combines the random and systematic error [22,23]. The coincidence between random sampling and weighted model in Figure 9 is the same as described previously for the average variance of the estimates. However, in the non-weighted model, the systematic error predominates, making it the worst fit for all variables, except for the interaction of smoking with the technicians and administrative staff. The loss to follow-up simulations displayed similar patterns, with much larger MSE.
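The decomposition of MSE into variance plus squared bias can be verified numerically. In the sketch below the five estimates are made-up numbers for illustration; only the reference value 2.21 (the population Smoke-only hazard ratio) comes from the text, and whether the paper computed MSE on the hazard-ratio or log scale is not stated.

```python
from statistics import fmean, pvariance


def summarize(estimates, true_value):
    """Bias, variance and mean square error of a set of simulated estimates."""
    estimates = list(estimates)
    mean = fmean(estimates)
    bias = mean - true_value
    variance = pvariance(estimates, mu=mean)  # divides by the number of samples
    mse = fmean([(e - true_value) ** 2 for e in estimates])
    return {"bias": bias, "variance": variance, "mse": mse}


if __name__ == "__main__":
    invented_estimates = [2.9, 3.1, 3.0, 3.2, 2.8]  # illustrative only
    result = summarize(invented_estimates, true_value=2.21)
    print(result)
    print("variance + bias^2 =", result["variance"] + result["bias"] ** 2)
```

An estimator with small variance but large bias, as in the non-weighted fits, can therefore have a larger MSE than a noisier but unbiased one.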
Figure 9. Mean Square Error according to two Scenarios: without loss and with non-random loss to follow-up. Both simulations, without loss (upper frame) and with loss (lower one), display a similar pattern, with the non-weighted model performing much worse.
The simulation exercise was restricted to Cox regression, with only a few scenarios. We tested many different scenarios with other covariates, omitted risk factors, and so on, but decided to present only these simpler models, so as to highlight the impact of ignoring the sample weights. Evidently, the large disparity in sample weights favored clear demonstration of the bias. However, these sample weights reflect our experience. Other modeling approaches, such as repeated measures analysis, were not implemented, and different results could be obtained.
If non-administrative censoring is considerable, then a valuable tool is to take a subdistribution hazard approach, reweighting individuals in the risk set. The sample weighting itself could be recalculated at each dropout [24].
Conclusions
Quite often researchers do not include either sample weights or strata indicators in statistical models. Yeboah et al (2010) [19] used only white race in a univariate model, in spite of the four strata (white, African-American, Hispanic and Asian) that defined the sample strata in MESA [8]. Race was included as a common covariate, and excluded from the multivariate models. Neither the six study communities nor the sample weights were mentioned. Two other papers on the same cohort were more careful. Polonsky et al (2010) [25] controlled for race. Bertoni et al (2010) [6] not only included race, but tested for interaction with the main exposure variable. Neither evaluated the impact of the study communities.
Our results confirmed that, in a correctly specified model, ignoring the weights does not change the estimated parameters, and precision may improve (a result theoretically proven for inference based on ordinary least squares) [26,27]. As suggested by Winship and Radbill (1994) [28], the decision whether or not to include the weights in the model should be based on the role of the stratifying variable. In the presence of interaction between the stratifying variable and another independent variable not included in the model, bias will be introduced if sample weights are not considered. However, the correct model is only known for simulated populations. Also, strata are usually chosen to increase the sample size of populations whose characteristics are important to the outcome under study.
The primary objective of analyzing survey data is to make inferences about the population of interest [29]. Therefore survey planning starts by defining the target population, to which results will be referenced [2,30]. The role of the population of reference in analysis of survey data is related to the meaning of the error term of the statistical model. In the physical sciences, the error of a regression is considered a measurement error. Epidemiology, however, besides measurement, has to consider different sources of variability relating to individuals, and not captured by the covariates included in the model [2]. Actually, this reasoning lies behind the development of random effect ("frailty") models in survival analysis [31]. Another issue is the use of crude estimates. The usual practice in epidemiology is to control for confounders. However, public health policies may need those numbers to estimate disease burden or to evaluate the impact of targeting specific risk factors. The Smoke-only model (Eq:3) gives exactly the desired estimate for these purposes. The correct numbers should thus be given, using the appropriate weighting in an uncontrolled model.
The stratification by professional categories, which assigns much larger weight to the lower social stratum, was guided by the need to increase the power to detect social-related risk factors. Nevertheless, almost any covariate displays different prevalence in different socioeconomic groups. Also, almost all covariates interact, positively or negatively, changing the risk. Smoking itself presents similar physiological risk across socioeconomic strata. However, belonging to the most deprived stratum implies differences in other risk factors such as larger body mass index, worse diet, inadequate exercise, all associated with cardiovascular diseases, and these are the known and easily measured risk factors. Unknown or unreliable measures, such as stress or mental health, will always exist. Therefore allowance has to be made for the possibility of unknown confounders and interactions in our data associated with the sample strata. Rubin [32] recommends that observational studies should approximate randomized experiments, and that the assignment mechanism, in our case smoking or not smoking, should be as unconfounded as possible. Graubard and Korn (2002) [33] recommend weighted estimators, as they believe their model-free aspects outweigh their potential inefficiency. On the same reasoning, we strongly recommend always correcting by sample weights.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
Both authors designed, analyzed and wrote the paper.
Acknowledgements
CCC and MSC received support from the Brazilian Research Council (CNPq); MSC was also funded by the Rio de Janeiro State Research Foundation (FAPERJ).
References

1. Kalsbeek W, Heiss G: Building bridges between populations and samples in epidemiological studies. Annu Rev Public Health 2000, 21:147-169. [http://dx.doi.org/10.1146/annurev.publhealth.21.1.147]
2. Xie Y: Otis Dudley Duncan's legacy: The demographic approach to quantitative reasoning in social science. Research in Social Stratification and Mobility 2007, 25:141-156.
3. DuMouchel W, Duncan G: Using sample survey weights in multiple regression analysis of stratified samples. Journal of the American Statistical Association 1983, 78:535-548.
5. Lawless J: Censoring and weighting in survival estimation from survey data. Proceedings of the Survey Methods Section, Statistical Society of Canada 2003 Annual Meeting, Statistical Society of Canada 2003.
6. Bertoni AG, Burke GL, Owusu JA, Carnethon MR, Vaidya D, Graham Barr G, Jenny NS, Ouyang P, Rotter JI: Inflammation and the Incidence of Type 2 Diabetes: The Multi-Ethnic Study of Atherosclerosis (MESA). Diabetes Care 2010, 33(4):804-810.
7. Bopp M, Braun J, Faeh D, Gutzwiller F, Group SNCS: Establishing a follow-up of the Swiss MONICA participants (1984-1993): record linkage with census and mortality data. BMC Public Health 2010, 10:562.
8. Bild D, Bluemke D, Burke G, Detrano R, Diez Roux A, Folsom A, Greenland P, Jacob DJ, Kronmal R, Liu K, Nelson J, O'Leary D, Saad M, Shea S, Szklo M, Tracy R: Multi-ethnic study of atherosclerosis: objectives and design. American Journal of Epidemiology 2002, 156(9):871-881.
9. Böthig S: WHO MONICA Project: objectives and design. International Journal of Epidemiology 1989, 18(3 Suppl 1):S29-37.
10. Conheça o ELSA [http://www.elsa.org.br]
11. Boudreau C, Lawless JF: Survival analysis based on the proportional hazards model and survey data.
12. Binder DA: Fitting Cox's proportional hazards models from survey data. Biometrika 1992, 79:139-147.
13. Lin D: On fitting Cox's proportional hazards models to survey data. Biometrika 2000, 87:37-47.
14. Kristman V, Manno M, Côté P: Loss to follow-up in cohort studies: how much is too much? European Journal of Epidemiology 2004, 19:751-760.
15. Faerstein E, Chor D, Lopes CS, Werneck GL: Estudo Pró-Saúde: características gerais e aspectos metodológicos. Rev bras epidemiol 2005, 8:454-466. [http://www.scielo.br/pdf/rbepid/v8n4/12.pdf]
16. Marín A, Medrano MJ, González J, Pintado H, Compaired V, Bárcena M, Fustero MV, Tisaire J, Cucalón JM, Martín A, Boix R, Hernansanz F, Bueno J: Risk of ischaemic heart disease and acute myocardial infarction in a Spanish population: observational prospective study in a primary-care setting. BMC Public Health 2006, 6:38.
17. Little RJ, Lewitzky S, Heeringa S, Lepkowski J, Kessler RC: Assessment of weighting methodology for the National Comorbidity Survey. American Journal of Epidemiology 1997, 146:439-449.
18. Korn EL, Graubard BI: Analysis of large health surveys: accounting for the sampling design. Journal of the Royal Statistical Society 1995, 158(2):263-295.
19. Yeboah J, McNamara CC, Jiang XC, Tabas I, Herrington DM, Burke GL, Shea S: Association of plasma sphingomyelin levels and incident coronary heart disease events in an adult population: Multi-Ethnic Study of Atherosclerosis. Arteriosclerosis, Thrombosis and Vascular Biology 2010, 30:628-633.
20. Hardy SE, Allore H, Studenski SA: Missing Data: A Special Challenge in Aging Research. Journal of the American Geriatrics Society 2009, 57(4):722-729.
21. Yang X, Shoptaw S: Assessing missing data assumptions in longitudinal studies: an example using a smoking cessation trial. Drug and Alcohol Dependence 2005, 77:213-225.
22. Gunst RF, Mason RL: Biased estimation in regression: an evaluation using mean squared error. Journal of American Statistical Association 1977, 72(359):616-628.
23. Cox D: Principles of Statistical Inference. Cambridge University Press; 2006.
24. Putter H, Fiocco M, Geskus RB: Tutorial in biostatistics: competing risks and multi-state models. Statistics in Medicine 2007, 26:2389-2430.
25. Polonsky TS, McClelland RL, Jorgensen NW, Bild DE, Burke GL, Guerci AD, Greenland P: Coronary Artery Calcium Score and Risk Classification for Coronary Heart Disease Prediction. The Journal of the American Medical Association 2010, 303(16):1610-1616.
26. Holt D, Smith TMF, Winter PD: Regression analysis of data from complex surveys. Journal of the Royal Statistical Society, Series A 1980, 143(Part 4):474-487.
27. Nathan G, Holt D: The effect of survey design on regression analysis. Journal of the Royal Statistical Society, Series B 1980, 42(3):377-386.
28. Winship C, Radbill L: Sampling weights and regression analysis. Sociological Methods & Research 1994, 23(2):230-257.
29. LaVange LM, Koch G, Schwartz TA: Applying sample survey methods to clinical trials data. Statistics in Medicine 2001, 20:2609-2623.
30. Feder M, Nathan G, Pfeffermann D: Multilevel modelling of complex survey longitudinal data with time varying random effects.
31. Vaupel JW, Manton KG, Stallard E: The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography 1979, 16(3):439-454.
32. Rubin DB: The design versus the analysis of observational studies for causal effects: parallels with the design of randomized trials. Statistics in Medicine 2007, 26(1):20-36.
33. Graubard BI, Korn EL: Inference for superpopulation parameters using sample surveys. Statistical Science 2002, 17(1):73-96.