

Open Access Research article

Stratified sampling design and loss to follow-up in survival models: evaluation of efficiency and bias

Cibele C César¹* and Marilia S Carvalho²

Author affiliations

1 Department of Statistics, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil

2 National School of Public Health, Oswaldo Cruz Foundation, Rio de Janeiro, Brazil



BMC Medical Research Methodology 2011, 11:99  doi:10.1186/1471-2288-11-99

Published: 26 June 2011



Longitudinal studies often employ complex sampling designs to optimize sample size, over-representing population groups of interest. The effect of the sampling design on parameter estimates is often ignored, particularly when fitting survival models. Another major problem in long-term cohort studies is potential bias due to loss to follow-up.


In this paper we simulated a dataset with approximately 50,000 individuals as the target population and 15,000 participants to be followed up for 40 years, both based on real cohort studies of cardiovascular diseases. Two sampling strategies (simple random sampling, our gold standard, and sampling stratified by professional group with non-proportional allocation) and two loss-to-follow-up scenarios (non-informative censoring and losses related to professional group) were analyzed.
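Stratified sampling with non-proportional allocation is what makes sampling weights necessary: a unit in stratum h is drawn with probability n_h/N_h, so its design weight is the inverse, N_h/n_h. The sketch below is a minimal Python illustration under stated assumptions, not the authors' simulation code. The function name `stratified_sample`, the three hypothetical professional groups "A", "B" and "C", their sizes, and the equal allocation of 5,000 per group are all assumptions, chosen only so that the totals match the 50,000-person population and 15,000-person sample described above.

```python
import random


def stratified_sample(population, stratum_of, allocation, seed=0):
    """Stratified random sampling without replacement, returning design weights.

    population -- list of units
    stratum_of -- function mapping a unit to its stratum label
    allocation -- dict: stratum label -> sample size n_h (need not be proportional)
    Returns a list of (unit, weight) pairs with weight = N_h / n_h,
    the inverse of each unit's inclusion probability.
    """
    rng = random.Random(seed)
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)

    sample = []
    for label, units in strata.items():
        n_h, N_h = allocation[label], len(units)
        if not 0 < n_h <= N_h:
            raise ValueError(f"invalid sample size {n_h} for stratum {label!r}")
        weight = N_h / n_h
        sample.extend((unit, weight) for unit in rng.sample(units, n_h))
    return sample


# Hypothetical population of 50,000 split into three professional groups;
# equal allocation of 5,000 each over-represents the smaller groups.
groups = ["A"] * 30_000 + ["B"] * 15_000 + ["C"] * 5_000
population = [(person_id, g) for person_id, g in enumerate(groups)]
sample = stratified_sample(population, lambda u: u[1], {"A": 5_000, "B": 5_000, "C": 5_000})

print(len(sample))                # 15000
print(sum(w for _, w in sample))  # 50000.0
```

Because the weights within each stratum sum to that stratum's size, the weights over the whole sample sum to the population size. In this toy allocation the smallest group is sampled completely (weight 1.0), while each sampled member of the largest group stands for six people (weight 6.0).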


Two modeling approaches were evaluated: weighted and non-weighted fits. Our results indicate that, under the correctly specified model, ignoring the sampling weights does not affect the results. However, both the model ignoring the interaction between the sampling strata and the variable of interest and the crude estimates were highly biased.
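The comparison above concerns weighted and non-weighted fits of survival regression models. As a simpler stand-in, and not the authors' method, the sketch below computes a Kaplan-Meier survival curve in which each subject contributes its sampling weight, rather than 1, to the counts of events and of subjects at risk; passing no weights gives the ordinary unweighted estimator. The function name `kaplan_meier` and the six-subject toy data, with weights 6 and 1 for two hypothetical strata, are assumptions for illustration only.

```python
def kaplan_meier(times, events, weights=None):
    """Kaplan-Meier survival curve, optionally weighted by sampling weights.

    times   -- follow-up time for each subject
    events  -- 1 if the event was observed, 0 if censored (lost to follow-up)
    weights -- design weights (e.g. N_h / n_h); None means every subject counts 1
    Returns a list of (event_time, survival_probability) steps.
    """
    if weights is None:
        weights = [1.0] * len(times)
    data = sorted(zip(times, events, weights))
    at_risk = float(sum(weights))  # weighted number still under observation
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        died = 0.0     # weighted events at time t
        leaving = 0.0  # weighted subjects (events + censored) leaving at t
        while i < len(data) and data[i][0] == t:
            _, event, w = data[i]
            if event:
                died += w
            leaving += w
            i += 1
        if died > 0:
            surv *= 1.0 - died / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # subjects censored at t were still at risk at t
    return curve


# Hypothetical toy sample: first three subjects from a stratum with weight 6,
# last three from a stratum with weight 1.
times = [2, 4, 6, 3, 5, 7]
events = [1, 1, 0, 0, 0, 0]
weights = [6.0, 6.0, 6.0, 1.0, 1.0, 1.0]

unweighted = kaplan_meier(times, events)
weighted = kaplan_meier(times, events, weights)
print(unweighted)
print(weighted)
```

In the toy data the heavily weighted stratum has all the events, so the weighted curve falls faster than the unweighted one (about 0.41 versus 0.625 after the second event). The unweighted curve effectively describes the over-sampled sample rather than the target population.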


In epidemiological studies, model misspecification should always be considered, because sources of variability that are related to the individuals but not captured by the covariates are always present. Allowance must therefore be made for the possibility of unknown confounders and of interactions with the main variable of interest in our data. We strongly recommend always correcting for sampling weights.