Open Access Research article

Assessment of two different types of bias affecting the results of outcome-based evaluation in undergraduate medical education

Sarah Schiekirka1,2, Sven Anders3 and Tobias Raupach1,4*

Author Affiliations

1 Department of Cardiology and Pneumology, University Hospital Göttingen, Göttingen, Germany

2 Study Deanery of Göttingen Medical School, Göttingen, Germany

3 Department of Legal Medicine, University Medical Centre Hamburg Eppendorf, Hamburg, Germany

4 Department of Clinical, Educational and Health Psychology, University College London, London, UK


BMC Medical Education 2014, 14:149 doi:10.1186/1472-6920-14-149

Published: 21 July 2014



Estimating learning outcome from comparative student self-ratings is a reliable and valid method to identify specific strengths and shortcomings in undergraduate medical curricula. However, requiring students to complete two evaluation forms (i.e. one before and one after teaching) might adversely affect response rates. Alternatively, students could be asked to rate their initial performance level retrospectively. This approach might threaten the validity of results due to response shift or effort justification bias.


Two consecutive cohorts of medical students taking a six-week cardio-respiratory module were enrolled in this study. In both cohorts, performance gain was estimated for 33 specific learning objectives. In the first cohort, outcomes calculated from ratings provided before (pretest) and after (posttest) teaching were compared to outcomes derived from comparative self-ratings collected after teaching only (thentest and posttest). In the second cohort, only thentests and posttests were used to calculate outcomes, but data collection tools differed with regard to item presentation. In one group, thentest and posttest ratings were obtained sequentially on separate forms, while in the other, both ratings were obtained simultaneously for each learning objective.
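The comparison described above can be illustrated with a minimal sketch. The abstract does not state how performance gain was calculated, so the formula below (the share of the possible improvement that was achieved, on a scale where 1 is the best rating) is an assumption, as are the six-point scale, the function name `performance_gain` and all ratings, which are hypothetical and not data from the study.

```python
from statistics import mean

def performance_gain(before, after, best=1):
    """Percentage of the possible improvement that was achieved.

    Assumption (not stated in the abstract): ratings lie on a scale where
    `best` is the top score, e.g. 1 = best on a six-point scale.
    """
    mean_before = mean(before)
    mean_after = mean(after)
    room = mean_before - best  # maximum improvement that was possible
    if room <= 0:
        raise ValueError("initial mean is already at the best score")
    return (mean_before - mean_after) / room * 100

# Hypothetical self-ratings for one learning objective (not study data).
pretest = [5, 4, 5, 6, 4]    # collected before teaching
thentest = [5, 5, 6, 6, 5]   # retrospective rating of the initial level
posttest = [2, 2, 3, 2, 1]   # collected after teaching

gain_pre = performance_gain(pretest, posttest)    # pretest/posttest design
gain_then = performance_gain(thentest, posttest)  # thentest/posttest design

print(f"gain from pretest:  {gain_pre:.1f}%")
print(f"gain from thentest: {gain_then:.1f}%")
```

In this invented example the thentest ratings are less favourable than the pretest ratings, so the thentest-based gain comes out higher; the size of that difference is the quantity of interest when the two designs are compared.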


Using thentest ratings to calculate performance gain produced slightly higher values than using true pretest ratings. Direct comparison of then- and posttest ratings also yielded slightly higher performance gain than sequential ratings, but this effect was negligibly small.


Given the small effect sizes, using thentests appears to be equivalent to using true pretest ratings. Item presentation in the posttest does not significantly affect results.

Keywords: Undergraduate medical education; Evaluation; Learning outcome; Response shift bias