Open Access Research article

Mining geriatric assessment data for in-patient fall prediction models and high-risk subgroups

Michael Marschollek1*, Mehmet Gövercin3, Stefan Rust1, Matthias Gietzelt2, Mareike Schulze1, Klaus-Hendrik Wolf2 and Elisabeth Steinhagen-Thiessen3

Author Affiliations

1 Peter L. Reichertz Institute for Medical Informatics, University of Braunschweig - Institute of Technology and Hanover Medical School, Carl-Neuberg-Str. 1, 30625 Hanover, Germany

2 Peter L. Reichertz Institute for Medical Informatics, University of Braunschweig - Institute of Technology and Hanover Medical School, Mühlenpfordtstr. 23, 38106 Braunschweig, Germany

3 Geriatrics Research Group, Department of Geriatric Medicine, Charité University Medicine, Reinickendorfer Str. 61, 13347 Berlin, Germany


BMC Medical Informatics and Decision Making 2012, 12:19  doi:10.1186/1472-6947-12-19

Published: 14 March 2012



Hospital in-patient falls constitute a prominent problem in terms of costs and consequences. Geriatric institutions are most often affected, and common screening tools cannot predict in-patient falls consistently. Our objectives are to derive comprehensible fall risk classification models from a large data set of geriatric in-patients' assessment data and to evaluate their predictive performance (aim 1), and to identify high-risk subgroups from the data (aim 2).


A data set of n = 5,176 single in-patient episodes covering 1.5 years of admissions to a geriatric hospital was extracted from the hospital's database and matched with fall incident reports (n = 493). A classification tree model was induced using the C4.5 algorithm, a logistic regression model was fitted as well, and the predictive performance of both models was evaluated. Furthermore, high-risk subgroups were identified from extracted classification rules with a support of more than 100 instances.
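To illustrate how a C4.5-style tree chooses a split, the following minimal sketch computes the gain ratio (information gain divided by split information) for one categorical attribute. It is not the authors' implementation: the function names and the toy data are hypothetical, and the sketch omits the handling of continuous attributes, missing values and pruning that a full C4.5 implementation includes.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(items)
    return -sum((c / total) * log2(c / total) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` by the categorical attribute `values`."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises attributes with many small partitions.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Toy, made-up data (NOT from the study): one flag per in-patient episode.
low_barthel = [True, True, True, False, False, False, False, False]
fell = [True, True, False, False, False, False, False, True]
toy_ratio = gain_ratio(low_barthel, fell)
```

At each node, the attribute with the best gain ratio becomes the split, and every root-to-leaf path can then be read as one classification rule whose support is the number of episodes reaching that leaf.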


The classification tree model showed an overall classification accuracy of 66%, with a sensitivity of 55.4%, a specificity of 67.1%, and positive and negative predictive values of 15% and 93.5%, respectively. Five high-risk groups were identified, defined by high age, low Barthel index, cognitive impairment, multi-medication and co-morbidity.
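The relation between these figures can be checked with a short calculation. The sketch below derives the predictive values and accuracy from sensitivity, specificity and prevalence. It assumes that each of the 493 incident reports corresponds to one faller episode among the 5,176 episodes, which the abstract does not state explicitly; the function name is illustrative.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Derive PPV, NPV and accuracy from sensitivity, specificity and prevalence."""
    tp = sensitivity * prevalence              # true positives (as a fraction of all episodes)
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    return {
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": tp + tn,
    }


# Assumption: 493 faller episodes out of 5,176 (roughly 9.5% prevalence).
prevalence = 493 / 5176
result = predictive_values(sensitivity=0.554, specificity=0.671, prevalence=prevalence)
```

Under that assumption the calculation gives a positive predictive value of about 15%, a negative predictive value of about 93.5% and an accuracy of about 66%, in line with the reported results. It also shows why the positive predictive value stays low: with fewer than one in ten episodes involving a fall, false positives outnumber true positives even at moderate specificity.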


Our results show that a little more than half of the fallers may be identified correctly by our model, but the positive predictive value is too low for the model to be applicable in practice. Non-fallers, on the other hand, can be ruled out quite well, as the negative predictive value of 93.5% shows. The high-risk subgroups and the risk factors identified (age, low ADL score, cognitive impairment, institutionalization, polypharmacy and co-morbidity) reflect domain knowledge and may be used to screen certain subgroups of patients with a high risk of falling. Classification models derived from a large data set using data mining methods can compete with current dedicated fall risk screening tools, yet lack diagnostic precision. High-risk subgroups may be identified automatically from existing geriatric assessment data, especially when combined with domain knowledge in a hybrid classification model. Further work is necessary to validate our approach in a controlled prospective setting.

Accidental falls; Geriatric assessment; Data mining