Open Access Research article

Multivariate modeling to identify patterns in clinical data: the example of chest pain

Oliver Hirsch1*, Stefan Bösner1, Eyke Hüllermeier2, Robin Senge2, Krzysztof Dembczynski2 and Norbert Donner-Banzhoff1

Author Affiliations

1 Department of General Practice/Family Medicine, Philipps University Marburg, Germany

2 Department of Mathematics and Computer Science, Knowledge Engineering & Bioinformatics, Philipps University Marburg, Germany


BMC Medical Research Methodology 2011, 11:155  doi:10.1186/1471-2288-11-155

Published: 22 November 2011



When a patient presents with chest pain, physicians are confronted with numerous interrelationships between symptoms and with evidence for or against assigning the patient to different diagnostic categories. The aim of our study was to find natural groups of patients on the basis of risk factors, history and clinical examination data, and then to validate these groups against patients' final diagnoses.


We conducted a cross-sectional diagnostic study in 74 primary care practices to establish the validity of symptoms and findings for the diagnosis of coronary heart disease. A total of 1199 patients above age 35 presenting with chest pain were included in the study. General practitioners took a standardized history and performed a physical examination. They also recorded their preliminary diagnoses, investigations and management related to the patient's chest pain. We used multiple correspondence analysis (MCA) to examine associations at the variable level, and multidimensional scaling (MDS), k-means and fuzzy cluster analyses to search for subgroups at the patient level. We further used heatmaps to illustrate the results graphically.
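To make the patient-level clustering step concrete, the following is a minimal sketch of k-means on binary symptom vectors. It is not the authors' implementation: the toy data, the feature names in the comments and the deterministic farthest-point initialization are assumptions chosen only for illustration, and the toy data are deliberately well separated, unlike the study data.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic start (illustrative assumption): take the first point,
    then repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain k-means: alternate assignment and centroid update until stable."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each patient goes to the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the centroids are stable too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy data, one row per patient, 1 = finding present.
# Columns (invented for this sketch): pain on palpation, localized muscle
# tension, cough, fever.
patients = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

labels, centroids = kmeans(patients, k=2)
print(labels)
```

On data like these, where two symptom patterns never overlap, the algorithm separates the first three patients from the last three. Real clinical data rarely show such clean structure, which is why the quality of a cluster solution has to be checked separately.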


A multiple correspondence analysis supported our data collection strategy at the variable level. Six factors emerged from this analysis: "chest wall syndrome", "vital threat", "stomach and bowel pain", "angina pectoris", "chest infection syndrome", and "self-limiting chest pain". MDS, k-means and fuzzy cluster analysis at the patient level were not able to find distinct groups. The resulting cluster solutions were not interpretable and did not meet statistical quality criteria.
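The abstract does not state which quality criteria were applied. One widely used criterion for judging a cluster solution is the mean silhouette width, which compares each patient's average distance to its own cluster with its average distance to the nearest other cluster. The sketch below computes it with a Hamming distance on hypothetical binary data; the data, the two labelings and the choice of distance are assumptions for illustration only.

```python
def hamming(a, b):
    """Number of positions at which two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_silhouette(points, labels, dist=hamming):
    """Mean silhouette width over all points (ranges from -1 to 1).
    Values near 1 suggest compact, well-separated clusters; values near 0
    or below suggest that the partition does not reflect real structure."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data, one row per patient, 1 = finding present.
patients = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

structured = mean_silhouette(patients, [0, 0, 0, 1, 1, 1])  # follows the pattern
arbitrary = mean_silhouette(patients, [0, 1, 0, 1, 0, 1])   # ignores the pattern
print(round(structured, 3), round(arbitrary, 3))
```

A partition that follows the symptom pattern scores clearly higher than an arbitrary one. A data set in which every candidate partition scores low would, by this criterion, contain no distinct patient groups.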


Chest pain is a heterogeneous clinical category with no coherent associations between signs and symptoms at the patient level.