Diabetic retinopathy risk prediction for fundus examination using sparse learning: a cross-sectional study
1 Department of Medicine, Yonsei University College of Medicine, Seoul, South Korea
2 Department of Medical Engineering, Yonsei University College of Medicine, Seoul, South Korea
3 Department of Preventive Medicine & Institute of Health Services Research, Yonsei University, Seoul, South Korea
BMC Medical Informatics and Decision Making 2013, 13:106 doi:10.1186/1472-6947-13-106
Published: 13 September 2013
Blindness due to diabetic retinopathy (DR) is the major disability in diabetic patients. Although early management has been shown to prevent vision loss, diabetic patients have a low rate of routine ophthalmologic examination. Hence, we developed and validated sparse learning models to identify diabetic patients at risk of DR.
Health records from the Korea National Health and Nutrition Examination Survey (KNHANES) V-1 were used. The prediction models for DR were constructed using data from 327 diabetic patients and were validated internally on 163 patients in the KNHANES V-1. External validation was performed using 562 diabetic patients in the KNHANES V-2. The learning models, including ridge, elastic net, and LASSO, were compared with the traditional indicators of DR.
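To illustrate how ridge, elastic net, and LASSO differ, the following is a minimal sketch of penalized logistic regression fitted by proximal gradient descent. It is not the authors' implementation: the function names, penalty strengths, step size, and the synthetic data (two informative predictors among eight) are all assumptions made for illustration, and no KNHANES data are used. The sketch shows only that the L1 penalty drives some coefficients to exactly zero, which is what makes LASSO a variable selection method, whereas the ridge (L2) penalty shrinks coefficients without removing them.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within the threshold becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_penalized_logistic(X, y, l1=0.0, l2=0.0, lr=0.5, n_iter=200):
    """Logistic regression with an elastic-net style penalty.

    l1 > 0, l2 = 0  -> LASSO
    l1 = 0, l2 > 0  -> ridge
    l1 > 0, l2 > 0  -> elastic net
    The intercept b is not penalized.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for row, target in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - target
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * row[j] / n
        b -= lr * grad_b
        for j in range(p):
            # Gradient step on the smooth part (log-loss + L2 penalty) ...
            step = w[j] - lr * (grad_w[j] + l2 * w[j])
            # ... followed by the proximal step for the L1 penalty.
            w[j] = soft_threshold(step, lr * l1)
    return w, b


def make_synthetic(n=300, p=8, seed=0):
    """Hypothetical data: only the first two predictors affect the outcome."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [1 if rng.random() < sigmoid(1.5 * r[0] - 1.5 * r[1]) else 0 for r in X]
    return X, y


def count_zeros(coefs):
    return sum(1 for c in coefs if c == 0.0)


X, y = make_synthetic()
ridge_w, _ = fit_penalized_logistic(X, y, l2=0.1)
lasso_w, _ = fit_penalized_logistic(X, y, l1=0.08)
enet_w, _ = fit_penalized_logistic(X, y, l1=0.04, l2=0.05)

for name, coefs in [("ridge", ridge_w), ("elastic net", enet_w), ("LASSO", lasso_w)]:
    print(name, "zero coefficients:", count_zeros(coefs))
```

In practice the penalty strength would be tuned, for example by cross-validation or an information criterion, rather than fixed by hand as it is here.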
According to the Bayesian information criterion, LASSO predicted DR most efficiently. In both the internal and external validation, LASSO was significantly superior to the traditional indicators in terms of the area under the receiver operating characteristic curve (AUC). LASSO showed an AUC of 0.81 and an accuracy of 73.6% in the internal validation, and an AUC of 0.82 and an accuracy of 75.2% in the external validation.
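The three quantities reported above (AUC, accuracy, and the Bayesian information criterion) can be sketched as follows. This is an illustrative sketch only: the labels and scores are made up and are not study data, the 0.5 classification threshold is an assumption, and passing the number of nonzero coefficients as the parameter count for a LASSO model is a common convention that the abstract does not confirm.

```python
import math


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case outscores
    a random negative case (ties count as 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases classified correctly at the given threshold."""
    correct = sum(1 for l, s in zip(labels, scores) if (s >= threshold) == (l == 1))
    return correct / len(labels)


def bic(labels, probs, n_params):
    """BIC = k * ln(n) - 2 * log-likelihood for a binary outcome; lower is better."""
    eps = 1e-12  # guards against log(0)
    loglik = sum(
        math.log(max(p, eps)) if l == 1 else math.log(max(1.0 - p, eps))
        for l, p in zip(labels, probs)
    )
    return n_params * math.log(len(labels)) - 2.0 * loglik


# Made-up example: 3 cases with DR (label 1) and 5 without (label 0).
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]

print("AUC:", roc_auc(labels, scores))
print("accuracy:", accuracy(labels, scores))
print("BIC with 2 parameters:", bic(labels, scores, n_params=2))
```

The pairwise AUC computation is quadratic in the number of cases, which is fine for samples of a few hundred patients; a rank-based formula would be preferable for much larger data sets.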
The sparse learning model using LASSO was effective in analyzing the underlying epidemiological patterns of DR. This is the first study to develop a machine learning model to predict DR risk using health records. LASSO can be an excellent choice when both discriminative power and variable selection are important in the analysis of high-dimensional electronic health records.