Sage Journals: Discover world-class research

Abstract

Background

Diabetic retinopathy (DR) remains a leading cause of blindness among people with diabetes worldwide, making early and accurate diagnosis essential. Traditional screening relies heavily on manual ophthalmologic evaluation, whereas recent advances in machine learning (ML) and deep learning (DL) have opened new avenues for automated, scalable, and interpretable diagnostic tools. However, it remains difficult to develop models that are both high-performing and transparent enough to gain clinical trust.

Objective

This study introduces a novel, standardized, and interpretable ML framework designed specifically to enhance diagnostic efficiency and accuracy for DR risk prediction. By prioritizing model interpretability alongside predictive performance, our approach aims to bridge the gap between cutting-edge AI technology and clinical applicability.

Methods

We evaluated eleven ML algorithms, tuning hyperparameters by grid search with five-fold cross-validation to identify the top-performing models. A key innovation is our dynamic weighted voting ensemble (Voting_soft), which combines the outputs of multiple classifiers weighted by model confidence, thereby leveraging the strengths of diverse algorithms. Model performance was assessed using accuracy, sensitivity, and area under the curve (AUC), and receiver operating characteristic (ROC) and precision-recall (PR) curves were used to compare performance across different training-set proportions. We employed SHAP (SHapley Additive exPlanations) for interpretability analysis, giving clinicians actionable insight into how each feature contributes to a prediction.
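The confidence-based weighting described above can be illustrated with a minimal sketch. The abstract does not state the exact weighting rule, so the rule used here (each model's weight is its distance from the 0.5 decision boundary) is an assumption made for illustration, and the function names `confidence_weighted_vote` and `classify` are hypothetical rather than taken from the study.

```python
from typing import List, Sequence


def confidence_weighted_vote(probs: Sequence[float]) -> float:
    """Combine per-model positive-class probabilities for one sample.

    ASSUMED rule (not from the paper): a model's weight is its confidence,
    measured as twice its distance from the 0.5 decision boundary.
    """
    if not probs:
        raise ValueError("need at least one model probability")
    weights = [abs(p - 0.5) * 2.0 for p in probs]
    total = sum(weights)
    if total == 0.0:
        # Every model is maximally uncertain: fall back to a plain average.
        return sum(probs) / len(probs)
    return sum(w * p for w, p in zip(weights, probs)) / total


def classify(prob_rows: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Apply the weighted vote to each sample (one row of model outputs each)."""
    return [int(confidence_weighted_vote(row) >= threshold) for row in prob_rows]


if __name__ == "__main__":
    # Three hypothetical models scoring one sample.
    print(confidence_weighted_vote([0.9, 0.6, 0.4]))
```

Under this assumed rule, model outputs of 0.9, 0.6, and 0.4 combine to roughly 0.77, compared with a plain mean of about 0.63, because the most confident model dominates. Because the weights are recomputed for every sample, the weighting is "dynamic" in the sense that no model has a fixed global weight.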

Results

Using LightGBM-based correlation analysis and AUC curves, we identified fourteen clinical features as the optimal set of predictors. The CatBoost model achieved superior performance on the 20% test set, while the Extremely Randomized Trees model demonstrated robustness on the 30% test set. Our dynamic weighted voting ensemble (Voting_soft) achieved a higher AUC than the individual models on both test sets. SHAP analysis identified age, triglycerides, sex, and high-density lipoprotein cholesterol (HDL-C) as key predictors of DR prevalence, offering clinically meaningful explanations for model decisions.
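To make the idea behind SHAP feature attributions concrete, the following sketch computes exact Shapley values by brute force for a single sample. It is an illustration only: the SHAP library uses much faster algorithms for tree models and averages over background data, whereas this sketch replaces absent features with one fixed baseline input. The model `toy_risk` and its coefficients are invented for the example and are not results from the study.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of each feature of x relative to a baseline input.

    Features outside a coalition take their baseline value (a simplification
    of SHAP, which averages over a background dataset). Cost grows as 2^n.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_risk(f: Sequence[float]) -> float:
    """Hypothetical risk score with one interaction term (toy coefficients)."""
    return 0.2 * f[0] + 0.5 * f[1] + 0.3 * f[0] * f[2]


if __name__ == "__main__":
    print(shapley_values(toy_risk, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]))
```

The attributions sum to the difference between the model output for the sample and for the baseline, and the interaction term's contribution is split equally between the two features involved. These are the properties that make Shapley-based explanations additive and comparable across features.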

Conclusions

This study presents a groundbreaking ML-based DR risk prediction system that excels in both accuracy and interpretability. The integration of SHAP analysis not only enhances model transparency but also empowers clinicians with a deeper understanding of diagnostic decision-making, ultimately improving the precision and efficiency of DR screening. Our dynamic voting ensemble approach sets a new benchmark for interpretable, multi-model integration in medical diagnostics.

Keywords

machine learning, SHAP, diabetic retinopathy, optimization, prediction



Interpretable machine learning algorithms for diagnostic prediction of diabetic retinopathy
