Abstract
Background
Diabetic retinopathy (DR) remains a leading cause of blindness among people with diabetes worldwide, making early and accurate diagnosis essential. Traditional screening relies heavily on manual ophthalmologic evaluation, whereas recent advances in machine learning (ML) and deep learning (DL) have opened new avenues for automated, scalable, and interpretable diagnostic tools. However, it remains difficult to develop models that are both high-performing and transparent enough to gain clinical trust.
Objective
This study introduces a novel, standardized, and interpretable ML framework designed specifically to enhance diagnostic efficiency and accuracy for DR risk prediction. By prioritizing model interpretability alongside predictive performance, our approach aims to bridge the gap between cutting-edge AI technology and clinical applicability.
Methods
We evaluated eleven ML algorithms, tuning hyperparameters via grid search with five-fold cross-validation to identify the top-performing models. A key innovation is our dynamic weighted voting ensemble (Voting_soft), which combines the predictions of multiple classifiers weighted by model confidence, thereby leveraging the strengths of diverse algorithms. Model performance was assessed using accuracy, sensitivity, and area under the receiver operating characteristic curve (AUC), and receiver operating characteristic (ROC) and precision-recall (PR) curves were used to compare performance across different training/test split proportions. We also applied SHAP (SHapley Additive exPlanations) for interpretability analysis, giving clinicians insight into how each feature contributes to a prediction.
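The abstract does not specify how Voting_soft derives its confidence-based weights. As a minimal sketch of weighted soft voting, the example below assumes that each model's weight for a given sample is its own top-class probability; this weighting rule, the function names `soft_vote` and `confidence_weights`, and the probability values are all illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence


def soft_vote(probas: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted average of per-model class-probability vectors for one sample."""
    if len(probas) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(probas[0])
    return [
        sum(w * p[c] for p, w in zip(probas, weights)) / total
        for c in range(n_classes)
    ]


def confidence_weights(probas: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical 'dynamic' rule: weight each model by its top-class probability."""
    return [max(p) for p in probas]


# Made-up outputs of three classifiers for one patient: [P(no DR), P(DR)].
probas = [[0.90, 0.10], [0.40, 0.60], [0.45, 0.55]]

equal = soft_vote(probas, [1.0, 1.0, 1.0])               # plain soft voting
dynamic = soft_vote(probas, confidence_weights(probas))  # confidence-weighted

print("equal weights:     ", [round(v, 3) for v in equal])
print("confidence weights:", [round(v, 3) for v in dynamic])
```

With these made-up inputs, the confident first model pulls the ensemble further toward "no DR" under confidence weighting than under equal weighting. For fixed (non-dynamic) weights, scikit-learn's `VotingClassifier(voting="soft", weights=...)` performs the same kind of weighted probability averaging.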
Results
Using LightGBM-based correlation analysis together with AUC analysis, we identified fourteen clinical features as the optimal predictor set. Among individual models, CatBoost achieved superior performance on a 20% test set, while the Extremely Randomized Trees (Extra Trees) model demonstrated robustness on a 30% test set. Our dynamic weighted voting ensemble (Voting_soft) outperformed the individual models in terms of AUC on both test sets. SHAP analysis identified age, triglycerides, sex, and high-density lipoprotein cholesterol (HDL-C) as key predictors of DR prevalence, offering clinically meaningful explanations for model decisions.
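To illustrate what a SHAP-style attribution means, the sketch below computes exact Shapley values for a toy linear risk score by enumerating all feature subsets. It is not the SHAP library and not the study's model: the scoring function `toy_risk_score`, its coefficients, the patient values, and the single reference baseline are hypothetical, and the SHAP package uses faster model-specific algorithms and a background dataset rather than brute-force enumeration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of f at x relative to a single baseline point.

    Features outside a coalition are set to their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def toy_risk_score(v: Sequence[float]) -> float:
    """Hypothetical linear score over [age, triglycerides, HDL-C]; coefficients are made up."""
    age, tg, hdl = v
    return 0.03 * age + 0.5 * tg - 0.8 * hdl


patient = [60.0, 2.0, 1.0]    # made-up patient values
reference = [50.0, 1.5, 1.3]  # made-up baseline values

phi = shapley_values(toy_risk_score, patient, reference)
for name, value in zip(["age", "triglycerides", "HDL-C"], phi):
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the patient's score and the baseline score, which is the additivity property that lets SHAP decompose a single prediction into per-feature contributions.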
Conclusions
This study presents an ML-based DR risk prediction system that combines predictive accuracy with interpretability. The integration of SHAP analysis enhances model transparency and gives clinicians a clearer understanding of how the model reaches its predictions, with the aim of improving the precision and efficiency of DR screening. Our dynamic voting ensemble provides an approach to interpretable, multi-model integration in medical diagnostics.
