Abstract
The glycemia risk index (GRI) is an emerging metric designed to quantify the risk of both hypo- and hyperglycemia, providing a combined assessment of glycemic control quality. A high GRI is associated with an increased risk of diabetic complications. In this study, we leverage long-term continuous glucose monitoring (CGM) data to develop and validate predictive models for a high GRI (>60) in individuals with type 1 diabetes (T1D). We assessed over 250 000 days of measurements collected over four years from 736 patients with T1D. Our modeling approach shows promise for predicting glycemic control quality (area under the receiver operating characteristic curve [ROC-AUC] of 0.87) six to nine months from baseline. However, additional analysis and validation are imperative to determine its full clinical utility.
Introduction
Type 1 diabetes (T1D) is a chronic autoimmune condition characterized by the destruction of insulin-producing beta cells in the pancreas, leading to lifelong dependency on exogenous insulin therapy. Despite advances in diabetes management, maintaining optimal glycemic control remains a significant challenge for individuals with T1D, often resulting in periods of hyperglycemia (high blood glucose levels) and hypoglycemia (low blood glucose levels).1,2 Continuous glucose monitoring (CGM) technology has revolutionized diabetes management by providing real-time glucose readings, enabling more precise insulin dosing and timely interventions to prevent extreme glycemic events. The CGM also serves as a useful tool for assessing and quantifying the quality of glycemic control. 2
The glycemia risk index (GRI) 3 is an emerging metric designed to quantify the risk of both hypo- and hyperglycemia, offering a combined assessment of glycemic control quality. A high GRI is associated with increased risks of diabetic complications, including cardiovascular disease, nephropathy, and retinopathy, as well as reduced quality of life, increased diabetes-related stress, and lower satisfaction with treatment.3-6 Accurate prediction of glycemic control quality is important for developing personalized treatment strategies, minimizing complications, and improving the quality of life for individuals with T1D. 7
To date, most predictive initiatives in this domain have focused on short-term prediction horizons, typically within minutes or hours.8-11 While these models provide valuable insights for immediate glucose management, they fall short of addressing longer-term glycemic patterns. A limited number of studies have explored mid-range prediction horizons, such as week-to-week forecasts, which are crucial for understanding and managing more extended glycemic trends.12-14 A longer prediction horizon could be particularly beneficial in targeting patients at risk, allowing for more timely interventions and more effective long-term management strategies.
In this study, we leverage long-term CGM data to develop and validate predictive models for high GRI in individuals with T1D. By utilizing advanced machine-learning algorithms, we aim to identify key patterns and features within the CGM data that indicate elevated glycemic risk.
Methods
Data Material
We analyzed data from the previously published T1DiabetesGranada study, 15 which encompasses over 250 000 days of measurements collected over four years from 736 patients with T1D in Granada, Spain. The primary flash glucose monitoring (FGM) device used during the study was the FreeStyle Libre 2, with some initial use of the first version, FreeStyle Libre 1.
The cohort’s characteristics included a mean age of 40.3 ± 15.8 years, ranging from 12 to 81 years, with 373 female patients (50.68%) and 363 male patients (49.32%). On average, patients had 350.2 ± 284.2 days of glucose measurements.
Patients were included in our analysis if FGM data were available for at least 14 days of monitoring with ≥70% wear time 16 at baseline (0-30 days) and six to nine months past the baseline end.
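The wear-time criterion can be illustrated with a minimal sketch. It assumes one stored reading every 15 minutes (96 expected readings per day) and interprets the criterion as "at least 14 distinct monitored days, with at least 70% of the expected readings on those days". Both the sampling interval and this interpretation are assumptions, and the function name is hypothetical; the study's actual implementation is not described in the text.

```python
from datetime import datetime, timedelta

READINGS_PER_DAY = 96  # assumption: one stored reading every 15 minutes


def meets_wear_criterion(timestamps, min_days=14, min_wear=0.70):
    """Check a monitoring window against a days-and-wear-time rule.

    timestamps: list of datetime objects, one per glucose reading.
    Returns True if readings cover at least `min_days` distinct calendar
    days and the share of expected readings on those days is >= `min_wear`.
    """
    days = {ts.date() for ts in timestamps}
    if len(days) < min_days:
        return False
    expected = len(days) * READINGS_PER_DAY
    return len(timestamps) / expected >= min_wear


# Usage with synthetic timestamps: 14 full days of 15-minute readings.
start = datetime(2020, 1, 1)
full_window = [start + timedelta(minutes=15 * i) for i in range(14 * 96)]
print(meets_wear_criterion(full_window))
```

In the study, a check of this kind would have to hold for both the baseline window (0-30 days) and the six- to nine-month follow-up window.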
Approach
We developed binary classification models to predict a high GRI (>60, zone D-E) 17,18 or low GRI (≤60) within a six- to nine-month period from baseline using predictors extracted from the FGM data at baseline. The threshold of GRI >60 was chosen based on prior studies that have identified GRI >60 as indicative of increased glycemic instability and a higher risk of adverse glycemic outcomes. 3 XGBoost was selected for its nonlinear modeling capabilities and its previously demonstrated high performance in the medical domain. 19 XGBoost is an ensemble learning method that combines multiple decision trees to produce a prediction. In our case, the output is the probability of having a high GRI at the six- to nine-month follow-up.
A 5-fold cross-validation approach was employed to estimate the performance of the modeling approach. The data set was divided into five subsets (folds). In each iteration, four folds were used to train the model, and the remaining fold was held out for testing. This process was repeated five times, such that each fold was used as the test set exactly once. This approach keeps training and testing data completely separate within each iteration, which reduces the risk of an overly optimistic performance estimate and provides a more reliable estimate of the model’s performance on unseen data. It also maximizes the use of available data for modeling while maintaining strict evaluation standards. 20 The XGBoost hyperparameters (learning rate, tree depth, and number of estimators) were tuned using a nested cross-validation on the training set within each iteration of the outer cross-validation.
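The nested structure described above can be sketched with plain index bookkeeping. This is an illustration only: the number of inner folds (five here), the shuffling seed, and the absence of stratification are assumptions, and the function names are hypothetical. Model fitting and hyperparameter search are deliberately left out.

```python
import random


def k_fold_indices(indices, k, seed=0):
    """Shuffle the given indices and split them into k near-equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=5, inner_k=5, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for each outer fold.

    inner_splits is a list of (inner_train, inner_val) pairs drawn only
    from outer_train, so the outer test fold never influences tuning.
    """
    outer_folds = k_fold_indices(range(n_samples), outer_k, seed)
    for i, outer_test in enumerate(outer_folds):
        outer_train = [idx for j, fold in enumerate(outer_folds)
                       if j != i for idx in fold]
        inner_folds = k_fold_indices(outer_train, inner_k, seed + i + 1)
        inner_splits = []
        for m, inner_val in enumerate(inner_folds):
            inner_train = [idx for n, fold in enumerate(inner_folds)
                           if n != m for idx in fold]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_test, inner_splits


# Usage: hyperparameters would be chosen on inner_splits, the model refit
# on outer_train, and performance measured once on outer_test.
for outer_train, outer_test, inner_splits in nested_cv_splits(434):
    print(len(outer_train), len(outer_test), len(inner_splits))
```

With scikit-learn, the same structure is commonly obtained by wrapping a `GridSearchCV` estimator in `cross_val_score`; whether the study used that exact combination is not stated.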
To assess the clinical usefulness of our model, we used a net benefit approach, 21 which balances the advantages of correctly identifying at-risk individuals (true positives) against the drawbacks of incorrectly labeling individuals as high risk (false positives) at different threshold probabilities.
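As an illustration of the net benefit idea, the sketch below uses the standard decision-curve formulation, net benefit = TP/N - (FP/N) x pt/(1 - pt), where pt is the threshold probability. The function names and the toy data are hypothetical, and the convention that a probability equal to the threshold counts as positive is an assumption.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of flagging patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    weight = threshold / (1 - threshold)  # harm of a false positive relative to a true positive
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of flagging every patient regardless of predicted risk."""
    n = len(y_true)
    positives = sum(y_true)
    return positives / n - ((n - positives) / n) * threshold / (1 - threshold)


# Toy example (hypothetical labels and probabilities, not study data).
y_true = [1, 1, 0, 0]
y_prob = [0.9, 0.4, 0.6, 0.1]
print(net_benefit(y_true, y_prob, 0.2), net_benefit_treat_all(y_true, 0.2))
```

The "treat none" strategy has a net benefit of zero at every threshold, so a model is useful at a given threshold only if its curve lies above both zero and the "treat all" curve.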
Predictors
Several metrics that have been linked to abnormal glucose control were calculated from the baseline FGM data. The predictors included conventional statistical metrics (mean, median, standard deviation, coefficient of variation, interquartile range), GRI metrics 3 (GRI, GRI hyper component, GRI hypo component), and metrics related to time in ranges16,22 (time in range [TIR], between 70 and 180 mg/dL; time below range [TBR1], between 54 and 69 mg/dL; time below range [TBR2], below 54 mg/dL; time above range [TAR1], between 181 and 250 mg/dL; time above range [TAR2], exceeding 250 mg/dL; and time in tight range [TITR]). 23 Furthermore, age and gender at baseline were included as predictors.
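A minimal sketch of how the range-based predictors and the GRI could be derived from a list of glucose readings is given below. The GRI weighting follows the published definition of the index (reference 3), which is not restated in this text: hypo component = TBR2 + 0.8 x TBR1, hyper component = TAR2 + 0.5 x TAR1, and GRI = 3.0 x hypo + 1.6 x hyper, capped at 100. The TITR bounds of 70 to 140 mg/dL are an assumption, since the text does not give them, and the function names are hypothetical.

```python
def time_in_ranges(glucose_mg_dl):
    """Percentage of readings in each glycemic range (values in mg/dL)."""
    n = len(glucose_mg_dl)

    def pct(condition):
        return 100.0 * sum(1 for g in glucose_mg_dl if condition(g)) / n

    return {
        "TBR2": pct(lambda g: g < 54),
        "TBR1": pct(lambda g: 54 <= g < 70),
        "TIR": pct(lambda g: 70 <= g <= 180),
        "TITR": pct(lambda g: 70 <= g <= 140),  # assumption: 70-140 mg/dL
        "TAR1": pct(lambda g: 180 < g <= 250),
        "TAR2": pct(lambda g: g > 250),
    }


def glycemia_risk_index(ranges):
    """Return (GRI, hypo component, hyper component) from range percentages."""
    hypo = ranges["TBR2"] + 0.8 * ranges["TBR1"]
    hyper = ranges["TAR2"] + 0.5 * ranges["TAR1"]
    gri = min(100.0, 3.0 * hypo + 1.6 * hyper)
    return gri, hypo, hyper


# Toy example: 100 synthetic readings (hypothetical, not study data).
readings = [50] * 2 + [60] * 3 + [120] * 70 + [200] * 15 + [300] * 10
ranges = time_in_ranges(readings)
gri, hypo, hyper = glycemia_risk_index(ranges)
print(ranges, gri)
```

Under this definition, a patient would fall in the high-GRI class of the study when the resulting index exceeds 60.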
Assessment and Explainability
Assessment of model performance involved the use of the area under the receiver operating characteristic curve (ROC-AUC) across folds, the distribution of predicted values, a calibration plot including the Brier score, 24 and a net benefit plot. 21 To enhance model interpretability, feature importance and explainability were evaluated through SHAP (SHapley Additive exPlanations) values. 25
All analyses were performed using MATLAB (R2021b) and Python (v3), with Scikit-learn (v1.5.0) for machine-learning utilities, SHAP (v0.42.1), and XGBoost (v1.7.5).
Results
The analysis included 434 patients with T1D who had sufficient FGM measurements at baseline and at a six to nine-month follow-up. At follow-up, 138 patients exhibited high GRI scores (>60).
Figure 1 presents the assessment characteristics of the models for predicting patients with high GRI scores. The models achieved an average ROC-AUC of 0.87 (SD = 0.05) across the cross-validation folds, indicating good discrimination: the approach can effectively distinguish between patients with high and low GRI scores. In other words, it can identify a high proportion of the patients who have a high GRI at follow-up (true positive rate) while falsely flagging only a small proportion of the patients who have a low GRI at follow-up (false positive rate).
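The discrimination interpretation of the ROC-AUC can be made concrete with a small sketch: the AUC equals the probability that a randomly chosen high-GRI patient receives a higher predicted score than a randomly chosen low-GRI patient, with ties counted as one half. The function name and toy data are hypothetical, and the pairwise loop is chosen for clarity rather than efficiency.

```python
def roc_auc(y_true, y_score):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    # A pair counts 1 if the positive scores higher, 0.5 if the scores tie.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical labels and scores, not study data).
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

In a cross-validated setting such as the one described, this value would be computed on each held-out fold and then averaged.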

Figure 1. (a) ROC curves for each cross-validation fold and their average, (b) normalized distribution of predictions for patients with low or high GRI (>60), (c) calibration plot and Brier score for the prediction model, (d) net benefit curves for the model, treat all, and treat none, (e) violin plot of GRI in the predicted groups from baseline to 21 months, and (f) violin plot of TAR2 in the predicted groups from baseline to 21 months.
The Brier score of 0.13 reflects the accuracy of the model’s predicted probabilities. It is the mean squared difference between the predicted probabilities and the actual outcomes, so it ranges from 0 (perfect predictions) to 1 (the worst possible predictions), with lower values indicating more accurate and better-calibrated probabilities. In addition, the net benefit plot compares the clinical utility of the model with two strategies: treating all patients and treating none. The net benefit quantifies the trade-off between the true positives captured and the false positives incurred at different threshold probabilities. A higher net benefit line for the model demonstrates its utility in identifying at-risk patients while minimizing unnecessary interventions. Furthermore, a suggested cutoff probability of 0.5 identified a group of patients at risk who exhibited persistently higher GRI values from baseline through the intervals of six to nine months, 12 to 15 months, and 18 to 21 months in the future, as observed in Figure 1e. The proposed cutoff demonstrates good classification performance, achieving a true positive rate of 0.70, a false positive rate of 0.10, a positive predictive value of 0.76, a negative predictive value of 0.86, and an overall accuracy of 0.83. A total of 11% had an initial baseline GRI < 60.
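The Brier score and the cutoff-based metrics reported above follow directly from predicted probabilities and observed outcomes. The sketch below shows the definitions on toy data. The labels and probabilities are hypothetical (not study data), the function names are illustrative, and treating a probability equal to the cutoff as a positive prediction is an assumption.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def threshold_metrics(y_true, y_prob, cutoff=0.5):
    """Confusion-matrix metrics when predicting 'high GRI' if prob >= cutoff."""
    pairs = list(zip(y_true, y_prob))
    tp = sum(1 for y, p in pairs if p >= cutoff and y == 1)
    fp = sum(1 for y, p in pairs if p >= cutoff and y == 0)
    fn = sum(1 for y, p in pairs if p < cutoff and y == 1)
    tn = sum(1 for y, p in pairs if p < cutoff and y == 0)
    return {
        "tpr": tp / (tp + fn),       # true positive rate (sensitivity)
        "fpr": fp / (fp + tn),       # false positive rate
        "ppv": tp / (tp + fp),       # positive predictive value
        "npv": tn / (tn + fn),       # negative predictive value
        "accuracy": (tp + tn) / len(pairs),
    }


# Toy example: 3 high-GRI and 5 low-GRI patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4, 0.2]
print(brier_score(y_true, y_prob))
print(threshold_metrics(y_true, y_prob, cutoff=0.5))
```

Moving the cutoff trades the true positive rate against the false positive rate, which is the same trade-off the net benefit curve summarizes across thresholds.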
For the explainability analysis, the mean SHAP value of GRI at baseline accounted for 49% of the model’s total impact, while the next most important predictors, standard deviation, TIR, TITR, and interquartile range (IQR), contributed 12%, 7%, 6%, and 6%, respectively, to the total impact. This indicates that additional features increase the models’ predictive capabilities beyond the baseline GRI score.
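One common way to express feature impact as percentages is to take the mean absolute SHAP value per feature and normalize by the sum over all features. The sketch below assumes that the SHAP values have already been computed elsewhere (for example, with the SHAP package) and are available as a list of per-patient rows. Whether the study used exactly this normalization is an assumption, and the function name and toy values are hypothetical.

```python
def shap_impact_share(shap_values, feature_names):
    """Percentage share of total mean |SHAP| attributed to each feature.

    shap_values: list of rows, one per patient, one value per feature.
    """
    n = len(shap_values)
    mean_abs = [
        sum(abs(row[j]) for row in shap_values) / n
        for j in range(len(feature_names))
    ]
    total = sum(mean_abs)
    return {name: 100.0 * m / total for name, m in zip(feature_names, mean_abs)}


# Toy example with two features and two patients (hypothetical values).
toy_shap = [[0.4, -0.1], [-0.6, 0.1]]
shares = shap_impact_share(toy_shap, ["GRI", "SD"])
print(shares)
```

Taking absolute values before averaging matters: positive and negative contributions would otherwise cancel and understate a feature's influence.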
Conclusions
In this investigation, we developed and internally validated a machine-learning approach to predict high GRI scores at six to nine months from baseline within a cohort of individuals with T1D. This underscores the potential for identifying patients at risk of sustained poor glucose control over an extended prediction horizon. While managing patients with a high baseline GRI is important, a six- to nine-month predictive model offers added clinical utility by identifying at-risk patients earlier, enabling more tailored and timely interventions. The model should not replace baseline management strategies but augment them with a longer-term view. A six- to nine-month predictive window could enable health care providers to identify patients who currently have a GRI ≤60 but are at risk of exceeding this threshold in the future. This allows earlier intervention, such as lifestyle modifications, medication adjustments, or increased monitoring, potentially preventing a decline in health. In addition, it could support better planning and resource allocation, as high-risk patients can be flagged earlier for targeted care.
Within our study, the group predicted to be at high risk of high GRI scores at six to nine months continued to exhibit elevated GRI scores 21 months from baseline, despite a regression toward the predicted low-risk group.
The approach demonstrated clear differentiation between the two classes, with a relatively low Brier score suggesting the potential clinical utility of the model’s output probabilities for assessing long-term risk of poor glucose control in individual patients. In addition, a net benefit effect was observed when compared to simplistic strategies such as treating all patients as at-risk or none at all.
Baseline GRI score emerged as the most significant predictor for future poor glucose control, and the inclusion of additional features in the combined model improved its predictive capability substantially. To our knowledge, this is the first study to explore the identification of individuals at risk of poor glucose control quality using the GRI metric. However, our findings align with similar studies, such as those reported by Hilliard et al, 26 who employed latent group-based trajectory modeling to identify subgroups exhibiting sustained elevated HbA1c levels over 18 to 24 months in 150 T1D patients.
Despite the valuable insights provided by our study, it is important to acknowledge its limitations. The relatively small number of events within the analyzed cohort presents a challenge, affecting the robustness of our findings. Therefore, caution should be exercised when extrapolating these results to a broader population. Further validation is necessary, highlighting the importance of replicating our model’s performance across diverse data sets and populations of individuals with T1D. Also, a key limitation of this study is that all CGM data were derived exclusively from the FreeStyle Libre system. While this approach ensures consistency in glucose measurement and GRI calculation, it may limit the generalizability of our findings to individuals using other CGM devices, such as Dexcom or Medtronic systems. Future studies should explore whether our predictive modeling approach remains robust across different CGM technologies. Another limitation of our study is that, beyond age and gender, we did not include demographic or socioeconomic factors as predictors in our model. While these variables can influence diabetes management and glycemic outcomes, our primary focus was on CGM-derived metrics, which provide direct, continuous physiological insights into glycemic variability. In addition, demographic and socioeconomic data were not comprehensively available in our data set, which may have introduced bias if included selectively. Future research should explore the integration of these factors to determine whether combining physiological and social determinants of health can enhance long-term risk prediction and improve personalized diabetes management strategies.
In conclusion, while our approach shows promise for predicting glycemic control quality, additional analysis and validation are imperative to determine its full clinical utility.
Footnotes
Abbreviations
CGM, continuous glucose monitoring; FGM, flash glucose monitoring; GRI, glycemia risk index; ROC-AUC, area under the receiver operating characteristic curve; SHAP, SHapley Additive exPlanations; T1D, type 1 diabetes; TAR, time above range; TBR, time below range; TIR, time in range; TITR, time in tight range.
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
Ethical Approval
Ethical approval and informed consent were obtained. The T1DiabetesGranada study was reviewed and approved by the Ethics Committee of Biomedical Research of the Province of Granada (CEIm/CEI GRANADA), protocol code K134665CRL, ethics portal code 0698-N-21.
