Abstract
Background:
Hypoglycemia is common in patients with insulin-treated type 2 diabetes (T2D) and can lead to decreased quality of life or premature death. Deep learning models promise accurate predictions, but data scarcity poses a challenge. This study aimed to develop a deep learning model utilizing transfer learning to predict hypoglycemia.
Methods:
Continuous glucose monitoring (CGM) data from 226 patients with type 1 diabetes (T1D) and 180 patients with T2D were utilized. Data were structured into one-hour samples and labeled as hypoglycemia or not depending on whether three consecutive CGM values were below 3.9 [mmol/L] (70 mg/dL) one hour after the sample. A convolutional neural network (CNN) was pre-trained with the T1D data set and subsequently fitted using a T2D data set, all while being optimized toward maximizing the area under the receiver operating characteristics curve (AUC) value, and it was externally validated on a separate T2D data set.
Results:
The developed model was externally validated with 334 711 one-hour CGM samples, of which 15 695 (4.69%) were labeled as hypoglycemic. The model achieved an AUC of 0.941 and a positive predictive value of 40.49% at a specificity of 95% and a sensitivity of 69.16%.
Conclusions:
The transfer learned CNN model showed promising performance in predicting hypoglycemic episodes, with slightly better results than a non-transfer learned CNN model.
Introduction
Hypoglycemia describes a state of low blood glucose and is defined by a glucose level of <3.9 [mmol/L] (70 mg/dL). 1 This state is an acute complication of insulin treatment, observed in late-stage patients with type 2 diabetes (T2D). Recurrent hypoglycemic episodes increase the risk of severe hypoglycemia, which is associated with premature death.2-4 Furthermore, hypoglycemic episodes are associated with anxiety related to the treatment of diabetes, thereby leading to a diminished quality of life. In some cases, this anxiety may be linked to reduced or omitted insulin dosages to avoid hypoglycemia. This could increase the risk of developing late-stage complications. 5
Patients treated with insulin have been shown to benefit from continuous glucose monitoring (CGM) devices for glucose management. 6 Among the reported benefits in patients with type 1 diabetes (T1D) is reduced time spent in hypoglycemia.7,8 However, achieving the guideline-recommended limit for time spent in hypoglycemia still proves difficult for many CGM users. 8 Furthermore, reports of reduced rates of hypoglycemia in T2D patients are inconsistent, 9 although these patients have shown benefits in glucose control similar to those observed in T1D patients.9,10
Prediction models may help reduce both the occurrence of hypoglycemia and the time spent in a hypoglycemic state. Warning patients of imminent hypoglycemia would provide an opportunity to take preventive actions and could thereby reduce the risk of hypoglycemia. Such prediction models already exist; however, their effectiveness varies considerably depending on the methods employed and their ability to accurately predict hypoglycemia.11-17
Deep learning has emerged as a promising approach for developing accurate and effective predictive models in various fields. In a study by Li et al, 11 the potential of this type of prediction model is demonstrated for blood glucose prediction. Deep learning refers to a subtype of machine learning algorithm that involves training artificial neural networks to make predictions and classifications. However, a major challenge in deep learning is the requirement for large amounts of data to train models. 14 This challenge is especially pertinent when considering CGM data for T2D, where the scarcity of publicly available data can hinder the development of models. This limitation may reflect a disparity in the perceived significance of CGM devices within management strategies. While CGM devices are gaining prominence for managing efficacy and safety in treatment for individuals with T1D, their relevance is acknowledged only for a selected group of individuals with T2D. 1
A potential solution to address this challenge is to apply transfer learning. This solution is based on the observation that the prevalence of severe hypoglycemia in adults with T2D who have been using insulin for more than five years is very similar to that in adults with T1D. 18 Thus, one could assume that CGM features are transferable between these populations. By leveraging transfer learning, a model can be pre-trained using readily available T1D CGM data to detect general features of hypoglycemia and subsequently fitted using the more limited T2D CGM data. 19 This approach could help overcome the CGM data scarcity and enable the development of more accurate and effective predictive models for T2D.
To the best of the authors’ knowledge, no generalized CGM-based model utilizing deep learning and transfer learning exists for hypoglycemia prediction in T2D. Based on this approach, the objective of the present study was to develop a prediction model for hypoglycemia in T2D patients, thereby increasing safety for patients.
Method
Data Collection
CGM data from three data sets were utilized as part of this study: ExBG (T1D), DiaMonT (T2D), and NN3853 (T2D). ExBG and DiaMonT were used for training and internal validation of the model, while NN3853 was used for external validation. All the referenced studies featured well-controlled patient cohorts under continuous CGM monitoring. For ExBG, the only controlling factors were the monitoring devices themselves, while DiaMonT and NN3853 necessitated adherence to prescribed insulin therapy as an additional factor.
The ExBG data were collected by the REPLACE-BG Study Group (clinicaltrials.gov NCT number NCT02258373). The trial aimed to determine the effectiveness and safety of using CGM alone compared to CGM in conjunction with blood glucose measurements (BGM) in a six-month randomized trial, on 226 adults with T1D, using Dexcom G4 Platinum CGM (Dexcom INC, San Diego, California).
The DiaMonT data were collected by the study group ADAPT-T2D (clinicaltrials.gov NCT number NCT04981808). The trial is an ongoing open-label randomized control trial aimed at exploring the impact of a telemonitoring intervention, conducted over a period of three months. It is expected to include a total of 400 participants using Dexcom G6 CGMs (Dexcom INC, San Diego, California). The intervention group, 200 participants, will be equipped with CGMs throughout the entire period, while the control group, 200 participants, will only be equipped with CGMs for the first and last 20 days of the period. However, the trial was still ongoing at the time this work was carried out, and 123 participants from the DiaMonT data set were utilized in this study.
The NN3853 data were collected by Novo Nordisk A/S (clinicaltrials.gov NCT number NCT01819129). The trial compared fast-acting insulin faster aspart (FIAsp) to NovoRapid®/NovoLog® (insulin aspart) at mealtime, in combination with insulin glargine and metformin, over a 26-week period. During this period, participants were required to adhere to the prescribed dose and frequency. Of these participants, 67 used a Dexcom G4 Platinum CGM for 10 to 14 days both before and after the comparison period.
Preprocessing
All three CGM data sets were organized into one-hour samples. Each sample was labeled as either hypoglycemic or not, depending on whether three consecutive CGM values were below 3.9 [mmol/L] (70 mg/dL) one hour after the sample. 20 In effect, the classification problem uses one hour of CGM data to predict hypoglycemia, with a lead time of exactly one hour. Samples were excluded from the data if the sample contained missing entries or if one of the entries used for labeling was missing. The ExBG data set was split into training and test set in a ratio of 70/30. The same ratio for the split was used for the DiaMonT data set when the convolutional neural network (CNN) was fitted to T2D. When splitting, it was ensured that no patient had data in both training and test set. The NN3853 data set was used for external validation. To account for class imbalance during training, class weights were determined for both the ExBG and DiaMonT data sets.
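The sample construction and labeling described above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes 5-minute CGM readings (12 per hour), assumes that the three labeling values start exactly one hour after the last reading of the sample, and assumes a simple balanced scheme for the class weights, since the text does not state how they were determined. The names `make_samples` and `class_weights` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

HYPO_MMOL = 3.9  # hypoglycemia threshold from the text (mmol/L)
WINDOW = 12      # assumption: 5-minute readings, so 12 readings per one-hour sample
LEAD = 12        # assumption: one-hour lead time expressed in readings
RUN = 3          # three consecutive readings decide the label (from the text)


def make_samples(cgm: List[Optional[float]]) -> List[Tuple[List[float], int]]:
    """Return (window, label) pairs for one patient; None marks a missing reading."""
    samples = []
    span = WINDOW + LEAD + RUN - 1  # readings needed from window start to last label value
    for start in range(len(cgm) - span + 1):
        window = cgm[start:start + WINDOW]
        # First labeling value lies LEAD readings after the last window reading (assumption).
        target_start = start + WINDOW - 1 + LEAD
        target = cgm[target_start:target_start + RUN]
        # Exclude samples with missing inputs or missing labeling values, as in the text.
        if any(v is None for v in window) or any(v is None for v in target):
            continue
        label = int(all(v < HYPO_MMOL for v in target))
        samples.append((window, label))
    return samples


def class_weights(labels: List[int]) -> Dict[int, float]:
    """Balanced class weights n / (2 * class count); the scheme itself is an assumption."""
    n = len(labels)
    pos = sum(labels)
    neg = n - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present to compute weights")
    return {0: n / (2 * neg), 1: n / (2 * pos)}
```

A dictionary of this form can be passed as the `class_weight` argument of a Keras `fit` call. The patient-level 70/30 split is not shown here; it would be applied to patient identifiers before the samples are built.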
CNN Architecture
A CNN was implemented to predict hypoglycemia using the Keras library version 2.11.0 in Python version 3.10. The model was built based on the simple CNN architecture described in the study by O’Shea and Nash. 21 The model architecture can be seen in Figure 1.

Figure 1. The process of deriving the weights and biases for the developed CNN transfer learned model (CNN-TL). First, the base and task-specific layers were trained using the ExBG data set. The base layer was then fixed, and further training was performed on the task-specific layer using the DiaMonT data set, resulting in an internal validation result. Finally, the model was externally validated using the NN3853 data set. More details about the base and task-specific layers can be found on the right side of the figure.
The base layer of the CNN utilized a 1-dimensional convolutional layer as the input layer of the model with a rectified linear unit (ReLU) activation function applied after the convolutional operation. Batch normalization and max pooling were utilized to improve stability of the model and downsample the feature maps. A flatten layer was added to convert the output of the previous layers into a one-dimensional vector, which was fed to the task-specific layer. The task-specific layer consisted of two dense layers, which were separated by a ReLU activation function to improve performance, as suggested by O’Shea and Nash. 21 The output layer used a Sigmoid activation function. For the model, 32 filters with a kernel size of 3 were used for the 1D-convolutional layer, and the first dense layer in the task-specific layer had 32 dense units. These values were chosen based on a hyperparameter search prioritizing simplicity due to no significant difference in the tested combinations.
Initial Weights and Biases
The initial weights and biases were instantiated and optimized using the training and test split of the ExBG data set, as seen in Figure 1. Binary cross-entropy was used as the loss function and was minimized using the default adaptive moment estimation (Adam) optimizer from the Keras library. Training was limited to a maximum of 100 epochs, with early stopping applied at a patience of five epochs to avoid redundant training and reduce computation time. When training ended, the weights from the epoch with the lowest internal validation loss were restored and saved.
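The stopping rule can be illustrated with a small, framework-free sketch. It assumes the common convention that training stops once the validation loss has failed to improve for `patience` consecutive epochs. The function name `early_stopping` is hypothetical. In Keras, this behavior is typically obtained with the `EarlyStopping` callback (with its `patience` and `restore_best_weights` arguments), although the text does not state which mechanism was used.

```python
from typing import List, Tuple


def early_stopping(val_losses: List[float], patience: int = 5,
                   max_epochs: int = 100) -> Tuple[int, int]:
    """Return (epochs_run, best_epoch), both 1-indexed.

    best_epoch is the epoch whose weights would be restored, i.e. the one
    with the lowest validation loss seen before training stopped.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0        # epochs since the last improvement
    epochs_run = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        epochs_run = epoch
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                break  # no improvement for `patience` consecutive epochs
    return epochs_run, best_epoch
```

For example, with the validation losses `[0.9, 0.7, 0.6, 0.61, 0.62, 0.60, 0.63, 0.64, 0.59]`, training stops after epoch 8 and the weights from epoch 3 are kept; the later improvement at epoch 9 is never reached.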
Fitting of Task-Specific Layers
The initial weights and biases for the base layers were made non-trainable, and the task-specific layers were fitted using the DiaMonT data set. For the fitting of the task-specific layers, the same loss function and optimizer as those used in the initial training were used. However, a lower learning rate of 1·10⁻⁵ was chosen. A lower learning rate helps gradually fit the task-specific layers to the DiaMonT data and prevents overfitting and a degradation of performance.
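The two-stage procedure (pre-training, then freezing the base layers and refitting the task-specific layers) could look roughly like the Keras sketch below. It is an illustration under several assumptions rather than the authors' implementation: the input length of 12 readings, the pooling size of 2, the exact layer order, and the names `build_cnn`, `compile_model`, `base`, and `task_specific` are not taken from the source. The filter count, kernel size, dense units, activations, loss, optimizer, and learning rate follow the text. Tracking AUC as a metric is only one reading of "optimized toward maximizing the AUC".

```python
from tensorflow import keras

layers = keras.layers
N_READINGS = 12  # assumption: one hour of 5-minute CGM readings


def build_cnn() -> keras.Model:
    # Base layers: 1D convolution with ReLU, batch normalization, max pooling, flatten.
    base = keras.Sequential(
        [
            layers.Conv1D(filters=32, kernel_size=3, activation="relu"),
            layers.BatchNormalization(),
            layers.MaxPooling1D(pool_size=2),  # pool size is an assumption
            layers.Flatten(),
        ],
        name="base",
    )
    # Task-specific layers: two dense layers, sigmoid output for the binary label.
    head = keras.Sequential(
        [
            layers.Dense(32, activation="relu"),
            layers.Dense(1, activation="sigmoid"),
        ],
        name="task_specific",
    )
    inputs = keras.Input(shape=(N_READINGS, 1))
    return keras.Model(inputs, head(base(inputs)))


def compile_model(model: keras.Model, optimizer: keras.optimizers.Optimizer) -> None:
    model.compile(
        optimizer=optimizer,
        loss="binary_crossentropy",
        metrics=[keras.metrics.AUC(name="auc")],
    )


model = build_cnn()

# Stage 1: pre-train all layers on T1D data with the default Adam optimizer.
compile_model(model, keras.optimizers.Adam())
# model.fit(x_t1d, y_t1d, ...)  # hypothetical arrays, not defined here

# Stage 2: freeze the base layers and refit only the task-specific layers
# on T2D data with the lower learning rate given in the text.
model.get_layer("base").trainable = False
compile_model(model, keras.optimizers.Adam(learning_rate=1e-5))
# model.fit(x_t2d, y_t2d, ...)  # hypothetical arrays, not defined here
```

The model is recompiled after the base is frozen so that the change in trainable layers and the new learning rate both take effect.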
Model Evaluation
To evaluate the performance of the developed transfer learning prediction model (CNN-TL), sensitivity, specificity, accuracy, positive predictive value (PPV), negative predictive value (NPV), confusion matrix, and area under the receiver operating characteristics curve (AUC) were reported for both internal and external validation. A prediction threshold closest to a specificity of 95% was chosen. Furthermore, the distribution of false positives, presented by the lowest CGM value of three entries used for target classification, was visualized to evaluate the potential risk of harm.
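As a rough illustration of the threshold selection and the reported metrics, the sketch below picks the candidate threshold whose specificity is closest to 95% and then derives the confusion-matrix metrics. It assumes a sample is predicted as hypoglycemic when its score is at or above the threshold, and that candidate thresholds are the observed scores; neither detail is stated in the source. The function names are hypothetical, and the brute-force search is written for clarity, not for data sets of the size used in the study.

```python
from typing import Dict, List, Tuple


def confusion(scores: List[float], labels: List[int],
              threshold: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def threshold_at_specificity(scores: List[float], labels: List[int],
                             target: float = 0.95) -> float:
    """Return the observed score whose specificity is closest to the target."""
    best_threshold, best_gap = min(scores), float("inf")
    for threshold in sorted(set(scores)):
        _, fp, tn, _ = confusion(scores, labels, threshold)
        gap = abs(tn / (tn + fp) - target)
        if gap < best_gap:
            best_threshold, best_gap = threshold, gap
    return best_threshold


def metrics(scores: List[float], labels: List[int],
            threshold: float) -> Dict[str, float]:
    """Compute the reported metrics; assumes every denominator is nonzero."""
    tp, fp, tn, fn = confusion(scores, labels, threshold)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }
```

Fixing specificity first and reading off sensitivity and PPV afterwards makes models comparable at the same false alert rate, which matters when positives are as rare as they are in these data sets.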
To assess the effectiveness of a transfer learning approach, a T2D-specific CNN model (CNN-T2D) was developed and trained on DiaMonT data, as opposed to first learning from T1D data. The model was trained identically to the CNN-TL model with the same performance metrics reported for both internal and external validation.
Results
After preprocessing, the hypoglycemic samples accounted for 3.45% of approximately 12.5 million samples from 226 patients in the ExBG data set. In the DiaMonT data set, the hypoglycemic samples were 0.33% of approximately two million samples from 123 patients, and for the validation data set NN3853, they were 4.69% of approximately 334 000 samples from 67 patients. The baseline characteristics for the patients in the data sets can be seen in Table 1.
Table 1. Baseline Characteristics for Patients in the Three Data Sets: ExBG, DiaMonT, and NN3853.
Categorical variables are given as percentage (number) for each category, and continuous parameters are given as mean ± standard deviation. Parameters not applicable to the data set are marked by “-.”
Abbreviations: T1D, type 1 diabetes; T2D, type 2 diabetes.
The CNN-TL was the best performing model, achieving an AUC of 0.941 and a sensitivity of 69.16% at a fixed specificity of 95% with a PPV of 40.49%, as shown in Table 2. The receiver operating characteristics curves and the confusion matrices for both models are presented in Figure 2 and Table 3.
Table 2. Model Results Presented by Accuracy, Sensitivity, Specificity, Negative Predictive Value (NPV), Positive Predictive Value (PPV), and ROC-AUC for the Developed Models.
Abbreviations: CNN-T2D, type 2 diabetes-specific convolutional neural network; CNN-TL, CNN transfer learned model; ROC-AUC, area under the receiver operating characteristic curve.

Figure 2. Receiver operating characteristics curves for the external validation of each of the developed models.
Table 3. Confusion Matrices for CNN-TL and CNN-T2D.
Abbreviations: CNN-T2D, type 2 diabetes-specific convolutional neural network; CNN-TL, CNN transfer learned model.
The CNN-TL model made a total of 15 951 (4.77%) false-positive predictions, which were predominantly distributed between 3 and 5 [mmol/L] (54-90 [mg/dL]) and with a median value of 4.43 [mmol/L] (79.74 [mg/dL]), as shown in Figure 3.
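The reported external validation figures can be cross-checked against each other with a few lines of arithmetic. The sketch below uses only numbers stated in the text (total samples, hypoglycemic samples, false positives, sensitivity); the true-positive count is derived from the sensitivity and is therefore approximate.

```python
total = 334_711           # external validation samples (from the text)
positives = 15_695        # samples labeled hypoglycemic (from the text)
false_positives = 15_951  # false-positive predictions by CNN-TL (from the text)
sensitivity = 0.6916      # reported sensitivity

negatives = total - positives
true_positives = sensitivity * positives  # derived, roughly 10 855 samples

prevalence = positives / total
specificity = 1 - false_positives / negatives
ppv = true_positives / (true_positives + false_positives)
fp_share = false_positives / total

print(f"prevalence  {100 * prevalence:.2f}%")
print(f"specificity {100 * specificity:.2f}%")
print(f"PPV         {100 * ppv:.2f}%")
print(f"FP share    {100 * fp_share:.2f}%")
```

Under these inputs the specificity comes out at 95.00%, the PPV at 40.49%, and the false positives at 4.77% of all samples, which agrees with the values reported above. The modest PPV despite high specificity follows from the low prevalence of hypoglycemic samples (4.69%).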

Figure 3. Histogram of false positives predicted in the external validation by the convolutional neural network transfer learned model (CNN-TL).
Discussion
This study has shown how deep learning and transfer learning can be utilized to develop a prediction model of hypoglycemia in patients with T2D, with promising results. To the best of the authors’ knowledge, no studies have investigated the use of a generalized model utilizing a deep learning framework to make binary predictions of hypoglycemic episodes in T2D. In addition, no studies have been found that use a data set of comparable size to develop a model for hypoglycemia prediction in patients with diabetes.
One study by Deng et al 14 investigated whether transfer learning could aid in developing a patient-specific CNN model for classifying hypoglycemia in 40 adult patients with T2D. The model was initially trained with the data of 39 patients, fitted to each patient using 1000 samples from that specific patient, and internally validated using the remaining data from that patient. They report a sensitivity and specificity of 59.19% and 98.15%, respectively, when validating their model using a secondary T1D data set with a 30-minute prediction horizon. Using CNNs on small data sets increases the risk of overfitting; however, it is difficult to determine the likelihood of overfitting as the study lacks information regarding the total sample size and specific performance metrics for internal validation. The study describes a sensitivity range of 80% to 96% for internal validation, which is significantly higher than the reported performance. This suggests that the model was overfitted. Furthermore, the chosen threshold definition for hypoglycemia does not adhere to consensus, as it is set at 4.4 mmol/L (80 mg/dL).
The study by Fleischer et al 15 presents an ensemble learning approach to predicting hypoglycemia events in T1D patients using the publicly available ExBG data set, also used in this study. The authors externally validated their machine learning model on a data set of real and synthetic patients and achieved an AUC of 0.988 for both data sets. Furthermore, the authors performed an event-based validation, yielding a sensitivity of 90%, a lead time of 17.5 minutes, and a false-positive rate of 38%.
Compared to the findings in the study by Fleischer et al, 15 this study resulted in a lower AUC when it comes to sample-based validation. However, it is important to note that using AUC as the sole evaluation metric may not fully capture the complexity of the prediction challenges. While the study presented by Fleischer et al 15 provides meaningful event-based prediction metrics, the methodological disparities between their research and ours render direct comparisons inconclusive.
A recent study by Dave et al 16 developed feature-based machine learning models using contextual patient information and CGM data to predict hypoglycemic events. The model was built using data collected from 110 patients over a range of 30 to 90 days under normal living conditions. For the patient-based validation of the best model, the authors report a sensitivity of 97.61% and a specificity of 98.09%, with a false alert rate of 26.36% for a 60-minute prediction horizon. In comparison to their results, this study achieved slightly lower sensitivity and specificity. However, our model attained a false alert rate of only 5%, resulting in a 21.36 percentage-point improvement over the model developed by Dave et al. 16 Furthermore, it is important to highlight that, unlike their study, we conducted external validation, adding another layer of credibility to our findings.
In another study by Dave et al, 17 a random forest prediction model for classification of hypoglycemia in T1D was investigated. The model was developed using the CGM data of 112 T1D patients collected for 90 consecutive days, using 70% of the data for training and 30% for internal validation. The study reports a sensitivity of 86.28% at a specificity of 93.07% using four hours of CGM data and a prediction horizon of 60 minutes. The results obtained by Dave et al 17 seemingly confirm that the performance achieved within the present study is acceptable, while also indicating that there is no observable advantage in performance to using transfer learning.
Results indicate that CNN-TL and CNN-T2D are comparable, even when externally validated using the NN3853 data set. This observation suggests that the CNN models have identified similar features even through the use of transfer learning, resulting in both models demonstrating generalizability and robust performance beyond internal validation scenarios. Although the results are similar, the increased performance of the CNN-TL model, especially in sensitivity and PPV, should not be underestimated in a highly imbalanced data set.
The present study was strengthened by the use of multiple large, high-quality data sets, which enhances the CNN models’ learning of features that correctly identify hypoglycemic episodes. In addition, the use of an external validation set confirms that the results demonstrated in the internal validation generalize across all data sets and further supports the inference that the model would perform satisfactorily outside training and validation settings.
Contrary to the stated strengths, the utilization of high-quality data sets can introduce homogeneous parameters, such as the duration of hypoglycemic episodes, and diagnosis duration between ExBG and DiaMonT, as well as age, body mass index, and baseline HbA1c between DiaMonT and NN3853. These parameters may restrict the model’s applicability in real-world settings where well-controlled diabetes or strict management strategies cannot be expected, as is the case for the three data sets. Therefore, caution must be taken when applying the presented model to a different population or in a different setting.
In addition, it is important to note that if the variance for the population is not adequately represented in the distribution, it could further contribute to restricted applicability of the model. Consequently, such limitations may result in improper predictions, due to diverse behaviors and responses concerning blood glucose management and hypoglycemic episodes. Another factor which may reduce the applicability of the model outside the present study is the lack of interpolation methods employed to replace missing CGM measurements. In the present study, samples containing missing CGM measurements were removed without utilizing interpolation techniques. This limitation in handling missing data could further hinder the model’s effectiveness.
To account for any uncertainties as to the performance of the model, future work should include the further validation of the model using more heterogeneous data sets. Moreover, since the present study is limited to retrospective analysis of data, the model also must be tested and validated as part of a clinical trial. In addition, the model could potentially benefit from incorporating additional inputs, such as activity levels, carbohydrate intake, and insulin dosages. However, careful consideration should be given to whether the potential improvement in performance outweighs the inconvenience of relying on multiple devices. Lastly, including exact times, and blood glucose values, in the prediction would enable the patients to consider their risk of a hypoglycemic episode and react accordingly, either by consuming carbohydrates or remaining cautious in the subsequent hour.
Conclusion
The present study has demonstrated the feasibility of developing a CNN model for predicting hypoglycemic episodes one hour in advance by employing a transfer learning approach and CGM data. The model achieved acceptable performance, slightly better than that of a CNN model without transfer learning. Although further validation is necessary, this model shows promising potential to improve the safety of patients with insulin-treated T2D.
Footnotes
Acknowledgements
The authors extend their gratitude to REPLACE-BG, ADAPT-T2D, and Novo Nordisk A/S for generously providing access to their data, which played an indispensable role in enabling this study. It is important to emphasize that the content and conclusions drawn within this study are the sole responsibility of the authors.
Abbreviations
T2D, type 2 diabetes; T1D, type 1 diabetes; CNN, convolutional neural network; CGM, continuous glucose monitoring; BGM, blood glucose measurement; FIAsp, fast-acting insulin faster aspart; ReLU, rectified linear unit; Adam, adaptive moment estimation; PPV, positive predictive value; NPV, negative predictive value; AUC, area under the receiver operating characteristics curve; Insulin aspart, NovoRapid®/NovoLog®
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: M.H.J. is employed at Novo Nordisk A/S.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
