Sage Journals: Discover world-class research

Abstract

Background:

In this work, we leverage state-of-the-art deep learning–based algorithms for blood glucose (BG) forecasting in people with type 1 diabetes.

Methods:

We propose stacks of convolutional neural network and long short-term memory units to predict BG level for 30-, 60-, and 90-minute prediction horizon (PH), given historical glucose measurements, meal information, and insulin intakes. The evaluation was performed on two data sets, Replace-BG and DIAdvisor, representative of free-living conditions and in-hospital setting, respectively.

Results:

For 90-minute PH, our model obtained mean absolute error of 17.30 ± 2.07 and 18.23 ± 2.97 mg/dL, root mean square error of 23.45 ± 3.18 and 25.12 ± 4.65 mg/dL, coefficient of determination of 84.13 ± 4.22% and 82.34 ± 4.54%, and in terms of the continuous glucose-error grid analysis 94.71 ± 3.89% and 91.71 ± 4.32% accurate predictions, 1.81 ± 1.06% and 2.51 ± 0.86% benign errors, and 3.47 ± 1.12% and 5.78 ± 1.72% erroneous predictions, for Replace-BG and DIAdvisor data sets, respectively.

Conclusion:

Our investigation demonstrated that our method achieved superior glucose forecasting compared with existing approaches in the literature, and thanks to its generalizability showed potential for real-life applications.

Keywords

glucose forecasting decision support systems continuous glucose monitoring (CGM)artificial deep neural networks convolutional neural network (CNN)long short-term memory (LSTM)

Introduction

Over the past 40 years, major efforts have been undertaken in the field of diabetes technology to develop decision support systems and automated insulin delivery systems for blood glucose (BG) management in individuals with type 1 diabetes (T1D). These systems are generally built around a model of glucose-insulin metabolism that is used to forecast future BG levels utilized to compute therapy interventions.

In such contributions, past glucose values, typically recorded by a continuous glucose monitoring (CGM) device, along with various physiological data including carbohydrate (CHO) intakes, fast-acting and slow-acting insulin doses,¹ or recorded physical activity (PA) and daily routines² are considered to predict future BG levels for different prediction horizons (PHs).

Numerous approaches have been taken in the diabetes technology literature to propose glucose predictive models, the majority being based on polynomial and state-space models.^3-10 Thanks to the increased availability of real-life data collected for long-period trials, machine learning (ML) techniques have become increasingly popular and have been successfully employed to solve the BG prediction problem. In particular, early prior studies exploited random forest, multivariate adaptive regression splines,¹¹ k-nearest neighbor,¹² decision tree,¹³ gradient-boosted regression tree,¹⁴ and support vector regression (SVR).^15,16 However, the primary shortcoming of these classic ML methods is that they all require human feature extraction, and their success is highly dependent on the quality of feature engineering and statistical analysis of the data used to train them.¹⁷

More recently, the diabetes technology community has witnessed a booming interest in the application of Artificial Neural Networks to the purpose of BG prediction, mainly due to their ability to perform automatic feature extraction, and hence eliminating the need of feature engineering. For example, Zhu et al¹⁸ proposed a convolutional neural network (CNN)-based model and Li et al¹⁹ introduced GluNet, a deep neural network (DNN) framework, leveraging dilated convolutional layers, to improve the CNN performance and demonstrated that GluNet outperformed rival models, including autoregressive with exogenous inputs (ARX), SVR, and neural network for predicting glucose on in silico patients. Mirshekarian et al²⁰ proposed a recurrent neural network (RNN) architecture with long short-term memory (LSTM) units trained and evaluated on a data set of five patients containing approximately 400 days’ worth of data, for predicting BG up to 60-minute PH. They claimed that an LSTM trained on raw data from real patients may outperform SVR and polynomial models that were trained using manually derived features from the same data set. Li et al²¹ introduced a convolutional recurrent neural network (CRNN) to estimate the BG level for up to 60-minute PHs based on prior CGM data and information on meal and insulin intakes, presenting results for both simulated and real patients.

Some of these studies have used simulated data for the evaluation of their proposed methodologies, which although are important for evaluating the feasibility of the models, they lack some challenges that are in real-life data, such as missing points in data, extreme or unusual environmental conditions, and medication interferences outliers. Moreover, in cases where data from real T1D patients were used, the sample size was small, which, albeit good for initial evaluation, precludes application to a large number of T1D patients due to the high rate of intersubject variability in glucose trends.

That said, our objective in this research is to build on our past work²² to develop a new forecasting model for BG, built using a CNN-LSTM stacked architecture, overcoming the limitation of manual feature engineering, and evaluate its effectiveness on real patients’ data collected both in-hospital and in the outpatient setting for predicting future BG after 30, 60, and 90 minutes of PH.

The remainder of the article is structured in the following manner. The "Experimental Condition" section describes the details and preprocessing of the data sets. The "Methods" section goes into the details about our proposed architecture and prediction approach. Then, the obtained results are presented in the "Experimental Results" section and discussed in detail in the "Discussion" section . Finally, the "Conclusion and Future Work" section summarizes this study.

Experimental Conditions

Two distinct data sets of patients with T1D were used in this study, each of which contained information about meals, insulin boluses (slow and fast acting), and CGM values:

Replace-BG data set:²³ The data were collected from 168 adults (93 males/75 females), aged 47.2 ± 13.1 years, with a history of T1D ranging from 1 to 40 years, who were using an insulin pump, had HbA1c <8.5%, participating in a 26-week clinical trial in free-living conditions. We used three features from this data set: interstitial glucose (mg/dL) measured by Dexcom G4 Platinum, insulin boluses (U), and CHO content (mg).

DIAdvisor data set: In the research project DIAdvisor,²⁴ data were collected from 59 T1D patients (37 males/22 females), aged 43.4 ± 11.7 years, disease duration 18.8 ± 10.7 years, BMI 23.9 ± 2.4 kg/m², HbA1C 7.8 ± 1.6%, 27 multiple daily injections, and 32 continuous subcutaneous insulin infusion participating in a three-day in-hospital study. We considered seven features from this data set: interstitial glucose (mg/dL) measured by Abbott Freestyle; self-reported insulin intakes (U) for basal, bolus, and correction doses; and meal nutrients content (g) for CHO, protein, and lipids.

Data Preprocessing

Missing CGM data points in the Replace-BG data set, for gaps less than 60 minutes, were estimated using linear interpolation.²⁵ No interpolation was attempted in circumstances when the interval between two consecutive points was greater than 60 minutes, as this could cause the model to learn the estimated sequence rather than the genuine values, and when this was the case the entire corresponding day of the missing data was discarded.²⁶ Following that, the CGM variable was uniformly resampled every 15 minutes in both data sets, and insulin and meal characteristics were averaged within each 15-minute time interval. Next, we normalized each time series with regard to its own lowest and maximum values, resulting in the entire data set being in the range (0,1), to improve the prediction accuracy.²⁷ In the output of the model, an inverse transform was used to map the features back to their actual values.

Train-Test Splitting: Forward Chaining (FC)

In timeseries prediction problems, conventional cross-validation, that is typical random train/test splitting strategies, cannot be used because there is a temporal dependency between the data samples. Hence, to account for this temporal dependency as well as to avoid data leakage from training to test sets, we employ FC, a technique to split the data on a rolling basis, well-suited for sequential and timeseries data sets. For this purpose, the model is trained on an 80% subset of patients and forecast for the remaining 20% which are test patients. Through a rolling basis process, the same forecasted patients are then included as part of the next training data set and model is tested on a new 20% subset of patients. It should be noted that, in each splitting scenario, the model is trained/tested independently on each subset, and the final result is computed by averaging over all split subsets to obtain a robust estimate of the model’s performance.²⁸ It should be emphasized that for each train-test partition created by FC, preprocessing steps are performed only to the train set; the test set is excluded from the preprocessing steps because it is presumed to be unseen data.

To train the model, the train-set is divided into 80/20 training and validation subsets, using FC algorithm, which allows for an unbiased evaluation of a model’s fit to the training data set during the training process and helps ensure consistent performance.

Methods

Figure 1 depicts the proposed CNN-LSTM model. Windowed Samples of past data with length three times the PH ( $3 \times PH)$ , which was the optimal length of input for all PHs, are input to a stack of 1D convolutional and pooling layers, followed by an LSTM block containing two layers of LSTM, each containing 100 LSTM units, followed by two fully connected layers with [32, 2], [32, 4], and [64, 6] neurons for 30-, 60-, and 90-minute PH, respectively. We boost the chances of obtaining underlying features of the input samples depending on their significance by using two groups of two back-to-back convolutional layers. Extracted features are then flattened into a single long vector and sent to a group of LSTM layers, which can assess and forecast sequences using a recurrent approach due to their short-term memory. Finally, the LSTM section’s output is transmitted via a pair of fully connected layers that operate as a buffer between the learnt features and the intended output, interpreting them to produce an array of predicted future glucose value for up to PH. The model parameters were determined by minimizing the mean absolute error (MAE) loss function in the output, derived as follows:

MAE = \frac{\sum_{i = 1}^{N} | Y_{i} (t) - {\hat{Y}}_{i} (t) |}{N}

(1)

where N is the total number of windowed samples and are the actual and predicted glucose concentration (GC) values at time.

Figure 1.

The proposed CNN-LSTM architecture for 90-minute PH BG prediction.

To avoid overfitting the model to the training data set and obtain a more generalizable model, dropout layers are used between each LSTM layer. Also, after each convolutional layer, a batch normalization operation is performed to re-center and rescale the layer’s input to reduce the internal covariance shift.

Training Hyperparameters

We use the TensorFlow 2.0 framework²⁹ to implement the proposed CNN-LSTM model. Manual hyperparameter tuning with expert knowledge was performed such that more than 20 different sets of hyperparameters, including number of neurons, activation functions, optimizers, learning rates, and batch sizes, were tested to have more control over the process and see how different hyperparameter scenarios may affect the model performance. Due to the length differences between the two data sets, batch sizes of 32 and 256 are used to optimize the parameters using the root mean square propagation method with an initial learning rate of 0.0001 and a moving average value of 0.9. Training is carried out for 500 epochs with an early stop point by monitoring changes in validation loss throughout a 50-epoch period.

To evaluate the accuracy of the prediction, three metrics were employed: MAE, the root mean square error (RMSE), and the coefficient of determination (R2), given in Equations (1-3):

RMSE = \sqrt{\frac{\sum_{i = 1}^{N} {(Y_{i} (t) - {\hat{Y}}_{i} (t))}^{2}}{N}}

(2)

R 2 = 1 - \frac{\sum_{i = 1}^{N} {(Y_{i} (t) - {\hat{Y}}_{i} (t))}^{2}}{\sum_{i = 1}^{N} {(Y_{i} (t) - {\bar{Y}}_{i} (t))}^{2}},

(3)

where N is the total number of windowed samples and denotes the actual and predicted GC values at time, respectively, and in R2 is the average of all samples.

The choice of the abovementioned metrics enables us to compare our suggested predictor to previously published methods. In addition, to assess the suggested algorithm’s clinical acceptability, we used the continuous glucose-error grid analysis (CG-EGA).^30,31

Comparison With Other Methods

The predictive performance of our proposed CNN-LSTM model was compared against the ARX in,⁶ the SVR model proposed in,³² the LSTM model and the CRNN model proposed in.²¹ Autoregressive with exogenous inputs is a good reference model since it has been used in many studies in the diabetes literature. Moreover, the choice of SVR models was motivated by the fact that among ML methods SVR has demonstrated promising clinical acceptability,³³ whereas LSTM, designed with the same architecture as the LSTM section of the proposed CNN-LSTM model, allows us to demonstrate the advantage of the added convolutional layers for prediction. In addition, since CRNN, which is comprised of three convolutional and one LSTM layer, and its modifications has shown promising results in various studies,^21,34 we implemented it based on the code repository provided in.³⁴ For the SVR model, we use the radial-basis function as the activation function, with a value of 0.0002 and C = 0.5, while for the LSTM we use a two-layered LSTM architecture with 100 memory units per layer, followed by two fully connected layers with [32, 2], [32, 4], and [64, 6] number of neurons for 30-, 60-, and 90-minute PH, respectively. Exponential linear unit and tangent hyperbolic activation functions were used for LSTM and fully connected layers, respectively, the same as the LSTM block of the proposed CNN-LSTM model.

Experimental Results

Population-Wise Analysis

One of the major challenges for the glucose predictive models is the intersubject variability of the effect of insulin intake, meals, and other events on glucose dynamics. In this study, given the two data sets with large sample size, we can investigate the robustness of our model in dealing with this challenge. Table 1 shows population mean and standard deviation (STD) values of MAE (mg/dL), RMSE (mg/dL), and R2 (%) metrics when predicting future BG level for all T1D patients in both data sets. The highlighted values show the best performing algorithm for each PH. We observe that the performance of the proposed CNN-LSTM model is comparable with that of the LSTM for 30-minute PH for both data sets; however, for longer PHs, our method consistently outperforms all the other models by significantly reducing MAE and RMSE and increasing R2 values, in both data sets.

Table 1.

Population Mean ± (Standard Deviation) of Accuracy Metrics (MAE, RMSE, and R2) of Forecasting the BG Level for 30-, 60-, and 90-Minute PH for the Replace-BG and DIAdvisor Data sets.

PH, minutes	Model	MAE, mg/dL	RMSE, mg/dL	R2, %
Replace-BG data set
30	ARX	7.71 ± (1.19)	16.04 ± (1.09)	94.98 ± (1.76)
	SVR	7.96 ± (1.32)	17.75 ± (1.78)	94.81 ± (2.04)
	LSTM	6.60 ± (0.76)	9.28 ± (1.31)	97.32 ± (1.03)
	CRNN	7.04 ± (1.22)	10.36 ± (1.31)	96.86 ± (1.55)
	CNN-LSTM	6.92 ± (1.08)	9.73 ± (1.34)	97.16 ± (1.64)
60	ARX	19.23 ± (2.87)	27.09 ± (3.44)	70.56 ± (4.01)
	SVR	18.71 ± (2.23)	25.41 ± (4.21)	72.28 ± (4.23)
	LSTM	13.71 ± (1.82)	19.29 ± (2.48)	88.68 ± (3.43)
	CRNN	12.21 ± (1.41)	17.88 ± (2.21)	90.12 ± (3.02)
	CNN-LSTM	11.74 ± (1.76)	16.51 ± (2.43)	91.97 ± (3.19)
90	ARX	25.16 ± (3.42)	31.87 ± (4.22)	60.11 ± (4.98)
	SVR	22.47 ± (3.97)	28.99 ± (4.89)	63.92 ± (5.21)
	LSTM	19.42 ± (2.51)	25.83 ± (3.28)	80.95 ± (4.28)
	CRNN	18.89 ± (2.67)	24.91 ± (3.16)	82.22 ± (4.07)
	CNN-LSTM	17.30 ± (2.47)	23.45 ± (3.18)	84.13 ± (4.22)
DIAdvisor data set
30	ARX	7.04 ± (0.98)	16.89 ± (1.11)	93.44 ± (1.32)
	SVR	7.42 ± (1.12)	18.12 ± (1.28)	92.12 ± (1.88)
	LSTM	5.63 ± (0.96)	9.88 ± (1.41)	97.92 ± (2.14)
	CRNN	6.21 ± (1.15)	10.04 ± (1.50)	96.67 ± (2.11)
	CNN-LSTM	5.65 ± (1.13)	9.81 ± (1.21)	97.03 ± (1.24)
60	ARX	20.05 ± (2.01)	29.19 ± (3.51)	69.65 ± (3.45)
	SVR	19.32 ± (2.01)	27.41 ± (3.51)	70.04 ± (3.13)
	LSTM	14.25 ± (2.06)	22.54 ± (3.11)	84.62 ± (3.87)
	CRNN	13.21 ± (2.01)	20.17 ± (2.89)	90.43 ± (1.98)
	CNN-LSTM	12.06 ± (1.87)	18.32 ± (2.76)	91.23 ± (3.32)
90	ARX	28.76 ± (2.93)	34.54 ± (4.13)	58.96 ± (4.76)
	SVR	23.03 ± (3.27)	29.54 ± (4.63)	62.82 ± (5.21)
	LSTM	22.54 ± (2.24)	28.12 ± (4.11)	77.54 ± (5.05)
	CRNN	19.94 ± (2.12)	26.36 ± (1.31)	79.23 ± (4.11)
	CNN-LSTM	18.23 ± (2.97)	25.12 ± (4.65)	82.34 ± (4.54)

The bold values show the best performing algorithm for each PH.

Abbreviations: MAE, mean absolute error; RMSE, root mean square error; R2, coefficient of determination; PH, prediction horizon; BG, blood glucose; ARX, autoregressive with exogenous inputs; SVR, support vector regression; LSTM, long short-term memory; CRNN, convolutional recurrent neural network; CNN, convolutional neural network.

Table 2 summarizes the results in terms of the CG-EGA. The CNN-LSTM model that we propose outperforms the reference models across all PHs, with a higher percentage of accurate predictions (notice for instance 94.71 ± 3.89% for Replace-BG data set and 91.71 ± 4.32% for DIAdvisor data set, on the 90-minute PH) and less erroneous predictions (3.47 ± 1.12% for Replace-BG and 5.78 ± 1.72% for DIAdvisor data set, on the 90-minute PH) making it acceptable for clinical applications. Figure 2 illustrates an example of the CG-EGA plot of a representative patient from Replace-BG data set. It is observable that a significant proportion of predicted samples fall within the range A of both P-EGA and R-EGA, when compared with the associated true glucose values.

Table 2.

Predictions for 30-Minute PH by the ARX, SVR, LSTM, CRNN, and CNN-LSTM Models for a Representative Patient From the Replace-BG Data set, for a Three-Day Duration.

PH, minutes	Model	AP, %	BE, %	EP, %
Replace-BG data set
30	ARX	87.22 ± (3.95)	10.68 ± (2.43)	2.10 ± (1.45)
	SVR	86.35 ± (3.95)	10.58 ± (2.43)	3.07 ± (1.33)
	LSTM	85.34 ± (3.52)	12.52 ± (2.91)	2.14 ± (1.10)
	CRNN	94.11 ± (2.75)	4.21 ± (1.63)	1.68 ± (1.07)
	CNN-LSTM	98.73 ± (2.52)	0.80 ± (0.51)	0.47 ± (0.23)
60	ARX	83.89 ± (2.95)	12.65 ± (2.93)	3.46 ± (1.03)
	SVR	83.17 ± (3.45)	13.58 ± (2.57)	3.38 ± (1.12)
	LSTM	83.05 ± (3.87)	13.83 ± (3.01)	3.12 ± (1.08)
	CRNN	91.31 ± (3.44)	5.98 ± (1.51)	2.71 ± (1.06)
	CNN-LSTM	96.53 ± (3.24)	1.13 ± (0.76)	2.34 ± (1.13)
90	ARX	74.87 ± (4.54)	9.87 ± (3.06)	11.26 ± (3.11)
	SVR	79.87 ± (4.43)	13.32 ± (3.44)	6.81 ± (2.43)
	LSTM	79.80 ± (4.12)	13.05 ± (2.31)	7.15 ± (2.89)
	CRNN	88.05 ± (3.92)	7.51 ± (1.81)	4.44 ± (2.12)
	CNN-LSTM	94.71 ± (3.89)	1.81 ± (1.06)	3.47 ± (1.12)
DIAdvisor data set
30	ARX	85.56 ± (3.48)	10.98 ± (2.98)	3.46 ± (1.10)
	SVR	84.22 ± (4.03)	12.29 ± (2.78)	3.49 ± (1.33)
	LSTM	83.97 ± (3.67)	12.89 ± (2.91)	3.14 ± (1.43)
	CRNN	92.45 ± (2.82)	5.11 ± (1.87)	2.44 ± (1.14)
	CNN-LSTM	97.41 ± (3.12)	2.10 ± (1.12)	0.48 ± (0.23)
60	ARX	80.29 ± (3.11)	15.61 ± (3.34)	4.1 ± (2.14)
	SVR	81.56 ± (3.89)	14.46 ± (2.88)	3.98 ± (2.02)
	LSTM	80.67 ± (3.92)	15.74 ± (3.08)	3.59 ± (1.45)
	CRNN	90.76 ± (3.65)	6.16 ± (1.87)	3.08 ± (1.22)
	CNN-LSTM	95.93 ± (2.84)	2.13 ± (1.16)	1.94 ± (0.93)
90	ARX	75.56 ± (4.67)	14.19 ± (3.15)	10.25 ± (3.04)
	SVR	77.87 ± (4.63)	14.71 ± (3.84)	7.42 ± (3.13)
	LSTM	77.67 ± (4.45)	14.87 ± (0.86)	7.46 ± (1.72)
	CRNN	84.91 ± (4.21)	8.36 ± (2.11)	6.73 ± (2.08)
	CNN-LSTM	91.71 ± (4.32)	2.51 ± (0.86)	5.78 ± (1.72)

The bold values show the best performing algorithm for each PH.

Abbreviations: PH, prediction horizon; ARX, autoregressive with exogenous inputs; SVR, support vector regression; LSTM, long short-term memory; CRNN, convolutional recurrent neural network; CNN, convolutional neural network; BG, blood glucose; AP, accurate predictions; BE, benign errors; EP, erroneous prediction.

Figure 2.

A representative example of CG-EGA with P-EGA (left) and R-EGA (right) components.

Our results demonstrate that our suggested strategy outperforms the polynomial, ARX, and conventional ML-based techniques, SVR as well as the LSTM model, acknowledging the advantage of adding up the CNN to the LSTM network. In addition, due to the optimized model structure and hyperparameter selections, our model performs better in terms of both the prediction accuracy and the clinical acceptability comparing with CRNN, which comprised CNN and RNN.

Figure 3 depicts a three-day period of recorded data for a representative participant from the Replace-BG data set, compared against the 60-minute-ahead predictions obtained with the considered models. It is observable that CNN-LSTM outperforms all the other methods in capturing the trends and fluctuations with greater accuracy, which is congruent with the corresponding MAE, RMSE, and R2 values. We would like to remind readers that inference results are generated by testing each model on data from previously unseen patients.

Figure 3.

Actual CGM value compared with the predictions for 60-minute PH by the ARX, SVR, LSTM, CRNN, and CNN-LSTM models for a representative patient from the Replace-BG data set, for a three-day duration.

The boxplots in Figure 4 depict results obtained from our proposed model for each metric across all patients in the Replace-BG (top) and DIAdvisor (bottom) data sets, respectively. Although degrading for longer PHs, predictions remain within a somewhat reasonable range, demonstrating the robustness of the proposed model against intersubject variability of glucose dynamics in both data sets.

Figure 4.

Performance evaluation of the proposed model on the Replace-BG (top panel) and DIAdvisor (bottom panel) data sets for three different PHs over test patients. Left: MAE, middle: RMSE, and right: R2.

To assess the performance of the proposed model for different PHs, Figure 5 demonstrates examples of CGM prediction using CNN-LSTM with different PHs for six representative patients. In each case, the solid red line represents the historical window, while the blue, orange, and green lines represent CNN-LSTM-estimated forecast time series for 30-, 60-, and 90-minute PH, respectively. While the model for 30-minute PH clearly outperforms the others, the models for longer PHs continue to perform well by forecasting the rapid increases and decreases in ground truth (dashed red line), which may result in hyperglycemia or hypoglycemia, respectively, which is in our best interest.

Figure 5.

Representative examples of the CGM prediction by the proposed CNN-LSTM for different PHs, for six patients from Replace-BG data set.

For completeness, Supplemental Table 1 in Online Appendix provides the range of RMSE (mg/dL) obtained by applying other methodologies proposed in the literature to different data sets collected from real subjects.

Patient-Wise Analysis

To account for the intra-subject variability of BG dynamics, we train and test the proposed method with each patient’s data set separately, that is the model is trained, evaluated, and tested on each patient’s data using an 80/20 split between the train and the test data sets. Table 3 illustrates the results in terms of mean and STD values of MAE (mg/dL), RMSE (mg/dL), and R2 (%) metrics while predicting the future BG level of each patient given the training set data for the same patient. Figure 6 exhibits the distribution of metrics acquired for each patient in the Replace-BG and DIAdvisor data sets. It is noticeable that, although the model performance deteriorates comparing with the population-wise metrics, it still produces encouraging results, particularly for short-term PH. However, as the PH increases, the distribution of the results increases as well, which makes sense given the small data set utilized for patient-wise training.

Table 3.

Patient-Wise Mean ± (Standard Deviation) of Accuracy Metrics (MAE, RMSE, and R2) of Forecasting the BG Level for 30-, 60-, and 90-Minute PH for the Replace-BG and DIAdvisor Data sets.

PH, minutes	Model	MAE, mg/dL	RMSE, mg/dL	R2, %
Replace-BG data set
306090	CNN-LSTM	9.59 ± (2.44)	14.04 ± (4.47)	94.13 ± (2.59)
		23.40 ± (7.37)	27.19 ± (5.59)	76.83 ± (6.51)
		31.18 ± (5.85)	40.85 ± (6.13)	51.67 ± (9.15)
DIAdvisor data set
306090	CNN-LSTM	9.30 ± (1.68)	15.38 ± (2.88)	98.13 ± (2.59)
		18.94 ± (5.43)	29.08 ± (5.40)	77.78 ± (11.06)
		32.52 ± (6.29)	38.85 ± (6.13)	53.67 ± (9.15)

Abbreviations: MAE, mean absolute error; RMSE, root mean square error; R2, coefficient of determination; BG, blood glucose; PH, prediction horizon; CNN, convolutional neural network; LSTM, long short-term memory.

Figure 6.

Patient-wise performance evaluation of the proposed model on the Replace-BG (top panel) and DIAdvisor (bottom panel) data sets for three different PHs over the test data set. Left: MAE, middle: RMSE, and right: R2. In each boxplot, the central mark is the median.

Discussion

Advantages and Limitation of the Proposed CNN-LSTM Model

In this article, we proposed a hybrid CNN-LSTM algorithm, for the prediction of BG concentration in people with T1D. Consisting of an automatic feature extraction component, CNN, and a sequence learner part, LSTM, our proposed CNN-LSTM demonstrated superior performance in extracting hidden features and correlations between various physiological variables, as well as learning their causal effect, to be used for forecasting future BG values.

Nonetheless, as seen in the “Patient-Wise Analysis” subsection, to obtain acceptable performance, CNN-LSTM, like all other DNN-based architectures, needs to be trained on a large enough data set. That is why the model generally performs better on the Replace-BG than the DIAdvisor data set. On the other hand, one of the primary drawbacks of dealing with physiological data sets is the large number of missing data points, which can have a substantial impact on model performance. As noted in the “Data Preprocessing” subsection, we used linear interpolation on the training set to address this issue when gaps in CGM data were shorter than 60 minutes; however, it would be more efficient to have a data set with fewest possible missing data points.

Comparison With Existing Algorithms

This study demonstrated the superiority of our proposed CNN-LSTM model over the ARX, SVR, LSTM, and CRNN models, in terms of both predictive accuracy metrics and clinical acceptability. This higher performance is due to a more sophisticated architecture comprised of stacks of convolutional and LSTM layers, which results in a more robust method for learning complex and hidden features in multivariate data sets as well as learning to predict abrupt changes in the CGM level caused by alteration in other variables, like food or insulin intakes. In addition, as illustrated in Figure 3, our proposed CNN-LSTM model is more capable of capturing rapid and abrupt changes in the CGM trend, owing to its capacity for learning the complex dynamics and correlations between variables in the data set. Sufficient data are required to produce the desired results, however, which accordingly raises the computational cost of the CNN-LSTM model as opposed to the reference models.

Conclusion and Future Work

In this article, we proposed a hybrid deep learning–based model, comprised of convolutional and LSTM layers, and proved its superior performance in predicting future BG levels, for two multivariate in vivo data sets of T1D patients, Replace-BG and DIAdvisor, respectively, over previously published models in the literature.

To account for intersubject variability, we used the FC, and trained the model on different train/test subsets of data set on a rolling basis with ratio of 80/20, to be able to leverage all patients both in train and in test subsets of both data sets. We found that the proposed method worked well for both short-term, 30-minute PH (Replace-BG: MAE 6.60 ± 0.76 mg/dL, RMSE 9.28 ± 1.31 mg/dL, and R2 97.92 ± 2.14%; DIAdvisor: MAE 6.92 ± 0.68 mg/dL, RMSE 9.81 ± 0.91 mg/dL, and R2 97.03 ± 1.24%) and long-term, 60-minute PH (Replace-BG: MAE 11.74 ± 1.66 mg/dL, RMSE 16.51 ± 2.19309 mg/dL, and R2 91.97 ± 3.33%; DIAdvisor: MAE 12.06 ± 1.87 mg/dL, RMSE 18.32 ± 2.76 mg/dL, and R2 91.23 ± 3.32%) and 90-minute PH (Replace-BG: MAE17.30 ± 2.07 mg/dL, RMSE 23.45 ± 3.18 mg/dL, and R2 84.13 ± 3.66%; DIAdvisor: MAE 18.23 ± 2.97 mg/dL, RMSE 25.12 ± 4.65 mg/dL, and R2 82.34 ± 4.54%). The clinical acceptability of the proposed model was further assessed using CG-EGA measures, as shown in Table 2. The suggested model was then trained and evaluated patient-by-patient to assess its robustness to intra-subject variability. Based on the results, it is observable that our suggested technique, like any other DNN-based methodology, is extremely dependent on the quality and size of the data set. In the future, we aim to leverage the transfer learning methodology, that is transferring the knowledge resulted from training on a large data set, to generalize the trained model on any unseen patient. Last, intensity and type of PA and stress level were not taken into account for BG prediction in this work. While we acknowledge the limitations introduced by this, we are currently planning to incorporate these signals into future studies.

Supplemental Material

sj-docx-1-dst-10.1177_19322968221092785 – Supplemental material for Long-term Prediction of Blood Glucose Levels in Type 1 Diabetes Using a CNN-LSTM-Based Deep Neural Network

Supplemental material, sj-docx-1-dst-10.1177_19322968221092785 for Long-term Prediction of Blood Glucose Levels in Type 1 Diabetes Using a CNN-LSTM-Based Deep Neural Network by Mehrad Jaloli and Marzia Cescon in Journal of Diabetes Science and Technology

Footnotes

Abbreviations

CGM, continuous glucose monitoring; CHO, carbohydrate; ANN, Artificial neural network; DNN, deep neural network; ML, machine learning; CNN, convolutional neural network; LSTM, long short-term memory; RNN, recurrent neural networks; MAE, mean absolute error; RMSE, root mean square error; SVR, support vector regression; R2, coefficient of determination; STD, standard deviation; CRNN, convolutional recurrent neural network; CG-EGA, continuous glucose-error grid analysis.

Declaration of Conflicting Interests

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Dr Cescon serves on the advisory board for Diatech Diabetes, Inc. Mehrad Jaloli declares no conflict of interest relevant to this project.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the University of Houston through a start-up grant.

ORCID iD

Mehrad Jaloli

Supplemental Material

Supplemental material for this article is available online.

References

Kirchsteiger

Jørgensen

Renard

del Re

. Prediction Methods for Blood Glucose Concentration. Switzerland: Springer International Publishing. 2016.

Rodríguez-Rodríguez

Chatzigiannakis

Rodríguez

J-V

Maranghi

Gentili

Zamora- Izquierdo

M-Á.

Utility of big data in predicting short-term blood glucose levels in type 1 diabetes mellitus through machine learning techniques. Sensors. 2019;19:4482

Sparacino

Zanderigo

Corazza

Maran

Facchinetti

Cobelli

Glucose concentration can be predicted ahead in time from continuous glucose monitoring sensor time-series. IEEE Trans Biomed Eng. 2007;54:931-937.

Gani

Gribok

Ward

Vigersky

Reifman

Universal glucose models for predicting subcutaneous glucose concentration in humans. IEEE Trans Inf Technol Biomed. 2009;14:157-165.

Eren-Oruklu

Cinar

Quinn

Smith

Estimation of future glucose concentrations with subject-specific recursive linear models. Diabetes Technol Ther. 2009;11:243-253.

Finan

Doyle

III Palerm

, et al. Experimental evaluation of a recursive model identification technique for type 1 diabetes. J Diabetes Sci Technol. 2009;3:1192-1202.

Cescon

Renard

. Adaptive Subspace-based prediction of T1DM glycemia. In: 2011 50th IEEE Conference on Decision and Control and European Control Conference, Orlando, Florida, 12-15 December 2011, IEEE; 5164-5169

Cescon

Ståhl

Landin -Olsson

Johansson

Subspace-based model identification of diabetic blood glucose dynamics. IFAC Proceedings Volumes. 2009;42:233-238.

Cescon

Johansson

Renard

Predicting Glycemia in Type 1 Diabetes Mellitus with Subspace-Based Linear Multistep Predictors. Springer. Cham: Prediction Methods for Blood Glucose Concentration 2016 .

10.

Cescon

Johansson

Renard

Subspace-based linear multi-step predictors in type 1 diabetes mellitus. Biomed Signal Process Contr. 2015;22:99-110.

11.

Jacobs

Young

Tyler

, et al. Using machine learning to predict glucose changes during aerobic, anaerobic and mixed forms of exercise in patients with type 1 diabetes. Diabetes Tech Therapeut. 2020;22:A11–A11.

12.

Alfian

Syafrudin

Anshari

, et al. Blood glucose prediction model for type 1 diabetes based on artificial neural network with time-domain features. Biocybern Biomed Eng. 2020;40:1586-1599.

13.

Zou

Luo

Yin

Tang

Predicting diabetes mellitus with machine learning techniques. Front Genet. 2018;9:515

14.

Jeon

Leimbigler

Baruah

Fossat

Whitehead

AJ.

Predicting glycaemia in type 1 diabetes patients: experiments in feature engineering and data imputation. J Healthc Informatics Research. 2020;4:71-90.

15.

Bunescu

Struble

Marling

Shubrook

Schwartz

. Blood glucose level prediction using physiological models and support vector regression. In 2013 12th International Conference on Machine Learning and Applications (ICMLA’13) Miami, Florida, vol. 1, 4-7 December, 2013:135-140. IEEE.

16.

Plis

Bunescu

Marling

Shubrook

Schwartz

A machine learning approach to predicting blood glucose levels for diabetes management. Workshops at the Twenty-eighth AAAI Conference on Artificial Intelligence, Quebec, Canada, 27-28 July 2014.

17.

Rodríguez-Rodríguez

Rodríguez

J-V

Woo

Wei

Pardo-Quiles

D-J.

A comparison of feature selection and forecasting machine learning algorithms for predicting glycaemia in type 1 diabetes mellitus. Applied Sciences. 2021;11:1742

18.

Zhu

Herrero

Chen

Georgiou

. A deep learning algorithm for personalized blood glucose prediction, 3rd International Workshop on Knowledge Discovery in Healthcare Data co-located with the 27th International Joint Conference on Artificial Intelligence and the 23rd European Conference on Artificial Intelligence (IJCAI-ECAI), Stockholm, Sweden, 13-19 July, 2018: 64-78.

19.

Liu

Zhu

Herrero

Georgiou

GluNet: a deep learning framework for accurate glucose forecasting. IEEE J Biomed Health Inform. 2019;24:414-423.

20.

Mirshekarian

Bunescu

Marling

Schwartz

. Using LSTMs to learn physiological models of blood glucose behavior. 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Jeju Island, Korea, 11-15 July, 2017: 2887-2891. IEEE.

21.

Daniels

Liu

Herrero

Georgiou

Convolutional recurrent neural networks for glucose prediction. IEEE J Biomed Health Inform. 2019;24:603-613.

22.

Mehrad

Jaloli

Marzia

Cescon.

Predicting Blood Glucose Levels Using CNN-LSTM Neural Networks. 2020 Diabetes Technology Meeting Abstracts, Vol. 15. 2nd ed., Los Angeles, CA: Sage; 2020.

23.

Aleppo

Ruedy

Riddlesworth

, et al. REPLACE-BG: a randomized trial comparing continuous glucose monitoring with and without routine blood glucose monitoring in adults with well-controlled type 1 diabetes. Diabetes Care. 2017;40:538-545.

24.

Cescon

Modeling and Prediction in Diabetes Physiology. Department of Automatic Control, Lund University, Lund, Sweden; 2013 Nov.

25.

Bertachi

Viñals

Biagi

, et al. Prediction of nocturnal hypoglycemia in adults with type 1 diabetes under multiple daily injections using continuous glucose monitoring and physical activity monitor. Sensors. 2020;20:1705

26.

Chen

Herrero

Zhu

Georgiou

Dilated recurrent neural network for short-time prediction of glucose concentration, 3rd International Workshop on Knowledge Discovery in Healthcare Data co-located with the 27th International Joint Conference on Artificial Intelligence and the 23rd European Conference on Artificial Intelligence, Stockholm, Sweden, 13-19 July 2018: pp. 69-73

27.

Sola

Sevilla

Importance of input data normalization for the application of neural networks to complex industrial problems. IEEE Trans Nucl Sci. 1997;44:1464-1468.

28.

Bergmeir

Benítez

JM.

On the use of cross-validation for time series predictor evaluation. Information Sciences. 2012;191:192-213.

29.

Abadi

Barham

Chen

, et al. Tensorflow: a system for large-scale machine learning. 12th {USENIX} symposium on operating systems design and implementation (OSDI 16), Savannah, GA, 2-4 November, 2016: 265-283

30.

Kovatchev

Gonder -Frederick

Cox

Clarke

Evaluating the accuracy of continuous glucose-monitoring sensors: continuous glucose–error grid analysis illustrated by TheraSense freestyle navigator data. Diabetes Care. 2004;27:1922-1928.

31.

Oviedo

Vehí

Calm

Liobe

AJ.

A review of personalized blood glucose prediction strategies for t1dm patients. Int J Numer Method Biomed Eng. 2017;6:2833

32.

Georga

Protopappas

Ardigo

, et al. Multivariate prediction of subcutaneous glucose concentration in type 1 diabetes patients based on support vector regression. IEEE J Biomed Health Inform. 2012;17:71-81.

33.

de Bois

el Yacoubi

Ammi

GLYFE: review and benchmark of personalized glucose predictive models in type-1 diabetes. Med Biol Eng Comput. 2021;60:1-17.

34.

Daniels

Herrero

Georgiou

. Personalised glucose prediction via deep multitask networks. Paper presented at the KDH@ ECAI, Santiago de Compostela, Spain, 29-30 August 2020: 110-114

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.

0.00 MB

0.02 MB

Long-Term Prediction of Blood Glucose Levels in Type 1 Diabetes Using a CNN-LSTM-Based Deep Neural Network

Abstract

Background:

Methods:

Results:

Conclusion:

Keywords

Introduction

Experimental Conditions

Data Preprocessing

Train-Test Splitting: Forward Chaining (FC)

Methods

Training Hyperparameters

Comparison With Other Methods

Experimental Results

Population-Wise Analysis

Patient-Wise Analysis

Discussion

Advantages and Limitation of the Proposed CNN-LSTM Model

Comparison With Existing Algorithms

Conclusion and Future Work

Supplemental Material

sj-docx-1-dst-10.1177_19322968221092785 – Supplemental material for Long-term Prediction of Blood Glucose Levels in Type 1 Diabetes Using a CNN-LSTM-Based Deep Neural Network

Footnotes

Abbreviations

Declaration of Conflicting Interests

Funding

ORCID iD

Supplemental Material

References

Supplementary Material