A Conditional Generative Adversarial Network for Synthesis of Continuous Glucose Monitoring Signals

Abstract

This report describes how a Conditional Generative Adversarial Network (CGAN) was used to synthesize realistic continuous glucose monitoring (CGM) signals from healthy individuals and individuals with type 1 diabetes over a range of HbA1c levels. The results showed that although the CGAN-generated data did not perfectly reflect real-world CGM, many of the important features were captured and reflected in the synthetic signals. It is briefly discussed how heterogeneous data sources constitute a challenge for the comparison of predictive CGM models. Therefore, 40,000 CGM days were generated by the trained CGAN, equivalent to 940,000 hours of synthetic CGM measurements. These data have been made available in a public database, which can be used as a reference in future studies.

Keywords

generative adversarial networks, CGM, type 1 diabetes, artificial intelligence

Introduction

The introduction of continuous glucose monitoring (CGM) systems has proven to be a paradigm shift in both the management and understanding of diabetes.¹ CGM offers patients with diabetes and clinicians the opportunity to improve the patient’s glycemic control by monitoring glucose levels on a continuous basis.² This makes it possible to identify events of hypoglycemia, hyperglycemia, and glycemic variability that would not be detected with self-measurement of blood glucose (SMBG).³⁻⁵ As continuous blood glucose data have become available, diabetes technologies such as artificial pancreas/closed-loop systems, personalized decision algorithms, and low/high blood glucose alarms have seen a significant increase in research interest and development. Methods for predicting future blood glucose levels and modeling glucose dynamics are central to the development of these diabetes management technologies, and many studies have been published on these topics in recent years.⁶⁻¹⁰ However, differences in reporting and in the data used for assessment make these systems hard to compare.⁶,¹¹ There is a need for publicly available CGM databases that could serve as a common benchmark for these studies.¹¹ This report describes how a Generative Adversarial Network (GAN) was used to synthesize realistic CGM signals from healthy individuals and individuals with type 1 diabetes over a range of HbA1c levels. Furthermore, a large database of synthetic CGM was made available, which can be used as a reference for future CGM modeling studies.

Methods

Modeling Approach

The GAN is one of the most promising developments in deep learning and can be used to produce synthetic data that resemble real input data. GANs were first introduced by Goodfellow et al. in 2014.¹² The publication addresses the problem of unsupervised learning by training 2 deep (convolutional) neural networks, called the generator and the discriminator. The networks compete with each other in a zero-sum game in which the generator is trained to generate new data and the discriminator tries to classify its input as either real or generated. Training continues until the discriminator is unable to distinguish between real and simulated data, which indicates that the generator is producing a plausible signal.
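
For reference, the zero-sum game described above is usually written as the minimax objective of the original GAN formulation.¹² The symbols below are not used elsewhere in this report and are introduced only for illustration: $G$ is the generator, $D$ the discriminator, $x$ a real sample, and $z$ a random noise vector fed to the generator.

$$\min_{G} \max_{D} V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right]$$

In the idealized case, training ends near the point where $D$ outputs approximately $1/2$ for every input, which corresponds to a discriminator that can no longer tell real from generated data.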

Conditional GANs (CGANs) are an extension of regular GANs that use data labels to generate data belonging to specific categories. In this study, we trained a CGAN on 24-hour CGM profiles from healthy individuals and individuals with type 1 diabetes. HbA1c levels, categorized into 4 labels, were used as the condition: below 6.5% (healthy without diabetes), 6.5% to <7%, 7% to <8%, and above 8%. The architecture of the CGAN is illustrated in Figure 1.
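
The labeling of HbA1c levels into 4 conditions can be sketched in Python as follows. This is a minimal illustration and not the code used in the study: the function names `hba1c_label` and `one_hot` are hypothetical, a value of exactly 8% is assumed to fall in the highest class (as in Table 1), and one-hot encoding is only one common way of passing a condition to a CGAN.

```python
from bisect import bisect_right

# Lower edges (in % HbA1c) of classes 1..3; class 0 is everything below 6.5%.
HBA1C_EDGES = [6.5, 7.0, 8.0]
HBA1C_CLASSES = ["healthy (<6.5%)", "6.5% to <7%", "7% to <8%", ">=8%"]


def hba1c_label(hba1c_percent):
    """Map an HbA1c value in percent to one of the 4 condition labels (0..3)."""
    # Assumption: a value equal to an edge belongs to the higher class.
    return bisect_right(HBA1C_EDGES, hba1c_percent)


def one_hot(label, n_classes=4):
    """Encode a label as a one-hot vector (hypothetical conditioning format)."""
    return [1.0 if i == label else 0.0 for i in range(n_classes)]
```

For example, `HBA1C_CLASSES[hba1c_label(7.4)]` gives the "7% to <8%" class, and `one_hot(hba1c_label(7.4))` gives the corresponding condition vector.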

Figure 1.

The architecture of the CGAN model.

Data Sources

The cohort for this study was combined from 2 studies, one on type 1 diabetes and one on healthy individuals: the T1D Exchange Severe Hypoglycemia in Older Adults with Type 1 Diabetes Study and a multicenter study on continuous glucose monitoring profiles in healthy participants without diabetes.¹³,¹⁴ A total of 786 full days of measurements were available from the 153 healthy individuals, and a total of 1191 full days were available from the 200 individuals with type 1 diabetes. Among the participants with diabetes, 374 days were available from individuals with an HbA1c level of 6.5% to <7%, 421 days from individuals with an HbA1c level of 7% to <8%, and 228 days from individuals with an HbA1c level above 8%. A single CGAN was trained on all available CGM data from the 2 studies.

Synthetic CGM Data Generation

Based on the trained CGAN, we generated 40,000 CGM days, equivalent to 940,000 hours of synthetic CGM values, which we made available as a public database. The CGAN was modeled and trained using MATLAB R2020b (The MathWorks Inc., Natick, Massachusetts). The data are available in the Mendeley Data repository.¹⁵

Statistical Comparison

To evaluate the similarity between the CGAN-generated CGM measurements and the real CGM profiles, we calculated several statistical measures often used to characterize glycemic control from CGM. The 24-h (whole-day) and night means were used to assess the general glucose level, which is highly correlated with HbA1c.¹⁶ Time above 180 mg/dL and time below 70 mg/dL were calculated to assess the duration of hyperglycemia and hypoglycemia, respectively. The whole-day standard deviation (SD) and continuous overlapping net glycemic action (CONGA) were used to assess glycemic variability in the datasets.
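
The per-day measures listed above could be computed along the following lines. This is a hedged sketch, not the authors' implementation: the function name `daily_metrics` is hypothetical, a 5-minute sampling interval is assumed, and the night window of 00:00 to 06:00 is an assumption, since the report does not define it.

```python
import numpy as np

SAMPLES_PER_HOUR = 12  # assumption: one CGM reading every 5 minutes
NIGHT_HOURS = (0, 6)   # assumption: night = 00:00 to 06:00 (not defined in the report)


def daily_metrics(glucose_mg_dl):
    """Summarize one CGM day (1-D sequence starting at midnight, in mg/dL)."""
    g = np.asarray(glucose_mg_dl, dtype=float)
    start, stop = (h * SAMPLES_PER_HOUR for h in NIGHT_HOURS)
    return {
        "mean_24h": float(g.mean()),
        "mean_night": float(g[start:stop].mean()),
        "sd_24h": float(g.std(ddof=1)),  # sample SD of the readings within the day
        "time_above_180_pct": float(100.0 * np.mean(g > 180)),
        "time_below_70_pct": float(100.0 * np.mean(g < 70)),
    }
```

Applying `daily_metrics` to every real and every synthetic day, grouped by HbA1c label, would yield the kind of per-group distributions that are summarized in Table 1.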

CONGA¹⁷ is calculated from the differences between glucose values separated by a fixed time interval. Here, $G_{t}$ is the glucose value at time $t$, $n$ is the time interval between the paired observations (1 hour in our case), and $k$ is the total number of discrete time steps.

$$\mathrm{CONGA}_{n} = \sqrt{\frac{\sum_{t = t_{1}}^{t_{k}} (G_{t} - G_{t + n})^{2}}{k - 1}}$$
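
A minimal sketch of the CONGA formula as printed above is given here for illustration. The function name `conga` is hypothetical, $k$ is interpreted as the number of paired observations, and the default lag of 12 samples corresponds to $n$ = 1 hour only under an assumed 5-minute sampling interval.

```python
import numpy as np


def conga(glucose_mg_dl, lag_samples=12):
    """CONGA_n as printed above: sqrt(sum((G_t - G_{t+n})^2) / (k - 1)).

    lag_samples is n expressed in samples; 12 corresponds to 1 hour only
    under the assumed 5-minute sampling interval.
    """
    g = np.asarray(glucose_mg_dl, dtype=float)
    diffs = g[:-lag_samples] - g[lag_samples:]  # G_t - G_{t+n} for every valid t
    k = diffs.size                              # number of paired observations
    if k < 2:
        raise ValueError("need at least two paired observations")
    return float(np.sqrt(np.sum(diffs ** 2) / (k - 1)))
```

A flat profile gives a CONGA of 0, and larger hour-to-hour swings increase the value, which is why the measure is used here as an indicator of glycemic variability.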

Results and Comments

A total of 40,000 days of synthetic CGM measurements were successfully generated using the trained CGAN. Examples of CGAN-generated CGM data are illustrated in Figure 2. The blood glucose characteristics of the synthetic and real CGM datasets are presented in Table 1. The 24-h means for each of the 4 HbA1c groups were comparable between the two datasets. The SD of the 24-h mean was lower in the synthetic data than in the real CGM dataset (seen as narrower interquartile ranges in Table 1), indicating that not all of the 24-h variability was modeled by the CGAN. Time spent in hyperglycemia and hypoglycemia within each HbA1c group was also similar between the synthetic and real datasets. Both whole-day SD and CONGA were higher for each group in the synthetic dataset, but the association between HbA1c level and increased glycemic variability was well captured by the model.

Figure 2.

Examples of 4 synthetically generated CGM signals, one for each of the classes: healthy, diabetes with A1C <7%, diabetes with A1C 7-8%, and diabetes with A1C >8%.

Table 1.

Comparison of Characteristics Between Real CGM and Synthetic CGM Presented as Average (SD) or 50th (25th;75th) Percentile.

	Real CGM	Synthetic CGM
N, days	1,959	40,000
Mean whole day, mg/dL
Healthy (A1c <6.5%)	98.2 (92.6;104.3)	98.8 (95.1;102.4)
Diabetes (A1c <7%)	146.1 (123.8;169.5)	149.4 (134.4;166.3)
Diabetes (A1c ≥7%; A1c <8%)	165.7 (140.2;191.9)	164 (146.6;182.4)
Diabetes (A1c ≥8%)	192.9 (159.4;225.3)	196.1 (174;219.3)
Mean night, mg/dL
Healthy (A1c <6.5%)	95.8 (89.2;104.1)	96.4 (88.8;104.6)
Diabetes (A1c <7%)	130.2 (91.2;175.9)	139.3 (101.1;182.4)
Diabetes (A1c ≥7%; A1c <8%)	150 (109.3;192.6)	145.4 (104.6;192.2)
Diabetes (A1c ≥8%)	176.6 (124.1;223.1)	177.2 (128;233.1)
SD whole day, mg/dL
Healthy (A1c <6.5%)	14 (11.7;17.1)	15.8 (13;19.5)
Diabetes (A1c <7%)	51.6 (41.7;68.8)	55.8 (44.2;70.8)
Diabetes (A1c ≥7%; A1c <8%)	56.6 (44.9;73.1)	62.4 (49.6;78.5)
Diabetes (A1c ≥8%)	61.6 (49.2;78.3)	73.7 (59;92)
Time above 180 mg/dL, %
Healthy (A1c <6.5%)	0 (0;0)	0 (0;0)
Diabetes (A1c <7%)	26.4 (12.2;42.4)	27.8 (17.4;39.2)
Diabetes (A1c ≥7%; A1c <8%)	35.8 (19.9;51.7)	37.2 (26;49)
Diabetes (A1c ≥8%)	53.8 (35.2;72.4)	53.8 (42;66)
Time below 70 mg/dL, %
Healthy (A1c <6.5%)	0 (0;2.3)	0 (0;2.1)
Diabetes (A1c <7%)	6.8 (0;16.3)	4.2 (0;13.5)
Diabetes (A1c ≥7%; A1c <8%)	2.8 (0;10.8)	3.2 (0;13.2)
Diabetes (A1c ≥8%)	0 (0;6.8)	1.4 (0;10.1)
CONGA, mg/dL
Healthy (A1c <6.5%)	16.7 (13.2;20.7)	16.9 (14.4;20.2)
Diabetes (A1c <7%)	36.6 (28.4;45.9)	38.2 (32.6;44.8)
Diabetes (A1c ≥7%; A1c <8%)	37 (29.4;46.3)	40.7 (35.1;47.2)
Diabetes (A1c ≥8%)	38.1 (31.6;48.1)	44.1 (38.5;50.5)
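
As an illustration of how an entry in the "50th (25th;75th) percentile" format of Table 1 can be produced from per-day values, a short sketch follows. The function name `median_iqr` is hypothetical, and NumPy's default (linear interpolation) percentile method is an assumption, since the report does not state which method was used.

```python
import numpy as np


def median_iqr(values):
    """Format values as '50th (25th;75th)' percentiles, rounded to one decimal."""
    q25, q50, q75 = np.percentile(np.asarray(values, dtype=float), [25, 50, 75])
    # ':g' drops a trailing '.0', matching entries such as '0 (0;0)' in Table 1.
    return f"{round(float(q50), 1):g} ({round(float(q25), 1):g};{round(float(q75), 1):g})"
```

The function would be applied separately to each characteristic within each HbA1c group, once for the real days and once for the synthetic days.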

In summary, generally increased mean glucose levels, increased glycemic excursions, increased time spent in hyperglycemia, and decreased time spent in hypoglycemia are known to be related to increased HbA1c in patients with diabetes. These characteristics were preserved in the synthetic CGM dataset.

Discussion

In this report, we propose a method to synthesize CGM signals using a supervised machine learning approach, the CGAN. The method was originally proposed as a way to generate synthetic images resembling real images. This study shows how it can be used to efficiently generate synthetic CGM signals that resemble real-world data. Although the CGAN-generated data do not perfectly reflect real-world measurements, many of the important features of the individuals modeled are captured and present in the synthetic signals. The synthesized data are a readily accessible source of CGM measurements that can be used as a common external comparator for predictive models, which are often developed on heterogeneous data sources. Predictive models based on CGM measurements could be a major advancement in technologies for the treatment of diabetes and have several scientific and clinical implications. The predictive performance of these models is often assessed by splitting the initial dataset into a training set and a test set, with the test set then treated as an independent sample. This approach has been shown to be inefficient and risks an incorrect assessment of actual performance.¹⁸ Authors should consider using truly independent data for a more accurate assessment of potential overfitting and of external validity. There is a need for an available database that can be used as a common benchmark in developing and assessing the performance of predictive models based on CGM. Furthermore, synthetic data could be used for the clinical training of doctors and nurses working with patients with diabetes. One potential use case could be a decision support system in which the synthetic CGM is used as a reference for people being actively treated.
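
To illustrate what using a common external dataset as a comparator could look like in practice, the sketch below scores a trivial "last value carried forward" forecast on a set of CGM days. It is only an assumed example: the function name `persistence_rmse`, the array layout (one row per day), the 5-minute sampling interval, and the 30-minute horizon are all hypothetical, and any predictive model could take the place of the persistence forecast.

```python
import numpy as np


def persistence_rmse(cgm_days, horizon_samples=6):
    """RMSE (mg/dL) of a 'last value carried forward' forecast on external CGM days.

    cgm_days: 2-D array with one row per day. horizon_samples=6 corresponds to
    30 minutes only under an assumed 5-minute sampling interval.
    """
    days = np.asarray(cgm_days, dtype=float)
    predicted = days[:, :-horizon_samples]  # forecast = current reading
    observed = days[:, horizon_samples:]    # reading one horizon later
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))
```

Reporting such a score on the same publicly available days would give different studies a shared reference point, independent of the data each model was developed on.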

While the most important characteristics are maintained in the proposed model, this type of model could be developed further by incorporating additional patient information, for example insulin dosages, meals, and physical activity, to more accurately resemble specific patient characteristics and behaviors. This could have applications in many areas beyond predictive research. For example, in the pharmaceutical industry, where large and expensive cohort studies are often needed to validate findings, realistic CGAN-generated data could be used as part of in silico testing to reduce the burden on patients and the need for expensive in vivo testing.

Footnotes

Acknowledgements

N/A

Abbreviations

CGAN, Conditional Generative Adversarial Network; CGM, continuous glucose monitoring systems; CONGA, continuous overlapping net glycemic action; SD, standard deviation; SMBG, self-measurement of blood glucose.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Disclaimer

The source of the data is the T1D Exchange, but the analysis, content and conclusions presented herein are solely the responsibility of the authors and have not been reviewed or approved by the T1D Exchange.

ORCID iDs

Simon Lebech Cichosz

Alexander Arndt Pasgaard Xylander

References

1. Aleppo, Ruedy, Riddlesworth, et al. REPLACE-BG: a randomized trial comparing continuous glucose monitoring with and without routine blood glucose monitoring in adults with well-controlled type 1 diabetes. Diabetes Care. 2017;40(4):538-545. doi:10.2337/dc16-2482

2. Ehrhardt, Chellappa, Walker, Fonda, Vigersky RA. The effect of real-time continuous glucose monitoring on glycemic control in patients with type 2 diabetes mellitus. J Diabetes Sci Technol. 2011;5(3):668-675. doi:10.1177/193229681100500320

3. Fleischer, Cichosz, Hansen TK. Association of glycemic variability in type 1 diabetes with progression of microvascular outcomes in the diabetes control and complications trial. Diabetes Care 2017;40:777-783. Diabetes Care. 2017;40(11):e164. doi:10.2337/dc17-1339

4. Fleischer, Laugesen, Cichosz, et al. Continuous glucose monitoring adds information beyond HbA1c in well-controlled diabetes patients with early cardiovascular autonomic neuropathy. J Diabetes Complications. 2017;31(9):1389-1393. doi:10.1016/j.jdiacomp.2017.06.013

5. Desalvo, Buckingham. Continuous glucose monitoring: current use and future directions. Curr Diab Rep. 2013;13(5):657-662. doi:10.1007/s11892-013-0398-4

6. Woldaregay, Årsand, Walderhaug, et al. Data-driven modeling and prediction of blood glucose dynamics: machine learning applications in type 1 diabetes. Artif Intell Med. 2019;98:109-134. doi:10.1016/j.artmed.2019.07.007

7. Cichosz, Frystyk, Hejlesen, Tarnow, Fleischer. A novel algorithm for prediction and detection of hypoglycemia based on continuous glucose monitoring and heart rate variability in patients with type 1 diabetes. J Diabetes Sci Technol. 2014;8(4):731-737. doi:10.1177/1932296814528838

8. Cichosz, Frystyk, Tarnow, Fleischer. Combining information of autonomic modulation and CGM measurements enables prediction and improves detection of spontaneous hypoglycemic events. J Diabetes Sci Technol. 2015;9(1):132-137. doi:10.1177/1932296814549830

9. Liu, Vehí, Avari, et al. Long-term glucose forecasting using a physiological model and deconvolution of the continuous glucose monitoring signal. Sensors. 2019;19(19):4338. doi:10.3390/s19194338

10. Amar, Shilo, Oron, Amar, Phillip, Segal. Clinically accurate prediction of glucose levels in patients with type 1 diabetes. Diabetes Technol Ther. 2020;22(8):562-569. doi:10.1089/dia.2019.0435

11. Cichosz, Johansen, Hejlesen. Toward big data analytics. J Diabetes Sci Technol. 2016;10(1):27-34. doi:10.1177/1932296815611680

12. Goodfellow, Pouget-Abadie, Mirza, et al. Generative adversarial nets. Accessed November 12, 2020. http://www.github.com/goodfeli/adversarial

13. Weinstock, DuBose, Bergenstal, et al. Risk factors associated with severe hypoglycemia in older adults with type 1 diabetes. Diabetes Care. 2016;39(4):603-610. doi:10.2337/dc15-1426

14. Shah, DuBose, et al. Continuous glucose monitoring profiles in healthy nondiabetic participants: a multicenter prospective study. J Clin Endocrinol Metab. 2019;104(10):4356-4364. doi:10.1210/jc.2018-02763

15. Cichosz, Xylander. Synthetic continuous glucose monitoring (CGM) signals. Mendeley Data. 2021. doi:10.17632/chd8hx65r4.1

16. Rohlfing, Wiedmeyer H-M, Little, England, Tennill, Goldstein DE. Defining the relationship between plasma glucose and HbA(1c): analysis of glucose profiles and HbA(1c) in the diabetes control and complications trial. Diabetes Care. 2002;25(2):275-278.

17. McDonnell, Donath, Vidmar, Werther, Cameron FJ. A novel approach to continuous glucose analysis utilizing glycemic variation. Diabetes Technol Ther. 2005;7(2):253-263. doi:10.1089/dia.2005.7.253

18. Steyerberg EW. Validation in prediction research: the waste by data splitting. J Clin Epidemiol. 2018;103:131-133. doi:10.1016/j.jclinepi.2018.07.010