Abstract
Background:
Nocturnal hypoglycemia poses significant risks to individuals with insulin-treated diabetes, impacting health and quality of life. Although continuous glucose monitoring (CGM) systems reduce these risks, their poor accuracy at low glucose levels, high cost, and availability limit their use. This study examined physiological biomarkers associated with nocturnal hypoglycemia and evaluated the use of machine learning (ML) to detect hypoglycemia during nighttime sleep using data from consumer-grade smartwatches.
Methods:
This study analyzed 351 nights from 36 adults with insulin-treated diabetes. Participants wore two smartwatches alongside CGM systems. Linear mixed-effects models compared sleep and vital signs between nights with and without hypoglycemia during early and late sleep. An ML model was trained to detect hypoglycemia using smartwatch data alone.
Results:
Sixty-six nights with spontaneous hypoglycemia were recorded. Hypoglycemic nights showed increased wake periods, heart rate, stress levels, and activity during early sleep, with weaker effects during late sleep. In nights when hypoglycemia occurred during early sleep, the ML model performed comparably to or better than models in prior studies, with an area under the receiver operator curve of 0.78 for level 1 and 0.83 for level 2 hypoglycemia, sensitivity of 0.78 and 0.89, specificity of 0.64 for both, negative predictive value of 0.94 and 0.99, and positive predictive value of 0.25 and 0.13 for level 1 and level 2 hypoglycemia, respectively.
Conclusions:
Consumer-grade smartwatches demonstrate promise for detecting nocturnal hypoglycemia, particularly during early sleep. Refining models to reduce false alarms could enhance their clinical utility as low-cost, accessible tools to complement CGM.
Introduction
Hypoglycemia remains a critical challenge for individuals with insulin-treated diabetes mellitus. It is characterized by reduced blood glucose concentrations, which mainly result from an excess of exogenous insulin. Hypoglycemia may lead to adverse symptoms, including anxiety, impaired cognitive function, sleep disturbance, and diminished productivity, and has been associated with increased cardiovascular risk. 1 Nocturnal hypoglycemia, typically defined as hypoglycemia occurring during nighttime sleep, poses unique challenges in terms of its recognition, prevention, and treatment.2,3 Sleep affects the physiological defense mechanisms against hypoglycemia, with counterregulatory responses attenuated during sleep in both healthy individuals and those with type 1 diabetes mellitus (T1DM),4,5 and the onset threshold for neuroendocrine counterregulation shifted to lower glucose levels during early nocturnal sleep. 6 Furthermore, hypoglycemia is often asymptomatic,7-10 which may lead to prolonged, unrecognized episodes and hypoglycemia unawareness, 11 thereby increasing the risk of severe hypoglycemia. 12 An estimated 43% of all hypoglycemic episodes and 55% of severe cases occur during sleep. 13
Continuous glucose monitoring (CGM) systems have been shown to reduce the risk and number of hypoglycemic events.14,15 However, CGMs have limitations including reduced accuracy at low glucose levels,16-18 particularly within the first 12 hours of sensor placement, susceptibility to temporal shifts, and compression artifacts.19,20 The latter is problematic during sleep, as false hypoglycemia alarms disrupt sleep. Furthermore, the Hypo-METRICS study recently highlighted that many reported hypoglycemic events occur above sensor thresholds, underscoring the disconnection between sensor data and personal experiences. 21 These limitations of CGM-detected hypoglycemia may lead to confusion and frustration for users in managing their diabetes, 22 while also increasing the risk of missed or delayed detection of hypoglycemia. Despite these limitations, CGM remains the state-of-the-art method for nocturnal hypoglycemia detection. However, CGM availability is limited, mainly due to socioeconomic disparities or insurance barriers.23,24 In addition, challenges such as limited sensor placement options, insufficient subcutaneous tissue, and adherence issues are prevalent in young children. 19
Therefore, alternative approaches that use physiological changes induced by hypoglycemia are being explored. For example, combining heart rate variability (HRV) patterns from electrocardiography (ECG) signals with CGM data has improved hypoglycemia detection accuracy compared with CGM alone. 25 In a study by Olde Bekkink et al, 12 wearable ECG devices identified an increased heart rate (HR) and changes in HRV during hypoglycemia. Nocturnal hypoglycemia also affects cardiac regulation, leading to prolonged QT intervals and reduced low-frequency HRV components.26,27 Using ECG to monitor HR and QT intervals overnight in children with T1DM, Ling et al 28 reported a sensitivity of 78% and specificity of 60% in detecting nocturnal hypoglycemia. However, for imbalanced datasets with relatively few hypoglycemic episodes, sensitivity and specificity alone can be challenging to interpret because they do not fully reflect the performance trade-offs in such contexts, particularly the potential impact of false positives. The primary limitations of ECG monitoring are its short usage duration and limited usability. Smartwatches, which are becoming increasingly popular, offer a more accessible and cost-effective solution with long battery lifetimes while still providing sufficient accuracy for monitoring physiological biomarkers. Lehmann et al 29 proposed a machine learning (ML) approach using wrist-worn wearable devices, combining cardiac, motion, and electrodermal activity (EDA, a measure of skin conductance used as an indicator of physiological arousal) 30 data to detect hypoglycemic events, achieving an area under the receiver operator curve (AUROC) of 0.76 ± 0.07 under real-life conditions. Another study 31 demonstrated the potential of combined smartwatch data and EDA for detecting hypoglycemia while driving, with AUROCs of 0.66 ± 0.12, 0.80 ± 0.2, and 0.72 ± 0.2 for mild (54-63 mg/dL), pronounced (36-45 mg/dL), and overall hypoglycemia, respectively.
However, the accuracy of the standalone smartwatch-based detection of nocturnal hypoglycemia remains unclear.
This study aimed to investigate potential biomarkers of nocturnal hypoglycemia derived from consumer-grade smartwatches and evaluate the accuracy of an ML approach for hypoglycemia detection during nighttime sleep solely based on smartwatch data, offering a scalable, cost-effective, and noninvasive approach to complement current methods for managing hypoglycemia.
Methods
Study design
Underlying data were collected during a prospective observational study approved by the Ethics Committee of the Canton of Bern (2022-01525) and registered at ClinicalTrials.gov (NCT05609175). This study complied with the principles of the Declaration of Helsinki. Participants were recruited during outpatient visits in our clinic or by distributing study flyers through various channels, including relevant organizations and events, and online posts. All participants provided written informed consent by signature. Thirty-seven adults living with diabetes on insulin therapy and with a glycated hemoglobin value less than or equal to 10.0% were included in this study. The main exclusion criteria were known cardiac arrhythmia; use of antiarrhythmic drugs; use of a pacemaker or implantable cardioverter defibrillator; drug or alcohol abuse; and any illness or use of medication that could affect sleep patterns, including sleep apnea. Details are provided in the Supplemental Material (Table S1).
Data were collected at the participants’ homes under free-living conditions over a study duration of 10 to 15 nights. The participants received two smartwatches, a Fitbit Sense 2 (Fitbit Inc., San Francisco, California) and a Garmin Venu 2 (Garmin Ltd., Schaffhausen, Switzerland), and were instructed to wear them during nighttime sleep. Each participant was equipped with an unblinded CGM system (Dexcom G6 or G7). Due to the observational nature of this study, participants’ CGM alerts were retained according to their usual practice. Daily fasting capillary blood glucose measurements were performed to calibrate the CGM system.
Data preprocessing
First, all the devices were time-synchronized to align the physiological and glucose data. Gaps in the continuous glucose monitoring readings longer than 15 minutes were marked as “Not a Number,” while shorter gaps were filled by linear interpolation at 1-minute intervals. Moreover, data from nine nights were omitted because of either patient-reported malfunction of the CGM system or insufficient readings, with more than half of the night lacking CGM coverage.
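The gap handling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the function name `resample_cgm`, the integer-minute timestamps, and the choice to measure a gap as the time between two consecutive readings are all hypothetical.

```python
import math

MAX_GAP_MIN = 15  # gaps longer than this stay missing, per the text above


def resample_cgm(readings, max_gap_min=MAX_GAP_MIN):
    """Resample sparse CGM readings onto a 1-minute grid.

    readings: list of (minute, glucose_mg_dl) tuples sorted by minute.
    Returns a dict mapping minute -> glucose, with NaN inside long gaps.
    """
    series = {}
    for (t0, g0), (t1, g1) in zip(readings, readings[1:]):
        series[t0] = float(g0)
        gap = t1 - t0
        for t in range(t0 + 1, t1):
            if gap <= max_gap_min:
                # Linear interpolation between the two neighboring readings.
                series[t] = g0 + (t - t0) / gap * (g1 - g0)
            else:
                # Assumed convention: interior minutes of a long gap are NaN.
                series[t] = math.nan
    if readings:
        last_t, last_g = readings[-1]
        series[last_t] = float(last_g)
    return series
```

For example, readings at minutes 0, 5, and 25 yield interpolated values for minutes 1 to 4 and NaN for minutes 6 to 24, because the second gap spans 20 minutes.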
Furthermore, one participant was excluded from the analysis because of insufficient Fitbit data (no sleep was detected during the first few hours of sleep). Sleep phases and the associated start and end times were defined using data from the Fitbit smartwatch, as proposed by Martine-Edith et al. 32 Only data during sleep were retained, excluding nights with less than 3.5 hours of sleep. Fitbit’s proprietary algorithms estimate sleep stages, including light, deep, and rapid eye movement (REM), using a combination of accelerometer data and HRV metrics. 33 Intervals with missing sleep stage information were excluded from the analysis.
Labfront (Boston, Massachusetts) was used to extract physiological data from the Garmin smartwatch. These data included stress levels (an indication of overall stress estimated using HR and HRV),34,35 HR, beat-to-beat intervals (BBIs), and activity (quantified as the accumulated accelerometer magnitude over time). In addition, HRV parameters were derived from BBIs using a feature-generation toolkit. 36 A detailed list of smartwatch parameters used in the analyses is provided in the Supplemental Material (Table S2).
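As an illustration of how HRV parameters can be derived from BBIs, the sketch below computes two common time-domain metrics, SDNN and RMSSD, plus the mean heart rate implied by the intervals. Whether the toolkit cited above computes exactly these quantities is an assumption; the authoritative list is in Table S2. The function names are hypothetical.

```python
import math


def sdnn(bbi_ms):
    """Sample standard deviation of beat-to-beat intervals, in ms."""
    n = len(bbi_ms)
    mean = sum(bbi_ms) / n
    return math.sqrt(sum((x - mean) ** 2 for x in bbi_ms) / (n - 1))


def rmssd(bbi_ms):
    """Root mean square of successive interval differences, in ms."""
    diffs = [b - a for a, b in zip(bbi_ms, bbi_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def mean_hr(bbi_ms):
    """Mean heart rate in beats per minute implied by the intervals."""
    return 60000.0 / (sum(bbi_ms) / len(bbi_ms))
```

Both HRV functions need at least two intervals. In practice they would be applied to a sliding window of BBIs so that each 1-minute sample receives its own feature values.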
Finally, data were resampled into 1-minute intervals. In accordance with international consensus guidelines,37,38 each sample was labeled as “Level 1 hypoglycemia” when CGM values were <70 mg/dL for at least 15 minutes, “Level 2 hypoglycemia” when CGM values were <54 mg/dL, and “non-hypoglycemia” otherwise.
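The labeling rule can be sketched as below. This is one literal reading of the sentence above, not the study's code: the 15-minute duration requirement is applied to the <70 mg/dL runs only, any sample <54 mg/dL is labeled level 2, and missing samples receive a separate "missing" label. The function name and label strings are hypothetical.

```python
import math

L1_MG_DL = 70          # level 1 threshold
L2_MG_DL = 54          # level 2 threshold
MIN_DURATION_MIN = 15  # minimum run length for level 1


def label_minutes(glucose):
    """Label a list of 1-minute CGM values (mg/dL, NaN = missing)."""
    labels = ["missing" if math.isnan(g) else "non-hypoglycemia" for g in glucose]
    run_start = None
    # The trailing sentinel closes a run that reaches the end of the night.
    for i, g in enumerate(list(glucose) + [math.inf]):
        below = g < L1_MG_DL  # False for NaN, so missing data ends a run
        if below and run_start is None:
            run_start = i
        elif not below and run_start is not None:
            if i - run_start >= MIN_DURATION_MIN:
                for j in range(run_start, i):
                    labels[j] = "level 1"
            run_start = None
    # Assumed: level 2 is assigned per sample, without a duration check.
    for i, g in enumerate(glucose):
        if g < L2_MG_DL:
            labels[i] = "level 2"
    return labels
```

A stricter variant would also require the level 2 samples to sit inside a qualifying run of at least 15 minutes; the source sentence does not settle which reading was used.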
Statistical analysis
Descriptive statistics were used to summarize participant and sleep characteristics, with means and standard deviations calculated for continuous variables and frequencies and percentages for categorical variables.
In the primary analysis, a linear mixed-effects model (LMM) was employed to examine the average time spent in different sleep stages (wake, light, deep, and REM). This model included nights with and without hypoglycemia as fixed effects. In a subsequent analysis, another LMM was used to model the physiological parameters of stress levels, HR, HRV, and activity. This model included nights without hypoglycemia, nights with hypoglycemia in periods of non-hypoglycemia, and nights with hypoglycemia in periods of hypoglycemia as fixed effects. In both models, participants were treated as random effects, with nights nested within them. Sleep architecture, along with physiological processes, including brain activity, HR, and muscle tone, differs between early and late sleep. While deep sleep predominates during the first half of the night, REM sleep becomes more dominant as night progresses. Furthermore, analyses revealed a significant interaction between hypoglycemia and late sleep when assessing their effects on both sleep stage and physiological parameters (Supplemental Material Tables S3-S10). Therefore, both analyses were stratified into early sleep (from sleep onset to 3.5 hours) and late sleep (from 3.5 hours after sleep onset to the end of sleep).
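The early/late split and the per-stage percentages that serve as the outcome of the first model can be sketched as follows. This is a minimal illustration rather than the study's code: the function name `stage_percentages`, the stage label strings, and the assumption that the input is a gap-free per-minute list starting at sleep onset are all hypothetical.

```python
from collections import Counter

EARLY_SLEEP_MIN = int(3.5 * 60)  # 210 minutes after sleep onset


def stage_percentages(stages):
    """Percentage of time per sleep stage, split into early and late sleep.

    stages: per-minute stage labels, index 0 being the minute of sleep onset.
    """
    phases = {
        "early": stages[:EARLY_SLEEP_MIN],
        "late": stages[EARLY_SLEEP_MIN:],
    }
    result = {}
    for phase, minutes in phases.items():
        counts = Counter(minutes)
        total = len(minutes)
        result[phase] = (
            {stage: 100.0 * n / total for stage, n in counts.items()}
            if total
            else {}
        )
    return result
```

Each night then contributes one percentage per stage and phase, which is the kind of night-level value a mixed-effects model with a participant random effect would take as its response.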
Classification model for hypoglycemia detection
A Light Gradient Boosting Machine (LightGBM) classifier, 39 which is an ensemble algorithm based on decision trees, was trained to detect hypoglycemia. During training, each decision tree partitioned the smartwatch data based on self-learned thresholds for numerical features and optimal splits for categorical features (eg, sleep stages) to effectively distinguish between the 1-minute “non-hypoglycemia” and “hypoglycemia” samples. LightGBM works by sequentially building multiple decision trees, where each tree corrects the error of the previous tree and contributes to the final class probability prediction. LightGBM was selected as the final model because it performed best when random forest, logistic regression, and LightGBM were compared on a subset of the data.
Prior to model training, all numerical features were scaled using the RobustScaler class from the scikit-learn library 40 with the default configuration. To prevent overfitting, the number of estimators (decision trees) was limited to 10, maximum tree depth was set to 10, and maximum number of tree leaves was capped at 20. The minimum number of samples required to form a leaf was set to 40 and the remaining parameters were maintained at their default settings. A summary of the modified parameters is provided in the Supplemental Material (Table S11). After model training, the AUROC was used to evaluate the overall performance of the model across all classification thresholds. Subsequently, Youden’s J statistic was employed to identify an optimal threshold that balanced the trade-off between sensitivity and specificity. 41 This threshold was used to calculate the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and false positive rate (FPR). The FPR was calculated by dividing false positives by the total non-hypoglycemic samples, reflecting the proportion of non-hypoglycemic minutes falsely flagged.
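The threshold selection and the derived metrics can be sketched in a few lines. This is a minimal illustration with hypothetical function names; it assumes binary labels (1 = hypoglycemia), a positive prediction whenever the score is at or above the threshold, and candidate thresholds taken from the observed scores. On which data split the study selected the threshold is not stated above and is not assumed here.

```python
import math


def confusion(y_true, scores, threshold):
    """Count TP, FP, TN, FN when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted = s >= threshold
        if predicted and y:
            tp += 1
        elif predicted:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def _ratio(num, den):
    # NaN marks a metric that is undefined for this confusion matrix.
    return num / den if den else math.nan


def metrics(tp, fp, tn, fn):
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "fpr": _ratio(fp, fp + tn),  # equals 1 - specificity
    }


def youden_threshold(y_true, scores):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1."""
    best_t, best_j = None, -math.inf
    for t in sorted(set(scores)):
        m = metrics(*confusion(y_true, scores, t))
        j = m["sensitivity"] + m["specificity"] - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j
```

Because Youden's J weights sensitivity and specificity equally and ignores prevalence, a threshold chosen this way can still produce a low PPV when hypoglycemic minutes are rare.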
To assess the performance of the model, a double (or nested) cross-validation (CV) strategy 42 was used among participants who experienced at least one night with hypoglycemia. In this approach, the classifier was trained on data from all participants except for one participant, who was held out for testing. This approach provided an unbiased performance estimate regardless of the sample size, 43 effectively simulating the model’s performance on previously unseen data. This procedure was repeated such that each participant served once as the test set. Bootstrapping was used to calculate the means and confidence intervals (CIs) of performance measures across all test participants. Figure 1 illustrates the training and evaluation approaches of the model. While all recorded nights from the training population were used for model training, the model was evaluated on test participants’ nights stratified by the occurrence of hypoglycemia during early sleep (more than two-thirds of hypoglycemia samples occurred within early sleep) or late sleep (more than two-thirds of hypoglycemia samples occurred within late sleep) and separately for level 1 and level 2 hypoglycemia.
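The participant-wise hold-out loop and the bootstrap summary can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, and the percentile bootstrap over per-participant scores is one common choice, since the text above does not specify which bootstrap variant was used.

```python
import random


def leave_one_participant_out(participant_ids):
    """Yield (train_ids, test_id) so that each participant is held out once."""
    unique = sorted(set(participant_ids))
    for test_id in unique:
        yield [p for p in unique if p != test_id], test_id


def bootstrap_mean_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Mean and percentile bootstrap CI of per-participant scores."""
    rng = random.Random(seed)
    n = len(values)
    # Resample participants with replacement and record each resample's mean.
    means = sorted(
        sum(rng.choice(values) for _ in range(n)) / n for _ in range(n_boot)
    )
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return sum(values) / n, (lower, upper)
```

Splitting by participant rather than by night or by minute keeps all samples of the test participant out of training, which matters here because 1-minute samples from the same person are strongly correlated.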

Model training and evaluation approach.
Results
Baseline characteristics
The primary analysis included data from 36 participants, encompassing 351 nights. Thirty-three participants slept alone in a bed, while three shared a bed with their partners. Of the participants, 25 experienced at least one night with level 1 hypoglycemia, and 19 experienced at least one night with level 2 hypoglycemia. Participant demographics are summarized in Table 1, and the Supplemental Material (Tables S12 and S13) provides an overview of participant sleep recordings and hypoglycemic events.
Participant Demographics.
Displayed values denote means ± standard deviation or absolute and relative frequencies unless otherwise specified.
MDI, multiple daily injection.
CSII, continuous subcutaneous insulin infusion.
HCL, hybrid closed-loop.
TDD, total daily insulin dose.
A Gold and/or Clarke score ≥4 indicates impaired awareness of hypoglycemia.
At least one severe hypoglycemia event within 12 months prior to study entry.
RIS, Regensburg Insomnia Scale. A RIS score ≥13 indicates clinically relevant insomnia symptoms, whereas a score ≥25 suggests severe insomnia symptoms.
Sleep and physiological differences during nocturnal hypoglycemia
As shown in Table 2, during early sleep, the percentage of wake periods was significantly higher during nights with hypoglycemia than during nights without hypoglycemia (10.1% vs 8.5%, P = .039). In contrast, during late sleep, the percentage of wake periods on nights with hypoglycemia was significantly lower (5.4% vs 6.9%, P = .034). No significant differences were observed in light, deep, and REM sleep, except for a higher percentage of late REM sleep (26.7% vs 23.6%, P = .046).
Percentage of Time Spent Awake and in the Different Sleep Stages (Light, Deep, and REM) for Early and Late Sleep for Nights With Hypoglycemic Episodes (n = 66) and Nights Without Hypoglycemia (n = 285).
Values are presented as the overall mean percentage across participants with corresponding 95% CI.
Abbreviation: REM, rapid eye movement.
P values <.05 are marked with *.
As presented in Figure 2, on hypoglycemic nights (dashed bars only), stress levels, HR, LF/HF ratio (ratio between low- and high-frequency bands of HRV), and activity were significantly increased during hypoglycemic episodes in early sleep, while in late sleep, only stress levels, HR, and LF/HF ratio were significantly elevated during hypoglycemic episodes. Furthermore, compared with nights without hypoglycemia, hypoglycemic episodes were associated with significantly higher activity during early sleep and higher stress levels and HR during late sleep. Associations with other HRV parameters are provided in the Supplemental Material (Figure S1).

Physiological parameters during early and late sleep for nights without hypoglycemia (n = 285), and nights with hypoglycemia (n = 66).
Model performance for nocturnal hypoglycemia detection
Data from 249 nights of 25 participants who had experienced at least one hypoglycemic episode were included in this sub-analysis. The LightGBM model showed variable performance depending on the hypoglycemia level and sleep phase, as summarized in Table 3.
Performance of the Nocturnal Hypoglycemia Detection Model for Nights With Level 1 Hypoglycemia Occurring During Early Sleep (n = 40) and Late Sleep (n = 19), and for Nights With Level 2 Hypoglycemia Occurring During Early Sleep (n = 26) and Late Sleep (n = 8).
Values are presented as the overall mean percentage across nights with corresponding 95% CI.
AUROC, area under the receiver operator curve.
PPV, positive predictive value.
NPV, negative predictive value.
FPR, false positive rate.
For level 1 hypoglycemia, the model performed best during early sleep, achieving an AUROC of 0.78 (95% CI: 0.73-0.82). The sensitivity was 78%, specificity was 64%, and the NPV was 94%, demonstrating the model’s reliability in ruling out hypoglycemia. However, the PPV remained low at 25%, and the FPR was 36%, reflecting the presence of false positives. During late sleep, performance declined considerably, with an AUROC of 0.34 (95% CI: 0.25-0.44), sensitivity of 15%, and specificity of 61%, leading to an even lower PPV of 10%.
For level 2 hypoglycemia, model performance improved overall but exhibited similar trends across sleep phases. During early sleep, the model achieved its strongest performance, with an AUROC of 0.83 (95% CI: 0.77-0.88), a sensitivity of 89%, specificity of 64%, and an exceptionally high NPV of 99%. However, the PPV remained low (13%), whereas the FPR was 36%. During late sleep, the AUROC dropped to 0.35 (95% CI: 0.16-0.55), with sensitivity declining to 16% and specificity increasing slightly to 72%. The PPV decreased to 4%, and the FPR was 28%.
The corresponding average glucose profiles from the CGM data for early and late sleep hypoglycemia are shown in Figure 3, illustrating differences in glucose trends across sleep phases. A detailed depiction is provided in the Supplemental Material (Figure S2). Figure 4 presents minute-by-minute predictions for two representative test nights, alongside CGM glucose values and physiological parameters. The contribution of individual physiological parameters to the model’s predictions is detailed in the Supplemental Material (Table S14 and Figure S3).

Average glucose profiles for nights with hypoglycemia.

Nocturnal hypoglycemia detection for two sample test nights.
Discussion
Our smartwatch-based ML approach has demonstrated potential for detecting nocturnal hypoglycemia, particularly during early sleep. Hypoglycemic nights showed significant increases in stress levels, HR, LF/HF ratio, and activity during early sleep, aligning with sympathetic nervous system responses to hypoglycemia.8,12,25,44 These physiological changes supported strong model performance during early sleep, with AUROC of 0.78 and 0.83, sensitivity of 78% and 89%, specificity of 64%, and NPV of 94% and 99% for Level 1 and Level 2 hypoglycemia, respectively, achieving metrics comparable to or exceeding those reported in prior studies.28,29,31 Most hypoglycemic events occurred during early sleep (40 vs 19 nights), further strengthening the model’s utility in this phase.
In contrast, late sleep detection was less effective due to attenuated physiological responses, lower hypoglycemia prevalence, and reduced counterregulatory responses.4,6 Sensitivity dropped to 15% and 16%, specificity was 61% and 72%, and the AUROC decreased to 0.34 and 0.35 for Level 1 and Level 2 hypoglycemia, respectively. Larger datasets will be needed to improve detection performance during late sleep, where the physiological signal-to-noise ratio is lower.
The moderate FPR (28%-39%) reflects a common trade-off in hypoglycemia detection models: prioritizing sensitivity to minimize missed events increases the likelihood of false alarms. While high sensitivity ensures patient safety by reliably ruling out hypoglycemia, false positives can disrupt sleep and reduce quality of life. The dataset’s imbalance, common in studies of this nature where non-hypoglycemic episodes dominate, poses a challenge for ML models. The low prevalence of true hypoglycemia amplifies the impact of false positives. Another challenge arises from recurring hypoglycemia during the same night. For example, a participant may recover from hypoglycemia only to re-enter a hypoglycemic state later. Implementing exclusion periods following confirmed hypoglycemic events could reduce redundant wake-up alarms without compromising safety. Similarly, incorporating time-dependent smoothing strategies, such as majority voting or sustained prediction thresholds, could improve usability. Despite these strategies, frequent exposure to alarms, especially false alarms, can lead to alarm fatigue, 45 reducing a person’s likelihood of responding to hypoglycemia alerts during sleep. In this study, hypoglycemia went unnoticed in 12% of hypoglycemic nights where participants remained asleep despite CGM alarms. Whether smartwatch-based detection could enhance alarm effectiveness in such cases remains an open question and should be investigated in future studies.
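The two post-processing ideas mentioned above, a sustained prediction threshold and an exclusion period, can be sketched together. This is a hypothetical illustration, not something evaluated in the study: the function name, the default durations of 5 and 30 minutes, and the choice to start the exclusion period at the alarm itself (rather than at a confirmed hypoglycemic event) are all assumptions.

```python
def raise_alarms(probabilities, threshold, sustain_min=5, exclusion_min=30):
    """Turn per-minute hypoglycemia probabilities into alarm minutes.

    An alarm fires once the probability has stayed at or above `threshold`
    for `sustain_min` consecutive minutes. After an alarm, further alarms
    are suppressed for `exclusion_min` minutes.
    """
    alarms = []
    streak = 0        # consecutive minutes at or above the threshold
    next_allowed = 0  # first minute at which a new alarm may fire
    for minute, p in enumerate(probabilities):
        streak = streak + 1 if p >= threshold else 0
        if streak >= sustain_min and minute >= next_allowed:
            alarms.append(minute)
            next_allowed = minute + exclusion_min
            streak = 0
    return alarms
```

With these defaults, a two-minute spike raises no alarm, and a 50-minute episode raises one alarm after five minutes and a second one 30 minutes later if the predictions stay elevated. Longer sustain and exclusion windows reduce false alarms at the cost of delayed or missed detection, so both values would need to be tuned against clinical requirements.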
Inaccuracies in detecting the onset and end of sleep by the Fitbit may have further contributed to misclassifications, as physiological parameters during wakefulness differ markedly from those during sleep. Fitbit was used for sleep detection and Garmin for physiological data collection, leveraging the strengths of each device. However, as smartwatch technology advances, improvements in sensor accuracy and algorithms may reduce these differences, making it increasingly feasible to use a single device for both functions without compromising detection performance. The observational nature of the study introduces additional challenges. Glucose alarms may have influenced both sleep patterns and physiological responses, while the use of nonblinded devices could have subtly altered participant behavior. However, because data were collected exclusively during sleep, these biases are expected to be minimal. Furthermore, the heterogeneity in demographic and clinical characteristics may have introduced variability in detection performance. Specifically, older individuals and individuals with longer diabetes duration may have attenuated autonomic responses, which could reduce detection accuracy. In addition, differences in insulin therapy regimens may lead to variations in glycemic patterns that influence detection accuracy. However, this reflects real-world conditions, as physiological responses to hypoglycemia naturally vary across individuals.
Finally, this study relied on CGM as the ground truth for hypoglycemia detection. However, CGMs are prone to errors during sleep, including sensor compression, which may lead to false low readings (“compression lows”). Future work could explore additional algorithms to differentiate true hypoglycemia from such errors, thereby improving the robustness of the detection models.
Comparing accuracy across studies is inherently challenging due to the heterogeneity of hypoglycemic episodes and the physiological differences between daytime and nighttime hypoglycemia. Unlike prior studies, our approach focused on spontaneous nocturnal hypoglycemia in real-world settings. While studies such as those by Ling et al 28 and Lehmann et al 29 reported comparable performance using ECG-capable devices or EDA sensors, our reliance on consumer-grade devices ensures scalability and accessibility. Similarly, Maritsch et al 31 achieved a similar performance during induced hypoglycemia in laboratory settings, which elicit more uniform physiological responses. By capturing spontaneous events, our study reflects the variability inherent to real-world hypoglycemia and highlights its associated challenges.
Conclusions
Consumer-grade smartwatches have demonstrated strong potential for detecting nocturnal hypoglycemia, particularly during early sleep, where physiological changes were most pronounced.
Future work should focus on collecting larger datasets to improve late-sleep detection. Refining detection models through time-dependent strategies, such as majority voting, exclusion periods, and advanced algorithms like recurrent neural networks or other temporal ML methods, will better account for time-series patterns and help mitigate false positives. In addition, exploring factors such as hormonal activity, mental and physical status, duration of diabetes, nocturnal wake periods (eg, bathroom visits or other disturbances), impaired awareness of hypoglycemia, and neuropathy could further enhance detection accuracy and clinical applicability.
While challenges remain in addressing false alarms and improving late sleep detection, this study underscores the feasibility of smartwatches as scalable and cost-effective complements to CGM systems in real-world settings and provides an additional layer of physiological monitoring.
Supplemental Material
Supplemental material, sj-docx-1-dst-10.1177_19322968251319800 for Toward Detection of Nocturnal Hypoglycemia in People With Diabetes Using Consumer-Grade Smartwatches and a Machine Learning Approach by Camilo Mendez, Ceren Asli Kaykayoglu, Thiemo Bähler, Juri Künzler, Aritz Lizoain, Martina Rothenbühler, Markus H. Schmidt, Markus Laimer and Lilian Witthauer in Journal of Diabetes Science and Technology
Footnotes
Acknowledgements
We thank the study participants and clinical study personnel for their invaluable contributions and support.
Abbreviations
AUROC, area under the receiver operator curve; BBIs, beat-to-beat intervals; CI, confidence interval; CGM, continuous glucose monitoring; ECG, electrocardiography; EDA, electrodermal activity; FPR, false positive rate; HR, heart rate; HRV, heart rate variability; LF/HF, ratio between low- and high-frequency bands of heart rate variability; LMM, linear mixed-effects model; LightGBM, Light Gradient Boosting Machine; ML, machine learning; NPV, negative predictive value; PPV, positive predictive value; REM, rapid eye movement; T1DM, type 1 diabetes mellitus.
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: MR and AL are employees of the Diabetes Center Berne. The other authors have no conflicts of interest to declare.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the Diabetes Center Berne, Vontobel-Stiftung under application no. 0847/2022, and internal funding from the University of Bern.
Supplemental Material
Supplemental material for this article is available online.
References
