Abstract
Many continuous glucose monitoring (CGM) systems provide functionality that alerts users to potentially unwanted glycemic conditions. These alerts can include glucose threshold alerts to call the user’s attention to hypoglycemia or hyperglycemia, predictive alerts warning about impending hypoglycemia or hyperglycemia, and rate-of-change alerts. A recent review identified 129 articles about CGM performance studies, of which approximately 25% contained alert evaluations. In some studies, real alerts were assessed; however, most of these studies retrospectively determined the timing of CGM alerts because not all CGM systems record alerts, which necessitates manual documentation. In contrast to assessment of real alerts, retrospective determination allows assessment of a variety of alert settings for all three types of glycemic condition alerts. Based on the literature and the Clinical and Laboratory Standards Institute’s POCT05 guideline, two common approaches to threshold alert evaluation were identified, one value-based and one episode-based approach. In this review, a critical discussion of the two approaches, including a post hoc analysis of clinical study data, indicates that the episode-based approach should be preferred over the value-based approach. For predictive alerts, fewer results were found in the literature, and retrospective determination of CGM alert timing is complicated by the prediction algorithms being proprietary information. Rate-of-change alert evaluations were not reported in the identified literature, and POCT05 does not contain recommendations for assessment. A possible approach is discussed including post hoc analysis of clinical study data. To conclude, CGM systems should record alerts, and the episode-based approach to alert evaluation should be preferred.
Introduction
Continuous glucose monitoring (CGM) systems see increasingly widespread use among people with diabetes. As such, their performance and functionality are crucial because they affect the quality of their users’ diabetes therapy. A recent review 1 identified several endpoints by which the performance of CGM systems can be assessed in a clinical study. One of these endpoints is alert reliability, which was addressed in approximately 25% of the studies covered by the review.
Continuous glucose monitoring systems not only display glucose concentrations and trend arrows but also provide some kind of alert functionality. The Clinical and Laboratory Standards Institute’s (CLSI) guideline POCT05 on performance metrics for CGM systems defines alerts as signals “intended to call the user’s attention to the presence of a hazardous or nonhazardous condition.” 2 This can include technical alerts, for example, indicating the end of sensor lifetime or signal loss. However, many systems also provide glucose threshold alerts, alerts for impending hypoglycemia or hyperglycemia (sometimes called “predictive alerts”), or alerts for rapidly changing glucose concentrations. This article will focus on alerts regarding glycemic conditions. The alert functionality of the current-generation CGM systems of the three principal manufacturers Abbott, Dexcom, and Medtronic is summarized in Table 1.
Table 1. Alert Functionality of Three Current-Generation CGM Systems, as Provided in Their Respective Instructions for Use.3-5
The second alert can be delayed by 15 minutes to 4 hours if glucose levels remain above/below threshold setting
The first alert can be delayed by 15 minutes to 4 hours
RoC alerts can be combined with specific glucose levels.
The second alert can be delayed by 5 minutes to 60 minutes if glucose levels remain below threshold setting
The second alert can be delayed by 5 minutes to 3 hours if glucose levels remain above threshold setting
Alert functionality can be beneficial for users as it has been shown to improve glycemic control. 6 Users can be alerted to current or impending hypoglycemia and thus react sooner than without alerts.7,8 Some populations, like people with hypoglycemia unawareness, can especially benefit from these alerts. Parents and legal guardians may experience peace of mind if they can rely on CGM systems alerting them and their children to potentially hazardous glycemic conditions. 9 If, however, alerts occur too often, they can become a nuisance to users, especially if parallel blood glucose measurements do not confirm these alerts. This can lead to “alarm fatigue,” which can for some people be a barrier to CGM use. 10
It is therefore important to assess the reliability of CGM alerts. In routine care, CGM users would typically react to any alerts a CGM system indicates. Therefore, alert evaluation should be part of suitably designed clinical evaluations of CGM performance. In CGM performance studies, high and low comparator concentrations and high rates of change (RoCs) are typically induced through suitable study procedures, which allows the comprehensive assessment of a CGM system’s alert functionality. 11 As alert evaluations are based on comparison of data provided by the CGM and comparator data (the details depend on the specific analysis, as described in more detail below), comparator data have to be sampled with sufficiently high frequency, that is, one comparator value every 15 ± 5 minutes.
The articles identified in the aforementioned literature review 1 indicate that different approaches to alert evaluation are used. This alone may not be an issue, but clear descriptions of the evaluation methods are often missing, and nomenclature is used inconsistently between publications. This article aims at reviewing and summarizing current approaches to alert reliability evaluations incorporating a post hoc analysis of clinical study data, and to provide recommendations on how future evaluations could be performed.
Assessment of Threshold Alerts
Approaches to the Assessment of Threshold Alerts
Threshold alerts are intended to inform the user of unwanted glucose concentrations, that is, when crossing from the euglycemic range into the hypoglycemic or hyperglycemic range. In the literature, threshold alert evaluations are often not described in sufficient detail. There are different approaches for the assessment of CGM readings and comparator values above or below a specific hypoglycemia or hyperglycemia threshold. For the sake of readability in this article, “above or below a specific hypoglycemia or hyperglycemia threshold” will be abbreviated as “in the alert range.” Based on the identified literature, two main approaches were identified: (1) concurrence of comparator values and CGM readings in the alert range within a specific time frame (Figure 1) (called “value-based approach” in this article), and (2) concurrence of episodes of consecutively sampled comparator values or recorded CGM readings in the alert range within a specific time frame (Figure 2) (called “episode-based approach” in this article).

Figure 1. Example of alert evaluation based on individual comparator values (blue) and CGM readings (black) below a hypoglycemia threshold of 70 mg/dL with an allowed time frame of ±15 minutes. A more detailed explanation is provided in the Supplemental Material. CGM: continuous glucose monitoring, comp.: comparator.

Figure 2. Example of alert evaluation based on episodes below a hypoglycemia threshold of 70 mg/dL with an allowed time frame of ±15 minutes showing combinations of (a) confirmed comparator episode (blue) and false CGM episode (black), (b) confirmed comparator episode and true CGM episode, and (c) missed comparator episode and true CGM episode. A more detailed explanation is provided in the Supplemental Material. CGM: continuous glucose monitoring, comp.: comparator, conf.: confirmed.
In total, 34 articles identified in the review1 addressed alert evaluation (Supplemental Table),12-45 and 29 of these reported how alerts were assessed: In 15 of these 29 articles, the alert evaluation was based on individual values,12-26 6 articles assessed episodes,27-32 2 articles assessed both episodes and values,33,34 and 6 seemed to use actually recorded alerts with a value-based approach.35-40 Especially in the latter case, the language was often not sufficiently clear to make a definitive statement. Notably, there were articles in which the term “events” was used, while the reported data or other statements in the article indicated that the analysis was based on individual values. This is why a clearer distinction between individual values and series of values in the alert range, like using the words “value” and “episode” in this article, is needed when addressing alert evaluations.
Typically, both the value-based approach and the episode-based approach incorporate a specific time frame, within which CGM readings and comparator values in the alert range have to coincide. This means that instead of looking at the exact time stamps of CGM readings and comparator values to assess concurrence, time stamps plus or minus the prespecified time frame are used. This time frame does not necessarily have to be the same for hypoglycemia and hyperglycemia alerts, as hypoglycemia typically requires urgent user reaction, whereas hyperglycemia for an hour or more can conceivably be tolerated, especially after meals. The specific time frames could also be adapted depending on the specific threshold and its clinical relevance. Furthermore, it may potentially be asymmetrical, for example, allowing a longer time frame before a hypoglycemic comparator value than after it. In the articles covering alert evaluation, which were identified in a recent review, 1 this time frame tended to be ±15 minutes (12 articles) rather than ±30 minutes (7 articles), if it was reported at all.
Both the value-based approach and the episode-based approach can be used with actual CGM alerts. In some CGM systems, however, alerts are not part of the downloadable data, so that cumbersome manual documentation would be required for analysis of alerts for these systems. Furthermore, most CGM systems only allow setting of one hypoglycemia threshold and one hyperglycemia threshold, apart from a mandatory “urgent low” setting, making the assessment of multiple alert thresholds simultaneously not feasible (Table 1). An alternative is to retrospectively determine CGM alerts based on CGM readings, for example, assuming that a CGM alert is triggered if CGM readings cross the alert threshold into the alert range. In this case, a set of multiple hypoglycemia and hyperglycemia thresholds can be assessed at the same time with the same methodology. The simulated alerts can also incorporate multiple sets of intervals between initial and repeat alerts, reflecting the manufacturer’s instructions for use (Table 1). However, this can increase the complexity of alert evaluations substantially.
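The retrospective determination described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any manufacturer's alert logic: the function name `simulate_low_alerts`, the "at or below threshold" convention, the repeat rule, and the glucose values are all hypothetical.

```python
from typing import List, Optional, Tuple

Reading = Tuple[float, float]  # (time in minutes, glucose in mg/dL)


def simulate_low_alerts(readings: List[Reading],
                        threshold: float = 70.0,
                        repeat_after: Optional[float] = None) -> List[float]:
    """Return the times at which a simulated low-glucose alert would fire.

    Assumption: an alert fires when a reading is at or below `threshold`
    and the previous reading was not. If `repeat_after` (minutes) is given,
    the alert repeats while readings stay in the alert range and at least
    that much time has passed since the last alert.
    """
    alerts: List[float] = []
    previous_inside = False
    last_alert: Optional[float] = None
    for t, glucose in readings:
        inside = glucose <= threshold
        crossed = inside and not previous_inside
        repeat_due = (inside and repeat_after is not None
                      and last_alert is not None
                      and t - last_alert >= repeat_after)
        if crossed or repeat_due:
            alerts.append(t)
            last_alert = t
        previous_inside = inside
    return alerts


# Invented CGM trace with 5-minute sampling.
cgm = [(0, 85), (5, 74), (10, 69), (15, 66), (20, 64), (25, 68), (30, 72), (35, 69)]

print(simulate_low_alerts(cgm))                   # [10, 35]
print(simulate_low_alerts(cgm, repeat_after=15))  # [10, 25, 35]
# Several thresholds can be assessed with the same methodology:
print({th: simulate_low_alerts(cgm, th) for th in (70, 65)})  # {70: [10, 35], 65: [20]}
```

Because the thresholds and repeat intervals are plain parameters, the same trace can be re-evaluated for any combination of settings, which is the main practical advantage of retrospective determination over recorded alerts.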
In the value-based approach, the alert reliability is characterized as the concurrence of individual comparator values in the alert range with individual CGM readings in the alert range (Figure 1). A comparator value in the alert range is confirmed if there is at least one CGM reading also in the alert range within the prespecified time frame before or after the comparator value (at 30 minutes in Figure 1), otherwise it is a missed value (at 15 minutes in Figure 1). A CGM reading in the alert range is a true alert, if at least one comparator value also is in the alert range within the prespecified time frame before or after the CGM reading (at 35-45 minutes in Figure 1), otherwise it is a false alert (at 50 minutes in Figure 1). A more detailed explanation is provided in Supplemental Figure 1.
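The value-based classification can be expressed as a short Python sketch. The data below are invented and only loosely modeled on the description of Figure 1 in the text (they are not taken from the figure); the function names and the "at or below threshold" convention are assumptions.

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (time in minutes, glucose in mg/dL)


def in_alert_range(glucose: float, threshold: float, low: bool = True) -> bool:
    """Low alerts: at or below threshold; high alerts: at or above (assumed)."""
    return glucose <= threshold if low else glucose >= threshold


def has_match(t: float, others: List[Sample], threshold: float,
              frame: float, low: bool) -> bool:
    """True if any value in `others` within +/- frame minutes is in the alert range."""
    return any(abs(t - u) <= frame and in_alert_range(g, threshold, low)
               for u, g in others)


def value_based(comparator: List[Sample], cgm: List[Sample],
                threshold: float = 70.0, frame: float = 15.0,
                low: bool = True) -> Dict[str, List[float]]:
    """Classify each in-range comparator value as confirmed or missed,
    and each in-range CGM reading as a true or false alert."""
    result: Dict[str, List[float]] = {"confirmed": [], "missed": [], "true": [], "false": []}
    for t, g in comparator:
        if in_alert_range(g, threshold, low):
            key = "confirmed" if has_match(t, cgm, threshold, frame, low) else "missed"
            result[key].append(t)
    for t, g in cgm:
        if in_alert_range(g, threshold, low):
            key = "true" if has_match(t, comparator, threshold, frame, low) else "false"
            result[key].append(t)
    return result


comparator = [(0, 82), (15, 68), (30, 66), (45, 74), (60, 80)]
cgm = [(0, 90), (5, 88), (10, 85), (15, 82), (20, 79), (25, 76), (30, 73),
       (35, 69), (40, 67), (45, 66), (50, 68), (55, 72), (60, 78)]

print(value_based(comparator, cgm))
# {'confirmed': [30], 'missed': [15], 'true': [35, 40, 45], 'false': [50]}
```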
The alternative approach to alert reliability assessment is assessing the concurrence of episodes of CGM readings or comparator values in the alert range. In this approach, series of CGM readings in the alert range or series of comparator values in the alert range are assessed as a whole, instead of as individual values (Figure 2). When a CGM episode starts, that is, the first alert should be triggered, the comparator values are assessed for being in the alert range (ie, a starting or ongoing episode) within a prespecified time frame as well. In Figure 2a, the time to the nearest comparator value in range is too long (−25 minutes instead of −15 minutes), so that it is a false CGM episode. In contrast, Figure 2b and c show true CGM episodes, where comparator values in the alert range are within ±15 minutes. For starts of comparator episodes, the CGM readings are assessed accordingly. In Figure 2a and b, there are CGM readings in the alert range within ±15 minutes of the start of the (confirmed) comparator episode, whereas the comparator episode in Figure 2c was missed because the nearest CGM value in the alert range occurred 20 minutes after the start of the episode. A more detailed explanation is provided in Supplemental Figure 2 for comparator episodes and Supplemental Figure 3 for CGM episodes. This assessment could be limited to the concurrence of episode starts, which would categorize the comparator episode in Figure 2a as missed and the CGM episode in Figure 2c as false. However, the CGM system is already penalized for the unacceptably late start of the comparator episode in Figure 2a and the unacceptably late start of the CGM episode in Figure 2c, and requiring the concurrence of episode starts would penalize the CGM system twice.
Figure 2a also highlights the potential benefit of asymmetric time frames in alert evaluations because had the CGM episode triggered an alert, it would have led to a beneficial clinical outcome (earlier hypoglycemia response), although the methodology used for the assessment categorizes the CGM episode as “false.” For the sake of consistency with established analyses, the examples in this article will use a symmetrical time frame. The specific clinical benefit may also depend on whether glucose concentrations returned to normal levels with or without intervention.
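The episode-based check, including the effect of an asymmetric time frame, can be sketched as follows. The data are invented to resemble the situation described for Figure 2a (a CGM episode starting 25 minutes before the first comparator value in the alert range); the function names, the simple episode-start rule, and the 30-minute "after" window are assumptions for illustration only.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (time in minutes, glucose in mg/dL)


def episode_starts(samples: List[Sample], threshold: float = 70.0) -> List[float]:
    """Times at which a series enters the low alert range (<= threshold).

    Simplification: every entry into the alert range starts a new episode.
    """
    starts: List[float] = []
    previous_inside = False
    for t, glucose in samples:
        inside = glucose <= threshold
        if inside and not previous_inside:
            starts.append(t)
        previous_inside = inside
    return starts


def start_is_matched(start: float, other: List[Sample], threshold: float = 70.0,
                     before: float = 15.0, after: float = 15.0) -> bool:
    """True if `other` has a value in the alert range (starting or ongoing
    episode) within [start - before, start + after]."""
    return any(start - before <= t <= start + after and glucose <= threshold
               for t, glucose in other)


comparator = [(0, 90), (15, 80), (30, 74), (45, 69), (60, 66)]
cgm = list(zip(range(0, 65, 5),
               [88, 84, 79, 73, 69, 67, 66, 65, 64, 64, 65, 66, 68]))

cgm_start = episode_starts(cgm)[0]          # 20
comp_start = episode_starts(comparator)[0]  # 45

print(start_is_matched(cgm_start, comparator))              # False: false CGM episode
print(start_is_matched(cgm_start, comparator, after=30.0))  # True with asymmetric frame
print(start_is_matched(comp_start, cgm))                    # True: confirmed comparator episode
```

With the symmetric ±15-minute frame, the early CGM episode is categorized as false; widening only the window after the CGM episode start changes the categorization without affecting how late CGM detection is judged.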
The guideline CLSI POCT05 2 provides recommendations for alert evaluations. However, they are somewhat inconsistent. On one hand, they seem to also emphasize the episodic nature of comparator values. POCT05 mentions alerts set off by the CGM system (ie, actual alerts) and their concurrence with “reference events,” that is, one or multiple consecutive comparator values in the alert range. On the other hand, another definition in the same chapter relates to situations where a “CGM measurement” is in the alert range within a specific time frame of a “reference measurement” (ie, comparator value) being in the alert range, thus implying a value-based analysis.
Comparison of the Value-Based and Episode-Based Approaches
The episode-based approach addresses the questions “If my CGM system triggers a threshold alert (ie, crosses into the alert range), is my blood glucose actually in the alert range?” and “If my blood glucose crosses into the alert range, does my CGM system actually trigger an alert?” In contrast, the value-based approach addresses the questions “If my CGM values are in the alert range, is my blood glucose also in the alert range?” and vice versa. Considering that alerts are intended to call the user’s attention to the presence of a potentially hazardous situation, the episode-based approach should be preferred over the value-based approach, as it explicitly addresses the crossing into the alert range rather than the state of being in the alert range, and optimal diabetes therapy aims at not being in the alert range in the first place.
The value-based and the episode-based approaches led to different results in previous studies assessing hypoglycemia alerts and detection.33,34 This was confirmed by post hoc analysis of data from a recently published study, 46 in which four hypoglycemia and four hyperglycemia thresholds were assessed (Table 2, Figure 3). Depending on the design of the CGM performance study in which alerts are evaluated, this difference can vary. Some studies are designed to induce relatively slow RoCs, often associated with either hypoglycemic or hyperglycemic values lasting for an hour or more, but not both. In these cases, the duration of hypoglycemic episodes may not be representative of real life, as a recent analysis of real-world data indicates that CGM hypoglycemia episodes have a median duration of only 30 minutes. 11 If alert evaluations in these studies are based on individual values rather than episodes, the CGM system might exhibit favorable alert reliability merely because the CGM system reaches low glucose values eventually, after sufficiently long exposure to actual hypoglycemia. For example, if comparator values, measured every 15 minutes, drop ≤70 mg/dL for six consecutive values and the first CGM reading ≤70 mg/dL occurs 31 minutes after hypoglycemia onset (ie, immediately after the third of six comparator values), 66.7% of comparator values (four out of six) would be counted as “confirmed” with a time frame of ±15 minutes, although the hypoglycemia episode was not detected soon enough.
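The arithmetic of this example can be checked with a few lines of Python. The sketch assumes six comparator values at 0 to 75 minutes, all in the alert range, a first CGM reading in the alert range at 31 minutes (one minute after the third comparator value), CGM readings every 5 minutes that stay in the alert range afterwards, and a time frame of ±15 minutes; the variable names are illustrative.

```python
frame = 15  # allowed time frame in minutes

comparator_times = [0, 15, 30, 45, 60, 75]   # six comparator values, all <=70 mg/dL
cgm_times_in_range = list(range(31, 91, 5))  # CGM readings <=70 mg/dL: 31, 36, ..., 86

# Value-based: a comparator value is confirmed if any in-range CGM reading
# lies within +/- frame minutes of it.
confirmed = [t for t in comparator_times
             if any(abs(t - c) <= frame for c in cgm_times_in_range)]
rate = round(100 * len(confirmed) / len(comparator_times), 1)
print(confirmed, rate)  # [30, 45, 60, 75] 66.7

# Episode-based: only the start of the comparator episode is assessed.
episode_start = comparator_times[0]
episode_confirmed = any(abs(episode_start - c) <= frame for c in cgm_times_in_range)
print(episode_confirmed)  # False: the episode is missed
```

The same data thus yield a 66.7% confirmation rate in the value-based analysis but a missed episode in the episode-based analysis.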
Table 2. Example Results for the Same CGM System for the Same Set of Hypoglycemia and Hyperglycemia Threshold Values (in mg/dL), With Analyses Based on Individual Values (Top) or Episodes (Bottom).
CGM: continuous glucose monitoring, comp.: comparator.

Figure 3. Example results (from Table 2) for the same CGM system for the same set of hypoglycemia and hyperglycemia alert thresholds. Numbers at the bottom of the bars indicate numbers of values/readings (“V”/“R,” blue) or episodes (“E,” orange) in the alert range. (a) Assessment of comparator values/episodes, (b) assessment of CGM readings/episodes. CGM: continuous glucose monitoring, Comp.: comparator.
Assessing episodes rather than values also requires the definition of what constitutes an episode. In the past, different definitions were used. For example, the DirecNet study 27 defined a hypoglycemic CGM episode starting with at least two CGM values below 70 mg/dL and ending with the first CGM value above 80 mg/dL. Successive phases of CGM values below 70 mg/dL had to be spaced at least 30 minutes to count as distinct episodes. Multiple comparator values ≤70 mg/dL within 30 minutes of each other were considered a single comparator episode. Mahmoudi and colleagues, 33 on the other hand, defined the start of a hypoglycemic episode with the first comparator value in the alert range, and the end as a period of at least 30 minutes with no comparator value in the alert range.
In the post hoc analysis presented here, a time frame of ±15 minutes was selected. For the episode-based approach, the start of the episode was defined as the first CGM reading or comparator value in the alert range, ending with the first CGM reading or comparator value outside the alert range, if there are at least two consecutive readings/values outside the alert range.
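One possible reading of this episode definition is sketched below: an episode starts with the first value in the alert range and ends at the first value outside it, but only if that value is followed by at least one more out-of-range value. The function name, the return format, the handling of an episode still open at the end of the data, and the glucose values are assumptions.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (time in minutes, glucose in mg/dL)
Episode = Tuple[float, Optional[float]]  # (start time, end time or None if not ended)


def find_episodes(samples: List[Sample], threshold: float = 70.0,
                  low: bool = True, min_out: int = 2) -> List[Episode]:
    """Segment a series into alert-range episodes.

    The end time is the first value outside the alert range, provided it
    begins a run of at least `min_out` consecutive out-of-range values.
    """
    episodes: List[Episode] = []
    start: Optional[float] = None
    first_out: Optional[float] = None
    out_run = 0
    for t, glucose in samples:
        inside = glucose <= threshold if low else glucose >= threshold
        if inside:
            if start is None:
                start = t
            out_run, first_out = 0, None  # a single excursion does not end the episode
        elif start is not None:
            out_run += 1
            if out_run == 1:
                first_out = t
            if out_run >= min_out:
                episodes.append((start, first_out))
                start, first_out, out_run = None, None, 0
    if start is not None:
        episodes.append((start, None))  # episode not confirmed as ended in the data
    return episodes


cgm = [(0, 80), (5, 69), (10, 68), (15, 71), (20, 69),
       (25, 72), (30, 75), (35, 66), (40, 73)]
print(find_episodes(cgm))  # [(5, 25), (35, None)]
```

In this example, the single reading of 71 mg/dL at 15 minutes does not split the first episode, whereas the two consecutive readings above 70 mg/dL at 25 and 30 minutes end it.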
Assessment of Predictive Alerts
Predictive alerts are a feature that is intended to warn users some time before unwanted glucose concentrations occur. The functionality of this feature differs between models of CGM systems (Table 1). For Dexcom G7, the only predictive alert is triggered if CGM readings ≤55 mg/dL are expected within 20 minutes. FreeStyle Libre 3 does not provide predictive alerts. In contrast, Guardian 4 provides great flexibility for predictive alerts. Here, they are tied to the user-specified hypoglycemia and hyperglycemia thresholds, and the prediction horizon can be set to values ranging from 10 to 60 minutes.
If the evaluation of predictive alerts is based on actual alerts, the information provided in the instructions for use is sufficient to assess reliability. However, retrospectively determining predictive CGM alerts can be difficult, as the specific way in which a CGM system triggers predictive alerts may be proprietary information. A strict evaluation of predictive alerts will thus be associated with extensive manual documentation, as some CGM systems do not store these alerts in the downloadable data of the CGM system. In addition, CGM systems may only allow the setting of one hypoglycemic and one hyperglycemic alert threshold (like Guardian 4), and one prediction horizon (if it is not preset, as for Dexcom G7), so that alert evaluation based on actual alerts might only cover one of many combinations available to users. A retrospective evaluation of multiple thresholds and prediction horizons would have to make assumptions about how predictive alerts are triggered, which markedly limits the usefulness of this kind of analysis. If details on the evaluation of predictive alerts are included in publications from manufacturer-funded studies or articles coauthored by manufacturer employees, they could be assumed to be sanctioned by the manufacturer and, thus, sufficiently representative to allow an adequate assessment. For Guardian 4, no publication about predictive alerts could be identified. In case of Dexcom G7, the principal performance publication 26 explicitly describes how the predictive alert was analyzed. That study was funded by Dexcom and the article coauthored by Dexcom employees. Therefore, that article could serve as a precedent for predictive alert evaluation of this device, as it only features one predictive alert, the “urgent low soon” alert (Table 1).
Occurrences of this alert, which indicates a prediction of CGM ≤55 mg/dL within the next 20 minutes, were documented, and comparator values in the following 30 minutes were assessed by whether they were ≤70 or ≤55 mg/dL. Note that the prediction horizon was not identical to the time frame of the assessment, a decision possibly affected by the measurement frequency of 15 ± 5 or 10 ± 5 minutes, depending on the glucose concentration. The contrasting “detection” analysis, that is, rates of occurrence of comparator values ≤55 or ≤70 mg/dL concurrent (or not) with “urgent low soon” alerts, was not reported, so that adequate calculations cannot be derived from the article.
A possible simplified calculation of predictive alerts could, for example, take the current CGM RoC (in mg/dL/min) multiplied by the prediction horizon (here, 20 minutes) to calculate an expected glucose concentration change (in mg/dL), which could then be added to the current CGM value to check whether the result is at or below the specified hypoglycemia threshold of 55 mg/dL. Using a variable time frame, similar to threshold alerts, could reduce the impact of the assumptions made about the prediction algorithm. Using simplifications like this, and analogously for other hypoglycemia and hyperglycemia thresholds, would allow assessment similar to threshold alerts, also either in a value-based or episode-based approach. If a manufacturer performs an alert evaluation study for their own device, they would not have to rely on these simplifications and could rather use their own proprietary algorithm.
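This simplified calculation amounts to a linear extrapolation, predicted glucose = current glucose + RoC × prediction horizon, and can be sketched as follows. It is explicitly not a manufacturer's algorithm; the function names, the two-point RoC estimate, and the example values are assumptions.

```python
from typing import Tuple

Reading = Tuple[float, float]  # (time in minutes, glucose in mg/dL)


def rate_of_change(previous: Reading, current: Reading) -> float:
    """Two-point RoC estimate in mg/dL/min (an assumed simplification)."""
    return (current[1] - previous[1]) / (current[0] - previous[0])


def predicted_glucose(current_glucose: float, roc: float, horizon: float = 20.0) -> float:
    """Linear extrapolation of the current value over `horizon` minutes."""
    return current_glucose + roc * horizon


def predictive_low_alert(current_glucose: float, roc: float,
                         threshold: float = 55.0, horizon: float = 20.0) -> bool:
    """True if the extrapolated value is at or below the threshold."""
    return predicted_glucose(current_glucose, roc, horizon) <= threshold


previous, current = (0, 90), (5, 82)      # invented CGM readings
roc = rate_of_change(previous, current)   # -1.6 mg/dL/min
print(predicted_glucose(current[1], roc))         # about 50 mg/dL
print(predictive_low_alert(current[1], roc))      # True
print(predictive_low_alert(current[1], roc=-1.0)) # False: 82 - 20 = 62 mg/dL
```

The simulated alert times obtained this way can then be compared with comparator values in the same value-based or episode-based manner as threshold alerts.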
Regarding guideline recommendations, POCT05 acknowledges that there is no consensus regarding the assessment of predictive alerts and provides only surface-level examples of predictive alert evaluation without going into details.
Assessment of RoC Alerts
As discussed in another article in this special issue, 47 the topic of CGM RoCs is complex. It is likely that RoC alerts are tied to the RoC information provided in the form of trend arrows. Trend arrows, however, are often not part of the downloadable CGM data, and they cannot easily be determined from the recorded glucose values. Thus, their strict evaluation would require manual documentation.
The CLSI POCT05 contains recommendations on how to assess the RoC in the context of trend accuracy evaluation. These could be used as a simplified substitute for trend arrows. The RoCs calculated from the selected approach then have to be checked against the comparator RoC. As is the case for threshold alerts, relevant comparator RoCs can be either confirmed or missed, based on whether CGM RoCs are sufficiently fast in sufficiently close temporal proximity. Relevant CGM RoCs, on the other hand, can be either true or false, depending on whether they are confirmed by sufficiently fast comparator RoCs within a prespecified time frame. Similar to threshold alerts, this time frame should be prespecified based on clinical considerations. Finally, the analysis can again be value or episode based, with the episode-based approach having similar benefits as for the threshold alert evaluation. Note that this assessment, even with the value-based approach, is different from trend accuracy evaluations because the latter only compare concurrent RoCs, without allowing a specific time frame. While episodes may be shorter than those for threshold alerts, it should be kept in mind that for each 15-minute-spaced comparator RoC, there are three CGM RoCs with the CGM systems mentioned in Table 1. A post hoc analysis of data from a recently published study 46 is presented in Table 3 and Figure 4. As for threshold alerts, a time frame of ±15 minutes was selected. For the episode-based approach, the start of the episode was defined as the first CGM or comparator RoC in the alert range, ending with the first CGM or comparator RoC outside the alert range, if there are at least two consecutive RoCs outside the alert range. The episode-based approach should be preferred in RoC alert evaluation because it reflects the temporal dependence of successive RoCs better than the value-based approach.
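A value-based version of this RoC assessment can be sketched as follows. The two-point RoC calculation, the assignment of each RoC to the later time stamp, the limit of −2 mg/dL/min, and the data are assumptions chosen for illustration; they do not reproduce the POCT05 method or the post hoc analysis.

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (time in minutes, glucose in mg/dL)


def rates_of_change(samples: List[Sample]) -> List[Tuple[float, float]]:
    """RoC in mg/dL/min between consecutive samples, stamped at the later sample."""
    return [(t1, (g1 - g0) / (t1 - t0))
            for (t0, g0), (t1, g1) in zip(samples, samples[1:])]


def fast_falling_times(samples: List[Sample], limit: float = -2.0) -> List[float]:
    """Times at which the RoC is at or below `limit` (rapidly falling glucose)."""
    return [t for t, roc in rates_of_change(samples) if roc <= limit]


def concurrence(times: List[float], other_times: List[float],
                frame: float = 15.0) -> Dict[float, bool]:
    """For each time, whether `other_times` has an entry within +/- frame minutes."""
    return {t: any(abs(t - u) <= frame for u in other_times) for t in times}


comparator = [(0, 180), (15, 175), (30, 140), (45, 105), (60, 100)]
cgm = list(zip(range(0, 65, 5),
               [182, 181, 180, 178, 175, 170, 162, 151, 140, 129, 119, 112, 108]))

comp_fast = fast_falling_times(comparator)  # [30, 45]
cgm_fast = fast_falling_times(cgm)          # [35, 40, 45, 50]

print(concurrence(comp_fast, cgm_fast))  # confirmed (True) or missed (False) comparator RoCs
print(concurrence(cgm_fast, comp_fast))  # true (True) or false (False) CGM RoCs
print(len(rates_of_change(cgm)) // len(rates_of_change(comparator)))  # 3 CGM RoCs per comparator RoC
```

With 5-minute CGM sampling and 15-minute comparator sampling, each comparator RoC is accompanied by three CGM RoCs, as noted above.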
Table 3. Example Results for the Same CGM System for the Same Set of Rate-of-Change (RoC) Threshold Values (in mg/dL/min), With Analyses Based on Individual Values (Top) or Episodes (Bottom).
CGM: continuous glucose monitoring, comp.: comparator.

Figure 4. Example results (from Table 3) for the same CGM system for the same set of rate-of-change (RoC) alert values. Numbers at the bottom of the bars indicate numbers of values/readings (“V”/“R,” blue) or episodes (“E,” orange) in the alert range. (a) Assessment of comparator RoC values/episodes, (b) assessment of CGM RoC readings/episodes. CGM: continuous glucose monitoring, Comp.: comparator.
Conclusion
Many CGM systems provide alert functionality. Its reliability not only directly affects its benefit to diabetes therapy but also indirectly affects the willingness of CGM users to integrate alerts into their therapy. Alert reliability has been addressed in various ways in the literature, but descriptions of the methodology are often incomplete. The CLSI POCT05 covers an approach to evaluate glucose threshold alerts in detail, while there is only a brief statement regarding predictive threshold alerts, and no mention of RoC alerts.
Ideally, alert evaluations are based on actual alerts triggered by the CGM system. Many CGM systems, however, do not record alerts in the device data that can retrospectively be exported. Manual recordkeeping is cumbersome and can substantially add to the complexity of a study. Even if alerts are stored, this typically limits the analysis to one set of hypoglycemia and hyperglycemia threshold alerts, predictive hypoglycemia or hyperglycemia alerts, and RoC alerts. Thus, alerts are often determined retrospectively based on certain assumptions, for example, as being triggered when CGM readings cross into or are in the alert range.
In the literature, evaluation of threshold alerts was often based on individual values. This approach is limited because it does not take into account the chronological order and interdependence of the CGM results, where CGM alerts are triggered on the first crossing into the alert range and, in some cases, after a certain time spent in the alert range, rather than with each individual CGM reading in the alert range. A standardized definition of what constitutes an episode is currently missing and would be helpful. Nevertheless, an episode-based analysis should be preferred over a value-based approach regardless of whether actually triggered alerts or modeled alerts are assessed, as it is more representative of everyday use of CGM systems.
Supplemental Material
Supplemental material, sj-docx-1-dst-10.1177_19322968241236504 for A Critical Discussion of Alert Evaluations in the Context of Continuous Glucose Monitoring System Performance by Stefan Pleus, Manuel Eichenlaub, Delia Waldenmaier and Guido Freckmann in Journal of Diabetes Science and Technology
Footnotes
Acknowledgements
The authors would like to thank Dr Stephanie Wehrstedt and Dr Cornelia Haug for their valuable feedback.
Abbreviations
CGM, continuous glucose monitoring; CLSI, Clinical and Laboratory Standards Institute; RoC, rate of change.
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: G.F. is a general manager and medical director of the Institute for Diabetes Technology (Institut für Diabetes-Technologie, Forschungs- und Entwicklungsgesellschaft mbH an der Universität Ulm, Ulm, Germany), which carries out clinical studies, for example, with medical devices for diabetes therapy on its own initiative and on behalf of various companies. G.F./IfDT has received research support, speakers’ honoraria, or consulting fees in the last 3 years from Abbott, Ascensia, Berlin Chemie, Boydsense, Dexcom, Lilly, Metronom, Medtronic, Menarini, MySugr, Novo Nordisk, PharmaSens, Roche, Sanofi, and Terumo. S.P., M.E., and D.W. are employees of IfDT.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
