Introduction
Systems for continuous glucose monitoring (CGM) and automated insulin delivery (AID) are now widely used by patients with diabetes (PwD), with impressive improvements in glucose outcomes and a growing number of users year by year. The accuracy of CGM systems, which are a critical component of AID systems, has improved considerably over the last 20 years. However, even with the most recent generations of such systems, it is not clear whether accuracy is adequate in all glucose ranges, especially the low glucose range (<70 mg/dL). Comparison of the accuracy of different CGM systems is hampered by the fact that the study procedures for clinical evaluation of CGM performance are not yet standardized. In this regard, a working group of the International Federation of Clinical Chemistry and Laboratory Medicine (IFCC) has recently recommended such a procedure.1,2
Manufacturers of CGM systems bring new generations of their devices onto the market at relatively short intervals. These generations differ from each other in several aspects, such as easier handling, smaller housings, and longer usage time; usually, the quality of glucose measurement is also improved. At least this is what the marketing announcements for the new products state, supported by lower mean absolute relative difference (MARD) numbers from the clinical studies that the manufacturer performed for regulatory approval using manufacturer-specific procedures. The analytical performance of (new) CGM systems is evaluated in clinical studies that are required for market approval by the regulatory agencies in the United States and Europe (and other parts of the world). During the clinical development process, the manufacturer evaluates the performance of its CGM systems by comparing the measurement results with those obtained in parallel with conventional glucose analyzers in capillary or venous blood samples. The study procedures differ between manufacturers and are most probably optimized for the device and algorithm tested.1 The differences in the measurement results are given as MARD values. An MARD value <10% is regarded as sufficient for reliable diabetes therapy and for use in AID systems; however, the MARD is a parameter with several limitations, in particular its dependency on the study protocol.3-5
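To make the metric concrete, the following minimal Python sketch computes a MARD from paired readings. It assumes the commonly used definition, the mean of |CGM − reference| / reference expressed in percent. The function name `mard` and the sample values are hypothetical and serve only as an illustration; they are not taken from any study.

```python
from statistics import mean


def mard(cgm_values, reference_values):
    """Mean absolute relative difference (percent) for paired readings."""
    if len(cgm_values) != len(reference_values):
        raise ValueError("need exactly one reference value per CGM value")
    if not cgm_values:
        raise ValueError("need at least one pair of readings")
    # Relative deviation of each CGM reading from its paired reference value
    rel_diffs = [abs(c - r) / r for c, r in zip(cgm_values, reference_values)]
    return 100.0 * mean(rel_diffs)


# Hypothetical paired readings in mg/dL (illustrative values, not study data)
cgm = [110.0, 90.0, 200.0, 60.0]
ref = [100.0, 100.0, 200.0, 50.0]
mard_percent = mard(cgm, ref)  # relative differences: 10%, 10%, 0%, 20%
```

Because every pair carries the same weight, the result depends on which glucose levels and rates of change are sampled, which is one reason why MARD values from studies with different protocols are difficult to compare.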
In the past, reports were published in which the performance of a new product (ie, a new generation of a CGM system) was compared with the previous generation from the same manufacturer, or in which the same sensor was tested in combination with different algorithms; most often, however, no such “internal” comparisons are published. Reports in which the performance of a product from one manufacturer is compared with a product from another manufacturer are published more frequently, but the number of such publications is still limited (Table 1). Such “external” comparisons are of high interest for PwD, healthcare professionals, and also payers, especially as CGM/AID-derived parameters like time in range appear in therapy guidelines. Ideally, such evaluations are performed with the devices attached to the same PwD, to reduce the impact of factors that influence the observed performance. In addition, the study design should not be optimized to yield a good MARD for a specific device; it should be oriented toward patient needs and challenge the CGM systems over the whole measurement range, including higher rates of change, as requested by the IFCC working group, so that the results provide reliable information for daily use.2 Another very important topic is the choice of the comparison measurement method and evaluation procedure.6 In such studies, the same “generation” of systems should be tested.
Table 1. Studies Reporting Results From Head-to-Head Comparisons of Different CGM Systems Published in the Last Five Years With Otherwise Healthy PwD.
CGM: continuous glucose monitoring; PwD: patients with diabetes.
Studies conducted in the hospital or with special patient groups, such as pregnant women, dialysis patients, etc. were excluded.
Such so-called “head-to-head” studies provide helpful insights into the performance of a given CGM system and allow a reasonable interpretation of the results, such as “Is the analytical performance of the tested CGM systems truly comparable/different?” In addition, if the study is well designed, finer distinctions can be made, such as “Is the analytical performance better across the whole range of glucose values/all daily life situations/over duration of usage/patient groups?” “Head-to-head” studies can thus mitigate the main weakness of MARD values and other outcome metrics, namely their limited comparability, and allow a more objective judgment of CGM system accuracy.
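As an illustration of such a finer distinction, the sketch below stratifies the MARD of two systems by glucose range, using the same reference measurements for both. The threshold of 70 mg/dL follows the low glucose range mentioned in the Introduction; the upper threshold of 180 mg/dL, the labels `system_a` and `system_b`, and all readings are assumptions made for this example and do not come from any study.

```python
from collections import defaultdict


def glucose_range(reference_mg_dl, low=70.0, high=180.0):
    """Classify a reference value; the thresholds are assumptions."""
    if reference_mg_dl < low:
        return "low"
    if reference_mg_dl > high:
        return "high"
    return "target"


def mard_by_range(pairs):
    """pairs: iterable of (cgm, reference) in mg/dL -> {range: MARD in %}."""
    buckets = defaultdict(list)
    for cgm, ref in pairs:
        # Stratify by the reference value, not by the CGM reading
        buckets[glucose_range(ref)].append(abs(cgm - ref) / ref)
    return {name: 100.0 * sum(v) / len(v) for name, v in buckets.items()}


# Hypothetical data: both systems worn by the same subject, same references
reference = [50.0, 60.0, 100.0, 200.0]
readings = {
    "system_a": [60.0, 66.0, 110.0, 190.0],
    "system_b": [55.0, 63.0, 105.0, 210.0],
}
results = {
    name: mard_by_range(zip(values, reference))
    for name, values in readings.items()
}
```

In this made-up example the two systems differ mainly in the low range, a difference that a single overall MARD would partly hide.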
Practical Aspects of Performing Head-to-Head Studies
Clinical studies in which the glucose sensors of two, three, or more CGM systems are attached to the abdomen, arm, or thigh of volunteers are, in principle, conducted in the same way as a “regular” CGM/AID study. However, the study protocol needs to be adjusted to account for the differences in sensor lifetime between CGM systems. As the accuracy of a given CGM system might change over time, it is important to schedule frequent sampling periods with dynamic glucose excursions at the same point in sensor lifetime for each system to ensure objective comparability. Wearing several CGM systems at the same time places an additional burden on the study subjects (most often PwD), and there is a limit to how many CGM systems can be attached at the same time. This issue was more relevant in the past, though: nowadays most CGM systems are approved for several application sites rather than the abdomen only, and the size of the sensors has decreased significantly over the years (Figure 1).

Figure 1. Example of six CGM sensors attached to one subject during a clinical study conducted in 2013.
A clear advantage of this approach is that all systems measure identical glucose levels and changes in a given subject. Glucose sensors located at different body sites (arm, abdomen, and thigh) are known to yield somewhat different measurement results (differences between the left and right arm have also been reported); nevertheless, the application sites labelled by the manufacturer should be adhered to in studies. What will always remain is a certain sensor-to-sensor variability that cannot be excluded and has technical rather than physiological causes.
The term “head-to-head studies” sometimes also refers to studies during which a group of volunteers uses a given CGM system for some time (weeks to months) and subsequently switches to a different system. Alternatively, different devices can be randomized to multiple groups of volunteers. However, in a strict sense, such studies are not head-to-head studies as the measurements are not performed on the same subject at the same point in time.
“Head-to-head” studies can also be performed under outpatient conditions, reflecting more real-world experience. However, obtaining parallel blood glucose measurements of sufficient quality and quantity can be difficult under such conditions.
Publications of the Results of Head-to-Head Studies Performed in the Last Five Years
Over the past five years, a total of 16 studies compared the performance of more than one CGM device in otherwise healthy PwD (studies conducted with pregnant women or in the hospital, for example, were excluded) (Table 1). Until recently, however, there were no publications on the most recent generations of CGM devices, eg, FreeStyle Libre 3 and Dexcom G7.
One might ask “Why do we not see the publication of such studies more regularly with new CGM/AID systems?” The products are freely available on the market; therefore, in principle, anybody who is interested and experienced can perform such studies. However, conducting clinical studies is associated with considerable costs. Manufacturers of CGM/AID systems are reluctant to sponsor such studies because they carry the risk that the outcome might not be favorable for their product, ie, the manufacturer of a competitor product might benefit more from the study results. If the manufacturer of a given CGM/AID system does not support such a study, who is willing to sponsor or support it?
One option for support would be payers. Health insurers pay a lot of money every year for CGM/AID systems (with massive increases year by year), so why do they not invest in a thorough investigation of system performance before paying for them? They rely on the regulatory approval process, while knowing that their needs and those of their customers (effectiveness, ease of use) differ from those that are the focus of regulatory agencies (safety). One wonders why such studies are not required by regulatory agencies. In principle, such studies should be performed by independent research institutions to avoid bias; however, this would also require independent funding for such studies.
Looking at the funding sources of the published “head-to-head” studies comparing devices from different manufacturers conducted during the last five years, one study each was funded by Abbott and Dexcom. Unsurprisingly, the results of both studies favored the product of the study sponsor. It can be assumed that more studies were performed but that their results were not published because they were not favorable for the funder, which emphasizes the need for independent funding.
“Head-to-Head” Comparison of Current CGM Systems
In this issue of JDST, a study is published that evaluates the point accuracy of two current CGM systems (Dexcom G7 vs FreeStyle Libre 3).23 The results of this multicenter, single-arm, prospective, nonsignificant risk evaluation with 55 PwD with type 1 diabetes (T1D) or type 2 diabetes (T2D) showed a lower MARD with the Libre 3 CGM system than with the G7 system (8.9% vs 13.6%). The authors conclude that the Abbott CGM system is more accurate than the Dexcom system in all metrics evaluated. They also call for additional head-to-head accuracy studies of competing CGM systems, using standardized metrics and methodologies. This study was sponsored by Abbott.
“Head-to-Head Study” With AID Systems
Looking at AID systems, all issues raised and discussed above remain, but there is the additional limitation that only one system at a time can really “do the job,” ie, infuse insulin and modify glycemia. This limitation makes it impossible to conduct true “head-to-head studies” with AID systems, even if studies are labelled as such.
At the EASD 2023, the results of the first large-scale head-to-head study with three different AID systems were presented.24 In a center in Barcelona, 132 adults with type 1 diabetes prospectively used an AID system from Diabeloop (DBLG-1), Medtronic (MiniMed 780G), or Tandem (Control-IQ) over 12 months; the study was not randomized. Excitingly, the overall results were not significantly different: HbA1c levels fell by 0.9% after 12 months, from a mean baseline of 7.5% to 6.6%. Time in target range (TiR) increased from 60% to 74%, ie, by 3.4 hours per day. The participants achieved a reduction in time above range of 2.9 hours per day, from a baseline value of 36% to 24% after 12 months. The time below range also decreased from 3.9% at baseline to 1.9% at 12 months, which corresponds to a decrease in time in hypoglycemia of 13 minutes per day. A breakdown of the results by AID system showed that users of the 780G system achieved the highest TiR (79%), followed by Control-IQ users (76%) and Diabeloop users (69%). All three systems achieved a low time below range: 2.6%, 2%, and 1.4%. The largest reduction in HbA1c was achieved by the Control-IQ users (1.1%), followed by the Diabeloop users (0.9%) and the 780G users (0.6%). The question is to what degree the results of this nonrandomized study would have differed if patients had been randomly allocated to the respective AID system. The HbA1c value of the users of the 780G AID system was already lower at baseline.
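The conversion between a change in percentage of the day and time per day used in this paragraph can be retraced with a short Python sketch. The function name is hypothetical; the inputs are the rounded percentages quoted above.

```python
MINUTES_PER_DAY = 24 * 60


def percent_change_to_minutes_per_day(before_percent, after_percent):
    """Convert a change in percent of the day into minutes per day."""
    return (after_percent - before_percent) / 100.0 * MINUTES_PER_DAY


# Time in range: 60% -> 74%, a gain of about 3.4 hours per day
tir_gain_hours = percent_change_to_minutes_per_day(60.0, 74.0) / 60.0

# Time above range: 36% -> 24%, a change of about -2.9 hours per day
tar_change_hours = percent_change_to_minutes_per_day(36.0, 24.0) / 60.0

# Time below range: 3.9% -> 1.9%, a change of about -29 minutes per day
tbr_change_minutes = percent_change_to_minutes_per_day(3.9, 1.9)
```

The first two results match the hours per day stated in the text. For time below range, the rounded percentages give roughly 29 minutes per day rather than the 13 minutes quoted; the reason for this difference cannot be determined from the figures given here.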
Summary
In summary, “head-to-head” studies with a carefully selected study design remain, for now, the only way to truly compare the accuracy of different CGM systems, which makes them especially valuable. It is encouraging to see the first “head-to-head study” of the most recent generations of CGM systems published in this issue. However, there remains a need for more independent funding of further “head-to-head” studies and for more standardization of study design, which would also enhance comparability between different CGM and AID systems.
Footnotes
Acknowledgements
The helpful comments of Dr Stephanie Wehrstedt and several other clinical colleagues are gratefully acknowledged.
Correction (April 2024):
Editorial updated; for further details, please see the Article Note at the end of the article.
Abbreviations
AID, automated insulin delivery; CGM, continuous glucose monitoring; PwD, patients with diabetes
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: G.F. is the general manager and medical director of the Institute for Diabetes Technology (Institut für Diabetes-Technologie Forschungs- und Entwicklungsgesellschaft mbH an der Universität Ulm, Ulm, Germany), which carries out clinical studies, eg, with medical devices for diabetes therapy on its own initiative and on behalf of various companies. G.F./IfDT has received research support, speakers’ honoraria, or consulting fees in the last three years from Abbott, Ascensia, Berlin Chemie, Boydsense, Dexcom, Lilly, Metronom, Medtronic, Menarini, MySugr, Novo Nordisk, PharmaSens, Roche, Sanofi, and Terumo. D.W. is an employee of IfDT. L.H. is a consultant for several companies that are developing novel diagnostic and therapeutic options for diabetes treatment. He is a shareholder of the Profil Institut für Stoffwechselforschung GmbH, Neuss, Germany.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Article Note
The following updates were made to this article:
In Table 1, the values in the column “MARD (%)” have been updated for the following rows corresponding to the “First Author/Title of Publication” column: Kumagai et al.8; Comparison of glucose monitoring between Freestyle Libre Pro and iPro2 in patients with diabetes mellitus; Performance of the Eversense versus the Free Style Libre Flash glucose monitor during exercise and normal daily activities in subjects with type 1 diabetes mellitus; Kölle et al.21; Continuous Glucose Monitoring Systems in Adults With Type 1 Diabetes
Reference ‘Hanson K, Kipnes M, Tran H’ earlier mentioned in the text has been numbered as 23 in text and has been added to the reference list.
Earlier reference 23 ‘Gregori ARF, Andújar DS, Flores IC’ has been renumbered as 24 in both text and reference list.
