Abstract
Continuous glucose monitoring and flash glucose monitoring technologies measure glucose in the interstitial fluid and are increasingly used in diabetes care. Their accuracy, key to effective glycaemic management, is usually measured using the mean absolute relative difference of the interstitial fluid sensor compared to reference blood glucose readings. However, mean absolute relative difference is not standardised and has limitations. This review aims to provide a consensus opinion on assessing accuracy of interstitial fluid glucose sensing technologies. Mean absolute relative difference is influenced by glucose distribution and rate of change; hence, we urge caution when comparing mean absolute relative difference data obtained from different systems and study conditions. We also review the pitfalls associated with mean absolute relative difference at different glucose levels and explore additional ways of assessing accuracy of interstitial fluid devices. Importantly, considerable data indicate that the current practice of assessing the accuracy of different systems based on individualised mean absolute relative difference results has limitations, with potential clinical implications. Healthcare professionals must understand the factors that influence mean absolute relative difference as a metric for accuracy and look at additional assessments, such as consensus error grid analysis, when evaluating continuous glucose monitoring and flash glucose monitoring systems in diabetes care. This in turn will help ensure that management decisions based on interstitial fluid sensor data are both effective and safe.
Introduction
Continuous glucose monitoring (CGM) and flash glucose monitoring are increasingly used in the management of patients with diabetes, particularly individuals receiving insulin therapy. 1 These systems measure glucose in the subcutaneous interstitial fluid (ISF), rather than in capillary blood as measured by traditional self-monitoring of blood glucose (SMBG) using finger stick meters.
Confidence in newer CGM systems and in flash glucose monitoring is gradually increasing, both with patients and healthcare professionals.2,3 Clear advantages of CGM, and more recently flash glucose monitoring, are patient convenience and the comprehensive glycaemic data provided. For those on intensive insulin therapy, frequent glucose monitoring is a prerequisite for tight glycaemic control. Unfortunately, for many people with diabetes, poor engagement with SMBG can be a barrier to optimal glucose control. 4
Encouragingly, clinical accuracy of ISF glucose monitoring systems has improved to the point that regulatory authorities are now approving systems for insulin dosing decisions, without the need for users to perform an adjunct SMBG test, except in defined situations (Table 1). Currently, the Abbott FreeStyle Libre system and the Dexcom G5 system are both approved in Europe (CE marked) and in the United States for non-adjunctive use. 5 A key difference between the two systems is the need to calibrate Dexcom G5 twice daily, whereas Abbott FreeStyle Libre is factory calibrated. Infrequent or incorrect calibration by patients can potentially reduce the accuracy of Dexcom G5, an issue that does not affect FreeStyle Libre.
Table 1. FreeStyle Libre and Dexcom G5 devices are approved for non-adjunctive insulin dosing with the following caveats.
BG: blood glucose; SMBG: self-monitoring of blood glucose; MARD: mean absolute relative difference.
Manufacturer’s guidance, https://freestyleserver.com/Payloads/IFU/freestyle_libre/2017_mar/ART28697-501_rev-A_WEB.pdf (accessed January 2018).
Manufacturer’s guidance, www.dexcom.com/fingersticks, accessed October 2017.
Approval for non-adjunctive use reflects the findings of three key recent clinical studies which have assessed flash glucose monitoring and CGM devices for use independently of SMBG.
The Dexcom G4 Platinum CGM was assessed in the REPLACE-BG study. 6 A total of 226 subjects with type 1 diabetes were randomised to CGM only (n = 149) or CGM plus SMBG (n = 77) for insulin dosing. At 26 weeks, individuals using CGM alone for insulin dosing had spent a mean of 63% ± 13% of time in the target glucose range [70–180 mg/dL (3.9–10.0 mmol/L)], and those using CGM plus SMBG had spent 65% ± 11% of time in the target range. In both arms, the time in target range was the same as at baseline. Metrics of glucose control, such as haemoglobin A1c (HbA1c), time in hypoglycaemia and glycaemic variability also showed little change from baseline to 26 weeks in either CGM alone or CGM plus SMBG groups. Thus, REPLACE-BG concluded that the Dexcom CGM system was safe for non-adjunctive use.
There are two main randomised studies to date investigating the effects of FreeStyle Libre on glycaemic markers. The first study is IMPACT, which was a 6-month trial in participants with well-controlled type 1 diabetes, randomised to SMBG alone or to Abbott FreeStyle Libre. 7 In this study, adjunctive SMBG testing in the Libre group was reduced to a mean of 0.5 tests/day, while subjects scanned their Libre device an average of 15.1 times/day. Patients in the control group continued to perform SMBG more than 5 times/day. Time in hypoglycaemia (⩽3.9 mmol/L) was reduced in the Libre arm from 3.38 to 2.03 h/day (38% reduction compared with the control group; p < 0.0001). The second study is REPLACE, which randomised participants with type 2 diabetes on intensive insulin therapy to either FreeStyle Libre or SMBG alone for 6 months. 8 In the Libre group, adjunctive SMBG was reduced to 0.3 tests/day. Overall, time in hypoglycaemia was reduced from 1.30 to 0.59 h/day (43% reduction compared with the control group; p = 0.0006). HbA1c was similar in the two study groups but analysis of those younger than 65 years of age showed a significant improvement in HbA1c compared with the SMBG group. Both studies support the safe use of the FreeStyle Libre system for non-adjunctive use. Moreover, both IMPACT and REPLACE showed significant improvement in quality of life measures and treatment satisfaction in the Libre arm compared with SMBG, indicating that the new sensor technology improves patient well-being in general.
It is worth highlighting that the above studies included patients at relatively low risk of severe hypoglycaemia. Nonetheless, there is growing evidence to support flash glucose monitoring and CGM for non-adjunctive use in clinical practice.
As use of glucose sensing technologies increases, and their management extends beyond the expert clinical setting, it is important to maintain scrutiny of the accuracy of each system and understand how this accuracy is assessed. Unlike SMBG testing, there are currently no internationally agreed methods or standards for ISF glucose measurement. In this context, it is important for healthcare professionals to understand the benefits and limitations of assessing accuracy in ISF glucose monitoring because this has implications for the clinical decisions that are made using these technologies.
Lack of standardised assessment for ISF glucose sensors
In the European Union (EU), SMBG systems are assessed and compared by how well they meet the minimum standards for accuracy and reliability of glucose measurement as set out in ISO 15197:2013. 9
The Clinical and Laboratory Standards Institute (CLSI) has published guidelines on Performance Metrics for Continuous Interstitial Glucose Monitoring (POCT05-A) that define some aspects of CGM testing. These guidelines point out the importance of assessing accuracy not only in steady states but also in two common scenarios: (1) during periods of rapid glucose change and (2) at different glucose concentrations, including extremes of glucose levels. 10 However, there is no universally accepted protocol to compare performance among ISF glucose sensors without head-to-head trials based on simultaneous wear.
Accuracy, precision and concordance
CGM systems that sample ISF glucose need to be assessed for accuracy at the point of measurement and also accommodate the rate of change (RoC) of glucose. Currently, there is no accepted reference method that uses ISF glucose, mainly because it is not possible to obtain a sample of ISF large enough for in vitro analysis with a reference technology.
Therefore, the accuracy of ISF glucose readings is currently assessed by comparison with blood glucose readings taken at the same time. Any comparison between ISF glucose and blood glucose is primarily assessing the concordance of those readings, that is, how closely they match. This concordance between the two readings is therefore dependent both on the accuracy and the precision of the ISF device (CGM or flash glucose monitor) being tested and, importantly, the reference blood glucose device being used.
In this context, it must also be understood that blood and ISF are different physiological compartments that follow different dynamics. 11 Thus, the concordance between ISF glucose and blood glucose readings is also dependent on the physiological differences between the two compartments that are being sampled, including the lag time it takes the ISF to reflect blood glucose levels.
Defining accuracy: role of mean absolute relative difference
A number of metrics have been used to characterise accuracy in this context, and one, in particular, has emerged as a routine statement of sensor ‘accuracy’ – the mean absolute relative difference (MARD) of the ISF sensor readings when compared to a series of comparator blood glucose reference samples.
MARD is straightforward to calculate and is expressed as a single percentage number; therefore, it is an attractive measure of accuracy. In this context, a lower % MARD is seen as representing better sensor performance. An emerging view is that an arbitrary MARD of 10% represents the level of accuracy required for safe use of CGM readings to make insulin dosing decisions, without the need for an adjunct SMBG blood glucose reading. 12 Given that the MARD of well-established SMBG devices ranges between 4.4% and 13.4%, 13 a cut-off of 10% does not sound unreasonable. However, it can be argued that this approach is too simplistic and requires closer scrutiny.
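For orientation, MARD as it is commonly computed can be written as below. The symbols are notation introduced here for illustration rather than taken from the text: $s_i$ is the $i$-th ISF sensor reading, $r_i$ is the paired reference blood glucose reading, and $N$ is the number of paired readings.

```latex
\mathrm{MARD} = \frac{100\%}{N} \sum_{i=1}^{N} \frac{\lvert s_i - r_i \rvert}{r_i}
```

Because each term is divided by the reference value $r_i$, the result depends on the distribution of reference glucose values in the study as well as on the sensor itself.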
The MARD of 10% is based on flattening the curve in an in silico simulation of hypoglycaemia risk, 12 but this change in curve inflection may alternatively reflect limited additional benefit from MARD values lower than 10%. Furthermore, MARD is known to vary depending on a wide range of parameters, each of which can affect the ultimate MARD computation (detailed below). Also, the test device and study protocol variables are independent and will further compound each other. It should be noted that MARD varies during sensor life; for example, MARD is higher in the first day of sensor use14,15 (believed to be due to the inflammatory reaction mediated by inserting the sensor subcutaneously 16 ) and this inaccuracy may also be an issue towards the end of sensor life. 17 Finally, the number of paired glucose data points will have a major influence on MARD outcome and this is yet to be standardised.
Taken together, the accuracy and concordance of separate ISF glucose sensors can only be compared if study protocols and the inbuilt variation are identical. Although few such studies exist, their number is increasing; the head-to-head studies available to date are shown in Table 2.
Table 2. Head-to-head studies comparing CGM and flash glucose monitoring devices.
MARD: mean absolute relative difference; MAD: mean absolute difference; CGM: continuous glucose monitoring.
Numbers in grid = overall % MARD in head-to-head study (number of paired measurements), except Kamecke et al. 25 which reports combined MAD/MARD. All data gathered within manufacturers' specified sensor lifespan. All studies are in subjects with type 1 diabetes, n = number of participants in each study.
It should be noted that some studies report the mean absolute difference (MAD) rather than MARD. The former gives an indication of the tendency of a glucose sensor to read high or low compared with a reference, whereas the latter is the relative deviation of a sensor from a reference. MAD is more commonly used to assess accuracy at low glucose levels.
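One possible reason MAD is favoured at low glucose can be sketched numerically: the same absolute error yields a much larger relative error when the reference value is small. The readings below are invented for illustration, and the function names `mad` and `mard` are hypothetical helpers, not part of any device software.

```python
def mad(sensor, reference):
    """Mean absolute difference, in the same units as the readings."""
    return sum(abs(s - r) for s, r in zip(sensor, reference)) / len(reference)


def mard(sensor, reference):
    """Mean absolute relative difference, as a percentage of the reference."""
    return sum(100.0 * abs(s - r) / r for s, r in zip(sensor, reference)) / len(reference)


# Hypothetical paired readings in mmol/L; every sensor value is 0.5 mmol/L high.
low_ref, low_sensor = [3.0, 3.5, 2.5], [3.5, 4.0, 3.0]
high_ref, high_sensor = [12.0, 14.0, 10.0], [12.5, 14.5, 10.5]

print("low range : MAD", mad(low_sensor, low_ref), "MARD", round(mard(low_sensor, low_ref), 1))
print("high range: MAD", mad(high_sensor, high_ref), "MARD", round(mard(high_sensor, high_ref), 1))
```

In this made-up data set the MAD is 0.5 mmol/L in both ranges, whereas the MARD is roughly four times higher in the low range than in the high range.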
Number of paired measurements
A key measure to inform accuracy of a system is paired glucose measurement between the test system and blood taken from subjects at the same time. This gives confidence that the two systems are providing readings that do not differ significantly from one another. However, MARD only provides a reliable value when the number of data points is sufficiently large. As with all averaging systems, the more data points for comparison, the more confidence can be placed in the computed MARD.
As an example, Figure 1 shows that the degree of uncertainty for a hypothetical MARD becomes tighter as the number of reference measurements increases. However, the larger the number of reference samples, the more burden is placed on patients and study personnel. To generate large sets of paired glucose data for a sensor over a 7- or 14-day wear life is therefore not straightforward.

Figure 1. The impact of the number of paired points on the uncertainty of MARD: upper and lower bounds of the confidence interval with probability γ = 0.95. The constant line represents the value to which it would converge.
When considering the accuracy of CGM or flash glucose monitoring systems, the size of the data set that underpins the MARD calculation should be taken into account. To date, there are no clear guidelines as to the number of paired samples required to have confidence in the accuracy of a particular sensor.
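The narrowing of the confidence interval with more paired points can be sketched with a simple normal approximation. This is not the method used to produce Figure 1; it assumes independent pairs and a hypothetical standard deviation of 10 percentage points for the individual absolute relative differences.

```python
from math import sqrt
from statistics import NormalDist


def mard_ci_half_width(ard_sd, n_pairs, confidence=0.95):
    """Approximate half-width of the confidence interval around a MARD estimate.

    ard_sd  -- assumed standard deviation of the individual absolute relative
               differences, in percentage points (hypothetical input)
    n_pairs -- number of paired sensor/reference readings
    Uses a normal approximation and treats the pairs as independent.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return z * ard_sd / sqrt(n_pairs)


for n in (50, 200, 800, 3200):
    print(n, "pairs: +/-", round(mard_ci_half_width(10.0, n), 2), "percentage points")
```

Under these assumptions the interval halves each time the number of pairs quadruples. Readings taken from the same sensor or subject are unlikely to be fully independent, so real intervals would probably be wider than this sketch suggests.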
Accuracy of the reference system
MARD is influenced by the method selected for comparing glucose data. When considering a MARD value, it is important to consider the blood glucose reference system used in the accuracy study.
All glucose reference methods have a measurement error of their own that must be taken into account when calculating the MARD. A common laboratory reference system is the Yellow Springs Instrument (YSI) glucose analyser, which provides accurate measurement of reference blood glucose samples and helps minimise MARD, because of its inherent low error.
SMBG meters make it possible to collect a larger number of paired readings based on capillary blood glucose, but they are less accurate than a laboratory reference system. In addition, their accuracy varies widely between manufacturers 27 and can therefore affect the computed MARD. However, SMBG is a ‘real-world’ comparison, used by most people with diabetes, and is the glucose measurement system that CGM would ideally replace, provided SMBG systems used have an appropriate quality control programme to ensure ongoing accuracy.
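The effect of reference error on MARD can be illustrated with a small simulation. It is a sketch under stated assumptions: the sensor and the reference both read the same true glucose with independent, normally distributed proportional errors, and the error sizes (10% for the sensor, 2% or 8% for the reference) are hypothetical values chosen only to show the direction of the effect.

```python
import random


def mard(sensor, reference):
    """Mean absolute relative difference, as a percentage of the reference."""
    return sum(100.0 * abs(s - r) / r for s, r in zip(sensor, reference)) / len(reference)


def simulate_mard(reference_cv, sensor_cv=0.10, n_pairs=20000, seed=1):
    """MARD of a simulated sensor against a simulated, imperfect reference.

    All parameter values are hypothetical; the sensor model is identical
    in every call, so only the reference error differs between runs.
    """
    rng = random.Random(seed)
    sensor, reference = [], []
    for _ in range(n_pairs):
        true_glucose = rng.uniform(4.0, 15.0)  # mmol/L
        sensor.append(true_glucose * (1.0 + rng.gauss(0.0, sensor_cv)))
        reference.append(true_glucose * (1.0 + rng.gauss(0.0, reference_cv)))
    return mard(sensor, reference)


lab_like = simulate_mard(reference_cv=0.02)    # low-error reference
meter_like = simulate_mard(reference_cv=0.08)  # higher-error reference
print("MARD vs low-error reference   :", round(lab_like, 1))
print("MARD vs higher-error reference:", round(meter_like, 1))
```

The simulated sensor is unchanged between the two runs, yet its computed MARD is higher against the noisier reference, which is why the reference method needs to be stated alongside any MARD value.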
Accuracy at high RoC and extremes of glucose levels
It is well known that the rate of glucose change in a study subject will affect MARD calculations. 28 MARD is lowest, and CGM accuracy as defined by MARD most reliable, when glucose readings are stable. As the rate of glucose change increases, so does the computed MARD.
Figure 2 illustrates this for two specimen systems: the faster glucose falls or rises, the higher the computed MARD. This is an important consideration when using CGM systems, as their accuracy varies with the RoC. Enrolling patients with high glucose variability into a study will therefore increase MARD, and vice versa.

Figure 2. MARD as defined by rate of change category (adapted from Pleus et al. 28 ). At low rate of change, the accuracy of system A and system B does not look that different. However, with increasing rate of change, the superior accuracy of system B over system A becomes evident.
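Reporting MARD by RoC category, as in Figure 2, amounts to grouping the paired readings by the magnitude of the rate of change before averaging. The sketch below uses made-up readings and hypothetical cut-offs of 1 and 2 mg/dL/min; the category boundaries used by Pleus et al. are not reproduced here.

```python
from collections import defaultdict


def roc_category(rate):
    """Bin a rate of change (mg/dL/min) by magnitude; cut-offs are hypothetical."""
    magnitude = abs(rate)
    if magnitude < 1.0:
        return "slow"
    if magnitude < 2.0:
        return "moderate"
    return "rapid"


def mard_by_roc(pairs):
    """pairs: iterable of (sensor, reference, rate_of_change) tuples."""
    ards = defaultdict(list)
    for sensor, reference, rate in pairs:
        ards[roc_category(rate)].append(100.0 * abs(sensor - reference) / reference)
    return {category: sum(values) / len(values) for category, values in ards.items()}


# Made-up paired readings in mg/dL with the rate of change at sampling time.
pairs = [
    (102, 100, 0.2), (147, 150, -0.5),
    (110, 100, 1.5), (135, 150, -1.2),
    (120, 100, 2.5), (120, 150, -3.0),
]
print(mard_by_roc(pairs))
```

With these invented values the per-category MARD rises from 2% (slow) to 10% (moderate) to 20% (rapid), so an overall MARD would depend heavily on how many readings fall into each category.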
An important consideration is the accuracy of ISF glucose readings at low glucose concentrations. Estimates of MARD are known to be subject to larger errors as glucose falls towards the hypoglycaemic range. 29 For example, in the study by Aberer et al., 24 overall MARD of FreeStyle Libre was 13.2% with relatively little change in the hypoglycaemic (<3.9 mmol/L) and hyperglycaemic (>10 mmol/L) range (MARD of 14.6% and 10.1%, respectively). In contrast, Dexcom G4 Platinum had an MARD of 16.8%, with a larger difference comparing hypoglycaemic and hyperglycaemic range (MARD of 23.8% and 11.6%, respectively). 24 Another head-to-head study found even larger differences comparing MARD in hypoglycaemic and hyperglycaemic range (21.2% and 11.6% for Dexcom G4 Platinum and 36.5% and 18% for Enlite, respectively). 22
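Range-specific MARD values such as those quoted above can be obtained by splitting the paired readings by glucose range before averaging. The sketch below uses the 3.9 and 10 mmol/L thresholds from the text, assumes (hypothetically) that each pair is assigned to a range by its reference value, and runs on invented data rather than data from the cited studies.

```python
def glucose_range(reference_mmol):
    """Classify a reference reading using the 3.9 and 10 mmol/L thresholds."""
    if reference_mmol < 3.9:
        return "hypoglycaemic"
    if reference_mmol > 10.0:
        return "hyperglycaemic"
    return "target"


def mard_by_range(sensor, reference):
    """Return MARD (%) per glucose range, keyed by range name."""
    totals = {}
    for s, r in zip(sensor, reference):
        name = glucose_range(r)
        total, count = totals.get(name, (0.0, 0))
        totals[name] = (total + 100.0 * abs(s - r) / r, count + 1)
    return {name: total / count for name, (total, count) in totals.items()}


# Invented paired readings in mmol/L.
reference = [3.0, 3.5, 6.0, 8.0, 12.0, 15.0]
sensor = [3.6, 4.2, 6.3, 7.6, 12.6, 14.25]
print({name: round(value, 1) for name, value in mard_by_range(sensor, reference).items()})
```

An overall MARD computed from such data would mask the higher error in the hypoglycaemic range, which is the reason range-specific reporting is informative.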
A low ISF glucose reading below 3.9 mmol/L can prompt corrective actions that may be unnecessary if actual blood glucose, as measured by SMBG, is significantly higher. For instance, a user may develop hypoglycaemia and take corrective action. Due to the time lag between blood glucose and ISF glucose, if the user continues to rely only on ISF glucose readings, there may be a lag in the rise of ISF over blood glucose, resulting in further and unnecessary treatment of hypoglycaemia. Similarly, experienced users may become less concerned with ISF low glucose readings than they would be with SMBG readings and take no immediate action. Each of these scenarios potentially creates unwanted risks.
However, the advantage of ISF glucose sensor devices is that they typically provide trend arrow support alongside the current glucose reading, indicating the direction and RoC, to assist decision-making. In the IMPACT and REPLACE studies using the FreeStyle Libre system in people with type 1 or type 2 diabetes on insulin,7,8 ISF glucose sensor data helped users significantly reduce the amount of time that their glucose fell below 3.9 mmol/L, compared to subjects using SMBG. This reduction in hypoglycaemia was achieved without impacting HbA1c or daily insulin doses.
Together, these outcomes suggest that any concerns about concordance of ISF sensor readings at low glucose levels are offset by the utility of having both a glucose reading and a trend arrow displayed on the reader, to support patient self-management and avoidance of hypoglycaemia.
Expert opinion is clear that definitions of hypoglycaemia should not differ depending on the glucose sensing technology. However, changes in the accuracy of ISF sensors at low glucose must be clearly disclosed for each device, as part of overall reporting bias. The main variables that affect MARD levels are detailed in Table 3.
Table 3. Variables that contribute to differences in concordance/MARD.
BG: blood glucose; SMBG: self-monitoring of blood glucose; MARD: mean absolute relative difference; MAD: mean absolute difference.
Additional measures of accuracy and precision: Clarke and consensus error grids
Analytical performance of ISF sensor devices is divided into two important areas: accuracy and precision. Accuracy refers to how close the test results are to a reference or standard, and precision refers to the consistency of the system, that is, how close the test results are to each other. These concepts are illustrated in Figure 3.

Figure 3. Precision and accuracy – what is the difference?
MARD is a reported metric for average accuracy but does not reflect any non-linear performance over the complete glucose range, nor does it provide any information about the precision of the system on which it reports.
To support better assessment of ISF sensing technologies, the Clarke error grid was developed to compare readings of a particular system with reference samples (Figure 4(a)). 30 Readings in zone A represent high accuracy and those in zone B represent acceptable accuracy, with the remaining zones showing reduced and clinically questionable accuracy. The Clarke error grid was further refined, and the more streamlined consensus error grid (CEG) was developed, which can be generated in combination with MARD for additional accuracy reporting. 31 The CEG compares glucose readings of the test device to reference blood glucose readings and plots them on a grid, which is divided into zones A–E. This aids in visualising accuracy as well as the clinical impact of any errors of the system. Readings that fall into zones A and B are accepted for making clinical decisions.

Figure 4. Error grid analysis. (a) Clarke error grid analysis. Zone A: clinically accurate values within 20% of the reference sample; zone B: values outside 20% of the reference sample but would not lead to inappropriate treatment; zone C: values that would lead to overcorrection of glucose levels; zone D: dangerous failure to detect and treat high or low glucose; zone E: values that could lead to treatment contradictory to that needed. Adapted from Clarke. 30 (b) Consensus error grid analysis, comparing the FreeStyle Libre sensor readings with capillary blood glucose reference values collected using the FreeStyle Precision BG meter built into the FreeStyle Libre reader. 32
In the example shown in Figure 4(b), 86.7% of the results are in the clinically accurate zone A of the CEG, and 99.7% of sensor results are in the clinically acceptable zones A and B of the CEG when compared with capillary blood glucose results. 32 The overall MARD in this system was 11.4% for sensor results when compared to capillary blood glucose reference samples.
The importance of looking both at MARD and CEG analysis is illustrated in Figure 5. In these comparisons of 2000 simulated paired measurements between test and reference samples, the MARD of 8.0% in Simulation 1 is achieved with a lower % of readings in zones A and B, compared to Simulation 2 with an MARD of 12.0% and 100.0% of readings in the clinically acceptable zones.

Figure 5. Comparisons of simulated test and reference glucose samples. The MARD and CEG plots of 2000 paired readings can be modelled to illustrate that different methods of analysis may generate different assessments of ‘accuracy’.
Thus, using both these means of quantifying accuracy improves confidence in the efficacy and safety of ISF glucose sensing systems for making treatment decisions.
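The way a lower MARD can coexist with more clinically questionable readings can be shown with a deliberately simple construction. It does not reproduce the simulations in Figure 5 and does not implement the CEG: the share of readings within 20% of the reference is used here only as a crude, hypothetical stand-in for zone A, since the real grid zones have boundaries that are not described in the text.

```python
def mard(sensor, reference):
    """Mean absolute relative difference, as a percentage of the reference."""
    return sum(100.0 * abs(s - r) / r for s, r in zip(sensor, reference)) / len(reference)


def percent_within(sensor, reference, band=20.0):
    """Share (%) of readings whose relative error is within +/- band percent."""
    hits = sum(1 for s, r in zip(sensor, reference) if 100.0 * abs(s - r) / r <= band)
    return 100.0 * hits / len(reference)


reference = [100.0] * 100  # mg/dL, hypothetical reference readings

# Set 1: 90 readings are 2% high and 10 readings are 60% high.
set_1 = [102.0] * 90 + [160.0] * 10
# Set 2: every reading is 12% high.
set_2 = [112.0] * 100

for name, sensor in (("set 1", set_1), ("set 2", set_2)):
    print(name, "MARD", round(mard(sensor, reference), 1),
          "within 20%:", percent_within(sensor, reference))
```

Set 1 has the lower MARD (7.8% against 12.0%) but only 90% of its readings fall within the 20% band, whereas every reading in set 2 does. A few large errors are averaged away by MARD but remain visible in a grid-style analysis.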
Areas of caution: when is adjunct blood glucose testing needed?
CGM and flash glucose monitoring systems are now being approved for insulin dosing based on a current glucose reading and RoC arrows, without the need for adjunct SMBG confirmation (see Table 1 for exceptions). This is testimony to the improved accuracy of ISF glucose sensing technologies.
In general, these systems bring an acknowledged benefit for users. They provide quick and discreet feedback on the current level and the direction of travel for glucose, and reduce the need for finger prick testing, which is inconvenient and frequently described as anti-social. This allows users to live their daily lives more confidently and make more informed decisions about diet, exercise, work and study, as well as the timing and dose of insulin injections.
However, in addition to the manufacturers’ own guidance (Table 1), expert opinion recognises that prudent clinical use of CGM and flash glucose monitoring systems should still involve use of SMBG capillary blood glucose testing, as summarised in Table 4.
Table 4. The continued need for adjunct blood glucose testing.
ISF: interstitial fluid; CGM: continuous glucose monitoring; EU: European Union; UK: United Kingdom.
Conclusion and future directions
As real-time glucose monitoring systems become widespread in diabetes care, there is a tendency to focus on MARD as a defining metric for accuracy of a particular system. However, the consensus among a panel of diabetes experts is that, while MARD is useful, its drawbacks should be understood; relying solely on this metric for accuracy is not enough, and more robust criteria are required.
The MARD of a system will vary depending on a wide range of parameters, each of which can affect the ultimate computation. Key factors include overall study design, choice of the reference system and number of paired readings that are analysed. It should be remembered that MARD is a metric for the concordance of glucose values from two different physiological compartments, measured with different systems, that is, ISF glucose and blood glucose. Furthermore, there is a dependence on whether the latter is capillary blood, venous blood (or venous plasma), and also on the reference method, which could vary from a handheld patient meter to a laboratory analyser.
The performance (‘accuracy’) of a CGM or flash glucose monitoring device is just one contributor to the concordance. When comparing devices, the only way to minimise or eliminate factors that contribute to non-concordance of each system is to conduct a head-to-head comparison in which different ISF devices are worn simultaneously by the same subject and an appropriate, and identical, reference method is used.
In the absence of agreed standardised study protocols, comparison of MARD data obtained from various devices under different study conditions should be avoided, as such direct comparisons may lead to misleading conclusions. The main consensus points on accuracy assessment of CGM and flash glucose monitoring systems can be found in Table 5.
Table 5. Consensus on assessments of accuracy of CGM and flash glucose monitoring systems.
BG: blood glucose; SMBG: self-monitoring of blood glucose; MARD: mean absolute relative difference; MAD: mean absolute difference.
When assessing the accuracy of any glucose sensing technology, MARD needs to be combined with additional objective measurements. An established tool in this regard is CEG analysis. This reflects not only the mean accuracy but also the utility of the system for making clinical decisions.
Footnotes
Declaration of conflicting interests
RAA declares: institutional research grants from Abbott Diabetes Care, Bayer, Eli Lilly, Novo Nordisk, Roche, Takeda; honoraria/education support and consultancy fees from Abbott Diabetes Care, AstraZeneca, Bayer, Boehringer Ingelheim, Bristol-Myers Squibb, Eli Lilly, GlaxoSmithKline, Merck Sharp & Dohme, Novo Nordisk, Takeda. MHC declares honoraria for advisory work for Abbott Diabetes Care. PJ declares personal fees from Abbott Diabetes Care outside the submitted work. LL declares speaker honoraria from Minimed Medtronic, Animas, Roche, Sanofi, Insulet and Novo Nordisk; advisory panel activities for Abbott Diabetes Care, Roche, Sanofi, Minimed Medtronic, Animas and Novo Nordisk; and grants to attend educational meetings from Sanofi, Novo Nordisk and Takeda. GR declares personal fees from Abbott Diabetes Care outside the submitted work. EGW declares speaker honoraria from Abbott Diabetes Care, Diasend, Dexcom, Eli Lilly, Minimed Medtronic, Novo Nordisk, Sanofi Aventis; has served on advisory panels for Abbott Diabetes Care, Eli Lilly, Sanofi Aventis; and has received grants to attend educational meetings from Boehringer Ingelheim, Diasend, Novo Nordisk, Roche and Sanofi Aventis.
Funding
The author(s) received no financial support for the research, authorship and/or publication of this article.
