Analysis of “Seven Year Surveillance of the Clinical Performance of a Blood Glucose Test-Strip Product”

Abstract

The article titled “Seven Year Surveillance of the Clinical Performance of a Blood Glucose Test-Strip Product” by Setford and coworkers in this issue of Journal of Diabetes Science and Technology is an impressive study showing that over 7 years in three clinics, using multiple reagent lots, a total of 73 600 samples met the ISO 15197:2015 standard with no results in the D or E zones of a Parkes glucose meter error grid. Three requirements are suggested for a clinically acceptable glucose meter. The authors provide strong evidence for meeting two of these requirements but fail to provide summarized data about the number of nonnumeric results. Finally, the authors overstate some results, a practice some call “spin,” which is not necessary. The superb results should stand on their own.

Keywords

ISO 15197:2015, self-monitoring blood glucose devices, glucose meter error grid, FMEA

Recently, it was shown that only 48.3% of FDA-cleared glucose meters met the ISO 15197:2013 glucose meter standard when tested with routine reagent strips (i.e., strips that were not part of a release-for-sale study).¹ This in part led to the Diabetes Technology Society’s creation of a surveillance error grid protocol.² The study by Setford and coworkers,³ which started before the surveillance error grid protocol recommendation, is a manufacturer’s study to evaluate glucose meter performance with released reagent strips. The study was conducted over 7 years in 3 clinics with a total of 73 600 samples, which is impressive. The study showed that most results were in the A zone of a Parkes error grid, a few were in the B zone, and only 6 of 73 600 were in the C zone. There were no D or E zone results. (It is not clear why the authors didn’t process their data through the surveillance error grid protocol.) Thus, the results are also impressive. Lot-to-lot reagent error is always a concern because after release for sale, the company’s scrutiny of the new product diminishes. Over time, there can be changes in vendors’ raw materials, vendors’ manufacturing practices, the company’s own manufacturing practices, and personnel, as well as other reasons why reagents might not produce the same values.

Yet, one can still ask what such a study shows about the glucose meter performance required for clinical use. There are several requirements for such performance.

The first, following the ISO 15197 standard (currently the 2015 version), is that a high percentage of results be clinically acceptable. In addition to meeting accuracy limits for 95% of the results, 99% of results must be within the A and B zones of a glucose error grid. This study clearly shows that this performance has been met.

Second, there should be no results that have the potential to cause serious harm to patients; namely, results in the D or E zones of a glucose meter error grid. Note that it is not sufficient to meet the ISO 15197 standard, which allows up to 1% of results to be in zones C and higher. Consider what that means for a person who tests around 3 times per day, or over 1000 times per year. Allowing 1% of results to be in zones C or higher translates to about one potentially dangerous result each month! The present study’s result of 0 values in the D and E zones is superb, but if one constructs an upper 95% confidence limit for the proportion of 0 out of 73 600 results,⁴ the percentage of results in the D and E zones can be up to 0.005%; hence there is still no proof that dangerous results are impossible. Other internal studies such as interference testing and FMEA (failure mode effects analysis) would complement the performance reported, but such studies are typically not made public.
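The arithmetic in the preceding paragraph can be checked with a short sketch. It assumes the exact (Clopper-Pearson) binomial bound, which for zero observed events reduces to a closed form. The commentary does not say which interval method was used, so the method, the function name `upper_bound_zero_events`, and the one-sided versus two-sided option are assumptions made here for illustration only.

```python
import math


def upper_bound_zero_events(n, confidence=0.95, two_sided=True):
    """Exact upper limit for a proportion when 0 events are seen in n trials.

    Assumption: Clopper-Pearson bound. With zero events it solves
    (1 - p)**n = tail for p, where tail is the upper-tail probability.
    """
    alpha = 1.0 - confidence
    tail = alpha / 2.0 if two_sided else alpha
    # expm1 keeps precision when the result is very close to zero
    return -math.expm1(math.log(tail) / n)


n_samples = 73_600  # sample count reported in the study
upper_two_sided = upper_bound_zero_events(n_samples)
upper_one_sided = upper_bound_zero_events(n_samples, two_sided=False)

# A person testing 3 times per day, with 1% of results allowed in zone C or higher
tests_per_year = 3 * 365
allowed_c_or_higher = 0.01 * tests_per_year

print(f"two-sided 95% upper limit: {upper_two_sided:.4%}")
print(f"one-sided 95% upper limit: {upper_one_sided:.4%}")
print(f"tests per year: {tests_per_year}")
print(f"zone C or higher results allowed per year: {allowed_c_or_higher:.1f}")
```

Under these assumptions the two-sided limit is about 0.005%, in line with the figure quoted above, while a one-sided 95% limit would be about 0.004%. Either way the bound is small but not zero, which is the point of the paragraph.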

Finally, one would like the percentage of times that no result is provided to be very low (the authors call these strip errors). The authors noted that strip errors occur but mysteriously fail to provide their frequency. They state: “Non-numerical values (strip errors) were recorded and reviewed on a batch-by-batch basis, although an amalgamated record of all data was not kept.” Note that user errors were minimized by having trained technicians perform the testing.

An additional worthwhile analysis would have been to analyze average bias by reagent lot. Whereas the bias across all lots was minimal, this does not preclude isolated lots from having significant bias. If a patient had a reasonably long supply of a particular lot that read normal when the true results were high, a persistently elevated glucose would potentially contribute to diabetes complications.⁵
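A per-lot bias analysis of the kind suggested above could be sketched as follows. The data are made up purely for illustration (they are not from the Setford study), and the record layout, lot names, and function name `mean_percent_bias_by_lot` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_percent_bias_by_lot(records):
    """Average percent bias per reagent lot.

    records: iterable of (lot_id, meter_mg_dl, reference_mg_dl) tuples.
    """
    by_lot = defaultdict(list)
    for lot_id, meter, reference in records:
        by_lot[lot_id].append(100.0 * (meter - reference) / reference)
    return {lot: mean(values) for lot, values in by_lot.items()}


# Hypothetical readings: (lot, meter result, reference result), in mg/dL
records = [
    ("lot_A", 102.0, 100.0),
    ("lot_A", 196.0, 200.0),
    ("lot_B", 180.0, 200.0),  # this lot reads low
    ("lot_B", 270.0, 300.0),
    ("lot_C", 110.0, 100.0),  # this lot reads high
    ("lot_C", 220.0, 200.0),
]

bias_by_lot = mean_percent_bias_by_lot(records)
overall_bias = mean(list(bias_by_lot.values()))

for lot, bias in sorted(bias_by_lot.items()):
    print(f"{lot}: {bias:+.1f}%")
print(f"average across lots: {overall_bias:+.1f}%")
```

In this invented example the average across lots is zero even though one lot reads 10% low and another 10% high, which is the situation an all-lots summary would hide.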

Recently, McGrath et al noted that many articles about product performance contain “spin” or over-interpretation of results and overly optimistic statements or implications of clinical usefulness.⁶ Unfortunately, the Setford paper contains some cases of this.

For example, it is stated: “The clinic performance discussed above is a measure of system performance and thus include all sources of potential inaccuracy, such as system-related factors, patient specific factors and environmental conditions.” This is clearly not true. “System performance” includes user error, but this study has minimized user error by using trained technicians. The reason given for using technicians is misleading. Whereas it is true that many user errors will involve non-meter-related errors, there could be an interaction between a user error and the system that results in an error, such as a short sample combined with a failure of the system to correctly handle the short sample. Moreover, not all sources of potential inaccuracy have been included in the study, contrary to what the authors state. Rather, the sources of potential inaccuracy available in the study have been sampled. There is a huge difference between these two statements. For the study to have included all sources of potential inaccuracy just from the standpoint of drug interference would mean that the study included the complete formulary of drugs.

Also, the mountain plots are mysteriously truncated. No good reason is given for this; the authors say it is for presentation purposes. Complete mountain plots for absolute errors and percentage errors should be shown.
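For readers unfamiliar with the construction, a mountain plot is a folded empirical cumulative distribution of the differences between meter and reference results: each ranked difference is assigned a percentile, and percentiles above 50 are replaced by 100 minus the percentile. The sketch below computes the plotted points only. The differences are hypothetical and the function name `mountain_plot_points` is an assumption made for illustration.

```python
def mountain_plot_points(differences):
    """Return (difference, folded_percentile) pairs for a mountain plot."""
    ordered = sorted(differences)
    n = len(ordered)
    points = []
    for rank, diff in enumerate(ordered, start=1):
        percentile = 100.0 * rank / n
        # Fold the upper half of the distribution back down toward zero
        folded = percentile if percentile <= 50.0 else 100.0 - percentile
        points.append((diff, folded))
    return points


# Hypothetical meter-minus-reference differences in mg/dL
differences = [-12, -5, -3, -1, 0, 1, 2, 4, 6, 30]
points = mountain_plot_points(differences)

for diff, folded in points:
    print(f"{diff:>4} mg/dL  folded percentile {folded:.0f}")
```

The largest errors sit in the tails of the plot, at the lowest folded percentiles. In this invented data set the +30 mg/dL difference is exactly the kind of point that disappears if the horizontal axis is truncated.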

Notwithstanding some minor improvements that could be made, the scope of the study and the results are impressive.

Footnotes

Abbreviations

FMEA, failure mode effects analysis; ISO, International Organization for Standardization.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

1. Klonoff, Prahalad. Performance of cleared blood glucose monitors. J Diabetes Sci Technol. 2015;9:895-910.

2. Klonoff, Lias, Vigersky, et al. The surveillance error grid. J Diabetes Sci Technol. 2014;8:658-672.

3. Setford, Grady, Phillips, et al. Seven year surveillance of the clinical performance of a blood glucose test-strip product [published online ahead of print April 1, 2017]. J Diabetes Sci Technol.

4. Hahn, Meeker WQ. Statistical Intervals: A Guide for Practitioners. New York, NY: John Wiley; 1991.

5. Krouwer, Cembrowski GS. The chronic injury glucose error grid. A tool to reduce diabetes complications. J Diabetes Sci Technol. 2015;9:149-152.

6. McGrath, McInnes MDF, van Es, Leeflang MMG, Korevaar, Bossuyt PMM. Overinterpretation of research findings: evidence of “spin” in systematic reviews of diagnostic accuracy studies. Clin Chem. 2017;63:1353-1362.