Abstract
Many studies determine the performance of blood glucose (BG) monitoring systems. Correct evaluation is, however, complex, and apparent contradictions between results create confusion. This study aimed to provide an overview of frequently made errors and to develop easy-to-use checklists to verify the quality of such studies. Building on the work of Mahoney and Ellison and subsequent re-evaluation, study designs of accuracy studies were assessed, and best practice and internationally accepted norms were determined. Key issues were collated, and two simplified checklists were developed: one for the assessment of analytical accuracy studies and a second for guidance with studies assessing the influence of interferences. The checklists were used in a feasibility study with 20 representative studies selected from a literature search covering 2007 to 2012. This check revealed that limitations in the designs and methods of studies assessing the performance of BG systems are common. The use of the accuracy checklist with the 20 representative studies showed that only 20% were in agreement with most of the issues deemed important and that 40% showed clear nonconcordance with ISO 15197. The use of the interference checklist showed that only 50% of the publications were in good agreement with the quality checks. In agreement with previous studies, which concluded that many evaluations are performed poorly and present questionable conclusions, the use of these checklists demonstrated that few publications adhered to international guidelines and recommendations. Taking this into consideration, it becomes obvious that publications must be examined in more detail to establish their quality and the validity of the conclusions drawn.
Rationale and Objective

Figure: Number of publications assessing the performance of several blood glucose systems between 2004 and 2011.
Building on the work of Mahoney and Ellison 1,2 and subsequent re-evaluation, 3 study designs of accuracy studies were assessed, and best practice and internationally accepted norms were determined. The evaluation of BG system accuracy and the influence of interferences is complex, and testing must be carefully designed and performed. 4 Clinicians and technicians performing such studies need to be aware of numerous factors in order to obtain data free of protocol-specific or patient-specific bias. 5 A wide range of variables must be taken into account to ensure that any inaccuracy of results is due to the BG system and not to other factors such as the reference method, variations in the specimens compared, experimental artifact, random patient interferences, or use of the meter outside the manufacturer's claims in the instructions for use.
Studies on BG system accuracy and the influence of interferences divide into several categories. Accuracy can be determined under defined conditions in which the study incorporates exclusion criteria determined by limitations quoted by the manufacturer. Such a study can correctly reflect the accuracy possible and claimed by the manufacturer for a BG system. Accuracy studies may, however, be performed under what the investigators consider more “real life” or “routine” conditions. Here, precise conditions are not adhered to or the possibility exists to include data that should be excluded by reference to the manufacturer's limitations and instructions for use. Not adhering to the manufacturer's instructions for use restricts the validity of conclusions about accuracy and the influence of interferences made in such studies.
Studies can also be undertaken to determine the influence of potential BG system-dependent interferences, such as hematocrit, temperature, and reducing substances. Although they share many guidelines with accuracy evaluations, interference studies require specific assessment against ISO 15197, 6 ISO/DIS 15197, 7 and CLSI recommendations for interference testing 8 in order to correctly identify interferents and their degree of influence on accuracy.
Documents that list best practice quality guidelines or recommendations for undertaking and reporting appropriate evaluations of BG system accuracy are available from several sources 1–3 (see Supplementary Table S1; Supplementary Data are available online).
Although these protocol guidelines and recommendations are readily available, investigators rarely incorporate consensus standards or quality guidelines into BG accuracy evaluations, and a standardized approach has so far not been widely adopted. 9 Strict adherence requires using an appropriate reference method, following defined protocols, and operating BG systems in accordance with the manufacturer's instructions for use. BG system accuracy studies also require a protocol that incorporates the use of fresh human whole blood because a standard reference material for whole blood is not available. 1,2,10
Four potential sources of error must be considered in the evaluation of a BG system 11:
1. Analytical imprecision: controlled by testing product that conforms to specifications (i.e., supplied, handled, and stored according to manufacturer's instructions and so potentially capable of achieving the performance claimed)
2. Bias: controlled by testing product that conforms to specifications
3. Protocol-specific bias: controlled by adherence to careful study design
4. Random patient interferences: controlled by inclusion and exclusion criteria
Although a large number of BG system accuracy studies from the scientific community have been published, the general consensus when compared against recommendations is that most evaluations are performed poorly and present questionable conclusions. 1,2 It seems that publications rarely adhere to all the points outlined in guidelines and that many studies do not follow published recommendations for study design and methodology or do not appear to address many of the variables that can adversely impact validity. To establish if an accuracy study is valid, a wide range of details must be checked, an issue complicated by publication word count restrictions that may necessitate additional protocol information being made available elsewhere.
In 2007, Mahoney and Ellison 1 published a comprehensive evaluation of BG system accuracy studies published between 2002 and 2006. The study showed that the average BG system accuracy evaluation used less than 50% of the combined CLSI and Standards for the Reporting of Diagnostic Accuracy Studies (STARD) recommendations and that the overall quality of reports was low. Compliance with these recommendations varied widely (median, 53%; range, 21–84%), and only one study out of 52 followed CLSI recommendations for checking reference test results. Fewer than half (42%) contained STARD-recommended statements regarding how and when comparative measurements were performed. The low rate of compliance with recommendations suggests that many researchers do not follow published recommendations for study design, methodology, and reporting. This may have negatively affected the quality of the studies and their conclusions.
The analysis of BG system accuracy publications 1 highlighted particular deficiencies with (a) reference methods, (b) ensuring comparison of appropriate samples and the importance of timings, and (c) the acceptance criteria used. The more valuable publications emphasized BG system training, performed a thorough reference method evaluation, tested BG system and reference results in duplicate, and emphasized control of elapsed time and glycolysis.
Moreover, Mahoney and Ellison 2 proposed a checklist combining key elements from different guidance and recommendations that outlined a standardized approach to BG system evaluations, along with associated references applicable to international standards and consensus recommendations. This provides a basis for protocols that are (a) evidence based, (b) scientifically defensible, and (c) sufficiently descriptive to allow for test and result reproducibility.
The number of publications is increasing, and apparent contradictions between results create confusion and doubt about the reliability of such studies. Building on the work of Mahoney and Ellison, 1,2 it is therefore important to provide an overview of study designs, to identify and review frequently made errors, and to develop simple checklists of basic requirements for verifying the quality of BG system performance and comparison studies (with regard to either accuracy or interferences).
Materials and Methods
Materials
Literature outlining guidelines, recommendations, rationale, or best practice for undertaking high-quality BG system evaluations was identified.
A “key words” literature search identified 82 BG system performance articles between 2007 and 2012. For this feasibility study, from these, 20 publications 12–31 (see Supplementary Table S3) were subjectively selected as being representative for checklist analysis. Articles chosen incorporated those from different sites/countries, included different types of publications ranging from full articles to poster abstracts, involved healthcare professional and/or patient operators, and covered different aspects of system performance. Publications were classified as studies primarily dealing with accuracy, interferences, or both, and study designs were evaluated to identify any potential issues of concern using the appropriate proposed checklist. When a single publication addressed both accuracy and interference, both checklists were used.
Methods
International guidelines and recommendations, the standardized approach to assessing BG system performance proposed by Mahoney and Ellison 1,2 and subsequent re-evaluation, 3 and other BG evaluation publications were collated. Best practice, study designs, and alignment with internationally accepted norms, practice, and guidelines were determined. This provided a basis for a comprehensive summary of important factors (see Supplementary Table S4). Key issues were collated and used to develop two simplified checklists (A [Table 1] and B [Table 2]) to aid design quality assessment of accuracy and interference publications, respectively.
As a feasibility check, the checklists were used to examine the 20 representative studies. Fifteen publications, of which four 16,18,21,22 used patients as operators, were examined using the accuracy checklist (10 studies assessing accuracy of BG systems and five studies assessing accuracy of BG systems and influence of interferences), and 10 were examined using the interference checklist (five studies assessing the influence of interferences and five studies assessing accuracy of BG systems and influence of interferences). Although the process is subjective and publications may omit full details, the checklists enabled each important point to be examined and categorized as being in agreement (yes), partial agreement (partial), or not in agreement (no) with guidelines.
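The yes/partial/no categorization described above can be tallied per publication. The following is a minimal sketch, not the authors' procedure: the checklist item names and the example ratings are hypothetical placeholders, since the actual items are those listed in Tables 1 and 2.

```python
from collections import Counter

# The three categories used in the feasibility check.
RATINGS = ("yes", "partial", "no")


def tally_ratings(ratings):
    """Count how many checklist items were rated yes, partial, or no.

    `ratings` maps a checklist item name to one of the RATINGS values.
    """
    unknown = set(ratings.values()) - set(RATINGS)
    if unknown:
        raise ValueError(f"unexpected rating(s): {sorted(unknown)}")
    counts = Counter(ratings.values())
    # Always return all three categories, even when a count is zero.
    return {rating: counts.get(rating, 0) for rating in RATINGS}


# Hypothetical ratings for one publication (item names are illustrative only).
example = {
    "reference_method": "yes",
    "like_with_like_samples": "partial",
    "spread_of_results": "no",
    "acceptance_criteria": "partial",
    "full_study_details": "no",
}

print(tally_ratings(example))
```

Keeping the three categories explicit, rather than collapsing "partial" into yes or no, mirrors the way the ratings are reported in the text.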
Results
Checklists for assessing the quality of BG system accuracy publications
Definitive checking of the validity of an accuracy study can be an extensive process. Many aspects of a BG system performance evaluation need to be considered when assessing the quality of the study design and the validity of findings.
Although differing from international guidelines in terms of layout and order of important features, the integrated information compiled in Supplementary Table S4 provides a comprehensive tool to establish the quality of published BG system evaluations. This list provides details of key issues of BG system evaluation study design to allow comprehensive examination of a publication's agreement with consensus guidelines, the quality of its study design, and the validity of results. It also provides a requirement list for a good accuracy comparison study design. Supplementary Table S4 represents a modified and extended version of the original checklist. 2 It includes additional data from ISO 15197 and broadens examination to include aspects such as independence/impartiality and the general applicability of findings by observing the number of different batches of strips used.
An indication of important deficiencies in study design can generally be obtained from a few key points, which are summarized in the two checklists. The two short simplified checklists (A [Table 1] and B [Table 2]) cover the major aspects readers need to consider when examining BG system accuracy and interference evaluation publications. The checklists include examining the details supplied on the reference method, the specimens used for comparison, details of the protocol, and display/acceptance criteria for results.
Evaluation of study designs using the accuracy checklist
Application of the accuracy checklist showed that only 20% (three of 15) of publications were of clear high quality, being in agreement with all, or in disagreement with only one, of the issues deemed important; 47% (seven of 15) were not in agreement with four or more of the quality checks. Only one publication 15 showed full concordance with ISO 15197. Key areas of nonagreement included 47% (seven of 15) using an inappropriate reference method for specific BG systems and 67% (10 of 15) not demonstrating an appropriate spread and range of results, ideally with at least defined percentages within specified concentration ranges. Sixty percent (nine of 15) were considered to not provide full study details. Thirty-three percent (five of 15) did not compare “like with like” samples, and 80% (12 of 15) were considered to use only partially appropriate acceptance criteria. All limitations in study designs identified using the accuracy checklist, and their frequency, are summarized in Supplementary Table S5.
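The percentages in this paragraph are simple proportions of the 15 publications examined with the accuracy checklist. As a small sketch (the dictionary labels are shorthand introduced here, and rounding to the nearest whole percent is an assumption about how the figures were derived), the counts can be converted as follows:

```python
def percent(count, total):
    """Return count/total as a percentage rounded to the nearest integer."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total)


TOTAL_ACCURACY_PUBLICATIONS = 15

# Counts as reported in the text; labels are shorthand for this sketch.
reported_counts = {
    "clear high quality": 3,
    "disagreement with four or more checks": 7,
    "inappropriate reference method": 7,
    "inadequate spread and range of results": 10,
    "incomplete study details": 9,
    "samples not like with like": 5,
    "partially appropriate acceptance criteria": 12,
}

for label, count in reported_counts.items():
    share = percent(count, TOTAL_ACCURACY_PUBLICATIONS)
    print(f"{label}: {count} of {TOTAL_ACCURACY_PUBLICATIONS} = {share}%")
```

With a sample this small, each publication moves a percentage by almost seven points, so the underlying counts are more informative than the percentages alone.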
Evaluation of study design using the interference checklist
Use of the interference checklist showed that 50% (five of 10) of publications incorporating interference studies were in good agreement with the quality checks. However, only one publication 27 was considered to demonstrate clear concordance with recommendations of CLSI EP7 and ISO/DIS 15197:2011. Key concerns were that 40% (four of 10) inappropriately presented results and 30% (three of 10) presented no information on BG system performance.
Sixty percent (six of 10) of studies were considered to interpret results only partially appropriately. Only 20% (two of 10) were considered to provide full study details. All limitations in study designs identified using the interference checklist, and their frequency, are summarized in Supplementary Table S6.
Discussion
Evaluation of accuracy and performance of BG systems is complex, and performing studies correctly in accordance with guidelines assumes greater importance as accuracy demands increase and tighter standards are introduced. In agreement with previous studies that concluded most evaluations are performed poorly and present questionable conclusions, 1,2 preliminary use of the checklists demonstrated that only a few publications adhered to international guidelines and recommendations for appropriate study design or fully addressed the many variables that can adversely affect the validity of conclusions. The easy-to-use checklists help raise awareness of the important issues involved, identify limitations in study design, and will aid readers in drawing clear and valid conclusions from the increasing number of publications in the BG system area.
Readers of BG system accuracy publications need to be aware of limitations in study design and protocols that can lead to differences in results being inappropriately attributed to BG system inaccuracy. The use of an appropriate reference method remains of paramount importance in ensuring a correct comparison. The reference method chosen for a specific BG system should be the one stated by the manufacturer to avoid, for example, the negative biases of approximately 3–8% reported between the Yellow Springs Instrument (Yellow Springs, OH) glucose oxidase-based and hexokinase-based reference methods. 21,32 This is commonly neither done nor acknowledged, and details of performing reference tests in duplicate on laboratory systems of known total error and traceability are commonly not provided. An appropriate spread of results from patient samples should also be demonstrated. Failing to compare “like with like” samples, to acknowledge potential differences between capillary and venous blood samples, and to ensure correct minimal timings between specimen collection and analysis also remain common limitations that could potentially lead to differences of up to 30%. Results also require analysis using appropriate recognized acceptance criteria and statistical methods, 33 and correct presentation.
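To illustrate what applying recognized acceptance criteria can look like in practice, the sketch below checks paired meter and reference results against limits of the kind used in ISO 15197. The numeric limits are not stated in this article and are an assumption of the sketch: the defaults correspond to the commonly cited ISO 15197:2003 system accuracy limits (within ±15 mg/dL below 75 mg/dL and within ±20% at or above 75 mg/dL, for at least 95% of results), and they should be verified against the standard itself before use. The function names and example values are hypothetical.

```python
def within_limits(meter, reference, cutoff=75.0, abs_limit=15.0, rel_limit=0.20):
    """Check one meter result against its reference result (both in mg/dL).

    Below `cutoff`, an absolute limit applies; at or above it, a relative one.
    Default limits are an assumption (see the accompanying text).
    """
    difference = abs(meter - reference)
    if reference < cutoff:
        return difference <= abs_limit
    return difference <= rel_limit * reference


def fraction_within(pairs, **limits):
    """Return the fraction of (meter, reference) pairs inside the limits."""
    if not pairs:
        raise ValueError("at least one (meter, reference) pair is required")
    hits = sum(within_limits(meter, ref, **limits) for meter, ref in pairs)
    return hits / len(pairs)


# Hypothetical (meter, reference) pairs in mg/dL, for illustration only.
pairs = [(60.0, 70.0), (90.0, 100.0), (130.0, 100.0), (250.0, 300.0)]

share = fraction_within(pairs)
print(f"{share:.0%} of results within limits (criterion assumed here: >= 95%)")

# Tighter limits can be passed explicitly, e.g. a 100 mg/dL cutoff with +/-15%.
tighter = fraction_within(pairs, cutoff=100.0, abs_limit=15.0, rel_limit=0.15)
print(f"{tighter:.0%} of results within the tighter limits")
```

Passing the limits as parameters makes explicit which acceptance criteria were applied, one of the details that the checklists ask readers to look for. The comparison is only meaningful if the reference values come from the method stated by the manufacturer, for the reasons given above.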
Conclusions
It must be concluded that it is not sufficient to read just the conclusion of BG system accuracy publications. Each publication must be examined in more detail to identify any variation from recommended study designs and to establish its quality and the validity of the conclusions drawn. The checklists proposed in this study provide aids to interpreting studies on the performance of BG systems, allowing selection of valid, reliable, transparent, and comparable results and conclusions.
Future studies extending examination to additional publications and adaptation of checklists in light of any modifications to international standards are necessary to confirm the generality and validity of findings.
Footnotes
Acknowledgments
This study was funded by an educational grant from Roche Diabetes Care, Mannheim, Germany.
Author Disclosure Statement
No competing financial interests exist.
