Abstract
Background:
Diabetes alert dogs (DADs) are growing in popularity as an alternative method of glucose monitoring for individuals with type 1 diabetes (T1D). Only a few empirical studies have assessed DAD accuracy, with inconsistent results. The present study examined DAD accuracy and variability in performance in real-world conditions using a convenience sample of owner-report diaries.
Method:
Eighteen DAD owners (44.4% female; 77.8% youth) with T1D completed diaries of DAD alerts during the first year after placement. Diary entries included daily BG readings and DAD alerts. For each DAD, percentage hits (alert with BG ≤ 5.0 or ≥ 11.1 mmol/L; ≤90 or ≥200 mg/dl), percentage misses (no alert with BG out of range), and percentage false alarms (alert with BG in range) were computed. Sensitivity, specificity, positive likelihood ratio (PLR), and true positive rates were also calculated.
Results:
Overall comparison of DAD Hits to Misses yielded significantly more Hits for both low and high BG. Total sensitivity was 57.0%, with increased sensitivity to low BG (59.2%) compared to high BG (56.1%). Total specificity was 49.3% and PLR = 1.12. However, high variability in accuracy was observed across DADs, with low BG sensitivity ranging from 33% to 100%. The proportion of DADs achieving ≥ 60%, 65%, and 70% true positive rates was 71%, 50%, and 44%, respectively.
Conclusions:
DADs may be able to detect out-of-range BG, but variability across DADs is evident. Larger trials are needed to further assess DAD accuracy and to identify the factors that influence DAD accuracy in BG detection.
Self-monitoring of blood glucose (SMBG) is critical in many aspects of diabetes management, including the detection of hypoglycemia, calculating insulin bolus doses, and tracking overall glucose control. For daily SMBG, most individuals with type 1 diabetes (T1D) use glucose meters, with a growing minority using continuous glucose monitoring (CGM) devices.1 Although beneficial for diabetes management, these technologies are associated with a degree of burden, including finger sticks, sensor insertion, equipment care and cost, and time commitment. In addition, parents of children with T1D bear the emotional burden of worrying about hypoglycemic episodes, especially during the night when they are less able to monitor their child’s BG.2 For these reasons, many individuals with T1D and parents of children with T1D are turning to diabetes alert dogs (DADs) as a less burdensome and intrusive method of BG monitoring. Individual testimonials in popular media describe DADs as highly effective at detecting both hypoglycemia and hyperglycemia. These anecdotal stories are almost universally positive, with reports that DADs accurately detect hypoglycemia at least 90% of the time and, in some instances, are more accurate than BG meters and other current diabetes technology.3,4
Only a few scientific studies have attempted to test DAD accuracy and there are currently no industry standards for DAD training or performance. Our group conducted a survey of 36 DAD owners, 13 adults with T1D and 23 parents of children with T1D.5 These DAD owners reported varying levels of accuracy, with 36% reporting DAD alerts to every occurrence of hypoglycemia over the past month (100% accuracy) while another 36% reported that hypoglycemia occurred without an alert at least once each week. In terms of clinical and psychosocial benefits, respondents indicated improvements in glycosylated hemoglobin levels, frequency of severe hypoglycemia, fear of hypoglycemia, and quality of life after DAD placement. Two recent studies tested DAD accuracy under highly controlled experimental conditions, using perspiration samples taken from adults with T1D when BG levels were normal versus hypoglycemic. However, these studies produced contradictory results, with one concluding that trained DADs accurately discriminated hypoglycemia,6 and the other finding that DADs were not accurate.7 Given the clinical implications for people with diabetes, more research is needed to understand DADs’ ability to monitor BG levels and the possibility raised by our owner survey that accuracy across individual DADs may vary.
In addition to controlled experimental trials in laboratory settings, it is also important to test accuracy in natural living conditions where individuals rely on their DADs to alert in response to BG extremes. Under real-world conditions, the use of DADs is a far more complex interactive process. Efficacy not only depends on the ability of the DAD to detect and alert to BG extremes, but also on the ability of the owner to accurately recognize the alert behavior. In another recent study, 8 DAD owners used “blinded” CGM for 1 week while recording DAD alerts.8 Results showed that DADs detected 36% of BG events < 70 mg/dl, with minimal predictive value of DAD alerts signaling hypoglycemia. However, because of the short duration of this study of only 8 individuals, there were only a total of 45 hypoglycemic episodes to analyze, with an average of 5.6 episodes per participant. Moreover, this study did not have enough data to address the question of whether DAD performance varied across individual dogs.
The purpose of the present study was to examine DAD accuracy and variability in real-world conditions using a convenience sample of diaries kept by DAD owners following placement of a dog in the home. In these diaries, DAD owners recorded daily BG meter readings along with the occurrence of DAD alerts, which allowed a comparison of the concordance of extreme BG readings and alert behaviors. The hypothesis of the study was that signal detection analysis would show that DAD alerts were generally accurate but, based on our previous survey, that accuracy would vary across individual DADs.
Methods
Participants
Participants were adult DAD owners with T1D and parents of children with T1D. The study tested DADs from one organization to control for some of the numerous variables that can potentially affect DAD performance, such as dog breed and training procedures across organizations. In this study all DADs were Labrador Retrievers who had been bred, raised, trained, and placed by the same organization. This organization utilizes a training and placement procedure in which puppies undergo several months of glucose detection training at their facility. At approximately 4-5 months of age, DADs are placed with their owners, where training continues with support and periodic home visits by training staff.
As part of the training organization’s quality control practices, DAD owners completed daily diaries for a period of time during the first year after home placement. These diaries required owners to record the date, time of day, all daily SMBG readings, occurrence of DAD alerts (yes/no), and a description of the alert behaviors. Owners were encouraged to record in diaries for several weeks or longer. The data used in the study were derived from recently completed diaries that had been returned to the training organization. Before sending diaries to the laboratory, the training organization deidentified all data; therefore, the only information available for participants was age and gender. Information on individual DAD characteristics, including current age and time since home placement, was available for 17 of the 18 participants in the final sample.
Diaries were submitted for 27 DAD owners, four of whom were excluded due to too few entries (< 30 total) or low BG readings (< 4 entries with BG ≤ 5.0 mmol/L or 90 mg/dl). This reduced the likelihood of over- or underestimating DAD accuracy because of inadequate sample size (eg, a single low would yield a score of either 0% or 100% accuracy). Of the remaining 23 participants, four were excluded from analysis due to incomplete diaries (ie, only recording entries when the DAD alerted instead of at each BG check). One additional participant was excluded due to recording diary data in an idiosyncratic manner that made it infeasible to compare to other data. The final sample consisted of 18 DAD owners (44.4% female; 77.8% children), with adults ranging in age from 40 to 47 years (mean = 44.3 ± 4.4, median = 44) and children ranging in age from 2 to 15 years (mean = 9.1 ± 4.9, median = 8.5). DAD age ranged from 113 to 1437 days (mean = 237.2 ± 318.8, median = 134). Length of DAD placement in the homes ranged from 1 to 328 days (mean = 51.0 ± 83.5, median = 22). With the exception of two DADs who had been placed for approximately 6 and 12 months, the DAD had been in the home less than 3 months.
Diary Data
Number of diary entries ranged from 34 to 505 (mean = 167.0 ± 146.3, median = 108) collected over a time period that ranged from 5 to 134 days (mean = 39.0 ± 35.0, median = 27). Alert behaviors were categorized, which yielded 26 single-word descriptors (eg, “paw” or “barked”; see Table 1 for a complete list). In cases where DADs performed two behaviors to alert to the same BG reading, both were recorded. Number of different DAD alert behaviors reported ranged from 3 to 20 per participant (mean = 10.6 ± 5.2, median = 9.5).
Table 1. DAD Alert Behaviors, Frequencies, and True Positive Rates.
Data Analysis
Analysis of DAD accuracy presents challenges because alerts are dichotomous data (alert/no alert) and, in contrast, BG readings are continuous data ranging from 1.1 to 33.3 mmol/L (20 to 600 mg/dl). For this reason, an approach based on signal detection theory was used to categorize diary entries.9,10 First, a target BG range was set from 5.0 mmol/L to 11.1 mmol/L (90 mg/dl to 200 mg/dl). A wider hypoglycemic range was selected based on clinical considerations, including recommendations that individuals with diabetes not drive without self-treatment when BG is ≤ 5.0 mmol/L (90 mg/dl),11 as well as ensuring that alerts occurring in response to falling BG were captured.
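The paired thresholds above can be reproduced with a short unit-conversion sketch. The conversion factor (about 18.016 mg/dl per mmol/L, from the molar mass of glucose) and the function names are assumptions added for illustration; the article reports only the paired values.

```python
# Approximate conversion factor for glucose (molar mass about 180.16 g/mol).
# This factor is an assumption added here; the article gives only paired values.
MGDL_PER_MMOLL = 18.016

# Target range boundaries stated in the text (mmol/L).
LOW_MMOLL = 5.0
HIGH_MMOLL = 11.1


def mmol_to_mgdl(mmol_per_l: float) -> float:
    """Convert a glucose concentration from mmol/L to mg/dl."""
    return mmol_per_l * MGDL_PER_MMOLL


def in_target_range(mmol_per_l: float) -> bool:
    """True when BG lies strictly inside the target range.

    Readings at or below 5.0 mmol/L, or at or above 11.1 mmol/L,
    count as out of range, matching the thresholds in the text.
    """
    return LOW_MMOLL < mmol_per_l < HIGH_MMOLL


if __name__ == "__main__":
    for value in (1.1, 5.0, 11.1, 33.3):
        print(value, "mmol/L is about", round(mmol_to_mgdl(value)), "mg/dl")
```

Rounded to whole numbers, the four boundary values in the text map onto the mg/dl figures quoted alongside them.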
Entries were then categorized into one of four accuracy classifications determined by BG value (within target range or not) and occurrence of DAD alert (yes/no):
Hits (BG outside target range; DAD alerted). To account for reports that DADs often signal owners ahead of BG extremes, entries with alerts ≤ 20 minutes before an out-of-range BG were also categorized as hits.
Misses (BG outside of target range; no DAD alert ≤ 20 minutes prior to entry).
False alarms (BG within target range; DAD alerted with no BG excursions ≤ 20 minutes after the entry).
Correct rejection (BG within target range; no DAD alert).
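The four classifications above can be sketched in code as follows. This is a minimal illustration rather than the authors' procedure: the `Entry` structure and function names are hypothetical, and it assumes that when an alert precedes an out-of-range reading by 20 minutes or less, both the alert entry and the out-of-range entry are labeled as hits (the text does not specify how such pairs were counted).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

LOW_MGDL = 90     # BG at or below this is out of range (from the text)
HIGH_MGDL = 200   # BG at or above this is out of range (from the text)
WINDOW = timedelta(minutes=20)  # look-ahead window for early alerts


@dataclass
class Entry:
    """One diary row (hypothetical structure for illustration)."""
    time: datetime
    bg_mgdl: float
    alerted: bool


def out_of_range(bg_mgdl: float) -> bool:
    return bg_mgdl <= LOW_MGDL or bg_mgdl >= HIGH_MGDL


def classify(entries: list[Entry]) -> list[str]:
    """Label each entry as hit, miss, false_alarm, or correct_rejection."""
    labels = []
    for e in entries:
        if out_of_range(e.bg_mgdl):
            # Hit if the dog alerted at this entry or up to 20 minutes earlier.
            alerted_in_window = any(
                o.alerted and timedelta(0) <= e.time - o.time <= WINDOW
                for o in entries
            )
            labels.append("hit" if alerted_in_window else "miss")
        elif e.alerted:
            # In-range alert: hit only if an excursion follows within 20 minutes.
            excursion_follows = any(
                out_of_range(o.bg_mgdl)
                and timedelta(0) < o.time - e.time <= WINDOW
                for o in entries
            )
            labels.append("hit" if excursion_follows else "false_alarm")
        else:
            labels.append("correct_rejection")
    return labels
```

For example, an in-range alert at 12:00 followed by a reading of 65 mg/dl at 12:15 would label both entries as hits under this assumption, whereas an in-range alert with no excursion in the following 20 minutes would be a false alarm.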
Based on these classifications, we calculated five different measures to summarize DAD performance:
Overall sensitivity was computed as the percentage of out-of-range BG entries the DAD alerted to (hits) compared to the overall number of out-of-range BGs (hits + misses). Sensitivity was also calculated separately for low and high BG excursions.
Overall specificity was computed as the percentage of in-target range BGs the DAD did not alert to (correct rejections) compared to the overall number of in-range BGs (correct rejections + false alarms).
True positive rate (also known as “positive predictive value”) was computed as the percentage of hits relative to the overall number of DAD alerts (hits + false alarms).
Overall accuracy was computed as the percentage of accurate DAD responses (hits + correct rejections) relative to the total number of categorized entries (hits + correct rejections + false alarms + misses).
Positive likelihood ratio (PLR) was computed as the ratio of sensitivity to (1 – specificity). Values > 1.00 signify that DAD alerts are more likely to be associated with out-of-range BG than with in-target range BG.
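The summary measures defined above can be computed from the four category counts as in the sketch below. The function name and the example counts are hypothetical and are not taken from the study data; the formulas follow the definitions in the text, including the text's use of "true positive rate" for hits / (hits + false alarms).

```python
def summarize(hits: int, misses: int, false_alarms: int,
              correct_rejections: int) -> dict[str, float]:
    """Compute the summary measures from signal detection counts.

    Assumes every denominator is non-zero, except that a specificity
    of exactly 1.0 yields an infinite PLR.
    """
    total = hits + misses + false_alarms + correct_rejections
    sensitivity = hits / (hits + misses)
    specificity = correct_rejections / (correct_rejections + false_alarms)
    # Called "true positive rate" in the text (positive predictive value).
    true_positive_rate = hits / (hits + false_alarms)
    accuracy = (hits + correct_rejections) / total
    if specificity < 1.0:
        plr = sensitivity / (1.0 - specificity)
    else:
        plr = float("inf")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "true_positive_rate": true_positive_rate,
        "accuracy": accuracy,
        "plr": plr,
    }


if __name__ == "__main__":
    # Hypothetical counts for a single dog, for illustration only.
    for name, value in summarize(6, 4, 2, 8).items():
        print(f"{name}: {value:.2f}")
```

Sensitivity for low and high BG separately would use the same sensitivity formula, restricted to entries below or above the target range.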
Results
Table 2 lists the frequencies and proportions of hits, misses, false alarms, and correct rejections for individual DADs and the total sample. Across DADs, hits were the most frequent category for the sample (mean = 38.2%), but this varied between individual DADs, ranging from 20% to 59%. Table 3 shows summary accuracy measures for both individual DADs and the total sample. Collapsing across participants, DADs accurately categorized more than half of BG readings, with a total overall accuracy of 54.4%. Total sensitivity was 57.0%, with DADs appearing to be more sensitive to low BG values (59.2%) than high BG values (56.1%). Total specificity was 49.3%, total true positive rate was 69.1%, and overall PLR was 1.12. This indicates that while the frequency of false alarms outnumbered correct rejections, the majority of DAD alerts corresponded to out-of-range glucose values.
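As a consistency check, the overall PLR can be recomputed from the reported total sensitivity and specificity using the definition given in Data Analysis. The inputs are the rounded percentages reported above, so the result is approximate.

```latex
\[
\mathrm{PLR}
  = \frac{\text{sensitivity}}{1 - \text{specificity}}
  = \frac{0.570}{1 - 0.493}
  = \frac{0.570}{0.507}
  \approx 1.12
\]
```

Because specificity is correct rejections / (correct rejections + false alarms), a specificity below 50% (here 49.3%) implies that false alarms outnumbered correct rejections in the total sample.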
Table 2. Frequency of DAD Signal Detection Categories.
Table 3. DAD Accuracy Measures.
These overall results, however, should be interpreted in the context of highly variable individual DAD performance. Sensitivity for all out-of-range BG readings ranged from 39.0% to 73.7% across individual DADs, low BG sensitivity from 33.3% to 100.0%, and high BG sensitivity from 29.4% to 76.9%. True positive rate ranged from 37.5% to 94.1%, while specificity ranged from 0.0% to 96.4%. Overall accuracy ranged from 30.9% to 73.8%. Table 4 displays the number of DADs that achieved scores of ≥ 60%, 65%, and 70% in overall sensitivity, low/high sensitivity, true positive rate, and overall accuracy. As the table indicates, half of the DADs performed at ≥ 60% low BG sensitivity, with far fewer DADs performing this well in high BG sensitivity. PLR values ranged from 0.41 to 10.93, with 3/18 DADs at scores > 1.00.
Table 4. Number of DADs Achieving ≥ 60%, 65%, and 70% Accuracy.
Table 1 displays the type of alert behaviors reported by owners, the reported frequency of each alert behavior, the number of participants who reported each behavior, and the true positive rate for each behavior. Examining only those behaviors with ≥ 30 occurrences, the five behaviors with the highest true positive rate were sniffing (86.7%), scratching (82.7%), jumping (81.8%), pacing (79.3%), and other bodily contact (77.4%). The five behaviors with the lowest true positive rate were staring (54.0%), disobedience (56.5%), whining (60.3%), nose nudging (61.4%), and yawning (66.0%).
Discussion
This is the first study to compare recorded DAD alerts to actual BG readings in a real-world setting over an extended period of time averaging just over a month (range 5 to > 120 days). When data were analyzed across participants, the overall accuracy rate was 54%, with a true positive rate of 69%. Overall sensitivity was slightly higher for low BG readings compared to high BG readings. However, when sensitivity for low BG readings is examined closely, the rate of “missed” hypoglycemic readings was 40.8%. Sample PLR indicated that DAD alerts corresponded more frequently with out-of-range BG. Although CGM accuracy studies rarely report signal detection statistics, this value was substantially lower than results reported in one previous study of CGM accuracy.12 Taken together, the results across all participants do not support the belief that DADs are more accurate than diabetes technology. These findings, that DADs as a group are above 50% accuracy but not more accurate than glucose monitoring devices, are in alignment with the recently published article comparing alerts to CGM readings.8
A different picture emerged, however, when data were analyzed separately for each individual DAD, showing a great deal of variability in detection accuracy. True positive rates ranged from 37% to 94% and the proportion of DADs achieving ≥ 60%, 65%, and 70% true positive rates was 71%, 50%, and 44%, respectively. More DADs showed ≥ 60%, 65%, and 70% accuracy in terms of sensitivity to low BG values as compared to sensitivity to high BG. PLR was generally low, with 83% of DADs at values < 1.0, but a few participants reported accuracy equivalent to the CGMs in the Wentholt et al study.12 These findings suggest that some DADs are more accurate than others, especially at low BG detection, and that accuracy rates may be comparable to current diabetes technology. However, given this variability in performance, it appears to be important to objectively test the accuracy of individual DADs prior to home placement.
Based on individual variability of accuracy across DADs, it is important to consider potential factors that contribute to these observed differences. In this study, all DADs were Labrador Retrievers, bred and trained by one organization in an attempt to control for breed, genetic factors, and training technique, all of which vary greatly across different training organizations and programs.13 As part of their placement procedure, this organization places DADs in homes after a few months of initial training, then makes periodic visits to the home to continue training over the first year. Thus, it could be argued that the DADs tested in this study were younger dogs with limited training, which may be associated with lower accuracy; however, this does not address the observed variability in accuracy. One contributor could simply be that dogs differ in innate ability or level of skill in BG detection, even within the same breed. Another possibility, indicated by this data set, may be the specific alert behavior the DAD exhibits and/or the DAD owner utilizes as an alert signal. Certain DAD behaviors (eg, pawing) appeared to be associated with higher true positive rates while other behaviors (eg, staring) were associated with lower accuracy. Another important factor to consider is the ability of the DAD owner to recognize true alert behaviors, and to distinguish these from DAD behaviors that are not related to BG values. DAD owners may also vary in skills related to training, such as appropriate reinforcement for alert behaviors. Studies in highly controlled experimental settings have the benefit of bypassing the influence of such owner skills;6,7 however, findings may not necessarily transfer to real-world settings.
Additional methodological limitations exist in this study. The data were based on self-reported diaries and nonmasked glucose data, which one could argue might bias the findings. For example, DAD owners could retroactively note an alert behavior based on an out-of-range SMBG value. However, these results are not indicative of biased reporting that makes DADs appear to be more accurate, as rates were generally lower than owners’ reported beliefs about their DADs’ accuracy in a previous study.5 Another methodological problem is that the available BG readings were discrete variables recorded either in response to an alert (or other signals such as physical symptoms) or as part of routine self-monitoring. This means that we could not obtain an accurate measure of “missed” out-of-target BG readings or “correctly rejected” in-target readings, which could lead to overestimation of sensitivity and underestimation of specificity. To remedy this shortcoming, we are currently conducting a study using masked CGM to assess DAD accuracy in the real world. Finally, this study was observational in nature and not truly experimental, so it cannot systematically assess whether DADs were truly performing better than a group of untrained dogs would perform. Related to this, though a strength of the study is the high level of control afforded by use of a single training organization, results may not be generalizable to alternative DAD training and breeding procedures.
Conclusions
This study provides preliminary evidence that some DADs are relatively accurate at detecting BG fluctuations outside of the target range, especially low BG levels. In contrast, other DADs showed very poor accuracy. Importantly, these findings also suggest that DAD accuracy likely reflects a complex process that can be affected by numerous factors, including the interactions between the DAD and its owner. From a clinical perspective, it will be important to develop objective procedures to assess DAD accuracy to ensure that individuals with diabetes have the information they need to determine the degree to which their DADs’ alerts are reliable and valid. In the current market and industry of medical detection dogs, there are no regulatory guidelines for determining DAD accuracy and, as the number of people using DADs continues to grow, evaluation and ultimate standardization of these factors are essential.
Footnotes
Abbreviations
BG, blood glucose; CGM, continuous glucose monitor; DAD, diabetes alert dog; PLR, positive likelihood ratio; SMBG, self-monitored blood glucose; T1D, type 1 diabetes.
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: LGF has previously served as a consultant for Dexcom, Inc. All other authors report no disclosures.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Research was supported by NIH-NIDDK grant 1R21DK099697-01.
