Digital Diabetes Data and Artificial Intelligence: A Time for Humility Not Hubris

Abstract

In the future, artificial intelligence (AI) will have the potential to improve outcomes in diabetes care. With the creation of new sensors for physiological monitoring and the introduction of smart insulin pens, novel data relationships based on personal phenotypic and genotypic information will lead to the selection of tailored, effective therapies that will transform health care. However, decision-making processes based exclusively on quantitative metrics that ignore qualitative factors could create a quantitative fallacy. Difficult-to-quantify inputs into AI-based therapeutic decision-making processes include empathy, compassion, experience, and unconscious bias. Failure to consider these “softer” variables could lead to important errors. In other words, that which is not quantified about human health and behavior is still part of the calculus for determining therapeutic interventions.

Keywords

artificial intelligence; big data analytics; quantitative fallacy; human behavior; clinical decision-making

Torture the data and it will confess to anything.

—Ronald Coase

Successful Diabetes Treatment Needs Data

Discussion of the use of artificial intelligence (AI) in health care is ubiquitous in the medical and lay press, reflecting the perception that it has enormous potential to reduce the personal and global burden of many long-term medical conditions. Currently, diabetes appears to be the poster child for the application of AI in health care for a number of reasons.¹

Worldwide, the number of adults and children developing diabetes continues to rise in parallel with global access to smartphone technologies.

On a daily basis, personal data from people living with diabetes are continuously created and logged.

Although the main variable of interest is glucose, with the rise in consumer tracking technologies, glucose data are being supplemented with additional information related to nutrition, physical activity, and sleep.

With the increasing availability of additional data sources, including sensor technologies for physiological monitoring, smart insulin pens, social media, and records of internet searches, the diabetes data pool will continue to grow.^2,3 Moreover, data from comorbid conditions (eg, hypertension and cardiac arrhythmias) plus information from screening tests for complications (eg, retinopathy) are also adding to this “big data” resource.

The anticipated value of this torrent of data is that it can be analyzed and converted into patterns that yield actionable information, a clear opportunity for AI.⁴ For clinicians and people with diabetes, examples of actionable information include early prediction of severe hypoglycemia (not just in those with hypoglycemia unawareness) and identification of the most opportune time for insulin initiation and optimization in type 2 diabetes. Existing large population data sets have already been used to predict the onset of type 2 diabetes, and these predictions appear to perform better than classical diabetes risk prediction algorithms.⁵ The use of AI to analyze big datasets composed of many data streams (not all of which are human sensor data; some may be behavioral or geographic in origin) is already becoming a reality.⁶ It is important to note that this type of big data is not analyzed as if it were presented on a very large spreadsheet, because such data are often unstructured (eg, pictures, phone messages, video, email, and text messages) and not amenable to capture, storage, and management by commonly used software tools; analyses of big data typically require distributed computation over a cluster of computers.^7,8 The process of assembling a highly detailed set of phenotypic and genotypic data to obtain the most appropriate treatment for individuals with a specific combination of traits is the basis of precision medicine.⁹ There is a growing belief that novel data relationships based on phenotypic and genotypic information will lead to powerful predictions and accurate selections of tailored therapies that will transform health care in a very positive way.

The diabetes clinic of the future is likely to be unrecognizable compared with its current format.¹⁰ The anticipated promise from the triumvirate of (1) the internet of medical things, (2) big data, and (3) AI, analyzed by way of cloud computing, is being welcomed as necessary, inevitable, and beneficial.¹¹ However, this paradigm may instead turn out to be a modern-day “quantitative fallacy.”

Quantitative Fallacies

A quantitative fallacy is a flawed decision-making process that is based exclusively on quantitative metrics and ignores qualitative factors. The best-known example is the eponymous McNamara fallacy, named after Robert McNamara, the US Secretary of Defense during the Vietnam War, and summarized as “if it cannot be measured, it is not important.”¹² A quantitative fallacy develops through four erroneous steps¹³ (Table 1).

Table 1.

Four Steps for Creating a Quantitative Fallacy.

1. Measure whatever can easily be measured
2. Disregard that which cannot be easily measured or give it an arbitrary quantitative value
3. Presume that what cannot be measured easily is not important
4. Believe that what cannot be easily measured really does not exist

In health care, most previous examples of this flawed type of decision process have been based on the mistaken belief that all of clinical practice can be quantified.¹⁴ In the context of big data analytics and AI in diabetes care, difficult-to-quantify inputs into therapeutic decision-making processes include empathy, compassion, understanding, previous experiences, and unconscious bias.¹⁵ Failure to consider these so-called “softer” variables could lead to important errors when AI is used to solve clinical problems. In other words, that which is not quantified about human health and behavior is still part of the calculus for determining therapeutic interventions. For example, continuous care by the same doctor over time is associated with greater patient satisfaction, improved health promotion, increased adherence to medication, reduced hospital use, and a reduced risk of premature death.¹⁶ The reasons for these beneficial effects of continuity of care are likely to be multifactorial, but it is also noteworthy that doctors tend to overestimate their effectiveness when consulting with patients they do not know and to underestimate their effectiveness when consulting with patients they know.¹⁷ It remains to be determined whether a comparable continuity effect will apply to AI-delivered care in the future.

Data Sources: Quantitative and Qualitative

Sources of big data for diabetes include both (1) structured data from electronic health records, population registries, clinical trials, and biometric data from an increasingly wide array of physiological and geospatial sensors, and (2) unstructured data from medical images, photos, audio and video recordings, social media content, and consumer search data collated with a smartphone. The diversity of these health care data sources can create methodological challenges for data integration. To date, big data analytics, machine learning, and AI are in their infancy with respect to providing software-generated decision support, but over time these sources of therapeutic recommendations are likely to become increasingly embedded in the health care system. As discussed earlier, unwavering adherence to the mantra of an AI “solution” for diabetes care based solely on big data analytics (ie, use of software that learns from patterns in the data) has the potential to create a digital diabetes fallacy if there is sole reliance on the measurable. In addition, there are many methodological challenges to creating useful quantitative datasets, including (1) ensuring data quality, especially from electronic health record sources, (2) maintaining data consistency, and (3) standardizing outcomes data from clinical trials. Moreover, the process of clinical decision making itself is almost never recorded.¹⁸ Therefore, it is important to consider the qualitative factors affecting decision-making algorithms, which are at present difficult to capture but important for diabetes care (Table 2).

Table 2.

Difficult-to-Quantify Factors That Can Potentially Influence Artificial Intelligence Algorithms.

1. Low-quality quantitative data
2. Language
3. Health beliefs due to cultural, racial, or ethnic influences

Low-Quality Quantitative Data

Quantitative biomedical data can be classified according to their quality. Medical decisions based on artificial intelligence depend on the quality of the input data; in other words, poor-quality quantitative data can lead to poor decision making. A recent review of the health care quality literature identified 96 terms used to describe data quality concepts.¹⁹ The six most widely recognized dimensions of biomedical data quality are presented in Table 3.²⁰

Table 3.

Six Dimensions of Data Quality.

Type of dimension	Definition
Relevance	Degree to which the information meets the needs of users
Accuracy	Degree to which the information correctly describes what it was designed to measure
Timeliness	Delay between the time to which the information pertains and when the information becomes available
Accessibility	Ease with which the information can be obtained
Interpretability	Availability of supplementary information and metadata necessary to interpret the information
Coherence	Degree to which a set of information can be combined with other information
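
As an illustration of how two of these dimensions might be operationalized for sensor data, the sketch below flags glucose readings on timeliness (the delay between measurement and availability) and on a crude plausibility check used as a proxy for accuracy. The record layout, the 15-minute delay limit, and the 40 to 400 mg/dL plausibility range are hypothetical values chosen for illustration and are not taken from the cited sources; plausibility is only a weak stand-in for accuracy, which strictly requires comparison with a reference method.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical thresholds, chosen for illustration only.
MAX_DELAY = timedelta(minutes=15)   # assumed timeliness limit
PLAUSIBLE_MG_DL = (40.0, 400.0)     # assumed plausibility range


@dataclass
class GlucoseReading:
    measured_at: datetime   # time to which the information pertains
    available_at: datetime  # time the value reached the analysis system
    value_mg_dl: float


def quality_flags(reading: GlucoseReading) -> list[str]:
    """Return the names of the data quality checks this reading fails."""
    flags = []
    # Timeliness: too long a delay, or a negative delay (a timestamp error).
    delay = reading.available_at - reading.measured_at
    if delay < timedelta(0) or delay > MAX_DELAY:
        flags.append("timeliness")
    # Accuracy (weak proxy): value outside the assumed plausible range.
    low, high = PLAUSIBLE_MG_DL
    if not low <= reading.value_mg_dl <= high:
        flags.append("accuracy")
    return flags


if __name__ == "__main__":
    t0 = datetime(2018, 8, 1, 8, 0)
    readings = [
        GlucoseReading(t0, t0 + timedelta(minutes=5), 112.0),
        GlucoseReading(t0, t0 + timedelta(hours=6), 95.0),
        GlucoseReading(t0, t0 + timedelta(minutes=5), 1200.0),
    ]
    for r in readings:
        print(r.value_mg_dl, quality_flags(r))
```

Checks of this kind only capture the easily measurable dimensions; relevance and interpretability still depend on human judgment about what the data are for.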

The potential limiting factors of big data have been summarized into four features known as the four Vs: volume, velocity, variety, and veracity.²¹ Limitations in these areas can lead to misinterpretation of data sources. For example, the hype created at the onset of the digital revolution suggested that real-world data from individuals based on their online activities including social media could supplant traditional approaches to public health. It was suggested that identification of an influenza epidemic or an adverse drug effect could be determined by counting web searches of related topics. This proved to be incorrect as a standalone method; however, this approach can provide useful supplemental information.^22-25 In day-to-day clinical practice, patient-generated data are invariably unstructured and highly context-dependent, and the impact of illness on an individual’s behavior and cognitive processing has been underappreciated.²⁶ Going forward, it will be necessary to find a way to combine quantitative data from traditional health systems with qualitative patient-generated data.

The use of big data analytics to form conclusions also carries risks of mishandling the data or of having too little high-quality data to support robust conclusions. Fallacies in the generation of quantitative data, arising from research design, sampling and instrumentation, statistical analysis, and interpretation, can result in unrecognized knowledge gaps^27,28 (Table 4).

Table 4.

Potential Risks of Gaps in the Analysis of Quantitative Data.

• Erroneous stratification of individuals into subgroups, eg, misclassification of diabetes type.²⁹
• Variable effects of an illness on the data, which can change over time.³⁰
• Failure to consider the impact of the prevailing glucose level on patient-generated physical, psychological, and behavioral responses, eg, making assessments during or following a hypoglycemic event.³¹
• Exclusion bias. Absence of data from individuals not using social media can skew the interpretation. Wealthier and younger communities may contribute disproportionate amounts of big data, and there may be geographical bias (eg, urban versus rural populations contributing toward big data).³²
• Drawing conclusions from novel big data sets without clinical interpretation or statistical governance could lead to model overfitting and belief in spurious relationships between data groups.³³
• Patient behavior including the generation of factitious data.³⁴
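
The overfitting risk listed above can be illustrated with a deliberately extreme toy example, which is not drawn from the cited sources. In the sketch below, the outcome is pure noise with respect to the predictor, so any apparent relationship is spurious. A model that memorizes every training point (a 1-nearest-neighbor lookup) is compared with a model that simply predicts the training mean. The memorizer achieves zero training error, which could be mistaken for a strong relationship, yet it performs worse on held-out data. The sample sizes, the data-generating process, and the random seed are arbitrary assumptions.

```python
import random


def make_noise_data(n: int, rng: random.Random) -> list[tuple[float, float]]:
    """Predictor x and outcome y are independent: any 'pattern' is spurious."""
    return [(rng.uniform(0, 100), rng.gauss(0, 1)) for _ in range(n)]


def fit_memorizer(train):
    """1-nearest-neighbor 'model': returns the y of the closest training x."""
    def predict(x: float) -> float:
        nearest = min(train, key=lambda point: abs(point[0] - x))
        return nearest[1]
    return predict


def fit_mean(train):
    """Baseline model: always predicts the mean training outcome."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def mse(model, data) -> float:
    """Mean squared error of a model's predictions on (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(42)
train = make_noise_data(1000, rng)
test = make_noise_data(1000, rng)

memorizer = fit_memorizer(train)
baseline = fit_mean(train)

print(f"memorizer: train MSE {mse(memorizer, train):.2f}, test MSE {mse(memorizer, test):.2f}")
print(f"baseline:  train MSE {mse(baseline, train):.2f}, test MSE {mse(baseline, test):.2f}")
```

Only the held-out comparison reveals that the memorizer has learned noise, which is why statistical governance such as validation on independent data matters for novel big data sets.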

Mishandling of data also relates to information privacy. A successful doctor-patient relationship is based on the medical practitioner’s ability to keep information confidential, that is, on trustworthiness. For AI, data are increasingly being deidentified, which works at a population level, but personalized decision support requires other safeguards to protect privacy.³⁵
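
One common deidentification technique is keyed pseudonymization, in which a direct identifier is replaced by a keyed hash so that records from the same person can still be linked without exposing the identifier. The sketch below uses HMAC-SHA-256 from the Python standard library; the field names and the key are hypothetical, and this is not presented as the approach used by any system discussed in the article. Pseudonymization alone is not anonymization: the remaining fields (dates, locations, rare diagnoses) may still permit reidentification, which is one reason additional safeguards are needed for personalized decision support.

```python
import hashlib
import hmac


def pseudonymize(record: dict, secret_key: bytes, id_field: str = "patient_id") -> dict:
    """Replace a direct identifier with an HMAC-SHA-256 pseudonym.

    The same identifier and key always yield the same pseudonym, so records
    can be linked; without the key the mapping cannot practically be recomputed.
    """
    cleaned = dict(record)  # copy so the caller's record is not modified
    raw_id = str(cleaned.pop(id_field)).encode("utf-8")
    cleaned["pseudonym"] = hmac.new(secret_key, raw_id, hashlib.sha256).hexdigest()
    return cleaned


if __name__ == "__main__":
    # Hypothetical key; in practice a securely stored random key held by a data custodian.
    key = b"example-key-held-by-data-custodian"
    visit = {"patient_id": "MRN-0012345", "hba1c_percent": 7.9}
    print(pseudonymize(visit, key))
```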

Language

For any AI system to work efficiently and effectively, it will need to understand the nuances of the language of health care from the perspective of people with diabetes and not simply the jargon favored by clinicians.³⁶ Potential confusion could arise with homophones (words that sound the same but have different meanings and spellings, such as cabbage and CABG) and homographs (words or abbreviations that are spelled the same but have different meanings). For example, one man’s ED (emergency department) is another’s erectile dysfunction, and a verbal order for K therapy in the emergency department can result in administration of either potassium or vitamin K. Within a single language, there are also dialectal differences: what would an AI system make of the common Scottish vernacular use of “bampots,” “bevvies,” and “bairns,” or the use of a “stookey” for a broken arm? There is already abundant evidence that many patients encounter barriers to understanding health-related information and that materials and other content created by clinicians often fail in terms of understandability.^37,38 Language barriers can also contribute to health disparities. US Latino patients with diabetes who have limited English language skills have been shown to be at increased risk of poor glycemic control; however, this risk is not present when care is delivered by physicians who speak Spanish.³⁹ It is also worth noting that AI development itself has highlighted the underrecognized clinical challenge of patients and doctors understanding what is being said differently.⁴⁰ If technology companies are to create useful AI systems, they will need access to language from a variety of sources. These sources will include handwritten notes, letters, and emails (ie, medical records), and such systems will presumably (and controversially) also listen directly to patients talking with their clinicians.⁴¹

Race and Ethnicity

To be ultimately successful, AI requires evidence from clinical trials. In the United States, racial/ethnic minority populations are disproportionately affected by diabetes and its associated complications.⁴² However, despite this disproportionate burden being self-evident, minority participation in studies of technological interventions, such as artificial pancreas development in type 1 diabetes, and in trials of new therapeutic agents in type 2 diabetes has been consistently low.^43,44 Failure to recruit adequate numbers of minority participants in clinical trials results in (1) poor trial validity, (2) poor generalizability of the results, (3) magnification of inequalities, and (4) concern about failure to detect harm in certain populations. Structured interventions tailored to ethnic minority groups by integrating elements of culture, language, religion, and health literacy skills have been shown to produce a positive impact on a range of patient-important outcomes for individuals with diabetes.⁴⁵ Similarly, a review of 34 randomized trials testing culturally tailored interventions to prevent diabetes in minority populations found that these interventions were effective in improving risk factors for progression to diabetes among ethnic minority groups.⁴⁶ There is also evidence that differences in diabetes beliefs between low- and high-education African American, American Indian, and white older adults are due to socioeconomic conditions.⁴⁷ The translation of culturally focused education programs that also account for changing socioeconomic circumstances is not easily amenable to generation by computers using quantitative data alone.

Conclusion

Big data and artificial intelligence will be useful tools for treating diabetes in a precision medicine or precision public health paradigm. By their nature, the analytic tools used to process large, diverse datasets use only quantitative data. At this time, there are many flaws with total dependence on quantitative data, owing to the frequently inadequate quality of this type of data as well as the frequent need to supplement a quantitative approach with a qualitative approach. Going forward, the conversion of unstructured data into digitally processable data is the domain of cognitive computing, which is likely to add significant value to AI.⁴⁸ Factors besides objective data also go into clinical decision making, such as sentiment, intuition, and a physician’s experience, which have been referred to as judgment or a “gut feeling.” Cognitive computing is currently ill equipped to duplicate this subjective part of reaching medical conclusions.⁴⁹ Human physicians are still needed to treat diabetes and other diseases, providing judgment, compassion, and context that will not be available from computers for the foreseeable future.

Footnotes

Abbreviations

AI, artificial intelligence; ED, emergency department.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

1. Contreras, Vehi. Artificial intelligence for diabetes management and decision support: literature review. J Med Internet Res. 2018;20:e10775.

2. Majumder, Mondal, Deen MJ. Wearable sensors for remote health monitoring. Sensors (Basel). 2017;17(1):E130.

3. Klonoff, Kerr. Digital diabetes communication: there’s an app for that. J Diabetes Sci Technol. 2016;10(5):1003-1005.

4. Bellazzi, Dagliati, Sacchi, Segagni. Big data technologies: new opportunities for diabetes management. J Diabetes Sci Technol. 2015;9(5):1119-1125.

5. Razavian, Blecker, Schmidt, Smith-McLallen, Nigam, Sontag. Population-level prediction of type 2 diabetes from claims data and analysis of risk factors. Big Data. 2015;3(4):277-287.

6. Azmak, Bayer, Caplin, et al. Using big data to understand the human condition: the Kavli HUMAN project. Big Data. 2015;3(3):173-188.

7. Andreu-Perez, Poon, Merrifield, Wong, Yang GZ. Big data for health. IEEE J Biomed Health Inform. 2015;19(4):1193-1208.

8. Peek, Holmes, Sun. Technical challenges for big data in biomedicine and health: data sources, infrastructure, and analytics. Yearb Med Inform. 2014;9:42-47.

9. Klonoff DC. Precision medicine for managing diabetes. J Diabetes Sci Technol. 2015;9(1):3-7.

10. Kerr, Axelrod, Hoppe, Klonoff. Diabetes and technology 2030: a utopian or dystopian future? Diabet Med. 2018;35(4):498-503.

11. Elhosenya, Abdelaziz, Salama, Riad, Muhammad, Sangaiah AK. A hybrid model of Internet of Things and cloud computing to manage big data in health services applications. Future Gener Comp Sys. 2018;86:1383-1394.

12. Cukier, Mayer-Schönberger. The dictatorship of data. MIT Technology Review. 2013. Available at: https://www.technologyreview.com/s/514591/the-dictatorship-of-data/.

13. Yankelovich. Corporate Priorities: A Continuing Study of the New Demands on Business. Stanford, CT: Yankelovich Inc; 1972.

14. O’Mahony. Medicine and the McNamara fallacy. J Royal Coll Physic Edin. 2017;47(3):281-287.

15. Kneafsey, Brown, Sein, Chamley, Parsons. A qualitative study of key stakeholders’ perspectives on compassion in healthcare and the development of a framework for compassionate interpersonal relations. J Clin Nurs. 2016;25(1-2):70-79.

16. Pereira Gray, Sidaway-Lee, White, Thorne, Evans PH. Continuity of care with doctors—a matter of life and death? A systematic review of continuity of care and mortality. BMJ Open. 2018;8:e21161.

17. Pereira Gray, Sidaway-Lee, White, et al. Improving continuity: the clinical challenge. InnovAiT. 2016;9:635-645.

18. Warnecke. The art of communication. Aust Fam Physician. 2014;43(3):156-158.

19. Johnson, Speedie, Simon, Kumar, Westra BL. A data quality ontology for the secondary use of EHR data. AMIA Annu Symp Proc. 2015;2015:1937-1946.

20. United Nations Economic Commission for Europe. Conference of European Statisticians Recommendations of the 2010 Censuses of Population and Housing. Prepared in cooperation with the Statistical Office of the European Communities (EUROSTAT). New York and Geneva: United Nations Economic Commission for Europe; 2006.

21. Kruse, Goswamy, Raval, Marawi. Challenges and opportunities of big data in health care: a systematic review. JMIR Med Inform. 2016;4:e38.

22. Alessa, Faezipour. A review of influenza detection and prediction through social networking sites. Theor Biol Med Model. 2018;15(1):2.

23. Liu, Chen. Identifying adverse drug events from health social media: a case study on heart disease discussion forums. In: Zheng, Zeng, Chen, Zhang, Xing, Neill, eds. International Conference on Smart Health. ICSH 2014: Smart Health. Cham, Switzerland: Springer; 2014:25-36.

24. Salathé. Digital pharmacovigilance and disease surveillance: combining traditional and big-data systems for better public health. J Infect Dis. 2016;214(suppl 4):S399-S403.

25. Pierce, Bouri, Pamer, et al. Evaluation of Facebook and Twitter monitoring to detect safety signals for medical products: an analysis of recent FDA safety alerts. Drug Saf. 2017;40(4):317-331.

26. Lawson, Bundy, Belcher, Harvey JN. Changes in coping behavior and the relationship to personality, health threat communication and illness perceptions from the diagnosis of diabetes: a 2-year prospective longitudinal study. Health Psychol Res. 2013;1:e20.

27. Wang, Watts, Anderson, Little TD. Common fallacies in quantitative research methodology. In: Little, ed. The Oxford Handbook of Quantitative Methods in Psychology: Vol. 2: Statistical Analysis. New York, NY: Oxford University Press; 2013. doi:10.1093/oxfordhb/9780199934898.013.0031

28. Dolley. Big data’s role in precision public health. Front Public Health. 2018;6:68.

29. Tripathi, Rizvi, Knight, Jerrell JM. Prevalence and impact of initial misclassification of pediatric type 1 diabetes mellitus. South Med J. 2012;105(10):513-517.

30. Vos, Kasteleyn, Heijmans, et al. Disentangling the effect of illness perceptions on health status in people with type 2 diabetes after an acute coronary event. BMC Fam Pract. 2018;19(1):35.

31. Kerr, Macdonald, Tattersall RB. Patients with type-1 diabetes adapt acutely to sustained mild hypoglycaemia. Diabetic Med. 1991;8(2):123-128.

32. Pal BR. Social media for diabetes health education—inclusive or exclusive? Cur Diabetes Rev. 2014;10(5):284-290.

33. Babyak MA. What you see may not be what you get: a brief, nontechnical introduction to overfitting in regression-type models. Psychosom Med. 2004;66(3):411-421.

34. Horwitz DL. Factitious and artifactual hypoglycemia. Endocrinol Metab Clin North Am. 1989;18(1):203-210.

35. Flaumenhaft, Ben-Assuli. Personal health records, global policy and regulation review. Health Policy. 2018;122(8):815-826.

36. Reach. Linguistic barriers in diabetes care. Diabetologia. 2009;52(8):1461-1463.

37. Schenker, Karter, Schillinger, et al. The impact of limited English proficiency and physician language concordance on reports of clinical interactions among patients with diabetes: the DISTANCE study. Patient Educ Couns. 2010;81(2):222-228.

38. Hannonen, Komulainen, Eklund, Tolvanen, Riikonen, Ahonen. Verbal and academic skills in children with early-onset type 1 diabetes. Dev Med Child Neurol. 2010;52:e143-e147.

39. Fernandez, Schillinger, Warton, et al. Language barriers, physician-patient language concordance, and glycemic control among insured Latinos with diabetes: the Diabetes Study of Northern California (DISTANCE). J Gen Intern Med. 2011;26(2):170-176.

40. Schoenick, Clark, Tafjord, Turney, Etzioni. Moving beyond the Turing Test with the Allen AI Science Challenge. Commun Association for Computing Machinery. 2017;60:60-64. Available at: https://cacm.acm.org/magazines/2017/9/220439-moving-beyond-the-turing-test-with-the-allen-ai-science-challenge/fulltext

41. Reach. The “Chinese room” argument and patient education. BMJ. 2008;336:335.

42. Bullard, Cowie, Lessem, et al. Prevalence of diagnosed diabetes in adults by diabetes type—United States, 2016. MMWR Morb Mortal Wkly Rep. 2018;67:359-361.

43. Choppe, Kerr. Minority underrepresentation in cardiovascular outcome trials (CVOT) for type 2 diabetes? Lancet Diabetes Endocrinol. 2017;5(1):13.

44. Huyett, Dassau, Pinsker, Doyle, Kerr. Who is (not) in line for the artificial pancreas? Lancet Diabetes Endocrinol. 2016;4:880-881.

45. Zeh, Sandhu, Cannaby, Sturt JA. The impact of culturally competent diabetes care interventions for improving diabetes-related outcomes in ethnic minority groups: a systematic review. Diabet Med. 2012;29:1237-1252.

46. Lagisetty, Priyadarshini, Terrell, et al. Culturally targeted strategies for diabetes prevention in minority population. Diabetes Educ. 2017;43(1):54-77.

47. Grzywacz, Arcury, et al. Cultural basis for diabetes-related beliefs among low- and high-education African American, American Indian, and white older adults. Ethn Dis. 2012;22(4):466-472.

48. Chen, Elenee Argentinis, Weber. IBM Watson: how cognitive computing can be applied to big data challenges in life sciences research. Clin Ther. 2016;38(4):688-701.

49. Trafton. Doctors rely on more than just data for medical decision making. Available at: http://news.mit.edu/2018/doctors-rely-gut-feelings-decision-making-0720. Accessed August 1, 2018.