Keywords

LLM, chatbots, gestational diabetes, counseling, artificial intelligence, LLaMa, GPT-4

Women with gestational diabetes mellitus (GDM) require individualized assistance to navigate the complexities of blood glucose control, dietary modifications, and potential medical interventions. In Denmark, the prevalence of GDM increased from 1.7% in 2004 to 4.2% in 2017, reflecting a global trend.¹ Despite this increase, financial resources allocated to the health care sector have only seen marginal growth. Projections indicate that by 2030, a substantial number of additional healthcare professionals, including doctors, nurses, and nursing aides, will be required to maintain the current level of health care service provision.²

Artificial intelligence (AI)–driven chatbots, such as OpenAI’s ChatGPT, are conversational agents that emulate human interaction through written communication.³,⁴ Chatbots like ChatGPT have the potential to lighten or streamline text-related tasks for health care personnel, such as writing summaries for health journal documentation or responding to messages.³,⁴

Current digital telehealth interventions enable health care providers to interact with patients through various platforms, including email and video calls.³ However, these interventions often face challenges related to inflexibility. In contrast, AI chatbots offer flexible, on-demand, and personalized support, thereby addressing the limitations of traditional telehealth services. Thus, it is imperative for health care systems to adapt and integrate these innovations to ensure optimal care for conditions such as GDM.

The objective of this proof-of-concept study was to evaluate the clinical accuracy and sentence construction quality of responses generated by large language model (LLM)-based chatbots to 10 commonly asked questions related to GDM. The questions are presented in Supplementary Material S1. Six clinicians assessed and scored responses (on a scale of 1 to 5; low to high) from ChatGPT (v4.0, based on GPT-3.5-turbo-0125), DanskGPT (based on LLaMa v.2), and a clinician. ChatGPT was fine-tuned using non-sensitive data collected from Facebook groups, websites with frequently asked questions, and local clinical guidelines. The origin of the responses was blinded during the assessment process. The differences in scores were tested statistically using the Friedman test with post hoc analyses.
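To illustrate the statistical comparison described above, the following minimal Python sketch computes a tie-corrected Friedman test for three related sets of ratings. It is an illustration under stated assumptions, not the study's analysis code: the function names and the example scores are hypothetical, each block is assumed to hold the three scores given to one rated item (ChatGPT, DanskGPT, clinician), and the post hoc procedure, which the text does not specify, is not covered.

```python
import math


def average_ranks(values):
    """Rank values from 1..k, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def friedman_three_sources(blocks):
    """Tie-corrected Friedman test for exactly three related samples.

    blocks: sequence of 3-tuples, one per rated item (assumed layout).
    Returns (statistic, p_value) using the chi-square approximation.
    """
    k = 3
    n = len(blocks)
    if n == 0 or any(len(block) != k for block in blocks):
        raise ValueError("expected a non-empty sequence of 3-score blocks")

    rank_sums = [0.0] * k
    tie_term = 0
    for block in blocks:
        for j, rank in enumerate(average_ranks(block)):
            rank_sums[j] += rank
        # Each group of t tied scores within a block contributes t * (t^2 - 1).
        for value in set(block):
            t = list(block).count(value)
            tie_term += t * (t * t - 1)

    correction = 1 - tie_term / (k * (k * k - 1) * n)
    if correction == 0:
        raise ValueError("all scores are tied within every block")

    sum_sq = sum(s * s for s in rank_sums)
    statistic = (12 / (n * k * (k + 1)) * sum_sq - 3 * n * (k + 1)) / correction
    # For df = k - 1 = 2 the chi-square survival function is exp(-x / 2).
    p_value = math.exp(-statistic / 2)
    return statistic, p_value


# Hypothetical 1-5 scores: (ChatGPT, DanskGPT, clinician) per rated item.
example_blocks = [(5, 4, 4), (5, 3, 4), (4, 3, 3), (5, 4, 3)]
statistic, p_value = friedman_three_sources(example_blocks)
print(f"Friedman statistic = {statistic:.3f}, p = {p_value:.4f}")
```

With three sources the test has 2 degrees of freedom, which is why the tail probability reduces to a single exponential here. The chi-square approximation is asymptotic, so for small numbers of rated items the p-value is only approximate.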

The assessment of clinical accuracy yielded median [25th;75th percentile] scores of 5 [4;5] for ChatGPT, 4 [3;4] for DanskGPT, and 4 [3;4] for the clinician’s answers, with a significant difference observed (P < .001). Post hoc tests revealed a significant difference between ChatGPT and both DanskGPT and the clinician’s answers (P < .05).
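As a small illustration of how a summary in the form "median [lower quartile;upper quartile]" can be produced, the sketch below uses Python's standard `statistics` module. The function name and the example scores are hypothetical, and the quartile interpolation method (`"inclusive"`) is an assumption; the text does not state which method was used, and other methods can give slightly different quartiles.

```python
import statistics


def summarize_scores(scores):
    """Format 1-5 ratings as 'median [Q1;Q3]'."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    median = statistics.median(scores)
    # ':g' drops trailing zeros, so 4.0 prints as 4 and 4.5 stays 4.5.
    return f"{median:g} [{q1:g};{q3:g}]"


# Hypothetical ratings for one response source.
print(summarize_scores([3, 4, 4, 5, 5, 5, 5, 4]))
```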

The assessment of sentence construction quality showed median [25th;75th percentile] scores of 4.5 [4;5] for ChatGPT, 3 [3;4] for DanskGPT, and 5 [4;5] for the clinician’s answers, with a significant difference observed (P < .001). Post hoc tests indicated a significant difference between DanskGPT and both ChatGPT and the clinician’s answers (P < .05). Figure 1 illustrates the average score for each response.

Figure 1.

Heatmaps of the clinician survey scores, showing the average clinical accuracy and sentence construction quality scores for the clinician, DanskGPT, and GPT-4 across the 10 questions.

In conclusion, the results suggest that LLM-based chatbots have the potential to serve as supplementary counseling tools in gestational diabetes care. However, further evidence is needed to consolidate these findings and investigate potential limitations.

Supplemental Material

Supplemental material, sj-docx-1-dst-10.1177_19322968241265882 for The Potential of Large Language Model-Based Chatbot Solutions for Supplementary Counseling in Gestational Diabetes Care by Lukas Lindstrøm, Mia Clausen, Nina Albrektsen Jensen, Maria Hartman Nielsen, Amar Nikontovic and Simon Lebech Cichosz in Journal of Diabetes Science and Technology

Footnotes

Abbreviations

GDM, Gestational diabetes mellitus; AI, Artificial intelligence.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Simon Lebech Cichosz

Supplemental Material

Supplemental material for this article is available online.

References

1. Scheuer, Andersen, Mathiesen, et al. Regional divergence and time trends in the prevalence of gestational diabetes mellitus: a national Danish cohort study. Acta Diabetol. 2023;60(3):379-386.

2. Nye tal viser dramatisk mangel på arbejdskraft: Behov for 40.000 flere sundhedspersoner i 2030 [New figures show a dramatic labor shortage: 40,000 more health care workers needed by 2030]. https://ugeskriftet.dk/nyhed/nye-tal-viser-dramatisk-mangel-pa-arbejdskraft-behov-40000-flere-sundhedspersoner-i-2030. Accessed May 23, 2024.

3. Aggarwal, Tam, Qiao. Artificial intelligence–based chatbots for promoting health behavioral changes: systematic review. J Med Internet Res. 2023;25:e40789.

4. Thirunavukarasu, Ting DSJ, Elangovan, Gutierrez, Tan, Ting DSW. Large language models in medicine. Nat Med. 2023;29(8):1930-1940.
