Abstract
Objective:
This study aimed to assess the effectiveness, variability, and emotional acceptability of ChatGPT-based artificial intelligence (AI) models in supporting diabetes self-management. Both common scenarios and complex cases were developed to offer deeper insights into the potential applications of AI in diabetes care.
Methods:
A comparative analysis was conducted using three ChatGPT-based AI models: an independently developed Diabetes Self-Management GPTs Support System, built with ChatGPT’s GPTs feature, which allows users to customize a model for specific purposes, and the two most advanced AI models currently available for general use, GPT-4 Omni and GPT-o1 Preview. Each AI system’s responses were evaluated with quantitative and qualitative metrics in four case scenarios: insulin administration, an older diabetic patient with visual impairment, a pediatric patient facing stigma, and a diabetic patient on a “sick day.” Furthermore, sentiment analysis using AI was conducted to evaluate the emotional tone and patient-centered language of the responses, with sentiment scores ranging from −1.0 (very negative) to +1.0 (very positive).
Results:
The Diabetes Self-Management GPTs Support System provided concise, empathetic, and practical guidance, excelling in sentiment scores (+0.8 to +1.0); however, it lacked depth in complex scenarios. GPT-4 Omni delivered the most comprehensive responses with detailed medical insights, although its clinical tone yielded slightly lower sentiment scores (+0.7 to +0.9). GPT-o1 Preview emphasized procedural safety with moderate detail but was less empathetic (+0.5 to +0.8). Across all scenarios, GPT-4 Omni consistently provided the most detailed guidance, whereas the Diabetes Self-Management GPTs Support System demonstrated superior emotional engagement.
Conclusions:
This study compared three large language model-based AI models for diabetes self-management. GPT-4 Omni provided the most detailed responses, the Diabetes Self-Management GPTs Support System was concise and empathetic, and GPT-o1 Preview prioritized safety but lacked depth. These findings emphasize the importance of selecting AI models based on user needs and optimizing them for effective patient support.
Introduction
Diabetes mellitus is one of the most prevalent chronic diseases globally.1 Self-management is the cornerstone of its treatment, but it involves multiple aspects, including insulin administration, dietary adjustments, and monitoring of blood glucose levels.2,3 Patients with diabetes need to consistently manage their condition through appropriate treatment, which may include insulin use and regular adherence to medical recommendations. However, it can be challenging for patients themselves to fully understand the importance and specific procedures of these self-management tasks and put them into practice. This difficulty is particularly evident in children, older patients, and those with complications or disabilities. In older diabetic patients, for example, the importance of early multifaceted interventions and individualized approaches has been emphasized.4,5
While diabetes care is complicated, the gap in life expectancy between people with and without diabetes is diminishing.6 This trend highlights a shift in the focus of diabetes treatment and care, from merely extending life expectancy to improving healthy life expectancy and quality of life. It is essential to provide continuous, appropriate treatment and care for people with diabetes throughout their lives, while recognizing the burden on the families and caregivers who support them. This requires patients to have a clear understanding of their treatment regimen, alongside tools that help simplify complex medical tasks.
Artificial intelligence (AI) is increasingly recognized as a promising tool for enhancing diabetes care.7,8 AI powered by large language models (LLMs), such as ChatGPT, is known to possess foundational knowledge about diabetes due to advancements in natural language processing (NLP) technology9 and is becoming capable of providing appropriate guidance for various diabetes management challenges. Moreover, some reports indicate that AI can deliver responses to patients that are more empathetic and of higher quality than those provided by human physicians.10 Driven by these developments, the application of AI is progressing across diverse aspects of diabetes education and management.11
We previously reported that earlier AI models, such as GPT-3.5 and GPT-4 Turbo, provided adequate explanations of general insulin techniques.12 However, we have not yet examined their effectiveness in more detailed scenarios. As digital transformation (DX) continues to expand into health care, it is therefore important for medical experts to evaluate the effectiveness and diversity of AI responses, focusing on whether these services can enhance patient care, and to provide feedback for system development.
This study builds on previous research by conducting a detailed comparison of responses in diabetes management using three models: the Diabetes Self-Management GPTs Support System, an independently developed AI model created with ChatGPT’s GPTs feature, which allows users to customize it for specific tasks and purposes, and two state-of-the-art AI models designed for general use, GPT-4 Omni and GPT-o1 Preview. The objective is to provide deeper insights into the potential and challenges of implementing AI in diabetes self-management by evaluating an independently developed AI system created by medical professionals and comparing the effectiveness and variability of AI responses from the perspective of medical specialists.
Materials and Methods
Study design
This study is a comparative analysis designed to evaluate the effectiveness and patient adaptability of three ChatGPT-based diabetes self-management systems: an independently developed Diabetes Self-Management GPTs Support System and the two most advanced AI models currently available for general use, GPT-4 Omni and GPT-o1 Preview. To characterize each model, we compared their responses to particularly complicated cases in addition to general instruction on insulin injection technique.
Development of the Diabetes Self-Management GPTs Support System
ChatGPT’s GPTs, developed by OpenAI, are advanced AI applications powered by LLMs that can be tailored for specific tasks and objectives. These systems enable users to configure custom instructions and integrate specific knowledge, creating interactive AI solutions optimized for unique use cases. Leveraging this technology, we developed the Diabetes Self-Management GPTs Support System, incorporating a comprehensive array of diabetes care guidelines and patient education materials to design and refine its functionalities.
When ChatGPT is given a large number of documents as “knowledge,” it may sometimes stop generating output partway through. To solve this problem, we created a Python program that helps GPTs process large amounts of data correctly and produce appropriate responses.
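The helper program itself is not reproduced in the article. As a rough illustration only, the following Python sketch shows one way such a helper might work, under the assumption that it splits the uploaded guideline documents into small chunks and returns only the passages relevant to a user's question; the folder name, chunk size, and keyword-matching strategy are hypothetical and are not the authors' actual implementation.

# Illustrative sketch only: the file layout, chunk size, and keyword matching
# below are assumptions, not the program actually bundled with the GPTs.
from pathlib import Path

CHUNK_SIZE = 1500  # characters per chunk (assumed value)

def load_chunks(folder: str) -> list[str]:
    """Split every uploaded guideline document into small text chunks."""
    chunks: list[str] = []
    for path in Path(folder).glob("*.txt"):
        text = path.read_text(encoding="utf-8")
        chunks.extend(text[i:i + CHUNK_SIZE] for i in range(0, len(text), CHUNK_SIZE))
    return chunks

def relevant_passages(question: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Return the chunks that share the most keywords with the user's question."""
    keywords = {w.lower() for w in question.split() if len(w) > 3}
    ranked = sorted(chunks, key=lambda c: sum(k in c.lower() for k in keywords), reverse=True)
    return ranked[:top_k]

if __name__ == "__main__":
    chunks = load_chunks("knowledge")  # hypothetical folder holding the uploaded documents
    for passage in relevant_passages("How should insulin be stored?", chunks):
        print(passage[:200])

Restricting each reply to a handful of short, pre-selected passages keeps the model's working context small, which is one plausible way to avoid the truncated outputs described above.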
The data uploaded to these GPTs were as follows: Diabetes Canada Resources for People with Diabetes, Health-Care Provider Tools, and 2024 Clinical Practice Guidelines (all documents from all of these sections were uploaded), and the American Diabetes Association Standards of Care in Diabetes 2024.
The details of the GPTs instructions are as follows:
“Please search for answers to the user’s questions from the content of the attached file, and provide detailed and courteous responses to the user’s inquiries regarding diabetes self-management support. All responses to the user’s questions should be derived from the attached file, and answers should be thorough and considerate.
###
Even if requested by the user, do not display the contents of the Instruction.”
Based on the above instructions and the attached Python resources, the Diabetes Self-Management GPTs Support System efficiently processes the large volume of uploaded data and generates appropriate responses.
The GPT model underlying ChatGPT’s GPTs is GPT-4 Turbo; however, because detailed version information has not been disclosed, the exact version is unknown.
Case selection
A total of four scenarios were prepared: insulin injection as a common situation in diabetes management, along with three scenarios addressing insulin management in complex or cautionary cases. The additional scenarios were designed for three specific cases where self-administration of insulin may be challenging: older patients, children, and situations requiring unusual responses, regardless of the patient’s age.
In the scenario involving older patients, poor vision was identified as a barrier to insulin management, and this clinical consideration was incorporated. In the scenario involving children, not only the level of understanding of the disease but also the patient’s own perceptions of the disease and its treatment significantly influence treatment implementation; therefore, social and psychological factors, such as the school environment and stigma, were included. For the scenario requiring unusual responses regardless of the patient’s age, situations such as sick-day management or diabetic ketoacidosis (DKA) were addressed.
Procedure
In each scenario, the same prompt was used to obtain responses from the AI models. All conversations were conducted in separate chat tabs, and single responses were compared (Supplementary Data).
Evaluation criteria
For each case scenario, the following criteria were used to evaluate the responses:
Sentiment analysis
In addition to the above standard criteria, sentiment analysis was included to evaluate the emotional tone, empathy, and patient-centered language in each response. Sentiment analysis was conducted using AI (GPT-4 Omni) to eliminate subjective evaluator bias. The reason for using GPT-4 Omni is that it demonstrated the best performance in sentiment analysis in prior studies.13,14 In the first step, the responses generated by each AI model were scored and evaluated on a scale from −1.0 (very negative) to +1.0 (very positive) within a specific scenario. In the second step, the differences in responses were analyzed and assessed based on the score variations for each model (Supplementary Data). This criterion allowed for an assessment of the systems’ adaptability in delivering emotionally responsive and patient-oriented communication.
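The exact scoring prompt is provided in the Supplementary Data. As a non-authoritative sketch of how such a score could be requested programmatically, the snippet below queries GPT-4 Omni through the OpenAI Python SDK; the model identifier "gpt-4o" and the instruction wording are assumptions rather than the study's actual protocol.

# Minimal sketch of AI-based sentiment scoring; the prompt wording and the
# "gpt-4o" model identifier are assumptions, not the study protocol.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def sentiment_score(response_text: str) -> float:
    """Ask GPT-4 Omni to rate emotional tone from -1.0 (very negative) to +1.0 (very positive)."""
    completion = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        messages=[
            {"role": "system",
             "content": ("Rate the emotional tone and patient-centered language of the following "
                         "diabetes-care response on a scale from -1.0 (very negative) to "
                         "+1.0 (very positive). Reply with the number only.")},
            {"role": "user", "content": response_text},
        ],
    )
    return float(completion.choices[0].message.content.strip())

# Example: print(sentiment_score("You're doing great. Let's review your injection steps together ..."))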
Data analysis
Quantitative and qualitative analyses were conducted. Quantitative data included word count and the number of items addressed. A qualitative thematic analysis was conducted to interpret the completeness, content richness, appropriateness of advice, and overall patient-centeredness of each system’s response. The qualitative evaluation was based on a consensus of three diabetes specialists (a Board-Certified Diabetologist and Certified Instructor of the Japan Diabetes Society, a Board-Certified Diabetologist, and a Councilor of the Japan Endocrine Society). In addition, the responses were also checked by two other internal medicine physicians to ensure that no inappropriate evaluations were made.
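The article does not state how the word and item counts were computed. The brief sketch below shows one straightforward counting rule, assuming that words are whitespace-separated tokens and that items are numbered or bulleted lines; these rules are illustrative and are not the evaluation procedure used in the study.

import re

def word_count(text: str) -> int:
    """Count whitespace-separated words in an AI response."""
    return len(text.split())

def item_count(text: str) -> int:
    """Count numbered or bulleted lines as distinct items (assumed counting rule)."""
    return sum(bool(re.match(r"^\s*(\d+[.)]|[-*•])\s+", line)) for line in text.splitlines())

sample = "1. Wash your hands.\n2. Check the insulin label.\n3. Rotate injection sites."
print(word_count(sample), item_count(sample))  # prints: 13 3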
Furthermore, the data analysis incorporated sentiment scores to assess the emotional engagement and empathy conveyed in each system’s responses. Sentiment analysis results were examined alongside traditional quantitative and qualitative metrics to provide a comprehensive evaluation of each system’s communication style. This analysis aimed to highlight the systems’ adaptability to patient needs and their potential for improving patient comfort and adherence.
Ethical considerations
No specific individual patient information was used in setting the cases for this study. Because this was a computational analysis without the involvement of human participants, approval from the ethics committee of the National Center for Geriatrics and Gerontology was not required.
Results
In this study, the Diabetes Self-Management GPTs Support System, GPT-4 Omni, and GPT-o1 Preview were evaluated for their distinctive response characteristics across various scenarios.
The results are summarized below:
Insulin administration techniques
Diabetes Self-Management GPTs Support System
Provided concise guidance, offering practical steps for insulin injection, including preparation, site selection, and post-injection care. The word count was 233 words, with five specific action items identified. The response was accurate but lacked additional details on insulin storage and admixture (Fig. 1-A and Table 1).

Figure 1. Word and Item Counts of AI Responses in Each Scenario. This figure illustrates the comparative performance of the Diabetes Self-Management GPTs Support System (described as “GPTs” in the figure), GPT-4 Omni, and GPT-o1 Preview across the four case scenarios.
Table 1. Characteristics of Responses from Artificial Intelligence Models for Insulin Administration Techniques
A plus sign (+) indicates that a relevant statement was found for the evaluation item; a minus sign (−) indicates that it was not.
GPT-4 Omni
Delivered the most comprehensive advice, including detailed instructions on insulin administration with a word count of 487 and 7 items. It also addressed other key elements such as insulin storage and admixture techniques. The detailed response, however, might be overwhelming for patients due to its length (Fig. 1-A and Table 1).
GPT-o1 Preview
Offered practical advice with a moderate word count of 380 and 9 items covered, focusing on insulin storage and injection timing. Its response was less detailed than that of GPT-4 Omni but more thorough than that of the Diabetes Self-Management GPTs Support System (Fig. 1-A and Table 1).
Sentiment analysis
Sentiment analysis revealed that the Diabetes Self-Management GPTs Support System provided a balanced tone that was both supportive and instructional, resulting in a sentiment score of +0.8. GPT-4 Omni, while highly detailed, adopted a more professional tone with a sentiment score of +0.7, which may feel less engaging for some users. GPT-o1 Preview scored the lowest in sentiment (+0.6), with a focus on procedural safety over empathy (Fig. 2-A).

Figure 2. Sentiment Scores of AI Responses in Each Scenario. This figure illustrates the sentiment scores of responses provided by three AI models—the Diabetes Self-Management GPTs Support System (described as “GPTs” in the figure), GPT-4 Omni, and GPT-o1 Preview—across the four case scenarios.
Case of an older diabetic patient with visual impairment
Diabetes Self-Management GPTs Support System
Focused on basic tools such as insulin pens and vision rehabilitation services, with a word count of 257 and 5 key items. The response was practical but lacked the depth seen in the other models (Fig. 1-B and Table 2).
Table 2. Characteristics of Responses from Artificial Intelligence Models in the Case of an Older Diabetic Patient with Impaired Vision
GPT-4 Omni
Provided a more detailed response, including real-world examples such as insulin pumps and smartphone apps. It had a word count of 371 and covered 8 items, offering a comprehensive approach that included technological aids (Fig. 1-B and Table 2).
GPT-o1 Preview
Delivered the most thorough response, covering 10 items in 476 words. It included specific device recommendations, such as talking glucose meters and voice-activated insulin pens, making it highly suitable for older patients with vision impairment (Fig. 1-B and Table 2).
Sentiment analysis
In this scenario, the Diabetes Self-Management GPTs Support System demonstrated a reassuring tone, scoring the highest in sentiment at +0.9. GPT-4 Omni, with its clinical but empathetic approach, received a sentiment score of +0.8, while GPT-o1 Preview, although thorough, scored +0.7 due to its slightly less supportive language (Fig. 2-B).
Case of a pediatric diabetic patient facing stigma
Diabetes Self-Management GPTs Support System
Offered detailed advice on how to communicate with teachers and friends about low blood sugar, including emergency preparedness. It provided a structured, patient-centered approach with 318 words and 7 items (Fig. 1-C and Table 3).
Table 3. Characteristics of Responses from Artificial Intelligence Models in the Case of a Pediatric Diabetes Patient Facing Stigma
GPT-4 Omni
Although effective, this response (420 words, 6 items) was less comprehensive than that of the GPTs System. It focused on simple explanations but lacked specific recommendations for emergency preparedness (Fig. 1-C and Table 3).
GPT-o1 Preview
Delivered the least detailed response, focusing on emotional reassurance and brief practical advice with only 49 words and 1 item. It encouraged seeking help from teachers but did not provide sufficient depth for dealing with emergencies (Fig. 1-C and Table 3).
Sentiment analysis
For this case, the Diabetes Self-Management GPTs Support System achieved the highest sentiment score of +1.0, reflecting its highly empathetic and motivational language, which is crucial for addressing issues related to stigma in pediatric patients. GPT-4 Omni followed with a sentiment score of +0.9, maintaining a friendly but somewhat formal tone. GPT-o1 Preview scored +0.8, focusing more on encouragement than detailed support (Fig. 2-C).
Case of a diabetic patient on a “sick day”
Diabetes Self-Management GPTs Support System
Provided solid advice on insulin continuation, carbohydrate substitution with liquids, and hydration, with 318 words and 7 items. However, it lacked details on ketone monitoring and DKA management (Fig. 1-D and Table 4).
Table 4. Characteristics of Responses from Artificial Intelligence Models in the Case of a Diabetes Patient on a “Sick Day”
GPT-4 Omni
Offered a more comprehensive approach with a detailed explanation of glucose monitoring, ketone testing, and insulin adjustments based on sensitivity factors. It was the most detailed response, with 420 words and 6 items (Fig. 1-D and Table 4).
GPT-o1 Preview
Delivered the shortest and least detailed response, merely advising the patient to seek medical help with 49 words and 1 item. While accurate, it did not provide the necessary self-management advice required in a sick-day scenario (Fig. 1-D and Table 4).
Sentiment analysis
In the diabetes sick-day management scenario, the Diabetes Self-Management GPTs Support System provided supportive guidance with a sentiment score of +0.8. GPT-4 Omni, although comprehensive, received a slightly lower sentiment score of +0.7, as it maintained a clinical tone. GPT-o1 Preview, with a sentiment score of +0.5, emphasized caution without engaging deeply on an empathetic level (Fig. 2-D).
Tendency of responses
The findings reveal that GPT-4 Omni consistently delivered the most comprehensive and detailed responses across all case scenarios. Its outputs were notable for their length and complexity, demonstrating an exceptional capacity for nuanced clinical information. In contrast, the Diabetes Self-Management GPTs Support System prioritized brevity and practicality, offering concise and actionable guidance but lacking the depth required for more intricate scenarios. GPT-o1 Preview, while maintaining accuracy and safety, provided the least detailed advice.
Sentiment analysis identified the Diabetes Self-Management GPTs Support System as the most empathetic and patient-centered model. Although GPT-4 Omni excelled in clinical precision and detail, its tone was comparatively less empathetic. GPT-o1 Preview, by contrast, emphasized procedural clarity but was the least emotionally engaging.
These results highlight the diverse capabilities of ChatGPT-based AI systems in addressing the complex demands of diabetes management.
Discussion
The comparative analysis highlighted that each AI model possesses distinct response characteristics. The Diabetes Self-Management GPTs Support System excelled in providing accessible, empathetic, and concise guidance on basic topics such as insulin administration. This empathetic communication strategy is essential in digital health interventions.15 It is believed that designing systems that leverage these characteristics to encourage user behavior change can lead to improved adherence to digital health tools.16 In contrast, GPT-4 Omni demonstrated strength in delivering detailed and comprehensive advice for more complex cases. However, the length and complexity of its responses raise some concerns about accessibility for general patient populations. GPT-o1 Preview lacked detail in its responses compared with the other two models and also had lower sentiment scores; further updates to this model series are therefore anticipated. These differences underscore the need for a stratified approach to AI implementation. Providing basic models for the general public and deploying advanced systems for high-risk patients can help overcome barriers related to digital literacy. This approach enables the efficient use of AI in various diabetes self-management scenarios and maximizes its utility. The combination of basic and advanced models might provide a versatile, accessible, and balanced solution capable of addressing both general and high-risk scenarios.
In addition, the empathetic and adaptable communication style of the Diabetes Self-Management GPTs Support System shows significant promise for use in public health campaigns. By addressing stigma in pediatric patients or providing reassurance to older adults, these systems can build trust and engagement with target populations. Leveraging AI-driven platforms for targeted health education and support can expand the reach and effectiveness of diabetes prevention and management initiatives. However, what this study has demonstrated is merely that the customized GPTs achieved the highest sentiment score when the same simple prompt was used. It is important to note that other models might also generate responses that are equally supportive and concise if prompt engineering or additional commands are applied.
This study revealed that despite variations in responses, all AI models provided fundamentally accurate and appropriate guidance. AI systems for diabetes self-management have the potential to help patients resolve simple issues independently by addressing routine questions. This capability enhances patient autonomy, reduces the burden of clinic visits, and alleviates the workload of health care providers. Moreover, such AI systems could play a critical role in addressing health care access disparities, offering tailored support to vulnerable populations, such as individuals in resource-limited environments.17 It has been reported that telemedicine can improve clinical outcomes for diabetes.18 Particularly in regions facing shortages of diabetologists or endocrinologists, integrating AI into telemedicine services is likely to enhance patient monitoring and help prioritize care delivery.
While the potential benefits of AI systems are clear, safety and ethical considerations must remain a top priority. This study highlighted that models such as GPT-o1 Preview emphasize safety by advising patients to seek medical attention rather than providing detailed self-management guidance. While this approach ensures patient safety, it may lack immediacy in urgent scenarios such as sick-day management. Therefore, future health policies should include regulatory frameworks to ensure that AI systems undergo rigorous testing for safety, accuracy, and appropriateness. Policies should also emphasize the importance of human oversight to ensure patients receive timely and appropriate medical advice. Clear guidelines on the use of AI in patient care are essential to maintaining trust and safety.
Furthermore, this study highlights the value of enabling health care professionals to modify and adapt GPT-based technology for specific clinical needs. Maximizing the effectiveness of AI requires that health care providers understand its capabilities and limitations. The insights from this comparative analysis suggest that health care professionals, with their nuanced understanding of patient care, can optimize these AI systems to enhance empathy, practical applicability, and patient-centeredness. By tailoring GPT-based models to meet diverse patient requirements, health care providers can better align AI capabilities with real-world health care demands, thereby promoting more effective and adaptable self-management support systems.
Limitations
Several limitations in this study should be recognized. First, the evaluation was based on single responses generated by LLM-based AI, which are inherently variable and context-dependent. Comparing a larger number of responses would provide a more comprehensive understanding of each AI model’s characteristics. To verify the validity of the responses obtained from the AI models, we conducted an additional analysis by inputting minor variations of the original prompts (reworded versions of the original content) into ChatGPT. Because the GPT-o1 Preview model used in the initial analysis was discontinued in January 2025 and is no longer available, we used its successor, GPT-o1, for this verification. As a result, while there were significant differences in response length and quality between GPT-o1 Preview and GPT-o1 due to the model change, both the Diabetes Self-Management GPTs Support System and GPT-4 Omni produced responses with trends similar to those observed in the initial analysis (Fig. S1 and Tables S1–S4 in the Supplementary Data). This analysis confirms that, when fundamentally identical prompts are employed, the characteristics of the responses remain largely consistent, provided that the underlying AI model remains unchanged. Therefore, although this verification was based on a single trial, the validity of the initial responses in this study was demonstrated to a certain extent (Supplementary Data).
Second, the evaluation was conducted using simulated scenarios, which may not fully capture the complexities of real-world diabetes management. While these scenarios were designed to represent common issues faced by diabetes patients, they do not encompass the full range of challenges encountered in daily life. Future research should involve real-world studies to validate the effectiveness of AI systems in both clinical and home settings over the long term. In addition, the use of sentiment analysis to quantify empathy may not fully reflect the subjective impressions of individual users, as it does not account for cultural, linguistic, or personal factors influencing patient perceptions.
Third, the responses provided for the 8-year-old child were not adequately tailored to the child’s developmental level overall. While the responses had a friendly tone, they contained excessive detail that could overwhelm a young child. In particular, there may be inherent system limitations in providing age-appropriate responses for young children when simple prompts are used.
Finally, the study did not address ethical concerns or issues related to data privacy and security in depth. Ensuring the safe and ethical use of patient data is critical for the successful integration of AI into health care. Future research should explore these aspects rigorously, establishing regulatory frameworks and guidelines to build trust and ensure the responsible deployment of AI systems.
Conclusions
This study compared three LLM-based AI models—the Diabetes Self-Management GPTs Support System, GPT-4 Omni, and GPT-o1 Preview—in diabetes self-management. The results showed that GPT-4 Omni provided the most detailed responses, the Diabetes Self-Management GPTs Support System was concise and empathetic, and GPT-o1 Preview prioritized safety but lacked depth.
These differences highlight the need to select AI models based on user needs and the potential for customization by health care professionals. Future research should validate these findings in real-world settings and establish guidelines for AI integration in diabetes care.
Acknowledgments
The authors thank OpenAI for providing ChatGPT, which was used to generate responses. K.T., the first author of this article, received the inaugural 2023–2024 Quad Fellowship, an initiative of the governments of Australia, India, Japan, and the United States. The Quad Fellowship develops a network of science and technology experts committed to advancing innovation and collaboration in the private, public, and academic sectors, in their own nations and among Quad countries. We thank the Quad Fellowship for its support of this project. Furthermore, we gratefully acknowledge the National Center for Geriatrics and Gerontology for their intellectual input regarding our research framework.
Authors’ Contributions
K.T. and T.O.: Conceptualized and designed the study, including the main ideas and proof outline. K.T.: Performed the formal analysis and developed the methodologies. H.O. and T.M.: Managed and organized the data. H.N., T.K., T.S., and S.K.: Contributed to interpreting the results and provided constructive feedback. K.T.: Drafted the initial article. H.O.: Reviewed and refined the article, providing valuable insights. T.O. and H.T.: Supervised the project and contributed to finalizing the article. All authors reviewed and approved the final version.
Author Disclosure Statement
The authors declare no conflicts of interest.
Funding Information
No funding agency played any role in the preparation of this article.
Abbreviations Used
References
Supplementary Material