Abstract
Background:
Diabetes is a chronic condition requiring long-term management, and continuous health education is vital for improving disease awareness and self-management. Large language models (LLMs), advanced artificial intelligence systems trained on large text corpora, have shown promise in generating diabetes-related educational materials. Although LLMs can generate accurate and readable content, most studies have focused on general, guideline-based education rather than on tailoring content to individual patients’ clinical profiles. This study addresses this gap by comparing the performance of three major LLMs (ChatGPT-4o, Doubao 1.5, and DeepSeek R1) in generating health education materials for discharged patients with diabetes.
Methods:
Ten de-identified medical records of discharged patients with diabetes were uploaded to the LLMs. Each model generated health education materials based on these records. Experienced diabetes nursing experts evaluated the quality of the generated materials.
Results:
The pass rates for comprehensibility scores were above 70% for all models, with DeepSeek R1 performing best (P < .01). The pass rates for actionability scores were below 70% for all models, with no significant differences among them (P > .01). Accuracy scores for all models were ≥98%, with no significant differences (P > .01). Similarly, no significant differences were observed in personalization or effectiveness scores (P > .01). DeepSeek R1 achieved the highest safety score, whereas Doubao 1.5 had the lowest (P < .01).
Conclusion:
While ChatGPT-4o, Doubao 1.5, and DeepSeek R1 can generate accurate and comprehensible materials, concerns remain regarding their actionability and safety. These findings suggest that LLMs should be used as auxiliary tools in diabetes education, with further refinement needed to produce personalized and actionable content.