Abstract
Background:
Diabetes is a chronic condition requiring long-term management, and continuous health education is vital for improving disease awareness and self-management. Large language models (LLMs), advanced artificial intelligence systems trained on large text corpora, have shown promise in generating diabetes-related educational materials. Although LLMs can generate accurate and readable content, most studies have focused on general, guideline-based education rather than on tailoring content to individual patients’ clinical profiles. This study addresses this gap by comparing the performance of three major LLMs (ChatGPT-4o, Doubao 1.5, and DeepSeek R1) in generating health education materials for discharged patients with diabetes.
Methods:
Ten de-identified medical records of discharged patients with diabetes were uploaded to the LLMs. Each model generated health education materials based on these records. Experienced diabetes nursing experts evaluated the quality of the generated materials.
Results:
The pass rates for comprehensibility scores were above 70% for all models, with DeepSeek R1 performing best (P < .01). The pass rates for actionability scores were below 70% for all models, with no significant differences among them (P > .01). Accuracy scores for all models were ≥98%, with no significant differences (P > .01). Similarly, no significant differences were observed in personalization or effectiveness scores (P > .01). DeepSeek R1 achieved the highest safety score, whereas Doubao 1.5 had the lowest (P < .01).
Conclusion:
While ChatGPT-4o, Doubao 1.5, and DeepSeek R1 can generate accurate and comprehensible materials, concerns remain regarding their actionability and safety. These findings suggest that LLMs should be used as auxiliary tools in diabetes education, with further refinement needed to produce personalized and actionable content.