Abstract
Background:
Conversational artificial intelligence agents, or chatbots, are a transformational technology that remains understudied in end-of-life care.
Methods:
OpenAI’s ChatGPT, Google’s Bard, and Microsoft’s Bing were asked to define “terminally ill,” “end of life,” “transitions of care,” and “actively dying,” and to provide three references. Six physicians scored the outputs on a scale of 0–10 for accuracy, comprehensiveness, and credibility. Readability was assessed with the Flesch-Kincaid Grade Level and the Flesch Reading Ease (FRE) score.
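The abstract does not say which software computed the readability scores. As a rough illustration only, the sketch below applies the standard Flesch formulas, FRE = 206.835 − 1.015 × (words/sentences) − 84.6 × (syllables/words) and Grade Level = 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. The function names `count_syllables` and `readability` are hypothetical, and the syllable counter is a crude vowel-group heuristic, so its numbers will differ somewhat from dedicated readability tools.

```python
import re


def count_syllables(word: str) -> int:
    """Hypothetical heuristic: count runs of vowels (treating 'y' as a vowel)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent 'e' (as in "care") usually adds no syllable,
    # but a final "-le" (as in "table") usually does.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for `text`."""
    # Naive sentence split on terminal punctuation; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Letters and apostrophes only, so "end-of-life" counts as three words.
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    fre = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    fkgl = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return fre, fkgl
```

Both formulas depend only on average sentence length and average syllables per word, which is why long, polysyllabic clinical definitions push FRE down and the grade level up.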
Results:
Mean (standard deviation) scores for accuracy were 9 (1.9) for ChatGPT, 7.5 (2.4) for Bard, and 8.3 (2.4) for Bing. Comprehensiveness scores averaged 8.5 (1.7) for ChatGPT, 7.3 (2.1) for Bard, and 6.5 (2.3) for Bing. Credibility was low, with a mean score of 3 (1.8). The mean FRE score was 41.7, and the mean grade level was 14.1, indicating low readability.
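To place the reported FRE mean in context, the sketch below maps a score onto the interpretation bands commonly tabulated from Flesch's original scale. The band labels and cut-offs are an assumption drawn from that common table (exact wording varies between sources), and `fre_band` is a hypothetical helper, not part of the study.

```python
# Lower bound of each band, checked from easiest to hardest.
FRE_BANDS = [
    (90.0, "very easy"),
    (80.0, "easy"),
    (70.0, "fairly easy"),
    (60.0, "standard"),
    (50.0, "fairly difficult"),
    (30.0, "difficult"),
]


def fre_band(score: float) -> str:
    """Return the conventional difficulty label for a Flesch Reading Ease score."""
    for lower_bound, label in FRE_BANDS:
        if score >= lower_bound:
            return label
    return "very difficult"


print(fre_band(41.7))
```

Under these conventional bands, the mean of 41.7 falls in the "difficult" range (30–50), usually associated with college-level text, which agrees with the mean grade level of 14.1.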
Conclusion:
Chatbot outputs had important deficiencies that necessitated clinician oversight to prevent misinformation.
