Abstract

Objective:

This study aims to evaluate ChatGPT’s performance in addressing real-world otolaryngology patient questions, focusing on accuracy, comprehensiveness, and patient safety, to assess its suitability for integration into healthcare.

Methods:

A cross-sectional study was conducted using patient questions from the public online forum Reddit’s r/AskDocs, where medical advice is sought from healthcare professionals. Patient questions were input into ChatGPT (GPT-3.5), and responses were reviewed by 5 board-certified otolaryngologists. The evaluation criteria included difficulty, accuracy, comprehensiveness, and bedside manner/empathy. Statistical analysis explored the relationship between patient question characteristics and ChatGPT response scores. Potentially dangerous responses were also identified.
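The abstract does not name the statistical test that was used, so the following is only a minimal sketch of how a relationship between response length and reviewer score could be examined. It assumes a Spearman rank correlation, which is one common choice for ordinal ratings. The helper names (`word_count`, `average_ranks`, `spearman`) and the toy data are hypothetical and are not taken from the study.

```python
from statistics import mean


def word_count(text: str) -> int:
    """Count whitespace-separated tokens in a question or response."""
    return len(text.split())


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical toy data, NOT taken from the study.
responses = [
    "See a doctor soon.",
    "This sounds like it could be an ear infection, so please see a doctor.",
    "I am sorry you are dealing with this. An ear infection is possible, "
    "and a clinician can examine the ear and advise on treatment.",
]
empathy_scores = [2.0, 3.5, 4.5]  # made-up mean reviewer ratings on a 1-5 scale

lengths = [word_count(r) for r in responses]
rho = spearman(lengths, empathy_scores)
print(lengths, round(rho, 3))
```

A rank-based measure is used here because 1 to 5 reviewer ratings are ordinal; if the scores were treated as interval data, `pearson` could be applied to the raw values instead.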

Results:

Patient questions averaged 224.93 words, while ChatGPT responses were longer, averaging 414.93 words. ChatGPT responses scored 3.76/5 for accuracy, 3.59/5 for comprehensiveness, and 4.28/5 for bedside manner/empathy. Longer patient questions did not correlate with higher response ratings, but longer ChatGPT responses scored higher in bedside manner/empathy. Higher question difficulty correlated with lower comprehensiveness. Five responses were flagged as potentially dangerous.

Conclusion:

While ChatGPT exhibits promise in addressing otolaryngology patient questions, this study demonstrates its limitations, particularly in accuracy and comprehensiveness. The identification of potentially dangerous responses underscores the need for a cautious approach to AI in medical advice. Responsible integration of AI into healthcare necessitates thorough assessments of model performance and ethical considerations for patient safety.

Keywords

ChatGPT, artificial intelligence, patient safety, otolaryngology, outcomes

Assessing ChatGPT’s Responses to Otolaryngology Patient Questions