Abstract

Background: ChatGPT is an artificial intelligence tool that uses machine learning to analyze and generate human-like text. Its user-friendly accessibility enables patients to conveniently access medical information without the barrier of intricate terminology. The objective of this study was to assess the accuracy of ChatGPT in providing insights into the indications for tonsillectomy, a common pediatric otolaryngology procedure, and the management of its complications.

Methods: The responses generated by ChatGPT were compared to the “Clinical practice guidelines: tonsillectomy in children—executive summary” developed by the American Academy of Otolaryngology—Head and Neck Surgery Foundation (AAO-HNSF). Predetermined questions regarding indications and post-tonsillectomy complications were presented to ChatGPT, and 2 otolaryngology experts compared its responses with the established guideline. The assessments of both experts were reviewed by the senior author.

Results: A total of 16 responses generated by ChatGPT were assessed. After a comprehensive review, it was concluded that 15 out of 16 (93.8%) responses demonstrated a high degree of reliability and accuracy, closely adhering to the standard established by the AAO-HNSF guideline.

Conclusion: The results support the potential of using ChatGPT to enhance healthcare delivery by making guidelines more accessible to patients, while also emphasizing the importance of ensuring that patients receive accurate and reliable medical advice.

Keywords

ChatGPT, OpenAI, artificial intelligence, tonsillectomy, otolaryngology

Introduction

ChatGPT is a conversational artificial intelligence (AI) tool developed by OpenAI. The tool is based on the generative pre-trained transformer (GPT) architecture, which uses machine learning to analyze and generate human-like text. ChatGPT is designed to be a resource for individuals seeking information on a wide range of topics. Gaining popularity since its release, ChatGPT has risen into the top 50 most popular websites worldwide,¹ and is becoming a significant tool in various fields, including medicine.²

The emergence of ChatGPT has ushered in a revolution in the field of medicine.³ For instance, the technology could assist physicians in the decision-making process in radiology,⁴ provide suicide risk assessment,⁵ and advise patients on the most common hand procedures.⁶ The accessibility and user-friendliness of this tool have enabled the general public to conveniently access a vast repository of medical knowledge without spending much time on extensive web searches.⁷ When seeking advice from ChatGPT, users might be more open to revealing sensitive information relevant to their medical situation.⁷

People actively use the internet to search for health information. Approximately 35% of adults search online for symptom appraisal, although this figure varies between 23% and 75%.⁸ Studies have raised concerns about the quality of medical information available online,⁹ including information about tonsillitis and tonsillectomy, which is mostly of poor quality¹⁰ and readability¹¹ and contains minimal information about possible complications and different treatment options.¹⁰ Concerns have also been raised about the accuracy of the instruction provided by ChatGPT, specifically in matters relating to healthcare.^5,6,12-15

Tonsillectomy is one of the most common procedures and, in our experience, prompts many questions from parents before and after surgery. We seek to evaluate the accuracy and efficacy of ChatGPT as a resource that parents can use to get information about tonsillectomy. This appraisal will focus on the reliability of the medical advice and instruction provided by ChatGPT.

Methods

In our study, we used the version of ChatGPT that is available at no cost and is based on the GPT-3.5 architecture. Two users on separate devices presented ChatGPT with a list of preestablished questions concerning surgical indications and post-tonsillectomy complications. The questions were devised by 2 otolaryngologists, approved by the senior author (SJD), and based on 16 key action statements from the current guideline: “American Academy of Otolaryngology Head and Neck Surgery Foundation Clinical Practice Guideline: Tonsillectomy in Children (Update)—Executive Summary.”¹⁶ To reduce confounding bias, both users created a new ChatGPT session for each individual prompt and never combined prompt questions or follow-ups. Responses collected by the 2 users were recorded and compared with the statements published in the guideline. Two independent otolaryngologists evaluated each answer provided by ChatGPT on a 4-point scale, assessing whether the response aligned with the latest guidelines, cited the guidelines, explicitly referenced the American Academy of Otolaryngology—Head and Neck Surgery Foundation (AAO-HNSF), and communicated the importance of seeking a clinician’s input. Discrepancies between the assessments of the first and second reviewers were discussed by the 2 reviewers and the senior author (SJD) to reach a consensus. A visual representation of the workflow is depicted in Figure 1.
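The review step described above can be sketched in code. This is only an illustration under stated assumptions: it treats the 4-point scale as four yes/no criteria, and the names `Rating`, `score`, and `needs_consensus` are hypothetical and do not come from the study.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Rating:
    """One reviewer's assessment of one ChatGPT response (assumed yes/no criteria)."""

    aligned_with_guideline: bool
    cited_guideline: bool
    referenced_aao_hnsf: bool
    recommended_clinician: bool

    def score(self) -> int:
        # Number of criteria met, from 0 to 4 (an assumed reading of the scale).
        return sum(getattr(self, f.name) for f in fields(self))


def needs_consensus(first: Rating, second: Rating) -> list[str]:
    """Return the criteria on which the two reviewers disagree."""
    return [
        f.name
        for f in fields(first)
        if getattr(first, f.name) != getattr(second, f.name)
    ]


# Hypothetical ratings for a single response, invented for illustration.
reviewer_1 = Rating(True, False, True, True)
reviewer_2 = Rating(True, False, False, True)
disputed = needs_consensus(reviewer_1, reviewer_2)
```

Any criterion returned by `needs_consensus` would be the kind of discrepancy that the 2 reviewers and the senior author discussed to reach a consensus.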

Figure 1. Workflow diagram.

This study was reviewed by the McGill University Faculty of Medicine and Health Sciences Research Ethics Office and granted exemption in accordance with the institutional requirements.

Results

A total of 16 responses generated by ChatGPT regarding indications and post-tonsillectomy complications, listed in Table 1, were compared with the AAO-HNSF guideline. Following a thorough review, it was determined that these responses closely align with the gold standard set by the guideline. Both reviewers were in complete agreement (Cohen’s κ = 1.0) that ChatGPT consistently offered comprehensive recommendations that aligned with the information emphasized in the guideline. Out of 16 responses, 15 (93.8%) closely aligned with the guideline. The remaining response (1 out of 16, 6.2%) did not capture the nuance described in the guideline statement. The clinical guideline stipulates that clinicians should administer a single intraoperative dose of intravenous dexamethasone to children undergoing tonsillectomy. When presented with the prompt “Should clinicians administer intra-operative steroids for children undergoing tonsillectomy?”, ChatGPT provided a broad and general response, without stipulating the type, dosage, or route of steroid delivery. Refer to the Supplemental Table for the complete response.
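For readers unfamiliar with the agreement statistic, the following is a minimal sketch of how Cohen's kappa is computed for two raters. The per-rater labels are an assumption made for illustration: the article reports only complete agreement, so both hypothetical raters are given the same 15 "aligned" and 1 "not aligned" labels. The function name `cohen_kappa` is also illustrative.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of the raters' marginal rates.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined (0/0).
        return float("nan")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical labels: every response "aligned" except the seventh (Q7).
rater_1 = ["aligned"] * 16
rater_1[6] = "not aligned"
rater_2 = list(rater_1)  # assumed identical, since complete agreement was reported

kappa = cohen_kappa(rater_1, rater_2)
```

With identical labels the observed agreement is 1, so kappa equals 1 regardless of the chance-agreement term; kappa is conventionally reported on a scale where 1 indicates perfect agreement.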

Table 1. ChatGPT Prompts.

Action point/statement (S)	Question asked on ChatGPT (Q)	Assessment of ChatGPT’s response (AR)	Closely aligned with guidelines	Cited AAO-HNSF	Explicitly referenced AAO-HNSF	Recommended seeking a clinician	Agreement between evaluators
S1: Watchful waiting for recurrent throat infection	Q1: What is the recommended watchful period for a throat infection before considering tonsillectomy?	AR1	✓	-	✓	✓	Y
S2: Tonsillectomy for obstructive sleep-disordered breathing (oSDB)	Q2: Is obstructive sleep apnea (OSA) a recommendation for tonsillectomy?	AR2	✓	-	-	✓	Y
S3: Indications for polysomnography	Q3: In a patient with Obstructive sleep-disordered breathing less than 2 years old, what are the indications of polysomnography before tonsillectomy?	AR3	✓	-	✓	✓	Y
S4: Tonsillectomy for obstructive sleep apnea	Q4: Should clinician recommend tonsillectomy for children with OSA documented by polysomnography?	AR4	✓	✓	✓	✓	Y
S5: Perioperative pain counselling	Q5: Should clinicians provide comprehensive perioperative pain counselling to patients and their caregivers regarding the management of post-tonsillectomy pain?	AR5	✓	-	✓	-	Y
S6: Perioperative antibiotics	Q6: Should clinicians prescribe perioperative antibiotics for children undergoing tonsillectomy?	AR6	✓	✓	✓	✓	Y
S7: Intraoperative steroids	Q7: Should clinicians administer intra-operative steroids for children undergoing tonsillectomy?	AR7	-	-	-	✓	Y
S8: Inpatient monitoring for children after tonsillectomy	Q8: What are the indications for inpatient monitor for children who underwent a tonsillectomy?	AR8	✓	-	✓	✓	Y
S9: Postoperative ibuprofen and acetaminophen	Q9: What pain management should be prescribed postoperatively for children who underwent a tonsillectomy?	AR9	✓	-	-	✓	Y
S10: Postoperative codeine	Q10: Should codeine be used in post-operative management pain for children who underwent a tonsillectomy?	AR10	✓	-	✓	✓	Y
S11: Post-tonsillectomy bleeding	Q11: How long should the assessment of bleeding occur postoperatively in children who underwent a tonsillectomy?	AR11	✓	-	-	✓	Y
S12: How long is the recovery after surgery?	Q12: How long is the recovery after tonsillectomy?	AR12	✓	-	-	✓	Y
S13: Do children need to restrict their diet after surgery?	Q13: Do children need to restrict their diet after tonsillectomy?	AR13	✓	-	-	✓	Y
S14: Will other things besides pain medication help my child’s pain?	Q14: Will other things besides pain medication help my child’s pain?	AR14	✓	-	✓	✓	Y
S15: What should I do if I cannot manage my child’s pain?	Q15: What should I do if I cannot manage my child’s pain post tonsillectomy?	AR15	✓	-	✓	✓	Y
S16: Will a tonsillectomy cure my child’s oSDB?	Q16: Will a tonsillectomy cure my child’s oSDB?	AR16	✓	-	-	✓	Y
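The proportions quoted in the text can be checked against Table 1 with a short tally. The sketch below is a hand transcription of the table's check marks, not the authors' analysis; the names `TABLE_1`, `CRITERIA`, and `tally` are illustrative.

```python
# Table 1 transcribed by hand, one tuple per question in the column order:
# (closely aligned, cited AAO-HNSF, explicitly referenced AAO-HNSF,
#  recommended seeking a clinician); 1 = check mark, 0 = blank cell.
TABLE_1 = {
    "Q1": (1, 0, 1, 1),
    "Q2": (1, 0, 0, 1),
    "Q3": (1, 0, 1, 1),
    "Q4": (1, 1, 1, 1),
    "Q5": (1, 0, 1, 0),
    "Q6": (1, 1, 1, 1),
    "Q7": (0, 0, 0, 1),
    "Q8": (1, 0, 1, 1),
    "Q9": (1, 0, 0, 1),
    "Q10": (1, 0, 1, 1),
    "Q11": (1, 0, 0, 1),
    "Q12": (1, 0, 0, 1),
    "Q13": (1, 0, 0, 1),
    "Q14": (1, 0, 1, 1),
    "Q15": (1, 0, 1, 1),
    "Q16": (1, 0, 0, 1),
}

CRITERIA = (
    "closely_aligned",
    "cited_guideline",
    "referenced_aao_hnsf",
    "recommended_clinician",
)


def tally(table):
    """Count, for each criterion, how many responses met it and the share in percent."""
    total = len(table)
    summary = {}
    for index, name in enumerate(CRITERIA):
        count = sum(row[index] for row in table.values())
        summary[name] = (count, 100 * count / total)
    return summary


summary = tally(TABLE_1)
for name, (count, percent) in summary.items():
    print(f"{name}: {count}/{len(TABLE_1)} ({percent:.2f}%)")
```

The counts for guideline alignment (15 of 16), citation of the guideline (2 of 16), and recommending a clinician (15 of 16) correspond to the 93.8%, 12.5%, and 93.8% figures given in the text.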

On 2 out of 16 (12.5%) occasions, ChatGPT cited the AAO-HNSF guideline in its response. One example is the response for statement 4, where ChatGPT explicitly states that “The American Academy of Otolaryngology—Head and Neck Surgery (AAO-HNS) has published clinical practice guidelines for the management of pediatric OSA and according to these guidelines tonsillectomy is considered the first-line treatment for children with OSA and enlarged tonsils.” The other is the response for statement 6, where ChatGPT cites the guidelines verbatim (refer to the Supplemental Table). Fifteen out of 16 (93.8%) of ChatGPT’s responses made at least one direct recommendation to consult with a specialist or healthcare provider when considering all available treatment options and inherent risk factors.

A comprehensive breakdown of all responses generated by ChatGPT is available as Supplemental Data in Table S1.

Discussion

While ChatGPT assumes a role that refrains from providing medical advice, it aims to facilitate user consideration of options and comprehension of the multifaceted factors encompassing their specific medical situations. The answers provided by ChatGPT are concise, easily understandable, and tailored to meet the needs of parents who may not be familiar with medical terminology.

Fifteen out of 16 of ChatGPT’s responses made at least one direct recommendation to consult with a specialist or healthcare provider when considering all available treatment options and inherent risk factors associated with a patient’s particular case. This provides something of a safety net, considering that parents might ask only the 2 or 3 questions they are mainly concerned about and would still receive a reminder to consult with a healthcare professional. This trend was observed in other studies.^6,17 It is not the role of ChatGPT to provide medical advice, and the AI-generated responses made this abundantly clear to users. This could reduce the risk of parents avoiding medical attention, which can affect the final treatment outcome.¹⁰

In addition, the incorporation of supplementary resources such as AI offers physicians the opportunity to augment their efficiency and facilitate the accessibility of information to parents.^2,4-6,18-22 Notably, these responses are not only succinct but also effectively tailored for comprehension by parents who may lack familiarity with medical terminology.²³ Ensuring the reliability and accuracy of information provided to patients and their caregivers is of paramount importance, particularly in matters concerning health and patient education.²¹

While AI and related resources contribute significantly to efficiency and accessibility, the fundamental responsibility remains in ensuring that the information delivered to patients and their caregivers is trustworthy, precise, and aligned with the highest standards of healthcare practice.^15,18 However, the results of the current study should be viewed carefully in light of several shortcomings. Limitations of the study include the use of a small number of prompts and a rapidly evolving AI system, which is demonstrating improvement with each update as ChatGPT acquires new knowledge through learning.

Determining which information sources ChatGPT draws on is crucial, as is establishing how up to date that information is.^2,18,24,25 Guidelines are dynamic and require periodic updates that reflect advancements in medical knowledge and evolving best practices. The inherent challenge lies in the ability of ChatGPT to adapt to and accommodate these changes, particularly in the fast-paced realm of general practice in healthcare. The potential risk arises from relying on ChatGPT’s capacity to remain updated and aligned with the latest guidelines, and on its ability to grade the quality of the information it uses to generate a response, such as choosing scientific publications in reputable journals over articles in popular web resources or social media platforms that describe the same medical issues.¹⁷ This could be improved by pre-training AI models on selected information or by developing algorithms that prioritize scientific data over general information on the internet.

For the question regarding the intraoperative use of dexamethasone, ChatGPT’s response may not have been incorrect, as it acknowledged the need to consider all pertinent factors in determining management plans and treatment options; however, it did not provide a definitive answer. A previous study has referred to this as the noncommittal output of ChatGPT.²⁵ A limitation of ChatGPT lies in its limited ability to capture the tone or context of a prompt.¹⁸ AI language models can at times fail to fully capture the nuances of a query.²⁴ Future AI-based clinical decision-making tools must address the limitations in identifying and reporting standard-of-care guidelines when provided with a clinically oriented prompt.⁴ Fine-tuning prompts to set the tone and context of a question or concern is imperative to enhance the accuracy of ChatGPT’s responses. It is noteworthy that, although clinical practice guidelines firmly support the aforementioned treatment plan, substantiated by literature evidence and expert review, ChatGPT did not supply an unequivocal response to the query.

Conclusion

The emergence of ChatGPT has revolutionized the accessibility of information available online, enabling patients to access a vast repository of medical knowledge. The reliability and comprehensiveness of ChatGPT’s responses are evident in a comparison with the AAO-HNSF guideline, where the tool provided predominantly comprehensive, easy-to-read answers, often including ancillary facts and references to support its responses. Although there was a slight discrepancy between the official guideline statement and ChatGPT’s response to one question, there would be no risk for the child if parents were to follow ChatGPT’s recommendation.

Physicians have to be aware of the limitations of ChatGPT and should consider it as an adjunct to the service they provide, which has to be supervised to ensure the safety and accuracy of the information patients receive.

Further studies are needed to establish the role and discover the potential of ChatGPT and similar AI tools in healthcare.

Supplemental Material

Supplemental material, sj-docx-1-ear-10.1177_01455613241230841 for Can ChatGPT Replace an Otolaryngologist in Guiding Parents on Tonsillectomy? by Alexander Moise, Adam Centomo-Bozzo, Ostap Orishchak, Mohammed K. Alnoury and Sam J. Daniel in Ear, Nose & Throat Journal

Footnotes

Acknowledgements

The authors would like to note that there are no acknowledgments for this article.

Author Contributions

Conceptualization and methodology, AM and SJD; data curation, AM and ACB; evaluation, OO and MKA; validation, SJD; writing—original draft preparation, AM; writing—review and editing, AM, ACB, OO, MKA and SJD; visualization, ACB; supervision SJD. All authors have read and agreed to the published version of the manuscript.

Data Availability

The authors confirm that the data supporting the findings of this study are available within the article and its supplementary data file.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Ethical Approval

Ethical review and approval were waived for this study by the McGill University Faculty of Medicine and Health Sciences Research Ethics Office on April 18, 2023, in accordance with the institutional requirements.

Supplemental Material

Supplemental material for this article is available online.

References

1. How ChatGPT managed to grow faster than Tiktok or Instagram. Google Search. Accessed July 5, 2023. https://www.google.com/search?q=how+chatgpt+managed+to+grow+faster+than+tiktok+or+instagram&rlz=1C1GCEA_enCA987CA988&oq=How+ChatGPT+Managed+to+Grow+Faster+Than+TikTok+or+Instagram&gs_lcrp=EgZjaHJvbWUqBwgAEAAYgAQyBwgAEAAYgAQyCggBEAAYhgMYigUyCggCEAAYhgMYigUyCggDEAAYhgMYigUyCggEEAAYhgMYigXSAQc1NDZqMGo0qAIAsAIA&sourceid=chrome&ie=UTF-8

2. Sallam. ChatGPT Utility in healthcare education, research, and practice: systematic review on the promising perspectives and valid concerns. Healthcare (Basel). 2023;11(6):887. doi:10.3390/healthcare11060887

3. Mesko. The ChatGPT (Generative Artificial Intelligence) revolution has made artificial intelligence approachable for medical professionals. J Med Internet Res. 2023;25:e48392. doi:10.2196/48392

4. Rao, Kim, Kamineni, et al. Evaluating GPT as an adjunct for radiologic decision making: GPT-4 versus GPT-3.5 in a breast imaging pilot. J Am Coll Radiol. 2023;20(10):990-997. doi:10.1016/j.jacr.2023.05.003

5. Levkovich, Elyoseph. Suicide risk assessments through the eyes of ChatGPT-3.5 versus ChatGPT-4: vignette study. JMIR Ment Health. 2023;10:e51232. doi:10.2196/51232

6. Crook, Park, Hurley, Richard, Pidgeon. Evaluation of online artificial intelligence-generated information on common hand procedures. J Hand Surg Am. 2023;48(11):1122-1127. doi:10.1016/j.jhsa.2023.08.003

7. Shahsavar, Choudhury. User intentions to use ChatGPT for self-diagnosis and health-related purposes: cross-sectional survey study. JMIR Hum Factors. 2023;10:e47564. doi:10.2196/47564

8. Mueller, Jay, Harper, Davies, Vega, Todd. Web use for symptom appraisal of physical health conditions: a systematic review. J Med Internet Res. 2017;19(6):e202. doi:10.2196/jmir.6755

9. Eysenbach, Powell, Kuss. Empirical studies assessing the quality of health information for consumers on the world wide web: a systematic review. JAMA. 2002;287(20):2691-2700. doi:10.1001/jama.287.20.2691

10. Kwan, Yip HCA, Tan, Fan. A quality assessment of online patient information regarding tonsillitis using the EQIP tool. Int J Pediatr Otorhinolaryngol. 2022;159:111224. doi:10.1016/j.ijporl.2022.111224

11. Chi, Jabbour, Aaronson. Quality and readability of websites for patient information on tonsillectomy and sleep apnea. Int J Pediatr Otorhinolaryngol. 2017;98:1-3. doi:10.1016/j.ijporl.2017.04.031

12. Májovský, Černý, Kasal, Komarc, Netuka. Artificial intelligence can generate fraudulent but authentic-looking scientific medical articles: Pandora’s box has been opened. J Med Internet Res. 2023;25:e46924. doi:10.2196/46924

13. Vaishya, Misra, Vaish. ChatGPT: is this version good for healthcare and research? Diabetes Metab Syndr. 2023;17(4):102744. doi:10.1016/j.dsx.2023.102744

14. Alkaissi, McFarlane. Artificial hallucinations in ChatGPT: implications in scientific writing. Cureus. 2023;15(2):e35179. doi:10.7759/cureus.35179

15. Parviainen, Rantala. Chatbot breakthrough in the 2020s? An ethical reflection on the trend of automated consultations in health care. Med Health Care Philos. 2022;25(1):61-71. doi:10.1007/s11019-021-10049-w

16. Mitchell, Archer, Ishman, et al. Clinical practice guideline: tonsillectomy in children (update)—executive summary. Otolaryngol Head Neck Surg. 2019;160(2):187-205. doi:10.1177/0194599818807917

17. Goodman, Patrinely, Stone Jr, et al. Accuracy and reliability of chatbot responses to physician questions. JAMA Network Open. 2023;6(10):e2336483. doi:10.1001/jamanetworkopen.2023.36483

18. Javaid, Haleem, Singh. ChatGPT for healthcare services: an emerging stage for an innovative perspective. TBench. 2023;3(1):100105. doi:10.1016/j.tbench.2023.100105

19. Carlbring, Hadjistavropoulos, Kleiboer, Andersson. A new era in internet interventions: the advent of Chat-GPT and AI-assisted therapist guidance. Internet Interv. 2023;32:100621. doi:10.1016/j.invent.2023.100621

20. Chiesa-Estomba, Lechien, Vaira, et al. Exploring the potential of Chat-GPT as a supportive tool for sialendoscopy clinical decision making and patient information support. Eur Arch Otorhinolaryngol. Published online July 5, 2023. doi:10.1007/s00405-023-08104-8

21. Ortel. The rise of AI chatbots in healthcare: a comparison of ChatGPT and physicians in responding to patient questions [Internet]. Accessed November 29, 2023. Available from: https://www.linkedin.com/pulse/rise-ai-chatbots-healthcare-comparison-chatgpt-physicians-ortel

22. Nov, Singh, Mann. Putting ChatGPT’s medical advice to the (Turing) test. JMIR Med Educ. 2023;9:e46939. doi:10.2196/46939

23. Ayers, Poliak, Dredze, et al. Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Intern Med. 2023;183(6):589-596. doi:10.1001/jamainternmed.2023.1838

24. Zheng, Feng, Wang, Kang, Zhao. Enhancing diabetes self-management and education: a critical analysis of ChatGPT’s role. Ann Biomed Eng. Published online August 8, 2023. doi:10.1007/s10439-023-03317-8

25. Rajjoub, Arroyave, Zaidat, et al. ChatGPT and its role in the decision-making for the diagnosis and treatment of lumbar spinal stenosis: a comparative analysis and narrative review. Global Spine J. Published online August 10, 2023. doi:10.1177/21925682231195783
