Abstract
Objective
Our study aims to compare the performance of several large language model chatbots on surgical questions across a range of topics and categories.
Materials and Methods
Four chatbots (ChatGPT 4.0, Medical Chat, Google Bard, and Copilot AI) were used in our study. A total of 114 multiple-choice surgical questions covering nine topics were entered into each chatbot, and the answers were recorded.
Results
The performance of ChatGPT was significantly better than that of Bard (P < 0.0001) and Medical Chat (P = 0.0013) but not significantly better than that of Copilot (P = 0.9663). When we assessed performance by surgical specialty, we also found statistically significant differences among the chatbots on ENT (P = 0.0199) and GI (P = 0.0124) questions. Finally, the mean scores of Bard, Copilot, Medical Chat, and ChatGPT 4.0 were higher on diagnosis questions than on management questions, although the difference was statistically significant only for Bard (P = 0.0281).
Conclusion
Our study offers insight into the performance of different chatbots on surgery-related questions and topics. The strengths and shortcomings of each can provide a better understanding of how to use chatbots in the surgical field, including surgical education.
