Abstract
Background:
Chat Generative Pre-Trained Transformer (ChatGPT) has emerged as a widely accessible large language model (LLM) with potential applications in medicine. While early literature has explored ChatGPT’s role in various surgical specialties, its impact on general surgery remains less defined. This systematic review evaluates current evidence on the educational, clinical, and research applications of ChatGPT within the field of general surgery.
Methods:
A comprehensive search was performed of PubMed, Cochrane Central, Scopus, SciELO, and LILACS from inception to December 2023. Studies were included if they evaluated the utility of ChatGPT in general surgery across educational, research, and clinical domains. Both analytic and descriptive studies were included. Studies involving other AI platforms and conference abstracts were excluded.
Results:
Of 550 screened studies, 23 met inclusion criteria and demonstrated ChatGPT’s broad applicability across surgical domains (some studies addressed more than one domain). Specifically, 6 studies demonstrated its capability to answer common questions about surgical diseases, 7 assessed its utility in clinical practice, 11 focused on educational applications, and 5 examined its potential role in research. Notably, ChatGPT exhibited proficiency in providing anatomical explanations and answering open-ended questions, achieving up to 87% accuracy for colorectal surgical questions, though performance was more variable for appendicitis queries. In board exam-style assessments, its accuracy ranged from 48% to 66% for open-ended questions and from 68% to 76.4% in multiple-choice formats. Patient-facing responses were generally rated favorably, particularly in bariatric, transplant, and pancreatic surgery, with several studies highlighting ChatGPT’s clarity and comprehensiveness compared with traditional medical literature. In clinical decision-making scenarios, ChatGPT’s concordance with clinical experts varied widely, ranging from 0% to 86.7% in colorectal surgery studies and reaching 30% in bariatric cases. ChatGPT proved effective in drafting informed consent documents and comprehensive surgical notes. However, it showed limitations in providing accurate references and in data extraction, though it showed promise in generating research ideas.
Conclusion:
Overall, while ChatGPT shows significant potential across the realms of surgical education, clinical practice, and research, its outputs require ongoing human oversight and expert validation.
PROSPERO Registration:
CRD420251107155.
