Abstract
Background:
Chat Generative Pre-Trained Transformer (ChatGPT) has emerged as a widely accessible large language model (LLM) with potential applications in medicine. While early literature has explored ChatGPT’s role in various surgical specialties, its impact on general surgery remains less defined. This systematic review evaluates current evidence on the educational, clinical, and research applications of ChatGPT within the field of general surgery.
Methods:
A comprehensive search was performed of PubMed, Cochrane Central, Scopus, SciELO, and LILACS from inception to December 2023. Studies were included if they evaluated the utility of ChatGPT in general surgery across educational, research, and clinical domains. Both analytic and descriptive studies were included. Studies involving other AI platforms and conference abstracts were excluded.
Results:
Of 550 screened studies, 23 met inclusion criteria and demonstrated ChatGPT’s broad applicability across surgical domains (some studies addressed more than one domain). Specifically, 6 studies demonstrated its capability to answer common questions about surgical diseases, 7 assessed its utility in clinical practice, 11 focused on educational applications, and 5 examined its potential role in research. Notably, ChatGPT exhibited proficiency in providing anatomical explanations and answering open-ended questions, achieving up to 87% accuracy for colorectal surgical questions, though performance was more variable for appendicitis queries. In board exam-style assessments, its accuracy ranged from 48% to 66% for open-ended questions and from 68% to 76.4% in multiple-choice formats. Patient-facing responses were generally rated favorably, particularly in bariatric, transplant, and pancreatic surgery, with several studies highlighting ChatGPT’s clarity and comprehensiveness compared with traditional medical literature. In clinical decision-making scenarios, ChatGPT’s concordance with clinical experts varied widely, ranging from 0% to 86.7% in colorectal surgery studies and reaching 30% in bariatric cases. ChatGPT proved effective in drafting informed consent documents and comprehensive surgical notes. However, it showed limitations in providing accurate references and in data extraction, though it showed promise in generating research ideas.
Conclusion:
Overall, while ChatGPT shows significant potential across the realms of surgical education, clinical practice, and research, its outputs require ongoing human oversight and expert validation.
PROSPERO Registration:
CRD420251107155.
