Abstract
Background:
Smoking is associated with higher complication and recurrence rates in ventral and inguinal hernia repairs, but evidence is fragmented. This study evaluated the efficacy of AI-based large language models (LLMs) for identifying literature on the impact of smoking on hernia repairs.
Methods:
ChatGPT 4.0, ChatGPT 4o, Microsoft Copilot, and Google Gemini were instructed to search PubMed, Embase, and Scopus for retrospective/prospective studies and randomized controlled trials regarding smoking’s effects on ventral and inguinal hernia repairs. The models’ outputs were cross-checked against previous systematic reviews to assess accuracy.
Results:
The four LLMs generated 24 citations, of which only nine (37.5%) proved valid and relevant. Thirteen (54.2%) were fabricated references, and two (8.3%) cited real studies that did not meet the specified criteria. The models identified two studies missed by previous systematic reviews but overlooked 35 studies (79.5%) that those reviews had identified.
Conclusions:
Although LLMs can quickly compile potentially relevant references, they are prone to fabricating or omitting crucial studies. Human verification remains essential for conducting reliable, comprehensive literature searches in systematic reviews and meta-analyses.
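The proportions reported in the Results can be re-derived from the stated counts. The following is a minimal sketch in Python, not part of the study's methods. The variable names and the helper pct are illustrative. The abstract does not state the number of studies in the previous systematic reviews, so the sketch only searches for totals consistent with "35 (79.5%)" and treats the answer as an inference.

```python
def pct(count, total):
    """Percentage of total, rounded to one decimal place as in the abstract."""
    return round(100 * count / total, 1)

# Counts taken directly from the Results paragraph.
total_citations = 24
counts = {
    "valid_and_relevant": 9,
    "fabricated": 13,
    "did_not_match_criteria": 2,
}

# The three categories should account for every generated citation.
assert sum(counts.values()) == total_citations

shares = {label: pct(n, total_citations) for label, n in counts.items()}

# Assumption: the reference set size is not given in the abstract.
# Search for totals that reproduce the reported 79.5% for 35 overlooked studies.
overlooked = 35
consistent_totals = [n for n in range(overlooked, 201) if pct(overlooked, n) == 79.5]

for label, share in shares.items():
    print(f"{label}: {counts[label]}/{total_citations} = {share}%")
print("reference-set sizes consistent with 35 (79.5%):", consistent_totals)
```

Under these assumptions the three citation categories sum to 24, and the only reference-set size in the searched range that reproduces the reported 79.5% is 44.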
