Abstract
Background
Solid organ transplantation relies on dynamic, evidence-based clinical algorithms. This study evaluates the educational reliability and feasibility of large language models (LLMs) for rapid protocol drafting by comparing GPT-4 and Gemini 2.5 Pro in generating step-wise diagnostic and management algorithms for common post-transplant scenarios. We assess LLMs as tools for synthesizing existing guidance rather than proposing new clinical practice guidelines.
Methods
Ten high-stakes post-transplant scenarios (eg, acute cellular rejection and unexplained graft dysfunction) were evaluated. Both LLMs received identical, structured prompts. Three transplant specialists independently scored each algorithm using a 5-point rubric for Clinical Concordance (primary outcome), Logical Flow, and Completeness. Inter-rater reliability was assessed with weighted kappa, and median scores were compared using non-parametric tests. Model identifiers, access dates, and prompting procedures were reported to support reproducibility.
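As a minimal sketch of the reliability analysis described above, the following computes a weighted Cohen's kappa for each pair of raters and averages the pairwise values. The abstract does not state the weighting scheme or how the three raters were combined, so the linear weights, the pairwise averaging, and the function names `weighted_kappa` and `mean_pairwise_kappa` are assumptions made for illustration. The scores in the usage example are toy values, not study data.

```python
from itertools import combinations


def weighted_kappa(a, b, categories, weights="linear"):
    """Weighted Cohen's kappa for two raters scoring the same items.

    Assumption: disagreement weights are |i - j| / (k - 1), raised to the
    power 1 ("linear") or 2 ("quadratic"); the source names neither scheme.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("both raters must score the same non-empty item set")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two rating categories are required")
    index = {c: i for i, c in enumerate(categories)}
    n = len(a)

    # Observed joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[index[x]][index[y]] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    power = 1 if weights == "linear" else 2
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** power
            observed_disagreement += w * observed[i][j]
            expected_disagreement += w * row[i] * col[j]

    if expected_disagreement == 0:
        # Both raters used one identical category; kappa is undefined.
        return float("nan")
    return 1.0 - observed_disagreement / expected_disagreement


def mean_pairwise_kappa(ratings_by_rater, categories, weights="linear"):
    """Average weighted kappa over all rater pairs (an assumed aggregation)."""
    pairs = list(combinations(ratings_by_rater, 2))
    values = [weighted_kappa(x, y, categories, weights) for x, y in pairs]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Toy rubric scores for five algorithms from three hypothetical raters.
    rubric = [1, 2, 3, 4, 5]
    rater_1 = [4, 5, 3, 4, 2]
    rater_2 = [4, 4, 3, 5, 2]
    rater_3 = [5, 5, 3, 4, 1]
    print(mean_pairwise_kappa([rater_1, rater_2, rater_3], rubric))
```

Linear weights penalize a one-point disagreement on the 5-point rubric less than a four-point disagreement, which suits ordinal rubric scores; quadratic weights are a common alternative and can be selected with `weights="quadratic"`.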
Results
Inter-rater agreement was high (κ = 0.81). Gemini 2.5 Pro achieved a higher median Clinical Concordance score than GPT-4 (4.5 vs 3.8, P < .01) and higher Logical Flow and Completeness scores. Performance differences were most apparent in high-complexity scenarios requiring sequential differentiation of infectious vs alloimmune causes of graft dysfunction.
Conclusion
Gemini 2.5 Pro outperformed GPT-4 in generating clinically concordant, structured, and complete transplant algorithms. LLM outputs remain model- and version-dependent and require expert validation prior to any clinical use. In practice, LLMs should be used only within a governance framework (specialist review, institutional oversight, and periodic revalidation after model updates), not as autonomous clinical tools.
