LLM-Generated Transplant Algorithms: Comparing GPT and Gemini in Producing Step-Wise Diagnostic and Management Pathways

Abstract

Background

Solid organ transplantation relies on dynamic, evidence-based clinical algorithms. This study evaluates the educational reliability and feasibility of large language models (LLMs) for rapid protocol drafting by comparing GPT-4 and Gemini 2.5 Pro in generating step-wise diagnostic and management algorithms for common post-transplant scenarios. We assess LLMs as tools for synthesizing existing guidance rather than proposing new clinical practice guidelines.

Methods

Ten high-stakes post-transplant scenarios (eg, acute cellular rejection and unexplained graft dysfunction) were evaluated. Both LLMs received identical, structured prompts. Three transplant specialists independently scored each algorithm using a 5-point rubric for Clinical Concordance (primary outcome), Logical Flow, and Completeness. Inter-rater reliability was assessed with weighted kappa, and median scores were compared using non-parametric tests. Model identifiers, access dates, and prompting procedures were reported to support reproducibility.
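The inter-rater reliability step can be illustrated with a minimal sketch. The abstract does not state which weighting scheme was used or how the three raters were combined, so the quadratic weights and the averaging of pairwise Cohen's weighted kappa below are assumptions. The function names and example scores are hypothetical and are not study data.

```python
from itertools import combinations


def weighted_kappa(a, b, categories=(1, 2, 3, 4, 5), power=2):
    """Cohen's weighted kappa for two raters.

    power=1 gives linear weights, power=2 quadratic weights (assumed here).
    """
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(a)

    # Observed joint proportions of (rater A score, rater B score).
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[idx[x]][idx[y]] += 1 / n

    # Marginal score distributions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Disagreement weight grows with the distance between rubric levels.
    def w(i, j):
        return (abs(i - j) / (k - 1)) ** power

    d_obs = sum(w(i, j) * observed[i][j] for i in range(k) for j in range(k))
    d_exp = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if d_exp == 0:
        raise ValueError("kappa undefined: both raters used one identical score")
    return 1 - d_obs / d_exp


def mean_pairwise_kappa(ratings_by_rater):
    """Average weighted kappa over all rater pairs (an assumed summary)."""
    pairs = list(combinations(ratings_by_rater, 2))
    return sum(weighted_kappa(a, b) for a, b in pairs) / len(pairs)


# Hypothetical 5-point scores from three raters for ten algorithms.
rater_1 = [5, 4, 4, 3, 5, 4, 2, 5, 4, 3]
rater_2 = [5, 4, 3, 3, 5, 4, 2, 4, 4, 3]
rater_3 = [4, 4, 4, 3, 5, 5, 2, 5, 4, 3]
print(round(mean_pairwise_kappa([rater_1, rater_2, rater_3]), 2))
```

Quadratic weights penalize a 2-point disagreement four times as heavily as a 1-point disagreement, which suits an ordinal rubric. A multi-rater statistic such as Fleiss' kappa would be a reasonable alternative to pairwise averaging.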

Results

Inter-rater agreement was high (κ = 0.81). Gemini 2.5 Pro achieved a higher median Clinical Concordance score than GPT-4 (4.5 vs 3.8, P < .01) and higher Logical Flow and Completeness scores. Performance differences were most apparent in high-complexity scenarios requiring sequential differentiation of infectious vs alloimmune causes of graft dysfunction.

Conclusion

Gemini 2.5 Pro outperformed GPT-4 in generating clinically concordant, structured, and complete transplant algorithms. LLM outputs remain model- and version-dependent and require expert validation prior to any clinical use. In practice, LLMs should be used only within a governance framework (specialist review, institutional oversight, and periodic revalidation after model updates), not as autonomous clinical tools.

Keywords

large language models (LLMs), transplantation, clinical algorithms, GPT-4, Gemini, diagnostic pathway
