Abstract
Introduction
Large language models (LLMs) offer potential as clinical decision support systems (CDSS) for detecting drug-related problems (DRPs), yet their real-world performance compared to clinical pharmacists (CPs) remains unclear, especially in complex hematology care. We aimed to evaluate the concordance between a clinical pharmacist and three LLMs in identifying DRPs within a Bone Marrow Transplantation unit.
Methods
This prospective observational study was conducted in a Bone Marrow Transplantation unit and compared a CP with three LLMs (ChatGPT-4o, Grok-3, DeepSeek-v3). Eighty-three anonymized patient cases, encompassing 210 CP-identified DRPs classified with the Pharmaceutical Care Network Europe (PCNE) v9.1 system, were presented to each model using a standardized CDSS-simulating prompt. Performance was assessed on three measures: direct detection, prompted detection after structured follow-up, and the clinical relevance of AI-generated therapeutic recommendations, with the CP's assessments serving as the gold standard.
Results
Direct detection of intervention-requiring DRPs was limited (51.4%-60.5% across models), meaning that roughly 40%-49% of these DRPs were missed initially. Guided prompting significantly improved overall detection rates to 93.8%-98.1%, with ChatGPT achieving the highest accuracy. All models produced hallucinations. Recommendation concordance with the CP exceeded 70% in most DRP categories. DeepSeek and ChatGPT showed more consistent performance in context-dependent evaluations, whereas Grok demonstrated higher direct detection but lower recommendation alignment.
Conclusion
LLMs demonstrate meaningful potential to assist in DRP detection but are not sufficiently reliable as standalone tools. Expert-guided interaction substantially enhanced their performance, underscoring the critical value of hybrid pharmacist-AI workflows. Future research should validate these findings across broader populations with multiple expert evaluators and integrate next-generation AI architectures for safer CDSS implementation.
