Abstract
Background
Large language models are a type of artificial intelligence that can understand language and generate responses to text inputs. This presents potential within healthcare to improve triage of common conditions with established care pathways, such as lateral elbow tendinopathy (LET). However, their application to clinical scenarios requires evaluation.
Methods
Four questions regarding LET investigation and management were posed to ChatGPT-3.5, which was asked to provide five evidence sources for its answers. Five clinical scenarios were then posed to the model, simulating consultations with typical and red-flag features. Responses were evaluated by three upper-limb consultants using the DISCERN tool.
Results
Overall quality was unanimously rated as moderate for both the question and scenario responses, representing potentially important but not serious shortcomings. The model correctly identified the diagnosis and red-flag features and signposted accordingly. References cited were found not to exist in 40% of cases. Where references were correctly cited, issues identified included erroneous terminology, exclusion of recent evidence, and misinterpretation of findings.
Conclusions
While this technology's ability to identify the diagnosis and red-flag features when presented with clinical scenarios shows promise, application in the clinical setting is not yet justified, given the limitations in the evidence base of its recommendations and its lack of real-time access to evidence.