Chain-of-Thought Reasoning Improves ChatGPT’s Diagnostic Accuracy in Radiology

Abstract

References

Adams, Truhn, Busch, et al. Llama 3 challenges proprietary state-of-the-art large language models in radiology board–style examination questions. Radiology. 2024;312(2):e241191.

Gupta, Bhaduri, Sathiadoss, Bhatnagar, Chong. Comparative diagnostic accuracy of GPT-4o and LLaMA 3-70b: proprietary vs. open-source large language models in radiology. Clin Imaging. 2025;118:110382.

Ueda, Mitsuyama, Takita, et al. Diagnostic performance of ChatGPT from patient history and imaging findings on the Diagnosis Please quizzes. Radiology. 2023;308(1):e231040.

Gupta, Bhaduri, Sathiadoss, Bhatnagar, Chong. Comparing GPT-3.5 and GPT-4 accuracy and drift in Radiology Diagnosis Please cases. Radiology. 2024;310(1):e232411.

OpenAI. Introducing ChatGPT Pro. December 5, 2024. Accessed December 6, 2024. https://openai.com/index/introducing-chatgpt-pro/

Nishino, Ballard. Multimodal large language models to solve image-based diagnostic challenges: the next big wave is already here. Radiology. 2024;312(1):e241379.