Adams LC, Truhn D, Busch F, et al. Llama 3 challenges proprietary state-of-the-art large language models in radiology board–style examination questions. Radiology. 2024;312(2):e241191.
2.
Li D, Gupta K, Bhaduri M, Sathiadoss P, Bhatnagar S, Chong J. Comparative diagnostic accuracy of GPT-4o and LLaMA 3-70b: proprietary vs. open-source large language models in radiology. Clin Imaging. 2025;118:110382.
3.
Ueda D, Mitsuyama Y, Takita H, et al. Diagnostic performance of ChatGPT from patient history and imaging findings on the Diagnosis Please quizzes. Radiology. 2023;308(1):e231040.
4.
Li D, Gupta K, Bhaduri M, Sathiadoss P, Bhatnagar S, Chong J. Comparing GPT-3.5 and GPT-4 accuracy and drift in Radiology Diagnosis Please cases. Radiology. 2024;310(1):e232411.
5.
Nishino M, Ballard DH. Multimodal large language models to solve image-based diagnostic challenges: the next big wave is already here. Radiology. 2024;312(1):e241379.