Abstract
Objective:
This study evaluates the capability of DeepSeek, a large language model-based artificial intelligence (AI) system, to pass the UK Fellowship of the Royal College of Radiologists (FRCR) examination by assessing its performance on text-based components.
Methods:
DeepSeek R1, a publicly available AI chatbot, was tested using standardised prompts on 200 Part 1 physics questions and two sets of 120 single-best-answer questions from Part 2A of the FRCR examination. The AI’s performance was compared against the 2024 FRCR pass marks (57%-75% for Part 1 and 55%-60% for Part 2A). Due to its inability to analyse images, DeepSeek was not assessed on the anatomy or Part 2B components.
Results:
DeepSeek achieved an accuracy of 82% on the Part 1 physics section and 81.67% and 80% on the two Part 2A papers, surpassing the required pass thresholds for all tested sections.
Discussion:
These findings demonstrate that DeepSeek possesses substantial knowledge relevant to the FRCR examination and suggest potential applications in radiology education. However, its current inability to process image-based questions limits its applicability in practical radiological assessments. Future advancements integrating image analysis capabilities may enhance its role in radiology training and clinical practice.
Conclusion:
DeepSeek demonstrates high accuracy in answering text-based FRCR questions, highlighting its potential as an AI-driven educational tool. However, further development is required to enable comprehensive AI integration into radiology training and diagnostic workflows.
Introduction
Artificial intelligence (AI) has significantly impacted various sectors, including healthcare, where it has made notable advances. DeepSeek Artificial Intelligence Co., Ltd. (referred to as ‘DeepSeek’) is a Chinese company founded in 2023 with a mission to advance the field of artificial general intelligence (AGI). AGI refers to highly autonomous systems that can outperform humans at most economically valuable work. Unlike narrow AI, which is designed for specific tasks, AGI aims to achieve human-like cognitive abilities across a wide range of domains. AGI could be applied to radiology, as such systems can be trained to analyse medical images with high accuracy, assisting radiologists in diagnosing diseases.[1-4]
In previous studies, the AI language model ChatGPT has demonstrated a high level of performance on standardised medical examinations that closely align with the structure, content and complexity of professional board certifications. These include the United States Medical Licensing Examination (USMLE), the Canadian Royal College examinations, the American Board of Radiology assessments as well as the Fellowship of the Royal College of Radiologists (FRCR) and the Fellowship of the Royal College of Surgeons in orthopaedics in the United Kingdom.[5-9] However, to date, the capability of DeepSeek to achieve a passing score on the FRCR examination has not been evaluated.
This research focuses on assessing DeepSeek’s performance in relation to the FRCR examination, which is the standard for becoming a radiologist in the UK. The FRCR is a comprehensive assessment that tests candidates’ knowledge and comprehension of various aspects of clinical radiology. It consists of several components: Part 1 includes an anatomy module evaluated through an image-viewing session and a physics module that features 40 sets of 5-part true/false questions, totalling 200 questions. Part 2A comprises two papers, each containing 120 single-best-answer questions that cover a broad array of topics from the core curriculum. Part 2B includes rapid reporting, long case evaluations and two viva voce exams. Given the intricate nature of the FRCR exam, this study provides valuable insights into the strengths and weaknesses of advanced AI systems such as DeepSeek, in a challenging medical environment. The results may also have important implications for the use of AI in medical education, diagnostic practices and problem-solving within clinical contexts.
Methods
This study did not involve human subjects or the use of personally identifiable information; therefore, ethical approval was not necessary. The research employed DeepSeek R1, a publicly available, pre-trained AI chatbot built on a large language model. No specialised training for radiology was conducted for this study. The model assessed was the latest version of DeepSeek, available through a public plus-tier subscription.
The Royal College of Radiologists (RCR) offers only a limited selection of sample questions for its examinations. Questions were therefore compiled from question banks that closely align with the format and content of the FRCR exams, in consultation with recent successful candidates. DeepSeek’s performance was not assessed for the anatomy or Part 2B components of the FRCR exams, as the model currently lacks the capability to analyse image-based data, which is essential for these sections.
To assess DeepSeek’s performance, standard prompts were used to elicit responses. For the Part 1 physics questions, the prompt ‘Mark as true or false (question)’ was used, while for the Part 2A questions, the prompt ‘Which is the single-best-answer (question)’ was used. DeepSeek was then tasked with answering 40 five-part true/false questions from the Part 1 physics question bank (totalling 200 questions) and two sets of 120 questions each from the Part 2A question bank. The pass marks for the assessments were based on the most recent available FRCR pass marks. For the Part 1 physics component, the 2024 pass mark ranged between 57% and 75%, while for the Part 2A component, the 2024 pass mark ranged between 55% and 60%. These benchmarks were used to evaluate DeepSeek’s performance in answering the selected questions.
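The prompting and benchmarking steps described above can be sketched in a few lines of Python. This is an illustration only and not the procedure used in the study: the function names, dictionary keys and the choice to compare against the upper end of each pass-mark range are assumptions made for the example.

```python
# Illustrative sketch only; all names here are hypothetical, not from the study.

# Prompt wording taken from the text, with "(question)" as the placeholder.
PROMPT_TEMPLATES = {
    "part1_physics": "Mark as true or false {question}",
    "part2a": "Which is the single-best-answer {question}",
}

# 2024 pass-mark ranges reported in the text, in percent (lower, upper).
PASS_MARK_RANGES = {
    "part1_physics": (57.0, 75.0),
    "part2a": (55.0, 60.0),
}


def build_prompt(component: str, question: str) -> str:
    """Fill the standard prompt for the given exam component."""
    return PROMPT_TEMPLATES[component].format(question=question)


def exceeds_pass_range(component: str, accuracy_pct: float) -> bool:
    """Return True if the score is above the upper end of the pass-mark range.

    Using the upper bound is an assumption: it is the most conservative
    reading of a pass mark that varied between sittings.
    """
    _lower, upper = PASS_MARK_RANGES[component]
    return accuracy_pct > upper
```

For example, `exceeds_pass_range("part1_physics", 82.0)` evaluates to `True`, since 82% is above the 75% upper bound of the Part 1 range.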
Results
In this study, DeepSeek was evaluated on its ability to answer questions from the FRCR examinations. The results are as follows:
FRCR Part 1-Physics Component:
Total Questions: 200
Correct Answers: 164
Accuracy: 82%
FRCR Part 2A-Core Curriculum:
Paper 1:
Total Questions: 120
Correct Answers: 98
Accuracy: 81.67%
Paper 2:
Total Questions: 120
Correct Answers: 96
Accuracy: 80%
These results indicate that DeepSeek achieved an accuracy of 80% or higher across all components evaluated. No pattern was observed among the questions DeepSeek answered incorrectly. DeepSeek took approximately 35 seconds to answer the FRCR Part 1 physics component and 60 seconds to answer each paper of FRCR Part 2A.
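The reported accuracies follow directly from the counts of correct answers. As a small worked check (a sketch with hypothetical variable names, rounding to two decimal places as in the reported figures):

```python
def accuracy_pct(correct: int, total: int) -> float:
    """Percentage of correct answers, rounded to two decimal places."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * correct / total, 2)


# (correct, total) pairs as reported in the Results section.
results = {
    "Part 1 physics": (164, 200),
    "Part 2A paper 1": (98, 120),
    "Part 2A paper 2": (96, 120),
}

scores = {name: accuracy_pct(c, t) for name, (c, t) in results.items()}
# scores == {"Part 1 physics": 82.0, "Part 2A paper 1": 81.67, "Part 2A paper 2": 80.0}
```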
Discussion
The findings from this study highlight the significant potential of DeepSeek as an AI system capable of performing at a high level in the context of the FRCR examinations. The overall accuracy rates of 82% for the physics component and approximately 81% for the core curriculum questions suggest that DeepSeek can effectively process and respond to complex medical inquiries that align with radiological knowledge.
The ability of DeepSeek to achieve such high scores is indicative of its advanced capabilities and may serve as a benchmark for the application of AI in medical education and diagnostic processes. This performance raises several important considerations. The high accuracy of DeepSeek in answering examination questions suggests that AI could serve as a valuable tool for radiologists, assisting in the diagnostic process by providing quick and accurate information. This aligns with findings from previous studies, such as the work by Bhayana et al.,[6] which explored the performance of ChatGPT on radiology board-style examinations and highlighted both its strengths and limitations in the field. The results also pave the way for the incorporation of AI systems such as DeepSeek into medical training. AI could be used to create practice examinations and interactive learning environments that enhance the educational experience for radiology trainees. This is supported by studies such as that of Kung et al., which demonstrated ChatGPT’s potential for AI-assisted medical education through its performance on the USMLE.[5]
It is important to note, however, that DeepSeek’s capabilities are limited to text-based questions and cannot currently process image data, which is critical for the anatomy and practical components of the FRCR exam. Ariyaratne et al. also emphasised the challenges faced by AI in passing the UK Radiology Fellowship Examinations, pointing out that while AI shows promise, significant barriers remain.[7] This limitation highlights the need for continued development of AI systems to encompass a broader range of diagnostic tasks, including image analysis. Future studies should aim to explore the integration of image-processing capabilities into AI models such as DeepSeek. This would provide a more comprehensive evaluation of their potential in radiology and further validate the efficacy of AI in clinical settings. As AI becomes more integrated into healthcare, ethical considerations regarding the use of AI in decision-making processes must be addressed. The implications of relying on AI for diagnostic purposes, including accountability and the need for oversight, warrant thorough discussion.
Conclusion
In conclusion, this study underscores the promising capabilities of DeepSeek in the context of the FRCR examinations and suggests that the continued evolution of AI technology may significantly impact the field of radiology, both in education and clinical practice. The results should be viewed as preliminary insight rather than conclusive. The results advocate for further exploration and development of AI systems to enhance their utility in comprehensive medical training and patient care. However, a potential limitation is the assessment of only text-based questions and not images, which form the core part of radiology training. A single study cannot serve as the final basis for the replacement or outperformance of trained radiologists by AI in high-stakes professional examinations. This should be evaluated further in a large-scale study involving the latest exam pattern and the latest version of the AI model.
Footnotes
Consent to Participate
No consent was required as no patients were involved in the study.
Consent to Publish
Yes.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The authors received no financial support for the research, authorship and/or publication of this article.
Institutional Ethical Committee Approval Number
No ethical committee approval was required as the study does not involve patients.
Credit author statement
Conception and design, or acquisition of data, or analysis and interpretation of data by SS, RB
Design, or acquisition of data, or analysis and interpretation of data by SS, RB
Drafting the article or revising it critically for important intellectual content by SS, HU, RB
Final approval of the version to be published by SS, HU, GG, KPI, RH, RB
Data Availability
Data is available to share on request.
Use of Artificial Intelligence
No AI was used.
