Abstract
Background:
Development of diagnostic reasoning (DR) is fundamental to medical students’ training, but assessing DR is challenging. Several written assessments target DR but cannot assess it dynamically. Oral assessment formats have strengths but have largely lost favour due to concerns about low reliability and lack of standardization. Medical schools and specialist medical colleges value many forms of oral assessment (eg, long case, Objective Structured Clinical Examination [OSCE], viva voce) but are increasingly searching for ways to standardize these formats. We sought to develop and trial a Standardized Case-Based Discussion (SCBD), a highly standardized and interactive oral assessment of DR.
Methods:
Two initial cohorts of medical students (n = 319 and n = 342) participated in the SCBD as part of their assessments. All students watched a video trigger (based on an authentic clinical case) and discussed their DR with an examiner for 15 minutes. Examiners probed students’ DR and assessed how students responded to new standardized clinical information. An online examiner training module clearly articulated the expected student performance standards. We used student achievement and student and examiner perceptions to gauge the performance of this new assessment form over 2 implementation years.
Results:
The SCBD was feasible to implement for a large student cohort and was acceptable to students and examiners. Most students and all examiners agreed that the SCBD discussion provided useful information on students’ DR. The assessment had acceptable internal consistency, and the associations with other assessment formats were small and positive, suggesting that the SCBD measures a related, yet novel construct.
Conclusions:
Rigorous, standardized oral assessments have a place in a programme of assessment in initial medical training because they provide opportunities to explore DR that are limited in other formats. We plan to incorporate an SCBD into our clinical assessments for the first year of clinical training, where teaching and assessing basic DR is emphasized. We will also explore further examiners’ understanding of and approach to assessing DR.
Introduction
Accurate diagnostic reasoning (DR) is fundamental to ensuring patient care and safety, and thus the development of DR is a key component of medical training. Valid and reliable assessment methods are central to evaluating medical students’ DR skills. However, there is no consensus on the most effective approaches to evaluating these reasoning skills. Currently, several written assessments aim to assess DR, including the Script Concordance Test, 1 Key Features Test, 2 Clinical Reasoning Problems, 3 and the Extended Matching Format. 4 Although each of these assessments may capture some aspect of DR, the written format cannot capture the dynamic nature of DR in practice, such as changing reasoning in response to new information or explaining and defending one’s reasoning.
Oral assessments have been widely used to evaluate reasoning skills, 5 but face criticism because of lack of standardization, risk of bias, and poor reliability. 6 For example, traditional long cases provide extended assessment time for students to formulate and convey their understanding of a real patient’s problem. 7 Advocates of the long case value its authenticity; yet the format has also been criticized for its lack of standardization owing to examiner and patient variability. Such concerns have seen oral assessments lose favour, including for assessment in clinical education contexts. 8 It is possible to introduce, as we have done, direct questions about students’ DR into the Objective Structured Clinical Examination (OSCE), but time constraints limit extended exploration of DR in this setting. Avoiding oral assessments altogether, however, appears at odds with programmatic assessment, which gathers multiple authentic samples of student performance over time using different assessment methods linked to a variety of consequences. 9 Ignoring oral assessment formats compromises a detailed and nuanced understanding of students’ DR development.
Appropriately designed and implemented oral assessments allow educators to clarify, probe, and confirm students’ reasoning, while also reflecting the reality of clinical learning in the workplace. Arguably, for a complex and contextual skill such as DR, it may be the most appropriate method of assessing medical students’ performance. Therefore, from a pedagogical perspective, we argue that it is desirable to use oral assessments to advantage while minimizing their weaknesses. We therefore developed a new standardized, oral assessment approach specifically to investigate students’ DR. Across a 2-year implementation, we sought to gather data on the performance of the Standardized Case-Based Discussion (SCBD; its score distributions, internal consistency, and relationship to other assessment components) and to determine whether SCBD examiners and students regarded the assessment as an acceptable approach to assessing DR. Our aim was to judge whether the SCBD had value as an ongoing component of our assessment programme.
Method
Development of the SCBD
We developed and implemented a single SCBD as a component of the assessment of the third year (the second year of full-time clinical training) of a 4-year postgraduate medical course. The SCBD comprises a face-to-face oral assessment between student and examiner, preceded by a video trigger of a doctor interviewing a patient about a new clinical problem. Our implementation design for the SCBD involved development, piloting, improvements based on feedback, trial implementation in 2015, and final implementation in 2016. A single case required substantial investment of resources at the development stage (writing, reviewing, filming, preparing stimulus materials). As a result, within our limited time and financial resources, it was feasible to produce only 1 case per year. Our trade-off was to produce cases with a high degree of standardization and authenticity, and to allocate the case only a 10% weighting in the aggregate subject mark to offset the possibly low reliability of a single case.
SCBD administration comprises 2 phases. In phase 1 (15 minutes), students watch a 5-minute video trigger on an individual computer, with headphones, in a computer laboratory. All students receive the same patient information and may replay the video and take notes. In phase 2 (15 minutes), students discuss the case with a trained clinical examiner, requesting additional relevant clinical information and justifying their DR.
Development of the SCBD involved scripting and filming the video trigger, developing comprehensive clinical information related to the patient presentation, producing a marking sheet to guide examiners through the history, examination, investigations, and management phases, and writing a marking rubric defining the expectations for students’ performance. A panel of clinical experts reviewed a first draft of these elements of the SCBD to ensure accuracy and realism. The panel reviewed the materials critically and provided detailed feedback to the authors on aspects which the group agreed should be revised. To further standardize examiner marking, the expert panel also created a document outlining the performance expectations for a good and a borderline student at this stage of medical training.
We chose a video instead of a paper-based trigger to increase the assessment’s authenticity, to mimic real patient encounters, and because clinicians obtain valuable information from patients’ verbal and non-verbal cues. The cases were undifferentiated patient presentations based on real patients (eg, a young woman with abdominal pain, an older patient with undifferentiated dyspnoea). Generalists developed the cases and trigger scripts to maintain an undifferentiated feel to the cases, thereby allowing several diagnostic possibilities to be explored. The presenting complaints in the trigger videos were designed to be common complaints that students would have encountered many times throughout their training and were not symptoms limited to 1 organ system (eg, dyspnoea can be a presenting complaint for cardiovascular, respiratory, haematological, and psychological conditions). SCBD trigger information was intentionally broad, so that no specific illness script was triggered and the hypothetico-deductive method was required to explore diagnostic possibilities further. Fundamentally, the assessment required examiners to make an informed judgement about the student’s capacity to think critically about the case by applying their biomedical knowledge and weighing up the available clinical information and additional investigations.
In total, 5 clinician examiners and 12 volunteer final-year medical students trialled a pilot case to refine the SCBD before implementation in the assessment programme. Examiners and students provided feedback on timing, materials, and examiner interaction, during separate group interviews. Overall, students and examiners regarded the SCBD pilot favourably. Minor suggestions for improvement prior to implementation included adjusting the timing of warning bells and developing more specific guidelines for probing students’ reasoning.
Students prepared for the initial implementation of the SCBD by watching an example video trigger and a simulated example of a good student performance on their Learning Management System. Our SCBD design involved a clinical conversation between a student and a single examiner (volunteer clinicians who are staff members at the university or affiliated clinical schools). Without a second rater, we established several measures to try to standardize examiner behaviour. Examiners prepared by completing the mandatory SCBD online training module. This module outlined the aims and process of the SCBD and included a calibration exercise where examiners watched and assessed a simulated student performance (at a borderline and a good level) and compared their ratings with those of an expert panel. Examiners were trained to ‘probe’ students’ reasoning as they articulated it, without prompting them for particular points. The video trigger and clinical information documents were distributed to examiners before the examination. On the day of the examination, examiners reviewed the video trigger as a group to reinforce the online training and to allow for case-specific questions.
All SCBDs were completed in one half-day for more than 300 students: approximately 35 examiners worked across the morning, each examining individually in 1 of about 30 separate classrooms and assessing up to 10 students. Examiners initiated the discussions by asking students to list their differential diagnoses and to justify their reasoning. To guide examiners to appropriately probe students’ reasoning, set questions were included on the mark sheet at the beginning and end of each section, for instance, ‘What further specific information would you like to know about the presenting complaint?’ and ‘How does this information help?’ Examiners provided further clinical information to the students on request. Students were responsible for managing their time, although examiners understood that they could ask for the student’s consent to move forward if they believed that the student had exhausted their current line of reasoning or had satisfactorily justified it. To assist students’ time management, there was a warning bell at 10 minutes and a final bell at 15 minutes.
Examiners marked 4 sections of the SCBD on a 5-point scale (0 = unsatisfactory to 4 = excellent): students’ reasoning about the patient’s history and how this informed their thinking about likely diagnoses; the examination findings requested; the investigations requested and their interpretation; and the proposed patient management. Examiners also rated students’ overall performance on a 5-point global scale (1 = fail to 5 = excellent) for standard-setting purposes. Examiners’ use of these scales was guided by a written rubric which included 4 to 7 descriptors of the characteristics of performance at each level of the scale. Students’ total score (out of 20) was the sum of the section scores, with history double-weighted. Double weighting reflected the high importance of medical history to diagnosis 10 and the expectation that more time would be allocated to this phase. This weighting also approximately reflected the weighting in the third-year curriculum and learning objectives.
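The weighted aggregation described above (4 sections each marked 0-4, history counted twice, total out of 20) can be sketched in a few lines of Python. This is purely an illustration: the section names and example marks below are our own assumptions, not materials from the assessment.

```python
# Sketch of the SCBD total-score aggregation described above.
# Section names and example marks are hypothetical illustrations.
# Each section is marked 0-4; history is double-weighted, so the
# maximum total is 2*4 + 4 + 4 + 4 = 20.

SECTION_WEIGHTS = {
    "history": 2,         # double-weighted, per the scheme above
    "examination": 1,
    "investigations": 1,
    "management": 1,
}

def scbd_total(marks):
    """Weighted sum of section marks (each on the 0-4 scale), out of 20."""
    for section, mark in marks.items():
        if not 0 <= mark <= 4:
            raise ValueError(f"{section} mark {mark} outside the 0-4 scale")
    return sum(SECTION_WEIGHTS[s] * m for s, m in marks.items())

example = {"history": 3, "examination": 2, "investigations": 3, "management": 2}
print(scbd_total(example))  # 2*3 + 2 + 3 + 2 = 13
```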
Students and examiners completed a survey at the end of the assessment period that asked for feedback on their experience of the SCBD related to processes, acceptability, and value as an assessment. These data comprised quantitative Likert-type scale items on the student and examiner surveys as well as open-ended comments on the student survey. The data on the performance of the SCBD and the evaluation data from students and examiners about their experience of it were gathered as part of quality assurance; they did not require institutional ethics approval but were collected under the oversight of the medical course evaluation committee. All students are informed at the beginning of each year that assessment and evaluation data may be used anonymously for the purposes of quality assurance and curriculum development.
Data analysis
The trial and implementation of the SCBD yielded data related to student achievement on the SCBD (a total score out of 20), quantitative evaluative data (Likert-type agreement scales where 1 = strongly disagree and 5 = strongly agree) from students and examiners, and qualitative feedback data from students. Students’ total scores on the SCBD were analysed descriptively using means and standard deviations, together with correlations with other assessment components and the final subject score (the weighted sum of all assessments in the year). Cronbach alpha was calculated as a measure of internal consistency of the 4 subscores. The quantitative evaluative data were analysed descriptively using means and standard deviations. The qualitative data were coded thematically by the second author (K.J.R.), with the themes verified independently by the first author (R.M.S.).
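Cronbach alpha for a set of k subscores follows the standard formula α = k/(k − 1) × (1 − Σ item variances / total-score variance). A minimal sketch of that calculation for 4 subscores, using invented marks rather than the study’s data:

```python
# Minimal sketch of Cronbach alpha for 4 subscores. The score matrix
# below is invented for illustration and is NOT data from the study.
import numpy as np

def cronbach_alpha(scores):
    """scores: rows = students, columns = the k subscores."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)      # variance of each subscore
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 6 students x 4 sections (history, examination,
# investigations, management), each marked on the 0-4 scale.
scores = np.array([
    [4, 3, 3, 4],
    [2, 2, 1, 2],
    [3, 3, 2, 3],
    [1, 2, 1, 1],
    [4, 4, 3, 3],
    [2, 1, 2, 2],
])
print(round(cronbach_alpha(scores), 2))
```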
Results
We gathered information on the SCBD over 2 implementation years to provide detailed information on score distributions, internal consistency of the assessment (using Cronbach alpha), relationships between SCBD scores and other assessment forms (both written and clinical), and acceptability of the assessment for students and examiners.
Table 1 shows that the average raw SCBD scores in the first year were slightly higher than those in the second year, although examiners used the full range of scores in both years. Internal consistency was acceptable on both administrations of the SCBD. Correlations with other assessment formats in the medical course were relatively consistent across years and ranged from negligible (for Mini-Clinical Evaluation Exercise [Mini-CEX]: 0.07-0.11) to weakly positive for all other assessment types, both written (multiple-choice questions [MCQs]: 0.27-0.37, short-answer questions [SAQs]: 0.29-0.35, International Foundations of Medicine [IFOM]: 0.32-0.34) and clinical (OSCEs: 0.32-0.33). Overall, the SCBD had a weak positive relationship with the final subject score (the weighted sum of all assessments in the year) in both years (0.35-0.41). Achievement on the SCBD contributed 10% to students’ final mark at the end of their second year of clinical training. We used the borderline regression standard-setting method 11 to determine the cut score for satisfactory performance. The SCBD cut score was 8 out of 20 for both cases. To support our investment in a standardized clinical assessment format, we sought further information on the degree of examiner variability in scoring across the 2 years of implementation. A between-group analysis of variance (ANOVA) showed that, on average, only 17% of the variation in scores could be attributed to variation in scoring between examiners.
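The borderline regression method regresses each student’s total score on the examiner’s global rating and reads off the predicted total at the borderline grade. A minimal sketch, assuming the borderline point corresponds to grade 2 on the 1-5 global scale and using invented score pairs, not the study’s data:

```python
# Sketch of borderline regression standard setting: fit a least-squares
# line total_score = a + b * global_rating, then take the predicted total
# at the borderline grade as the cut score. All data below are invented;
# taking grade 2 as the borderline point is an assumption.
import numpy as np

def borderline_regression_cut(totals, global_ratings, borderline_grade=2.0):
    """Predicted total score (out of 20) at the borderline global grade."""
    b, a = np.polyfit(global_ratings, totals, 1)  # slope, intercept
    return a + b * borderline_grade

# Hypothetical (global rating 1-5, total score out of 20) pairs.
global_ratings = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 5])
totals         = np.array([4, 7, 9, 11, 12, 13, 15, 16, 18, 19])
print(round(borderline_regression_cut(totals, global_ratings), 1))
```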
Table 1. Descriptive statistics for the SCBD.
Abbreviations: SCBD, Standardized Case-Based Discussion.
Students and examiners evaluated their experience of the SCBD (Table 2). Most students agreed that the discussion with the examiner allowed them to demonstrate their DR. For examiners, 100% in each year agreed that the SCBD provided useful information on students’ DR. Students were particularly positive about the pre-assessment video trigger, believing that they had sufficient time to think about it and that it provided sufficient information. In the first year, students were positive about their time management with the assistance of the warning bell. Students were less positive overall in the second year, particularly on features related to timing. In both years, students were least positive about the online learning resources for preparing for the assessment. Examiner evaluation of the SCBD was very positive; the exceptions were items related to student time management. Examiners were also less sure about their ability to probe students’ thinking and whether the training had prepared them adequately.
Table 2. Mean ratings for the student and examiner evaluation of the Standardized Case-Based Discussion.
Abbreviations: SCBD, Standardized Case-Based Discussion.
Evaluation items measured on a 5-point Likert-type scale where 1 = strongly disagree and 5 = strongly agree.
A total of 179 students (56%) provided additional qualitative feedback on the SCBD in the first implementation year. Of these students, 98 (55%) provided a positive comment related to the SCBD, whereas the remainder provided a negative comment (n = 63, 35%) or a suggestion for improvement (n = 18, 10%) (Table 3). Positive feedback on the SCBD suggested that it was a good assessment format that provided opportunities to demonstrate reasoning and was preferred over other formats such as OSCE. Student concerns about the SCBD related largely to perceived variation between examiners and time management. Students also sought more opportunities for practice and more guidance on the assessment requirements.
Table 3. Percentage of students (n = 179) providing comments on the SCBD.
Abbreviations: OSCE, Objective Structured Clinical Examination; SCBD, Standardized Case-Based Discussion.
Percentages do not add to 100 because some students provided more than 1 comment.
Discussion
Our development and implementation of a new standardized oral assessment proved feasible. Acceptability of the SCBD among examiners and students was high, with both groups believing that this new oral assessment allowed exploration of students’ DR skills. Examiners used the full range of marks for the SCBD, the assessment had acceptable internal consistency, and the associations with other assessment forms were positive but not high, suggesting that the SCBD measures a related, yet novel construct.
In this pilot project, we focused on determining the utility of a single case at the end of the third year of the medical course in assessing students’ DR. We acknowledge that using a single case limits the format’s reliability; however, phased implementation over several years allows for the SCBD to be thoroughly trialled and for a larger bank of cases to be developed for future use. SCBD may eventually comprise several cases in students’ assessment. Although developing the SCBD is relatively resource intensive (regarding staff time and the costs of filming and editing the video trigger), this is balanced against the lower cost of administration (eg, there are no costs for simulated patients in this assessment). The lower reliability associated with a single case in the initial implementation is balanced against its high validity as part of an overall assessment programme. We plan to incorporate a similar assessment into our existing OSCE structure at the end of the second year of medical training, where a primary focus is teaching and assessing basic DR.
In constructing the SCBD, undifferentiated cases were developed to allow a very broad range of possible differential diagnoses. This approach maximized different possible avenues for students to explore and afforded students the opportunity to demonstrate their DR. The SCBD emphasized rewarding students for justifying their reasoning more fully than simply making the final diagnosis (which was rewarded if correct).
Examiner variability is a challenge in all forms of performance assessment, and students had concerns that the SCBD lacked standardization in this area. The SCBD endeavours to minimize this risk with a comprehensive online training package for examiners and a marking rubric clearly articulating student performance levels. Moreover, an analysis of variation in SCBD scores as a function of examiner suggested that a relatively small percentage of the variation in scores could be attributed to examiner variation, indicating reasonable success in our efforts to standardize the assessment. Such findings could be communicated to students to challenge their beliefs that their marks may be overly influenced by individual examiners. To further improve examiner standardization, we are increasing face-to-face training on exploring students’ reasoning and managing transitions between different phases of the discussion. We see value in qualitative research to explore examiner thinking during the assessment, which could include triggered reflection after examiners watch videos of their own performance. Filming a selection of SCBD performances for cross-coding is also an essential next step to affirm the reliability of examiner scoring. We also now have an expanding pool of trained generalist examiners, which should further minimize examiner variability.
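The proportion of score variation attributable to examiners can be estimated from a one-way ANOVA as eta-squared (between-examiner sum of squares divided by total sum of squares). A minimal sketch of that calculation, with invented scores grouped by examiner rather than the study’s data:

```python
# Sketch of estimating the share of total-score variance attributable to
# examiners: one-way ANOVA eta-squared = SS_between / SS_total.
# The per-examiner score groups below are invented for illustration.
import numpy as np

def examiner_variance_share(groups):
    """groups: one array of student total scores per examiner."""
    all_scores = np.concatenate(groups)
    grand_mean = all_scores.mean()
    ss_total = ((all_scores - grand_mean) ** 2).sum()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total

groups = [np.array([10, 12, 11, 13]),   # hypothetical examiner A
          np.array([9, 11, 10, 12]),    # hypothetical examiner B
          np.array([13, 14, 12, 15])]   # hypothetical examiner C
print(round(examiner_variance_share(groups), 2))
```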
In summary, we have developed a highly standardized face-to-face oral assessment allowing in-depth exploration of DR based on an authentic clinical case. It has clear relevance to both examiners and students and will form part of a comprehensive programme of assessment designed to drive the teaching and acquisition of important DR skills for our medical students. This work also has broader implications in medical education, given that oral examinations continue to be used as summative assessments in both initial and specialty medical training, in contexts where oral assessments are valued but face significant scrutiny to justify the rigour and fairness of assessment processes.
Footnotes
Funding:
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests:
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Author’s Note
David Smallwood is now affiliated with Austin Hospital, Heidelberg, VIC, Australia, and Geoffrey J. McColl is now affiliated with the Faculty of Medicine, The University of Queensland, Herston, QLD, Australia.
Author Contributions
RMS, DS, NGC and GJMc devised the innovation and KJR gathered the evaluation data. KJR analysed the assessment and evaluation data and all authors were involved in data interpretation. All authors contributed to the first draft of the manuscript and RMS and KJR revised subsequent drafts in line with authors’ comments. All authors contributed to revisions of the manuscript and approved the final version for submission.
