
Dear Editor,
We read the publication by Amos et al. [1] with surprise and concern. Changing the College's assessment program is complex and requires informed and nuanced conversation. The College has engaged stakeholders through several stakeholder forums (SHFs).
We were, therefore, surprised to see the Journal publish a paper that discusses only the measurement aspects of assessment. That does a disservice to the complexity of the matter and to the College's careful approach, while at the same time publicly casting doubt on the competence of recently 'fellowed' psychiatrists.
The publication is rife with misunderstandings and incorrect assumptions about assessment. For an extensive review of these issues, see Sidhu and Fleming's critical narrative review [2]. In this short letter, we can only indicate the main errors.
For example, the authors confuse predictive validity with construct validity. Predictive validity is not useful in assessment, for reasons identified by Cronbach and Meehl in 1955 [3]; construct validity has been used universally instead [4]. Predictive validity fails because there is no single, measurable gold standard: competence, like 'health', is far too complex to be captured in a single number ('you are 42% healthy').
In the context of assessment, the notion of a false positive or false negative is useful mainly as an illustration, not as an actual calculation. The OSCE data presented at the SHFs were, therefore, about measurement imprecision and were based on actual rather than assumed data.
As a result, the authors confuse the Standard Error of Measurement (SEM) used in the SHF communiqué with a conventional comparison of distribution means. Comparing the means of vastly different assessments (one based on a single measurement, the other on longitudinal assessment and feedback) is not informative and can lead to harmful misconceptions. A more meaningful comparison would have been between the proportions of measurement error of the OSCE and the AAP.
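For readers less familiar with the statistic, the classical test theory definition of the SEM (a standard formulation, not necessarily the exact calculation used in the SHF communiqué) is

$$\mathrm{SEM} = SD_x \sqrt{1 - r_{xx}},$$

where $SD_x$ is the standard deviation of observed scores and $r_{xx}$ is the reliability coefficient of the assessment. The SEM describes the imprecision around an individual candidate's observed score; it is not a statement about differences between the mean scores of two cohorts, which is why the two quantities cannot be compared directly.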
The authors also cite selectively. For example, Prentice et al. published not only a literature review but also a meta-regression of national data showing a large positive effect of early assessment and intervention on learning outcomes [5].
Finally, the authors confuse the learning effects of cramming for a single examination with learning for long-term retention and application in practice. A substantial body of research shows a large difference in favour of the latter [6]. Longitudinal assessment with feedback leads to better learning outcomes.
Simply comparing pass rates is, therefore, comparing apples and oranges.
