Abstract
OBJECTIVE
Clerkship grades are a component of determining a residency candidate's competitiveness. In 2017, the University of Minnesota Medical School's pediatric clerkship transitioned its standardized multiple-choice exam, the Aquifer Pediatrics Examination, to pass/fail, with eligibility for honors determined by clinical performance rather than exam performance. We assessed the effect of this change on Aquifer exam performance and evaluated the correlation between Aquifer exam performance and clinical evaluation scores to gain insight into the validity of each type of assessment relative to the other.
METHODS
We analyzed de-identified data from 750 medical students from the 2016 to 2017 through the 2019 to 2020 academic years. Individual Aquifer exam scores were compared to individual clinical performance scores. Differences in exam performance before and after the transition to pass/fail were assessed with a two-sample t-test, with Cohen's d used to characterize effect size.
RESULTS
No correlation was found between Aquifer exam scores and clinical performance scores. The mean Aquifer exam score prior to the transition to pass/fail was 80.02 ± 7.51 while the mean after the exam was made pass/fail was 77.8 ± 7.42. This difference was statistically significant (P < .001) with a Cohen's d (effect size) of 0.297.
CONCLUSIONS
A lack of correlation between the Aquifer exam scores and clinical performance scores was found. There was a small yet statistically significant decrease in Aquifer exam scores after the change to pass/fail; it is not clear if this represents a meaningful decrease in learning by students.
Introduction
Student performance during the clinical phase of medical school is primarily gauged through clerkship grades. The clerkship evaluations a student receives have implications for residency choice and obtaining a residency position, as well as for selection for awards and honor societies.1 As preclinical exams and the United States Medical Licensing Examination (USMLE) Step 1 transition to pass/fail, increasing attention is being paid to how accurately clerkship grades measure student performance, both to ensure proper attainment of competency and to inform the residency selection process. Students often feel inaccurately assessed by subjective clinical performance evaluations, and research suggests grading policies yield inconsistent—and inequitable—results.2 In addition, medical schools (and clerkships within medical schools) often have dramatically different proportions of students who receive honors grades.3 For example, at the University of Minnesota Medical School (UMMS), different clerkships have varying grade distributions. The variability among and within medical schools is likely due, at least in part, to differences in grading rubrics; evaluator bias; socioeconomic, racial, and gender disparities; and institutional culture. In a study of internal medicine clerkship evaluations, there was greater variance in scores between evaluators than in scores between students.4 This implies that variance in clinical performance scores depends more heavily on the evaluator than on true student performance.
The correlation between objective and subjective measures in medical education is inconsistent. At some schools, there is a high correlation between exam scores and evaluator feedback.5 However, other institutions have found a low correlation, suggesting subjective and objective measures may not address the same outcomes of education6—or that one or both measures are inaccurate. In several recent studies, subjective evaluator feedback scores varied based on gender and whether students belonged to a group underrepresented in medicine. Non-white and underrepresented students had lower subjective grades, shelf exam (NBME subject examination) scores, and clerkship grades, and were half as likely to receive honors when compared to white peers.7,8 This difference in honors distribution has strengthened the argument for pass/fail clerkship grades as a way to decrease bias in grading and potentially increase diversity in competitive fields during the residency application process. A potential detriment to pass/fail grading is the loss of incentive to study, with a subsequent decrease in learning and worsening of academic performance. Previous studies assessing changes in academic performance after a transition to pass/fail grading have found no impact on USMLE Step 1 scores.9,10 The effect on preclinical exams has been more mixed, ranging from no effect to a small but statistically significant worsening in exam performance.9–12
Given this context, the University of Minnesota pediatrics clerkship modified its grading policies. The clerkship grade is a combination of clinical performance scores (CPSs) and the Aquifer Pediatrics Examination. From the 2016 to 2017 academic year through the 2019 to 2020 academic year, the grading policies gradually shifted to decrease the impact the Aquifer exam had on the ability to obtain honors. In order to assess the potential effects of de-emphasizing the Aquifer exam and to better understand what standardized exam scores and CPSs measure in terms of student ability, we investigated if performance on the Aquifer exam correlates with CPSs. Additionally, we assessed student performance on the Aquifer exam when it was a component of the overall grade and when it transitioned to pass/fail.
Methods
This is a retrospective cross-sectional study evaluating the impact of changes made to grading in the pediatric clerkship at the University of Minnesota Medical School, a public medical school with 2 preclinical campuses and 1 clinical campus that graduates approximately 220 students per year. The pediatrics clerkship is a 4-week experience. Students complete the clerkship at 1 of 5 participating sites in Minneapolis, St. Paul, or Duluth, Minnesota. The clerkship is designed to provide an introduction to inpatient pediatric medicine, with all 4 weeks of the rotation taking place in the inpatient setting. A typical team includes the student, one or more pediatric interns, interns from other specialties such as family medicine or psychiatry, a supervising pediatric resident, and a supervising pediatric hospitalist attending.
Expectations of medical students on the pediatric clerkship include comprehensive history taking, formulation of treatment plans, investigation of clinical scenarios and application of learned knowledge, effective communication, and consistent demonstration of professionalism. After their time with a student, supervising residents and supervising pediatric hospitalists assess the student on these and other clinical competencies, assigning each competency a numerical score from 1 to 4 anchored to predescribed levels of performance. The scores given to each student are then averaged to create the CPS. The method of creating the CPS was not validated or pilot-tested prior to implementation at the University of Minnesota Medical School.
At the end of the rotation, each student takes the Aquifer Pediatrics Exam. Exam scores and CPSs are then used to determine final grades. The Aquifer Pediatrics Exam is a validated assessment of the pediatric knowledge expected of a third-year medical student. It is administered at the end of the pediatrics clerkship, filling the role of the traditional NBME shelf exam used at many institutions, which UMMS does not use for the pediatrics clerkship. In the 2016 to 2017 academic year, Aquifer exam scores accounted for 30% of the total grade, with a minimum threshold for honors eligibility of 80% correct on the exam, or the 50th percentile (Table 1). CPSs accounted for the other 70%. Cutoffs for honors were determined by clinical site, aiming for 25% of students to receive honors. In the 2017 to 2018 and 2018 to 2019 academic years, students needed to pass the Aquifer exam, defined as scoring ≥ 60%, but the score did not contribute to the overall grade. Students with a CPS ≥ 3.3/4 needed at least an 80% on the Aquifer exam to earn honors, while students with a CPS ≥ 3.5/4 needed only a passing score. In the 2019 to 2020 academic year, students still needed to pass the exam (again defined as a score ≥ 60%), but the score was completely removed from grading calculations. Honors were based entirely on clinical performance: a CPS ≥ 3.3/4 qualified for honors as long as the student passed the Aquifer exam.
Aquifer exam scores, clinical performance scores (CPSs), and percent of students receiving honors by academic year.
For the 2016 to 2017 AY, the Aquifer exam score accounted for 30% of the grade, with a minimum threshold for honors of 80, or the 50th percentile. The CPS accounted for the other 70%. Cutoffs for honors were normed by site, aiming for 25% of students to receive honors.
For the 2017 to 2018 and 2018 to 2019 AYs, to qualify for honors, students needed either a ≥ 3.5 CPS and ≥ 60 Aquifer exam score or a ≥ 3.3 CPS and ≥ 80 Aquifer exam score.
For the 2019 to 2020 AY, to qualify for honors, students needed a ≥ 3.3 CPS and ≥ 60 Aquifer exam score.
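The eligibility rules in Table 1 can be summarized as a small decision function. The sketch below is an illustrative encoding of the stated thresholds only (the function name is hypothetical, and the site-normed composite grading of 2016 to 2017 is simplified to its minimum exam threshold), not part of the clerkship's actual tooling:

```python
def honors_eligible(academic_year: str, cps: float, aquifer: float) -> bool:
    """Illustrative encoding of the honors-eligibility thresholds in Table 1.

    cps:     mean clinical performance score on a 1-4 scale
    aquifer: Aquifer exam percent correct

    Note: for 2016-2017 the exam was 30% of a site-normed composite grade,
    so this branch captures only the minimum exam threshold for honors.
    """
    if academic_year == "2016-2017":
        # Exam >= 80 was required; the composite score then decided honors.
        return aquifer >= 80
    if academic_year in ("2017-2018", "2018-2019"):
        # Either a high CPS with a passing exam, or a lower CPS with >= 80.
        return (cps >= 3.5 and aquifer >= 60) or (cps >= 3.3 and aquifer >= 80)
    if academic_year == "2019-2020":
        # Honors based on clinical performance; the exam only needs a pass.
        return cps >= 3.3 and aquifer >= 60
    raise ValueError(f"unknown academic year: {academic_year}")
```

This makes the policy trend explicit: across the three periods, the exam score's role shrinks from an honors cutoff to a simple pass requirement.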
Statistical analysis
We gathered CPSs, Aquifer exam scores, and clerkship grades for the 2016 to 2020 academic years. All data were de-identified prior to analysis. The difference in mean Aquifer exam score before and after the pass/fail change was assessed using the Student's t-test, and the effect size was characterized using Cohen's d. Linear regression analysis was performed to assess the correlation between individual Aquifer exam scores and CPSs. Since the 2 measures use different scales, they were standardized by comparing an individual's ranking on one measure to their ranking on the other. Data analysis was performed using Python 3.7.
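The analysis pipeline described above can be sketched in a few lines of Python. This is a minimal illustration on synthetic score lists, not the study's actual code or data; the exact library calls used by the authors are not reported, so standard SciPy routines are assumed:

```python
import numpy as np
from scipy import stats

# Synthetic Aquifer percent-correct scores before/after the pass/fail change
before = np.array([84.0, 79.5, 88.0, 76.0, 81.5, 83.0, 78.5, 86.0])
after = np.array([78.0, 74.5, 82.0, 71.0, 80.5, 76.0, 73.5, 79.0])

# Two-sample Student's t-test (pooled variance)
t_stat, p_value = stats.ttest_ind(before, after, equal_var=True)

# Cohen's d using the pooled standard deviation
n1, n2 = len(before), len(after)
sp = np.sqrt(((n1 - 1) * before.var(ddof=1) + (n2 - 1) * after.var(ddof=1))
             / (n1 + n2 - 2))
cohens_d = (before.mean() - after.mean()) / sp

# Rank-standardization: convert each measure to percentile ranks so the
# 1-4 CPS scale and the percent-correct exam scale are directly comparable
cps = np.array([3.1, 3.4, 2.9, 3.6, 3.2, 3.0, 3.5, 3.3])
aquifer = np.array([81.0, 77.0, 85.0, 72.0, 80.0, 79.0, 74.0, 83.0])
cps_rank = stats.rankdata(cps) / len(cps)
aquifer_rank = stats.rankdata(aquifer) / len(aquifer)

# Linear regression on the rank-standardized measures
slope, intercept, r, p, se = stats.linregress(aquifer_rank, cps_rank)
```

The rank step matters because a raw regression of a 1-4 scale against a 0-100 scale would conflate scale differences with the strength of association.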
Institutional Review Board approval
The study was determined by the University of Minnesota Institutional Review Board (IRB) to be exempt from IRB review and the need for informed consent as the study did not meet the US Department of Health and Human Services Code of Federal Regulations, 45 CFR 46, definition of human subject research.
Results
Aquifer exam scores were compared to CPSs in UMMS students from 2016 to 2020. No correlation was found between Aquifer scores and CPSs (Figure 1). Similarly, no correlation was found between the 2 when they were standardized to represent an individual's rank for each variable (Figure 2). Aquifer exam scores were evaluated in a total of 750 medical students from the 2016 to 2017 academic year through the 2019 to 2020 academic year. Of the 750, 213 (28.4%) students took the Aquifer exam when it accounted for 30% of the grade, and 537 (71.6%) took the exam when it was pass/fail (with varying effect on the ability to receive honors). The mean Aquifer exam score (percentage correct) when it accounted for 30% of the grade was 80.02 ± 7.51 while the mean after the exam was made pass/fail was 77.8 ± 7.42 (P < .001) with Cohen's d (effect size) of 0.297 (95% CI: 0.138-0.457).
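The reported effect size and confidence interval can be reproduced from the summary statistics alone. The sketch below recomputes the pooled SD, Cohen's d, and an approximate 95% CI; the normal-approximation standard error for d is an assumption here, since the study's exact CI method is not stated, so small rounding differences are expected:

```python
import math
from statistics import NormalDist

# Summary statistics reported in the Results section
n1, m1, s1 = 213, 80.02, 7.51   # exam counted toward grade
n2, m2, s2 = 537, 77.80, 7.42   # exam pass/fail

# Pooled standard deviation and Cohen's d
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
d = (m1 - m2) / sp                                   # ~0.298

# Two-sample t statistic; with 748 df the normal approximation is adequate
t = (m1 - m2) / (sp * math.sqrt(1 / n1 + 1 / n2))    # ~3.68
p = 2 * (1 - NormalDist().cdf(t))                    # < .001

# Approximate 95% CI for d (normal-approximation standard error)
se_d = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
ci = (d - 1.96 * se_d, d + 1.96 * se_d)              # ~(0.139, 0.457)
```

Both the effect size and the confidence interval recomputed this way agree with the reported values to within rounding.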

Individual Aquifer exam scores compared to clinical performance scores, shown with the original scoring scales.

Individuals' percentile rankings on the Aquifer exam compared to their percentile rankings on clinical performance scores.
CPSs were evaluated in 656 medical students from the 2016 to 2017 academic year through the 2019 to 2020 academic year. One clinical site during the 2016 to 2017 academic year used a different clinical grading scale, so clinical scores from that site were not included in this analysis. The mean CPS for the entire study population (N = 656) was 3.14 ± 0.41. The mean CPS for each academic year, 2016 to 2017 through 2019 to 2020, is shown in Table 2. Regarding overall grades, of the 656 students for whom Aquifer and CPS data were gathered, 239 (36.4%) received honors. The proportion of students receiving honors for each individual academic year is shown in Table 2.
Mean exam scores, clinical performance scores, and the proportion of students who received honors in each academic year.
For the 2016 to 2017 AY, the Aquifer exam score accounted for 30% of the grade, with a minimum threshold for honors of 80, or the 50th percentile. The CPS accounted for the other 70%. Cutoffs for honors were normed by site, aiming for 25% of students to receive honors.
For the 2017 to 2018 and 2018 to 2019 AYs, to qualify for honors, students needed either a ≥ 3.5 CPS and ≥ 60 Aquifer exam score or a ≥ 3.3 CPS and ≥ 80 Aquifer exam score.
For the 2019 to 2020 AY, to qualify for honors, students needed a ≥ 3.3 CPS and ≥ 60 Aquifer exam score.
Discussion
This study retrospectively assessed Aquifer exam scores and CPSs in medical students in the pediatrics clerkship at the University of Minnesota Medical School to evaluate for correlation between scores on standardized exams and subjective clinical performance evaluations. No correlation between the 2 scores was found. Our study also evaluated potential changes in exam performance and grading distribution when standardized exams are made pass/fail and the ability to attain honors is more dependent on the evaluation of clinical performance. A small, statistically significant decrease in Aquifer exam performance was found when the exam was made pass/fail.
Given the lack of any noticeable correlation, we hypothesize that Aquifer scores and clinical evaluations measure different aspects of medical education, as theorized in previous studies.5 One potential explanation is that Aquifer exam scores measure content knowledge, whereas the subjective evaluations aim to measure the application of this knowledge, clinical skills, and other key competencies demonstrated during clinical education. As discussed, subjective clinical evaluations could exhibit more bias, another factor that would weaken any correlation; however, standardized tests have also been shown to recapitulate bias, given differential performance between student groups.13–16
Regarding student performance on the Aquifer exam, a small but statistically significant decrease in the average exam score (from 80.02% to 77.8%, P < .001) was noted after the change to a pass/fail exam. However, effect size analysis yielded a Cohen's d of 0.297, suggesting a small effect that may not be educationally meaningful. Given the relatively small change in mean exam score (2.2 percentage points), we conclude that the change to a pass/fail exam likely did not result in a meaningful decrease in the knowledge students acquired in the pediatric clerkship. These findings are consistent with the overall body of literature showing that pass/fail grading systems effectively measure competency and do not significantly degrade student performance.9–12 The minimal changes in grading distribution also argue against concerns that pass/fail curricula leave students less prepared, as students were likely as prepared as in previous years. Given that students experience less stress under pass/fail examinations,17 this policy could benefit student mental health during the clerkship years by potentially decreasing rates of burnout, depression, anxiety, and substance use among medical students.18,19 Additional research is needed to better understand the actual effects of a pass/fail grading system, particularly now that Step 1 is pass/fail, which has potential implications for the well-being of medical students and the residency matching process.20,21
This study has multiple limitations. It was a single-center study involving only University of Minnesota medical students. It is difficult to generalize about the correlation between clinical assessments and multiple-choice exams based on the results of one educational institution, as policies and norms for clinical evaluations vary from institution to institution. Similarly, while no correlation was found between Aquifer exam scores and clinical evaluations, the generalizability of this finding is not known; to confidently state that multiple-choice exam scores do not correlate with clinical evaluation scores, many different exams and students at numerous institutions would need to be analyzed. Additionally, this study reflects a period of change, and the sustained effects of the pass/fail policy on exam scores are not known. Finally, no validation study or pilot testing was done on the CPS, which limits interpretation of our results, as it is not clear how accurately the CPS reflects true clinical performance.
Conclusion
This study demonstrates no correlation between Aquifer exam scores and clinical evaluation scores. There was a small, statistically significant decrease in exam scores after a change to pass/fail grading that likely translates to minimal meaningful impact on educational attainment. Further studies should examine these trends in other clerkships and institutions as well as identify other intended and unintended consequences of changes to grading systems.
Supplemental Material
sj-docx-1-mde-10.1177_23821205231212771 - Supplemental material for The Effect of Pass/Fail Exam Grading on Exam Performance in a Pediatric Clerkship by Madison E Kahle, Kayla M Hamann, Aliya A Sakher, Spencer R Goble, Katherine Murray, Yeng M Miller-Chang and Andrew PJ Olson in Journal of Medical Education and Curricular Development
sj-pdf-2-mde-10.1177_23821205231212771 - Supplemental material for the same article.
Footnotes
Acknowledgments
The authors acknowledge the pediatric clerkship staff who developed the grading policies examined in this work.
Authors' Contributions
MK, KH, AS, SG, and KM designed the study. YM collected, analyzed, and interpreted the data. MK, KH, AS, and SG drafted the initial manuscript. SG, KM, and AO provided content editing for the manuscript. AO supervised the project.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by an Academic Education Investment Program grant at the University of Minnesota Medical School.
Declaration of conflicting interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Dr Olson previously received honoraria to serve as Senior Director of the Aquifer Diagnostic Excellence course. Aquifer had no role in the design, conduct, or analysis of this study. All other authors declare no conflicts of interest.
References