Abstract
OBJECTIVES
This study aimed to evaluate the impact of formative assessment with case-based constructed-response question (CRQ) formats on student performance on the final summative assessment in the second-year periodontics course.
METHODS
Classroom quizzes with case-based CRQs were implemented as the formative assessment during the course. Each student received feedback on their responses from the course director. After all students (N = 128) took the second-year final examination, the Friedman test was conducted to compare student performances on each assessment over time. A multiple linear regression (MLR) model was used to evaluate the association between the second-year final examination score and plausible predictors: student gender, the second-year formative and midterm examination scores, and time spent on the final examination.
RESULTS
The mean % scores on the formative assessment (51) and the midterm examination (84) were significantly lower than that on the final examination (87).
CONCLUSION
Within the limitations of the study, student performance on case-based CRQs was correlated with student performance on the summative assessment.
Introduction
While summative assessment at the end of a course is critical for measuring student learning,1 predicting student performance on summative assessment is challenging because various known or unknown factors may affect it.2–4 Identifying deficiencies that significantly affect student performance on summative assessment would help improve student academic achievement. Low-stakes formative assessment may therefore be useful for collecting information and feedback from students about the ongoing teaching and learning context5 and for predicting student performance on high-stakes summative assessment.6,7 However, the validity of scores from such low-stakes assessments is potentially susceptible to examinees’ effort, because students may not take ungraded work seriously.8 Providing high-quality feedback following formative assessment is also challenging, especially for large classes.9
Classroom quizzes with multiple-choice questions (MCQs) can be implemented during a course as formative assessment prior to the midterm and final examinations. While MCQ tests can be graded quickly and accurately, one of the most common critiques of MCQs is their inability to assess in-depth comprehension of the subject matter.10 In fact, most students performed well on the MCQ quizzes in the second-year periodontics course, so student performance on the quizzes provided no estimate of their performance on the summative assessment. In contrast, tests with constructed-response questions (CRQs) are considered superior for measuring critical reasoning skills.10 While MCQs are a closed format, in which students select the correct answer, CRQs are open-ended formats in which students must provide their own answers. CRQs demand different skills from students, ranging from factual recall for fill-in-the-blank or short-answer questions (SAQs) to integrating multiple facts into a rational context for essay-type questions.11
The most valuable aspect of CRQs may be their capability to evaluate students’ problem-solving skills by presenting, via case-based formats, real-life or similar situations that students may encounter in the clinic.10 Case-based questions are based on condition and/or disease scenarios.12 Since the integrated National Board Dental Examination intends to test student problem-solving skills on various cases,13 case-based CRQs have been incorporated into the preclinical second-year periodontics course to give students practice in solving simulated cases. However, students have shown poor performance on case-based CRQs on the second-year periodontics examinations.3 Their poor performance on the CRQs negatively affected their scores on the final examinations, which included MCQs, SAQs, and essay-type questions. The second-year final examination scores had been considerably lower than the first-year final examination scores within the classes. Therefore, two classroom quizzes with case-based CRQs were implemented as the formative assessment in preparation for the second-year final examination.
The purpose of this interventional educational action research was to evaluate the impact of formative assessment with case-based CRQ formats on student performance on the final summative assessment in the second-year periodontics course. The null hypothesis was that there is no association between student performance on the implemented CRQ formative assessment and on the final examination.
Methods
Participants and study outline
This study was conducted under the deemed exemption from the Institutional Review Board (IRB) at the University of Maryland, Baltimore (HP-00082574). The study was conducted with one cohort via a retrospective longitudinal assessment from September 2022 to May 2023. The second-year periodontics course is a year-long course spanning the combined fall/spring semesters. The fall semester included 10 lectures and the midterm examination, and the spring semester included 13 lectures and the course final examination. The course final examination is cumulative, covering all 23 lectures in the course. Faculty delivered the lectures in the classroom; the classroom lectures were recorded and posted on the online learning management system14 after each lecture. All formative and summative assessments in the course were delivered via an online assessment platform15 in the classroom under the surveillance of a faculty proctor. A score below 70% on any summative assessment was recorded as a failure.
The first classroom quiz, with four case-based CRQs, was administered 4 weeks before the midterm examination; the second, with two case-based CRQs, was administered 6 weeks before the final examination. The number of CRQs and the timing of administration were determined by the lecture contents and the course lecture and examination schedules; because of spring break, the interval between the second formative assessment and the final examination extended to 6 weeks. A weight of 3% was assigned to the formative assessment results in calculating the final course grade.
Case-based CRQs covered the following topics: diagnosis and etiology for a gingivitis case, local contributing and risk factors, and stage and grade for a periodontitis case. These topics are essential for making a diagnosis and for formulating appropriate treatment plans for patients. All case-based CRQs were written by the course director (SO) with another faculty member (DJ). Figure 1 presents examples of case-based CRQs. Three faculty members (SO, DJ, and GS) reviewed the questions and agreed on the correct answers. The grading of the CRQs in the quizzes was divided among the three faculty members (SO, DJ, and GS) to lessen the burden of grading CRQs for the large class. Using the scoring rubric for CRQs (Table 1), each faculty member graded the same assigned questions for all 128 students to minimize potential variation. After grading, each student received feedback on their answers from the course director via email. Table 2 presents examples of student responses and feedback from the course director. The CRQs in the quizzes were also reviewed during the subsequent classroom lectures with the best exemplars from the class.

Examples of case-based CRQs in formative assessment.
A scoring rubric.
Examples of student answers and feedback from the course director.
The course director (SO) and the course co-director (GS) wrote all the midterm and final examination questions, reviewed the questions, and agreed on the correct answers. The midterm examination contained 24 MCQs and 12 SAQs. The final examination was cumulative, covering all 23 lectures, and contained 45 MCQs, 7 SAQs, and 6 essay-type CRQs. The students had the opportunity to write comments on the final examination after they submitted their answers. Grading of the SAQs and CRQs in the midterm and final examinations was conducted only by the course director (SO) using the same scoring rubric (Table 1). Item analyses were conducted for the midterm and final examinations to ensure that each question’s difficulty index was over 0.6 and its discrimination power was ≥0.2. One MCQ each was dropped from the midterm and final examinations because its item difficulty index was 0.2.
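The item-analysis thresholds above (difficulty index over 0.6, discrimination power ≥0.2) can be reproduced from a binary response matrix. Below is a minimal sketch in Python, assuming responses are coded 0/1 per student per item; the upper/lower 27% group method for discrimination is one common convention and not necessarily the one used by the online assessment platform:

```python
import numpy as np

def item_analysis(responses: np.ndarray, group_frac: float = 0.27):
    """responses: (n_students, n_items) matrix of 0/1 item scores."""
    n_students, n_items = responses.shape
    difficulty = responses.mean(axis=0)  # proportion answering each item correctly

    # Upper/lower group discrimination: rank students by total score and
    # compare correct rates in the top vs bottom fraction of the class.
    totals = responses.sum(axis=1)
    order = np.argsort(totals)
    k = max(1, int(round(group_frac * n_students)))
    lower, upper = responses[order[:k]], responses[order[-k:]]
    discrimination = upper.mean(axis=0) - lower.mean(axis=0)

    # Flag items that miss either threshold used in the course.
    flagged = (difficulty <= 0.6) | (discrimination < 0.2)
    return difficulty, discrimination, flagged
```

With this convention, an item answered correctly by only 20% of students would be flagged for review, as with the dropped MCQs described above.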
Statistical analysis
Given the nature of this coursework, we included the entire class. Because a sample size of 128 is robust for various analyses, especially in educational settings, a formal sample size calculation was not performed.
The scores from the two quizzes were combined into a single formative assessment score for statistical analysis because the topics covered by the formative assessments came from the fall-semester lectures. Student performances on each assessment were compared within the class using the Friedman test because the distributions of % scores did not pass the Kolmogorov–Smirnov and Shapiro–Wilk normality tests. Multiple comparisons were conducted with Dunn’s test. The numbers of students who failed the second-year midterm and final examinations were compared with a chi-square test.
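The testing sequence above can be sketched as follows. This is an illustration using SciPy with simulated scores (the study data are not public); Dunn’s post hoc test is not in SciPy (it is available in the third-party scikit-posthocs package) and is omitted here:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 128  # class size

# Simulated matched % scores, loosely mimicking the reported means/SDs:
# formative 51 +/- 19, midterm 84 +/- 10, final 87 +/- 9.
formative = np.clip(rng.normal(51, 19, n), 0, 100)
midterm = np.clip(rng.normal(84, 10, n), 0, 100)
final = np.clip(rng.normal(87, 9, n), 0, 100)

# Normality screening (as in the paper) before choosing a nonparametric test.
all_normal = all(stats.shapiro(s).pvalue > 0.05
                 for s in (formative, midterm, final))

# Friedman test: nonparametric comparison of the three matched assessments.
stat, p = stats.friedmanchisquare(formative, midterm, final)

# Chi-square comparison of failure counts, using the counts reported in the
# Results (16 of 128 failed the midterm, 6 of 128 failed the final).
table = np.array([[16, 112],   # midterm: fail, pass
                  [6, 122]])   # final:   fail, pass
chi2, p_fail, dof, _ = stats.chi2_contingency(table)
```

With the reported failure counts, the chi-square test rejects at the 5% level, consistent with the significant reduction described in the Results.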
Descriptive statistics were used to identify deficient areas from the formative assessment. Gender, the second-year midterm and formative assessment scores, and time spent taking the second-year final examination were selected as predictors. A multiple linear regression (MLR) model was built to investigate the associations between the four predictors and the second-year final examination score. All analyses were performed by tracking individual students on all variables (matched groups). The analyses were performed with GraphPad Prism (version 9.4.1; GraphPad Software, Inc., CA).
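The regression step can be illustrated with an ordinary least-squares fit. The sketch below uses synthetic data in place of the study records, and NumPy rather than GraphPad Prism; the generating model deliberately gives gender and examination time no effect, mirroring the reported result:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 128

# Synthetic predictors (not the study data): gender dummy (0/1), midterm %,
# formative %, and minutes spent on the final examination.
gender = rng.integers(0, 2, n).astype(float)
midterm = np.clip(rng.normal(84, 10, n), 0, 100)
formative = np.clip(rng.normal(51, 19, n), 0, 100)
minutes = rng.normal(90, 15, n)

# Generate a final score depending on midterm and formative scores only.
final = 40 + 0.4 * midterm + 0.25 * formative + rng.normal(0, 5, n)

# Design matrix with an intercept column; solve OLS by least squares.
X = np.column_stack([np.ones(n), gender, midterm, formative, minutes])
beta, *_ = np.linalg.lstsq(X, final, rcond=None)

# Coefficient of determination (R^2) of the fit.
resid = final - X @ beta
r2 = 1 - resid.var() / final.var()
```

In practice, a statistics package such as statsmodels (or Prism itself) would additionally report per-coefficient p-values, which is the information Table 4 summarizes.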
Results
The mean final examination score of this cohort in the first-year periodontics course was 86 ± 6. The study class included 74 (57.4%) female and 55 (42.6%) male students. The mean % examination scores with standard deviations were 84 ± 10 (second-year midterm examination), 51 ± 19 (formative assessment), and 87 ± 9 (second-year final examination), showing significant variation over time (Friedman test).

Comparisons of student % scores in each assessment.
Sixteen students (13%) failed the second-year midterm examination, while six students (5%) failed the second-year final examination. The number of students who failed the final examination was significantly lower than the number who failed the midterm examination (chi-square test).
Table 3 presents student performance on each question in the formative assessment. Less than 50% of the second-year students answered correctly on four of the six questions: diagnosis of gingivitis (6.2%), defining and recognizing risk factors (32.8%) and local contributing factors (24.2%), and assigning a grade for a periodontitis case (6.2%). Seventy-eight students (61%) assigned the correct stage for a periodontitis case, and 99 students (77.3%) correctly answered the etiology of the periodontal disease. The correct-answer rates for assigning stage (93%) and grade (73.6%) for a different periodontitis case on the final examination were improved.
Student performances in the formative assessment with CRQs.
Ten students left comments after the final examination. Students thought the second-year final examination was “difficult” and “tricky,” although a few students who left such comments scored above 90%. One student commented that it can be very challenging to view radiographs and determine the exact percentage of bone loss in an examination room with all the lights on. Another student stated that too many questions related to prognosis were included in the examination; this student answered “fair” or “questionable” to the classification questions, whose expected answers were Stage II or Grade B.
The MLR model was used to evaluate the associations between the selected predictors and the final examination score (Table 4). Based on the MLR model, the second-year midterm and formative assessment scores were significantly associated with the final examination score, whereas student gender and time spent on the examination were not.
MLR model.
Discussion
This study examined the correlation between student performances on the formative assessment and the final examination in the second-year preclinical periodontics course. The regression model (Table 4) demonstrated that the case-based CRQ formative assessment score significantly affected student performance on the summative assessment.
Periodontics primarily deals with the prevention, diagnosis, and treatment of periodontal diseases. Dental students must attain sufficient knowledge in periodontics during preclinical education. While assessment in dental education embraces a wide range of concepts, preclinical assessment should include a diagnostic tool to identify learning deficiencies among students. 16 Therefore, formative assessments with case-based CRQs were administered as assessment for learning. 5
Attention to ungraded low-stakes assessment has increased in medical education.17 However, students may not give their best effort on such tests, especially when no grade is assigned.17 Therefore, a low weight (3% of the final course grade) was assigned to the formative assessment in this study to provide an incentive for students to perform well.
The results of the formative assessment revealed that the most difficult tasks for the students were diagnosing gingivitis and determining a grade for a periodontitis case (Table 3). This may be due to clinical reasoning errors in interpreting clinical data.18 Most students seemed not to grasp the meaning of the probing pocket depth; consequently, they used pocket depths as the primary diagnostic parameter. This also indicates that students had difficulty evaluating radiographic images with respect to the normal crestal bone level. Students knew and could describe the basic information, but they could not apply their knowledge to solve clinical problems. Students paid more attention to medical history than to dental clinical data because they confused the etiologic, risk, and local contributing factors for periodontal disease,3,19 which led to their poor performance in assigning the grade for a periodontitis case. After reviewing students’ responses, the course director identified these deficiencies and tried to provide constructive feedback rather than simple answers (Table 2).
Reviewing the results of the formative assessments influenced the following lecture contents and teaching focuses. Lecturers modified subsequent lectures to review the concepts which the students were struggling to understand. This adaptation allowed for a more tailored approach, aiming to enhance student comprehension.
Feedback is the information provided by instructors regarding aspects of student's understanding or performance. 20 While either negative or positive feedback can be delivered, feedback is supposed to be constructive. 21 Constructive feedback should be clear, tailored, factual, and descriptive based on the direct observation, provide a reasonable amount of information, and be delivered in a timely manner. 22 The course director tried to provide feedback to individual students via online (emails) and to the entire class via offline (classroom lecture) platforms. 23
While all 128 students received feedback and the correct-answer rates for assigning stage and grade improved on the final examination, it was not clear how students utilized the feedback to enhance their performance on the summative assessment. Students’ attitudes toward learning, as the recipients of feedback, affect the effectiveness of constructive feedback.24 While deep learners are motivated to learn and enhance their performance, strategic learners may adopt tactics aimed at academic success.25 Furthermore, surface learners focus on the superficial features of the subject and may skip any content that they think is unrelated to their final goal.25 Surface learners are often motivated by extrinsic factors, and their engagement and commitment are not solid. Educational tactics to engage surface learners with feedback should be further explored.
Student performances on examinations are influenced by various factors, controllable or not.26 Uncontrollable factors include students’ health and personal issues, personal learning strategies, levels of test anxiety, and cultural or socioeconomic backgrounds.26 This study maintained the same examination classroom environment and the same online assessment format. Student gender and time spent on the examination did not affect the final examination outcome (Table 4). Four students who passed the midterm examination with scores between 79 and 90 failed the final examination. While students’ personal factors, such as family problems, overconfidence or lack of confidence, or use of feedback, may have affected their performance on the final examination, this study could not evaluate those factors.
External validity of the study findings is limited by several factors. First, no formal sample size calculation was performed. The decision to include the entire class was based on practical constraints and the context of educational research; the absence of a predetermined sample size calculation may raise questions about the generalizability or statistical power of the findings, although including the entire class enabled us to collect comprehensive data. Second, all students received the same experience, with no control group. Although the MLR model tracked individual students to evaluate the correlation between the formative and summative assessments, we cannot know whether the students would have performed more poorly without the CRQ formative assessments. Third, this study did not obtain students’ perceptions of the case-based CRQ formative assessment and the course director’s feedback. As students’ perceptions may affect their motivation and learning behavior,27 obtaining them may help clarify how students utilize formative assessment and feedback.28 Finally, preclinical periodontics courses heavily focus on delivering and testing explicit knowledge as “know what,” although there is a component of manual skill development via simulation-based learning.29 Therefore, the formative assessment format in this study may not be applicable to disciplines in which attaining manual dexterity and testing manual skill levels are a major part of the course.30,31
Conclusions
Implementing case-based CRQs as a low-stakes formative assessment assisted the course director in identifying areas of deficiency in ongoing teaching. Within the limitations of the study, student performance on the case-based CRQs was correlated with that on the summative assessment. Further study is needed to include more cohorts and to determine specific effects.
Acknowledgements
The authors would like to thank Dr Gary Swiec for his help in the second-year periodontics course as a co-director. The authors would like to thank Dr Man-Kyo Chung for his critical reading and insightful review.
Authors’ contributions
DJ participated in formative assessment and in writing the manuscript. SO designed the study, conducted the study, collected data, performed statistical analysis, and participated in writing the manuscript. All authors have read and approved the manuscript, are aware of this submission, and agree with its publication.
Data availability statement
The datasets used and analyzed in this study are available from the corresponding author upon reasonable request.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research and authorship of this article. The Department of Advanced Oral Sciences and Therapeutics partially supported the publication of this article.
Ethics approval
This study was conducted under the deemed exemption from the Institutional Review Board (IRB) at the University of Maryland, Baltimore (HP-00082574). All methods were conducted in accordance with relevant guidelines and regulations.
