Abstract
Online quizzes are an economical and objective method for formative assessment in universities. However, closed questions have been criticized for promoting shallow learning and often resulting in poor learning outcomes. These disadvantages can be overcome by embedding closed questions in effective instructional designs involving feedback. In the present field study, a final sample of N = 496 students completed the same online quiz, consisting of 60 true–false statements on the biological bases of psychology, in two sessions. To enhance the benefit of formative testing on students’ test achievement in Session 2, students received elaborate feedback (i.e., explanations of why their answers were in-/correct) for half of their answers in Session 1 and corrective feedback (i.e., a mere indication of in-/correctness) for the other half. The results showed that students scored higher in Session 2 on statements for which elaborate rather than corrective feedback had been provided in Session 1. More specifically, students profited more from elaborate feedback on incorrect answers in Session 1 than from feedback on correct answers. As a practical recommendation, self-administered formative tests in closed-question format should at least provide explanations of why students’ answers are incorrect.
Question Formats in Online Quizzes
In blended- and e-learning scenarios, students’ self-assessment by means of quizzes has recently become a very popular method of presenting study materials, specifically in closed-question format (Nguyen & McDaniel, 2015). Closed questions consist of a stem, in which a question, a statement, or a case example is provided. Potential answers are displayed below in the form of multiple sentences (also referred to as items; Haladyna et al., 2002). Each sentence is to be evaluated by students, and correct answers are to be selected, for example, by ticking a box. Typically, closed-question formats vary in the number of items presented (Haladyna et al., 2002), with single choice being the most conventional format: Only one of at least three presented items is correct. Alternate choice represents a special case of single-choice questions, including only two items; in the multiple-choice format, more than one of n items can be correct. In the true–false format, participants need to evaluate the correctness of a statement in relation to its stem by making a binary choice (“correct” vs. “incorrect”).
The closed-question format is a very efficient and objective method. By implementing automated evaluations, university lecturers save correction time and can provide self-assessment even for large student groups (e.g., Lindner et al., 2015). Students can work independently according to their own schedule and receive feedback on their learning gains via formative tests during the term. However, there are various possibilities for designing tests and quizzes (e.g., single-, alternate-, vs. multiple-choice format) and for implementing and enriching them in computer-based environments (Lindner et al., in press). Thus, university lecturers are faced with the question of how to integrate testing with closed questions into their lectures and seminars so that their students profit the most. Even though the existing feedback literature sheds light on the effectiveness of different feedback conditions, the ecological validity of corresponding instructional designs has yet to be tested empirically in naturalistic online-learning environments. The present article addresses this gap with a field study.
Answering Questions as a Study Strategy
In some parts of the psychological curriculum, the acquisition of factual knowledge seems to be an important learning outcome. For instance, facts about the structure and function of neurotransmitters, neurons, and larger brain structures are necessary to understand current practical and research questions in many fields of psychology (Simon-Dack, 2014). Under these circumstances, tests (often implemented as answering course questions online) are supposed to assess prior learning gains and academic achievement, but also to foster learning and to serve as study materials (for a seminal study, see Bjork, 1975). Some students tend to use formative assessments for study rather than for diagnostic purposes. That is, they skip attending classes or reading the course materials, and instead attempt to acquire the relevant knowledge through quizzing (Marsh et al., 2007; Roediger & Marsh, 2005).
In fact, prior research showed that taking tests (i.e., successfully practicing retrieval from memory) during the learning phase can promote long-term retention, as compared to restudying (for meta-analyses, see Adesope et al., 2017; Rowland, 2014; Schwieren et al., 2017). This direct testing effect has been shown to be quite robust across various learning materials (e.g., text passages, Roediger & Karpicke, 2006; visuospatial information, Carpenter & Pashler, 2007; performed actions, Kubik et al., 2018) and also test formats (e.g., multiple-choice tests and short-answer tests, e.g., Butler & Roediger, 2007; Kang et al., 2007; Smith & Karpicke, 2014). However, its magnitude depends on the level of initial test performance (i.e., retrieval success, e.g., Greving & Richter, 2018; Smith & Karpicke, 2014) and feedback (Adesope et al., 2017; Phelps, 2012; Rowland, 2014), specifically in real-life educational contexts (McDaniel & Little, 2019). Recent meta-analyses indicate an overall positive effect of testing (d = .56) also in psychology classrooms (for a meta-analysis, see Schwieren et al., 2017) and a small positive effect of frequent testing in class on higher education achievement in general (d = .24; Schneider & Preckel, 2017).
Answering closed questions makes students engage in retrieval practice (cf. Karpicke, 2012), which is supposed to promote long-term retention and transfer of information (e.g., McDaniel et al., 2013; Pan & Rickard, 2018). Hence, online quizzes with closed questions should elicit retrieval-based learning of facts. In line with the levels-of-processing framework (Craik & Lockhart, 1972), deeper information processing when studying with closed questions should improve learning outcomes in general (Collins et al., 2018). According to the retrieval effort hypothesis (Pyc & Rawson, 2009), successful but more difficult retrieval attempts, requiring more retrieval effort from learners, promote better long-term learning than less difficult retrieval attempts. Thus, instructors should design assessment tasks that create so-called desirable difficulties (cf. Bjork & Bjork, 2011) to further enhance retrieval-based learning in (formative) tests. This can be achieved by manipulating the time interval between repetitions of a question (with a longer interval making retrieval more difficult) or the number of times a test item is repeated by students (Pyc & Rawson, 2009). Yet, the more naturalistic the learning environment becomes, the more difficult it is to control the circumstances under which learning occurs. For example, in online classes, students study according to their own schedule by navigating through a learning environment and taking quizzes at their own pace. An alternative way to enhance retrieval-based learning is to provide feedback on learners’ memory responses.
The Importance of Feedback for Successful Learning
Feedback is defined as “information about the gap between actual and desired performance” (Butler & Woodward, 2018, p. 2). Based on laboratory studies (Rowland, 2014) and on studies in the context of learning and teaching psychology (Schwieren et al., 2017), meta-analyses have reported positive additive effects of feedback on retention beyond the learning benefits engendered by testing, as well as on achievement in higher education in general (d = .47; Schneider & Preckel, 2017). Furthermore, feedback decreases the risk of students acquiring incorrect knowledge by correcting memory errors (Butler & Roediger, 2008; Ecker et al., 2020; Marsh et al., 2007; Roediger & Marsh, 2005). Especially in the multiple-choice and true–false question formats, students are repeatedly exposed to lure test items, which can result in students remembering false answers as they become familiar with the statements (negative suggestion effect; Toppino & Brochin, 1989). This in turn enhances the likelihood of choosing wrong answers in a subsequent test, especially when formative tests are used for study purposes and lure items represent common misconceptions or personal experiences (Marsh et al., 2007). Furthermore, feedback is also helpful for reassuring low-confidence correct answers (Butler et al., 2008; Fazio et al., 2010; Pashler et al., 2005). Thus, provided that formative tests include multiple-choice or true–false questions, feedback is argued to magnify the benefits of retrieval practice on retention while preventing the negative side effect of learning lure items (Butler & Roediger, 2008; Butler & Woodward, 2018).
Note, however, that there are various types of feedback that are differently effective in fostering additional learning (for a review, see Butler et al., 2013; Butler & Woodward, 2018):
1. Right/wrong feedback (also referred to as knowledge of results), in which the learner receives feedback on whether the provided answer(s) is (are) correct;
2. Corrective feedback (also referred to as correct answer or verification feedback), in which the correct answer(s) is (are) presented to the learner;
3. Elaborate feedback (sometimes referred to as explanatory feedback), in which an explanation is given of why the learner’s answer(s) is (are) in-/correct.
Prior research has largely shown that right/wrong feedback, relative to no feedback, engenders no reliable benefits on later memory performance (Fazio et al., 2010; Pashler et al., 2005), with the exception of retaining low-confidence correct answers (e.g., Fazio et al., 2010). Importantly, corrective feedback was revealed to be more effective than right/wrong feedback (Fazio et al., 2010; Pashler et al., 2005). In part, this seems to be because corrective feedback provides the to-be-corrected information. Furthermore, incorrect retrieval attempts can enhance learning in subsequent restudy opportunities that include corrective feedback (Izawa, 1961). Such test-potentiated learning in subsequent restudy phases has been robustly shown with various materials (Arnold & McDermott, 2013a, 2013b; Kubik et al., 2016; Tempel & Kubik, 2017).
The current literature (e.g., Butler & Woodward, 2018) discusses the efficacy of elaborate feedback to provide learning benefits above and beyond corrective feedback. Until now, rather few studies have directly compared both types of feedback (cf. Butler & Woodward, 2018; for a meta-analysis, see Bangert-Drowns et al., 1991), with some studies finding a benefit of elaborate feedback (e.g., Corral & Carpenter, 2020; Gilman, 1969; Petrović et al., 2017), while other studies did not (e.g., Pridemore & Klein, 1995; Thijssen et al., 2019; Whyte et al., 1995). For example, an early study by Gilman (1969) tentatively suggested that elaborate feedback can reduce the necessary number of encounters with the learning material and promote retention. More recent work also shows rather mixed results on the benefits of providing additional information on why a response is correct. For example, Petrović et al. (2017) revealed positive effects of such elaborate feedback when providing three online quizzes in a course on digital signal processing in electrical engineering. More specifically, they manipulated the type of feedback that followed participants’ responses to multiple-choice questions: participants received either (1) no feedback, (2) corrective feedback pointing out the correct answer, or (3) elaborate feedback involving the correct answer plus a short explanation addressing a misconception that might have led to the incorrect response and a reference to the course material. Knowledge gains were larger with elaborate feedback as compared to corrective feedback, which in turn was still more beneficial than providing no feedback. However, other recent studies, for example, Thijssen et al. (2019), did not show any additional benefit of receiving elaborate feedback above and beyond the benefits produced by formative testing and corrective feedback. They employed multiple app-based formative tests during a 4-week course and also assessed final exam grades. Note, however, that the general performance level was relatively high (i.e., approaching a potential ceiling effect), and the corrective feedback included specific source information for background materials relevant to the course exam. This source information may have enhanced study behavior before the exam in general and encouraged students to look up the relevant information, which might have blurred the difference from elaborate feedback.
The potential benefit of elaborate (over corrective) feedback has also been investigated in other research domains, such as knowledge revision and text comprehension: different types of refutation texts are typically provided to correct false beliefs and to enhance the comprehension and application of knowledge statements, concepts, or mental models (e.g., Ecker et al., 2020; Prinz et al., 2019; Rich et al., 2017). For example, in the study by Rich et al. (2017), students were instructed to judge statements that included common misconceptions as correct or incorrect, followed by a feedback phase: refutation texts stated and dispelled the misconceptions and either included an explanation of the refutation (i.e., elaborate feedback) or not (i.e., corrective feedback). Results showed that, in the long run, learners were more likely to correct misconceptions when an explanation was provided of why a statement was incorrect than when only a refutation and the correct answer were given, even after longer retention intervals (Ecker et al., 2020; Rich et al., 2017). Similarly, recent studies using more educationally relevant materials (e.g., Prinz et al., 2019) demonstrated beneficial effects of refutation (over standard) texts on statistics in terms of a decreased rate of misconceptions and, associated with it, enhanced textual comprehension and metacomprehension accuracy (i.e., the ability to predict future comprehension performance).
So far, little is known about why and under which circumstances elaborate feedback is beneficial for learning. It has generally been argued that providing an explanation nudges learners toward deeper processing of the to-be-learned materials and prevents rote learning of false statements (Butler et al., 2013). More specifically, it has been argued that elaborate feedback on Test 1 may enhance students’ metacognitive awareness of memory errors and previously misunderstood concepts (cf. Metcalfe, 2009), as well as of critical versus superfluous aspects of the to-be-learned materials in terms of enhanced schema abstraction (cf. Corral & Carpenter, 2020). Furthermore, elaborate feedback may trigger students to a higher degree to critically evaluate and integrate the provided feedback, the error or misconception, and prior knowledge (Rich et al., 2017). This, in turn, may elicit processes of conceptual change that are relevant to correcting errors or misconceptions (Kendeou & O’Brien, 2014; Prinz et al., 2019).
The above studies leave open relevant questions for research on the effect of feedback on acquiring factual knowledge in real-life educational contexts. For instance, Petrović et al. (2017) mixed in-class teaching with computerized quizzes; elaborate feedback might draw less attention in large online courses where quizzing is self-administered and self-paced. Given that, at the time of writing, the Covid-19 pandemic restricts education increasingly to online formats, there is great potential to also use formative assessments for study purposes (and not only as a diagnostic tool) and to reap the benefits of feedback in formative testing. If students attempt to learn by quizzing before engaging more seriously with the course material, elaborate feedback might even strengthen learning on items students answered correctly, as such answers may often be based on guessing or rather weak knowledge. Yet, additional feedback information on why correct answers are correct can be redundant when assessing retention of knowledge (Butler et al., 2013). This research question merits further investigation, as the majority of studies have recently focused on the effect of elaborate feedback on revising misconceptions or have allocated elaborate feedback only to incorrect items (e.g., Ecker et al., 2020; Petrović et al., 2017; Rich et al., 2017). Elaborate feedback may be specifically effective in enhancing metacognitive awareness and inducing conceptual-change processes for simpler materials, such as factual true–false statements, compared with complex and abstract misconceptions. Yet, there is only sparse research on the relative benefits of different feedback types in formative assessment scenarios (cf. Lindner, 2015, 2018).
To close this gap, the main objective of this study was to evaluate the benefits of corrective versus elaborate feedback on knowledge acquisition in a naturalistic setting. More specifically, we conducted this study in a course at a distance university with a large sample of psychology undergraduate students. We implemented formative testing by means of an online quiz and manipulated feedback type in a within-subjects, mixed-list design. To assess factual knowledge in Session 1, students were tested on 60 verbal statements from course materials in the domain of biological psychology, followed by either corrective or elaborate feedback. After a self-regulated delay, students’ test achievement was assessed in Session 2. According to the theorizing above in terms of enhanced metacognitive awareness and conceptual-change processes, elaborate feedback should be more potent than corrective feedback in fostering learning. We expected this to be the case for incorrect answers after both shorter and longer delays (see Butler et al., 2013; Ecker et al., 2020; Rich et al., 2017). Furthermore, we explored whether retention of correct answers may profit from elaborate feedback as well; especially after longer self-regulated inter-session intervals, potential effects of memory consolidation might materialize for guessed or weak responses. Note that in prior studies, the potential benefit of elaborate feedback on knowledge revision may have been confounded with a general increase in motivation: with feedback type manipulated between-subjects, students received elaborate or corrective feedback on all quiz items (e.g., Chase & Houmanfar, 2009; Petrović et al., 2017). Feedback type might thus have simultaneously enhanced students’ motivation and their knowledge revision. To better isolate the effects on knowledge revision, a within-subjects, mixed-list design was implemented by providing corrective and elaborate feedback on different subsets of test questions. This way, students were not able to anticipate which items would be followed by corrective versus elaborate feedback, and the differential benefits of feedback type on knowledge revision could be studied more selectively while holding motivational effects constant.
Method
Materials of this study are available in a public repository (cf. Grahe et al., 2020): https://osf.io/ebaq4/
Participants
As a prerequisite to taking the final course exam, students participated in a course activity in which this online study was conducted. Before data collection, students were asked for informed consent to use their anonymized study data for research purposes (i.e., evidence-based improvements in teaching formats and materials). Importantly, students were informed that their consent was not required for successful participation in the online course activity or for gaining access to the final course exam. They were also specifically instructed on how informed consent could be withdrawn after completing study participation.
In total, 557 students took part in the course activity. Of these, 61 students were not included in the study (n = 38 did not provide informed consent; n = 23 did not take part in Session 2), resulting in a final sample of 496 students. Data on age and gender were not collected within the course. However, prior studies with psychology students at the FernUniversität in Hagen revealed a rather diverse sample (Stürmer et al., 2018): a mean age of 35 years and 70% women, which alleviates the gender bias typically observed in other psychology programs in Germany. Furthermore, 80% of the students work while studying, and around 10% of the sample acquired their university entrance qualification through vocational qualifications rather than a regular A-levels certificate.
Materials
Quiz statements were generated from the course materials on the biological bases of psychology. As a first step, contents were selected that could be transformed into a (1) correct statement (e.g., Action potentials transmit information based on their frequency), (2) paired incorrect statement (e.g., Action potentials transmit information based on their amplitude), and (3) a possible explanation for why the statement is (not) correct (e.g., Because there is a variation in frequency while form and amplitude show little variation among different instances of the action potential). In total, 60 correct and 60 paired incorrect statements were generated. As a second step, 60 statements were selected for the quiz from this material pool, with 30 statements being correct and 30 different statements being incorrect.
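For illustration, the selection logic can be summarized in a short sketch. This is a hypothetical reconstruction for illustration only; names such as STATEMENT_POOL and build_quiz are ours, and the original materials were authored and selected by hand.

```python
import random

# Hypothetical reconstruction of the statement pool: each entry pairs a
# correct statement with its incorrect counterpart and a shared explanation.
STATEMENT_POOL = [
    {
        "correct": "Action potentials transmit information based on their frequency.",
        "incorrect": "Action potentials transmit information based on their amplitude.",
        "explanation": "Frequency varies across action potentials, while form "
                       "and amplitude show little variation.",
    },
    # ... 59 further statement pairs generated from the course materials
]

def build_quiz(pool, n_correct=30, n_incorrect=30, seed=None):
    """Select 60 quiz statements: 30 entries contribute their correct version
    and 30 other entries their incorrect version (no pair is used twice)."""
    rng = random.Random(seed)
    entries = rng.sample(pool, n_correct + n_incorrect)
    quiz = [{"statement": e["correct"], "is_true": True,
             "explanation": e["explanation"]} for e in entries[:n_correct]]
    quiz += [{"statement": e["incorrect"], "is_true": False,
              "explanation": e["explanation"]} for e in entries[n_correct:]]
    rng.shuffle(quiz)
    return quiz
```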
Design
In the current study, we employed a three-factorial 2 × 2 × 4 mixed design, with the first two factors varied within-subjects. The first within-subjects factor was the feedback type provided in Session 1: half of the sentences received corrective and half elaborate feedback, with both halves randomly intermixed (mixed-list design). The second within-subjects factor was the response status of the sentences in Session 1, that is, whether they were answered correctly or incorrectly. The third factor was the self-regulated delay before students entered Session 2. To this end, we split the delays into four intervals (1st, 2nd, 3rd–5th, vs. >5th day after Session 1) and treated the interval as a between-subjects variable.
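A minimal sketch of how such a mixed-list assignment can be implemented is shown below (hypothetical; the source does not describe the actual Moodle-side implementation): for each student, half of the statements are randomly assigned to elaborate feedback and the remainder to corrective feedback.

```python
import random

def assign_feedback(statement_ids, seed=None):
    """Randomly split one student's statements into two intermixed halves:
    half receive corrective feedback, half receive elaborate feedback."""
    rng = random.Random(seed)
    ids = list(statement_ids)
    elaborate = set(rng.sample(ids, k=len(ids) // 2))
    return {sid: ("elaborate" if sid in elaborate else "corrective")
            for sid in ids}

# Example: condition assignment for one student's 60 statements
conditions = assign_feedback(range(60), seed=1)
```

Because the split is drawn independently per student, the two feedback conditions are intermixed unpredictably, which is what prevents students from anticipating the feedback type for a given statement.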
Procedure
Data were collected during regular online course activities in a module covering the biological bases of psychology (5 credit points) as well as learning, motivation, and emotion (5 credit points) at the FernUniversität in Hagen. The module is offered every term and is completed with a written exam. The course is delivered using the Moodle learning management system (Costello, 2013). Teaching materials include video lectures that can be watched on demand, texts that resemble textbook chapters, chats for posing and discussing questions, as well as self-tests on the learned materials. To be eligible for the exam, students have to complete at least two out of three exam-related tasks in the first half of the semester. These tasks are offered in Moodle and include self-tests and questionnaires, with the aim of ensuring that students do not delay exam preparation to the second half of the semester. One of the three tasks, offered in the timeframe of one month (November–December 2019), was to take a quiz on the course contents on the biological bases of psychology at two separate time points. Students could take the Session 1 quiz only once, and completing it was required for gaining access to the Session 2 quiz. To maximize learning gains, we instructed students to start the Session 2 quiz after a delay of at least one day after completing Session 1. Due to the applied setting of the study, students were given a flexible time frame between Sessions 1 and 2 so that their studies could meet the demands of their private schedules (e.g., work or family obligations). They were therefore free to choose any interval they liked within four weeks. In consequence, the course design did not permit an experimental manipulation of the delay between sessions.
In Session 1, the 60 statements were presented one at a time in a uniquely randomized order, and students could process them in a self-regulated manner. Students were asked to indicate whether a statement was true and then received corrective feedback (see Figure 1). The feedback was visible until students, at their own pace, clicked to proceed to the next statement of the quiz. Critically, feedback type was manipulated between statements in Session 1: for half of the correct and incorrect statements that each student received, additional elaborate feedback (i.e., an explanation of why the statement was correct/incorrect) was provided.

Figure 1. Session 1 Test Phase and Feedback Phase. Note: The experimental procedure consisted of a test phase and a feedback phase, which typically involved six steps (see Step 5a for the manipulation of whether elaborate feedback was presented for a given statement). The example statement on the left includes elaborate feedback, while the example statement on the right includes corrective feedback.
In Session 2, the identical statements were presented for a self-regulated duration. This time, however, elaborate feedback was provided for all of these statements.
Results
Students paused between Sessions 1 and 2 for an average of 2.72 days (SD = 4.97 days, range 0 to 28 days). On average, test performance increased significantly from 63.19% correct in Session 1 (SD = 9.38%, chance baseline = 50% correct) to 78.49% correct (SD = 13.11%) in Session 2, t(495) = 29.18, p < .001, d = 1.31. This suggests that participants profited from testing and subsequent study activities. The low accuracy level in Session 1 suggests that many of the correct answers were based on guessing.
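For illustration, the paired comparison reported above can be computed as in the following sketch (assuming per-student percentage-correct scores; the names pc_session1 and pc_session2 are hypothetical and this is not the original analysis script):

```python
import numpy as np
from scipy import stats

def paired_comparison(pc_session1, pc_session2):
    """Paired t test and Cohen's d for the within-student change in
    percentage correct between Session 1 and Session 2."""
    s1, s2 = np.asarray(pc_session1), np.asarray(pc_session2)
    t, p = stats.ttest_rel(s2, s1)
    diff = s2 - s1
    d = diff.mean() / diff.std(ddof=1)  # Cohen's d for paired samples (d_z)
    return t, p, d
```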
Next, we investigated the specific benefits of elaborate versus corrective feedback. In a two-factorial ANOVA, we tested whether response status (correct vs. incorrect) and feedback type (corrective vs. elaborate) in Session 1 influenced the percentage of correct responses in Session 2. As expected, sentences answered correctly in Session 1 were more often answered correctly in Session 2, as indicated by a main effect of response status in Session 1, F(1, 495) = 744.61, p < .001.
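This two-factorial within-subjects ANOVA could be run as in the following sketch, assuming a long-format data frame with one row per student, response status, and feedback type (column names are hypothetical; the pingouin package is used here for convenience):

```python
import pandas as pd
import pingouin as pg

def rm_anova_session2(df: pd.DataFrame) -> pd.DataFrame:
    """2 (response status) x 2 (feedback type) within-subjects ANOVA on
    the percentage of correct responses in Session 2.

    Expected columns (hypothetical): student, response_status
    ("correct"/"incorrect" in Session 1), feedback
    ("corrective"/"elaborate"), and pc2 (% correct in Session 2).
    """
    return pg.rm_anova(data=df, dv="pc2",
                       within=["response_status", "feedback"],
                       subject="student", detailed=True)
```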
We additionally explored whether the effect of elaborate feedback was present for short versus long delays between sessions. To this end, we split the self-regulated between-session delay into four time intervals (1st day after Session 1, n = 242; 2nd day, n = 82; 3rd–5th day, n = 92; > 5th day, n = 81), aiming for similar category sizes except for the shortest delay group (1st day). As shown in Figure 2, there was no indication that the effect of feedback type changed with increasing delay. To test this impression, we conducted two separate 2 (feedback type: corrective vs. elaborate) × 4 (delay: 1st, 2nd, 3rd–5th, vs. > 5th day) mixed ANOVAs on the percentage of correct responses in Session 2. For sentences with incorrect responses in Session 1, there was a significant main effect of feedback type, F(1, 492) = 76.95, p < .001.

For sentences with correct responses in Session 1, there was only a significant main effect of delay, F(3, 492) = 8.24, p < .001.
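A sketch of the delay binning and one of the two mixed ANOVAs follows (again with hypothetical column names; pingouin's mixed_anova handles the within-between combination; the exact bin boundaries are our assumption):

```python
import pandas as pd
import pingouin as pg

def delay_bin(days: float) -> str:
    """Bin the self-regulated between-session delay into the four
    reported categories (boundary placement is an assumption)."""
    if days < 2:
        return "1st day"
    if days < 3:
        return "2nd day"
    if days <= 5:
        return "3rd-5th day"
    return "> 5th day"

def mixed_anova_feedback(df: pd.DataFrame) -> pd.DataFrame:
    """2 (feedback type, within) x 4 (delay group, between) mixed ANOVA on
    percentage correct in Session 2, run separately for statements answered
    incorrectly (or correctly) in Session 1.

    Expected columns (hypothetical): student, delay_days, feedback, pc2.
    """
    data = df.copy()
    data["delay_group"] = data["delay_days"].map(delay_bin)
    return pg.mixed_anova(data=data, dv="pc2", within="feedback",
                          subject="student", between="delay_group")
```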
We checked whether the delay-stability of the elaborate-feedback effect was robust to the choice of delay categories. To do so, Spearman rank correlations were computed for the full sample between (1) the delay between sessions and (2) the recall advantage of items with elaborate feedback over items with corrective feedback. This correlation was significant neither for items answered correctly in Session 1 (r = −.033, p = .47) nor for items answered incorrectly (r = .038, p = .401).
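This robustness check translates directly into code; a minimal sketch, assuming one delay value and one elaborate-minus-corrective advantage score per student (variable names are hypothetical):

```python
from scipy.stats import spearmanr

# advantage: per-student % correct (Session 2) on elaborate-feedback items
# minus that on corrective-feedback items.
def delay_advantage_correlation(delay_days, advantage):
    """Spearman rank correlation between the between-session delay and the
    Session 2 recall advantage of elaborate- over corrective-feedback items."""
    rho, p = spearmanr(delay_days, advantage)
    return rho, p
```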
General Discussion
In formative online assessment, feedback is an important ingredient for fostering learning. Hence, we compared the benefits of different feedback types on true–false questions for students’ knowledge acquisition in a biological psychology class. Although lab experiments have manipulated feedback type, few researchers have examined how the nature of feedback affects learning in a naturalistic learning environment. Furthermore, research comparing corrective and elaborate feedback in particular is sparse. The present field study helps to fill this gap in the literature.
The results show that students’ performance increased between Sessions 1 and 2, largely reflecting the well-known benefits of testing and feedback as well as the general cumulative learning gains that would be expected as the e-learning course progresses during the term. Most importantly, adding a supplementary explanation to the provided feedback helped students to acquire more fact-based knowledge; thus, students profited from elaborate feedback even beyond the potential benefits of testing and corrective feedback. These results are in line with previous studies showing that adding a supplementary explanation to refutation texts reduced the number of misconceptions in the context of knowledge revision (Ecker et al., 2020; Rich et al., 2017). Similarly, refutation (over standard) texts decreased the rate of statistical misconceptions and thereby enhanced textual comprehension and metacomprehension accuracy (Prinz et al., 2019). The results of this study enrich the perspective offered by earlier studies (Chase & Houmanfar, 2009; Gilman, 1969; Petrović et al., 2017) and support the effectiveness of elaborate feedback for learning and knowledge acquisition. It would have been conceivable that elaborate feedback does not further enhance retention (see, e.g., Butler et al., 2013; Thijssen et al., 2019) or even harms fact learning by drawing attention away from the statement toward processing the explanation. Such a potential disadvantage was not supported by our data.
The positive effect of elaborate feedback in our study was shown exclusively for statements that were answered incorrectly in Session 1. Given the low accuracy rate in Session 1, many of the correct answers were likely based on guessing. One could have expected that elaborate feedback on correct items, many of which were presumably guesses, might have helped students to reassure their low-confidence correct answers and thereby covertly improved their learning. As we did not assess confidence ratings, future studies are encouraged to include such metacognitive judgments by students, and also a no-feedback baseline condition, to examine more systematically the relative benefits of elaborate feedback in formative testing for both correct and incorrect answers. It remains to be determined whether practitioners should also provide elaborate feedback on correct responses and whether this would enhance long-lasting learning, confidence, and transfer of knowledge. Through automated testing in e-assessments, elaborate feedback can be given cost- and time-effectively.
So far, there is a dearth of evidence-based knowledge about why elaborate (relative to corrective) feedback enhanced retrieval-based learning in formative testing settings (Butler et al., 2013; Butler & Woodward, 2018). Several potential mechanisms are discussed in the literature. One mechanism may be that elaborate feedback on Test 1 raised students’ metacognitive awareness. That is, students were likely more sensitive to errors and previously misunderstood concepts, and they may have monitored their learning activities more actively (cf. Metcalfe, 2009). Relatedly, in example-based learning of more complex concepts, students may become more sensitive to the properties of the to-be-learned materials that are superfluous and can be ignored, or, respectively, to the more critical features (Corral & Carpenter, 2020). With increased metacognitive awareness and accuracy, students might have engaged in more strategic self-regulation and thereby (re-)studied the test materials and the provided explanations more effectively (Kornell & Finn, 2016; Metcalfe, 2009; Thiede, Anderson, & Therriault, 2003), a possible explanation that should be examined more closely in future research on knowledge acquisition. A related mechanism is that elaborate feedback, similar to refutation texts, might have triggered a critical evaluation, elaboration, and integration of the provided feedback, the misconception, and prior knowledge (Rich et al., 2017). Such conceptual-change processes (Kendeou & O’Brien, 2014; Prinz et al., 2019) are important for straightening out incorrect answers or misconceptions and for fostering comprehension and retention of the studied materials (cf. Prinz et al., 2019). On a speculative note, elaborate feedback (compared with other feedback types) might be particularly effective in correcting high-confidence wrong answers or misconceptions (cf. Butler et al., 2008; Rich et al., 2017). While the present study cannot address this research question, future research is encouraged to assess students’ confidence judgments about their answers and to examine the effectiveness of different feedback types in correcting misconceptions held with different confidence levels.
The current findings are of high practical utility for formative assessment, specifically in e-learning contexts. While formative tests are typically designed to provide diagnostic information for students, the present study demonstrated that elaborate feedback can further enhance learning outcomes and academic achievement beyond the potential benefits produced by formative testing with true–false questions and corrective feedback. Importantly, the beneficial effects of elaborate feedback were demonstrated with educationally relevant materials in a naturalistic, self-regulated learning environment of formative testing, suggesting a rather high ecological validity of our findings. We speculate that providing elaborate feedback in retrieval-based learning is specifically effective in applied settings such as online courses, as was the case in the present study. Here, elaborate feedback was provided on students’ answers to exam-relevant quiz questions, whereas the materials in many laboratory studies were course-unrelated and rather arbitrary. For this reason, the students in the present study might have been additionally motivated to take full advantage of the benefit provided by elaborate feedback, as it was of higher value for achieving better results in the final exam. Furthermore, prior experimental work even showed that elaborate feedback is effective in promoting the application of knowledge to new questions and contexts (cf. Butler et al., 2013; Corral & Carpenter, 2020; Moreno, 2004), likely more so than in enhancing retention (cf. Butler et al., 2013).
The present study, however, also has several limitations. Here, we focused on the retention of knowledge by assessing correct responses to the same true–false questions as in the initial quiz. However, providing an additional explanation may further comprehension (from superficial fact knowledge to a deeper understanding; Butler et al., 2013) even for correct responses, and thereby support successful application to new contexts. Follow-up studies should examine whether elaborate feedback can further enhance transfer of learning to new questions in real-life educational contexts. Furthermore, the interval between the two test sessions was not controlled in this study. Future research is encouraged to experimentally manipulate this delay as well as other factors, such as the type of learning material and the timing of feedback following the test question in Session 1 (Butler & Woodward, 2018; Fazio et al., 2010).
In sum, elaborate feedback provides an additional and effective learning gain beyond the potential benefits that testing and corrective feedback produce with true–false questions. Importantly, this pattern of results was obtained in a self-regulated online quiz embedded in a regular course and with a student sample characterized by large diversity in demographic and educational background, supporting the robustness, generalizability, and practical relevance of the present findings. Future research should identify the specific effects of different question formats and feedback types on knowledge acquisition and transfer of learning, as well as explore the moderating role of students’ characteristics in the educational, cognitive, and motivational domains.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
