Abstract

This quasi-experimental study investigated the role of prior psychology knowledge and in-class retrieval activity in the testing effect. Undergraduate introductory psychology students (N = 53) from two classes at a small liberal arts college practiced retrieving information in class with multiple-choice quizzing and concept mapping. Prior psychology knowledge was measured using a 25-item multiple-choice pretest. Students with both high and low prior psychology knowledge had higher scores on examination material that was practiced in class with retrieval-based concept mapping than on material practiced with traditional multiple-choice quizzes or with no organized in-class retrieval activity at all. Only students with high prior psychology knowledge had higher scores on quizzed material compared to no organized in-class retrieval practice, and these scores were lower than those on material that was practiced with in-class concept mapping. In comparison to administering multiple-choice quiz questions, a more useful in-class activity might be to have students, especially those with less prior psychology knowledge, practice retrieving material through free recall and connection-building activities such as a concept map.

Keywords

Concept maps, prior knowledge, testing effect

Introduction

Introductory psychology instructors must grapple with what in-class instructional activities will benefit their students most. There is a growing body of evidence in cognitive and educational psychology showing that retrieving information from memory can be used as an instructional tool instead of merely being a way to assess and document what has been learned (e.g., Carpenter, Pashler, & Vul, 2006; McDaniel, Roediger, & McDermott, 2007; Schwieren, Barenberg, & Dutke, 2017). However, it is unclear how this “testing effect” is influenced by students’ prior knowledge about the concepts being learned. The present study investigates the moderating role of prior knowledge in the testing effect, expands on the application of retrieval-based practice learning in an introductory psychology classroom setting, and demonstrates the effectiveness of a concept map retrieval practice tool that can be used in the classroom.

Cue Creation During Retrieval Practice

The dominant theoretical perspective used to explain the testing effect is that testing enhances the retrieval trace of the memory and therefore enhances learning (Karpicke & Blunt, 2011). Further, the testing effect is greater when the initial retrieval involves more effort than when it requires less effort on the part of the learner (e.g., Carpenter & DeLosh, 2006; Glover, 1989; Kornell, Hays, & Bjork, 2009; Pyc & Rawson, 2009). The retrieval effort hypothesis argues that “not all successful retrievals are created equal: given that retrieval is successful, more difficult retrievals are better for memory than less difficult retrievals” (Pyc & Rawson, 2009, p. 438). However, the nature of the memory trace that enhances subsequent recall is unclear. The elaborative retrieval hypothesis (Carpenter, 2009; 2011; Pyc & Rawson, 2010) suggests that retention is enhanced when individuals create effective mediating connections (i.e., words, phrases, or ideas that link concepts) during retrieval practice that they then can use as cues and pathways to better recall the information in a subsequent test. Carpenter and Yeung (2017) found that mediators activated during retrieval practice led to enhanced subsequent test performance, particularly after longer lags between the retrieval practice and testing. Thus, intentional semantic cue creation during retrieval practice seemed to lead to better subsequent recall of similar or related information.

However, in a meta-analysis of testing effect research, Rowland (2014) found limited support for a relationship between increased elaboration and the testing effect and suggested other more episodic or contextual contributions may be driving the testing effect. An episodic context account suggests the benefits of retrieval practice are derived from increasing the number of episodic memories tied to the information to be remembered (Karpicke, Lehman, & Aue, 2014). For example, Lehman, Smith, and Karpicke (2014) gave participants in a “retrieval practice” group 1 minute to recall as many words as possible from a list they had just studied. Participants in an “elaboration” group had to type the first two words that came to mind for each word in the list next to the word itself. Participants in the control group did math problems for 1 minute after viewing a list of words. Recall was greatest for the retrieval practice group, which suggested that elaboration or semantic cue creation did not benefit participants as much as free recall and was taken as support for the episodic context account. The authors suggested that during retrieval practice items become associated with contextual features associated with different temporal contexts. After repeated retrieval, the items may become easier to recall because they are no longer to be found in only a single episodic memory or context. However, in Lehman et al.’s (2014) experiment, students did not practice elaborative retrieval, only elaboration with the words present. Further, participants were asked to learn word lists, which is unlike the more complex material encountered in the classroom. Therefore, it is important to consider how the testing effect has been applied and studied in real world instructional settings.

Implementing the Testing Effect in the Classroom

A number of studies have established the robustness of the testing effect across a variety of instructional settings (see Agarwal, Bain, & Chamberlain, 2012, for a review; Batsell, Perry, Hanley, & Hostetter, 2017; Carpenter, Pashler, & Cepeda, 2009; Cogliano, Kardash, & Bernacki, 2019; McDaniel, Anderson, Derbish, & Morrisette, 2007; McDaniel, Thomas, Agarwal, McDermott, & Roediger, 2013; McDaniel, Wildman, & Anderson, 2012; McDermott, Agarwal, D’Antonio, Roediger, & McDaniel, 2014; Wooldridge, Bugg, McDaniel, & Liu, 2014). Many of these studies have employed multiple-choice quizzing as a retrieval tool. For example, Batsell and colleagues (2017) found that multiple-choice quizzes enhanced examination (exam) performance on non-lectured material in a college-level introductory psychology course both when the exam items were identical to quiz items and when the exam items were closely related but not identical. In a college-level online brain and behavior course, McDaniel, Anderson, Derbish, and Morrisette (2007) found that students performed better on unit-level multiple-choice exam questions when they had reviewed the material by taking either short-answer or multiple-choice quizzes instead of simply rereading. However, the benefits of short-answer quizzing were larger. Moreover, in a cumulative final exam the benefits of multiple-choice quizzing had nearly disappeared whereas significant benefits of short-answer quizzing remained. Finally, Wooldridge, Bugg, McDaniel, and Liu (2014) found that when multiple-choice exam items were identical to those on the quiz, the testing effect occurred, but when the exam items were only topically related to the quiz items (e.g., from the same chapter but different concepts), no testing effect occurred. Even students who could study from the quizzes did not benefit more than those using highlighting during restudy for those items only topically related. Thus, it is clear that the testing effect can be produced in the classroom, but the benefits of certain retrieval tools, such as multiple-choice quizzing, might, despite their popularity, be limited in magnitude, duration, and scope of generalization.

Instructors often aim for some degree of generalization of knowledge (Wooldridge et al., 2014). Therefore, the present study focused on the role of retrieval practice on exam material that was related, but not identical, to material that had appeared in the practice testing. Moreover, given that multiple-choice quizzing has been shown to be inconsistent in its effectiveness when applied to exam items that are related but not identical, this study compared multiple-choice quizzing to a retrieval tool that should, at least according to the elaborative retrieval hypothesis reviewed above, be more effective for related but not identical test items: concept mapping.

Retrieval Tools for the Testing Effect: Multiple-Choice Quizzes and Concept Maps

Although many studies have used multiple-choice questions as retrieval tools for practice testing, there is evidence that recall-based activities, such as in the short-answer questions used by McDaniel et al. (2007), can lead to greater benefits. One recall-based retrieval tool that may be particularly advantageous for encouraging elaboration and the generalization of material is concept mapping. A concept map is a graphic organization that overtly represents not only a person’s knowledge but also the connections among the concepts of a particular subject matter. Previous research has shown that a useful, perhaps necessary component in using a concept map as an effective retrieval-based learning activity is free recall. Karpicke and Blunt (2011) found no benefit over and above mere repeated study for participants who were asked to create a concept map using their notes from the studied material. However, in a follow-up investigation, Blunt and Karpicke (2014) found that when free recall was added to concept mapping (material such as notes was not present while students created the concept map), then the recall-based concept map enhanced memory as much as recalling information in the form of a paragraph. Thus, recall is an important component in effective retrieval-based learning activities, perhaps because of the effort it requires.

Over and above effort, the elaborative retrieval hypothesis (Carpenter, 2009; 2011; Pyc & Rawson, 2010) suggests that creating meaningful connections during retrieval practice enhances retention of material. Understanding similarities and differences, or the connections, among concepts is a form of relational processing. Encouraging relational processing rather than item-specific processing has been shown to enhance testing effects in laboratory learning paradigms (Mulligan & Peterson, 2015; Peterson & Mulligan, 2013). Concept mapping encourages explicit relational processing by requiring students to make connections between concepts. It is possible that relational connections are created and used during multiple-choice quizzing, but given the constraints of the retrieval tool, such connections would be created incidentally and covertly, rather than intentionally and overtly.

The Testing Effect and Prior Knowledge

The elaborative retrieval hypothesis suggests the more a student is able to connect ideas, the more they should benefit from the testing effect. Preexisting knowledge and experience provide a foundation into which to-be-learned information can be connected, anchored, and integrated during retrieval practice. Thus, it is possible that benefits of elaboration during retrieval practice may be influenced by how much prior knowledge a student has going into retrieval practice.

When assessing the role of prior knowledge in the benefits of learning strategies, results have been mixed. In their investigation of retrieval-induced forgetting, or impaired recall of related but unpracticed items, Carroll, Campbell-Ratcliffe, Murnane, and Perfect (2007) had first year psychology students (“novices”) and students with 4 or more years of psychology experience (“experts”) read two abnormal psychology case studies. Students answered 10 short-answer questions during the retrieval practice and then answered these same 10 questions as well as 10 novel items from the practiced case study and 10 items from the unpracticed case study 15 minutes or 24 hours later. Results showed there was no interaction between expertise level and whether the questions had been practiced. Thus, prior knowledge did not seem to influence benefits of retrieval practice using short-answer questions. Xiaofeng, Xiao-e, Yanru, and AiBao (2016) also found no difference in the benefits of retrieval practice for high and low prior knowledge learners. Using the same paradigm as Lehman and colleagues (2014), Xiaofeng and colleagues (2016) had psychology majors and non-majors study a list of psychology-related words and subsequently either try to recall the list or generate free associates to each of the list’s words. As with Lehman and colleagues (2014), subsequent recall was greatest for the retrieval practice condition. Interestingly, for the retrieval practice group there was no difference between performance among the majors and non-majors. However, psychology majors did recall more than the non-majors in both the elaboration and control conditions, suggesting that free-association-style elaboration without retrieval practice may benefit those with greater prior knowledge more than those with less prior knowledge.

Similarly, in exams asking novel, newly encountered questions, an elaboration-based technique involving self-explanation of causal connections, called “elaborative interrogation,” seems to be highly effective. However, the technique depends on prior knowledge. Specifically, Woloshyn, Pressley, and Schneider (1992) showed that the more students already knew about a topic domain from their life experience, the more elaborative interrogation helped them add to that knowledge. In addition to using experience with a topic, prior knowledge of a discipline can also be assessed using a pretest. For example, Thompson and Zamboanga (2004) found that psychology knowledge from a 25-item, five-alternative multiple-choice pretest predicted class achievement. The pretest included questions from a variety of topics covered in introductory psychology courses. Thompson and Zamboanga (2004) found that although pretest scores did predict subsequent achievement in the course, prior coursework in psychology was unrelated to course achievement. Thus, experience was not enough to adequately measure prior knowledge.

Cogliano, Kardash, and Bernacki (2019) applied the pretest approach in a study of retrieval practice. Specifically, these authors measured prior knowledge of topics by giving students a multiple-choice pretest comprising five items from each of five chapters. The items on the pretest were then used as retrieval practice items. Cogliano et al. (2019) compared exam performance on multiple-choice items that were identical to those that had appeared on both a multi-topic pretest and a subsequent in-class practice test (practiced items) to performance on items never seen before that tapped into the same concepts (indicative of transfer). Average exam scores for topics in which students had low prior knowledge were significantly higher for the practice-tested items than the new related items. For high prior-knowledge topics, the difference between practice-tested and new related items was negligible, suggesting that transfer of learning from practiced items to new related items only occurred with high prior knowledge topics.

Importantly, for Xiaofeng et al. (2016), Carroll et al. (2007), and Cogliano et al. (2019), the target material to be learned via retrieval practice was all identical to the material being practiced, whether in attempted retrieval via free recall, as individual stimuli for elaboration via free association, or as items in multiple-choice quizzes. As noted before, instructors often care more about performance on non-identical, newly encountered exam material than on material that has already been encountered in an identical form. Therefore, a question arises as to whether general prior psychology knowledge moderates the testing effect when items on the exam are only conceptually related to practiced material (rather than identical) and elaboration-based retrieval tools (such as concept maps) are used instead of multiple-choice testing.

Concept mapping produces a concrete, visual representation of the connections among the facts and ideas being learned, rather than leaving those connections implicit in a recall protocol, or haphazardly represented or even absent in a collection of multiple-choice or short-answer quiz questions, or individually connected to whatever might come to mind as in Lehman et al.’s (2014) free-association technique. When the learner has more prior knowledge of the subject matter (and hence is able to create more relationships and mediators that connect and integrate concepts), both implicit and explicit connection building ought to be useful. However, when a student has less prior knowledge, a retrieval tool that explicitly directs the learner to create mediators and write down concepts and the connections among them might be expected to increase subsequent performance on novel test items more than a retrieval tool that directs the learner to memorize facts, attend to fewer connections, and hold those connections in working memory as in multiple-choice quizzing. Thus, it is important to consider whether intentional elaborative retrieval practice would benefit students above and beyond multiple-choice quizzing, and whether the explicit production of connections required by concept mapping would increase retention for low knowledge learners as well as high knowledge learners. Both the elaborative retrieval hypothesis and the episodic context account would suggest that students will benefit from retrieval practice. The elaborative retrieval hypothesis further suggests all students will benefit more from intentional semantic cue creation (concept mapping) during retrieval than incidental semantic cue creation (such as might occur in multiple-choice testing), and that higher (relative to less) prior knowledge will allow students to benefit from both intentional and incidental cue creation.

Method

Participants

Participants were 53 of the 61 students enrolled in two introductory psychology classes at a small liberal arts college in the Midwest (M_age = 18.136, SD_age = 0.525; female n = 19, male n = 32, not reported = 2; 46 first years, seven sophomores). Four students did not sign the consent form and four others did not complete the course requirements. The two classes were taught by the same instructor, used the same materials (e.g., lectures, retrieval activities, exams, and Myers’ Psychology (2013) textbook), and followed the same class format (two sessions a week for 110 minutes each session). However, because class dynamics can be quite different between otherwise identical classes, class was included as a factor in the research design and analyses.

Design

A 2 (class: class 1, class 2) × 2 (prior psychology knowledge: low, high) × 3 (type of retrieval practice: multiple-choice, concept mapping, no retrieval) mixed-measures design was used to examine the role of class dynamic, prior psychology knowledge, and retrieval tool on exam performance. Similar to Thompson and Zamboanga (2004), prior knowledge was assessed with a 25-question multiple-choice pretest taken by all students. Scores ranged from 5 to 19. A median split was used to identify a low knowledge group (24 students who got 10 or fewer correct) and a high knowledge group (27 students who got 11 or more correct). Four topics were assigned to be practiced with concept maps by all students in both classes, four more were assigned to be practiced with quizzes by all students, and two chapters did not receive any practice. This was done in lieu of counterbalancing assignment of chapters to type of retrieval practice across classes to prevent contamination of the manipulation by across-class communication. All students received the same exam material.
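The median split described above can be sketched in a few lines of Python. This is only an illustration: the function name `median_split` and the pretest scores are hypothetical and are not the study's data. The one value taken from the text is the cutoff of 10 or fewer correct for the low knowledge group.

```python
from statistics import median

def median_split(scores, cutoff=None):
    """Label each pretest score 'low' or 'high'.

    Scores at or below the cutoff are 'low'; scores above it are 'high'.
    If no cutoff is given, the sample median is used (an assumption made
    for this sketch).
    """
    if cutoff is None:
        cutoff = median(scores)
    return ["low" if s <= cutoff else "high" for s in scores]

# Hypothetical pretest scores (out of 25); not the study's data.
pretest = [5, 8, 10, 10, 11, 12, 15, 19]

# Cutoff from the Design section: 10 or fewer correct = low knowledge.
groups = median_split(pretest, cutoff=10)
print(groups)
```

Passing the cutoff explicitly keeps the grouping rule visible, which matters when several students score exactly at the median.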

Materials

Prior Knowledge Assessment

The pretest consisted of 25 four-alternative multiple-choice questions, with three on research methods, four on biological psychology, one on developmental psychology, two on sensation and perception, two on learning, four on memory, four on thinking and language, two on personality and social psychology, and three on abnormal psychology. Questions on the pretest, the multiple-choice quizzes, and the four criterion exams described below were either directly taken or modified from questions in Brink’s (2013) Test Bank Volumes 1 and 2 for the corresponding chapter in the Myers (2013) textbook.

Retrieval Practice Manipulation

A quiz, a concept map, or no retrieval activity was the first task each class worked on during class and covered content from the previous class session. The instructor informed students in advance of the specific topic to be mapped or quizzed in the next class. To make sure students were using retrieval processes associated with the testing effect, no outside materials could be used while creating the concept map or taking the quiz.

Quizzes consisted of six four-alternative multiple-choice questions and covered the topics of research design and analysis, vision (sensation and perception), biases and heuristics, and personality theories. Students practiced four other topics with concept maps: Piaget’s Theory of Development, The Modal Model of Memory, Psychological Disorders, and Forms of Conditioning. Material from the biopsychology and social psychology chapters was not practiced with a specific tool in class. The pretest tapped material relevant to the three types of retrieval practice at approximately equal rates. Specifically, 10 pretest questions pertained to chapters containing the four topics that were quizzed, 10 pretest questions pertained to chapters containing the four topics that were concept mapped, and five pretest questions pertained to chapters containing the two topics that were not retrieval practiced.

In the class prior to completing their first recall-based concept map, the instructor completed a concept map with the students. Students were given a modified version of Cañas and Novak’s (2009) “Constructing your First Concept Map” instructions (www.ihmc.us). The instructor and the students then created a concept map on the front chalkboard about descriptive methods in psychology. In both classes, this map covered case studies, surveys, and naturalistic observations as descriptive methods. This concept map was done together with notes as a guide for students and was not considered a retrieval-practice activity in the data analyses of this study.

For each concept map, students were told the number of first-level concepts that would be required and their titles. For example, for the concept map on Piaget’s Theory of Development, students were given the names of the four stages of development. Students were told the concepts inside the circles could be facts, examples of the concept, or applications in their own life. The teacher emphasized the importance of elaborating on the concepts and that each link could be used later to help the student remember the information in a different context. Such elaboration was not emphasized during quiz completion.

The instructor monitored student progress during concept map and quiz completion and asked students to turn over their map or quiz when they were finished. No time limit was imposed for either activity and the time on task was not measured. The instructor went over the quizzes and concept maps immediately after collecting them. Previous research has shown that items that are incorrectly retrieved may reduce retention when assessments are not accompanied by feedback (Pashler, Cepeda, Wixted, & Rohrer, 2005), and explicit expert guidance can act as a way to direct attention to relevant material during problem solving (Salden, Koedinger, Renkl, Aleven, and McLaren, 2010). Therefore, a discussion of why certain answers were correct or incorrect occurred in the review of both concept maps and quizzes. The graded quizzes and concept maps were returned within a week so students could use them when preparing for their exams. Although it is possible students took more time to construct concept maps than to take quizzes, the instructor noted the amount of time taken to review and provide feedback on the retrieved material was approximately the same (between 5 and 10 minutes).

In an attempt to ensure equal levels of motivation to study for the two different retrieval activities, the instructor formally graded all quizzes and concept maps. Students were also told a minimum number of correct connections that would be needed to receive full credit on their concept maps. For example, to receive all six points for the concept map on Piaget’s Theory of Development, students needed to have at least 12 correct connections or links (excluding the four primary links). Students lost points for missing or incorrect connections. An example of an incorrect connection would be a student who wrote that a child gains abstract thought processes in the concrete operations stage instead of the formal operations stage.

Criterion Assessment

Over the semester, four non-cumulative exams consisting of 25 four-alternative multiple-choice questions and either five or six short-answer questions were given. Because of the possible subjective nature of grading short-answer questions, only multiple-choice items were used in the analysis.

Multiple-choice exam items consisted of three types: questions directly related to material practiced with quizzes and concept maps, questions from the same chapter as the topic covered on the quiz or concept map but not directly related to that topic, and questions not related (from different chapters) to material retrieved in class. None of the exam questions had been presented before in the courses. To avoid potential confusion about how to categorize items that were in the same chapter (e.g., development) but not practiced during retrieval (e.g., Piaget’s theory was practiced in a concept map whereas teratogens were not practiced in class), we chose not to include same-chapter but unpracticed items in the analysis and instead compared questions directly related to practiced concepts with questions on topics from different chapters. Table 1 includes an example quiz item and a corresponding exam item as well as a concept map with a corresponding exam item. Again, none of the exam items were the same as any question that had been seen before in the course.

Table 1.
Example Items.

Example item from quiz:
Myra has such low self-esteem that she is often on the lookout for critical moments about her appearance and personality. Myra’s behavior best illustrates the dangers of:
a. confirmation bias
b. trial and error
c. prototypes
d. base-rate fallacy

Corresponding example item from exam:
Because she already believes boys are naughtier than girls, Mrs. Zumpano, a second-grade teacher, watches boys more closely than she watches girls for any signs of misbehavior. Mrs. Zumpano’s surveillance strategy best illustrates:
a. conjunction fallacy
b. confirmation bias
c. sunk cost effect
d. hindsight bias

Example student-produced concept map: [image]

Corresponding example item from exam:
For no apparent reason, Marty had a sudden onset of anxiety with major autonomic nervous system arousal. This is characteristic of which of the following:
a. a panic attack
b. schizophrenia
c. phobias
d. depression

After removing items from the same chapters as the retrieved material, but not directly related to the topics covered during retrieval practice, there were 18 multiple-choice exam items that were directly related to concept-mapped material, 28 items that were directly related to quizzed material, and 16 items that were not retrieval practiced. Table 2 contains descriptive statistics.

Table 2.
Descriptive statistics

Measure                         Range of scores  Observed range  M (SD)        Average percent correct
Pretest                         0–25             5–19            10.94 (3.15)  43.76%
Concept mapped material         0–18             10–18           15.21 (2.23)  85.10%
Quizzed material                0–30             13–30           23.72 (3.93)  78.91%
No in-class retrieval practice  0–16             6–16            12.25 (3.93)  77.08%

Procedure

On the second class meeting of the semester, the professor described the study and asked if students were willing to let their grade information be used as part of the study. Then students completed the 25-question pretest. The first quiz was given to both classes during week 2. A practice concept map was completed as a class at the beginning of the third week of class and the first retrieval-practice concept map was created in class later that week. The first exam took place at the beginning of week 4. Students subsequently completed the other three non-cumulative exams at approximately 1-month intervals throughout the semester. Between each exam students completed one quiz and one concept map. This week-by-week order was followed in the second and fourth month. The concept map retrieval practice took place prior to the quiz in month 3. The order of the quiz and concept map between each of the four non-cumulative exams was driven by the teacher’s schedule of topics.

Results

A regression analysis found that overall pretest scores significantly predicted criterion assessment scores, β = 1.214, t(49) = 4.874, p < .001. Pretest scores also explained a significant proportion of variance in criterion assessment scores, F(1, 49) = 23.758, p < .001, R² = .327.
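In a regression with a single predictor, the F statistic for the model equals the square of the t statistic for the slope, so the two tests reported above are equivalent. The short check below uses only the reported values; the variable names are ours, and the small discrepancy reflects rounding of the reported t.

```python
# Reported statistics from the regression of exam scores on pretest scores.
t_slope = 4.874    # t(49) for the pretest slope
f_model = 23.758   # F(1, 49) for the overall model

# In simple regression (one predictor), F equals t squared.
f_from_t = t_slope ** 2
print(round(f_from_t, 3))  # 23.756

# The two values agree to within rounding of the reported t.
difference = abs(f_from_t - f_model)
print(difference < 0.01)   # True
```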

Because the number of criterion items related to each type of retrieval practiced material differed, proportions of items answered correctly were used for further analysis. A mixed design analysis of variance was conducted to examine the role of class dynamic (class 1 and class 2), prior knowledge (high and low), and retrieval practice (concept maps, multiple-choice quizzes, and no retrieval practice) on exam performance. Retrieval practice was entered as a within-subjects variable whereas both class and prior knowledge were entered as between-subjects variables.
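Converting raw scores to proportions puts the three conditions on a common scale before they are compared. The sketch below shows the arithmetic only. The function name, the item counts, and the example student are illustrative assumptions; the actual counts should be supplied from the exam.

```python
# Number of exam items per condition (illustrative; supply the actual counts).
ITEMS = {"concept_map": 18, "quiz": 28, "no_retrieval": 16}

def proportion_correct(raw_scores, item_counts=ITEMS):
    """Convert raw correct counts per condition to proportions in [0, 1]."""
    return {cond: raw_scores[cond] / item_counts[cond] for cond in item_counts}

# Hypothetical student: number of items answered correctly in each condition.
student = {"concept_map": 15, "quiz": 21, "no_retrieval": 12}

props = proportion_correct(student)
print({cond: round(p, 3) for cond, p in props.items()})
```

Here a raw score of 21 on the quizzed items and 12 on the unpracticed items both correspond to 75% correct, which is why raw counts alone would be misleading.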

Retrieval Practice and Class

There was no difference in criterion assessment scores between classes, F(1, 47) = 1.222, p = .275. The results revealed a main effect for retrieval practice, F(2, 94) = 9.186, p < .001. Tukey honestly significant difference (HSD) tests showed that concept mapping was associated with the highest exam performance (M = 85.10%), which was significantly greater (critical difference for p < .01 = 5.7%, for p < .05 = 4.5%) than the performance associated with either quizzing (M = 78.9%) or no retrieval practice (M = 77.08%). In comparison, Tukey HSD tests revealed no difference in performance between the quizzing and no retrieval practice conditions. The interaction between retrieval practice and class was not significant, F(2, 94) = 2.007, p = .140.
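The pairwise comparisons can be followed by checking each difference between condition means against the reported critical differences. The sketch below uses only the means and critical values given in the text. The function and variable names are ours, and the code does not recompute the Tukey HSD critical values themselves.

```python
from itertools import combinations

# Condition means (percent correct) as reported in the text.
means = {"concept map": 85.10, "quiz": 78.9, "no retrieval": 77.08}

# Critical differences reported for the Tukey HSD tests.
CRIT_05 = 4.5  # p < .05
CRIT_01 = 5.7  # p < .01

def compare(means, crit_05=CRIT_05, crit_01=CRIT_01):
    """Return {(a, b): (difference, label)} for every pair of conditions."""
    results = {}
    for a, b in combinations(means, 2):
        diff = abs(means[a] - means[b])
        if diff > crit_01:
            label = "p < .01"
        elif diff > crit_05:
            label = "p < .05"
        else:
            label = "n.s."
        results[(a, b)] = (round(diff, 2), label)
    return results

results = compare(means)
for pair, (diff, label) in results.items():
    print(pair, diff, label)
```

Concept mapping exceeds quizzing by 6.2 points and no retrieval practice by 8.02 points, both beyond the 5.7-point criterion, whereas quizzing and no retrieval practice differ by only 1.82 points.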

Retrieval Practice and Prior Knowledge

The high prior knowledge group scored higher overall (M = 83.6%) than the low prior knowledge group (M = 75.6%), F(1, 47) = 12.045, p = .001. Importantly, the interaction between retrieval practice and prior knowledge was significant, F(2, 94) = 5.120, p = .008. The means pertaining to the interaction are depicted in Figure 1. Post hoc analyses found no difference between the scores for students with high and low prior knowledge on the non-retrieved material (p = .371), but scores did differ for multiple-choice quizzed material, t(49) = 4.202, p < .001, d = 1.247, and concept-mapped material, t(49) = 3.251, p = .003, d = 1.645. For students with low prior psychology knowledge, performance on exam material practiced with concept mapping was greater than performance on the quizzed material, t(23) = 4.547, p < .001, d = .948, and the non-retrieved material, t(23) = 2.080, p = .049, d = .426. Performance on the quizzed and non-retrieved material did not differ significantly (p = .167). For students with high prior psychology knowledge, performance on exam material practiced with concept mapping was greater than performance on either the quizzed material, t(26) = 2.144, p = .042, d = .414, or the non-retrieved material, t(26) = 3.177, p = .004, d = .623, and performance on the quizzed material was greater than performance on the non-retrieved material, t(26) = 2.354, p = .026, d = .437. The three-way interaction between retrieval practice, prior knowledge, and class was not significant, F(2, 94) = .604, p = .549.

Figure 1.
The effect of prior knowledge and retrieval tool on exam performance. For students with low prior psychology knowledge, only concept mapping increased exam performance. For students with high prior psychology knowledge, both concept mapping and multiple-choice quizzing increased exam performance. Bars represent standard error.

Discussion

The current study focused on the role of prior knowledge and retrieval practice tool (multiple-choice quizzing, concept mapping, or no retrieval practice) in subsequent exam performance across two undergraduate psychology courses. Immediate feedback on quiz or concept map accuracy was provided as part of retrieval practice, and exam items were closely related to (but not exactly the same as) the content targeted by the retrieval tools (e.g., different questions about the same concepts, as illustrated in Table 1). We found no evidence of any differences between classes in testing effect outcomes. The remaining discussion will therefore focus on the key components of prior knowledge and retrieval practice tool.

Overall, data suggest that as pretest scores increased, so did scores on the criterion exams. This supports Thompson and Zamboanga’s (2004) findings that a 25-item multiple-choice pretest predicted course achievement in an introductory psychology course. However, in our results this relationship was moderated by the type of retrieval tool used to enhance performance in class.

Consistent with the elaborative retrieval hypothesis, results suggest students benefited more from intentional elaborative rehearsal in the form of concept mapping than they did from multiple-choice quizzing. Importantly, this main effect was moderated by prior psychology knowledge. For students with more prior psychology knowledge, both concept mapping and multiple-choice quizzing enhanced performance beyond no in-class retrieval at all, with the impact of concept mapping being the greatest. These results go beyond previous classroom studies in which multiple-choice quiz items have helped subsequent exam performance (Batsell et al., 2017; Carpenter et al., 2009; McDaniel et al., 2013; McDaniel et al., 2011; McDaniel, Wildman, & Anderson, 2012; McDermott et al., 2014) by showing that although quizzing helped, concept mapping helped more. However, for students with less prior psychology knowledge, only concept mapping was associated with enhanced performance above and beyond no in-class retrieval practice.

Prior Knowledge and Cue Creation During Retrieval Practice

The testing effect has been criticized as simply encouraging teaching to the test when the items on a retrieval task (such as a quiz) are identical to the items on the final task (e.g., Ambrose, Bridges, DiPietro, Lovett, & Norman, 2010; Willingham, 2009). A true testing effect, where one is practicing the skill of retrieval noted in the testing-effect/retrieval-enhanced learning literature, can only be illustrated when the retrieval abilities that are practiced during the retrieval task generalize to problems on exams (or in real life) that were not present on the task used in the initial retrieval practice. Indeed, instructors commonly include exam questions that cover concepts similar to those found on the quizzes but that are discussed or applied differently than on the quiz (Wooldridge et al., 2014). These questions require transfer of learning to a novel situation. However, to date, laboratory assessments that take into account prior knowledge have only involved recall of exact items (Lehman, Smith, & Karpicke, 2014; Xiaofeng et al., 2016). Because the items utilized to examine the testing effect were identical in these studies, it is unclear whether students benefitted from repeatedly seeing the same items or whether learning was indeed driven by connections built during the retrieval practice. When Cogliano and colleagues (2019) compared performance on exam items that were identical to those practiced during retrieval to performance on never-seen but related exam items, they found that when students had low knowledge of topics, they did much better on identical items compared to related items. However, the difference was negligible when students had greater prior knowledge of the topics. Thus, they suggested that “retrieval practice can generalize to related, non-tested items, but only for content which is familiar” (p. 26).

That multiple-choice quizzing benefitted students with more prior psychology knowledge even though quiz and exam items were only topically related in our study extends previous research by suggesting that multiple-choice quizzing can be useful as an in-class retrieval tool, but only if students have enough prior knowledge of the general topic to build upon.

A possible alternative explanation for our findings is that students with higher pretest scores are simply better test takers and this is why they did well on the pretest and benefitted from the quizzes. However, even students with higher levels of prior knowledge benefitted more from the concept mapping exercise than the quizzing. It is also possible the students with more prior knowledge did not have as much to learn, which is why both forms of retrieval practice benefitted them. However, if this were true, then one would expect that higher levels of prior knowledge would also lead to higher exam performance on the non-retrieved material, which was not the case. Thus, the nature of the significant interaction lends support to the notion that elaboration and connection-building, or cue creation, play an important part in the testing effect during retrieval practice. Further, when explicit intentional cue creation occurred during concept map retrieval practice, students seemed to benefit from the testing effect regardless of prior psychology knowledge level.

Evaluating Possible Mechanisms of the Testing Effect

In the current study, we are unable to directly ascertain the underlying mechanism of how the use of concept mapping and quizzing relates to the testing effect. However, there are several possibilities that can be evaluated using the data from our study. First, it is possible that rather than the in-class retrieval activity, it was the ability to use the concept maps and quizzes to study for exams that led to the testing effect. If this were the case, though, one would have expected the students with low prior knowledge to also benefit from quizzing and not only from concept mapping. It is also possible that students spent more time studying for and creating the concept maps than the quizzes and that is why students benefitted from the concept maps more. If so, this would be a valuable use of study and instructional time rather than a problem, given the positive results of concept mapping that held regardless of amount of preexisting knowledge. However, this does not explain why higher prior knowledge students benefitted from quizzing whereas the lower prior knowledge students did not. Thus, it seems unlikely that concept mapping benefitted students solely because of the amount of time spent studying and creating concept maps.

Another possible explanation could be that students are exposed to more concepts when creating a concept map than when answering multiple-choice items. Whereas on the surface multiple-choice items appear to cover only one concept (such as panic attack in our example multiple-choice question), students actually need to be familiar with at least four concepts presented in the response alternatives (panic attack, schizophrenia, phobia, and depression) and at least two concepts within the actual multiple-choice question (anxiety and autonomic nervous system arousal). Each multiple-choice question therefore represents perhaps five to six or more concepts. This means that across a six-item quiz, students will be exposed to at least 30 to 36 concepts. It is therefore unlikely that students completing concept maps outperform students completing a multiple-choice quiz simply because of the number of concepts each group is exposed to. Future studies investigating more closely the number of concepts represented in a concept map versus a quiz could further clarify this possibility.
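The concept-exposure estimate above is simple multiplication, and a short sketch makes the arithmetic explicit. The function name and its parameters are hypothetical; the inputs (six quiz items, five to six concepts per item) are the figures given in the paragraph, and the result is a rough lower-bound estimate rather than a measured count.

```python
def concept_exposure(n_items, concepts_per_item):
    """Estimate the range of concepts encountered across a quiz.

    n_items: number of multiple-choice items on the quiz.
    concepts_per_item: (low, high) estimate of concepts touched by one item,
        counting the question stem and the response alternatives.
    Returns a (low_total, high_total) tuple.
    """
    low, high = concepts_per_item
    return n_items * low, n_items * high


if __name__ == "__main__":
    # Figures from the text: a six-item quiz, five to six concepts per item.
    low_total, high_total = concept_exposure(6, (5, 6))
    print(f"Estimated concepts across the quiz: {low_total} to {high_total}")
```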

Yet another possible explanation could be that topics covered by the concept maps were less complex or were easier topics overall compared to the multiple-choice quizzes. This would explain why all students did better on material practiced with concept maps, but would not explain why high knowledge learners had higher exam scores on material practiced with multiple-choice retrieval than non-retrieved material whereas the low knowledge learners did not.

Finally, it is possible that benefits of retrieval practice may be driven by repeated exposure to the concepts in different contexts, as is suggested by the episodic context account of the testing effect (Karpicke, Lehman, & Aue, 2014). However, as in Thompson and Zamboanga’s (2004) work, prior psychology knowledge was operationalized with a 25-item multiple-choice pretest rather than a measure of how much previous exposure students may have had to the material. Thompson and Zamboanga (2004) found it was not experience with the material, but knowledge of the material that predicted academic performance. Because no measure of experience or exposure to the content was assessed, we can only determine the role of measured previous psychology knowledge in the testing effect and not the role of exposure to material. Future research studies are needed to further disentangle the roles of topic difficulty, prior psychology knowledge, exposure to material, and retrieval activity in the testing effect.

Consistent with the elaborative retrieval hypothesis (Carpenter, 2009; 2011), it is possible that the testing effect occurred because students were able to draw and write out the concepts and connections on paper rather than having to hold them in working memory, which would be required when processing and answering multiple-choice quiz questions. That is, making the conceptual organization visually explicit may be what enhances learning and exam performance beyond traditional multiple-choice quizzing.

For multiple-choice quizzing, the lack of explicit connection making and the integration of knowledge that results from it may also be why it is not beneficial for some learners. In their review of the testing effect, Nguyen and McDaniel (2015) write that it is possible “that quizzing may strengthen memory for some information at the expense of related information” (p. 89). Peterson and Mulligan (2013) suggest a negative testing effect can occur when individuals attend to item-specific processing (information that differentiates items from one another) instead of relational processing (associations among a set of items) (also see Mulligan & Peterson, 2015). If multiple-choice quizzing does tend to fragment knowledge representation through item-specific processing, then its impact on exam performance might well be unhelpful, especially if exam items are not identical or very similar to those on the quizzes. Future research is needed to understand why explicit visually expressed organization during recall enhanced performance for learners with low prior knowledge, whereas the potential incidental connection building, episodic context practice, or other kinds of processing that are done during multiple-choice quizzing did not.

Conclusions

Given the ill-defined and mercurial nature of a classroom environment, it is important to consider the numerous factors that can impact student performance. In this study, we specifically examined how in-class retrieval activities and prior psychology knowledge played a role in exam performance. An in-class retrieval activity, concept mapping, that enabled and encouraged students to create effective connections that integrated and consolidated new knowledge enhanced subsequent performance. In particular, concept mapping helped students with lower prior psychology knowledge overcome the obstacles that stand in the way of mastering material. Thus, in comparison to creating and using multiple-choice quiz questions, a useful in-class activity might be to have students, especially those with less prior psychology knowledge, practice retrieving material through free recall and connection building activities such as a concept map.

Table 1.
Example item from quiz	Corresponding example item from exam
Myra has such low self-esteem that she is often on the lookout for critical comments about her appearance and personality. Myra’s behavior best illustrates the dangers of: a. confirmation bias; b. trial and error; c. prototypes; d. base-rate fallacy	Because she already believes boys are naughtier than girls, Mrs. Zumpano, a second-grade teacher, watches boys more closely than she watches girls for any signs of misbehavior. Mrs. Zumpano’s surveillance strategy best illustrates: a. conjunction fallacy; b. confirmation bias; c. sunk cost effect; d. hindsight bias
Example student-produced concept map	Corresponding example item from exam
[Student-produced concept map; image not available]	For no apparent reason, Marty had a sudden onset of anxiety with major autonomic nervous system arousal. This is characteristic of which of the following: a. a panic attack; b. schizophrenia; c. phobias; d. depression

Measure	Range of scores	Observed range	M (SD)	Average percent correct
Pretest	0–25	5–19	10.94 (3.15)	43.76%
Concept mapped material	0–18	10–18	15.21 (2.23)	85.10%
Quizzed material	0–30	13–30	23.72 (3.93)	78.91%
No in-class retrieval practice	0–16	6–16	12.25 (3.93)	77.08%

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Andrea P. Francis

Author Biographies

Andrea P. Francis is an Assistant Professor of Psychological Science at Albion College, where she has taught since 2010. As an educational psychologist, her teaching interests include educational psychology, child and adolescent development, research methods, and introductory psychology. Andrea’s research, which has been published in both educational and psychological journals, focuses on how individual differences in social experience influence the cognitive processes involved in learner creativity and criterion performance.

Mareike B. Wieth is a Professor of Psychological Science at Albion College, where she has taught since 2005. She is trained as a cognitive psychologist and regularly teaches research in cognitive psychology, introductory psychology, sensation and perception, and a course on drugs, brain, and behavior. Mareike’s research focuses on the impact of various individual differences on higher-order cognitive processes such as creativity, problem solving, and decision making. She has been an expert contributor on NPR and the BBC and her research on creativity has been featured in a variety of media and news outlets.

Kevin L. Zabel is an Assistant Professor in the Department of Psychology at the University of Wisconsin La Crosse. From 2015–2019, he served as an Assistant Professor in the Department of Psychology at Western New England University. A social psychologist, his teaching interests include social cognition, prejudice and stigma, diversity training, and research methods. His research focuses on examining factors that allow individuals to overcome the influence of their automatically activated attitudes on judgments and behaviors, as well as factors that orient individuals to multicultural messages and communication.

Thomas H. Carr is Professor Emeritus of Cognition and Cognitive Neuroscience in the Department of Psychology at Michigan State University, a member of the Board of Visitors of the Learning Research and Development Center at the University of Pittsburgh, and a member of the Advisory Panel of the James S. McDonnell Foundation’s Understanding Human Cognition Program. From 2005–2007 he served as the Frank W. Mayborn Chair of Cognitive Studies at Peabody College of Vanderbilt University and has held other visiting positions at Lake Forest College, the Sackler Institute for Developmental Psychobiology, the CNRS Laboratory for Cognitive Neuroscience, the University of Oregon, and IBM Watson Research Center. His teaching interests include human cognition, cognitive neuroscience, cognitive development, and psycholinguistics. Currently he teaches a freshman seminar called The Science of Learning: Studying, Learning, and Performing Under Pressure. A former Editor of Perception & Psychophysics and the Journal of Experimental Psychology: Human Perception and Performance, his research addresses learning and deployment of knowledge and skills, performance under pressure, and the neural substrates of these cognitive and motivational processes.

References

Agarwal, P. K., Bain, P. M., & Chamberlain, R. W. (2012). The value of applied research: Retrieval practice improves classroom learning and recommendations from a teacher, a principal, and a scientist. Educational Psychology Review, 24, 437–448.

Ambrose, S. A., Bridges, M. W., DiPietro, Lovett, M. C., & Norman, M. K. (2010). How learning works: Seven research-based principles for smart teaching. Jossey-Bass.

Batsell, W. R., Perry, J. L., Hanley, & Hostetter, A. B. (2017). Ecological validity of the testing effect: The use of daily quizzes in introductory psychology. Teaching of Psychology, 44, 18–23.

Blunt, J. R., & Karpicke, J. D. (2014). Learning with retrieval-based concept mapping. Journal of Educational Psychology, 106, 849–858.

Brink (2013). Test bank volume 1 and 2 for David G. Myers Psychology tenth edition. Worth Publishers.

Cañas, A. J., & Novak, J. D. (2009). Constructing your first concept map. Retrieved from: http://cmap.ihmc.us/docs/constructingaconceptmap.php

Carpenter, S. K. (2009). Cue strength as a moderator of the testing effect: The benefits of elaborative retrieval. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35, 1563–1569.

Carpenter, S. K. (2011). Semantic information activated during retrieval contributes to later retention: Support for the mediator effectiveness hypothesis of the testing effect. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37, 1547–1552.

Carpenter, S. K., & DeLosh, E. L. (2006). Impoverished cue support enhances subsequent retention: Support for the elaborative retrieval explanation of the testing effect. Memory & Cognition, 34, 268–276.

Carpenter, S. K., Pashler, & Cepeda, N. J. (2009). Using tests to enhance 8th grade students’ retention of U. S. history facts. Applied Cognitive Psychology, 23, 760–771.

Carpenter, S. K., Pashler, & Vul (2006). What types of learning are enhanced by a cued recall test? Psychonomic Bulletin & Review, 13, 826–830.

Carpenter, S. K., & Yeung, K. L. (2017). The role of mediator strength in learning from retrieval. Journal of Memory and Language, 92, 128–141.

Carroll, Campbell-Ratcliffe, Murnane, & Perfect (2007). Retrieval-induced forgetting in educational contexts: Monitoring, expertise, text integration, and test format. European Journal of Cognitive Psychology, 19(4–5), 580–606. https://doi.org/10.1080/09541440701326071

Cogliano, M. C., Kardash, C. A. M., & Bernacki, M. L. (2019). The effects of retrieval practice and prior topic knowledge on test performance and confidence judgments. Contemporary Educational Psychology, 56, 117–129. https://doi.org/10.1016/j.cedpsych.2018.12.001

Glover, J. A. (1989). The “testing” phenomenon: Not gone but nearly forgotten. Journal of Educational Psychology, 81, 392–399.

Karpicke, J. D., & Blunt (2011). Retrieval practice produces more learning than elaborative studying with concept mapping. Science, 331, 772–775.

Karpicke, J. D., Lehman, & Aue, W. R. (2014). Retrieval-based learning: An episodic context account. In B. H. Ross (Ed.), The psychology of learning and motivation (Vol. 61, pp. 237–284). Elsevier Academic Press.

Kornell, Hays, M. J., & Bjork, R. A. (2009). Unsuccessful retrieval attempts enhance subsequent learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35, 989–998.

Lehman, Smith, M. A., & Karpicke, J. D. (2014). Toward an episodic context account of retrieval-based learning: Dissociating retrieval practice and elaboration. Journal of Experimental Psychology: Learning, Memory, and Cognition, 40(6), 1787–1794. https://doi.org/10.1037/xlm0000012

McDaniel, M. A., Agarwal, P. K., Huelser, B. J., McDermott, K. B., & Roediger, H. L. (2011). Test-enhanced learning in a middle school science classroom: The effects of quiz frequency and placement. Journal of Educational Psychology, 103(2), 399–414.

McDaniel, M. A., Anderson, J. L., Derbish, M. H., & Morrisette (2007). Testing the testing effect in the classroom. European Journal of Cognitive Psychology, 19, 494–513.

McDaniel, M. A., Roediger, H. L., & McDermott, K. B. (2007). Generalizing test-enhanced learning from the laboratory to the classroom. Psychonomic Bulletin & Review, 14, 200–206.

McDaniel, M. A., Thomas, R. C., Agarwal, P. K., McDermott, K. B., & Roediger, H. L., III. (2013). Quizzing in middle-school science: Successful transfer performance on classroom exams. Applied Cognitive Psychology, 27, 360–372.

McDaniel, M. A., Wildman, K. M., & Anderson, J. L. (2012). Using quizzes to enhance summative-assessment performance in a web-based class: An experimental study. Journal of Applied Research in Memory and Cognition, 1, 18–26.

McDermott, K. B., Agarwal, P. K., D’Antonio, Roediger, H. L., & McDaniel, M. A. (2014). Both multiple-choice and short-answer quizzes enhance later exam performance in middle and high school classes. Journal of Experimental Psychology: Applied, 20, 3–21.

Mulligan, N. W., & Peterson, D. J. (2015). Negative and positive testing effects in terms of item-specific and relational information. Journal of Experimental Psychology: Learning, Memory, and Cognition, 41, 859–871.

Myers, D. G. (2013). Psychology. Worth Publishers.

Nguyen & McDaniel, M. A. (2015). Using quizzing to enhance student learning in the classroom: The good, the bad, and the ugly. Teaching of Psychology, 42, 87–92.

Pashler, Cepeda, N. J., Wixted, J. T., & Rohrer (2005). When does feedback facilitate learning of words? Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 3–8.

Peterson, D. J., & Mulligan, N. W. (2013). The negative testing effect and multifactor account. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39, 1287–1293.

Pyc, M. A., & Rawson, K. A. (2009). Testing the retrieval effort hypothesis: Does greater difficulty correctly recalling information lead to higher levels of memory? Journal of Memory and Language, 60, 437–447. https://doi.org/10.1016/j.jml.2009.01.004

Pyc, M. A., & Rawson, K. A. (2010). Why testing improves memory: Mediator effectiveness hypothesis. Science, 330, 335.

Salden, R. J. C. M., Koedinger, K. R., Renkl, Aleven, & McLaren, B. M. (2010). Accounting for beneficial effects of worked examples in tutored problem solving. Educational Psychology Review, 22(4), 379–392.

Schwieren, Barenberg, & Dutke (2017). The testing effect in the psychology classroom: A meta-analytic perspective. Psychology Learning & Teaching, 16, 179–196.

Thompson, R. A., & Zamboanga, B. L. (2004). Academic aptitude and prior knowledge as predictors of student achievement in introduction to psychology. Journal of Educational Psychology, 96(4), 778–784.

Willingham, D. T. (2009). Why don’t students like school? A cognitive scientist answers questions about how the mind works and what it means for the classroom. Jossey-Bass.

Woloshyn, V. E., Paivio, & Pressley (1994). Use of elaborative interrogation to help students acquire information consistent with prior knowledge and information inconsistent with prior knowledge. Journal of Educational Psychology, 86(1), 79–89.

Wooldridge, C. L., Bugg, J. M., McDaniel, M. A., & Liu (2014). The testing effect with authentic educational materials: A cautionary note. Journal of Applied Research in Memory and Cognition, 3, 214–221.

Xiaofeng, Xiao-e, Yanru, & AiBao (2016). Prior knowledge level dissociates effects of retrieval practice and elaboration. Learning and Individual Differences, 51, 210–214. https://doi.org/10.1016/j.lindif.2016.09.012

A Classroom Study on the Role of Prior Knowledge and Retrieval Tool in the Testing Effect
