Sage Journals: Discover world-class research

Abstract

Sleep right after studying new material is more conducive to memory than a period of wakefulness. Another way to counteract forgetting is to practice retrieval: taking a test strengthens memory more effectively than restudying the material. The current work investigates the interaction between sleep and testing by asking whether testing adds to, neutralizes, or decreases the effect of sleep on memory. We tested this in one pilot and one experiment by manipulating the timing of the practice test as well as whether practice was followed by sleep or wakefulness when learning foreign language vocabulary. Taking a delayed practice test significantly reduced forgetting for both the sleep and the wakefulness group. An immediate practice test, in contrast, had no such effect; here we found the standard beneficial sleep effect. However, the immediate practice test led to higher recall in the final test in comparison to a delayed practice test, but only for the sleep group. These findings suggest two practical recommendations: first, if students study in the evening, they should test themselves immediately after learning. Second, if students study during the day, the practice test should be delayed in order to reinforce memory and reduce forgetting of the material.

Keywords

Sleep, retrieval practice, forgetting, memory consolidation

Introduction

Sleep that follows directly upon the encoding of new information reduces time-dependent forgetting, as is demonstrated in studies on the beneficial effect of sleep on memory. Jenkins and Dallenbach (1924), for example, first demonstrated superior memory for nonsense syllables following an interval containing sleep compared to an equally long interval of wakefulness. The beneficial effect of sleep on declarative memory has been observed for different study materials such as nonsense syllables (Benson & Feinberg, 1975; Jenkins & Dallenbach, 1924), associative word pairs (Plihal & Born, 1997), vocabulary (Gais et al., 2006), and word lists (Ficca et al., 2000). Overall, several studies have shown a beneficial effect of sleep over a wake period on memory, leaving little doubt that sleep is a potent way to reduce forgetting of newly acquired material (e.g. Barrett & Ekstrand, 1972; Diekelmann et al., 2009).

The exact mechanisms underlying the benefit of sleep have not yet been fully uncovered, but different theoretical candidates have been proposed. Earlier theories assumed that sleep is a passive state, protecting learned material against retroactive interference (Jenkins & Dallenbach, 1924; for a discussion see Ellenbogen et al., 2006). However, newer research emphasizes active memory consolidation as an explanation for the benefit of sleep (for a review, see Conte & Ficca, 2013). Memory consolidation is a process whereby previously formed labile memory traces are transformed and integrated into a network of pre-existing long-term memories. Consolidation during sleep is thought to diminish forgetting of the material, leading to better performance at the delayed test after sleep compared to wakefulness (Diekelmann et al., 2009).

Another powerful strategy to reduce time-dependent forgetting is retrieval practice. Retrieving newly acquired information instead of simply restudying it has been shown to be a successful way to reduce forgetting (also called the testing effect; e.g. Roediger & Karpicke, 2006; Roediger & Butler, 2011), even without feedback in the testing situation (Karpicke & Roediger, 2010). Furthermore, the testing effect could be demonstrated in teaching and learning psychology (Schwieren et al., 2017), showing that the effect is not restricted to laboratory studies. The strength of this effect depends on different factors such as the difficulty of the retrieval task (Carpenter & DeLosh, 2006), the retention interval between retrieval practice and final test (Roediger & Karpicke, 2006), the number of practice tests (Karpicke & Roediger, 2010), and the timing of the practice test (Karpicke & Roediger, 2007; Karpicke & Roediger, 2010). Karpicke and Roediger (2007), for instance, showed that delaying the practice test was more conducive to long-term retention of the learned material than an immediate practice test. Testing an item immediately after learning led subjects to recall this item while it was still in their immediate awareness, which is similar to a massed learning situation (Karpicke & Roediger, 2007; Karpicke & Roediger, 2010). However, the influence of the timing of the practice test seems to be more complicated. In contrast to Karpicke and Roediger (2007), Karpicke and Roediger (2010) reported positive and negative effects of delayed testing. In Experiment 1, they found that taking an immediate first test improved long-term retention, whereas in Experiment 2 no such effect could be detected. However, it has to be noted that the materials used differed between Karpicke and Roediger (2007) and Karpicke and Roediger (2010). Whereas Karpicke and Roediger (2007) used vocabulary word pairs, Karpicke and Roediger (2010) used brief text passages. Furthermore, the delayed cued recall test used by Karpicke and Roediger (2007) occurred only five trials after a word pair had been studied, whereas the first delayed test used by Karpicke and Roediger (2010) occurred after approximately eight minutes.

The current set of experiments examines the interaction between sleep and retrieval practice more closely. A previous study by Bäuml et al. (2014) found that sleep that followed directly upon encoding affected retrieved and restudied word material differently: whereas the wakefulness group benefited from testing by showing better memory performance for previously retrieved versus restudied items, the testing effect was reduced or even eliminated in the sleep group. Similar to other studies, the sleep and the wakefulness group were tested 12 hours after the initial learning phase. Interestingly, the reason for this reduced testing effect was that sleep benefited recall of restudied items but left recall of retrieved items unaffected. When comparing memory for items between the sleep and the wakefulness group, the sleep group showed improved performance for restudied items, but no enhanced memory for retrieved items. It seems that beneficial effects from retrieval practice reduce the beneficial effect of sleep. Moreover, these results were replicated using spatial memory by Antony and Paller (2018). However, a recent study by Abel et al. (2019) showed benefits of sleep on recall after retrieval practice, but only if it was combined with feedback. How can these results be explained? As the authors discussed, it may be that retrieving information strengthens fragile memory traces, so that sleep cannot improve recall any further. When feedback is given, previously non-retrieved items can either be forgotten again in the time between learning and final test or benefit from the positive effect of sleep on recall.

In contrast to Bäuml et al. (2014), we were not interested in evaluating the differential effect of restudied versus retrieved items. As studies have repeatedly shown the superiority of testing as a learning event (e.g. Roediger & Butler, 2011), we instead aimed at manipulating the timing of the practice test to determine its optimal placement in a sleep-wake-paradigm. There are several studies showing that students often prefer massed (= immediate learning of all the material) instead of spaced learning (e.g. Taraban et al., 1999). Given the amount of accumulating evidence and the general recommendation from learning scientists (Dunlosky et al., 2013; Roediger & Pyc, 2012) that students should take tests to boost their memory, how does this recommendation systematically interact with other factors that are part of their daily lives such as sleep, which has also been shown to reduce forgetting? Does the timing of a practice test make a difference, and if so, when exactly should it optimally be taken to reduce forgetting in the sleep-wakefulness paradigm?

To evaluate this research question, we drew on a design often used in research on sleep-associated memory consolidation: in one condition, participants’ memory for the material was tested immediately after acquisition (which was considered a learning trial) and assessed again after a 12-hour retention interval filled with wakefulness or sleep. Similar to other studies (Gais et al., 2006, 2007), we were primarily interested in forgetting between the first test after studying and the final test following the retention interval. Our experiments added a delayed retrieval practice condition that tested newly acquired material at a delay of two hours¹ after acquisition to evaluate how this affects forgetting after sleep versus wakefulness.

This has practical implications: students would benefit from knowing the optimal time for a practice test during the day and before going to bed to improve their study outcome. Psychology students, for example, have to learn a lot of facts, like the names of brain structures, neurotransmitters, and hormones, but also a lot of different theories and corresponding experiments. Knowing the optimal schedule for learning could be an effective and time-saving strategy. Optimal learning schedules not only improve memory; they can also improve metacognitive monitoring and, therefore, the regulation of study behaviour. Students who accessed course material continuously through the semester showed better metacognitive learning outcomes and a higher accuracy of confidence judgements (Barenberg et al., 2018).

We think that two hypotheses can be tested: first, based on Karpicke and Roediger (2007), it is possible that delayed testing helps to strengthen fragile memory traces, resulting in less forgetting. This may benefit the wakefulness group in particular, and the sleep group less so, because sleep already strengthens fragile memory traces. Consequently, we would not expect a difference in forgetting between the wakefulness and sleep group when delayed testing is in place. On the other hand, the sleep benefit over wakefulness would occur in the immediate testing condition because no additional boost for the wakefulness group occurs here. We call this the wakefulness aid hypothesis.

Second, it may be the case that, in addition to the positive effect of delayed testing on memory, sleep still has an additional effect on memory. In this case we would expect main effects of the timing of the practice test and of sleep versus wakefulness (with less forgetting occurring after sleep than after a period of wakefulness) but no interaction.

Overview of the Experiments

We conducted a pilot experiment in the laboratory and a main experiment on the Internet. In both experiments, we manipulated the timing of the practice test (immediate vs. delayed) as well as whether practice was followed by an interval including or excluding sleep. During the initial learning session, participants studied foreign vocabulary and took a practice test without feedback immediately or two hours later. After a 12-hour retention interval, we assessed memory performance of the vocabulary and calculated forgetting during the 12-hour interval between practice and final test performances (see Figure 1 for an overview).

Figure 1.

Schematic presentation of all conditions in the pilot and the main experiment: In the 12-h wake condition (groups I and II), the learning of the material took place at 9 a.m. In the 12-h sleep condition (groups III and IV), the learning of the material took place at 9 p.m. For the immediate testing groups, the practice test followed immediately after the two learning rounds; in the delayed testing groups, the practice test started at 11 a.m. or 11 p.m. (depending on sleep or wakefulness condition). (LL) two learning cycles, (PT) practice test, (FT) final test.

Pilot Experiment

To obtain initial data for our research question and to test whether the strategic use of a two-hour delay between learning and practice test in the delayed test groups was appropriate, we conducted a pilot experiment in the laboratory.

Method

Participants

Thirty-eight German-speaking Psychology undergraduates at the University of Mannheim participated in this laboratory experiment. Two participants had to be excluded due to an experimenter error which had these participants complete both the immediate and the delayed test. The remaining 36 participants (26 female, M_age = 21.61 years, SD_age = 2.80 years, age range = 19 to 36 years) took part in the experiment in exchange for course credits. Participants were randomly assigned to four experimental conditions and were fairly evenly distributed across experimental conditions (N_{sleep;immediate} = 8; N_{sleep;delayed} = 10; N_{wake;immediate} = 9; N_{wake;delayed} = 9).

Design

We manipulated the delay of the practice test and sleep versus wakefulness in a 2 (condition: sleep versus wakefulness) × 2 (delay of practice test: immediate versus delayed) between-subjects design.

Materials and Procedure²

Session 1. Depending on the experimental condition, participants started the experiment either at 9 a.m. (wakefulness group) or at 9 p.m. (sleep group). Participants were tested in groups of one to ten in a lab in Mannheim. All lab sessions were programmed using E-Prime 2.0 software (Psychology Software Tools, Pittsburgh, PA); the online session was programmed using PHP. Participants were instructed that the experiment consisted of different parts and were informed about the timing of the later tests. After signing consent forms, they started the experiment at the computer. They were asked to study 40 Polish-German vocabulary pairs (Undorf & Erdfelder, 2013) in order to remember the German word in a later cued recall task. Participants completed a total of two study cycles: each study cycle contained a randomized presentation of all 40 vocabulary pairs at a rate of 6 seconds per pair, followed by a short 2-min arithmetic distractor task. During the distractor task, simple math equations with solutions were presented and participants had 8 s to decide whether each equation was correctly solved. Afterwards, participants were either given the practice test (immediate practice test condition) or dismissed and emailed the link to the online version of the practice test two hours later (delayed practice test condition). The practice test was a cued recall test in which the Polish words were presented randomly, one at a time, as cues to remember the German translation. Participants then had to type the word they remembered. The test was self-paced, and no feedback was provided. Participants were randomly assigned to the immediate or the delayed practice test groups.

Session 2. Participants came to the laboratory for their final session for the final cued recall test. Participants were asked to recall the German translation to all 40 Polish words. Again, the Polish words were presented randomly, one at a time on the screen and participants had to write down their answers via the keyboard. The test was self-paced, and no feedback was provided. In the present experiment, the final test session always took place at either 9 a.m. in the sleep condition or at 9 p.m. in the wakefulness condition. Therefore, the retention interval (measured from the end of the practice test) was effectively 10 hours in the delayed practice test condition and 12 hours in the immediate practice test condition. This was done for the sake of simplicity of running the study in the laboratory. We are aware that this may be a potential confounding factor. However, after consulting our data we find no indication of a confounding effect as participants in the delayed practice test condition did not generally outperform participants in the immediate practice test condition. Thus, the difference in retention interval did not affect our results.
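The retention-interval arithmetic described above can be made explicit with a short sketch. The helper name `retention_hours` and the calendar date are assumptions introduced purely for illustration; only the clock times (learning at 9 a.m. or 9 p.m., a two-hour delay, a final test fixed 12 hours after learning) come from the procedure. The sketch also simplifies by treating the immediate practice test as occurring at the start of learning.

```python
from datetime import datetime, timedelta


def retention_hours(learning_start: datetime, practice_delay_h: float,
                    final_test_offset_h: float = 12.0) -> float:
    """Hours between the practice test and the final test.

    Hypothetical helper: the final test is fixed relative to the start
    of learning, as in the pilot's laboratory schedule.
    """
    practice_test = learning_start + timedelta(hours=practice_delay_h)
    final_test = learning_start + timedelta(hours=final_test_offset_h)
    return (final_test - practice_test).total_seconds() / 3600


# Wakefulness group in the pilot: learning at 9 a.m. (the date is arbitrary).
start = datetime(2020, 1, 1, 9, 0)
immediate = retention_hours(start, practice_delay_h=0)
delayed = retention_hours(start, practice_delay_h=2)
print(immediate, delayed)  # 12.0 10.0
```

Holding the final test fixed relative to learning, rather than relative to the practice test, is what produces the two-hour difference between conditions.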

Analysis Strategy

To give an optimal overview of our data, we report recall rates as a measure of memory and memory change as a measure of forgetting. Interestingly, there is an ongoing debate on how to measure forgetting (see Loftus, 1985; MacDonald et al., 2006). Gais et al. (2006) measured recall and forgetting to test whether sleep affects performance differently than a period of wakefulness. Recall was measured as the number of correctly remembered words, and forgetting was measured as the average individual percent change in recall performance across the retention interval. For Gais et al. (2006), forgetting was ‘indicated as average individual percent change in recall score across periods of sleep and wakefulness’ (p. 261). According to MacDonald et al. (2006), forgetting represents a decline in performance across different testing times. It is not possible to measure forgetting in a direct way; it can only be examined by analyzing performance changes over time. The authors also point out that forgetting always means an absolute decrement in performance. Therefore, it is ‘independent of initial performance level’ (p. 369). Moreover, because of this independence, different baseline performances between groups should not influence forgetting curves and are not of interest for forgetting research (MacDonald et al., 2006).
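To make the distinction between the two forgetting measures concrete, the following minimal sketch computes an absolute memory change and a percent change for the same scores. The function names and the example scores are hypothetical and chosen for illustration only; they are not data from the experiments.

```python
def memory_change(practice_correct: int, final_correct: int) -> int:
    """Absolute change in recall: final minus practice (negative = forgetting)."""
    return final_correct - practice_correct


def percent_change(practice_correct: int, final_correct: int) -> float:
    """Change in recall expressed as a percentage of practice-test recall."""
    if practice_correct == 0:
        raise ValueError("percent change is undefined when practice recall is 0")
    return 100.0 * (final_correct - practice_correct) / practice_correct


# Two made-up participants who both lose 4 items between the tests.
low_baseline = (10, 6)    # (practice, final)
high_baseline = (30, 26)

print(memory_change(*low_baseline), memory_change(*high_baseline))
print(percent_change(*low_baseline), percent_change(*high_baseline))
```

Both hypothetical participants show the same absolute change, whereas the percent change is larger for the participant with the lower baseline. This is the sense in which an absolute decrement is independent of initial performance level.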

Results

The significance level was set to α = .05 for all statistical tests.

Participants in the sleep condition reported having slept regularly during the night (M = 7.22 hours; SD = .86), whereas those in the wakefulness condition reported not having taken naps during the day. None of the participants reported alcohol intake between sessions.

Forgetting Between the Practice Test and Final Test

A 2 (test: practice test vs final test) × 2 (condition: sleep vs wakefulness) × 2 (delay of the practice test: immediate vs delayed) repeated-measures ANOVA revealed no main effect of the test, F(1, 32) = 0.003, p = .95, η_p² < .01, no significant main effect of condition, F(1, 32) = 3.65, p = .07, η_p² = .10, nor a significant interaction between test and condition, F(1, 32) = 0.99, p = .33, η_p² = .03, between the test and delay of the practice test, F(1, 32) = 1.24, p = .27, η_p² = .04, or between condition and delay of the practice test, F(1, 32) = 0.99, p = .33, η_p² = .03. However, there was a significant main effect of delay of the practice test, F(1, 32) = 7.73, p = .009, η_p² = .19, showing better memory in the immediate than in the delayed practice test condition. Furthermore, there was a significant three-way interaction between the test, condition and delay of the practice test, F(1, 32) = 4.71, p = .038, η_p² = .13, indicating that memory in the two conditions (sleep vs wakefulness) was differently affected by immediate and delayed testing. Specifically, in the immediate practice test condition the standard positive sleep effect occurred: participants who slept between the end of practice and the final test showed less forgetting than participants who were awake during that time period. In contrast, in the delayed practice test condition the sleep effect was eliminated: participants in both the sleep and wakefulness condition seemed to show similar forgetting between practice and final test session.³ Figure 2 (lower panel) shows the mean recall rates for all four groups.
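As a consistency check on the reported effect sizes, partial eta squared can be recovered from an F value and its degrees of freedom using the standard identity η_p² = (F · df_effect) / (F · df_effect + df_error). The sketch below applies it to three of the values reported above; the function name is an assumption, and this is not necessarily how the effect sizes were originally computed.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F value, reported partial eta squared) for effects with df = (1, 32).
reported = {
    "condition": (3.65, 0.10),
    "delay of the practice test": (7.73, 0.19),
    "test x condition x delay": (4.71, 0.13),
}

for effect, (f_value, eta_reported) in reported.items():
    eta = partial_eta_squared(f_value, df_effect=1, df_error=32)
    print(f"{effect}: computed {eta:.2f}, reported {eta_reported:.2f}")
```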

Figure 2.

Forgetting over time (based on the total number of correctly recalled items) between the practice test and final test (immediate testing: immediately after the learning phase; delayed testing: 2 hours after the learning phase) for cued recall of the translation in the pilot (German word, lower panel) and the main experiment (English word, upper panel). Error bars represent standard errors of the means.

To assess forgetting between the practice test and the final test, we also calculated memory changes by subtracting the performance on the practice test (= number of correct German translations of the Polish vocabulary) from the performance on the final test. Figure 3 shows the memory change for all four groups. We report the Bayes Factors (BFs) to weigh the null and alternative hypotheses against each other. The BF indicates the likelihood with which the data are obtained under one hypothesis as compared to the other hypothesis. According to Jeffreys (1961, Appendix B), Raftery (1995) and Wetzels et al. (2011), a BF between 1 and 3 can be interpreted as weak evidence, a BF between 3 and 20 as positive or substantial evidence, a BF between 20 and 150 as strong or very strong evidence, and a BF larger than 150 as very strong or decisive evidence for one of the hypotheses. Because the BF can be used to measure the likelihood of the H₁ as well as the likelihood of the H₀, an index denotes which hypothesis is tested (BF₀₁ for the null hypothesis; BF₁₀ for the alternative hypothesis). All reported BFs were computed using JASP (JASP Team, 2020; Marsman & Wagenmakers, 2017). We used the default prior options for the effects. The BF₁₀ for the main effect of condition (sleep vs wakefulness) was 0.49. However, the BF₀₁ for the null hypothesis was also only 2.04. Similarly, for the main effect of delay of the practice test (immediate vs. delayed) the BF₁₀ was 0.56 and the BF₀₁ was 1.79. For the interaction between condition and delay of the practice test the BFs were BF₁₀ = 0.84 for the alternative hypothesis and BF₀₁ = 1.19 for the null hypothesis. Overall, the BFs were inconclusive.
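The relation between BF₁₀ and BF₀₁ and the verbal categories quoted above can be sketched as follows. The function names are hypothetical, and the handling of values that fall exactly on a cut-off (3, 20, 150) is an assumption, since the text does not specify it. The BFs themselves were obtained from JASP; this sketch only converts and labels them.

```python
def bf01_from_bf10(bf10: float) -> float:
    """BF01 is the reciprocal of BF10."""
    if bf10 <= 0:
        raise ValueError("a Bayes factor must be positive")
    return 1.0 / bf10


def evidence_label(bf: float) -> str:
    """Verbal label for the favoured hypothesis, using the cut-offs in the text.

    Values below 1 are folded onto their reciprocal, so the label describes
    the strength of evidence for whichever hypothesis the data favour.
    Assumption: a value exactly on a cut-off falls into the higher category.
    """
    strength = max(bf, 1.0 / bf)
    if strength < 3:
        return "weak"
    if strength < 20:
        return "positive or substantial"
    if strength < 150:
        return "strong or very strong"
    return "very strong or decisive"


# BF10 values reported for the pilot.
pilot_bf10 = {"condition": 0.49, "delay": 0.56, "condition x delay": 0.84}

for effect, bf10 in pilot_bf10.items():
    bf01 = bf01_from_bf10(bf10)
    print(f"{effect}: BF10 = {bf10:.2f}, BF01 = {bf01:.2f}, {evidence_label(bf10)}")
```

All three pilot values fall into the weak category in either direction, which matches the description of the pilot BFs as inconclusive.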

Figure 3.

Memory change (based on the total number of recalled items) between the practice test (immediate group: immediately after the learning phase; delayed group: 2 hours after the learning phase) and final test (test session) for cued recall of the German word (given the Polish word as cue) in the pilot. Error bars represent standard errors of the means.

Discussion

We found less forgetting after a period of wakefulness for the delayed testing group in comparison to the immediate testing group. The benefit of sleep was revealed only in the immediate practice test condition. The elimination of the sleep-related memory benefit was mainly driven by a decrease in forgetting in the wakefulness group. In terms of forgetting, the sleep group was not affected by the delay of the practice test: forgetting was similar irrespective of the time of the practice test. In line with this, we found a comparable pattern of forgetting over time for the sleep and the wakefulness condition in the delayed testing condition but not in the immediate testing condition (Figure 2, lower panel). However, the BFs showed inconclusive results: there was neither a clear indication for the alternative hypothesis nor for the null hypothesis. One possible reason for these inconclusive results is that the pilot study included only 36 participants.

Overall, the findings from our pilot experiment are more in line with the proposed wakefulness aid hypothesis. Delayed testing presumably aided the strengthening of memory traces in the wakefulness group whereas sleep in itself had a sufficiently beneficial effect on memory stabilization and protection leading to less forgetting. Immediate testing reveals the typical benefit of sleep in comparison to wakefulness. However, we also had a few limitations in the pilot that we address in turn. First, as the BFs indicated, we may have had too few participants to come to a clear decision between the alternative and null hypothesis. Second, the final test session was 12 hours after learning for the immediate testing group, but only 10 hours for the delayed testing group. Third, participants took the practice test either in the laboratory (immediate testing) or on the Internet (delayed condition). It is possible that the change of context in the delayed testing group led to a disruption in processing leading to an overall decrease in performance compared to the immediate testing group. Fourth, we did not control whether the delayed testing groups did the online test exactly 2 hours after the first learning session. Fifth, some of the Polish-German words were similar to each other and therefore easier to learn.

Experiment

The main experiment was run to test the replicability of the new finding using a larger and more heterogeneous sample of participants while remedying the abovementioned points. The experiment differed from the pilot in three major aspects: (i) we used Lithuanian–English vocabulary (Grimaldi & Rawson, 2010) instead of Polish-German vocabulary, (ii) the experiment was run on the Internet and participants were native English speakers, and (iii) the delay between the practice test and final test was the same in all conditions and fixed at a length of 12 hours.

Method

Participants

Sixty-three English-speaking people participated in this web-based experiment. All participants were recruited from the online platform Amazon Mechanical Turk. To obtain high-quality data, we restricted study access to people who had completed at least 500 studies on Amazon Mechanical Turk before and who had an approval rate of at least 95% (see Finley, 2015⁴). The study was accessible to participants on Amazon Mechanical Turk located in the United States or Canada. Of all participants, six had to be excluded because they either reported they were familiar with the Lithuanian language or had cheated on the memory tests (e.g. writing down vocabulary pairs during initial presentation and using their notes during the test). A final sample of 57 (39 female) participants remained and was included in the analyses (N_{sleep;immediate} = 14; N_{sleep;delayed} = 16; N_{wake;immediate} = 15; N_{wake;delayed} = 12). Their ages ranged from 22 to 40 years (M = 32.30; SD = 5.05). Participants received a total of $1.50 for their participation in the experiment.

Materials

Materials comprised 40 Lithuanian–English word translations of similar difficulty taken from Grimaldi and Rawson (2010). All translations were unfamiliar to the participants. Participants were asked to study the English translation of each Lithuanian word.

Design

The experiment included a 2 × 2 between-subjects design with the factors delay of the practice test (immediate vs. delayed) and condition (12-h wake versus 12-h sleep). The final test session followed 12 hours after the practice test regardless of whether it was immediate or delayed. Although we did not find any effect of slightly different retention intervals in the pilot, we felt that it would be experimentally cleaner to hold the retention interval constant across conditions. In addition, the links leading to the study were only active for a short time; therefore, it was not possible to complete either the learning or the testing sessions later or earlier. Moreover, this time, the entire experiment was conducted online. Therefore, all sessions were comparable with regard to context between the groups.

Procedure

The procedure of the present experiment was the same as that of the pilot. The only major difference was that participants used their own computer devices to access and run the experiment through their web browsers. To make sure that participants logged on to the experiment at the correct times, automated reminder emails were sent to them at the respective times. The experiment was programmed in HTML, PHP, and JavaScript, and participants could access the study only at their predetermined times and only for a short time.⁵

Results

The significance level was set to α = .05 for all statistical tests. Again, we report recall rates as a measure of memory and memory change as a measure of forgetting. Participants in the sleep condition reported having slept regularly during the night (M = 6.9 hours; SD = 1.34) whereas participants in the wakefulness condition reported not having taken naps during the day.

Forgetting Between the Practice Test and Final Test

A 2 (test: practice test vs final test) × 2 (condition: sleep vs wakefulness) × 2 (delay of practice test: immediate vs delayed) repeated-measures ANOVA revealed no main effect of test, F(1, 53) = 1.80, p = .19, η_p² = .03, no significant effect of condition, F(1, 53) = 0.63, p = .43, η_p² = .01, and no significant effect of delay of the practice test, F(1, 53) = 0.29, p = .60, η_p² = .01. There was a significant interaction between test and condition, F(1, 53) = 5.95, p = .02, η_p² = .10, and between test and delay of the practice test, F(1, 53) = 6.15, p = .02, η_p² = .10, but not between condition and delay of the practice test, F(1, 53) = 0.19, p = .67, η_p² = .003. Again, there was a significant three-way interaction between test, condition and delay of the practice test, F(1, 53) = 6.15, p = .02, η_p² = .10.⁶ Figure 2 (upper panel) shows the mean recall rates for all four groups.

As in the pilot, we calculated the BFs for the memory change to corroborate our results. Figure 4 shows the memory change for all four groups (M_{sleep;immediate} = 0.71, M_{sleep;delayed} = 0.25, M_{wake;immediate} = -4.07, M_{wake;delayed} = 0.75). The BF₁₀ for the main effect of condition (sleep vs wakefulness) was 14.12 (BF₀₁ = 0.07), showing that the data were about 14 times more likely under the alternative hypothesis than under the null hypothesis and indicating that, in comparison to wakefulness, sleeping was more beneficial for memory. The BF₁₀ for the main effect of delay of the practice test (immediate vs delayed) was 10.78 (BF₀₁ = 0.09), indicating substantial evidence for the alternative hypothesis and showing that immediate testing led to more forgetting in the final test. Moreover, the BF₁₀ for the interaction between condition and delay of the practice test was 18.99 (BF₀₁ = 0.05), which also indicated substantial evidence. It seems that the standard sleep effect, i.e. less forgetting in the sleep condition than in the wakefulness condition, occurred in the immediate practice test condition only.

Figure 4.

Memory change (based on the total number of correctly recalled items) between the practice test (immediate group: immediately after the learning phase; delayed group: 2 hours after the learning phase) and final test (test session) for cued recall of the English word (given the Lithuanian word as cue) in the main experiment. Error bars represent standard errors of the means.

Discussion

Our experiment replicates and extends the results of the pilot to a more heterogeneous population, to word material of a different language, and to a different experimental setting.

Forgetting over time for the sleep and the wakefulness conditions shows a similar pattern in the delayed testing condition but differs in the immediate testing condition (with more forgetting in the wakefulness group; Figure 2, upper panel). The successful replication and extension of our finding supports the proposed wakefulness aid hypothesis.

General Discussion

In our experiments, we tested the effect of delaying a practice test on forgetting foreign vocabulary during a sleep- or wakefulness-filled interval of up to 12 hours. We found the standard beneficial sleep effect when participants were immediately tested on the material at the end of the practice session. Hence, a succeeding period of sleep led to less forgetting than a succeeding period of wakefulness. This replicates former findings by Gais et al. (2006, 2007). However, when the practice test was delayed by two hours, forgetting was comparably low in both the sleep and wakefulness conditions. A delayed practice test decreased forgetting in the wakefulness condition while leaving forgetting in the sleep group unaffected. Overall, during the practice test, the immediate test groups showed better performance than the delayed testing groups. However, forgetting in the immediate wakefulness group led to the recall of fewer vocabulary items compared to the delayed wakefulness group on the final test. For the sleep group, the difference between the immediate and the delayed testing condition persisted.

We were able to replicate the findings from the pilot study using a different experimental setting, another population of participants, and different vocabulary materials. Based on this, we think that these new findings are of theoretical as well as practical relevance. As discussed above, Bäuml et al. (2014) found that the beneficial effects of testing reduce the beneficial effect of sleep. The authors discussed how retrieving information strengthens items to a much higher degree than restudying (an argument in line with the bifurcation model, which assumes that testing strengthens successfully retrieved items to a high degree while leaving non-retrieved items unaffected; Halamish & Bjork, 2011). Because of this, additional sleep-induced strengthening of the items may not improve recall further. In contrast, Abel et al. (2019) found benefits of sleep on recall after retrieval practice, but only if it was combined with feedback. However, this does not contradict the theoretical explanation from Bäuml et al. (2014). Initially non-retrieved items can be lifted above the recall threshold after corrective feedback (Pastötter & Bäuml, 2016). The time between learning and the final test can then lead to two things: these items can again fall below the recall threshold (time-dependent forgetting), or they can benefit from sleep-associated strengthening, thereby creating a positive effect of sleep on memory. Therefore, sleep no longer modulates the testing effect (Abel et al., 2019).

A similar argument can also explain our results: a delayed practice test considerably strengthened fragile memory traces (Karpicke & Roediger, 2007), pushing them above the recall threshold, which becomes particularly important when a longer period of wakefulness follows. Items retrieved during the delayed practice test seem to be strengthened to a higher degree than items retrieved immediately after learning and remain above the recall threshold. Sleep, presumably through active consolidation processes (see Diekelmann & Born, 2010; Marshall & Born, 2007; Stickgold & Walker, 2013), is per se sufficient to ensure little forgetting of newly acquired foreign vocabulary. Consequently, delaying the practice test had a differential effect on forgetting in the sleep versus wakefulness condition: it neither increased nor decreased sleep-related forgetting, but rather helped in maintaining recently studied material through a period of wakefulness. For the immediate practice test condition, the typical benefit of sleep over wakefulness was found. Sleeping after the test helped to maintain this material for the final test. For the wake group, however, the items retrieved during the immediate practice test were still subject to decay over time. Participants were not able to maintain the learned material in their memory (a finding in line with Karpicke & Roediger, 2007, Exp. 3).
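The threshold argument above can be made concrete with a toy sketch. It is only an illustration under loudly stated assumptions: item strengths, the recall threshold, the size of the testing boost, and the decay values are arbitrary numbers chosen for the example, not estimates from our data, and the function names are hypothetical. The sketch assumes that only retrieved items are strengthened (the bifurcation idea), that a delayed test gives a larger boost than an immediate test, and that strength is lost more slowly during sleep than during wakefulness.

```python
from typing import List

THRESHOLD = 1.0  # hypothetical recall threshold, arbitrary units


def recalled(strengths: List[float]) -> int:
    """Count items whose strength lies above the recall threshold."""
    return sum(s > THRESHOLD for s in strengths)


def practice_test(strengths: List[float], boost: float) -> List[float]:
    """Strengthen only the items that are retrieved (bifurcation assumption)."""
    return [s + boost if s > THRESHOLD else s for s in strengths]


def retention_interval(strengths: List[float], decay: float) -> List[float]:
    """Subtract a constant loss of strength from every item."""
    return [s - decay for s in strengths]


def forgetting(strengths: List[float], boost: float, decay: float) -> int:
    """Items recalled on the practice test minus items recalled on the final test."""
    at_practice = recalled(strengths)
    at_final = recalled(retention_interval(practice_test(strengths, boost), decay))
    return at_practice - at_final


# Made-up item strengths after study; none of these numbers come from the data.
items = [0.6, 0.9, 1.1, 1.3, 1.6, 2.0]
WAKE_DECAY, SLEEP_DECAY = 0.9, 0.3   # assumed: less loss during sleep
WEAK_BOOST, STRONG_BOOST = 0.5, 1.0  # assumed: immediate vs. delayed test

for label, boost in [("immediate", WEAK_BOOST), ("delayed", STRONG_BOOST)]:
    print(label,
          "wake:", forgetting(items, boost, WAKE_DECAY),
          "sleep:", forgetting(items, boost, SLEEP_DECAY))
```

With these illustrative values, the weaker boost leaves two of the four retrieved items below threshold after the wake interval but none after the sleep interval, whereas the stronger boost keeps all retrieved items above threshold in both conditions. The sketch deliberately ignores the lower practice-test performance of the delayed groups and should not be read as a fitted model.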

A caveat with regard to the present findings is that we did not control the timing of sleep onset. This could have affected our results in two ways: (a) participants in the immediate practice condition could have gone to bed soon after encoding and, therefore, slept longer than participants in the delayed condition, or (b) in the delayed practice condition, sleep onset could have occurred sooner after the practice test. During the final test session of both experiments, we asked participants at approximately what time they had fallen asleep the night before. For the pilot, the mean reported time was around 12:13 a.m. for the delayed group and around 11:44 p.m. for the immediate group. For the main experiment, the mean reported time was around 11:56 p.m. for the delayed group and around 11:10 p.m. for the immediate group. Based on these data, scenario (a) seems unlikely. However, there was indeed less time between the practice test and sleep in the delayed than in the immediate sleep condition. Nevertheless, there was no benefit of sleep in the delayed condition; our main finding was that the wakefulness group showed less forgetting in the delayed condition than in the immediate condition.

Many studies have shown that retrieval practice and intervals between study and test that are filled with sleep are both beneficial for memory. However, there is also the question of how long-lasting these effects are. For the testing effect, less forgetting was found after two days (Thompson et al., 1978; Wenger et al., 1980), after seven days (Roediger & Karpicke, 2006; Wheeler et al., 2003), and even after 42 days (Carpenter et al., 2008). The persistence of the effect of sleep on memory seems to be more complicated. There are studies reporting positive effects of sleep even after delays longer than 12 h (see, for example, Gais et al., 2006; Griessenberger et al., 2012; Stickgold et al., 2000). In contrast, Abel et al. (2019) were not able to find a sleep benefit after 24 h or 7 days. It is possible that sleep makes an active as well as a passive contribution to memory and that this may vary with the experimental task (Abel et al., 2019). However, our study cannot draw any conclusions about the persistence of our findings after a longer period of time. This should be investigated in more detail in future work.

Interestingly, in several experiments, Abel et al. (2019), Bäuml et al. (2014), and Antony and Paller (2018) found no sleep benefits on memory after immediate tests without corrective feedback. In contrast, in the study presented here, we observed a benefit of sleep for such an immediate test. However, in all of the experiments discussed above, participants had only one initial study cycle before they started a varying number of practice tests. In our study, they had two study cycles before engaging in one practice test. It is possible that this additional study cycle improved recall for both the sleep and the wake group. However, when recall is improved, there is also more that can be forgotten. We think it is possible that by increasing recall with two study cycles, we also gave the wake group room to forget more than the sleep group. As discussed above, sleep in itself helps to strengthen fragile memory traces, thereby resulting in less forgetting.

At the beginning of our article, we asked whether the timing of a practice test makes a difference and, if so, when exactly it should optimally be taken. The new and important finding of this study is that wakefulness after encoding does not always lead to more forgetting. When retrieval practice is delayed, it is possible to remember as much on the final test as on the practice test. Given that it is not always possible to learn right before sleeping, our study shows that it is possible to compensate for the missing beneficial effect of sleep on memory, which has meaningful relevance for the learning schedule of students. Based on this, practical recommendations for students in educational settings could be the following: if students study, for example, the names of brain structures in the evening, they should test themselves before going to bed, and this practice test should follow immediately after learning. However, if students decide to study during the day and anticipate a longer stretch of wakefulness, the practice test should be delayed in order to reinforce memory and reduce forgetting of the material.

Interestingly, psychology students who are taught about learning and memory often rely on inappropriate strategies to learn this information for later tests. Many college students have low metacognitive awareness of which learning strategies are really beneficial for memory (McCabe, 2011). Unfortunately, the strategies known to be very effective for long-term learning, like spacing (Rohrer & Pashler, 2007) or testing (Roediger & Karpicke, 2006), often make learning feel very slow (so-called desirable difficulties, Bjork, 1994) and are therefore avoided even by psychology students. To increase awareness of the beneficial effects of these strategies on memory, classroom demonstrations should be used (McCabe, 2014). Our study could be used as a classroom demonstration after the discussion of the beneficial effects of spacing, testing, and sleep on long-term memory. To increase the motivation of the students, it would also be possible to use material relevant to them (e.g., names of hormones, brain structures, etc.) and test the influence of this material on the effects found in our studies.

Such a classroom demonstration could also be used to broaden the empirical basis for evaluating our effect, including its generalizability (Balch, 2006). Further research still has to investigate the robustness of our findings, the optimal delay between studying and the first practice test, the generalizability to more complex material, and the influence of motivation.

Footnotes

Acknowledgements

We thank David Balota for helpful comments on a previous draft of this article.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The second author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG).

ORCID iDs

Meike Kroneisen

Carolina E. Kuepper-Tetzel

Author Biographies

Meike Kroneisen is an experimental psychologist by training with experience in memory and cognition as well as formal modelling. She obtained her Ph.D. in Cognitive Psychology from the University of Mannheim and pursued a postdoc position also at the University of Mannheim. After that she was an assistant professor in Psychology at the University of Koblenz-Landau. Currently she is a replacement professor at the University of Mannheim. Meike Kroneisen teaches cognitive psychology on different levels (Bachelor’s, Master’s). Her expertise focuses on memory phenomena combined with the question of whether memory has an adaptive value. She is passionate about the field of cognitive psychology and always tries to spark the same enthusiasm in her students. To date, she has received several teaching awards for her courses. Furthermore, she has supervised more than 33 bachelor’s and master’s theses.

Carolina Kuepper-Tetzel is an expert in applying findings from Cognitive Science to education and an enthusiastic science communicator. She obtained her Ph.D. in Cognitive Psychology from the University of Mannheim and pursued postdoc positions at York University in Toronto and the Center for Integrative Research in Cognition, Learning, and Education (CIRCLE) at Washington University in St. Louis. She was a Lecturer in Psychology at the University of Dundee for four years before starting as a Lecturer in Psychology at the University of Glasgow in January 2020. Her expertise focuses on learning and memory phenomena that allow for implementation in educational settings to offer teachers and students a wide range of strategies that promote long-term retention. Carolina is convinced that psychological research should serve the public and, to that end, engages heavily in scholarly outreach and science communication. She is a member of the Learning Scientists and founded the Teaching Innovation & Learning Enhancement (TILE) network. TILE brings different disciplines and sectors together to discuss how to overcome prevailing issues in education with research-based approaches. Carolina is frequently invited to give CPD workshops and keynotes on learning and teaching worldwide. Carolina was awarded Senior Fellow of HEA. She is passionate about teaching and aims at providing her students with the best learning experience possible. She teaches Research Methods and Cognition and promotes Service Learning in Higher Education.

References

Abel, Haller, Köck, Pötschke, Heib, Schabus, & Bäuml, K.-H. T. (2019). Sleep reduces the testing effect—but not after corrective feedback and prolonged retention interval. Journal of Experimental Psychology: Learning, Memory, and Cognition, 45(2), 272–287. https://doi.org/10.1037/xlm0000576

Antony, J. W., & Paller, K. A. (2018). Retrieval and sleep both counteract the forgetting of spatial information. Learning & Memory, 25(6), 258–263. https://doi.org/10.1101/lm.046268.117

Balch (2006). Encouraging distributed study: A classroom experiment on the spacing effect. Teaching of Psychology, 33(4), 249–252.

Barenberg, J., Roeder, U.-R., & Dutke, S. (2018). Students' temporal distributing of learning activities in psychology courses: Factors of influence and effects on the metacognitive learning outcome. Psychology Learning & Teaching, 17(3), 257–271. https://doi.org/10.1177/1475725718769488

Bäuml, K.-H. T., Holterman, & Abel (2014). Sleep can reduce the testing effect: It enhances recall of restudied items but can leave recall of retrieved items unaffected. Journal of Experimental Psychology: Learning, Memory, and Cognition, 40(6), 1568–1581. https://doi.org/10.1037/xlm0000025

Barrett, T. R., & Ekstrand, B. R. (1972). Effect of sleep on memory: III. Controlling for time-of-day effects. Journal of Experimental Psychology, 96(2), 321–327. https://doi.org/10.1037/h0033625

Benson & Feinberg (1975). Sleep and memory: Retention 8 and 24 hours after initial learning. Psychophysiology, 12(2), 192–195.

Bjork, R. A. (1994). Memory and metamemory considerations in the training of human beings. In J. Metcalfe & A. Shimamura (Eds.), Metacognition: Knowing about knowing (pp. 185–205). MIT.

Carpenter, S. K., Pashler, Wixted, J. T., & Vul (2008). The effects of tests on learning and forgetting. Memory & Cognition, 36(2), 438–448. https://doi.org/10.3758/MC.36.2.438

Conte & Ficca (2013). Caveats on psychological models of sleep and memory: A compass in an overgrown scenario. Sleep Medicine Reviews, 17(2), 105–121. https://doi.org/10.1016/j.smrv.2012.04.001

Diekelmann, Wilhelm, & Born (2009). The whats and whens of sleep-dependent memory consolidation. Sleep Medicine Reviews, 13(5), 309–321. https://doi.org/10.1016/j.smrv.2008.08.002

Diekelmann & Born (2010). The memory function of sleep. Nature Reviews Neuroscience, 11(2), 114–126. https://doi.org/10.1038/nrn2762

Dunlosky, Rawson, K. A., Marsh, E. J., Nathan, M. J., & Willingham, D. T. (2013). Improving students' learning with effective learning techniques: Promising directions from cognitive and educational psychology. Psychological Science in the Public Interest, 14(1), 4–58. https://doi.org/10.1177/1529100612453266

Ellenbogen, J. M., Hulbert, J. C., Stickgold, Dinges, D. F., & Thompson-Schill, S. L. (2006). Interfering with theories of sleep and memory: Sleep, declarative memory, and associative interference. Current Biology, 16(13), 1290–1294. https://doi.org/10.1016/j.cub.2006.05.024

Ficca, Lombardo, Rossi, & Salzarulo (2000). Morning recall of verbal material depends on prior sleep organization. Behavioural Brain Research, 112(1-2), 159–163. https://doi.org/10.1016/S0166-4328(00)00177-7

Finley, J. (2015, January 8). High quality MTurk data. [Blog post]. Retrieved from http://www.psychonomic.org/featured-content-detail/high-quality-mturk-data

Gais, S., Lucas, B., & Born, J. (2006). Sleep after learning aids memory recall. Learning & Memory, 13(3), 259–262. https://doi.org/10.1101/lm.132106

Gais, Albouy, Boly, Dang-Vu, T. T., Darsaud, Desseilles, Rauchs, Schabus, Sterpenich, Vandewalle, Maquet, & Peigneux (2007). Sleep transforms the cerebral trace of declarative memories. Proceedings of the National Academy of Sciences (PNAS), 104(47), 18778–18783. https://doi.org/10.1073/pnas.0705454104

Griessenberger, Hoedlmoser, Heib, D. P. J., Lechinger, Klimesch, & Schabus (2012). Consolidation of temporal order in episodic memories. Biological Psychology, 91(1), 150–155. https://doi.org/10.1016/j.biopsycho.2012.05.012

Grimaldi & Rawson (2010). Normative multi-trial recall performance, metacognitive judgments, and retrieval latencies for Lithuanian-English paired associates. Behavior Research Methods, 42(3), 634–642. https://doi.org/10.3758/BRM.42.3.634

Halamish & Bjork, R. A. (2011). When does testing enhance retention? A distribution-based interpretation of retrieval as a memory modifier. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37(4), 801–812. https://doi.org/10.1037/a0023219

JASP Team (2020). JASP (Version 0.13.1) [Computer software].

Jeffreys (1961). Theory of probability (3rd ed.). Oxford, UK: Oxford University Press.

Jenkins, J. G., & Dallenbach, K. M. (1924). Obliviscence during sleep and waking. The American Journal of Psychology, 35(4), 605–612. https://doi.org/10.2307/1414040

Karpicke, J. D., & Roediger, H. L., III (2007). Expanding retrieval practice promotes short-term retention, but equally spaced retrieval enhances long-term retention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33(4), 704–719. https://doi.org/10.1037/0278-7393.33.4.704

Karpicke, J. D., & Roediger, H. L. (2010). Is expanding retrieval a superior method for learning text materials? Memory & Cognition, 38(1), 116–124. https://doi.org/10.3758/MC.38.1.116

Loftus, G. R. (1985). Evaluating forgetting curves. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11(2), 397–406.

MacDonald, S. W. S., Stigsdotter-Neely, A., Derwinger, A., & Bäckman, L. (2006). Rate of acquisition, adult age, and basic cognitive abilities predict forgetting: New views on a classic problem. Journal of Experimental Psychology: General, 135(3), 368–390. https://doi.org/10.1037/0096-3445.135.3.368

Marshall & Born (2007). The contribution of sleep to hippocampus-dependent memory consolidation. Trends in Cognitive Sciences, 11(10), 442–450. https://doi.org/10.1016/j.tics.2007.09.001

Marsman & Wagenmakers, E.-J. (2017). Bayesian benefits with JASP. European Journal of Developmental Psychology, 14(5), 545–555. https://doi.org/10.1080/17405629.2016.1259614

McCabe (2011). Metacognitive awareness of learning strategies in undergraduates. Memory & Cognition, 39(3), 462–476. https://doi.org/10.3758/s13421-010-0035

McCabe (2014). Learning and memory strategy demonstrations for the psychology classroom. Baltimore: Goucher College.

Pastötter & Bäuml, K.-H. T. (2016). Reversing the testing effect by feedback: Behavioral and electrophysiological evidence. Cognitive, Affective, and Behavioral Neuroscience, 16(3), 473–488.

Plihal & Born (1997). Effects of early and late nocturnal sleep on declarative and procedural memory. Journal of Cognitive Neuroscience, 9(4), 534–547. https://doi.org/10.1162/jocn.1997.9.4.534

Raftery, A. E. (1995). Bayesian model selection in social research. Sociological Methodology, 25, 111–163. https://doi.org/10.2307/271063

Roediger, H. L., III, & Butler, A. C. (2011). The critical role of retrieval practice in long-term retention. Trends in Cognitive Sciences, 15(1), 20–27. https://doi.org/10.1016/j.tics.2010.09.003

Roediger, H. L., III, & Karpicke, J. D. (2006). Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17(3), 249–255. https://doi.org/10.1111/j.1467-9280.2006.01693.x

Roediger, H. L., III, & Pyc, M. A. (2012). Inexpensive techniques to improve education: Applying cognitive psychology to enhance educational practice. Journal of Applied Research in Memory and Cognition, 1(4), 242–248.

Rohrer & Pashler (2007). Increasing retention without increasing study time. Current Directions in Psychological Science, 16(4), 183–186. https://doi.org/10.1111/j.1467-8721.2007.00500.x

Schwieren, Barenberg, & Dutke (2017). The testing effect in the psychology classroom: A meta-analytic perspective. Psychology Learning and Teaching, 16(2), 179–196. https://doi.org/10.1177/1475725717695149

Stickgold, James, & Hobson, J. A. (2000). Visual discrimination learning requires sleep after training. Nature Neuroscience, 3(12), 1237–1238. https://doi.org/10.1038/81756

Stickgold & Walker, M. P. (2013). Sleep-dependent memory triage: Evolving generalization through selective processing. Nature Neuroscience, 16(2), 139–145. https://doi.org/10.1038/nn.3303

Taraban, Maki, W. S., & Rynearson (1999). Measuring study time distributions: Implications for designing computer-based courses. Behavior Research Methods, Instruments, & Computers, 31(2), 263–269. https://doi.org/10.3758/BF03207718

Thompson, C. P., Wenger, S. K., & Bartling, C. A. (1978). How recall facilitates subsequent recall: A reappraisal. Journal of Experimental Psychology: Human Learning & Memory, 4(3), 210–221. https://doi.org/10.1037/0278-7393.4.3.210

Undorf & Erdfelder (2013). Separation of encoding fluency and item difficulty effects on judgements of learning. The Quarterly Journal of Experimental Psychology, 66(10), 2060–2072. https://doi.org/10.1080/17470218.2013.777751

Wenger, S. K., Thompson, C. P., & Bartling, C. A. (1980). Recall facilitates subsequent recognition. Journal of Experimental Psychology: Human Learning & Memory, 6(3), 135–144. https://doi.org/10.1037/0278-7393.6.2.135

Wetzels, Matzke, Lee, M. D., Rouder, J. N., Iverson, G. J., & Wagenmakers, E. J. (2011). Statistical evidence in experimental psychology: An empirical comparison using 855 t-tests. Perspectives on Psychological Science, 6, 291–298. https://doi.org/10.1177/1745691611406923

Wheeler, M. A., Ewers, & Buonanno, J. F. (2003). Different rates of forgetting following study versus test trials. Memory, 11(6), 571–580. https://doi.org/10.1080/09658210244000414

Using Day and Night – Scheduling Retrieval Practice and Sleep