Abstract
Sleep right after studying new material is more conducive to memory than a period of wakefulness. Another way to counteract forgetting is to practice retrieval: taking a test strengthens memory more effectively than restudying the material. The current work investigates the interaction between sleep and testing by asking whether testing adds to, neutralizes, or decreases the effect of sleep on memory. We tested this in one pilot and one experiment by manipulating the timing of the practice test as well as whether practice was followed by sleep or wakefulness when learning foreign language vocabulary. Taking a delayed practice test significantly reduced forgetting for both the sleep and the wakefulness group. An immediate practice test, in contrast, had no such effect; here we found the standard beneficial sleep effect. However, the immediate practice test led to higher recall in the final test than a delayed practice test, but only for the sleep group. Two practical recommendations follow: first, if students study in the evening, they should test themselves immediately after learning. Second, if students study during the day, the practice test should be delayed in order to reinforce memory and reduce forgetting of the material.
Introduction
Sleep that follows directly upon the encoding of new information reduces time-dependent forgetting, as demonstrated in studies on the beneficial effect of sleep on memory. Jenkins and Dallenbach (1924) were the first to demonstrate superior memory for nonsense syllables following an interval containing sleep compared to an equally long interval of wakefulness. The beneficial effect of sleep on declarative memory has been observed for different study materials such as nonsense syllables (Benson & Feinberg, 1975; Jenkins & Dallenbach, 1924), associative word pairs (Plihal & Born, 1997), vocabulary (Gais et al., 2006), and word lists (Ficca et al., 2000). Overall, several studies have shown a beneficial effect of sleep over a wake period on memory, leaving little doubt that sleep is a potent way to reduce forgetting of newly acquired material (e.g. Barrett & Ekstrand, 1972; Diekelmann et al., 2009).
The exact mechanisms underlying the benefit of sleep have not yet been fully uncovered, but different theoretical candidates have been proposed. Earlier theories assumed that sleep is a passive state, protecting learned material against retroactive interference (Jenkins & Dallenbach, 1924; for a discussion see Ellenbogen et al., 2006). However, newer research emphasizes active memory consolidation as an explanation for the benefit of sleep (for a review, see Conte & Ficca, 2013). Memory consolidation is a process whereby previously formed labile memory traces are transformed and integrated into a network of pre-existing long-term memories. Consolidation during sleep is thought to diminish forgetting of the material, leading to better performance at the delayed test after sleep compared to wakefulness (Diekelmann et al., 2009).
Another powerful strategy to reduce time-dependent forgetting is retrieval practice. Retrieving newly acquired information instead of simply restudying it has been shown to be a successful way to reduce forgetting (also called the testing effect; e.g. Roediger & Karpicke, 2006; Roediger & Butler, 2011), even without feedback in the testing situation (Karpicke & Roediger, 2010). Furthermore, the testing effect has been demonstrated in the teaching and learning of psychology (Schwieren et al., 2017), showing that the effect is not restricted to laboratory studies. The strength of this effect depends on different factors such as the difficulty of the retrieval task (Carpenter & DeLosh, 2006), the retention interval between retrieval practice and final test (Roediger & Karpicke, 2006), the number of practice tests (Karpicke & Roediger, 2010), and the timing of the practice test (Karpicke & Roediger, 2007; Karpicke & Roediger, 2010). Karpicke and Roediger (2007), for instance, showed that delaying the practice test was more conducive to long-term retention of the learned material than an immediate practice test. Testing an item immediately after learning led subjects to recall this item while it was still in their immediate awareness, which is similar to a massed learning situation (Karpicke & Roediger, 2007; Karpicke & Roediger, 2010). However, the influence of the timing of the practice test seems to be more complicated. In contrast to Karpicke and Roediger (2007), Karpicke and Roediger (2010) reported positive and negative effects of delayed testing. In Experiment 1, they found that taking an immediate first test improved long-term retention, whereas in Experiment 2 no such effect could be detected. However, it has to be noted that the materials used differed between the two studies: whereas Karpicke and Roediger (2007) used vocabulary word pairs, Karpicke and Roediger (2010) used brief text passages. Furthermore, the delayed cued recall test used by Karpicke and Roediger (2007) occurred as early as five trials after a word pair had been studied, whereas the first delayed test used by Karpicke and Roediger (2010) occurred after approximately eight minutes.
The current set of experiments examines the interaction between sleep and retrieval practice more closely. A previous study by Bäuml et al. (2014) found that sleep that followed directly upon encoding affected retrieved and restudied word material differently: whereas the wakefulness group benefited from testing by showing better memory performance for previously retrieved versus restudied items, the testing effect was reduced or even eliminated in the sleep group. Similar to other studies, the sleep and the wakefulness group were tested 12 hours after the initial learning phase. Interestingly, the reason for this reduced testing effect was that sleep benefited recall of restudied items but left recall of retrieved items unaffected. When comparing memory for items between the sleep and the wakefulness group, the sleep group showed improved performance for restudied items, but no enhanced memory for retrieved items. It seems that the beneficial effects of retrieval practice reduce the beneficial effect of sleep. Moreover, these results were replicated using spatial memory by Antony and Paller (2018). However, a recent study by Abel et al. (2019) showed benefits of sleep on recall after retrieval practice, but only if it was combined with feedback. How can these results be explained? As the authors discussed, it may be that retrieving information strengthens fragile memory traces, so that sleep cannot improve recall any further. When feedback is given, previously non-retrieved items can either be forgotten again in the time between learning and the final test or benefit from the positive effect of sleep on recall.
In contrast to Bäuml et al. (2014), we were not interested in evaluating the differential effect of restudied versus retrieved items. As studies have repeatedly shown the superiority of testing as a learning event (e.g. Roediger & Butler, 2011), we instead aimed at manipulating the timing of the practice test to determine its optimal placement in a sleep-wake paradigm. Several studies show that students often prefer massed learning (i.e. immediate learning of all the material) to spaced learning (e.g. Taraban et al., 1999). Given the accumulating evidence and the general recommendation from learning scientists (Dunlosky et al., 2013; Roediger & Pyc, 2012) that students should take tests to boost their memory, how does this recommendation interact with other factors that are part of their daily lives, such as sleep, which has also been shown to reduce forgetting? Does the timing of a practice test make a difference, and if so, when exactly should it be taken to reduce forgetting in the sleep-wakefulness paradigm?
To address this research question, we drew on a design often used in research on sleep-associated memory consolidation: in one condition, participants’ memory for the material was tested immediately after acquisition (which was considered a learning trial) and assessed again 12 hours later, after a retention interval filled with wakefulness or sleep. Similar to other studies (Gais et al., 2006, 2007), we were primarily interested in forgetting between the first test after studying and the final test following the retention interval. Our experiments added a delayed retrieval practice condition that tested newly acquired material two hours after acquisition to evaluate how this affects forgetting after sleep versus wakefulness.
This has practical implications: students would benefit from knowing the optimal time for a practice test during the day and before going to bed to improve their study outcome. Psychology students, for example, have to learn a lot of facts, like the names of brain structures, neurotransmitters, and hormones, but also a lot of different theories and corresponding experiments. Knowing the optimal schedule for learning could be an effective and time-saving strategy. Optimal learning schedules not only improve memory; they can also improve metacognitive monitoring and, therefore, the regulation of study behaviour. Students who accessed course material continuously through the semester showed better metacognitive learning outcomes and a higher accuracy of confidence judgements (Barenberg et al., 2018).
We think that two hypotheses can be tested. First, based on Karpicke and Roediger (2007), it is possible that delayed testing helps to strengthen fragile memory traces, resulting in less forgetting. This may benefit the wakefulness group in particular and the sleep group less, because sleep by itself already strengthens fragile memory traces. Consequently, we would not expect a difference in forgetting between the wakefulness and sleep group when delayed testing is in place. On the other hand, the sleep benefit over wakefulness would occur in the immediate testing condition because no additional boost for the wakefulness group occurs here. We call this the wakefulness aid hypothesis.
Second, it may be the case that, besides the positive effect of delayed testing on memory, sleep still has an additional effect on memory. In this case, we would expect main effects of the timing of the practice test and of sleep versus wakefulness (with less forgetting occurring after sleep than after a period of wakefulness), but no interaction.
Overview of the Experiments
We conducted a pilot experiment in the laboratory and a main experiment on the Internet. In both experiments, we manipulated the timing of the practice test (immediate vs. delayed) as well as whether practice was followed by an interval including or excluding sleep. During the initial learning session, participants studied foreign vocabulary and took a practice test without feedback immediately or two hours later. After a 12-hour retention interval, we assessed memory performance of the vocabulary and calculated forgetting during the 12-hour interval between practice and final test performances (see Figure 1 for an overview).

Figure 1. Schematic presentation of all conditions in the pilot and the main experiment. In the 12-h wake condition (groups I and II), the learning of the material took place at 9 a.m. In the 12-h sleep condition (groups III and IV), the learning of the material took place at 9 p.m. For the immediate testing groups, the practice test followed immediately after the two learning rounds; in the delayed testing groups, the practice test started at 11 a.m. or 11 p.m. (depending on the wakefulness or sleep condition). LL = two learning cycles; PT = practice test; FT = final test.
Pilot Experiment
To obtain initial data for our research question and to test whether the strategic use of a two-hour delay between learning and practice test in the delayed test groups was appropriate, we conducted a pilot experiment in the laboratory.
Method
Participants
Thirty-eight German-speaking Psychology undergraduates at the University of Mannheim participated in this laboratory experiment. Two participants had to be excluded due to an experimenter error which had these participants complete both the immediate and the delayed test. The remaining 36 participants (26 female, Mage = 21.61 years, SDage = 2.80 years, age range = 19 to 36 years) took part in the experiment in exchange for course credits. Participants were randomly assigned to four experimental conditions and were fairly evenly distributed across experimental conditions (Nsleepimmediate = 8; Nsleepdelayed = 10; Nwakeimmediate = 9; Nwakedelayed = 9).
Design
We manipulated the delay of the practice test and sleep versus wakefulness in a 2 (condition: sleep versus wakefulness) × 2 (delay of practice test: immediate versus delayed) between-subjects design.
Materials and Procedure
Session 1. Depending on the experimental condition, participants started the experiment either at 9 a.m. (wakefulness group) or at 9 p.m. (sleep group). Participants were tested in a lab in Mannheim in groups of one to ten. All lab sessions were programmed using E-Prime 2.0 software (Psychology Software Tools, Pittsburgh, PA); the online session was programmed using PHP. Participants were instructed that the experiment consisted of different parts and were informed about the timing of the later tests. After signing consent forms, they started the experiment at the computer. They were asked to study 40 Polish-German vocabulary pairs (Undorf & Erdfelder, 2013) in order to remember the German word in a later cued recall task. Participants completed two study cycles; each cycle consisted of a randomized presentation of all 40 vocabulary pairs at a rate of 6 seconds per pair, followed by a short 2-min arithmetic distractor task. During the distractor task, simple math equations with solutions were presented and participants had 8 s to decide whether each equation was correctly solved. Afterwards, participants were either given the practice test (immediate practice test condition) or dismissed and emailed the link to the online version of the practice test two hours later (delayed practice test condition). The practice test was a cued recall test in which the Polish words were presented one at a time in random order as cues for the German translation. Participants typed the word they remembered on the keyboard. The test was self-paced, and no feedback was provided. Participants were randomly assigned to the immediate or the delayed practice test groups.
Session 2. Participants returned to the laboratory for the final cued recall test. They were asked to recall the German translations of all 40 Polish words. Again, the Polish words were presented one at a time on the screen in random order, and participants typed their answers on the keyboard. The test was self-paced, and no feedback was provided. In the present experiment, the final test session always took place either at 9 a.m. (sleep condition) or at 9 p.m. (wakefulness condition). Therefore, the retention interval (measured from the end of the practice test) was effectively 10 hours in the delayed practice test condition and 12 hours in the immediate practice test condition. This was done to simplify running the study in the laboratory. We are aware that this is a potential confounding factor. However, our data give no indication of a confounding effect, as participants in the delayed practice test condition did not generally outperform participants in the immediate practice test condition. Thus, the difference in retention interval is unlikely to have affected our results.
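The timing described above can be made concrete with a small calculation. The following is only an illustrative sketch, not part of the original procedure: it uses the nominal clock times from the text, assumes one distractor task per study cycle, and ignores the time needed for the study phase and for the practice test itself. The function name `pilot_schedule` and the calendar date are hypothetical.

```python
from datetime import datetime, timedelta

# Arbitrary calendar date; only the clock times are taken from the text.
DAY = datetime(2020, 1, 1)


def pilot_schedule(condition: str, delay: str) -> dict:
    """Return nominal practice-test and final-test times for one pilot group.

    condition: "wake" (learning at 9 a.m.) or "sleep" (learning at 9 p.m.)
    delay: "immediate" or "delayed" (practice test two hours after learning)
    """
    learning = DAY.replace(hour=9 if condition == "wake" else 21)
    practice = learning + timedelta(hours=2 if delay == "delayed" else 0)
    # In the pilot, the final test was fixed at 12 hours after learning.
    final = learning + timedelta(hours=12)
    retention_h = (final - practice) / timedelta(hours=1)
    return {"practice": practice, "final": final, "retention_h": retention_h}


# Study phase: two cycles, each with 40 pairs at 6 s plus a 2-min distractor task.
study_minutes = 2 * (40 * 6 + 2 * 60) / 60

for condition in ("wake", "sleep"):
    for delay in ("immediate", "delayed"):
        s = pilot_schedule(condition, delay)
        print(condition, delay,
              s["practice"].strftime("%H:%M"),
              s["final"].strftime("%H:%M"),
              s["retention_h"])
```

Under these assumptions, the nominal retention interval is 12 hours for the immediate groups and 10 hours for the delayed groups, which is the asymmetry discussed above.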
Analysis Strategy
To give a comprehensive overview of our data, we report recall rates as a measure of memory and memory change as a measure of forgetting. There is an ongoing debate on how to measure forgetting (see Loftus, 1985; MacDonald et al., 2006). Gais et al. (2006) measured recall and forgetting to test whether sleep affects performance differently than a period of wakefulness. Recall was measured as the number of correctly remembered words, and forgetting was ‘indicated as average individual percent change in recall score across periods of sleep and wakefulness’ (p. 261). According to MacDonald et al. (2006), forgetting represents a decline in performance across different testing times. It is not possible to measure forgetting directly; it can only be examined by analyzing performance changes over time. The authors also point out that forgetting always means an absolute decrement in performance. Therefore, it is ‘independent of initial performance level’ (p. 369). Moreover, because of this independence, different baseline performances between groups should not influence forgetting curves and are not of interest for forgetting research (MacDonald et al., 2006).
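To illustrate the difference between the two ways of quantifying forgetting discussed above, the following minimal sketch contrasts an absolute change score with a per cent change score. The function names and the example scores are hypothetical and serve only as an illustration; they are not data from our experiments.

```python
def absolute_change(practice: int, final: int) -> int:
    """Absolute memory change: final-test score minus practice-test score."""
    return final - practice


def percent_change(practice: int, final: int) -> float:
    """Per cent change in recall relative to the practice-test score."""
    if practice == 0:
        raise ValueError("per cent change is undefined for a practice score of 0")
    return 100.0 * (final - practice) / practice


# Two hypothetical learners who both lose two items across the retention interval
# but start from different baseline levels (scores out of 40 items).
low_baseline = (10, 8)
high_baseline = (30, 28)

for practice, final in (low_baseline, high_baseline):
    print(practice, final,
          absolute_change(practice, final),
          round(percent_change(practice, final), 1))
```

Both hypothetical learners show the same absolute change, whereas the per cent change differs with the baseline. This is why an absolute measure can be regarded as independent of initial performance level.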
Results
The significance level was set to α = .05 for all statistical tests.
Participants in the sleep condition reported having slept regularly during the night (M = 7.22 hours; SD = .86), whereas those in the wakefulness condition reported not having taken naps during the day. None of the participants reported alcohol intake between sessions.
Forgetting Between the Practice Test and Final Test
A 2 (test: practice test vs final test) × 2 (condition: sleep vs wakefulness) × 2 (delay of the practice test: immediate vs delayed) repeated-measures ANOVA revealed no main effect of test, F(1, 32) = 0.003, p = .95, ηp2 < .01, no significant main effect of condition, F(1, 32) = 3.65, p = .07, ηp2 = .10, and no significant interaction between test and condition, F(1, 32) = 0.99, p = .33, ηp2 = .03, between test and delay of the practice test, F(1, 32) = 1.24, p = .27, ηp2 = .04, or between condition and delay of the practice test, F(1, 32) = 0.99, p = .33, ηp2 = .03. However, there was a significant main effect of delay of the practice test, F(1, 32) = 7.73, p = .009, ηp2 = .19, showing better memory in the immediate than in the delayed practice test condition. Furthermore, there was a significant three-way interaction between test, condition and delay of the practice test, F(1, 32) = 4.71, p = .038, ηp2 = .13, indicating that memory in the two conditions (sleep vs wakefulness) was differently affected by immediate and delayed testing. Specifically, in the immediate practice test condition the standard positive sleep effect occurred: participants who slept between the end of practice and the final test showed less forgetting than participants who were awake during that time period. In contrast, in the delayed practice test condition the sleep effect was eliminated: participants in both the sleep and the wakefulness condition showed similar forgetting between the practice and the final test session. Figure 2 (lower panel) shows the mean recall rates for all four groups.
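As a plausibility check on the effect sizes reported above, partial eta squared can be recovered from an F value and its degrees of freedom via the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The sketch below is illustrative only; the function name `partial_eta_squared` is hypothetical, and the input values are the ones reported in the preceding paragraph.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared computed from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F value, reported partial eta squared) for selected pilot effects, all with df = (1, 32).
reported = {
    "condition": (3.65, 0.10),
    "delay of practice test": (7.73, 0.19),
    "test x condition x delay": (4.71, 0.13),
}

for effect, (f_value, reported_eta) in reported.items():
    computed = partial_eta_squared(f_value, 1, 32)
    # Compare the recomputed value, rounded to two decimals, with the reported one.
    print(effect, round(computed, 2), reported_eta)
```

For these three effects, the recomputed values agree with the reported ones after rounding to two decimals.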

Figure 2. Forgetting over time (based on the total number of correctly recalled items) between the practice test and the final test (immediate testing: immediately after the learning phase; delayed testing: 2 hours after the learning phase) for cued recall of the German word in the pilot (lower panel) and of the English word in the main experiment (upper panel). Error bars represent standard errors of the means.
To assess forgetting between the practice test and the final test, we also calculated memory changes by subtracting the performance on the practice test (= number of correct German translations of the Polish vocabulary) from the performance on the final test. Figure 3 shows the memory change for all four groups. We report Bayes Factors (BFs) to weigh the null and alternative hypotheses against each other. The BF indicates how likely the data are under one hypothesis compared to the other. According to Jeffreys (1961, Appendix B), Raftery (1995) and Wetzels et al. (2011), a BF between 1 and 3 can be interpreted as weak evidence, a BF between 3 and 20 as positive or substantial evidence, a BF between 20 and 150 as strong or very strong evidence, and a BF larger than 150 as very strong or decisive evidence for one of the hypotheses. Because the BF can quantify the evidence for H1 as well as for H0, an index denotes which hypothesis is favoured (BF01 for the null hypothesis; BF10 for the alternative hypothesis). All reported BFs were computed using JASP (JASP Team, 2020; Marsman & Wagenmakers, 2017) with the default prior options for the effects. The BF10 for the main effect of condition (sleep vs wakefulness) was 0.49; however, the corresponding BF01 was also only 2.04. Similarly, for the main effect of delay of the practice test (immediate vs. delayed), the BF10 was 0.56 and the BF01 was 1.79. For the interaction between condition and delay of the practice test, the BFs were BF10 = 0.84 for the alternative hypothesis and BF01 = 1.19 for the null hypothesis. Overall, the BFs were inconclusive.
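The relation between the two indices and the verbal categories can be sketched as follows. This is an illustration only: BF01 is simply the reciprocal of BF10, and the labels follow the thresholds cited above. The function names are hypothetical, and how values falling exactly on a boundary (3, 20, or 150) are assigned is an assumption of this sketch.

```python
def bf01_from_bf10(bf10: float) -> float:
    """BF01 (evidence for H0 over H1) is the reciprocal of BF10."""
    return 1.0 / bf10


def evidence_label(bf: float) -> str:
    """Verbal label for a BF >= 1, following the thresholds cited in the text."""
    if bf < 1:
        raise ValueError("pass the BF in the direction of the favoured hypothesis")
    if bf <= 3:
        return "weak"
    if bf <= 20:
        return "positive or substantial"
    if bf <= 150:
        return "strong or very strong"
    return "very strong or decisive"


# BF10 values reported for the pilot.
pilot_bf10 = {"condition": 0.49, "delay": 0.56, "condition x delay": 0.84}

for effect, bf10 in pilot_bf10.items():
    bf01 = bf01_from_bf10(bf10)
    print(effect, round(bf01, 2), evidence_label(bf01))
```

Applied to the pilot values, all three BF01 values fall into the weakest category, which is consistent with describing the pilot BFs as inconclusive.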

Figure 3. Memory change (based on the total number of recalled items) between the practice test (immediate group: immediately after the learning phase; delayed group: 2 hours after the learning phase) and final test (test session) for cued recall of the German word (given the Polish word as cue) in the pilot. Error bars represent standard errors of the means.
Discussion
We found less forgetting after a period of wakefulness for the delayed testing group in comparison to the immediate testing group. The benefit of sleep was revealed only in the immediate practice test condition. The elimination of the sleep-related memory benefit was mainly driven by a decrease in forgetting in the wakefulness group. In terms of forgetting, the sleep group was not affected by the delay of the practice test: forgetting was similar irrespective of the time of the practice test. In line with this, we found a comparable pattern of forgetting over time for the sleep and the wakefulness condition in the delayed testing condition but not in the immediate testing condition (Figure 2, lower panel). However, the BFs were inconclusive: there was clear evidence neither for the alternative hypothesis nor for the null hypothesis. Because the pilot study included only 36 participants, the sample may have been too small to yield conclusive results.
Overall, the findings from our pilot experiment are more in line with the proposed wakefulness aid hypothesis. Delayed testing presumably aided the strengthening of memory traces in the wakefulness group, whereas sleep in itself had a sufficiently beneficial effect on memory stabilization and protection, leading to less forgetting. Immediate testing revealed the typical benefit of sleep in comparison to wakefulness. However, the pilot had several limitations, which we address in turn. First, as the BFs indicated, we may have had too few participants to come to a clear decision between the alternative and the null hypothesis. Second, the final test session was 12 hours after learning for the immediate testing group, but only 10 hours after the practice test for the delayed testing group. Third, participants took the practice test either in the laboratory (immediate testing) or on the Internet (delayed testing). It is possible that the change of context in the delayed testing group disrupted processing, leading to an overall decrease in performance compared to the immediate testing group. Fourth, we did not control whether the delayed testing groups took the online test exactly 2 hours after the first learning session. Fifth, some of the Polish and German words were similar to each other and therefore easier to learn.
Experiment
The main experiment was run to test the replicability of the new finding using a larger and more heterogeneous sample of participants while remedying the above-mentioned points. The experiment differed from the pilot in three major respects: (i) we used Lithuanian–English vocabulary (Grimaldi & Rawson, 2010) instead of Polish-German vocabulary, (ii) the experiment was run on the Internet and participants were native English speakers, and (iii) the delay between the practice test and the final test was the same in all conditions and fixed at 12 hours.
Method
Participants
Sixty-three English-speaking people participated in this web-based experiment. All participants were recruited from the online platform Amazon Mechanical Turk. To obtain high-quality data, we restricted study access to people who had completed at least 500 studies on Amazon Mechanical Turk and who had an approval rate of at least 95% (see Finley, 2015). The study was accessible to participants on Amazon Mechanical Turk located in the United States or Canada. Six participants had to be excluded because they either reported being familiar with the Lithuanian language or had cheated on the memory tests (e.g. by writing down vocabulary pairs during the initial presentation and using their notes during the test). A final sample of 57 participants (39 female) remained and was included in the analyses (Nsleepimmediate = 14; Nsleepdelayed = 16; Nwakeimmediate = 15; Nwakedelayed = 12). Their ages ranged from 22 to 40 years (M = 32.30; SD = 5.05). Participants received a total of $1.50 for their participation in the experiment.
Materials
Materials comprised 40 Lithuanian–English word translations of similar difficulty taken from Grimaldi and Rawson (2010). All translations were unfamiliar to the participants. Participants were asked to study the English translation of each Lithuanian word.
Design
The experiment used a 2 × 2 between-subjects design with the factors delay of the practice test (immediate vs. delayed) and condition (12-h wake versus 12-h sleep). The final test session followed 12 hours after the practice test regardless of whether it was immediate or delayed. Although we did not find any effect of slightly different retention intervals in the pilot, we felt that it would be experimentally cleaner to hold the retention interval constant across conditions. In addition, the links leading to the study were active only for a short time, so it was not possible to complete either the learning or the testing sessions later or earlier. Moreover, this time the entire experiment was conducted online. Therefore, all sessions were comparable between the groups with regard to context.
Procedure
The procedure of the present experiment was the same as that of the pilot. The only major difference was that participants used their own devices to access and run the experiment through their web browsers. To make sure that participants logged on to the experiment at the correct times, automated reminder emails were sent to them at the respective times. The experiment was programmed in HTML, PHP, and JavaScript, and participants could access the study only at their predetermined times and only for a short time.
Results
The significance level was set to α = .05 for all statistical tests. Again, we report recall rates as a measure of memory and memory change as a measure of forgetting. Participants in the sleep condition reported having slept regularly during the night (M = 6.9 hours; SD = 1.34) whereas participants in the wakefulness condition reported not having taken naps during the day.
Forgetting Between the Practice Test and Final Test
A 2 (test: practice test vs final test) × 2 (condition: sleep vs wakefulness) × 2 (delay of practice test: immediate vs delayed) repeated-measures ANOVA revealed no main effect of test, F(1, 53) = 1.80, p = .19, ηp2 = .03, no significant effect of condition, F(1, 53) = 0.63, p = .43, ηp2 = .01, and no significant effect of delay of the practice test, F(1, 53) = 0.29, p = .60, ηp2 = .01. There was a significant interaction between test and condition, F(1, 53) = 5.95, p = .02, ηp2 = .10, and between test and delay of the practice test, F(1, 53) = 6.15, p = .02, ηp2 = .10, but not between condition and delay of the practice test, F(1, 53) = 0.19, p = .67, ηp2 = .003. Again, there was a significant three-way interaction between test, condition and delay of the practice test, F(1, 53) = 6.15, p = .02, ηp2 = .10. Figure 2 (upper panel) shows the mean recall rates for all four groups.
As in the pilot, we calculated the BFs for the memory change to corroborate our results. Figure 4 shows the memory change for all four groups (sleep, immediate: M = 0.71; sleep, delayed: M = 0.25; wake, immediate: M = -4.07; wake, delayed: M = 0.75). The BF10 for the main effect of condition (sleep vs wakefulness) was 14.12 (BF01 = 0.07), showing that the data were about 14 times more likely under the alternative hypothesis than under the null hypothesis and indicating that, in comparison to wakefulness, sleeping was more beneficial for memory. The BF10 for the main effect of delay of the practice test (immediate vs delayed) was 10.78 (BF01 = 0.09), indicating substantial evidence for the alternative hypothesis and showing that immediate testing led to more forgetting in the final test. Moreover, the BF10 for the interaction between condition and delay of the practice test was 18.99 (BF01 = 0.05), which also indicated substantial evidence for the alternative hypothesis. It seems that the standard sleep effect, i.e. less forgetting in the sleep condition than in the wakefulness condition, occurred in the immediate practice test condition only.
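The pattern behind the interaction can be read directly from the four reported group means. The following sketch is purely descriptive and not part of the reported analysis: it computes the simple sleep effect (sleep minus wakefulness) within each delay condition and their difference as an interaction contrast. The variable names are hypothetical.

```python
# Mean memory change (final test minus practice test) per group, as reported above.
means = {
    ("sleep", "immediate"): 0.71,
    ("sleep", "delayed"): 0.25,
    ("wake", "immediate"): -4.07,
    ("wake", "delayed"): 0.75,
}

# Simple sleep effect within each delay condition (positive = less forgetting after sleep).
sleep_effect_immediate = means[("sleep", "immediate")] - means[("wake", "immediate")]
sleep_effect_delayed = means[("sleep", "delayed")] - means[("wake", "delayed")]

# Interaction contrast: how much the sleep effect differs between the delay conditions.
interaction_contrast = sleep_effect_immediate - sleep_effect_delayed

print(round(sleep_effect_immediate, 2))
print(round(sleep_effect_delayed, 2))
print(round(interaction_contrast, 2))
```

Descriptively, the sleep group retained almost five items more than the wakefulness group after immediate testing, whereas the difference after delayed testing was about half an item in the opposite direction.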

Figure 4. Memory change (based on the total number of correctly recalled items) between the practice test (immediate group: immediately after the learning phase; delayed group: 2 hours after the learning phase) and final test (test session) for cued recall of the English word (given the Lithuanian word as cue) in the main experiment. Error bars represent standard errors of the means.
Discussion
Our experiment replicates and extends the results of the pilot to a more heterogeneous population, to word material of a different language, and to a different experimental setting.
Forgetting over time for the sleep and the wakefulness conditions shows a similar pattern in the delayed testing condition but differs in the immediate testing condition (with more forgetting in the wakefulness group; Figure 2, upper panel). The successful replication and extension of our finding supports the proposed wakefulness aid hypothesis.
General Discussion
In our experiments, we tested the effect of delaying a practice test on forgetting foreign vocabulary during a sleep- or wakefulness-filled interval of up to 12 hours. We found the standard beneficial sleep effect when participants were tested on the material immediately at the end of the learning session: a subsequent period of sleep led to less forgetting than a subsequent period of wakefulness. This replicates earlier findings by Gais et al. (2006, 2007). However, when the practice test was delayed by two hours, forgetting was comparably low in both the sleep and wakefulness conditions. A delayed practice test decreased forgetting in the wakefulness condition while leaving forgetting in the sleep group unaffected. Overall, during the practice test, the immediate test groups showed better performance than the delayed testing groups. However, forgetting in the immediate wakefulness group led to the recall of fewer vocabulary items on the final test compared to the delayed wakefulness group. For the sleep group, the difference between the immediate and the delayed testing condition persisted.
We were able to replicate the findings from the pilot study using a different experimental setting, another population of participants, and different vocabulary materials. Based on this, we think that these new findings are of theoretical as well as practical relevance. As discussed above, Bäuml et al. (2014) found that the beneficial effects of testing reduce the beneficial effect of sleep. The authors discussed how retrieving information strengthens items to a much higher degree than restudying. This argument is in line with the bifurcation model, which assumes that testing strengthens successfully retrieved items to a high degree while leaving non-retrieved items unaffected (Halamish & Bjork, 2011). Because of this, additional sleep-induced strengthening of the items may not improve recall further. In contrast, Abel et al. (2019) found benefits of sleep on recall after retrieval practice, but only if it was combined with feedback. However, this does not contradict the theoretical explanation from Bäuml et al. (2014). Initially non-retrieved items can be lifted above the recall threshold after corrective feedback (Pastötter & Bäuml, 2016). In the time between learning and the final test, two things can then happen: these items can again fall below the recall threshold (time-dependent forgetting), or they can benefit from sleep-associated strengthening, thereby creating a positive effect of sleep on memory. Therefore, sleep no longer modulates the testing effect (Abel et al., 2019).
A similar argument can also explain our results: a delayed practice test considerably strengthened fragile memory traces (Karpicke & Roediger, 2007), pushing them above the recall threshold, which becomes particularly important when a longer period of wakefulness follows. Items retrieved during the delayed practice test seem to be strengthened to a higher degree than items retrieved immediately after learning, and they remain above the recall threshold. Sleep, presumably through active consolidation processes (see Diekelmann & Born, 2010; Marshall & Born, 2007; Stickgold & Walker, 2013), is per se sufficient to ensure little forgetting of newly acquired foreign vocabulary. Consequently, delaying the practice test had a differential effect on forgetting in the sleep versus wakefulness condition: it neither increased nor decreased sleep-related forgetting, but rather helped in maintaining recently studied material through a period of wakefulness. For the immediate practice test condition, the typical benefit of sleep over wakefulness was found. Sleeping after the test helped to maintain this material for the final test. For the wake group, however, the items retrieved during the immediate practice test were still subject to decay over time. Participants were not able to maintain the learned material in their memory (a finding in line with Karpicke & Roediger, 2007, Exp. 3).
A caveat with regard to the present findings may be that we did not control the timing of sleep onset. There are different possibilities as to how this could have affected our results: (a) participants in the immediate practice condition could have gone to bed soon after encoding and, therefore, slept longer than participants in the delayed condition, or (b) for the delayed practice condition, sleep onset occurred sooner after the practice test. For both experiments, we asked our participants during the final test session at approximately what time they had fallen asleep the night before. For the pilot, the mean reported time was around 12:13 a.m. for the delayed group and around 11:44 p.m. for the immediate group. For the main experiment, the mean reported time was around 11:56 p.m. for the delayed group and around 11:10 p.m. for the immediate group. Based on these data, scenario (a) seems to be unlikely. However, it was indeed the case that there was less time between the practice test and sleep in the delayed than in the immediate sleep condition. Nevertheless, there was no benefit of sleep; our main finding was that the wakefulness group showed less forgetting in the delayed condition than in the immediate condition.
Many studies have shown that retrieval practice and intervals between study and test that are filled with sleep are both beneficial for memory. However, there is also the question of how long-lasting these effects are. For the testing effect, less forgetting was found after two days (Thompson et al., 1978; Wenger et al., 1980), after seven days (Roediger & Karpicke, 2006; Wheeler et al., 2003), and even after 42 days (Carpenter et al., 2008). The persistence of the effect of sleep on memory seems to be more complicated. There are studies reporting positive effects of sleep even after delays longer than 12 h (see, for example, Gais et al., 2006; Griessenberger et al., 2012; Stickgold et al., 2000). Abel et al. (2019), however, were not able to find a sleep benefit after 24 h or 7 days. There is the possibility that sleep makes an active as well as a passive contribution to memory and that this may vary with the experimental task (Abel et al., 2019). Our study cannot draw any conclusions about the persistence of our findings after a longer period of time. This should be investigated in more detail in future work.
Interestingly, in several experiments, Abel et al. (2019), Bäuml et al. (2014), and Antony and Paller (2018) found no sleep benefits on memory after immediate tests without corrective feedback. In contrast, in the study presented here, we observed a benefit of sleep for such an immediate test. However, in all of the experiments discussed above, participants had only one initial study cycle before they started a varying number of practice tests. In our study, they had two study cycles before engaging in one practice test. It is possible that this additional study cycle improved recall for both the sleep and the wake group. However, when recall is improved, there is also a greater risk of forgetting. We think it is possible that by increasing recall with two study cycles, we also gave the wake group room to forget more than the sleep group. As discussed above, sleep in itself helps to strengthen fragile memory traces, thereby resulting in less forgetting.
At the beginning of our article, we asked whether the timing of a practice test makes a difference and, if so, when exactly it should optimally be taken. The new and important finding of this study is that wakefulness after encoding does not always lead to more forgetting. When retrieval practice is delayed, it is possible to remember as much on the final test as on the practice test. Given that it is not always possible to learn right before sleeping, our study shows that it is possible to compensate for the beneficial effect of sleep on memory and therefore has meaningful relevance for the learning schedule of students. Based on this, practical recommendations for students in educational settings could be the following: if students study, for example, the names of brain structures in the evening, they should test themselves before going to bed – this practice test should follow immediately after learning. However, if students decide to study during the day and anticipate a longer stretch of wakefulness, the practice test should be delayed in order to reinforce memory and reduce forgetting of the material.
Interestingly, when psychology students are taught about learning and memory, they often rely on inappropriate strategies to learn this information for later tests. Many college students have low metacognitive awareness of which learning strategies are really beneficial for memory (McCabe, 2011). Unfortunately, the strategies known to be very effective for long-term learning, like spacing (Rohrer & Pashler, 2007) or testing (Roediger & Karpicke, 2006), often make learning very slow (so-called desirable difficulties, Bjork, 1994) and are therefore avoided even by psychology students. To increase awareness of the beneficial effects of these strategies on memory, classroom demonstrations should be used (McCabe, 2014). Our study could be used as a classroom demonstration after the discussion of the beneficial effects of spacing, testing, and sleep on long-term memory. To increase the motivation of the students, it would also be possible to use material relevant for them (e.g. names of hormones, brain structures etc.) and test the influence of this material on the effects found in our studies.
Such a classroom demonstration could also be used to increase the empirical basis for evaluating our effect, including its generalizability (Balch, 2006). Further research still has to investigate the robustness of our findings, the optimal delay between studying and the first practice test, the generalizability to more complex material, and the influence of motivation.
Footnotes
Acknowledgements
We thank David Balota for helpful comments on a previous draft of this article.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The second author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG).
