Abstract
There is evidence to suggest that variations in difficulty during learning can moderate long-term retention. However, the direction of this effect is under contention throughout the literature. According to both the Desirable Difficulties Framework (DDF) and the Retrieval Effort Hypothesis (REH),
Keywords
Highlights
Desirable difficulties framework and cognitive load theory oppose one another, prompting comparisons between the two.
Both agree memory is modulated by successful schema formation which requires a degree of effort.
A new model is put forward that emphasises the need to tailor instructional design to individual learners and circumstances.
Introduction
When learning something new, there are various strategies we can employ to enhance memory retention, such as association with familiar objects or repetitive recitation. In the educational context, it is essential to identify the most effective learning methods for increasing long-term retention and optimising educational programmes. This review aims to investigate the impact of difficulty during learning on long-term retention, specifically contrasting two opposing frameworks: the Desirable Difficulties Framework (DDF) and the Cognitive Load Theory (CLT). The DDF suggests that introducing certain difficulties during learning can enhance long-term retention (Bjork, 1994; Bjork & Bjork, 2020). This idea is also supported by the Retrieval Effort Hypothesis (REH), which was developed from the DDF with a focus on retrieval learning. The CLT, by contrast, posits that excessive difficulty (cognitive load) inhibits retention (Howard et al., 2015; Örün & Akbulut, 2019; Sweller, 1988). To date, the two conflicting frameworks have yet to be compared, despite sharing commonalities. After reviewing the current literature, this article aims to provide an updated framework on difficulty in learning.
DDF
The DDF, developed by Bjork (1994), suggests that an effective way to improve long-term retention is to introduce a desirable amount of difficulty (effort) while learning. That is, during encoding, the task used should strike a balance between difficulty and achievability. According to this idea, while initial performance on a difficult task may be poorer, greater improvements can be seen in the long term compared to simpler tasks. There are several ways in which difficulty can be induced during learning. One of these is spacing, where study sessions are spread out over time, as opposed to cramming the information into a short period of time. Spacing requires more effort to remember the information over a longer period of time, which helps to strengthen the memory trace. Another method of learning that induces an increased amount of difficulty is retrieval-based learning (RBL), which will be discussed in more depth below.
RBL
The concept of learning via retrieval has had its place in the literature since the early 20th century (Abbott, 1909; Gates, 1917; Spitzer, 1939). RBL has been researched extensively, and it relies fundamentally on repeated testing of the participant on the subject material (Roediger & Butler, 2011). Typically, the participant is first shown the target stimuli (known as a study phase) before they move on to the next stage, the testing phase. Following this, there may be various patterns of study and test phases before a final test. In an experimental situation, this final test often measures the efficacy of RBL compared to, say, study alone. An example of this pattern in an experimental group may be: Study, Test, Study, Test (STST), compared to a control study-alone group (SSSS). Learning via retrieval has consistently been shown to be more effective than studying the material alone (Carpenter et al., 2008; de Lima et al., 2020; Fazio & Marsh, 2019; Karpicke & Aue, 2015; Karpicke & Grimaldi, 2012; Karpicke & Roediger, 2008; Kornell et al., 2011; Roediger & Karpicke, 2006). Furthermore, beyond healthy populations, RBL has also been shown to be more effective than repetitive study in language-impaired populations (de Lima et al., 2021), and when spacing is implemented (increasing difficulty), results improve relative to massed study (Middleton et al., 2016, 2019). In an experimental setting, the difference in performance between the retrieval group and the study-only group is known as the “testing effect.”
REH
Consistent with Bjork’s (1994) DDF, the REH states that the more difficult a retrieval is, the more effort it requires from the learner, thereby increasing the probability that the material will be consolidated in memory and retrieved at a later date. Several studies have directly tested the REH, each finding evidence in support of this notion (Carpenter & DeLosh, 2006; Karpicke & Roediger, 2007b; Pyc & Rawson, 2009).
When testing the REH, it is important to understand how effort, a particularly subjective concept, can be manipulated by the experimenter. These variations affect the cognitive processes occurring simultaneously with learning. In the retrieval learning literature, there are differences in how both task difficulty (the way in which the learning material is presented; Carpenter & DeLosh, 2006; Kang et al., 2007; Pyc & Rawson, 2009; Stenlund et al., 2016) and item difficulty (the learning material itself; de Lima et al., 2020; Minear et al., 2018; Pyke et al., 2023; Vaughn et al., 2013) can affect final performance on a memory task. It is yet to be clearly established whether the increase in effort elicited by these variables is a direct manipulator or whether effort itself indirectly affects performance on a task. Indeed, one criticism of the REH is that it is a purely descriptive account (Karpicke et al., 2014) and fails to explain why an increase in effort may produce memory benefits. Below, we outline how difficulty can be increased during learning (which increases relative effort) and studies that provide support for the DDF and REH.
Task difficulty
The majority of studies investigating how difficulty can moderate later memory performance have manipulated the tasks that are used during both initial testing (learning phase) and final testing sessions (to assess the efficacy of the learning task). Most commonly, the variance is in the type of task administered. These typically employ either recognition, cued recall or free recall, during either the learning or final test phase. Both cued and free recall are said to be more difficult (and thus induce more effort) because they require the participant to

Figure 1. Variations in effort induced by different types of learning tasks: (a) recognition task, (b) cued recall, and (c) free recall.
As well as the type of test used during learning, it is also possible to manipulate task difficulty by increasing or decreasing intervals between retrieval trials, known as
Regardless of the spacing, there is evidence to suggest that
The reasoning behind why the task type can induce greater retention may be explained by the Perceptual Load Theory (PLT; Lavie & Dalton, 2014). The PLT suggests that performance on a task is greater when the perceptual load related to that task is higher (i.e., more attentional resources are dedicated towards it). This theory seems to align with the notion of the REH, in that the higher the degree of difficulty, the higher the degree of attention focused on it.
Practical examples of PLT related to retrieval learning can be found in research investigating how divided attention interacts with learning. This research typically involves applying a secondary cognitive task concurrently with a primary learning task. Studies that have done so using typical (study-only) memory tasks with separate encoding and retrieval phases found that performance on the primary learning task was affected by the secondary task during encoding but not during retrieval (N. D. Anderson et al., 1998; Craik et al., 1996). This implies that the encoding phase during a study-only intervention is susceptible to interference from a secondary task, due to a lack of attentional focus. However, these studies do not indicate whether encoding via retrieval would also suffer from interference.
Mulligan and Picklesimer (2016) investigated this question by comparing an RBL task (cued recall) with a study-only group under either full attention or divided attention. They found that the testing effect was greater under divided attention than full attention (to reiterate, the “testing effect” is the difference in performance between the RBL group and the study-only group). This suggests that the RBL group was more resilient to divided attention than the study-only group. The same was shown in two experiments using free recall (Buchin & Mulligan, 2017, 2019), and this held with both shorter and longer word lists, the latter of which required more effort.
These findings support the use of the PLT to explain why RBL is resistant to divided attention: RBL requires considerable perceptual load, and thus perceptual capacity is exhausted. This then reduces the likelihood that irrelevant distractors will interfere with the primary task, potentially increasing the likelihood of successful retention. The PLT may also more broadly explain why an increase in difficulty during RBL tasks can provide superior long-term retention compared to easier tasks. As perceptual load is at capacity, full focus is given to the learning task, allowing for an uninterrupted consolidation process.
Item difficulty
The majority of the learning material used to assess the DDF and REH is simple word pairs or sentences. In any memory task utilising word pairs as stimuli, there is likely to be a difference in how each item is encoded and recalled. This notion can be quantified as item difficulty. In the literature, several previous studies have rigorously tested each item. For example, Cho et al. (2020) collected normative data for Chinese-English word pairs, requiring participants to partake in three study-test cycles for 160 word pairs. They found that Chinese characters with a higher number of strokes (visual complexity) were less likely to be recalled in a test phase. This type of normative study, among others (de Lima & Buratto, 2021; Grimaldi et al., 2010; Nelson & Dunlosky, 1994; Pyke et al., 2023), allows future research to utilise individual items differently. This can be useful to ensure an equal spread of difficulty between groups, reducing the likelihood that one condition will contain easier items than the other. These normative data have also been used in studies relating to the REH. Proponents of the DDF/REH state that difficult items are more likely to benefit from retrieval learning than easier items as they require more in-depth processing and attentional resources, similar to the idea put forward for cued and free recall over recognition. This was tested by de Lima et al. (2020), who, over two experiments, compared study versus test conditions and easy versus difficult items, with a follow-up cued recall test 48 hours later. Their findings in Experiment 1 showed a strong retrieval practice effect, with easier items recalled more than difficult items. The authors put this down to the lack of correct retrievals during the learning phase, something that is required under the DDF. In the second experiment, additional learning sessions were employed for difficult items (six compared to four for easy items). The final test results displayed a non-significant trend towards a greater retrieval practice effect for difficult items. This suggests that item difficulty and perceived effort may contribute towards final retention, although the evidence is inconclusive. Future studies that seek to replicate the study above could consider employing further learning sessions for difficult items, perhaps with a performance-based ending threshold. While there has been less research on the REH’s explanation of how item difficulty can affect long-term retention, the individual variability of participants, such as educational background or language, as well as previous semantic representations developed throughout the lifespan, will undoubtedly influence how difficult an item is perceived to be. Therefore, the REH perspective on item difficulty, without consideration of the potential covariates highlighted above, requires further exploration before a conclusion can be made.
Relating to item difficulty, the Elaborative Retrieval Hypothesis states that during RBL, the individual activates elaborative information to help with retrieval of the target response. Using paired-word associates, Carpenter (2009) tested this theory by presenting participants with a cued recall test containing either strongly (e.g., Toast: Bread) or weakly (e.g., Basket: Bread) associated cues, or a restudy opportunity. While strongly associated pairs were easier to learn, scoring higher on an initial test, weakly associated pairs were better remembered on a final free-recall test due to activation of elaborative information during learning (see Figure 2). This notion is in line with the REH as it suggests that, because more effort is required for the weakly associated pairs, these items benefit more from the retrieval practice effect.

Figure 2. A visual example of how harder cues during encoding can facilitate the creation of more semantic mediators to aid in subsequent retrieval.
Support for the Elaborative Retrieval Hypothesis has also been shown in a previous study by Carpenter and DeLosh (2006), even when controlling for item difficulty (Experiment 3), and in a more recent study (Endres & Renkl, 2015), where, interestingly, the authors stated that the testing effect disappeared when statistically controlling for mental effort. Consistent with the Elaborative Retrieval Hypothesis, they claimed that increased mental effort, as measured subjectively on a sliding scale, leads to spreading activation, which in turn is an indicator of semantic elaboration. Spreading activation can be defined as the implicit creation of new cues in memory, which can then be utilised during retrieval (J. R. Anderson, 1983). This semantic elaboration can aid future retrievals and can even spread to material that has not been initially tested, a phenomenon known as retrieval-induced facilitation (Chan, 2009; Chan et al., 2006; Oliva & Storm, 2023; Rowland & DeLosh, 2014). A difficult task will lead to more mental effort and thus a greater amount of spreading activation, which in turn will strengthen the memory trace, allowing for multiple cue points (Carpenter, 2009). By contrast, a comprehensive study (7 experiments) by Lehman and Karpicke (2016) tested whether semantic mediators (cues linking to target information) related to the target word (e.g.,
So far, this review has presented theories that, for the most part, are consistent with the DDF. These theories suggest that by increasing difficulty, either by manipulating the type of task or the difficulty of items during the encoding phase of RBL, one can improve subsequent performance on a final test. While the task type (recognition, cued or free recall) employed during learning seems to have an undisputed effect on final test performance, the roles of item difficulty and the generation of semantic mediators are still open to debate.
In the next section, this review will address a different type of load, cognitive load, as described by the CLT. This theory is often discussed in the context of PLT, with elements of the PLT said to be related to external aspects and the CLT to internal mechanisms of attention and learning. The CLT proposes that an increase in mental effort (comparable to the previously discussed difficulty) can be detrimental to learning. We provide a description of the key components of the CLT and how it claims optimum learning is achieved, while also drawing contrasts and comparisons to the DDF and REH.
CLT
CLT details another explanation for the role of difficulty in learning. More specifically, CLT aims to explain the link between the processing load (i.e., cognitive load) induced by learning tasks and students’ ability to manage novel information to subsequently build knowledge in the form of long-term memory (Sweller, 1988; Sweller et al., 1998). The theory rests on three key assumptions. First, that working memory has a limited capacity and consists of multiple partially independent subsystems. Second, that long-term memory has an unlimited capacity and consists of schemas that categorise information based on how it will be used (Chi et al., 1982). These two assumptions form a third, which is that learning is most effective when instructional procedures
Beyond these foundational assumptions, while cognitive load can generally be defined as the amount of working memory resources employed to perform a task, CLT distinguishes between three different types (Sweller et al., 1998, 2019), each of which is reviewed below.
Intrinsic load
Intrinsic load refers to the inherent difficulty of the learning material, which is determined in large part by
Extraneous load
Extraneous load is related to instructional design, which concerns how a task is formatted and presented to the learner. As with intrinsic load,
Germane load
Finally,

Figure 3. A graphical representation of schema construction. This notion is acknowledged in both DDF/REH and CLT literature as a crucial process for learning.
More recently, the concept of germane “load” has been challenged in the literature. Initially, germane load was conceptualised as the mental effort dedicated to schema construction, considered beneficial for learning. However, this approach created ambiguity, as it overlapped with intrinsic load and suggested that increasing germane load would always enhance learning. Both intrinsic and germane loads are involved in processing the essential elements of learning tasks, making it difficult to clearly distinguish between the two. For instance, when learners engage deeply with complex material, the cognitive effort they expend can be seen as contributing to both the intrinsic and germane loads, blurring the lines between them (Greenberg & Zheng, 2023; Kalyuga, 2011). To address these issues, the concept of germane resources was introduced, emphasising the allocation of cognitive resources towards effective learning processes rather than viewing it as an additional load (Sweller, 2023). This shift aligns better with the finite capacity of working memory, focusing on optimising instructional design to maximise the use of cognitive resources for productive learning activities (Kalyuga, 2011; Sweller et al., 2019). As a result, the emphasis has moved from simply increasing germane load to ensuring that learners’ cognitive efforts are directed towards the most meaningful and supportive tasks in the learning process (Sweller et al., 2019).
CLT in practice
Empirical evidence for CLT originates largely from studies that provide support for various effects set forward by the theory (Sweller et al., 1998), some of which will be discussed here. The
The
The
The review of the CLT presented above has indicated how the different types of load may be either detrimental (extraneous and intrinsic) or beneficial (germane) to learning. There appears to be considerable support for the practical applications of CLT; however, this support is dated, and the theory would therefore benefit from more recent evidence.
It is evident that the DDF (including REH) and CLT offer contrasting perspectives on how difficulty affects learning. While the DDF/REH argue that increasing difficulty during learning enhances long-term retention, the CLT suggests that, in many cases, reducing difficulty leads to better learning outcomes and, consequently, improved retention. However, these theories are often applied in different contexts, utilising different types of learning materials. In addition, CLT emphasises the importance of considering learner expertise and adapting instructional design to maximise efficiency. In the next section, a model is proposed that integrates these perspectives, aiming to optimise learning by strategically adjusting difficulty based on the nature of the material and the learner’s level of expertise.
Working towards a new model of difficulty in learning
This review sought to evaluate the role of difficulty in learning by examining the DDF and CLT, with a focus on how task difficulty impacts learning outcomes. Research shows that task difficulty moderates the effects seen in studies on the DDF and the REH (see Table 1). Free and cued recall tasks yield the highest testing effect when implemented during encoding, even when there is a mismatch between the initial and final test type (e.g., free recall followed by recognition; Endres & Renkl, 2015). This is likely due to the development of stronger memory traces and retrieval schemas during encoding, which can then facilitate easier recall when required (Zaromb & Roediger, 2010). In CLT, the
Table 1. Studies supporting increasing difficulty during learning.
The DDF posits that difficulty must strike a balance between challenge and achievability for it to be desirable (Bjork & Bjork, 2020). If a task is too difficult due to a learner’s lack of ability, the difficulty becomes undesirable. In such cases, corrective feedback can reduce difficulty and prevent disengagement (Binks, 2018; Kornell et al., 2011). Similarly, CLT suggests that instructional design should consider the learner’s prior knowledge, with more guidance needed for novices than for experts (Chen et al., 2023). Both theories, therefore, agree that individual differences should inform the design of learning tasks.
Based on these insights, a new model of difficulty in learning is proposed that integrates DDF (including REH) and CLT (see Figure 4).

Figure 4. A model to incorporate the DDF and CLT. The top section of the diagram follows the CLT in that it is necessary to reduce difficulty so as not to exhaust working memory (WM) capacity. The bottom section of the diagram suggests a new model that utilises the PLT to explain why easier learning material with lower element interactivity will benefit from an increase in difficulty. Both theories agree that the overall aim is to encourage successful schema formation.
The proposed model of difficulty in learning seeks to integrate and extend the principles of the DDF (including REH) and CLT, by incorporating insights from PLT. This model emphasises that the effectiveness of learning tasks depends not just on the inherent difficulty of the material but also on the strategic modulation of difficulty to optimise cognitive resources and attention.
When dealing with learning material low in element interactivity, such as reading text or memorising word pairs, the perceptual load is naturally low, making the learner more susceptible to distractions like mind wandering. In such cases, increasing task difficulty through methods like testing (e.g., cued or free recall) can help to allocate more attentional resources to the task, thereby reducing interference and enhancing learning outcomes. This approach not only heightens attentional focus but also strengthens memory traces and promotes the formation of schemas, leading to better retention compared to passive study methods. This principle aligns with the Elaborative Retrieval Hypothesis, which suggests that more effortful retrieval of weakly associated pairs improves long-term memory (Carpenter, 2009; Carpenter & DeLosh, 2006; Endres & Renkl, 2015; Rawson et al., 2015).
Conversely, when the learning material is high in element interactivity, such as in complex problem-solving or mathematical tasks, the cognitive and perceptual loads are significantly higher. In these scenarios, increasing difficulty could overwhelm the learner’s working memory, leading to ineffective learning or even cognitive overload. To avoid this, it would be most effective to decrease difficulty by providing worked examples and scaffolding, which help to reduce cognitive load and allow working memory to be used more efficiently (Chen et al., 2016a). This approach facilitates schema formation through the borrowing and reorganising principle, enabling learners to develop effective strategies without the strain of excessive cognitive load (Paas & van Merriënboer, 2020; Sweller & Sweller, 2006).
It is also important to take into account the expertise of the learner. For novice learners, increasing the difficulty of low-element-interactivity material can be beneficial, as long as the task remains achievable and is supported by corrective feedback to prevent frustration or disengagement. However, for expert learners dealing with low-element-interactivity material, a ceiling effect may occur, where additional difficulty does not yield further benefits. For novices dealing with high-element-interactivity material, reducing difficulty through guided instruction and worked examples is essential to avoid overwhelming their cognitive capacities and to facilitate effective learning. Expert learners facing high-element-interactivity material are more likely to benefit from tasks that require active problem-solving and critical thinking, as they have already developed schemas that allow them to handle complex information more efficiently. In such cases, increasing the difficulty would require more attentional resources (as per the PLT) and promote the refinement of their knowledge.
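The four combinations of learner expertise and element interactivity described above can be summarised as a simple lookup. The sketch below is illustrative only: treating expertise and element interactivity as binary categories is a simplifying assumption, and the names `Expertise`, `Interactivity`, `Adjustment` and `recommend` are hypothetical labels introduced here rather than part of the proposed model.

```python
from enum import Enum


class Expertise(Enum):
    NOVICE = "novice"
    EXPERT = "expert"


class Interactivity(Enum):
    LOW = "low element interactivity"
    HIGH = "high element interactivity"


class Adjustment(Enum):
    INCREASE = "increase difficulty"
    DECREASE = "decrease difficulty"
    CEILING = "additional difficulty may yield no further benefit"


# Assumption: both dimensions are dichotomised for illustration only;
# in practice they are likely to be continuous.
_MODEL = {
    # Achievable retrieval tasks supported by corrective feedback.
    (Expertise.NOVICE, Interactivity.LOW): Adjustment.INCREASE,
    # A ceiling effect may occur for experts on simple material.
    (Expertise.EXPERT, Interactivity.LOW): Adjustment.CEILING,
    # Guided instruction and worked examples protect working memory.
    (Expertise.NOVICE, Interactivity.HIGH): Adjustment.DECREASE,
    # Existing schemas allow active problem-solving tasks.
    (Expertise.EXPERT, Interactivity.HIGH): Adjustment.INCREASE,
}


def recommend(expertise: Expertise, interactivity: Interactivity) -> Adjustment:
    """Return the difficulty adjustment suggested for this combination."""
    return _MODEL[(expertise, interactivity)]


print(recommend(Expertise.NOVICE, Interactivity.HIGH).value)
```

Keeping the mapping in a table rather than in nested conditionals makes each cell of the model explicit and easy to revise as empirical evidence accumulates.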
Building upon the proposed framework, future research should aim to empirically validate the model through controlled experimental studies that manipulate task difficulty, element interactivity, and learner expertise. For instance, experiments could investigate how increasing task difficulty in low-element-interactivity tasks affects retention in both novice and expert learners. Conversely, studies could examine how reducing difficulty in high-element-interactivity tasks impacts cognitive load and schema formation. Longitudinal research assessing retention over extended periods would provide valuable insights into the durability of learning outcomes associated with these manipulations.
Another promising avenue is the development of adaptive learning systems that adjust task difficulty in real time based on the learner’s performance and the complexity of the material. Such systems could incorporate corrective feedback mechanisms to maintain optimal challenge levels, ensuring that cognitive and attentional resources are allocated effectively. Applying the model across various educational domains—such as mathematics, language learning, and science—would test its generalisability and practical utility. Integrating the framework into educational technology and e-learning platforms could facilitate personalised learning experiences, enhancing engagement and efficiency.
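As a rough illustration of what such an adaptive system might do, the sketch below adjusts a difficulty level from the learner's recent retrieval outcomes. Everything in it is an assumption made for illustration: the function name `next_difficulty`, the five-level scale, and the accuracy thresholds (0.5 and 0.85) are hypothetical and are not derived from the literature reviewed here.

```python
def next_difficulty(level, recent_outcomes, low=0.5, high=0.85,
                    min_level=1, max_level=5):
    """Suggest the next difficulty level from recent retrieval attempts.

    recent_outcomes is a list of booleans (True = correct retrieval).
    The thresholds are hypothetical placeholders, not empirical values.
    """
    if not recent_outcomes:
        return level  # No evidence yet, so keep the current level.
    accuracy = sum(recent_outcomes) / len(recent_outcomes)
    if accuracy > high:
        # Retrieval is too easy: raise difficulty, up to the maximum.
        return min(level + 1, max_level)
    if accuracy < low:
        # Retrieval is failing: lower difficulty, down to the minimum.
        return max(level - 1, min_level)
    return level  # Within the target band: hold difficulty steady.


print(next_difficulty(3, [True, True, True, True, True]))
```

A fuller system would also need to account for the element interactivity of the material and for corrective feedback, neither of which this sketch models.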
Finally, exploring individual differences and learner characteristics presents an opportunity to refine the model further. Research could examine how factors like working memory capacity, prior knowledge, motivation, and cognitive abilities interact with task difficulty and element interactivity. This approach would allow for the tailoring of instructional designs to meet the needs of diverse learner populations, including those with special educational requirements. By addressing these areas, future research can strengthen the theoretical foundations of the framework and contribute to the development of effective educational practices.
In conclusion, this new model does not reject existing theories but rather synthesises them into a more comprehensive framework. It reiterates the importance of considering both the nature of the learning material and the learner’s prior knowledge when designing instructional tasks. By individually adapting task difficulty, learning can be optimised by ensuring that cognitive and attentional resources are allocated most effectively, which in turn will promote superior long-term retention.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
