Abstract
Working memory (WM) training with the N-back task has been argued to improve cognitive capacity and general cognitive abilities (the Capacity Hypothesis of training), although several studies have shown little or no evidence for such improvements beyond tasks that are very similar to the trained task. Laine et al. demonstrated that instructing young adult participants to use a specific visualisation strategy for N-back training resulted in clear, generalised benefits from only 30 min of training (Strategy Mediation Hypothesis of training). Here, we report a systematic replication and extension of the Laine et al. study, in which 60 younger and 60 older participants completed a set of WM tasks before and after a 30-min N-back training session. Half the participants were instructed to use a visualisation strategy; the others received no instruction. The pre-post test battery encompassed a criterion task (digit N-back), two untrained N-back tasks (letters and colours), and three structurally different WM tasks. The instructed visualisation strategy significantly boosted at least some measures of N-back performance in participants of both age groups, although the strategy generally appeared more difficult to implement and less beneficial for older adults. However, the strategy did not improve performance on structurally different WM tasks. We also found significant associations between N-back performance and the type and level of detail of self-generated strategies in the uninstructed participants, as well as age group differences in reported strategy types. WM performance appeared to partly reflect the application of strategies, and Strategy Mediation should be considered to understand the mechanisms of WM training. Claims of efficient training should demonstrate useful improvement beyond task-specific strategies.
Working memory (WM) refers to cognitive functions that support the ready availability of a small amount of information on a temporary basis while we undertake ongoing actions and mental activities (e.g., Logie & Cowan, 2015). WM is viewed as a core mechanism underpinning higher-order cognitive abilities such as perception and problem-solving (Ma et al., 2014), and is related to fluid intelligence (Kane et al., 2005; Unsworth et al., 2014), reasoning ability (Conway et al., 2003; Kyllonen & Christal, 1990), and measures of cognitive control (Conway et al., 2001; Kane & Engle, 2003; Redick et al., 2011). WM also suffers pronounced, linear decline during adult ageing (Bopp & Verhaeghen, 2005; Borella et al., 2008; Park & Payer, 2006), although some aspects of WM decline faster than others; verbal WM appears least susceptible, and visuo-spatial WM most susceptible, to age-related decline (Johnson et al., 2010; Park et al., 2002). Functioning of WM abilities is important for autonomy and well-being in older adults (Tomaszewski Farias et al., 2009). Hence, when early studies suggested that repeated adaptive WM training could protect older adults from cognitive decline (e.g., Brehmer et al., 2012), there was great interest (Green & Bavelier, 2008; Klingberg, 2010; Lövdén et al., 2010; Morrison & Chein, 2011; von Bastian & Oberauer, 2014), due to the potential benefits to public health and well-being. In addition, several commercial companies have promoted WM training software, claiming scientific support for a range of benefits such as increasing IQ (Mindsparke, 2011), improving grades (Jungle Memory, 2011), and reducing day-to-day lapses of attention (Cogmed, 2011).
Contemporary approaches to cognitive training stem from evidence of neural plasticity related to cognition in both younger and older adults (e.g., Hertzog et al., 2008). The brain was likened to muscles, growing physically larger and stronger when repeatedly challenged at close to maximum currently manageable difficulty (i.e., adaptivity). Based on this analogy, researchers proposed that such challenging training of WM increases WM capacity (e.g., Morrison & Chein, 2011) by eliciting functional and anatomical changes in the brain (Dahlin et al., 2008). Such changes, they suggested, may help preserve brain integrity as we age, and produce lasting improvements in fluid intelligence, if WM and fluid intelligence rely on a shared capacity constraint (Halford et al., 2007). The attractive idea of increased WM capacity as a result of training has been referred to as the
While WM training initially appeared promising (training improved performance even on untrained, quite different cognitive tasks in healthy adults: Jaeggi et al., 2008, and children with attention-deficit hyperactivity disorder [ADHD]; Klingberg et al., 2002), subsequent research in healthy children and younger adults challenged these claims. With more appropriate experimental controls, it appeared that WM training typically improved performance on the trained task itself, as well as on other verbal and visuospatial WM tasks that were similar to the trained task, whereas far-transfer effects to reasoning or fluid intelligence were at most small and unreliable across different studies (for comprehensive meta-analyses, see Melby-Lervåg et al., 2016; Melby-Lervåg & Hulme, 2013; Schwaighofer et al., 2015; Weicker et al., 2016; see also Dougherty et al., 2016; Kassai et al., 2019; Lampit et al., 2014; McCabe et al., 2016; Redick, 2019; Simons et al., 2016). Evidence regarding the effects of training in older adults is also mixed. A meta-analysis of 13 studies indicated that WM training in healthy older adults produced large near- and far-transfer effects (Karbach & Verhaeghen, 2014). However, when Melby-Lervåg et al. (2016) replicated the meta-analysis by including only studies that compared the trained group to active controls and controlled for baseline differences, they found much smaller effects of training than originally reported. Moreover, in a recent meta-analysis of the commonly used N-back WM training by Soveri et al. (2017), the only more substantial effects following WM training were seen in task-specific near-transfer measures, that is, in tasks that were structurally similar to the trained WM task(s). In general, meta-analyses with less stringent inclusion criteria typically find both near- and far-transfer effects in older adults (e.g., Chiu et al., 2017).
It has been difficult to reach consensus regarding the effects of cognitive training due to variations in training paradigms and in what is considered an appropriate control group (see Melby-Lervåg et al., 2016; Melby-Lervåg & Hulme, 2013; Morrison & Chein, 2011; Shipstead et al., 2010, 2012).
In addition to methodological inconsistencies, different theoretical perspectives may contribute to confusion in the literature. Some theories propose that online cognition is limited by the capacity of a domain-general attentional resource or WM system (Engle & Kane, 2004), and advocates for the benefits of WM training argue that this resource can be increased by WM training, thus enhancing general cognitive abilities (Au et al., 2015; Jaeggi et al., 2008). For example, the amount of information WM can retain and manipulate is thought to constrain “fluid” intelligence, as measured by Raven’s Progressive Matrices (Jaeggi et al., 2008). According to the
In contrast, other theories view WM as involving a variety of cognitive systems, among which participants select according to task demands (Baddeley & Logie, 1999; Logie, 2011; Logie & Niven, 2012). For instance, one system may retain phonological codes, another visual codes. When tasked to remember sets of digits, participants may remember them phonologically, by their visual shapes, or using a semantic memory strategy. Therefore, performance may reflect use of different cognitive resources in different participants (Johnson et al., 2010; Logie, 2018; Logie et al., 1996, 2011; Thurstone, 1931), and crucially, participants may change how they attempt to perform a task as they see how well a given strategy works with repeated trials, or as a result of explicit instruction. Training thus might improve one particular cognitive skill, or lead to strategic recruitment of a different cognitive mechanism, with potentially different implications for transfer to other tasks. Based on studies that had indicated improved Raven’s Matrices performance following training with the commercial Cogmed WM training programme (Roughan & Hadwin, 2011), Shipstead et al. (2012) suggested that this might occur because this test used to measure “fluid” intelligence requires visual processing and matching very similar to the tasks trained with Cogmed. Thus, WM training may improve specific abilities, rather than improving some underlying intelligence “capacity.” It is also possible that training results in development of highly practiced cognitive skills so that, after training, the tasks that require these skills rely less (or not at all) on WM capacity (e.g., Gopher et al., 1989; Schneider, 1985). In this argument, the capacity of WM previously required for the untrained task is then available for other tasks, giving the misleading impression that its capacity has increased (for discussions see Gathercole et al., 2019; Logie, 2012, 2018).
Typically, adaptive training (i.e., tasks get harder as the participant improves) is associated with significantly better performance improvement than non-adaptive training (i.e., performing the task at a consistent level of difficulty; for example, Holmes et al., 2009; Klingberg et al., 2005; Olson & Jiang, 2004; Thorell et al., 2009) and is seen as a key ingredient of effective training. Interestingly, some evidence suggests that adaptive training may also affect strategy use. Post-training interviews following Cogmed training indicated that participants in an adaptive training group reported using grouping strategies significantly more than did active and passive control group participants. This was associated with larger performance gains in some of the post-tests (Dunning & Holmes, 2014). This suggested that adaptive training may be comparatively more beneficial because participants are encouraged to develop new strategies as the task gets more challenging.
Laine et al. (2018) proposed and explicitly tested one aspect of this, the
However, the Strategy Mediation and Capacity Hypotheses are not mutually exclusive. While associations of performance gains with strategies provide support for the Strategy Mediation Hypothesis, they do not rule out the possibility that training increases actual capacity of some sort. Nevertheless, Laine et al.’s (2018) finding that practising with a strategy for 30 min resulted in gains equivalent to those typically observed after 5 weeks of N-back training indicates that, for training studies to be taken seriously, they should also demonstrate that the development of a task-specific strategy by trained participants cannot alone explain improved performance. For instance, the strategy of visualising digits used by Laine et al. may be unlikely to improve general reasoning or prevent age-related cognitive decline, but it did appear to boost N-back performance greatly. Establishing the mechanisms behind training-induced performance improvements is crucial to determining whether the intended cognitive improvement has occurred, and what factors might have led to any such improvement.
Moreover, important findings should be replicated, ideally in a different lab and with a different participant sample (see Simons, 2014). Therefore, in this study, we conducted a systematic replication of Laine et al. (2018) in a different country, using an online methodology, and unlike that previous study, also recruited healthy older adults.
Similar to the original study, our purpose was not to falsify the Capacity Hypothesis. Instead, we tested the Strategy Mediation Hypothesis by investigating the roles strategy use can play in these tasks, to further explore its role as one possible source for WM training outcomes. Specifically, our research question was: what are the effects of instructed and self-generated strategy use on WM updating performance in healthy younger and older adults? We assessed this in the two age groups by testing the hypothesis that explicit instruction to use a visuospatial grouping and comparison strategy in a digit N-back task would improve performance in the trained task and in untrained N-back tasks employing different stimuli (letters, colours) in younger adults (directional; replication of findings in Laine et al., 2018; H1). Moreover, evidence suggests that older adults are not merely like poorly performing younger adults (e.g., Perfect & Maylor, 2000; Rabbitt, 2005). Instead, as noted earlier, different cognitive abilities appear to decline at different rates, and younger and older adults may use different cognitive resources when performing the same cognitive task (e.g., Johnson et al., 2010). Therefore, it is unclear whether Laine et al.’s (2018) visualisation strategy would be equally efficient in older adults, and whether non-instructed older adults would make different strategic choices than younger adults. However, healthy older adults are a target group for training, given that they might be worried about cognitive decline (e.g., Federal Trade Commission, 2016). It is therefore important to discover whether or not such training packages are likely to be beneficial. Some previous studies instructing participants to apply mnemonic techniques or strategies have found more substantial training gains in younger than in older adults (e.g., Lövdén et al., 2012; Verhaeghen et al., 1992; Verhaeghen & Marcoen, 1996; but see Gross et al., 2012).
However, older adults’ WM performance benefitted from instruction to switch to visual codes in a verbal WM task (Osaka et al., 2012). Due to a lack of background evidence on how this specific strategy actually improves performance, we hypothesised that explicit strategy instruction would affect post-test performance in healthy older adults to the same extent as in younger adults (H2). Next, we hypothesised that reported self-generated strategies (in the non-instructed group) would be associated with better memory performance on the trained N-back task and in untrained N-back tasks employing different stimuli (letters, colours) in younger adults (directional; replication of findings in Laine et al., 2018, H3), and that similar effects of self-generated strategies would be observed in the older adults as well (H4).
The four hypotheses, methods, and analyses were pre-registered via the Open Science Framework (https://osf.io/npzkc).
Method
Participants
Our pre-registered target sample size was 60 younger and 60 older adults. These numbers ensured a power of at least .95 to detect a medium effect of strategy condition on the trained N-back digit task, and a power of .80 to detect near-transfer to other N-back tasks, determined by a power analysis using G*power (Faul et al., 2007), based on effect sizes in the study we aimed to replicate (Laine et al., 2018).
We recruited a total of 136 participants: 74 younger adults who were students or former students at the University of Edinburgh, and 62 older adults who were members of a Participant Volunteer Panel, or a lifelong learning group. Two older and 13 younger adults were excluded and replaced for failing to complete all three sessions. We excluded one younger participant who reported using pen and paper in the memory tasks, and one who completed the first session twice. The final sample consisted of 60 younger adults (
Table 1. Participant exclusions by age and strategy group.
RT: reaction time.
One excluded from the training analysis.
Post-test N-back digit (1), RTs in pre-test 2-back colours (1).
Missing data in both N-back colours and RTs in pre-test 2-back letters (2).
Colour-blind participants were excluded from colour N-back task.
Procedure
We used a mixed pre- and post-test intervention design. First, participants completed a set of cognitive tasks (taking 1–1.5 hr) to assess baseline abilities. Two days later, they did a 30-min adaptive N-back task (

Figure 1. The visualisation strategy instructions for participants in the strategy groups during training.
Our procedure differed from that of Laine et al. (2018) as follows: (1) We did not include a passive control group that performed no training between pre- and post-test, because the central question concerned the presence or absence of strategy instruction. (2) While all their participants were younger-adult university-level students, we also included a group of older adults. (3) Their participants performed pre- and post-test sessions in the laboratory, while our participants completed all sessions online. (4) Our instructions and tasks were in English, theirs in Finnish. (5) We did not screen participants for health conditions (except for cognitive impairments in the older adults). Apart from these differences, our study was identical to theirs. We chose an online methodology because WM training software promoted by companies is typically intended for independent use with home computers or smartphones, and because it enabled us to test a larger number of participants. However, online testing carries a risk of less attentive or compliant participants. To minimise the impact of this, we screened for outliers and asked participants if they used external tools (e.g., writing things down) when performing the tasks.
WM training task
The strategy and control groups performed the same digit N-back training task, but the strategy group was instructed to use the strategy illustrated in Figure 1. Participants saw digits (1–9) displayed one at a time, in the centre of the screen. They responded to each digit with the N or M key on their keyboard (meaning Yes or No, respectively) to indicate whether the current digit corresponded to the digit presented
Each participant completed 20 blocks of 20 +
Strategy instruction
The strategy instruction taught participants to visualise the incoming
Expectations
Prior to starting the training session, participants reported how much they thought they would improve on the training task during the session, using a 10-point Likert-type scale (1 =
Motivation and alertness
Before the training session, participants rated their motivation to perform the tasks and alertness on scales from 1 to 5.
Pre- and post-test measures
The following six cognitive tasks made up both the pre- and post-training test sessions and were thus completed by each participant twice. This allowed us to compare the performance improvement of participants who trained using the visualisation strategy with that of the no-strategy control group.
Criterion training task
Digit N-back
This was a shortened version of the adaptive training task described above, including 10 blocks instead of 20. Dependent variables were (1) the maximum digit level the participant had reached and (2) the average N-back level.
Untrained N-back tasks (task-specific near-transfer measures)
Letter N-back
This was a non-adaptive letter N-back task (2-back and 3-back), in which participants saw sequences of letters, and responded whether a given letter was identical to one presented 2 or 3 letters back. Participants did one block of the 2-back, one of the 3-back (order randomised) each containing 48 letters. Among these, 16 were targets, 32 non-targets, and half of the non-targets were lures (i.e., a letter identical to the letter presented next to the letter participants should base their response on; 8
Colour N-back
This was identical to the letter N-back task, but coloured squares were shown instead of letters.
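The letter and colour N-back results are reported as d-prime. As a minimal sketch of how that index can be computed from the response counts in one 48-item block (16 targets, 32 non-targets), the function below uses the standard definition, z(hit rate) minus z(false-alarm rate). The log-linear correction for extreme rates and the example counts are assumptions made for illustration; the text does not state which correction, if any, was applied.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate)."""
    # Assumed log-linear correction: add 0.5 to each count and 1 to each
    # total so that rates of exactly 0 or 1 do not give infinite z scores.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical block: 12 of 16 targets detected, 4 of 32 non-targets
# wrongly accepted.
example = d_prime(hits=12, misses=4, false_alarms=4, correct_rejections=28)
```

Higher values indicate better discrimination of targets from non-targets, and a value of zero indicates chance-level responding.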
Untrained WM tasks (task-general near-transfer measures)
Selective updating of digits
In this WM updating task (Murty et al., 2011), five digits between 1 and 9 were displayed on the screen in a row of five squares. Participants attempted to memorise the digit sequence. Then, a new row of five squares replaced the initial sequence. Two of the new squares contained digits, and three were empty. Participants were to replace the old digits with the new digits while maintaining the unchanged digits in memory. Each participant completed 10 trials with three such updating stages (i.e., new digits replaced original ones) and also 10 trials without updates. Participants saw the original five-digit sequence (4,000 ms), followed by a blank screen (100 ms), and the first updating stage (2,000 ms). At the end of each trial, participants reported the final five-digit sequence by clicking on the relevant digits in a recall grid with horizontally aligned squares containing numbers 1 to 9. All digit sequences followed these rules: (1) digit updates never occurred in adjacent squares, (2) adjacent digits differed from each other by more than one (e.g., “2” could not be next to “1” or “3”), and (3) the two updated digits were never identical. Trial order was randomised between participants. The dependent variable was the percentage of correctly recalled digits (in the right order) in the updating trials.
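The trial logic described above can be sketched as follows. This is an illustrative reconstruction, not the study's task code: the function names, the use of `None` for empty squares, and the example digits are all assumptions.

```python
def apply_updates(initial, stages):
    """Return the sequence held in memory after all updating stages.

    Each stage is a list the same length as `initial`, with None marking
    an empty square (an assumed encoding).
    """
    current = list(initial)
    for stage in stages:
        for position, digit in enumerate(stage):
            if digit is not None:
                current[position] = digit
    return current


def adjacent_digits_ok(sequence):
    """Rule (2): neighbouring digits must differ by more than one."""
    return all(abs(a - b) > 1 for a, b in zip(sequence, sequence[1:]))


def percent_correct(target, response):
    """Percentage of digits recalled in the correct position."""
    hits = sum(t == r for t, r in zip(target, response))
    return 100.0 * hits / len(target)


# Hypothetical updating trial: two non-adjacent squares change per stage.
initial = [3, 7, 1, 9, 5]
stages = [
    [None, 5, None, None, 2],
    [8, None, None, 6, None],
    [None, None, 3, None, 9],
]
final = apply_updates(initial, stages)            # [8, 5, 3, 6, 9]
score = percent_correct(final, [8, 5, 1, 6, 9])   # third digit not updated
```

In this example the participant fails to apply the last update to the third square, so four of the five digits are scored as correct.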
Forward simple span
Participants were to remember sequentially presented digit sequences containing between 4 and 10 digits (one trial of each length) in order of appearance. Trial order was randomised for each participant. First, a fixation cross was shown in the middle of the screen (500 ms), followed by a digit (1,000 ms) and this procedure continued until all digits in the sequence had been presented. Then, participants recalled the digits by clicking on the correct digits (in the right order), displayed in horizontally aligned squares containing all possible digits (1–9). The dependent variables were (1) total number of correctly recalled digits in the correct serial position and (2) maximum span, that is, highest span length where all digits were recalled in the right order.
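The two dependent variables for this task can be computed as in the sketch below. The data layout (a list of presented and recalled digit lists, one pair per trial) and the example responses are assumptions made for illustration.

```python
def score_forward_span(trials):
    """Return (total digits correct in position, maximum span).

    `trials` is a list of (presented, recalled) pairs of digit lists.
    """
    total_correct = 0
    max_span = 0
    for presented, recalled in trials:
        # Count digits recalled in their correct serial position.
        total_correct += sum(p == r for p, r in zip(presented, recalled))
        # A sequence counts toward maximum span only if recalled perfectly.
        if recalled == presented:
            max_span = max(max_span, len(presented))
    return total_correct, max_span


# Hypothetical responses for three of the sequence lengths.
trials = [
    ([1, 2, 3, 4], [1, 2, 3, 4]),
    ([5, 3, 8, 1, 9], [5, 3, 8, 9, 1]),        # last two digits swapped
    ([2, 7, 4, 9, 1, 6], [2, 7, 4, 9, 1, 6]),
]
total_correct, max_span = score_forward_span(trials)
```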
Running memory
Participants were instructed to report the final four digits of sequences containing between 4 and 11 items. A total of eight trials—one trial per sequence length—appeared in random order. First, a fixation cross appeared on the screen (500 ms), then a digit (1,000 ms), until the sequence ended. Participants then selected the final four digits in the same order as they had been presented, using a recall grid with horizontally aligned squares containing numbers 1–9. The dependent variable was the total number of correctly recalled items, in the correct position.
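Scoring differs from the forward span task only in that the target is the tail of each sequence. A minimal sketch, again assuming trials are stored as hypothetical (presented, recalled) pairs:

```python
def score_running_memory(trials, n_final=4):
    """Total number of final-position digits recalled in the right place."""
    total = 0
    for presented, recalled in trials:
        target = presented[-n_final:]  # only the last four digits count
        total += sum(t == r for t, r in zip(target, recalled))
    return total


# Hypothetical responses for two of the eight sequence lengths.
running_trials = [
    ([4, 8, 2, 6, 1, 9], [2, 6, 1, 9]),  # all four correct
    ([7, 3, 5, 1], [7, 3, 1, 5]),        # last two swapped
]
running_total = score_running_memory(running_trials)
```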
The strategy questionnaire
After completing all cognitive tasks in the post-training test session, participants filled out a questionnaire about their strategy use in each task they completed in the pre- and post-training test sessions, respectively. First, they responded to whether they had used a strategy (yes or no) for each specific task during the pre-test. If yes, they were asked to describe the strategy. They then indicated whether their strategy had changed between pre- and post-training tests (yes or no). If yes, they described their post-training test strategy.
Results
Exclusions
We excluded one younger adult in the control group who reported using pen and paper in the majority of the tasks. In addition, one younger adult in the strategy group used pen and paper in one task and was excluded from that specific analysis. We excluded five participants with five or more errors on the Ishihara colour vision test from the colour N-back analyses, and four participants from specific tasks due to missing data. See Table 1 for a summary of all exclusions by age and strategy group. Our results differed from Laine et al.’s (2018) in a way we had not anticipated: many of our strategy-group participants reported that they did not use the instructed strategy during training. In the original study, only 3 of 37 (8%) strategy-group participants failed to comply with the instruction, and non-compliant participants were not removed. In this study, 6 of 31 (19%) younger adults and 11 of 30 (37%) older adults in the strategy group reported not using the instructed strategy. We had not specified in our pre-registration how we would handle non-compliant participants. However, the aim was to replicate the study by Laine et al. (2018) with a different sample and to test the effect of the instructed strategy in older adults. Including non-compliant participants would therefore invite the trivial explanation that any failure to replicate arose because too many of our participants did not use the strategy. Excluding non-compliant participants left 49 older and 54 younger adults, resulting in a power of .95 to detect the main effect on digit N-back performance observed by Laine et al. (2018) and a power of at least .80 to replicate the effects on untrained letter and colour N-back tasks. Therefore, we focused on results from compliant participants. For transparency, we present output from analyses including all participants in the supplementary materials and point out the differences.
We also conducted exploratory analyses to confirm that non-compliant participants were not a less motivated or capable subset by comparing pre-test composite scores in younger and older compliant and non-compliant strategy participants (no significant differences; see Supplementary materials). We performed all analyses in the R environment version
Background and pre-test characteristics
The control and compliant strategy groups did not differ significantly in years of education, gender distribution, or pre-test N-back composite performance in either age group (see Table 2). However, there was a significant age difference between control and strategy groups in older adults, such that participants in the strategy group were younger,
Table 2. Demographics and pre-training N-back performance.
Values in parentheses are standard deviations.
Alertness, motivation, and expectations
We assessed expected training-session improvement in participants in the strategy and control groups after the strategy participants had learned the strategy, but before starting the training. This was to check whether expectations were higher in the strategy groups, which might signal a placebo effect. There was no difference in expectations between control and strategy participants in younger,
Training session data
Figure 2 shows performance over the 20 N-back blocks during the 30-min training session in the control and strategy groups in younger (panel a) and older adults (panel b). While Laine et al. (2018) found that participants using the instructed strategy outperformed control group participants already in the fourth training block, we found no differences in the fourth block in our younger adults,

Figure 2. Performance across the 20 N-back digit training blocks, in the control and strategy groups in (a) younger and (b) older adults. Error bars represent standard errors of means.
The effects of training: pre- versus post-test performance
We tested whether training with the instructed strategy improved performance from pre- to post-training sessions on the various tasks to similar extents in the two age groups. Post-test performance was the dependent variable, pre-test performance the covariate, and strategy and age groups were between-subjects factors. See Tables 3 and 4 for pre- and post-training descriptives (means, standard deviations, pre-post correlations, and effect sizes) for each group, and Table 5 for analysis of covariance (ANCOVA) statistics. To adjust for multiple comparisons, we applied Benjamini-adjusted
Table 3. Mean values (standard deviations) for the pre-post measures per group at pre- and post-test, for younger adults.
RT: reaction time; WM: working memory.
Values in parentheses are standard deviations.
Table 4. Mean values (standard deviations) for the pre-post measures per group at pre- and post-test, for older adults.
RT: reaction time; WM: working memory.
Values in parentheses are standard deviations.
Table 5. ANCOVA results for the trained task and for the transfer measures.
RT: reaction time; WM: working memory.
To adjust for multiple comparisons, Benjamini–Hochberg adjusted
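For readers unfamiliar with the Benjamini–Hochberg adjustment used for these comparisons, the following is a minimal sketch of the step-up procedure. The p values in the example are hypothetical and are not taken from the study; the analyses reported here were run in R, so this Python version is only an illustration of the computation.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, scaling each by m / rank and
    # taking a running minimum so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p values from four tests.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

An effect is then treated as significant when its adjusted p value falls below the chosen alpha level.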
The trained N-back task with digits
A 2 (Group) × 2 (Age) between-subjects ANCOVA of maximum post-test N-back performance that controlled maximum pre-test N-back performance indicated significant main effects of strategy,

Figure 3. Average performance across the 10 blocks of the trained N-back task at pre- and post-test in the control and strategy groups in (a) younger and (b) older adults. Error bars represent standard errors of means.
As an additional exploratory analysis, we also examined the block-level improvement in the trained digit N-back task at post-test using a linear mixed-effects analysis. As in the training analysis (see section “Training session data”), the fixed effects consisted of Age Group, Strategy Group, and Block (coded both as a linear and a quadratic term) together with their interaction terms. Moreover, we included the maximum reached digit N-back level at pre-test as a time-invariant covariate to control for possible group differences prior to intervention. Participant served as the random effect. The results showed a significant Group × Age interaction (Estimate = 0.17,
Untrained N-back tasks (task-specific near-transfer)
Letter N-back
There was no significant effect of age or strategy group on d-prime in the Letter 2-back, and no interaction (all
Colour N-back
We excluded five participants with five or more errors on the Ishihara colour vision test from these analyses. There was no significant main effect of strategy group for the 2-back d-prime (
Other untrained WM tasks (task-general near-transfer)
There were no significant main effects either of age or strategy group nor any interactions for selective updating of digits, running memory, or either forward digit span measure (correctly recalled digits, or maximum span), all
Self-generated strategies and performance
We tested whether (1) the types of reported self-generated strategies and (2) the reported levels of detail of those strategies were associated with better post-test N-back performance in control group participants. Only control participants were used to obtain a “pure” measure of spontaneously generated strategies in participants who were not exposed to any strategy instruction. One older adult was excluded due to missing strategy data for N-back letters and colours. Thus, the final sample of controls included 58 participants. The types of strategies and level of detail reported in the two age groups are presented in Figure 4.

Figure 4. (a) Regression plot with level of detail of reported strategies (9 = maximum level of detail) as the independent variable (X-axis) and the N-back composite score (Y-axis) as the dependent. The N-back composite summed up post-test
Self-generated strategies: type
We classified self-generated post-test strategies according to Laine et al.’s (2018) classification scheme, based on categories used by Morrison et al. (2016). Two independent raters classified each strategy report into one of these categories: Rehearsal, Grouping, Updating, Grouping and Comparison, Semantics, Phonology, Imagery, Familiarity, Guessing, Other Strategy, or No Strategy (see Supplementary Table S5). Initial inter-rater reliability (unweighted Cohen’s kappa) for the three N-back tasks was consistent and good: trained digit N-back (κ = .79, 95% confidence interval, CI = [0.72, 0.86]), letter N-back (κ = .81, 95% CI = [0.74, 0.88]), and colour N-back (κ = .81, 95% CI = [0.73, 0.88]). The raters then resolved discrepancies through discussion until they reached consensus, producing the final strategy type classifications used in the analysis. Strategies reported by fewer than 5% of participants were grouped as “Other Strategy” (see Supplementary Table S6 for the distributions of strategy types used in the three N-back tasks at post-test). The final list comprised five categories for the digit and letter N-back (No Strategy, Rehearsal, Grouping, Grouping and Comparison, and Other Strategy) and four categories for the colour N-back (No Strategy, Rehearsal, Grouping, and Other Strategy). We tested whether N-back performance differed by strategy type using one-way analyses of variance (ANOVAs), with No Strategy as the baseline. In each model, the dependent variable was N-back post-test performance and strategy type was the between-subjects factor. Figure 4 shows N-back post-test performance as a function of strategy type at post-test for each N-back task. We did not include age as a factor given the limited number of observations, but see Figure 4 for usage by age group.
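The unweighted Cohen’s kappa reported above compares observed agreement between the two raters with the agreement expected by chance from their marginal category frequencies. A minimal sketch, using hypothetical ratings rather than the study’s data:

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters' categorical labels."""
    n = len(rater_a)
    # Proportion of items on which the raters chose the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from the product of the raters' marginal proportions.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical classifications of six strategy reports.
rater_1 = ["Rehearsal", "Grouping", "No Strategy",
           "Rehearsal", "Grouping", "Other Strategy"]
rater_2 = ["Rehearsal", "Grouping", "No Strategy",
           "Grouping", "Grouping", "Other Strategy"]
kappa = cohens_kappa(rater_1, rater_2)
```

The sketch does not compute the confidence intervals reported in the text and is undefined when chance agreement equals 1 (both raters using a single category throughout).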
Digit N-back (maximum level)
Reported strategy use was associated with significantly better performance than not using a strategy,
Letter 3-back (d-prime)
Using a strategy was significantly better than not using a strategy,
Colour 3-back (d-prime)
Again, using a strategy was better than not using a strategy,
Verbal rehearsal in older adults: exploratory analyses
Perhaps Rehearsal was not associated with better performance compared to No Strategy because Rehearsal was primarily used by older adults, who may generally perform worse than younger adults. To test this possibility, we performed exploratory analyses comparing older adults using Rehearsal with older adults using No Strategy, for the three different N-back tasks.
For the letter N-back (3-back d-prime), there were no differences,
Self-generated strategies: level of detail
We tested whether the level of detail of the reported strategy during post-test was associated with post-test N-back performance in controls. The same raters as above scored the reported strategies based on the criteria used by Laine et al. (2018) on a scale from 0 to 3. Zero meant that participants did not report using a strategy. One point was given to a vague, non-specific strategy (e.g., “I memorised the digits in my mind”) and two points for a clear strategy with at most one detail (“I memorised the digits in pairs, such as 52–48”). Scorers gave three points for clearly described strategies with at least two details (e.g., “I split the digits into different series, and compared those to each other”). The raters scored the three N-back varieties (digit, letter, and colour), such that each participant had a total N-back level-of-detail score between 0 and 9.
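The scoring rule above can be sketched as follows: each of the three N-back tasks receives a rating from 0 to 3, and the ratings are summed into a total between 0 and 9. The function name `total_detail_score` and the dictionary keys are hypothetical, used here only to make the arithmetic explicit.

```python
def total_detail_score(scores_by_task):
    """Sum the 0-3 level-of-detail ratings over the three N-back tasks (range 0-9)."""
    tasks = ("digit", "letter", "colour")  # hypothetical keys for the three tasks
    for task in tasks:
        if scores_by_task[task] not in (0, 1, 2, 3):
            raise ValueError(f"Rating for {task} must be an integer from 0 to 3.")
    return sum(scores_by_task[task] for task in tasks)


# Hypothetical participant: detailed digit strategy, one-detail letter strategy,
# no strategy reported for colours.
example_total = total_detail_score({"digit": 3, "letter": 2, "colour": 0})
print(example_total)
```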
There was good inter-rater reliability between the two independent raters for this scoring procedure (linearly weighted kappa analysis; κw = .83, 95% CI = [0.80, 0.86]; Cohen, 1968). The raters then discussed and reached consensus on all discrepant scores, producing a final level of detail score for each control group participant. To test whether these scores predicted general N-back post-test performance, we calculated an N-back composite score including: (1) for the trained digit N-back task: summed values of the z-transformations of the post-test average and maximum level reached, and (2) post-test d-prime variables in the letter and colour 3-back tasks.
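For readers unfamiliar with the weighted statistic, the sketch below computes a linearly weighted kappa for ordinal ratings on the 0 to 3 scale. Disagreements are penalised in proportion to their distance on the scale, so a 3 versus 2 disagreement counts less than a 3 versus 0 disagreement. This is an illustration under stated assumptions rather than the study’s analysis code; the function name `linear_weighted_kappa` and the ratings are hypothetical.

```python
def linear_weighted_kappa(rater_a, rater_b, categories=(0, 1, 2, 3)):
    """Linearly weighted kappa for two raters' ordinal scores (illustrative sketch)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of reports.")
    n = len(rater_a)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}

    # Cross-tabulate the two raters' scores as counts.
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        counts[index[a]][index[b]] += 1

    row_totals = [sum(counts[i]) for i in range(k)]
    col_totals = [sum(counts[i][j] for i in range(k)) for j in range(k)]

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # linear disagreement weight in [0, 1]
            observed_disagreement += weight * counts[i][j] / n
            expected_disagreement += weight * row_totals[i] * col_totals[j] / (n * n)

    if expected_disagreement == 0:
        raise ValueError("Weighted kappa is undefined when no disagreement is expected.")
    return 1 - observed_disagreement / expected_disagreement


# Hypothetical level-of-detail ratings for eight reports (not data from the study).
scores_1 = [0, 1, 2, 3, 3, 2, 1, 0]
scores_2 = [0, 1, 2, 3, 2, 2, 1, 1]

kappa_w = linear_weighted_kappa(scores_1, scores_2)
print(round(kappa_w, 3))
```

Here the two disagreements are each one scale point apart, so they reduce the weighted kappa less than they would reduce an unweighted kappa computed on the same ratings.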
We performed a multiple regression analysis with the N-back composite score serving as the dependent variable, and level of strategy detail and age group serving as predictors. The results showed a significant regression equation,
Discussion
This study tested the Strategy Mediation Hypothesis of WM training via external (i.e., instructed) and internal (i.e., spontaneously self-generated) strategy use in a single session of adaptive N-back training. It was a systematic replication of the study by Laine et al. (2018) to test the validity of their results for younger adults in a different sample of participants (see Simons, 2014). We also explored potential implications of strategy use in N-back training in healthy older adults, given that they are often targeted by commercial training programmes (e.g., Federal Trade Commission, 2016).
The instructed N-back strategy was associated with greater performance improvement during the training session across the 20 training blocks in younger adults and was associated with significantly better performance on the trained N-back digit task a few days later, during the post-test session. However, the older adults appeared to benefit less from strategy instruction across blocks than the younger adult strategy group (see Figure 2). Instructed strategy was also associated with significantly more accurate performance on the more difficult version of the untrained colour N-back task (3-back) in both age groups, without improved reaction times—similar to transfer patterns typically seen after weeks of ordinary adaptive WM training (Soveri et al., 2017), and similar to Laine et al.’s (2018) observations. However, even though the effect size of the strategy (i.e., Control group vs Strategy group) was moderate following training in the untrained letter N-back (
These results indicate that learning to use a specific strategy, which is unlikely to improve general reasoning ability or prevent age-related cognitive decline, can produce significant N-back performance gains. This has several implications for the training literature. First, our results were in line with the notion that much of N-back training is task-specific (Soveri et al., 2017). Before encouraging members of the public to spend time and money on cognitive training, it should be established that improvements are not limited to some task-specific strategic approach that is probably of little use in individuals’ everyday lives. Some training programmes keep users engaged via task-improvement feedback, suggesting that better performance implies improved WM ability. However, our findings of significant strategy-induced task-specific near transfer without task-general near transfer, along with those from many other studies, suggest that such claims are vastly overstated.
Strategy-induced improvements raise further questions regarding whether training strategies can be applied to outcome variables claimed to reflect far transfer. If so, perhaps some types of training are associated with far transfer improvement because trained participants develop a strategy which generalises to the outcome measure. Further research should explore whether strategies developed during training are applied to seemingly unrelated outcome measures. For instance, tests assumed to measure “fluid” intelligence (e.g., Raven’s Matrices) are often used as measures of far-transfer training gains. Cogmed’s visual processing and matching training is similar to Raven’s Matrices (Shipstead et al., 2012). Using a speeded-up version of Raven’s Matrices (e.g., Jaeggi et al., 2008) may even increase these similarities (Chuderski, 2013). Moreover, some evidence suggests that the opportunity to practise may improve performance on Raven’s Progressive Matrices (e.g., Blieszner et al., 1981; Denney & Heidrich, 1990; Klauer et al., 2002). Thus, training control groups on a different task can be misleading if that task differs from the training task in its structural similarity to the outcome measures. If a WM training paradigm only improves performance on one specific reasoning measure, strategy mediation in far-transfer measures needs to be ruled out. Arguably, transfer should generalise to several structurally different outcome tasks before transfer to, for instance, “fluid” intelligence is asserted.
However, evidence that strategy use improves performance on trained tasks does not falsify the Capacity Hypothesis of WM training; it is still possible that training also usefully improves cognitive capacity. According to the Capacity Hypothesis, training works by challenging the cognitive system, and working at one’s capacity limits is considered a prerequisite for the sorts of plastic changes in the brain considered to reflect increased capacity (e.g., see Klingberg, 2010). If strategies reduce cognitive load by making the task easier, this might prevent capacity-increasing change and therefore prevent broader transfer. Strategy use may, therefore, produce problematic confounds in training studies either by making possible improvements without meaningfully increasing cognitive capacity or by preventing optimally “broad,” efficient training.
The assumption that online cognition is limited by the capacity of a domain-general attentional resource or WM system (Engle & Kane, 2004) which can be “trained” and thus improve cognitive abilities more broadly (Jaeggi et al., 2008) underlies the Capacity Hypothesis. The finding that a visualisation strategy was associated with improved memory performance might fit better with theories of WM as containing a variety of cognitive systems among which participants may choose according to task demands (Baddeley & Logie, 1999; Logie, 2011; Logie & Niven, 2012). Encouraging participants to use other sub-components of the cognitive system (e.g., visualising the strings of digits) appeared to boost performance significantly, as suggested by Logie (2012). Strategic “off-loading” from a general resource to another system might be useful by freeing up its cognitive resources (McNamara & Scott, 2001). This would not imply that a general resource cannot be trained at all, but it suggests that this resource was not necessarily trained as was assumed in many training studies.
While our results suggest that instructed strategies can play a significant role in WM performance, strategies arguably only have implications for the training literature if participants spontaneously use them during adaptive training (e.g., Dunning & Holmes, 2014), which needs to be demonstrated. Our results from the non-instructed group suggested that participants did generate and use strategies spontaneously. Both strategy type and level of detail (i.e., how elaborate the strategy was) were associated with higher performance on all three N-back tasks at post-test (see Figure 4). However, the categories used in our study did not capture all strategies (16.1% classified as “Other” across the three tasks). Strategies classified as “Other” were not associated with improved performance in any of the N-back tasks (compared to not using a strategy). This suggests that a substantial proportion of participants applied potentially inefficient strategies. The implications of such strategies for the training literature are unclear, and more detailed research into the causes and consequences of these “Other” self-generated strategies may help design better training paradigms.
Moreover, the beneficial effects of spontaneous self-reported strategies on performance may be inflated. For instance, strategies may be used more by high-capacity individuals, who have more cognitive resources available for generating effective strategies while performing the task (Dunlosky & Kane, 2007) and who may also be more likely to reap training benefits regardless of strategy use. As well, reports of strategy use could be influenced by general task motivation, if participants who tried their best on the task are also keener to provide detailed descriptions of their approaches. Therefore, explicitly manipulating strategy use via instructed strategies that participants can and do use is important to ensure that associations between performance and strategies are not driven by such confounds. Our instructed strategy manipulation suggested that most participants can benefit from using a strategy—but an unexpected limitation was the relatively large proportion of non-compliant participants, whom we excluded from the main analyses. While WM capacity appeared similar in compliant and non-compliant participants (indicated by no significant differences in pre-test composite scores), we cannot infer whether non-compliant participants were unable to apply the strategy or preferred not to. However, despite these limitations regarding the causes of whether or not a strategy is applied, our results suggest that both internally generated and externally instructed strategies can boost N-back performance. The brevity of the training session (30 min) limits the generalisability of our findings to the broader training literature, where training is typically conducted over several weeks (e.g., von Bastian & Oberauer, 2014). Another limitation of the design is that we cannot infer whether the instructed strategy improved performance because participants used it during the training session, or simply because they were exposed to it. 
A third group of participants who trained without a strategy, and then learnt about the strategy just after the training session, would be needed to test this. From the data we do have, it seems that younger adults in the strategy group started benefitting immediately (see Figure 2), suggesting that this specific strategy in the N-back task did not require extended practice but may be implemented right away. Nonetheless, perhaps in older adults more training with the strategy would have made it more beneficial. However, a recent study investigated the effect of the same instructed strategy during a 4-week training period, in younger adults (Fellman et al., 2020). While the beneficial effect of strategy training replicated, their results indicated that the beneficial effect of the N-back strategy was short-lived, mostly visible during the first training session. Fellman et al. speculated that the instructed training may tie the hands of the trainees too much, while the uninstructed trainees were free to develop and optimise their own strategies. It is unclear whether older adults would have been able to benefit more if exposed to such extended strategy training.
Strategy training in healthy older adults
We included healthy older-adult participants to compare their strategy use with that of younger adults, noting both similarities and differences. During training, the older adult strategy group appeared to benefit less from training than the younger-adult strategy group (see Figure 2). In the post-test, younger and older adults both benefitted from the strategy in the untrained N-back colour 3-back, and in the maximum digit N-back score. However, in the average digit N-back level attained, the older adults benefitted less, reflecting that, on average, the control group outperformed the strategy group until block 8 of 10 (see Figure 3).
Some previous studies instructing participants to apply mnemonic techniques or strategies have found more substantial training gains in younger than in older adults (e.g., Lövdén et al., 2012; Verhaeghen et al., 1992; Verhaeghen & Marcoen, 1996; but see Gross et al., 2012). Taken together, our results suggested that while both age groups at least partially benefitted from the strategy, older adults appeared to benefit more slowly, as implementing the new strategy reduced performance during early trials. If participants develop spontaneous strategies during uninstructed, regular training and younger participants generate and effectively apply them more quickly, our results might be consistent with observations of initially larger gains in younger adults, followed by comparable improvements in both age groups in the final weeks (e.g., Brehmer et al., 2012). Furthermore, a large proportion of our older adults (11 of 30) did not use the instructed strategy, possibly indicating that they found it difficult to implement. Perhaps if implementing a strategy is generally more challenging for older than younger adults, it is also more beneficial once they learn how to do it effectively. For instance, cognitive training using an episodic memory strategy task was associated with less age-related decline in white matter microstructures in healthy older adults compared to a control group, after 40 weeks (de Lange et al., 2017).
Also, it is possible that older adults struggled to implement the strategy because it was visually based—some previous research suggests that visual WM declines more in healthy ageing than verbal WM (e.g., Johnson et al., 2010). Similarly, more older than younger adults in our uninstructed control group reported using a sub-vocal Rehearsal strategy, that is, silent repetition of verbal labels for material to be recalled (see Logie et al., 1996; Wang et al., 2016). Specifically, 4 younger and 25 older adults used this strategy in the three N-back tasks combined (see Figure 4), supporting previous suggestions that older adults may rely more on verbal rehearsal even in visual WM tasks (Forsberg et al., 2019). More severe WM deficits for visuospatial material than for verbal material have been observed in older adults (e.g., Jenkins et al., 1999; Leonards et al., 2002; Myerson et al., 1999), and perhaps sub-vocal rehearsal can be used to compensate for declining visual memory. Rehearsal benefitted older adults in our digit N-back task (compared to those not using a strategy), in line with observations that older adults’ WM benefitted from verbal encoding strategies (Bailey et al., 2014). However, it was not beneficial in the letter or colour N-back tasks. Verbal rehearsal might have been most useful for the digit task because the letter set likely produced more phonological similarity effects (Salamé & Baddeley, 1986), and colour names are longer, thus less efficient to rehearse (Schweickert et al., 1990). Also, the digit N-back task was adaptive (maximum levels reached by older adults: control group
In the broader training literature, younger adults often improve more than older adults (Bürki et al., 2014; Heinzel et al., 2014; Li et al., 2008; Zinke et al., 2014)—but gains of similar magnitude on trained tasks in younger and older adults are also sometimes observed (e.g., Bürki et al., 2014; Li et al., 2008; Richmond et al., 2011; von Bastian et al., 2013; Zając-Lamparska & Trempała, 2016). However, training of executive functions appeared to yield greater training-related benefits in older than in younger adults (e.g., see Karbach & Kray, 2016; Kray & Lindenberger, 2000). Larger training gains in younger adults are thought to be consistent with animal models suggesting that older age is associated with less neuroplastic change (Blumenfeld-Katzir et al., 2011; van Praag et al., 2005). Our results suggest an alternative explanation: perhaps younger adults appear to benefit more from training because they are more adept at developing strategies. Furthermore, age differences in training gains between paradigms may be driven by differences in strategy effectiveness (e.g., visual vs verbal). The observed age differences in the effectiveness of the instructed visualisation strategy and the use of spontaneous verbal rehearsal strategies fit with literature suggesting that not all cognitive functions decline with age to the same degree (for reviews, see Logie & Morris, 2015; Perfect & Maylor, 2000). In sum, these results support the notion that overall N-back performance may reflect use of different cognitive resources in different participants (Johnson et al., 2010; Logie, 2018; Thurstone, 1931).
To conclude, our results supported Laine et al.’s (2018) conclusion that using a visualisation strategy during training improved N-back performance in younger adults. Furthermore, the strategy also at least partly improved performance in older adults. The results provided support for the Strategy Mediation hypothesis of training and suggest that strategies can enable more efficient use of a limited WM capacity, which may have various implications for the training literature and industry. Commercial training programmes need to demonstrate useful improvement beyond task-specific strategies which are unlikely to benefit the user in their everyday life. Also, confirming that the trained task and outcome measures are structurally different—ideally by demonstrating far-transfer to several different reasoning and intelligence measures—is needed to ensure that transfer effects are not strategy-specific.
Furthermore, older adults may benefit more slowly when attempting to apply a visual strategy—indeed, we found some evidence that implementing the strategy was initially associated with worse performance. While the instructed strategy did appear to somewhat benefit those older adults who were able to apply it (i.e., for maximum, but not average, digit N-back performance), our results did not generalise to the substantial proportion of older adults who chose not to implement (or perhaps were unable to implement) the instructed strategy. Furthermore, older adults spontaneously applied verbal strategies more than did younger adults (with varied success) which suggests differences in spontaneous strategies used by younger and older adults. While our paradigm could not determine if this was driven by preference or ability, it did indicate that perhaps the same training paradigm—or cognitive task, more broadly—is not always measuring the same cognitive capacity in younger and older adults.
The present results highlighted that measures of performance and capacity may largely reflect the extent to which participants apply appropriate strategies, rather than domain-general underlying constructs. Resolving discrepancies in the cognitive training literature may require investigating strategies, accounting for individual variability (see Logie, 2018) and for systematic, age-related variability during real, long-term training, and examining how specific task strategies may generalise to outcome measures in unintended ways. On a broader level, the findings are in line with a recently proposed hypothesis, which holds that the mechanisms underlying WM training are driven by the establishment of cognitive routines in the task(s) one has been practising (which are intertwined with increased strategy use), and that transfer from a trained task (where a routine has been established) to an untrained task occurs only if both tasks require the same cognitive routines (Gathercole et al., 2019).
Supplemental Material
Supplemental material, QJE-STD-19-210.R1-Supplementary_Materials for Strategy mediation in working memory training in younger and older adults by Alicia Forsberg, Daniel Fellman, Matti Laine, Wendy Johnson and Robert H Logie in Quarterly Journal of Experimental Psychology
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The work was supported by a PhD studentship from The University of Edinburgh Centre for Cognitive Ageing and Cognitive Epidemiology, part of the cross-Council Lifelong Health and Wellbeing Initiative (MR/L501530/1). Funding from the Biotechnology and Biological Sciences Research Council (BBSRC) and Medical Research Council (MRC) is gratefully acknowledged. D.F. reports grants from Ane and Signe Gyllenberg Foundation. M.L. received funding from the Academy of Finland (Grant nos 260276 and 323251) and the Åbo Akademi University Endowment (grant to the BrainTrain project).