Abstract
Disengaging from the external world—a phenomenon referred to as mind wandering—is a common experience that has been shown to be associated with detriments in cognitive performance across a large range of tasks. In the current web-based study, we used a continuous delayed estimation paradigm to investigate the impact of task disengagement at encoding on subsequent recall of location. Task disengagement was assessed with thought probes on a dichotomous (off- vs. on-task) and a continuous response scale (from 0% to 100% on-task). This approach allowed us to consider perceptual decoupling in both a dichotomous and a graded manner. In the first study (n = 54), we found a negative relationship between levels of task disengagement at encoding and subsequent recall of location measured in degrees. This finding supports a graded perceptual decoupling process rather than a decoupling that happens in an all-or-none manner. In the second study (n = 104), we replicated this finding. An analysis of 22 participants showing enough off-task trials to fit the data with the standard mixture model revealed that in this particular subsample, being disengaged from the task at encoding was related to worse long-term memory performance in terms of likelihood to recall but not in terms of precision with which information is recalled. Overall, the findings suggest a graded nature of task disengagement that covaries with fine-grained differences in subsequent recall of location. Going forwards, it will be important to test the validity of continuous measures of mind-wandering.
Disengaging from the external world, which occurs when our attentional focus shifts away from the perceptual world towards internal self-generated thoughts (Christoff, 2012; Schooler et al., 2011; Smallwood & Schooler, 2006), is common across situations. Killingsworth and Gilbert (2010) found that we spend around half of our daily lives engaged in self-generated thought, and further assessments in the laboratory as well as in online experiments estimate that participants are disengaged from the task at hand for around 15%–50% of the time (e.g., Brosowsky et al., 2021; Krasich et al., 2020; Kuehner et al., 2017). Task disengagement, also referred to as mind-wandering or task-unrelated thoughts, has been studied extensively during the past two decades (see Smallwood & Schooler, 2015, for a comprehensive review), and several findings have highlighted its detrimental impact on a wide range of cognitive tasks (e.g., Mrazek et al., 2012). Despite the large amount of research on the negative impact of task disengagement on current processes, including reduced perceptual processing (e.g., Baird et al., 2014; Kam et al., 2011; Smallwood et al., 2008), there is less evidence on the consequences of task disengagement at encoding on visual long-term memory. Although the central role of experimentally manipulated attention during encoding for visual memory has been established (i.e., the literature on divided attention and memory as well as selective attention and memory; see Aly & Turk-Browne, 2017 for a review), there is less evidence on the consequences of natural task disengagement on memory, which is an ecologically valid research question. We thus aimed to investigate how natural task disengagement during encoding affects visual long-term memory with a specific interest in a potential differential impact on the likelihood to recall and/or the precision with which the information is recalled.
The idea that memory varies in precision (and that it is not an all-or-none process) has prompted researchers to advance measurement methods in the field of memory research. Researchers have thus started using the continuous delayed estimation paradigm (e.g., Brady et al., 2013; Harlow & Yonelinas, 2016; Ovalle-Fresa & Rothen, 2019; Richter et al., 2016; Tamber-Rosenau et al., 2015). In this paradigm, participants are asked to reinstate previously seen features (e.g., location) of objects on a circle. The distance between the given response at recall and the original feature at encoding is referred to as recall error (measured in degrees). Although recall error is already a graded measure, it is not a pure measure of precision, because it confounds the likelihood to recall with the precision with which the feature is recalled. However, the continuous nature of the recall error allows for the calculation of error distributions, which can then be fitted by means of the standard mixture model, including a uniform distribution to estimate the likelihood to recall a feature and a von Mises distribution to estimate the recall precision of the feature (Zhang & Luck, 2008). Using the standard mixture model thus makes it possible to disentangle the likelihood to recall and the precision of recall; that is, it allows independent estimates of how likely one is to fail entirely to recall a feature and how accurately one recalls a feature.
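As an illustration of the standard mixture model described above, the following minimal Python sketch evaluates the density of a single recall error as a weighted sum of a uniform component (guessing) and a von Mises component centred on zero error (recall with some precision). The function names and the use of the concentration parameter kappa are assumptions made for this sketch; how kappa maps onto the precision values reported later is not specified here.

```python
import math


def bessel_i0(x, terms=50):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def mixture_pdf(error_deg, guess_rate, kappa):
    """Density (per degree) of a recall error in [-180, 180) under the mixture model.

    guess_rate: weight of the uniform (guessing) component, between 0 and 1.
    kappa: von Mises concentration (hypothetical parameterisation for this sketch).
    """
    theta = math.radians(error_deg)
    von_mises = math.exp(kappa * math.cos(theta)) / (2.0 * math.pi * bessel_i0(kappa))
    uniform = 1.0 / (2.0 * math.pi)
    per_radian = (1.0 - guess_rate) * von_mises + guess_rate * uniform
    return per_radian * math.pi / 180.0  # convert density from per radian to per degree
```

With a guess rate of 1 the density is flat at 1/360 per degree; lowering the guess rate or raising kappa concentrates the density around an error of 0°.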
For example, Harlow and Yonelinas (2016) investigated whether the two parameters, likelihood to recall and precision of recall, are reflected in participants' subjective ratings. They combined the delayed estimation paradigm with subjective ratings about how much is recalled and the precision of recall. They found that recall is best described by two parameters (i.e., the likelihood to recall and the precision of recall), which seem to be functionally independent. They call for more research taking into account these two parameters when studying memory under different conditions. Only a few studies have considered the impact of attention on recall error, the likelihood to recall, and the precision of recall. Emrich et al. (2017) manipulated the proportion of attention allocated to items at encoding, and they found a loss in working memory precision when items were less attended. These findings provide compelling evidence for the role of manipulated attention in the precision of memories (see also LaRocque et al., 2015 using a retro-cue design).
Research on natural task disengagement and subsequent long-term memory performance shows that memory performance is impacted by disengagement at encoding (e.g., Metcalfe & Xu, 2016; Risko et al., 2012; Xu et al., 2018). In a seminal study, Seibert and Ellis (1991) found that the proportion of task-unrelated thoughts and memory performance assessed with a free recall task of letter strings were inversely related. These findings were later supported by Smallwood et al. (2006), who investigated the impact of disengagement at encoding on both familiarity and recollection processes. Disengagement at encoding seemed to particularly affect recollection processes (Smallwood et al., 2006). Following a depth-of-encoding approach, Thomson et al. (2014) showed that the frequency of off-task trials during deep meaning-based encoding was negatively correlated with memory performance in a subsequent recognition task, whereas no such relation was observed during surface perceptual-based encoding. Current research extends these findings to other memory tasks, such as memory updating and memory for change. Garlitch and Wahlheim (2020) showed that disengagement at encoding was negatively associated with memory performance, including change recollection. These studies provide evidence that natural task disengagement reduces memory performance when assessed with binary tasks (e.g., old/new recognition tasks). However, whether natural task disengagement during encoding affects memory performance in a gradual manner (assessed with a continuous delayed estimation task), as well as the precision of visual long-term memory representations remains to be empirically investigated.
To assess task disengagement, probe-caught methods have been used. Critically, different research groups investigating the effects of disengagement at encoding on subsequent memory performance (e.g., Garlitch & Wahlheim, 2020) successfully used self-report measures of attention. These results highlight that probe-caught methods are a valuable approach when studying the impact of disengagement at encoding on subsequent memory performance. However, there is a current concern in the field regarding the validity of mind-wandering measurements (e.g., Kane et al., 2021; Weinstein, 2018). Kane et al. (2021) investigated the construct validity of different thought probe types. Overall, they found that it is possible to assess task disengagement via thought probes. However, they also found an impact of probe type on the answers, thus urging the field to carefully select their thought probes and to consider construct validity. We used a classical binary task in which participants had to report whether they were off- or on-task during the ongoing encoding task. After the classical thought probe, we presented a continuous off–on-task slider asking participants to detail their answer. This approach allowed us to use incongruencies between the answers as a manipulation check and, notably, to test whether memory performance covaries not only with binary self-reports but also with a continuous measure of attention. A continuous measurement of task disengagement might be a valuable additional method for both the mind-wandering and memory research fields, as it provides fine-grained insights into the state of attention during an ongoing task.
Next to the main empirical question, this study addressed a methodological issue concerning the role of experimental assistance in web-based experiments. There is an increasing need for online experimental solutions, and there are several reasons for and advantages of carrying out online studies, ranging from recruiting more diverse participants to continuing with research projects in the midst of a pandemic situation (Sauter et al., 2020). In contrast to laboratory experiments, in online experiments, participants are usually not assisted by an experimenter, and it is assumed that the instructions are autonomously understood. Notably, the presence of an experimenter might impact the levels of task disengagement. Indeed, the presence of an experimenter could work as a motivator, and motivation has been shown to affect the rates of task disengagement (Rummel & Nied, 2017; Unsworth & McMillan, 2013). We aimed to investigate the impact of assistance from an experimenter during the instructions on data quality and thus adapted the continuous delayed estimation paradigm, a classical laboratory experiment, to two different experimental settings, both conducted online (i.e., a not-assisted condition and a phone-assisted condition). The not-assisted condition was a classical online version; participants were invited to take part in the study and could participate whenever they wanted. Under the phone-assisted condition, the experimenter accompanied the participant by phone to ensure that the instructions were understood. Apart from assistance, the experiment and instructions were identical in the two experimental settings. The design of the present study allowed the investigation of the impact of assistance from an experimenter on task disengagement and memory performance.
To sum up, the main goal of our study was to investigate whether natural task disengagement at encoding alters visual long-term memory measured in a continuous response space. In Study 1, we manipulated the experimental setting (not-assisted condition and phone-assisted condition); in Study 2, all participants were tested without assistance. In the memory task, participants were instructed to report their degree of disengagement at the encoding of objects presented at a random location on a centred circle and to memorise the exact location on the circle of the objects for subsequent retrieval. Based on previous studies that examined the impact of natural fluctuations of attention at encoding on subsequent long-term memory performance (e.g., Garlitch & Wahlheim, 2020), we predicted that task disengagement at encoding is associated with lower memory performance. Concerning the experimental assistance, we expected more task disengagement and consequently generally poorer memory performance under the not-assisted condition compared to the phone-assisted condition (e.g., Rummel & Nied, 2017; Unsworth & McMillan, 2013). Following Harlow and Yonelinas (2016), we were especially interested in whether natural task disengagement at encoding alters the likelihood to recall and/or the precision with which information is recalled. To address this question, we conducted a first online study that we preregistered on the Open Science Framework (https://osf.io/kh49x). Contrary to our preregistration, we could not model recall errors separately for off-task vs. on-task trials to distinguish the likelihood to recall and the precision of recall when off-task vs. on-task at encoding. Consequently, none of the analyses that we report for Study 1 were preregistered.
We then conducted Study 2 (https://osf.io/2eudm), which allowed us to replicate the findings of Study 1 and to investigate whether task disengagement at encoding is associated with larger recall errors attributed to both a loss in the likelihood to recall and a loss in precision of recall in a subsample of 22 participants.
Study 1
Method
Participants
The required sample size and exclusion criteria were preregistered on the Open Science Framework (https://osf.io/kh49x). The a priori G*Power analysis indicated a required sample size of 54 participants to detect a medium effect in an analysis of variance (ANOVA) with repeated measures (within-between interaction; parameters: f = .25, α = .05, 1−β = .95, number of groups: 2, i.e., not assisted vs. phone assisted, number of measurements: 2, i.e., off-task vs. on-task; Faul et al., 2007). We thus aimed for valid data from 27 participants in the not-assisted condition and 27 participants in the phone-assisted condition. We tested a total of 30 participants under the not-assisted condition and 27 participants under the phone-assisted condition. Two participants under the not-assisted condition did not complete the experiment and were excluded from further analyses. One further participant under the not-assisted condition had to be excluded because he showed an inconsistency between binary and continuous attention responses of 40.48% (i.e., above 20%). All other participants showed inconsistencies below 20% (Minconsistency = 1.26%, SDinconsistency = 3.09), and there was no significant difference in the inconsistencies between the not-assisted and phone-assisted conditions, independent t-test, t(52) < 1. Furthermore, no participant relied on pure guessing to solve the memory task, as the error distributions of all participants were significantly different from a uniform distribution, Rayleigh tests, all r0s ⩾ .149, all ps < .001.
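The guessing check described above relies on Rayleigh tests of uniformity. A minimal sketch of such a test is given below; it computes the mean resultant length of a set of recall errors and a p value from the first-order large-sample approximation p ≈ exp(−n·r²). The function name is hypothetical, and this approximation is an assumption of the sketch rather than the exact procedure used in the study.

```python
import math


def rayleigh_test(angles_deg):
    """Return (mean resultant length r, approximate p value) for circular data.

    Uses the first-order large-sample approximation p ~ exp(-n * r**2),
    which is an assumption of this sketch, not the study's exact method.
    """
    n = len(angles_deg)
    mean_cos = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    mean_sin = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    r = math.hypot(mean_cos, mean_sin)  # 0 = perfectly spread out, 1 = all identical
    z = n * r * r                       # Rayleigh's z statistic
    return r, math.exp(-z)
```

Errors clustered around 0° yield a large r and a small p value (evidence against pure guessing), whereas errors spread evenly around the circle yield an r close to 0.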
In the final sample, 27 participants (Mage = 37.7 years, SDage = 9.37, 66.7% female) were included under the not-assisted condition and 27 (Mage = 34.0 years, SDage = 8.71, 74.1% female) under the phone-assisted condition. We used English and German versions of the experiment. Eight participants under the phone-assisted condition chose the English version.
Participants were recruited from the participant pool of our institute and from the acquaintances of the experimenters. They received course credits for participation. The study was approved by the local ethics committee and participants were informed before they consented to participate that they could withdraw at any time during the experiment.
Design
In this web-based study, we manipulated the assistance of an experimenter (not-assisted condition vs. phone-assisted condition). Under both conditions, participants worked on the same memory task consisting of 10 blocks. Each block consisted of an encoding phase, followed by a recall phase. During the encoding phase, we used thought probes to assess whether participants were on-task while performing the ongoing encoding task. During the recall phase, we measured memory performance as recall error in degrees (difference between the original location and response).
Apparatus and material
The study was programmed and run with lab.js. This is a freely available open-source programme for browser-based experiments (https://lab.js.org/, Henninger et al., 2020).
Memory task
The stimuli consisted of 850 images of natural objects randomly selected from a set of 948 validated (i.e., recognisable) images in the BOSS database (Brodeur et al., 2010, 2014). The images were resized to 25% of the original size and compressed on level 9 (Bimp plugin in Gimp, version 2.10.20). The 850 stimuli were used to create 10 lists, each including 85 randomly chosen images.
Task disengagement
We used online thought probes to assess the current attention directed to the encoding task. The thought probe question was, “Where were you with your attention?” Responses were given with a binary selection represented by radio buttons (i.e., off-task vs. on-task; binary) and on a continuous slider (percentage values from 0 = off-task on the left to 100 = on-task on the right; continuous). In the following, we refer to these two measures in terms of binary attention and continuous attention. The study also comprised retrospective items about mind-wandering as well as the Spontaneous and Deliberate Mind Wandering Scales (SDMWS; Carriere et al., 2013; Martarelli et al., 2020) that were included for explorative purposes and are not further considered in this article.
Procedure
The experiment was conducted online and lasted about 1 hr and 30 min for each participant. The not-assisted condition was a classical online study. Participants received the link and could take part in the study whenever they wanted. Under the phone-assisted condition, the experimenter accompanied the participant by phone. The participant and the experimenter talked before the experiment to ensure that the former understood the instructions. The phone call was then disconnected for the task and started a second time during a scheduled break in the middle of the experiment, and a third time at the end of the experiment. Except for the accompaniment by phone, the study procedure was equal in both assistance conditions.
The memory task consisted of 10 blocks. After 5 blocks, participants were asked to take a short break. Each block consisted of an encoding phase and a recall phase.
During the encoding phase, participants were presented with 85 trials in randomised order (see Figure 1 for an example of stimuli). Each trial started with a centred fixation cross for 500 ms, followed by the encoding screen for 3 s. The encoding screen consisted of an object (presented in an area of 150 × 150 pixels) presented at a random location on a centred grey circle (radius of 300 pixels). After 3 s, the next trial or the thought probe was presented. Participants were instructed to memorise the exact location on the circle of the presented objects for later retrieval in a recall task.

Figure 1. Sample of the experimental procedure. During encoding, 85 objects were presented for 3 s, separated by a fixation cross for 500 ms. After each encoding phase, the recall phase of about 24 objects followed. The objects appeared in the centre of the circle, and participants were tasked to reinstate the location of the original object.
The thought probe was presented after every 5 to 16 encoding screens, randomly sampled from a uniform distribution of a minimum of 5 and a maximum of 16. This procedure resulted in thought probes after 10% of all encoding trials (Mnumber of thought probes = 80.31, SDnumber of thought probes = 5.31). Participants responded to the binary thought probe first by clicking on one of the radio buttons and then to the continuous thought probe by positioning the mouse pointer on the slider. Both thought probes were presented on individual slides. The next trial was initiated by clicking the “further” button after the continuous response probe.
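The probe schedule described above can be sketched as follows: the gap between successive probes is drawn uniformly from 5 to 16 encoding screens until the 85 trials of a block are used up. The function name and the handling of the end of the block (no probe once the next gap would exceed the block length) are assumptions of this sketch; the actual lab.js implementation may differ.

```python
import random


def probe_positions(n_trials=85, min_gap=5, max_gap=16, rng=None):
    """Return the 1-based trial numbers after which a thought probe is shown.

    End-of-block handling is an assumption: sampling stops as soon as the
    next probe would fall beyond the last encoding trial.
    """
    if rng is None:
        rng = random.Random()
    positions = []
    trial = 0
    while True:
        trial += rng.randint(min_gap, max_gap)  # inclusive on both ends
        if trial > n_trials:
            break
        positions.append(trial)
    return positions


example = probe_positions(rng=random.Random(1))
```

With a mean gap of 10.5 screens, a block of 85 trials yields roughly 8 probes, which is in line with the reported mean of about 80 probes across the 10 blocks.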
During the recall phase, participants were presented with the three trials preceding the thought probes (around 24 trials, depending on the exact number of thought probes in each encoding block). Overall, participants were presented with a mean of 240.95 recall trials (SDoverall recall trials = 15.92). Each trial started with a centred fixation cross for 500 ms, followed by the recall screen. Here, the object was presented at the centre of the screen (in an area of 150 × 150 pixels), surrounded by the grey circle also shown during encoding (radius of 300 pixels) (see Figure 1). Participants were tasked with positioning the object at the exact location the object was presented during encoding by moving it with the mouse. The object could only be placed on the circle. A click on the “further” button confirmed the position and initiated the next trial. Trials were presented in randomised order. After the memory task, participants provided their demographic information.
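The selection of recall trials described above (the three encoding trials preceding each thought probe) can be illustrated with a short sketch. The function name and the 1-based trial numbering are assumptions made for illustration.

```python
def recall_trials(probe_positions, n_back=3):
    """Return the 1-based encoding-trial numbers that are tested at recall.

    For each probe shown after trial p, the trials p-2, p-1 and p are selected
    (for the default n_back=3).
    """
    selected = []
    for probe in probe_positions:
        selected.extend(range(probe - n_back + 1, probe + 1))
    return selected


tested = recall_trials([7, 20])  # probes shown after trials 7 and 20
```

Because probes are at least five trials apart, the selected triplets never overlap, so a block with eight probes contributes 24 recall trials.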
Results
We report the results according to the following structure. First, we report descriptive statistics. We fitted the standard mixture model (Zhang & Luck, 2008) to the recall errors to investigate the effect of experimental assistance (not assisted vs. phone assisted) on the likelihood to recall (guess rate, modelled as a uniform distribution) and on the precision of recall for successfully remembered locations (precision, modelled as a von Mises distribution). Guess rate and precision were estimated using a Bayesian Markov Chain Monte Carlo (MCMC) approach with the functionalities of the CatContModel package (Hardman, 2017; see also Hardman et al., 2017). We ran three chains of 10,000 iterations per condition (not assisted vs. phone assisted); the first 500 iterations were removed as burn-in, and we used uninformative priors. Then, we report mixed model analyses by considering the impact of experimental assistance on the task disengagement measures, and group comparisons for investigating the impact of experimental assistance on memory performance (recall error, guess rate, and precision). Finally, the main research question regarding the effect of task disengagement on subsequent memory performance is addressed. Particularly, we consider the impact of both the binary measure of attention and the continuous measure of attention at encoding on subsequent recall error. The analyses were computed with jamovi (The jamovi project, 2021) and R (R Core Team, 2020). The alpha level was set to .05 for all analyses. The dataset and the code to reproduce the main analyses are available on OSF (https://osf.io/6vnj3/).
Descriptive statistics
The main dependent variable was recall error, reflecting the angular difference between the original location of the objects at encoding and the adjusted location of the objects at recall. Recall error was measured in degrees and could lie between 0° (no deviation) and ±180° (full deviation). The distributions of errors for the off-task and on-task trials per condition can be seen in Figure 2.
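A minimal sketch of how such a signed angular difference can be computed is shown below. The function name and the convention of mapping errors into the half-open interval [−180°, 180°) are assumptions of this sketch.

```python
def recall_error(response_deg, target_deg):
    """Signed angular difference (response minus target) wrapped to [-180, 180)."""
    return (response_deg - target_deg + 180.0) % 360.0 - 180.0


def absolute_recall_error(response_deg, target_deg):
    """Unsigned recall error in degrees, between 0 and 180."""
    return abs(recall_error(response_deg, target_deg))
```

For example, a response at 10° for an object encoded at 350° gives an error of 20°, not 340°, because the difference is taken along the shorter arc of the circle.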

Figure 2. Distribution of errors (difference between responses and original locations) per condition (not assisted vs. phone assisted) for off-task and on-task attention at encoding. The dashed lines represent the threshold of guessing across all attention responses separated per condition (not assisted vs. phone assisted).
The standard mixture model was estimated on the recall errors per condition (not assisted vs. phone assisted) across all trials. This analysis revealed mean posteriors for the guess rate of 0.41, 95% Bayesian credible intervals [0.32, 0.51] for the phone-assisted, and 0.41, 95% Bayesian credible intervals [0.30, 0.53] for the not-assisted condition. According to the guess rates (mean posteriors), recall errors ⩾ ±25.83 and ⩾ ±24.87 degrees could be considered as guessed under the phone-assisted and not-assisted conditions, respectively. Dashed lines in Figure 2 mark the thresholds of guessing in the recall error distributions. Note that we did not exclude any response according to this threshold. The precision posteriors were 13.47, 95% Bayesian credible intervals [12.36, 14.73] for the not-assisted and 14.35, 95% Bayesian credible intervals [13.45, 15.39] for the phone-assisted condition. We were not able to fit the standard mixture model per binary attention response (off-task vs. on-task), as we had preregistered, because 46 of the 54 participants had fewer than 90 off-task (n = 45) or on-task (n = 1) responses and thus did not reach the minimum number of trials per participant and condition for the model to converge.
Descriptive statistics of the task disengagement measures per assistance condition (not assisted vs. phone assisted) are reported in Table 1 and Figure 6a. Overall, participants reported on-task attention for 81.63% (SD = 18.68%) of the time according to the binary responses to the thought probes. In terms of continuous attention, they reported a mean of 68.27% (SD = 15.08%) of the attention directed to the task. These rates of on-task attention are similar to those in previous studies in laboratory settings (see e.g., Brosowsky et al., 2021; Krasich et al., 2020; Kuehner et al., 2017).
Table 1. Descriptive statistics for Study 1 (n = 27 in each condition, i.e., not-assisted and phone-assisted, except for nrecall error off-task = 20 in the not-assisted condition and nrecall error off-task = 22 under the phone-assisted condition) and for Study 2 (n = 104, except for nrecall error off-task = 76, due to missing values). Values in round and square brackets represent standard deviations and 95% credible intervals, respectively.
We assessed the internal consistency of our measurements (i.e., recall error, binary attention, and continuous attention) per condition (not assisted vs. phone assisted). To this end, we computed a permutation split-half correlation procedure with the R package splithalf (Parsons, 2021) in 5,000 random splits per condition. The Spearman–Brown corrected reliability estimates (95% CI in square brackets) for the not-assisted condition were rSB = .98 [0.96, 0.99] for recall error, rSB = .97 [0.94, 0.99] for binary attention, and rSB = .98 [0.97, 0.99] for continuous attention. The reliability estimates for the phone-assisted condition were rSB = .97 [0.95, 0.98] for recall error, rSB = .95 [0.92, 0.98] for binary attention, and rSB = .98 [0.97, 0.99] for continuous attention. The high correlation values indicate high internal consistency of all measures and are in a similar range as reliability estimates found in other studies (see Belardi et al., 2022; Kane et al., 2016; McVay & Kane, 2009 with reliability estimates ranging from .89 to .93).
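The logic of a permutation split-half estimate with Spearman–Brown correction can be sketched as follows. This is a simplified illustration in Python, not the API of the R package used in the study: the function names, the data layout (a mapping from participant to a list of trial scores), and the averaging of corrected estimates across splits are all assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_brown(r):
    """Correct a half-test correlation to full test length."""
    return 2.0 * r / (1.0 + r)


def split_half_reliability(data, n_splits=5000, seed=0):
    """data: dict mapping participant -> list of trial-level scores (hypothetical layout)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_splits):
        half_a, half_b = [], []
        for scores in data.values():
            shuffled = scores[:]
            rng.shuffle(shuffled)  # random split of this participant's trials
            mid = len(shuffled) // 2
            half_a.append(sum(shuffled[:mid]) / mid)
            half_b.append(sum(shuffled[mid:]) / (len(shuffled) - mid))
        estimates.append(spearman_brown(pearson(half_a, half_b)))
    return sum(estimates) / len(estimates)
```

A value close to 1 indicates that participants are ordered consistently across random halves of their trials, which is the sense in which the estimates above indicate high internal consistency.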
Experimental assistance
Effect of experimental assistance on binary attention
First, we computed a logistic mixed model analysis (not preregistered) with assistance (not assisted vs. phone assisted, coded as 0 for not assisted and 1 for phone assisted) as a fixed effect and a by-participant random intercept. The dependent variable was the binary attention variable. The descriptive statistics are reported in Table 1. The main effect of the assistance condition was not significant (b = −.679, SE = 0.691, 95% Exp (B) CIs [0.131, 1.962], p = .325).
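Written out, a logistic mixed model of this kind has the following general form, where i indexes thought probes and j indexes participants. The notation is introduced here for illustration only, and the coding of the outcome (1 = on-task) is an assumption.

```latex
\operatorname{logit} P(\text{on-task}_{ij} = 1)
  = \beta_0 + \beta_1\,\text{assistance}_j + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
```

Here β1 is the fixed effect of assistance (b in the text) and u_j is the by-participant random intercept.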
Effect of experimental assistance on continuous attention
Then, we computed a linear mixed model analysis (not preregistered) with assistance (not assisted vs. phone assisted, coded as 0 for not assisted and 1 for phone assisted) as a fixed-effect and by-participant random intercept. Continuous attention was the dependent variable. The descriptive statistics are reported in Table 1. The main effect of the assistance condition was not significant (b = −1.766, SE = 4.139, 95% CIs [−9.877, 6.346], p = .671), thus leading to the same result (in terms of significance) as the analysis with binary attention reported above. Experimental assistance in our study does not seem to play a role in task disengagement.
Effect of experimental assistance on memory performance
We then tested whether the experimental assistance (not assisted vs. phone assisted) impacted memory performance in terms of guess rate and precision (not preregistered). T-tests for independent samples revealed no significant difference for guess rate, t(52) = 0.02, p = .986, d = .01, nor for precision, t(52) = 1.81, p = .076, d = .49. Experimental assistance did not seem to influence memory performance, either in the likelihood to recall or in the precision with which information is recalled.
Attention and memory performance
Effect of binary attention on memory performance
We computed a mixed model analysis (not preregistered) with assistance (not assisted vs. phone assisted, coded as 0 for not assisted and 1 for phone assisted), binary attention (off-task vs. on-task, coded as 0 for off-task and 1 for on-task), as well as the interaction assistance by binary attention as fixed effects and by-participant random intercept and random slope. The dependent variable was absolute recall error. Given that the scales of the variables differ, we z-transformed all variables. Descriptive statistics are reported in Figure 6a. The analysis revealed a significant main effect of binary attention (b = −.311, SE = 0.044, 95% CIs [−0.398, −0.255], p < .001), showing larger recall errors (i.e., worse memory performance) when participants were off-task (M = 54.05, SD = 22.10) compared with on-task during encoding (M = 38.97, SD = 16.70). The main effect of the assistance condition (b = .012, SE = 0.096, 95% CIs [−0.176, 0.200], p = .903) and the interaction (b = −.102, SE = 0.088, 95% CIs [−0.275, 0.071], p = .257) were non-significant.
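For reference, a linear mixed model with these terms can be written as follows, where i indexes recall trials, j indexes participants, A is assistance, and B is binary attention. The notation is introduced here for illustration, and attaching the random slope to binary attention is an assumption, since the text does not state which predictor the slope belongs to.

```latex
\text{error}_{ij}
  = \beta_0 + \beta_1 A_j + \beta_2 B_{ij} + \beta_3 A_j B_{ij}
  + u_{0j} + u_{1j} B_{ij} + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

In this form, β2 corresponds to the main effect of binary attention, β1 to the main effect of assistance, and β3 to their interaction; u_0j and u_1j are the by-participant random intercept and slope.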
Effect of continuous attention on memory performance
Next, we computed the same mixed model analysis as above (not preregistered) with the continuous attention variable as a fixed effect instead of binary attention. The analysis revealed a similar pattern of results, i.e., a significant effect of continuous attention; the more participants were on-task, the lower the recall error was (b = −.192, SE = 0.022, 95% CIs [−0.235, −0.148], p < .001). The main effect of the assistance condition (b = .003, SE = 0.048, 95% CIs [−0.091, 0.098], p = .943) and the interaction (b = −.004, SE = 0.022, 95% CIs [−0.047, 0.040], p = .870) were non-significant. The significant effect of attention is plotted in Figure 3. This analysis is important because it shows that there is a continuous relation between attention at encoding and performance in a subsequent recall task. However, when visualising the relationship between continuous attention and absolute recall error (see scatterplots of each participant in the supplementary material, https://osf.io/6vnj3/), it remains open whether the relationship is truly linear on an individual level. We consider this point in the final discussion of this article.

Figure 3. The dependent variable is the absolute recall error, which is measured in degrees and can range from 0° (no deviation) to 180° (full deviation). Continuous attention varies from 0 (off-task) to 100 (on-task). Variables are z-transformed. Random effects are plotted by participant and 95% confidence interval is displayed.
Effect of task disengagement at different time windows before the thought probes
The results thus far have illustrated that task disengagement during encoding in a time window of 10.5 s before reported attention (i.e., the three trials preceding the thought probes [3 × 3 s] including fixation crosses [3 × 500 ms]) has an impact on subsequent memory performance. We selected this time window because previous research has found the most robust mind-wandering-related effects in a similar time window of 10 s (Krasich et al., 2020). We computed the same mixed-model analyses as reported above with absolute recall error as a dependent variable and the additional factor time window (with three levels: a time window of 0.5 to 3.5 s, i.e., the trial immediately preceding a thought probe, a time window of 4 to 7 s, i.e., the second last trial preceding a thought probe, and a time window of 7.5 to 10.5 s, i.e., the third last trial preceding a thought probe) as a fixed effect (not preregistered analyses). Time window turned out to be non-significant in the two models (b = −.010, SE = 0.008, 95% CIs [−0.026, 0.006], p = .236, in the first model with binary attention and b = −.012, SE = 0.008, 95% CIs [−0.028, 0.004], p = .148 in the second model with continuous attention). Adding time window to the models did not change the other results in terms of significance.
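The boundaries of the three time windows follow from the trial timing (3 s encoding screen plus 500 ms fixation cross). The short sketch below reproduces them; the function name is hypothetical, and the 0.5 s offset of the first window is taken from the reported boundaries, since the text does not state which event it corresponds to.

```python
FIXATION_S = 0.5   # duration of the fixation cross
ENCODING_S = 3.0   # duration of the encoding screen


def time_window(trials_back):
    """Return (start, end) in seconds before the probe for the k-th trial back."""
    start = FIXATION_S + (trials_back - 1) * (FIXATION_S + ENCODING_S)
    return start, start + ENCODING_S


windows = [time_window(k) for k in (1, 2, 3)]
total_window_s = 3 * (ENCODING_S + FIXATION_S)
```

The three windows come out as 0.5 to 3.5 s, 4 to 7 s, and 7.5 to 10.5 s, and the total span of three trials including fixation crosses is 10.5 s.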
Discussion
Contrary to our hypotheses, we could not find an impact of experimental assistance (not assisted vs. phone assisted) on rates of attention (both binary and continuous). Furthermore, experimental assistance had no impact on memory performance (recall error, guess rate, and precision). The main finding of our first study is the detrimental effect of task disengagement at encoding on subsequent long-term recall of location. The detrimental effect was consistent over three trials presented before the thought probes, which reflects an overall time window of about 10 s. The effect seems to be gradual (see results with continuous attention responses), thus suggesting that participants were not perceptually decoupled from the external world in an all-or-none manner. A large proportion of the participants (46 out of 54) showed fewer than 90 off-task or on-task trials, so that we were unable to compute the standard mixture model and thus to investigate whether natural task disengagement at encoding alters the likelihood to recall and/or the precision of recall.
Study 2
To investigate our originally preregistered hypothesis concerning the likelihood to recall and the precision of recall as reflected by the guess rate and precision parameters in the standard mixture model, we decided to replicate the not-assisted condition of Study 1 in a larger sample. As we observed no effect of assistance (not assisted vs. phone assisted) on recall errors, we decided to collect the data with the not-assisted condition only. We preregistered this second study on OSF (https://osf.io/2eudm). Based on the results of our first study, we expected disengagement at encoding to be associated with larger recall errors. Based on previous studies that directly manipulated attention at encoding (e.g., Emrich et al., 2017), we predicted that disengagement at encoding is associated not only with a lower likelihood to recall but also with lower precision of recall. To sum up, the main goal of Study 2 was to replicate the findings of Study 1 and to investigate whether natural task disengagement at encoding alters the likelihood to recall and/or the precision of recall.
Method
Participants
The required sample size and exclusion criteria were preregistered on OSF (https://osf.io/2eudm). The guess rate and precision differences between on- and off-task trials in our first study (subsample of 15 participants) were large (Cohen’s d of 1.08 for guess rate and of 1.19 for precision). Based on these effect sizes, we would need a sample of 16 participants (parameters of G*Power analysis for a dependent t-test: Cohen’s d = 1, α = .05, 1−β = .95, two-tailed, Faul et al., 2007). To ensure that we reached this sample size or a larger one for the model (i.e., at least 15 participants with more than 90 on- and off-task trials), we planned to test as many participants as possible from September 2021 to March 2022 via the participant pool of UniDistance Suisse. We expected to achieve a total sample size between 100 and 200 participants. None of the participants took part in Study 1.
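The sample size calculation can be approximated without G*Power. The following sketch is an independent re-implementation under stated assumptions, not the G*Power algorithm itself: it treats the dependent t-test as a one-sample test on difference scores with noncentrality d·√n, and evaluates the noncentral t tail by numerical integration (Simpson's rule). All function names are hypothetical.

```python
import math
from statistics import NormalDist

_PHI = NormalDist().cdf


def _tail(t_crit, delta, df, upper=8.0, steps=2000):
    """P(T > t_crit) for T = (Z + delta) / S, where S = sqrt(chi2_df / df).

    Integrates Phi(delta - t_crit * s) against the density of S with Simpson's rule.
    """
    log_c = math.log(2.0) + (df / 2.0) * math.log(df / 2.0) - math.lgamma(df / 2.0)
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        s = i * h
        density = math.exp(log_c - df * s * s / 2.0) * s ** (df - 1)
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * _PHI(delta - t_crit * s) * density
    return total * h / 3.0


def t_critical(alpha, df):
    """Two-tailed critical value of the central t distribution, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2.0
        if _tail(mid, 0.0, df) > alpha / 2.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def paired_t_power(n, d, alpha=0.05):
    """Power of a two-tailed dependent t-test with n pairs and effect size d."""
    df = n - 1
    delta = d * math.sqrt(n)
    t_crit = t_critical(alpha, df)
    return _tail(t_crit, delta, df) + _tail(t_crit, -delta, df)


def required_n(d, alpha=0.05, power=0.95):
    """Smallest number of pairs whose power reaches the target."""
    n = 2
    while paired_t_power(n, d, alpha) < power:
        n += 1
    return n


n_needed = required_n(1.0)  # compare with the sample size reported in the text
```

Under these assumptions the search can be compared directly with the 16 participants reported for d = 1, α = .05, and 1−β = .95.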
The link to the task was retrieved 176 times between September 2021 and March 2022 and we were able to collect 124 complete datasets. From these, we removed eight second attempts from the same participants (we kept the first attempt in this case). From the complete and unique 116 datasets, we removed data of three participants because their responses relied on pure guessing to solve the memory task (distribution of recall errors was not significantly different from a uniform distribution, Rayleigh tests, all r0s ⩽ .07, all ps ⩾ .051). From the 113 remaining participants (Rayleigh tests, all r0s ⩾ .13, all ps ⩽ .002), we finally removed nine participants who did not respond congruently to the binary and continuous attention responses (more than 20% inconsistencies). The final sample of Study 2 thus consisted of 104 participants with a mean age of 38.80 years (SD = 11.54, 82.7% female). All participants were tested in the German version of the task, and they received course credits for participation. The study was approved by the local ethics committee and participants were informed before they consented to participate that they could withdraw at any time during the experiment.
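The guessing criterion can be illustrated with a short sketch of the Rayleigh test of uniformity. It assumes that r0 denotes the mean resultant length of a participant's recall errors and uses a standard closed-form approximation for the p value (Zar); the authors' exact implementation is not described, and the function name is hypothetical.

```python
import math


def rayleigh_test(angles_deg):
    """Return (mean resultant length, approximate p value) for angles in degrees."""
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    resultant = math.hypot(c, s)  # length of the summed unit vectors
    r_bar = resultant / n         # 0 = perfectly spread out, 1 = all angles identical
    # Closed-form approximation of the p value under the null of uniformity
    p = math.exp(math.sqrt(1 + 4 * n + 4 * (n * n - resultant * resultant)) - (1 + 2 * n))
    return r_bar, min(p, 1.0)


# Errors spread evenly around the circle look like pure guessing (p close to 1),
# whereas errors clustered around 0 degrees yield a very small p value.
guessing = rayleigh_test(list(range(0, 360, 10)))
remembering = rayleigh_test(list(range(-20, 21)))
```

A participant whose errors do not depart significantly from uniformity would be flagged as guessing under this criterion.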
Procedure, apparatus, and material
We used the exact same task as presented in Study 1 (not-assisted condition), including demographics.
Results
In the first step, we report the descriptive statistics, followed by the mixed-model analyses considering the effect of task disengagement on subsequent memory performance (recall error). We considered all participants from Study 2 for these analyses, which were not preregistered. In the second step, we report the results addressing the impact of task disengagement on the likelihood to recall and the precision of recall. Here, we considered all participants with more than 90 on- and off-task responses from both studies (preregistered analyses). We fitted their recall errors separately for on- and off-task responses with the standard mixture model (Zhang & Luck, 2008) to investigate the effect of task disengagement on the likelihood to recall (guess rate, modelled as a uniform distribution) and on the precision of recall for successfully remembered locations (precision, modelled as a von Mises distribution). Again, we followed a Bayesian MCMC approach using the CatContModel package (Hardman, 2017; see also Hardman et al., 2017). We ran three chains of 10,000 iterations separately for off- and on-task trials; the first 500 iterations were removed as burn-in, and we used uninformative priors. The analyses were computed with jamovi (The jamovi project, 2021) and R (R Core Team, 2020). The alpha level was set to .05 for all analyses. The dataset and the code to reproduce the main analyses are available on OSF (https://osf.io/m5fsd/).
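To make the structure of the standard mixture model concrete, the sketch below writes out its density (a uniform guessing component plus a von Mises component centred on the target) and fits it by a simple grid search over the likelihood. This is a deliberately simplified maximum-likelihood stand-in for illustration only; it is not the Bayesian MCMC procedure of CatContModel, and all function names, grids, and simulated values are assumptions.

```python
import math
import random


def bessel_i0(x, terms=60):
    """Modified Bessel function of the first kind, order 0 (power series)."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total


def von_mises_pdf(error_rad, kappa, i0=None):
    """Von Mises density centred on 0 with concentration kappa."""
    i0 = bessel_i0(kappa) if i0 is None else i0
    return math.exp(kappa * math.cos(error_rad)) / (2.0 * math.pi * i0)


def mixture_density(error_rad, guess_rate, kappa, i0=None):
    """Guess-rate-weighted mixture of a uniform and a von Mises density."""
    return guess_rate / (2.0 * math.pi) + (1.0 - guess_rate) * von_mises_pdf(error_rad, kappa, i0)


def fit_mixture(errors_rad, g_grid, kappa_grid):
    """Return the (guess rate, kappa) pair on the grid with the highest log-likelihood."""
    best_ll, best_g, best_kappa = -math.inf, None, None
    for kappa in kappa_grid:
        i0 = bessel_i0(kappa)
        vm = [von_mises_pdf(e, kappa, i0) for e in errors_rad]
        for g in g_grid:
            ll = sum(math.log(g / (2.0 * math.pi) + (1.0 - g) * v) for v in vm)
            if ll > best_ll:
                best_ll, best_g, best_kappa = ll, g, kappa
    return best_g, best_kappa


def simulate_errors(n, guess_rate, kappa, seed=1):
    """Simulated recall errors in radians: guesses are uniform, memories von Mises."""
    rng = random.Random(seed)
    return [rng.uniform(-math.pi, math.pi) if rng.random() < guess_rate
            else rng.vonmisesvariate(0.0, kappa) for _ in range(n)]


errors = simulate_errors(2000, guess_rate=0.4, kappa=10.0)
g_hat, kappa_hat = fit_mixture(errors, [i / 20 for i in range(21)], [2.0 * k for k in range(1, 11)])
```

In this parameterisation a larger concentration kappa corresponds to a narrower error distribution, that is, to a smaller SD and thus higher precision.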
Descriptive statistics
The descriptive statistics for all task disengagement and memory measures are reported in Table 1. The distributions of recall errors for the off-task and on-task trials can be seen in Figure 4. The threshold for guesses, estimated with the guess rate (mean posterior, see Table 1) from the standard mixture model (Zhang & Luck, 2008) across all trials (i.e., ignoring whether a trial was off- or on-task) per participant, was at ±26.05 degrees. From a descriptive point of view, the results were comparable to the results of Study 1.

Distribution of recall errors (difference between responses and original locations) for off-task and on-task attention at encoding. The dashed lines represent the threshold of guessing across all attention responses.
Again, we checked for internal consistency of our measurements (i.e., recall error, binary attention, and continuous attention). The split-half correlation procedure across 5,000 random splits (R package splithalf; Parsons, 2021) revealed high Spearman–Brown corrected reliability (95% CI in square brackets): rSB = .97 [0.97, 0.98] for recall error, rSB = .97 [0.96, 0.98] for binary attention, and rSB = .99 [0.98, 0.99] for continuous attention. As in Study 1, the high correlation values indicate the high internal consistency of the task measures.
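The logic of a permutation-based split-half estimate can be sketched as follows. This is an illustrative re-implementation under assumptions, not the code of the R package: trials are split at random into two halves per participant, the half means are correlated across participants, and each correlation is corrected with the Spearman–Brown formula rSB = 2r / (1 + r). Summarising the splits by their mean and a percentile interval is an assumed convention, and the function names are hypothetical.

```python
import math
import random
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r):
    """Correct a half-test correlation to full test length."""
    return 2.0 * r / (1.0 + r)


def split_half_reliability(scores_by_participant, n_splits=5000, seed=0):
    """Return (mean, lower, upper) of Spearman-Brown corrected split-half estimates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_splits):
        half_a, half_b = [], []
        for trials in scores_by_participant:
            shuffled = list(trials)
            rng.shuffle(shuffled)
            mid = len(shuffled) // 2
            half_a.append(fmean(shuffled[:mid]))
            half_b.append(fmean(shuffled[mid:]))
        estimates.append(spearman_brown(pearson_r(half_a, half_b)))
    estimates.sort()
    lower = estimates[int(0.025 * (len(estimates) - 1))]
    upper = estimates[int(0.975 * (len(estimates) - 1))]
    return fmean(estimates), lower, upper
```

Calling `split_half_reliability` with one list of trial-level scores per participant returns the average corrected reliability together with an approximate 95% interval across splits.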
Effects of binary attention on memory performance
The descriptive statistics for absolute recall error as a function of binary attention are plotted in Figure 6a. We computed the same mixed model analysis (not preregistered) as in Study 1 with binary attention (off-task vs. on-task, coded as 0 for off-task and 1 for on-task) and time window (with three levels: a time window of 0.5 to 3.5 s, i.e., the trial immediately preceding a thought probe, a time window of 4 to 7 s, i.e., the second last trial preceding a thought probe, and a time window of 7.5 to 10.5 s, i.e., the third last trial preceding a thought probe) as fixed effects and by-participant random intercept and random slope. The dependent variable was absolute recall error. We z-transformed all variables. The analysis revealed a significant main effect of binary attention (b = −.161, SE = 0.012, 95% CIs [−0.185, −0.137], p < .001), showing larger recall errors (i.e., worse memory performance) when participants were off-task (M = 60.45, SD = 21.98) compared with on-task during encoding (M = 40.76, SD = 1.82). The main effect of time window (b = .003, SE = 0.006, 95% CIs [−0.008, 0.015], p = .579) was not significant. This result confirmed the findings of Study 1.
Effects of continuous attention on memory performance
Next, we computed the same mixed-model analysis (not preregistered) with continuous attention and time window as fixed effects and by-participant random intercept and random slope as computed with the data of Study 1. The results revealed a significant effect of continuous attention (b = −.236, SE = 0.016, 95% CIs [−0.266, −0.205], p < .001), indicating that less task disengagement during encoding was related to smaller recall errors (i.e., more accurate recall) and thus replicating the findings of Study 1 with a larger sample (n = 104). The significant effect of attention is plotted in Figure 5. Time window again turned out to be non-significant (b = .003, SE = 0.006, 95% CIs [−0.008, 0.014], p = .578). We report individual scatterplots visualising the relationship between continuous attention and absolute recall error in the supplementary material (see https://osf.io/6vnj3/).

The dependent variable is the absolute recall error, which is measured in degrees and can range from 0° (no deviation) to 180° (full deviation). Continuous attention varies from 0 (off-task) to 100 (on-task). Variables are z-transformed. Random effects are plotted by participant and 95% confidence interval is displayed.
Effect of binary attention on guess rate and precision (subsample from both studies)
All the analyses that we report beginning from here were preregistered (https://osf.io/2eudm). We aimed to investigate how binary attention at encoding affected the guess rate and precision of recall. To this end, we fitted the recall error distributions to a standard mixture model (Zhang & Luck, 2008) separately for each binary attention response (off-task vs. on-task) and each participant. We considered only those participants with more than 90 trials per condition (off-task vs. on-task) for the following analyses because the model would not converge for participants with fewer trials in one of the conditions. We pooled participants from both studies and obtained a subsample of 22 participants for these analyses (three participants from the not-assisted condition of Study 1, five from the phone-assisted condition of Study 1, and 14 from Study 2) with a total of 2,640 off-task and 2,733 on-task trials. Descriptive statistics for the recall errors of the binary attention measure (off-task vs. on-task trials) are shown in Figure 6b. A t-test for paired samples showed larger recall errors for off-task (M = 59.53, SD = 23.02) than on-task trials (M = 44.51, SD = 19.57), t(21) = 5.74, p < .001, d = 1.22.
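The inclusion rule for the mixture-model subsample amounts to a simple filter over trial counts. The sketch below assumes a hypothetical data layout of (participant identifier, on-task flag) pairs and applies the "more than 90 trials per condition" criterion as stated in the text.

```python
from collections import Counter


def eligible_participants(trials, min_trials=90):
    """Return ids of participants with more than min_trials on-task AND off-task trials.

    trials: iterable of (participant_id, on_task) pairs, where on_task is a bool.
    """
    counts = Counter(trials)
    ids = {pid for pid, _ in counts}
    return sorted(
        pid for pid in ids
        if counts[(pid, True)] > min_trials and counts[(pid, False)] > min_trials
    )


# Participant "a" has 91 trials in each condition and is kept;
# participant "b" has only 90 off-task trials and is excluded.
example = ([("a", True)] * 91 + [("a", False)] * 91
           + [("b", True)] * 200 + [("b", False)] * 90)
kept = eligible_participants(example)
```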

Descriptive statistics for memory measures as a function of binary attention (off-task vs. on-task). a) Absolute recall errors in degrees across participants reporting off-task attention (n = 42 for Study 1, n = 76 for Study 2) separated by assistance (phone-assisted vs. not-assisted for Study 1) and the data of Study 2 (not assisted). b) Absolute recall errors in degrees for the subsample with at least 90 trials per condition (n = 22). Large points represent the mean recall error per condition, and small points show the mean recall errors per participant. Error bars indicate SE. c) and d) The estimated parameters of the standard mixture model for the same subsample (n = 22): c) mean posteriors of the guess rate parameter (g) and d) mean posteriors of the precision parameter (SD). Error bars for mean posteriors represent 95% Bayesian credible intervals. Lower values correspond to better performance in all memory measures.
The mean posteriors and 95% Bayesian credible intervals of the estimated parameters guess rate (g) and precision (SD) per binary attention (off-task vs. on-task) are shown in Figure 6c and d. Smaller values on both measures are indicative of better performance. The t-tests for paired samples across parameters revealed a significantly higher guess rate for off-task trials (M = 0.66, SD = 0.29) than for the on-task ones (M = 0.46, SD = 0.24), t(21) = 6.67, p < .001, d = 1.42. In contrast, precision was comparable for off-task trials (M = 13.51, SD = 1.10) and for the on-task ones (M = 13.29, SD = 1.36), t(21) = 0.94, p = .360, d = .20. Note that non-parametric tests (Wilcoxon signed rank exact test) also revealed significant differences between off- and on-task trials for guess rate, z = 4.65, p < .001, but not for precision, z = .83, p = .406. In this particular subsample, task disengagement during the encoding of locations seemed to affect the amount of information kept in memory (i.e., guess rate) but not the precision with which the location of the object was remembered.
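The reported effect sizes can be related to the test statistics with a short sketch. It assumes that Cohen's d for the paired comparisons was computed on the difference scores (mean difference divided by the SD of the differences), in which case d = t / √n; this convention is an assumption, although it is consistent with the reported pairs of t and d values for n = 22. Function names are hypothetical.

```python
import math


def paired_t_and_d(x, y):
    """Paired-samples t statistic and Cohen's d based on difference scores."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    sd_diff = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))
    d_z = mean_diff / sd_diff
    return d_z * math.sqrt(n), d_z


def d_from_t(t_value, n):
    """Recover the difference-score effect size from a paired t statistic."""
    return t_value / math.sqrt(n)


# Consistency check against the reported statistics (n = 22 participants)
d_guess_rate = round(d_from_t(6.67, 22), 2)
d_precision = round(d_from_t(0.94, 22), 2)
d_recall_error = round(d_from_t(5.74, 22), 2)
```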
Discussion
In Study 2, we replicated the findings of Study 1—task disengagement at encoding (both binary and continuous) was associated with larger recall errors in the subsequent memory task. In addition, we were able to run the planned mixture model to distinguish the likelihood to recall and the precision of recall with a subsample of 22 participants. This analysis revealed that task disengagement was associated with higher guess rates but not with a loss in recall precision. In other words, contrary to our hypothesis, we found that being disengaged decreased memory in terms of quantity (guess rate) but not the precision of retrieved locations in this subsample of 22 participants.
General discussion
Our study aimed to investigate the link between task disengagement during encoding and subsequent recall of spatial information by combining the probe-caught method with the continuous delayed estimation paradigm. We found that periods of task disengagement were associated with larger recall errors. These findings fit well with those of other studies, particularly those showing the detrimental impact of task disengagement during cognitively demanding tasks (McVay & Kane, 2009; Rummel & Boywitt, 2014).
Analyses including all participants (n = 54 in Study 1 and n = 104 in Study 2) showed a gradual decrease in memory performance the more participants were disengaged from the encoding task, thus highlighting that off-task encoding processes were not perceptually decoupled from the external world in an all-or-none manner. During task disengagement, attention is directed away from the task at hand, which reduces the processing of visual information (Schooler et al., 2011). Whereas previous studies have shown that early visual processes seem to be affected by reduced processing (Baird et al., 2014; Kam et al., 2011; Smallwood et al., 2008), more recent studies have not observed such an effect, implying a gradual rather than complete suppression of visual processes during task disengagement (Groot et al., 2021). This gradual suppression is incorporated in the levels of inattention hypothesis introduced by Schad et al. (2012), proposing different degrees of perceptual decoupling varying from weak to deep levels of task disengagement. Critically, the authors propose that weak perceptual decoupling is characterised by high-level processes being decoupled from the external world but not low-level processes. In our task, the objects (presented on a white background) might have worked as low-level cues, and participants might thus still have a general representation of the objects’ location even when they were not fully on-task at encoding. Krasich et al. (2020) showed that during off-task trials, the eyes were more attracted by salient low-level features than was the case during on-task trials. Their study indicates that the visual system can still track salient low-level features during certain levels of task disengagement. Our results fit well with their account and lend support to the graded nature of task disengagement, which predicts fine-grained differences in subsequent memory performance.
We measured task disengagement via the thought probes embedded during encoding. A key problem associated with such measures is that they do not always accurately reflect internal states (Schooler & Schreiber, 2004). One typical way to overcome this issue is to combine self-reports with, for example, behavioural markers. In our study, we considered a time window of 10.5 s prior to the thought probes. We found that both a classical assessment of task disengagement (binary off–on task attention) and a continuous measure of attention predicted subsequent memory performance, thus confirming the validity of these measures. Our findings support previous research using self-report measures in memory research (e.g., Smallwood et al., 2006) and extend previous findings in that our results show that participants were able to detail their task disengagement response with a slider representing values from 0 = off-task to 100 = on-task. It has, however, to be considered that in our studies participants responded to the continuous attention response scale after the classical binary attention report. Given the fixed order in our study (binary report always preceding continuous report), we do not know how participants would respond to continuous attention items in isolation (without a preceding binary attention response scale). There are further points to be considered before recommending using continuous scales to assess task disengagement. Visualising individual scatterplots for the continuous attention and recall error relationship (see scatterplots in the supplementary material, https://osf.io/6vnj3/) showed that individuals had different distributions of continuous attention responses. 
Furthermore, although our mixed model analyses revealed a negative linear association between continuous attention at encoding and absolute recall errors in the subsequent memory task at a group level, the individual scatterplots do not always show a linear relationship between the two variables. This absence of a linear relationship at the individual level reflects results by Kane et al. (2021), who found no association between graded probes (5-point Likert-type scale) and response time variability on a sustained attention to response task (SART). Moreover, the authors found that graded probes are less valid, and might be confounded with confidence ratings when compared to content probes. More generally, the authors question graded probes in terms of their psychological meaningfulness and propose that mind-wandering is not a graded experience. However, despite the large differences on an individual level, our results revealed a relationship between the continuous attention measure at encoding and subsequent recall (the more the participants were disengaged from the encoding task, the larger their subsequent recall errors). Further studies are thus needed to fully understand whether individuals can rate task disengagement on a continuous scale, how they interpret such ratings, and finally, whether mind-wandering is a continuous construct at all.
In Study 1, we compared a classical online setting with an online setting with the assistance of an experimenter by phone. We expected that assistance by phone would help participants to better understand the instructions and serve as a motivator for a challenging memory task lasting for 1 hr and 30 min. Indeed, situational factors, such as motivation, have been identified to affect the rates of task disengagement (Rummel & Nied, 2017; Unsworth & McMillan, 2013). We found no difference between the two conditions in terms of both task disengagement and memory performance. Moreover, neither the match between the responses to the binary and continuous thought probes nor the overall good memory performance differed between assistance conditions. These results showed that participants equally complied with the instructions under both conditions. Our precautions to enhance compliance (phone-assisted condition) were found to be unnecessary in the specific situation of our study.
In an analysis of 22 participants showing enough off-task trials to fit the data with the standard mixture model, we observed that task disengagement at encoding was associated with a higher guess rate but not with reduced precision at recall. These results suggest that natural task disengagement at encoding affects the likelihood to recall (more items were completely lost when compared to task engagement) but not the precision of recall (the remembered items were equally precise). It is important to note that these analyses were carried out on a subsample that showed more task disengagement during encoding than most participants, thus making conclusions about the broader population impossible. It remains conceivable that individuals who are less disengaged during encoding show a difference in terms of precision of the remembered items in off-task vs. on-task trials.
To summarise, in this study, we tracked the role of task disengagement in how detailed spatial information is remembered. We used visual stimuli, and the features of the encoding material, such as high contrast, possibly partially attracted and sustained attention through mechanisms outside deliberate cognitive control (Christoff et al., 2016). Future research is needed, as these insights are crucial to understanding how natural task disengagement influences visual long-term memory. Given the substantial amount of time we spend disengaged from the external world, systematically monitoring natural fluctuations of attention is critical (see Wolff & Martarelli, 2020, for a similar proposition) to better understand how individuals remember visual information in the laboratory as well as in daily life. Most empirical studies and theoretical models assume on-task encoding processes, while in reality, we spend a considerable amount of our daily lives somewhat disengaged from the perceptual world.
Acknowledgements
We thank Lukas Schumacher for programming a first version of the memory task; Valentina Triantafyllidou and Sina Jossen for help with data collection; all participants for taking part in the study; and all anonymous reviewers and Michael Kane for helpful comments in the review process, especially as regards the construct validity of the continuous attention response scale.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Data accessibility statement
The data and code for the main analyses are available at https://osf.io/6vnj3/ and https://osf.io/m5fsd/, and the studies were preregistered (https://osf.io/kh49x and https://osf.io/2eudm).
