Abstract
In this direct replication of Houben, Otgaar, Roelofs, and Merckelbach (
Keywords
Eye-movement desensitization and reprocessing (EMDR) is an evidence-based, first-choice treatment for posttraumatic stress disorder (PTSD) that is widely applied in Western countries (e.g., American Psychological Association, or APA, 2017; Bisson, Roberts, Andrew, Cooper, & Lewis, 2013). Still, a substantial number of patients do not show clinical improvement after EMDR treatment (e.g., Bradley, Greene, Russ, Dutra, & Westen, 2005). Not surprisingly, therefore, recent years have seen an increase in research on EMDR’s mechanisms of change, with the ultimate aim of optimizing the treatment (see van den Hout & Engelhard, 2012).
Specifically, much attention has focused on unraveling the mechanism underlying EMDR’s core element, in which patients recall the hotspot of their traumatic memory while making horizontal eye movements (EMs; Shapiro, 2017). In laboratory studies using an analogue of this procedure, participants typically report that the EM intervention renders their memories less vivid and less emotional (van den Hout & Engelhard, 2012). A meta-analysis showed that EMs are an effective and essential component of this procedure, contributing to reductions in vividness and emotionality in analogue studies and, primarily, to reductions in subjective units of distress in full-protocol studies (Lee & Cuijpers, 2013).
Until recently, however, little was known about
It has further been shown that immediate changes in memory phenomenology are maintained over time, both 24 hr (Leer, Engelhard, & van den Hout, 2014) and 1 week after the intervention (Gunter & Bodner, 2008). In addition, two studies have demonstrated that EMs reduce memory accessibility, as evidenced by increased response latencies in a stimulus discrimination task (Leer et al., 2017; van den Hout, Bartelski, & Engelhard, 2013). Presumably, these changes reflect that EMs lead to the formation of a new memory trace or alter the original memory trace through memory reconsolidation (van den Hout & Engelhard, 2012).
Such changes in memory may be desirable from a therapeutic point of view, but they may come with unwanted side effects. Houben and colleagues (2018) hypothesized that the EM intervention increases susceptibility to accepting misinformation. They reasoned that false memories might form when memory becomes vague (i.e., less detailed) as a result of the EM intervention, so that people come to rely on memory for central (gist) elements (see fuzzy trace theory; Brainerd, Reyna, & Ceci, 2008). If EMs indeed fuel the formation of false memories, this may have significant consequences for the EMDR treatment of patients with PTSD. Through suggestive pressure and EMs, patients may more easily recover false memories of trauma (Patihis & Pendergrast, 2018), which in turn may affect the credibility of (eyewitness) testimony and, in extreme cases, could result in innocent people being falsely accused. Indeed, EMDR specifically has been linked to potential false-memory cases (Patihis & Pendergrast, 2018; Shaw & Vredeveldt, 2019). More broadly, this issue is relevant to the ongoing discussion about the reliability of repressed memories (i.e., the “memory wars”; Patihis, Ho, Tingen, Lilienfeld, & Loftus, 2014).
Houben et al. (2018) showed 82 undergraduates a video of a car crash, after which the participants recalled what they saw either with or without simultaneous horizontal EMs. Participants then received misinformation about the video in the form of an eyewitness narrative. Afterward, they answered forced-choice interview questions about video details. Consistent with the hypotheses, EMs simultaneous with memory recall resulted in (a) reduced recall of actual video details and (b) increased recall of the incorrect details that had been provided as misinformation in the eyewitness narrative. However, there was no evidence that EMs decreased vividness and/or emotionality more than keeping the eyes stationary did.
Given the potential impact of these findings, the goal of the current study was to perform a direct replication of Houben et al. (2018). Thus far, there has been no strong tradition of (direct) replication research in the social and behavioral sciences (e.g., Koole & Lakens, 2012; Makel, Plucker, & Hegarty, 2012), although substantial efforts have been made recently (e.g., Open Science Collaboration, 2015). Replication ensures the self-correcting nature of psychological science, which is vital to scientific progress (e.g., Asendorpf et al., 2013; Nosek, Spies, & Motyl, 2012). It increases the reliability and generalizability of research findings and helps to obtain a fairer estimate of the effect size, which matters because reported effect sizes are about twice as large as unreported ones (Franco, Malhotra, & Simonovits, 2016).
Because this was a direct replication, we used the same design, stimuli, and procedures as Houben et al. (2018), taken from the Open Science Framework. The sample size was set at 2.5 times the original sample size (as recommended by Simonsohn, 2015), because sample-size estimates based on published (inflated) effect sizes may be misleading (Simonsohn, Nelson, & Simmons, 2014). We hypothesized that making EMs simultaneously with memory recall (a) reduces the vividness and/or emotionality of the memory, (b) reduces recall of memory details, and (c) increases endorsement of misinformation.
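As a worked example, applying the 2.5 rule to the 82 participants of the original study gives the following target sample size. This is only the arithmetic implied by the rule; the sample actually recruited is described under Participants.

$$
n_{\text{replication}} = 2.5 \times n_{\text{original}} = 2.5 \times 82 = 205
$$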
Method
Participants
An ethnically diverse sample of first-year (62.6%), second-year (31.1%), and third-year (6.3%) undergraduate psychology students (
Materials
Beck Depression Inventory–II
The Beck Depression Inventory–II (BDI-II; Beck, Steer, & Brown, 1996) is a reliable and valid 21-item self-report questionnaire that assesses depressive mood (Wang & Gorenstein, 2013). Each item addresses one symptom of depression and consists of four statements, of which the respondent selects one. Items are scored from 0 to 3, and higher total scores (range = 0–63) indicate more severe depressive mood.
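The scoring rule above can be sketched in Python as follows. This is a minimal illustration of the per-item (0–3) and total (0–63) arithmetic only; the function name `bdi_ii_total` is hypothetical, and the sketch deliberately omits missing-item handling and severity cut-offs, which are not described here.

```python
from typing import Sequence

N_ITEMS = 21                 # the BDI-II has 21 items
ITEM_MIN, ITEM_MAX = 0, 3    # each item is scored 0-3


def bdi_ii_total(item_scores: Sequence[int]) -> int:
    """Sum 21 item scores (each 0-3) into a total score in the range 0-63.

    Hypothetical helper: no missing-item handling or severity cut-offs.
    """
    if len(item_scores) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores, got {len(item_scores)}")
    for i, score in enumerate(item_scores, start=1):
        if not ITEM_MIN <= score <= ITEM_MAX:
            raise ValueError(f"item {i} score {score} is outside {ITEM_MIN}-{ITEM_MAX}")
    return sum(item_scores)
```

For example, a respondent who selects the second statement (score 1) on every item obtains `bdi_ii_total([1] * 21)`, a total of 21.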
Video
The video is a 3-min, 34-s graphic public-service announcement about the dangers of texting while driving (Strange & Takarangi, 2012). It depicts three female teenagers chatting in a car, one of whom is texting and driving. As a result, the driver crosses the centerline and collides with an oncoming vehicle. When the cars have come to a full stop, a third car crashes into them. The video depicts at least five fatalities, including a baby. The video continues to show the aftermath of the accident, including emergency vehicles and an air ambulance arriving. It ends with a close-up of the driver’s face before she is transported to the hospital.
Eye-movement task
A gray dot was presented on a black background. In the recall + EM condition, the dot moved horizontally at a frequency of 1 Hz (one left-right-left movement per second) during four 24-s intervals separated by 10-s breaks. The recall + eyes stationary (ES) condition was identical to the recall + EM condition, except that the dot remained stationary in the middle of the screen. The task was run in E-Prime (Version 2.0; Psychology Software Tools, Pittsburgh, PA).
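The timing of the EM task can be made concrete with a small Python sketch. It builds the schedule described above (four 24-s recall intervals with three 10-s breaks between them, 126 s in total) and returns a normalized dot position. Two details are assumptions not stated in the text: that the dot sweeps at constant speed (a triangle wave) starting from the left end of its path, and the condition labels `"EM"` and `"ES"`. All names are hypothetical; this is not the E-Prime script used in the study.

```python
# Hypothetical sketch of the EM-task timing; not the study's E-Prime script.
INTERVALS = 4        # number of recall intervals
INTERVAL_S = 24.0    # duration of each recall interval (s)
BREAK_S = 10.0       # break between consecutive intervals (s)
FREQ_HZ = 1.0        # one left-right-left cycle per second


def build_schedule(intervals=INTERVALS, interval_s=INTERVAL_S, break_s=BREAK_S):
    """Return (phase, start_s, end_s) tuples; breaks fall only between intervals."""
    schedule, t = [], 0.0
    for i in range(intervals):
        schedule.append(("recall", t, t + interval_s))
        t += interval_s
        if i < intervals - 1:  # no break after the final interval
            schedule.append(("break", t, t + break_s))
            t += break_s
    return schedule


def dot_position(t_in_interval, condition, freq_hz=FREQ_HZ):
    """Normalized horizontal dot position: 0.0 = left end, 1.0 = right end of the path.

    ASSUMPTION: constant-speed (triangle-wave) sweep that starts at the left end.
    In the stationary ("ES") condition the dot stays at the center (0.5).
    """
    if condition == "ES":
        return 0.5
    phase = (t_in_interval * freq_hz) % 1.0  # fraction of the current cycle elapsed
    # first half-cycle: left -> right; second half-cycle: right -> left
    return 2 * phase if phase < 0.5 else 2 * (1 - phase)


schedule = build_schedule()
print(schedule[-1][2])  # total task duration in seconds: 126.0
```

With a triangle wave, the dot is at the left end at 0 s, at the right end at 0.5 s, and back at the left end at 1 s of each cycle; a sinusoidal sweep would be an equally plausible reading of "1 Hz".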
Filler task
An online, nontimed version of the tile-matching game
Misinformation narrative
Misinformation was provided in the form of a printed eyewitness narrative containing 10 true statements and 5 false statements (e.g., “The girl who was driving was texting with a boy called
Recognition test
The recognition test contained 15 questions, each with two answer options (e.g., “To whom were the girls writing a text message? John/James”). Each question offered the true answer and a foil. For five questions, the foil was the false information presented in the eyewitness narrative. For the remaining 10, the foil was information that had not been presented before and was thus incorrect. The experimenter administered the recognition test orally, and the participant responded orally (Parker, Buckley, & Dagnall, 2009).
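One plausible way to score this recognition test is sketched below in Python. The `Item` structure and `score_recognition` function are hypothetical, and counting correct answers across all 15 items (rather than only the 10 control items) is an assumption; the study's own scoring may differ. Misinformation endorsement is counted only when the chosen foil belongs to one of the five items whose foil appeared in the eyewitness narrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One two-alternative recognition question (hypothetical structure)."""
    true_answer: str
    foil: str
    misinformation: bool  # True if the foil appeared in the eyewitness narrative


def score_recognition(items, responses):
    """Return (correct, misinformation_endorsed) for one participant.

    ASSUMPTION: correct answers are counted across all items (15 in the study);
    misinformation is endorsed when a misinformation item's foil is chosen.
    """
    if len(items) != len(responses):
        raise ValueError("exactly one response per item is required")
    correct = misinformation = 0
    for item, response in zip(items, responses):
        if response == item.true_answer:
            correct += 1
        elif response == item.foil:
            if item.misinformation:
                misinformation += 1  # chose the narrative's false detail
        else:
            raise ValueError(f"{response!r} is not an answer option for this item")
    return correct, misinformation
```

Choosing the foil of a control item lowers the correct-answer count but does not count as misinformation endorsement, which keeps the two outcome measures distinct.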
Procedure
Throughout the experiment, all communication between experimenter and participants was in English. After giving informed consent, participants reported demographics (age, sex, gender, year of study, and ethnicity) and completed the BDI-II. Next, they watched the video, with sound played at a moderate volume through headphones. Directly after the video, participants rated the vividness and emotionality of the observed event on a 100-mm pen-and-paper visual analog scale (VAS) that ranged from 0 (
Results
Randomization and manipulation checks
Using null-hypothesis significance testing (NHST) in JASP (Version 0.9.1; https://jasp-stats.org/), there was no evidence for differences in age,
Vividness and emotionality
For vividness, there was a main effect of time, which showed a drop in ratings from preintervention (
Vividness and Emotionality Scores Before and After the Intervention and Recognition Test Results
Note: Values are means with 95% confidence intervals (CI) in square brackets.
For emotionality, there was a main effect of time; compared with ratings before the intervention (
Correct answers and misinformation answers
Contrary to our hypothesis, there was no evidence that participants in the recall + EM condition more readily endorsed misinformation answers (
Like Houben et al. (2018), we also analyzed our data on the endorsement of misinformation using Bayesian hypothesis testing (BHT) in JASP. We performed a sequential analysis with a robustness check, which shows how the strength of evidence develops as data accumulate, from the first participant until the intended sample size is reached. An advantage of BHT over NHST is that it also allows for quantifying evidence in favor of

The evidential trajectory for sequential analysis of endorsed misinformation answers with robustness checks for different priors (user, wide, and ultrawide). The graph shows the value of BF01 as a function of the number of participants in the sample for three Cauchy priors (

The evidential trajectory for sequential analysis of correct answers with robustness checks for different priors (user, wide, and ultrawide). The graph shows the value of BF01 as a function of the number of participants in the sample for three Cauchy priors (
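To illustrate what a sequential Bayes-factor analysis computes, the Python sketch below recomputes BF01 after each participant, in order of data collection. It uses the two-sided JZS (Jeffreys–Zellner–Siow) Bayes factor for an independent-samples t statistic, with a Cauchy prior of scale r on the standardized effect size, written as a mixture over a variance parameter g and integrated numerically. This is an assumed stand-in for JASP's implementation, not a reproduction of it: the user-specified prior width and test direction are not recoverable from the text above, the function names and `"EM"`/`"ES"` labels are hypothetical, and values may differ slightly from JASP's output.

```python
import math
import statistics


def jzs_bf10(t, n1, n2, r=math.sqrt(2) / 2, steps=4000):
    """Two-sided JZS Bayes factor BF10 for an independent-samples t statistic.

    Cauchy(0, r) prior on the standardized effect size, expressed as a normal
    prior with variance g and an inverse-gamma(1/2, r^2/2) prior on g; the
    integral over g is computed numerically.
    """
    nu = n1 + n2 - 2               # degrees of freedom
    n_eff = n1 * n2 / (n1 + n2)    # effective sample size for two groups

    def integrand(g):
        a = 1 + n_eff * g
        likelihood = a ** -0.5 * (1 + t * t / (a * nu)) ** (-(nu + 1) / 2)
        prior = r / math.sqrt(2 * math.pi) * g ** -1.5 * math.exp(-r * r / (2 * g))
        return likelihood * prior

    # Substitute g = exp(u) (so dg = g du) and apply the trapezoidal rule on a
    # u range wide enough that the integrand is negligible outside it.
    lo, hi = -15.0, 25.0
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        g = math.exp(lo + i * h)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * integrand(g) * g
    marginal_h1 = total * h
    marginal_h0 = (1 + t * t / nu) ** (-(nu + 1) / 2)
    return marginal_h1 / marginal_h0


def student_t(x, y):
    """Pooled-variance t statistic, or None if all scores are identical."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    if sp2 == 0:
        return None
    return (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def sequential_bf01(groups, scores, r=math.sqrt(2) / 2):
    """List of (n, BF01) pairs, recomputed after each participant in collection order.

    groups: hypothetical condition labels ("EM" or "ES"); scores: outcome per participant.
    """
    em, es, trajectory = [], [], []
    for group, score in zip(groups, scores):
        (em if group == "EM" else es).append(score)
        if len(em) < 2 or len(es) < 2:
            continue  # need at least two scores per group for a t statistic
        t = student_t(em, es)
        if t is None:
            continue  # t is undefined while every score is identical
        trajectory.append((len(em) + len(es), 1 / jzs_bf10(t, len(em), len(es), r)))
    return trajectory
```

Calling `sequential_bf01` once per prior width, for example r = √2/2, 1, and √2 (the values we understand JASP to use for its default, wide, and ultrawide priors), mimics the robustness check. With t = 0, a wider prior yields a larger BF01, because it spreads prior mass over effect sizes the data do not support.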
Correlation between memory change and endorsement of misinformation
Houben et al. (2018) did not assess the working mechanism hypothesized to be crucial for endorsement of misinformation (i.e., decreases in vividness and/or emotionality), but we did test it. There was no evidence for a correlation between endorsing misinformation and decreases in vividness,
Discussion
The aim of this direct-replication study was to test whether making EMs during memory recall increases a person’s susceptibility to endorsing misinformation. This question is important because making EMs during recall is a key element of EMDR, a treatment often used for PTSD (e.g., APA, 2017). If EMs did increase this susceptibility, a therapeutically beneficial intervention would have adverse effects. However, we found that memory recall with simultaneous EMs (vs. eyes stationary) did not increase endorsement of misinformation and thus false memory. Moreover, it also did not reduce (correct) memory details or self-reported memory vividness and emotionality.
Evidently, our results contradict Houben et al.’s (2018) findings: Making EMs simultaneously with memory recall did not result in higher false-memory rates than keeping the eyes stationary. Conceivably, this follows from our finding that EMs did not reduce vividness and emotionality more than merely recalling the memory did. Although differential changes were numerically slightly larger in Houben et al., they did not report statistically significant differences between the conditions either. An absence of differential decreases in vividness and/or emotionality is not uncommon, but it might occur more often in studies testing memory for novel materials (e.g., pictures) than in studies testing memory for autobiographical events (e.g., Leer, Engelhard, Dibbets, & van den Hout, 2013; van Schie, Engelhard, & van den Hout, 2015; but see Leer et al., 2017). Moreover, experimental evidence relates the size of the decrease to the duration of the intervention (e.g., Leer et al., 2014), but novel (not yet consolidated) materials might benefit from shorter rather than longer EM interventions (Leer et al., 2017). However, Houben et al. specifically hypothesized that when a memory is less vivid, a person would be more prone to accepting misinformation. No such relationship was present in the current study or even in the original study (see Note 1), which casts doubt on whether this truly is the mechanism of action, provided that EMs affect endorsement of misinformation at all.
Another reason for the discrepancy in results may simply be random variation in sampling or measurement. For instance, the original study’s between-subjects design and relatively small sample size (compared with that of the current study) may have unintentionally produced spuriously large differences between the two conditions (i.e., a false-positive result). Moreover, the reliability of the forced-choice interview questions is problematic: Their Kuder–Richardson 20 (KR-20) internal consistency in the current study was .31 (95% CI = [.15, .45]), whereas a value of .7 or .8 is deemed “acceptable” for general research purposes (Henson, 2001). Low reliability of the test instrument might result in floor effects for endorsed misinformation and consequently may have hampered detection of a difference between conditions in the current sample.
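The KR-20 coefficient can be computed from a participants-by-items matrix of 0/1 scores. The Python sketch below implements the standard formula, KR-20 = k/(k − 1) × (1 − Σ p_i q_i / σ²_X), where k is the number of items, p_i is the proportion of participants scoring 1 on item i, q_i = 1 − p_i, and σ²_X is the variance of total scores. Using the population (rather than sample) variance for σ²_X is an assumption, because conventions differ; the function name is hypothetical, and the sketch does not compute the confidence interval reported in the text.

```python
import statistics


def kr20(responses):
    """Kuder-Richardson 20 for a participants-by-items matrix of 0/1 scores.

    KR-20 = k / (k - 1) * (1 - sum(p_i * q_i) / var_total), where p_i is the
    proportion scoring 1 on item i, q_i = 1 - p_i, and var_total is the
    population variance of total scores (ASSUMPTION: population, not sample).
    """
    if not responses:
        raise ValueError("need at least one participant")
    n, k = len(responses), len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("need a rectangular matrix with at least two items")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # proportion scoring 1 on item j
        sum_pq += p * (1 - p)                     # variance of a 0/1 item
    var_total = statistics.pvariance([sum(row) for row in responses])
    if var_total == 0:
        raise ValueError("total scores do not vary; KR-20 is undefined")
    return k / (k - 1) * (1 - sum_pq / var_total)
```

When items covary strongly, the total-score variance far exceeds the summed item variances and KR-20 approaches 1; when items are unrelated, the two variances are similar and KR-20 approaches 0, which is the pattern a value of .31 points toward.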
It is also possible that the current study represents a false-negative finding. Under NHST, some replication attempts will inevitably be unsuccessful simply by chance (Fisher, 2006; Neyman & Pearson, 1928), so a false negative can never be ruled out with absolute certainty. Assuming that the current study is not a false negative and reflects the true state of affairs, a crucial question is: To what extent do these nonsignificant replication results actually support the null hypothesis? With NHST, such a claim is difficult to substantiate (but see Schuirmann, 1987). The additional BHT, however, showed evidence in favor of the null hypothesis that increased as more data were collected, especially for endorsement of misinformation.
What else, then, may explain the difference in outcomes between the original study and the present study? Observer-expectancy effects may have played a role in both studies (Rosenthal, 1966): Investigators in either study may have unconsciously influenced the participants, causing them to respond in line with the investigators’ expectations. Although this is only one of the many biases that could have influenced the results (see Sackett, 1979), there are ways to circumvent or minimize the effects of biases in future experiments: for example, by using double-blind testing; by using methods that rely less on external pressure and more on automatic processes, such as the Deese/Roediger-McDermott false-memory paradigm (Deese, 1959; Roediger & McDermott, 1995); or by standardizing procedures via computerized data-acquisition techniques. The recognition test, which the experimenter presented orally, seems an especially suitable candidate for such standardization to keep the risk of bias at bay.
Alternatively, many different moderators could be argued to explain the discrepancies between studies, for example, motivational differences between participants, differences in sample composition between the original and current study (e.g., more female participants, a wider age range), or use of the revised BDI. However, such a list of moderators could be extended indefinitely, and any claims about moderation in the current study would therefore be largely speculative at this point. Moreover, even perfectly matched populations may produce contradictory results that one may theoretically attribute to a moderator. The only way to assess whether an effect is robust (and largely independent of biases or moderators) is to have multiple independent laboratories perform the same direct replication (Simons, 2014).
In conclusion, the current experiment was the first direct replication of Houben et al. (2018), and it failed to find the original study’s effect that making EMs during memory recall increases a person’s susceptibility to endorsing misinformation. This suggests that the original finding may be a false positive and that treatment of PTSD with EMDR does not come with the adverse effect of increased false-memory formation. This does not mean that EMDR is free of memory distortions per se. After all, recalling a memory (as is done in EMDR) is inherently a reconstructive process, and distortions in the form of misinformation can slip in at any time (Loftus, 2005). At this point, one study has shown an effect of EMs on false memory and one study has shown no such effect. Reliably and validly investigating whether (and how) making EMs during recall robustly increases a person’s susceptibility to false-memory formation will require a multilab direct-replication attempt and the use of standardized test instruments.
Supplemental Material
Supplemental material, vanSchie_Leer_Open_Practices_Disclosure for Lateral Eye Movements Do Not Increase False-Memory Rates: A Failed Direct-Replication Study by Kevin van Schie and Arne Leer in Clinical Psychological Science
Footnotes
Acknowledgements
We are grateful to Sanne Houben and her coauthors for providing us with materials and any requested clarifications. We thank Jarinne de Jong, Dewi Kooren, Marion Steenbakker, and Jolien van der Velden for their assistance in testing.
Action Editor
Scott O. Lilienfeld served as action editor for this article.
Author Contributions
K. van Schie and A. Leer developed the study concept. K. van Schie was responsible for data collection and data analysis. K. van Schie and A. Leer interpreted the data, drafted the manuscript, and approved the final version of the article.
Declaration of Conflicting Interests
The author(s) declared that there were no conflicts of interest with respect to the authorship or the publication of this article.
Open Practices
All data and materials have been made publicly available via the Open Science Framework and can be accessed at https://osf.io/j479p/ and https://osf.io/4UBJ8/. The complete Open Practices Disclosure for this article can be found at http://journals.sagepub.com/doi/suppl/10.1177/2167702619859335. This article has received badges for Open Data and Open Materials. More information about the Open Practices badges can be found at
.
Notes
References
