Abstract

Legal psychologists sometimes provide expert witness testimony about eyewitness memory in court. In their testimony, they regularly rely on scientific findings that decision-makers (i.e., jurors, judges) likely assume are practically relevant. However, it is not yet known which effect sizes are large enough to be deemed practically relevant for the courtroom; the smallest such effect is known as the smallest effect size of interest (SESOI). One way to estimate the SESOI is to engage stakeholders. In two studies, we recruited 97 legal professionals (e.g., defense lawyers, prosecution lawyers, judges) from the Netherlands and Belgium and presented them with hypothetical scenarios about an unarmed robbery in which an eyewitness made different types of memory errors (e.g., misremembering a black gun). Legal professionals were then asked how many such memory errors they would allow before taking certain legal actions. The majority of legal professionals viewed 1–3 memory errors as practically relevant for legal decisions or actions, but this depended on the type of memory error. A nontrivial number of participants indicated that they would never undertake legal actions after a witness made memory errors. The current studies can guide the challenging task of estimating the SESOI in forensic psychological contexts and may thereby assist future researchers.

Keywords

Smallest effect size of interest, eyewitness memory, legal professionals, consensus study, practical relevance

Psychologists are sometimes consulted to provide expert witness testimony regarding the validity of memory-based statements. The need for such expert witness testimony becomes apparent when considering that eyewitness memory errors are a major contributor to wrongful convictions in many jurisdictions. For example, in the United States, 337 of the 617 exonerations (55%) were cases that had eyewitness memory errors (The National Registry of Exonerations, 2025), such as misidentification of innocent suspects (Saks and Koehler, 2005). When psychologists educate legal professionals about factors that can lead to eyewitness memory errors, they frequently rely on the scientific literature, thereby assuming this literature has direct implications for the legal process. However, it remains unclear under which conditions psychologists can confidently state which factors may meaningfully contaminate memory-based testimony (e.g., Riesthuis et al., 2022).

Recently, Otgaar et al. (2022a) argued that the replicability, generalizability, and practical relevance of memory research should be specifically discussed by expert witnesses in their written or oral testimony (see also Riesthuis and Otgaar, 2024a; Tullett, 2022). Replicability refers to whether the phenomenon under investigation has been consistently observed and supported by data from independent researchers (Chin, 2014; Nosek et al., 2022). Generalizability pertains to whether psychological effects demonstrated in highly controlled laboratory settings can be applied to other contexts (Yarkoni, 2022). Assuming that eyewitness memory research is replicable and generalizable (but see Klein et al., 2018; Yarkoni, 2022), it is imperative to examine whether the magnitude of an observed phenomenon (e.g., alcohol leading to forgetting), better known as an effect size (Cohen, 1988), is practically relevant.

Accordingly, the main objective of the current studies was to examine when eyewitness memory research can be deemed practically relevant through the lens of legal professionals. Asking legal professionals about when they, for example, take certain legal actions or decisions after encountering eyewitness memory errors can help estimate which effect sizes in eyewitness memory research are practically important in legal proceedings.

Smallest effect size of interest and eyewitness memory

To assess which effect sizes are practically relevant, it is important to estimate the smallest effect size of interest (SESOI;¹ Lakens et al., 2018). The SESOI is the smallest effect size that researchers consider personally interesting, that is practically relevant, or that has theoretical implications (Riesthuis, 2024). The last criterion is problematic for establishing the SESOI because most psychological theories are verbal in nature (e.g., alcohol leads to forgetting; Gruijters and Peters, 2020; Meehl, 1967), meaning that specific predictions (e.g., an x increase in alcohol leads to a y amount of forgetting) cannot be derived from such theories. Hence, this article focused on the smallest effect size that would have practical consequences in a legal setting. Notably, the field of eyewitness memory research has yet to establish which effect sizes would be considered practically relevant or, in other words, what the SESOI is.

Contextualized SESOIs for eyewitness memory research are necessary when legal professionals and scholars need to know which factors do or do not affect the validity of testimony in practically meaningful ways. To do so, some memory researchers examine experimentally which factors (e.g., therapy, alcohol, type of line-up) might undermine eyewitness memory by facilitating forgetting (e.g., failing to report information; failing to identify the perpetrator; e.g., Radvansky et al., 2022) or increasing false memory formation (e.g., reporting non-experienced events; mistaken eyewitness identifications; e.g., Loftus, 2005).

To determine which factors may affect eyewitness memory, researchers currently tend to rely on statistical significance (Riesthuis et al., 2022; Riesthuis and Otgaar, 2024a, 2024b). However, just because a factor yields a statistically significant effect does not necessarily mean that the effect also has practical relevance in a legal setting (Otgaar et al., 2023b). Relying on statistical significance can be problematic for several reasons. On the one hand, when small sample sizes are used in research, a statistically significant effect might be spurious, unreliable, and difficult to replicate (Bakker et al., 2012). On the other hand, when large sample sizes are collected, overreliance on statistical significance can lead to issues such as detecting statistically significant effects that might be trivial in real-life settings (Anvari and Lakens, 2021).

To further highlight the relevance of establishing the SESOI in eyewitness memory research when large sample sizes are collected, consider the following hypothetical scenario. Imagine that researchers want to experimentally examine the effects of alcohol on false memory formation using the misinformation paradigm (Takarangi et al., 2006) in a multi-lab study (to show that the effect is replicable and generalizable). In this hypothetical experiment, participants watch a video of an unarmed robbery and later receive post-event misinformation, such as that the suspect was holding a gun. Some participants consume alcohol before watching the video, while others do not. One week later, their memory of the unarmed robbery is assessed. Suppose meta-analytic results reveal a reliable and statistically significant effect (statistical power = .95; p < .05; low heterogeneity), indicating that participants who consumed alcohol reported, on average, .05 more misinformation details (e.g., suspect was holding gun) than those who did not consume alcohol. Can this meta-analytic effect be used by expert witnesses to give testimony that alcohol contaminates memory-based testimony? Even though the effect was replicable, generalizable, and statistically significant, it is challenging to argue that a difference of .05 misinformation details would have practical relevance in court (but see Anvari et al., 2023 for when small effects may matter). To determine whether effects are practically meaningful (i.e., minimum-effect testing; Murphy and Myors, 1999) or are too small to care about (i.e., equivalence testing; Lakens et al., 2018), contextualized SESOIs need to be estimated for eyewitness memory studies (Chin, 2023; Riesthuis, 2024).
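The contrast between equivalence testing and minimum-effect testing in this scenario can be illustrated with a minimal Python sketch. It uses a large-sample normal approximation rather than the t-based procedures in the cited literature. The standard error of .015, the SESOI of one misinformation detail, and the function names are illustrative assumptions; only the mean difference of .05 comes from the hypothetical scenario above.

```python
from statistics import NormalDist

def equivalence_test(diff, se, sesoi, alpha=0.05):
    """Two one-sided tests (TOST) with a large-sample z approximation.

    Returns True when the observed difference is statistically
    smaller than the SESOI in both directions.
    """
    z = NormalDist()
    p_lower = 1 - z.cdf((diff + sesoi) / se)  # H0: true difference <= -SESOI
    p_upper = z.cdf((diff - sesoi) / se)      # H0: true difference >= +SESOI
    return max(p_lower, p_upper) < alpha

def minimum_effect_test(diff, se, sesoi, alpha=0.05):
    """One-sided test of H0: true difference <= SESOI (z approximation)."""
    p = 1 - NormalDist().cdf((diff - sesoi) / se)
    return p < alpha

# Hypothetical inputs: .05 more misinformation details in the alcohol
# group, an ASSUMED standard error of .015, and an ASSUMED SESOI of 1 detail.
diff, se, sesoi = 0.05, 0.015, 1.0

# Conventional two-sided significance test against zero.
significant = abs(diff / se) > NormalDist().inv_cdf(0.975)

print("significantly different from zero:", significant)
print("statistically smaller than SESOI:", equivalence_test(diff, se, sesoi))
print("statistically larger than SESOI:", minimum_effect_test(diff, se, sesoi))
```

Under these assumed inputs the effect is statistically significant yet also statistically smaller than the SESOI, which is the situation the scenario describes. The conclusion depends entirely on the SESOI chosen, which is why the SESOI needs an empirical justification.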

Approaches to establishing the smallest effect size of interest

There are various approaches through which the SESOI can be established. Many researchers implicitly set their SESOI using Cohen's benchmarks for small (e.g., d = .2), medium (d = .5), and large (d = .8) effect sizes (Cohen, 1988) in their power analyses (Correll et al., 2020). Using these benchmarks in a power analysis indirectly indicates the SESOI (Lakens et al., 2020), because this is the effect that can be reliably detected based on the given significance level (α) and statistical power (1 − β). In other words, researchers can only detect effect sizes equal to or larger than the given effect size in the power analysis (Riesthuis, 2024). Interestingly, Cohen explicitly advised against the use of these effect size benchmarks because they fail to consider the context of the studied phenomenon and research methodology (Cohen, 1988: 25, 532). In sum, there is no rational reason to equate Cohen's benchmarks with practical relevance.
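A short Python sketch shows how a power analysis implicitly fixes the smallest effect a study can reliably detect. It assumes a two-sided comparison of two independent groups of equal size and uses the normal approximation, so the sample sizes are slightly smaller than those from exact t-based software. The function names and default values are illustrative assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist

def smallest_detectable_d(n_per_group, alpha=0.05, power=0.80):
    """Standardized mean difference detectable with the given power
    (two independent groups, two-sided test, normal approximation)."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * sqrt(2 / n_per_group)

def n_per_group_for_d(d, alpha=0.05, power=0.80):
    """Participants needed per group to detect d (same approximation)."""
    z = NormalDist()
    return ceil(2 * ((z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / d) ** 2)

# Cohen's benchmarks used as the target effect in a power analysis.
for d in (0.2, 0.5, 0.8):
    n = n_per_group_for_d(d)
    print(f"d = {d}: about {n} per group; "
          f"detectable d at that n = {smallest_detectable_d(n):.3f}")
```

Choosing d = .5 in the power analysis yields a design that cannot reliably detect smaller effects, so the benchmark acts as a de facto SESOI whether or not it has any practical meaning in the research context.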

Alternatively, researchers can base their SESOI on meta-analyses, previous studies, or pilot studies. However, even these methods are not without criticism (e.g., Chin and Neal, 2023). This is because the derived effect sizes might be overestimations and could be unreliable due to low statistical power and publication bias in psychological research (Bakker et al., 2012; Bartoš et al., 2023). Moreover, the derived effect sizes only estimate the magnitude of an effect but they do not address whether the effect size yields any practical meaning. Along these lines, Panzarella et al. (2021) recommended against the use of field specific effect size distributions (i.e., establishing which effect sizes are small, medium, or large based on the distribution of published findings) because they do not provide any information about the meaning of the effect sizes and the estimates are also affected by the aforementioned issues of low statistical power in previous research and publication bias.

Another way the SESOI can be established is by engaging stakeholders (Byrne, 2019). Consulting, for example, researchers, patients, and doctors has been strongly recommended in the field of medical and health research to improve the appropriateness, acceptability, and practical relevance of the research (Staniszewska et al., 2012), but also to establish the SESOI (e.g., Bonini et al., 2020; Lemay et al., 2019).² Recently, Riesthuis et al. (2022) involved memory scientists and implemented a consensus method to examine what these stakeholders would consider the SESOI to be for false memory research. In their study, memory scientists were presented with several hypothetical experimental designs examining the effects of a certain therapy approach on false memory formation, for example. Then, participants were asked to provide their SESOI. Specifically, memory scientists were asked how many falsely remembered details due to therapy would be needed for them to conclude that the therapy can contaminate eyewitness testimony. Interestingly, there was no consensus among memory scientists regarding the SESOI. Many relied on conventions such as Cohen's benchmarks, even though they were specifically asked to express the SESOI in unstandardized effect sizes (i.e., raw mean differences). Further, and as observed in other research (Riesthuis and Otgaar, 2024a, 2024b), many memory scientists relied on statistical significance to infer whether an effect was practically meaningful. Providing expert witness testimony on effects that are solely based on statistical significance might unwarrantedly boost the practical implications of these effects.

However, memory scientists are not the only stakeholders invested in the field of eyewitness memory research (Chin, 2023). For example, legal professionals frequently have to make decisions based on eyewitness testimony. Examples of this are judges determining the reliability of eyewitness testimony or lawyers contemplating whether to appeal the admissibility of an eyewitness testimony or consult an expert witness. It is possible that certain memory undermining effects are too small (even if replicable and generalizable) for legal professionals to care about. The reason for this is that the effects might be too trivial (recall the .05 memory errors example) to build a case upon (e.g., appealing for the inadmissibility of an eyewitness). Hence, legal professionals can provide pivotal information in assessing which effects could be practically relevant for the courtroom in terms of their legal decision making. One way to examine this is by investigating when legal professionals make certain legal decisions based on eyewitness memory errors. The number of memory errors legal professionals permit before making certain legal decisions (e.g., challenge the reliability of a witness’s testimony) can be used to estimate and justify a SESOI for eyewitness memory studies.

The present studies

In the present studies, we examined how many and what type of eyewitness memory errors legal professionals would permit before taking certain legal decisions or actions. To do so, we provided them with three scenarios of an unarmed bank robbery. In each scenario, an eyewitness made a different type of memory error. Specifically, the types of memory errors were: (1) misremembering the color of a jacket, (2) misremembering an entirely new detail (i.e., that the perpetrator was wearing a yellow hat), or (3) misremembering an entirely new and incriminating detail (i.e., that the perpetrator was holding a black gun, even though the robbery was unarmed). For each scenario, legal professionals were asked how many of such errors would be required before they would take certain legal decisions or actions (e.g., challenge the admissibility of the eyewitness). In the first study, we targeted a wide range of legal professionals by contacting bar associations in Belgium and the Netherlands.³ In the second study, we recruited judges and prosecution lawyers from the Netherlands who participated in a training about legal psychological research given by the second author (ER).

Study 1

Method

Participants

Because the research was exploratory, we recruited as many legal professionals as possible and did not conduct an a priori power analysis to estimate a sample size for inferential testing (Scheel et al., 2021). In Belgium, we contacted the Dutch-speaking bar association “Orde van Vlaamse Balies,” which shared our study on its LinkedIn profile and distributed it among its members. In the Netherlands, the “Nederlandse Orde van Advocaten” declined to share our study among its members, and the bar association “Nederlandse Vereniging van Jonge Strafrechtadvocaten” did not reply to our two emails. However, the bar association “Nederlandse Vereniging van Strafrechtadvocaten” shared our study among its members. In total, 424 Belgian and Dutch participants took part in our questionnaire.

Participants who completed at least one scenario were included in the data analysis (n = 330). Upon visual inspection of the data, we detected irregularities that could indicate that some responses came from bots. First, participants from non-Dutch-speaking countries took part even though the questionnaire was in Dutch. Second, some participants reported more years of practice than was plausible given their age (i.e., they would have started their legal career before the age of 18). Third, some participants indicated that they were currently a “judge” but had less than 5 years of experience in the legal system. Fourth, some participants completed the study very quickly (in under 5 min). To ensure that we only included data from legal professionals, we excluded 261 participants based on these criteria, leaving a sample of 69 participants for the analyses (M_age = 42.3, SD_age = 12.9, range_age = 23–70; see Table 1 for demographics). These exclusions were not preregistered but were applied before we analyzed the data.
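The exclusion criteria can be expressed as a simple filter. The Python sketch below is illustrative only and is not the analysis code on OSF: the field names, the country coding, and the example records are hypothetical, and "years of practice" is assumed to stand in for experience in the legal system.

```python
# ASSUMED coding of countries where the Dutch-language survey was targeted.
DUTCH_SPEAKING = {"Belgium", "Netherlands"}

def exclusion_reasons(p):
    """Return the list of criteria a participant record violates."""
    reasons = []
    if p["country"] not in DUTCH_SPEAKING:
        reasons.append("non-Dutch-speaking country")
    if p["age"] - p["years_practice"] < 18:
        reasons.append("career start before age 18")
    if p["occupation"] == "judge" and p["years_practice"] < 5:
        reasons.append("judge with less than 5 years of experience")
    if p["duration_min"] < 5:
        reasons.append("completed in under 5 minutes")
    return reasons

# Hypothetical records, invented purely to exercise each criterion.
participants = [
    {"id": 1, "country": "Netherlands", "age": 42, "years_practice": 15,
     "occupation": "lawyer", "duration_min": 12.0},
    {"id": 2, "country": "Brazil", "age": 30, "years_practice": 5,
     "occupation": "lawyer", "duration_min": 9.0},
    {"id": 3, "country": "Belgium", "age": 25, "years_practice": 10,
     "occupation": "lawyer", "duration_min": 8.0},
    {"id": 4, "country": "Belgium", "age": 35, "years_practice": 2,
     "occupation": "judge", "duration_min": 11.0},
    {"id": 5, "country": "Netherlands", "age": 50, "years_practice": 20,
     "occupation": "prosecutor", "duration_min": 2.5},
]

kept = [p for p in participants if not exclusion_reasons(p)]
print([p["id"] for p in kept])
```

Recording the reason for each exclusion, rather than only a keep/drop flag, makes it possible to report how many participants each criterion removed.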

Table 1.
Demographics of legal professionals in Study 1.

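| Characteristic | N | % |
| --- | --- | --- |
| Sex | | |
| Female | 34 | 49.3 |
| Male | 34 | 49.3 |
| Other | 1 | 1.4 |
| Country | | |
| Belgium | 24 | 34.8 |
| Netherlands | 45 | 65.2 |
| Occupation | | |
| Judge | 1 | 1.4 |
| Lawyer | 48 | 69.6 |
| Other* | 9 | 13.0 |
| Paralegal | 6 | 8.7 |
| Prosecutor | 5 | 7.2 |
| Background | | |
| Civil law | 11 | 15.9 |
| Criminal law | 51 | 73.9 |
| Other | 7 | 10.1 |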

Note. Excluded participants are not presented in the table. * “Other” category included six PhD students, one lawyer and deputy judge, one police officer, and one notary.

Study 1 was preregistered as exploratory and descriptive, and we did not have defined hypotheses (https://osf.io/ayj9d). The Social and Societal Ethics Committee and Privacy and Ethics Unit of KU Leuven approved this study (G-2022-5363-R2(MIN)). All data, code, materials, and rMarkdown files for the data analysis with interactive plots and additional analyses are openly available on the Open Science Framework at https://osf.io/nt82u/.

Design and materials

We used a within-subjects design. All participants received the same three hypothetical legal cases in which an eyewitness made a memory error. The cases were presented one by one in a completely random order for each participant to avoid order effects. Similarly, for each scenario, six questions concerning different legal decisions were presented one by one in a completely random order for each participant. All participants received the same six questions.
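The two levels of randomization (case order, then question order within each case) can be sketched in Python as follows. The randomization itself was presumably handled by the survey software; this sketch only illustrates the design, and the short labels and the seeded generator are assumptions made for the example.

```python
import random

# Short labels are ASSUMED shorthand for the three cases and six questions.
SCENARIOS = ["jacket color", "yellow hat", "black gun"]
DECISIONS = ["unreliable", "inadmissible", "challenge",
             "judge", "expert", "average person"]

def presentation_order(rng):
    """Draw one participant's order: shuffled cases, and within each
    case an independently shuffled set of the same six questions."""
    order = []
    for scenario in rng.sample(SCENARIOS, k=len(SCENARIOS)):
        questions = rng.sample(DECISIONS, k=len(DECISIONS))
        order.append((scenario, questions))
    return order

# A seeded generator keeps the example reproducible.
order = presentation_order(random.Random(1))
for scenario, questions in order:
    print(scenario, "->", questions)
```

Every participant sees all 18 scenario-by-question combinations; only their sequence differs.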

The three cases were identical except for the memory errors that were made by the eyewitness (see OSF for all three cases, https://osf.io/nt82u/). An example of one of the legal cases is as follows:

Consider the following case of the People v. S.W.C. (2012). S.W.C. is charged with unarmed robbery. The police obtained incriminating evidence from one of the adult eyewitnesses of the robbery who stated in an initial interview that they saw S.W.C. at the scene of a crime wearing white pants and a blue jacket. Video footage confirmed that S.W.C. was wearing white pants and a blue jacket 15 minutes before the crime occurred. However, during a second interview, the eyewitness mistakenly remembers and reports that S.W.C. was also holding a black gun.

The different memory errors were: (1) misremembering the color of the jacket, (2) misremembering that the suspect wore a yellow hat, and (3) misremembering that the suspect was holding a black gun (see example scenario above). We selected these three types of memory errors based on their presumed severity in the context of legal proceedings. Specifically, we assumed that misremembering the color of a jacket would be viewed as the least problematic, as it involves altering an existing detail rather than introducing a new one. Misremembering a new but non-crime-related detail (e.g., a yellow hat) was expected to be seen as moderately problematic. Finally, misremembering a new and explicitly crime-related detail (e.g., a black gun) was assumed to be the most problematic.

For each scenario, participants received the following six questions: (1) “In your opinion, how many of these errors in an eyewitness account are sufficient to consider the eyewitness testimony unreliable?”, (2) “In your opinion, how many of these errors in an eyewitness account are sufficient to consider the eyewitness testimony inadmissible?”, (3) “In your opinion, how many of these errors in an eyewitness account would generally lead you to challenge the admissibility of the witness?”, (4) “In your opinion, how many of these errors in an eyewitness account would make a judge in your jurisdiction rule the eyewitness’s evidence inadmissible?”, (5) “In your opinion, how many of these errors in an eyewitness account would it take for you to consider retaining an expert witness to inform the factfinder about the functioning of human memory?”, (6) “In your opinion, how many of these errors in an eyewitness account would generally cause the average person to reduce the weight they place on the eyewitness’s evidence?”.⁴ Participants could answer each question via a slider ranging from 0.0 to 10.0 memory errors or indicate that they would never make a decision based on such errors.
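Because each answer is either a slider value or a categorical "never," the two response types need to be summarized separately. The Python sketch below shows one way to do so; representing "never" as `None` and the example responses are assumptions made for illustration, not the coding used in the OSF materials.

```python
from statistics import mean, median, stdev

def summarise(responses):
    """Summarize one question's answers.

    responses: slider values between 0.0 and 10.0, with None standing
    in (by ASSUMPTION) for a categorical "never" response.
    """
    numeric = [r for r in responses if r is not None]
    return {
        "n": len(numeric),                      # slider responses
        "never": len(responses) - len(numeric), # categorical responses
        "mean": round(mean(numeric), 2),
        "sd": round(stdev(numeric), 2),
        "median": round(median(numeric), 2),
    }

# Hypothetical answers from seven participants to a single question.
example = [1.0, 2.5, None, 3.0, 1.5, None, 6.0]
summary = summarise(example)
print(summary)
```

Keeping "never" out of the mean avoids having to assign it an arbitrary numeric value such as 10 or infinity, at the cost of reporting two numbers per question.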

We chose these questions because they address memory errors from multiple perspectives and explore their potential consequences in legal contexts. Specifically, the first question captures a general threshold at which legal professionals might begin to view eyewitness testimony as unreliable. While this does not reflect a formal legal decision, it provides insight into when skepticism about the testimony may arise, which could influence later legal actions. The second and third questions focus more directly on legal thresholds, assessing when legal professionals would consider the testimony inadmissible or take action to challenge its admissibility. The fourth question explores perceptions of judicial decision making, specifically when legal professionals believe a judge would deem the testimony inadmissible, which can influence when they themselves would take legal actions or decisions. The fifth question examines the point at which a legal professional would decide to retain an expert witness to educate the court about memory and thereby strengthen their claim. Finally, the sixth question assesses when legal professionals believe a layperson (e.g., a potential juror) would reduce the weight they assign to the eyewitness testimony, which could influence whether the legal professional would want to question the reliability of a testimony.

Procedure

The study was conducted online via Qualtrics and responses were completely anonymous. Participants were informed that they would receive three different legal criminal cases for which their professional opinion (i.e., how they would respond if they were the legal professional handling this case) was requested. Subsequently, they were presented with each legal case one by one in a completely random order and were asked to respond to the six questions. Afterwards, participants were debriefed and thanked for their participation.

Data analysis

The results are presented as follows. We first provide the results of the linear mixed model conducted in R using the lme4 package (Bates et al., 2015) to examine whether the overall number of memory errors permitted for the various legal decisions differed between the scenarios.⁵ Then, we provide the summary statistics, linear mixed models, and histograms for each scenario separately (see also OSF for more details such as trace and correlation plots, and interactive histograms https://osf.io/nt82u/). Variables x1–x6 are the six legal decisions for Scenario 1, x7–x12 are the six legal decisions for Scenario 2, and x13–x18 are the six legal decisions for Scenario 3.⁶

Results

Differences between scenarios

In total, 69 participants completed Study 1. For each scenario, there were six legal decisions, meaning that we had 414 responses in total for each scenario (69 × 6 = 414; see Table 2 for overall summary statistics).

Table 2.
Summary statistics for each scenario.

| Scenario | N | M | SD | Median | Never |
| --- | --- | --- | --- | --- | --- |
| Scenario 1 | 307 | 3.21 | 2.13 | 2.90 | 107 |
| Scenario 2 | 292 | 3.79 | 2.28 | 3.10 | 122 |
| Scenario 3 | 311 | 2.79 | 2.22 | 2.00 | 103 |

Note. A total of 414 responses were possible as there were 69 participants and 6 questions per scenario. “Never” responses were categorical responses indicating that participants would never base a legal decision on this kind of memory error.
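The counts in Table 2 can be checked against the 414 possible responses per scenario with a few lines of Python. The counts are taken from Table 2; the variable and function names are illustrative.

```python
PARTICIPANTS = 69
QUESTIONS_PER_SCENARIO = 6
TOTAL = PARTICIPANTS * QUESTIONS_PER_SCENARIO  # possible responses per scenario

# (numeric responses, "never" responses) as reported in Table 2.
table2 = {
    "Scenario 1": (307, 107),
    "Scenario 2": (292, 122),
    "Scenario 3": (311, 103),
}

def never_percentages(table, total):
    """Percentage of 'never' responses per scenario, after checking
    that numeric and 'never' counts add up to the total."""
    result = {}
    for name, (n_numeric, n_never) in table.items():
        assert n_numeric + n_never == total, name
        result[name] = round(100 * n_never / total)
    return result

percentages = never_percentages(table2, TOTAL)
print(TOTAL, percentages)
```

The resulting percentages match the proportions of "never" responses reported in the text for each scenario.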

A linear mixed model with the type of scenario as fixed effect and random slope and participants as random intercept (see rMarkdown file on OSF for model comparisons) indicated that legal professionals permitted more memory errors such as misremembering a yellow hat (Scenario 2; β = 3.75, SE = .24; see Table 3) before taking legal decisions compared with misremembering the color of the jacket (Scenario 1; β = 3.30, SE = .22) or a black gun (Scenario 3; β = 2.84, SE = .21). However, post-hoc pairwise comparisons using Tukey's honest significant difference (HSD) test showed that there was only a statistically significant difference in terms of memory errors between Scenario 2 and Scenario 3, t(61.0) = 4.36, p < .001, Cohen's d = .64, 95%CI [.34; .93], but not between Scenario 1 and Scenario 2, t(60.4) = 2.15, p = .09, Cohen's d = .31, 95%CI [.02; .61] or Scenario 1 and Scenario 3, t(63.1) = 1.97, p = .13, Cohen's d = .32, 95%CI [−.01; .65]. The intraclass correlation coefficient showed that 59% of the variance in how many memory errors are permitted by legal professionals before taking legal decisions is attributed to the individual legal professional.

Table 3.
Summary statistics linear mixed model examining memory error differences between scenarios.

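| Fixed effects | β | SE | df | t | p |
| --- | --- | --- | --- | --- | --- |
| Scenario 2 (intercept) | 3.75 | .24 | 63.9 | 15.65 | < .001 |
| Scenario 1 | −.45 | .21 | 59.1 | −2.18 | .034 |
| Scenario 3 | −.91 | .21 | 62.2 | −4.41 | < .001 |

| Random effects | Variance | SD |
| --- | --- | --- |
| Participants (intercept) | 3.15 | 1.77 |
| Scenario 1 | 1.60 | 1.27 |
| Scenario 3 | 1.65 | 1.28 |
| Residual | 2.05 | 1.43 |

| Statistic | Value |
| --- | --- |
| ICC | .59 |
| R² (Nakagawa, fixed) | .03 [.01, .05] |
| R² (Nakagawa, total) | .60 [.49, .66] |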

Scenario 1: Eyewitness misremembers the color of the jacket (i.e., misremembering a blue jacket as a green jacket)

For the first scenario, 307 (74%) out of the 414 responses specified a number of memory errors for the legal decisions and 107 were “never” responses (26% of decisions; i.e., participants indicated that they would never make such a legal decision based on the eyewitness error in question). Of the numeric responses, the overall average across all legal decisions was 3.21 (SD = 2.13, median = 2.90; see Table 2).⁷ Table 4 provides the summary statistics for each legal decision separately.

Table 4.
Summary statistics for each question for Scenario 1.

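| Legal decisions | N | Mean | SD | Median | Never |
| --- | --- | --- | --- | --- | --- |
| Unreliable (x1) | 59 | 2.92 | 1.96 | 2.80 | 10 |
| Inadmissible (x2) | 49 | 3.20 | 2.05 | 3.00 | 20 |
| Challenge (x3) | 50 | 3.22 | 2.05 | 2.95 | 19 |
| Judge (x4) | 50 | 3.80 | 2.59 | 3.00 | 19 |
| Expert (x5) | 44 | 2.89 | 1.80 | 2.50 | 25 |
| Average Person (x6) | 55 | 3.23 | 2.18 | 2.90 | 14 |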

A linear mixed model with the type of legal decision as fixed effect and participants as random intercept (see rMarkdown file on OSF for model comparisons) showed that legal professionals indicated that more memory errors were necessary to make a judge in their jurisdiction rule the eyewitness’s evidence inadmissible compared with other legal decisions, β = 4.01, SE = .27 (Judge [x4], see Table 5). However, post-hoc pairwise comparisons using Tukey's HSD test only found evidence for a statistically significant difference between the first legal decision (Unreliable [x1]; M = 2.95, SE = .28) and the fourth legal decision (Judge [x4]; M = 4.01, SE = .29), t(253) = −3.85, p = .002, Cohen's d = −.76, 95%CI [−1.15; −.36] and between the fifth legal decision (Expert [x5]; M = 3.03, SE = .30) and the fourth legal decision (Judge [x4]), t(260) = 3.22, p = .018, Cohen's d = .70, 95%CI [.27; 1.13] (see OSF for all pairwise comparisons). The intraclass correlation coefficient indicated that 56% of the variance in the number of memory errors permitted for the various legal decisions can be attributed to the individual legal professional. Overall, this suggests that legal professionals had a certain threshold of memory errors they adhered to for the various legal decisions, which only slightly changed for the legal decision of the judge ruling the eyewitness’s evidence inadmissible.

Table 5.
Summary statistics linear mixed model Scenario 1.

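| Fixed effects | β | SE | df | t | p |
| --- | --- | --- | --- | --- | --- |
| Unreliable (x1), intercept | 2.95 | .27 | 137 | 10.74 | < .001 |
| Inadmissible (x2) | .38 | .28 | 247 | 1.38 | .17 |
| Challenge (x3) | .41 | .27 | 247 | 1.50 | .14 |
| Judge (x4) | 1.06 | .27 | 247 | 3.89 | < .001 |
| Expert (x5) | .08 | .29 | 252 | .29 | .77 |
| Average Person (x6) | .34 | .27 | 246 | 1.27 | .21 |

| Random effects | Variance | SD |
| --- | --- | --- |
| Participants (intercept) | 2.54 | 1.60 |
| Residual | 1.98 | 1.41 |

| Statistic | Value |
| --- | --- |
| ICC | .56 |
| R² (Nakagawa, fixed) | .03 [.01, .06] |
| R² (Nakagawa, total) | .57 [.45, .70] |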
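For a model with only a random intercept, the intraclass correlation coefficient is the share of the random variance that lies between participants. The Python sketch below applies that standard formula to the variance components in Table 5. It is an illustration, not the authors' R code, and the function name is an assumption. The same shortcut does not apply directly to the between-scenario model in Table 3, which also includes random slopes.

```python
def icc(var_intercept, var_residual):
    """ICC for a random-intercept model: between-participant variance
    divided by the total (between-participant plus residual) variance."""
    return var_intercept / (var_intercept + var_residual)

# Variance components reported in Table 5 (Scenario 1).
icc_scenario1 = icc(var_intercept=2.54, var_residual=1.98)
print(round(icc_scenario1, 2))
```

The rounded result agrees with the ICC reported in Table 5, which is why about 56% of the variance is attributed to the individual legal professional.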

Lastly, Figure 1 displays the views of legal professionals regarding the number of such memory errors permitted before taking certain legal decisions. As can be seen in Figure 1, the majority of participants allowed 1–3 of such memory errors before they would make certain legal decisions or take any legal actions. However, a substantial number of participants indicated that they would never make any legal decision based on these types of memory errors.

Figure 1.
Histogram visualizing the agreement of legal professionals for Scenario 1.

Scenario 2: Eyewitness misremembers entirely new detail (i.e., yellow hat)

For Scenario 2, 292 (71%) out of 414 responses specified a number of memory errors and 122 were “never” responses (29% of decisions). The overall mean for Scenario 2 was 3.79 (SD = 2.28, median = 3.10; see Table 2). The summary statistics for each legal decision are provided in Table 6.

Table 6.
Summary statistics for each question for Scenario 2.

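| Legal decisions | N | Mean | SD | Median | Never |
| --- | --- | --- | --- | --- | --- |
| Unreliable (x7) | 57 | 3.59 | 2.17 | 3.10 | 12 |
| Inadmissible (x8) | 46 | 3.57 | 2.31 | 3.00 | 23 |
| Challenge (x9) | 47 | 3.92 | 2.27 | 3.90 | 22 |
| Judge (x10) | 46 | 4.56 | 2.71 | 4.00 | 23 |
| Expert (x11) | 42 | 3.41 | 2.04 | 3.00 | 27 |
| Average Person (x12) | 54 | 3.73 | 2.11 | 3.05 | 15 |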

We conducted a linear mixed model with the type of legal decision as fixed effect and participants as random intercept (see rMarkdown file on OSF for model comparisons), which suggested that legal professionals are of the opinion that more memory errors are necessary to make a judge in their jurisdiction rule the eyewitness’s evidence inadmissible compared with other legal decisions, β = 4.63, SE = .27 (Judge [x10], see Table 7). Post-hoc pairwise comparisons using Tukey's HSD test indicated that there were statistically significant differences between the first legal decision (Unreliable [x7]; M = 3.63, SE = .29) and the fourth legal decision (Judge [x10]; M = 4.63, SE = .31), t(240) = −3.60, p = .005, Cohen's d = −.73, 95%CI [−1.14; −.33], between the fifth legal decision (Expert [x11]; M = 3.30, SE = .32) and the fourth legal decision (Judge [x10]), t(244) = 4.36, p < .001, Cohen's d = .98, 95%CI [.53; 1.42], and between the sixth legal decision (Average Person [x12]; M = 3.72, SE = .30) and the fourth legal decision (Judge [x10]), t(241) = 3.25, p = .017, Cohen's d = .67, 95%CI [.26; 1.08] (see OSF for all pairwise comparisons). Moreover, the intraclass correlation coefficient showed that 63% of the variance in the amount of memory errors permitted for the various legal decisions was attributed to the individual legal professional. This indicates again that legal professionals had a certain threshold of memory errors they adhered to for the various legal decisions or actions, which mainly differed for the question about whether a judge would rule the eyewitness’s evidence inadmissible.

Table 7.
Summary statistics linear mixed model Scenario 2.

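| Fixed effects | β | SE | df | t | p |
| --- | --- | --- | --- | --- | --- |
| Unreliable (x7), intercept | 3.63 | .29 | 119 | 12.45 | < .001 |
| Inadmissible (x8) | .18 | .27 | 236 | .66 | .51 |
| Challenge (x9) | .51 | .27 | 237 | 1.86 | .06 |
| Judge (x10) | 1.00 | .27 | 236 | 3.64 | < .001 |
| Expert (x11) | −.33 | .28 | 236 | −1.18 | .24 |
| Average Person (x12) | .09 | .26 | 236 | .33 | .74 |

| Random effects | Variance | SD |
| --- | --- | --- |
| Participants (intercept) | 3.09 | 1.76 |
| Residual | 1.86 | 1.36 |

| Statistic | Value |
| --- | --- |
| ICC | .63 |
| R² (Nakagawa, fixed) | .03 [.02, .07] |
| R² (Nakagawa, total) | .64 [.55, .74] |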

We also provide a histogram for the different legal decisions for the second scenario (see Figure 2). For Scenario 2, the histogram shows that most legal professionals permitted 1–5 of such memory errors before making any legal decision or action. Again, many participants also indicated that they would never take any legal decision or action based on eyewitness memory errors.

Figure 2.
Histogram visualizing the agreement of legal professionals for Scenario 2.

Scenario 3: Eyewitness misremembers entirely new incriminating detail (i.e., black gun)

For the third scenario, 311 (75%) out of the 414 responses specified a number of memory errors and 103 were “never” responses (25% of decisions; see Table 2). The overall average for this scenario was 2.79 (SD = 2.22, median = 2.00). We also provide the summary statistics for each legal decision in Table 8.

Table 8.
Summary statistics for each question for Scenario 3.

Legal decisions N Mean SD Median Never

Unreliable (x13) 60 2.56 2.06 1.95 9

Inadmissible (x14) 47 2.44 2.06 1.80 22

Challenge (x15) 43 2.42 1.97 1.90 26

Judge (x16) 50 3.48 2.71 2.75 19

Expert (x17) 52 2.96 2.28 2.00 17

Average Person (x18) 59 2.85 2.10 2.00 10
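The overall figures reported for Scenario 3 can be reconciled with Table 8 by summing the counts and weighting each decision's mean by its N. The sketch below is only a consistency check on the rounded table values; the names are illustrative and it is not the authors' analysis code.

```python
# (numeric N, mean, "never" count) per legal decision, copied from Table 8.
table8 = {
    "Unreliable (x13)": (60, 2.56, 9),
    "Inadmissible (x14)": (47, 2.44, 22),
    "Challenge (x15)": (43, 2.42, 26),
    "Judge (x16)": (50, 3.48, 19),
    "Expert (x17)": (52, 2.96, 17),
    "Average Person (x18)": (59, 2.85, 10),
}

n_numeric = sum(n for n, _, _ in table8.values())
n_never = sum(never for _, _, never in table8.values())

# Weighted mean of the per-decision means, using N as the weight.
weighted_mean = sum(n * mean for n, mean, _ in table8.values()) / n_numeric

# Share of all responses that were "never".
never_share = n_never / (n_numeric + n_never)

print(n_numeric, n_never)
print(f"weighted mean = {weighted_mean:.2f}")
print(f"never share = {never_share:.0%}")
```

The counts sum to 311 numeric and 103 “never” responses (414 in total), the weighted mean rounds to 2.79, and the “never” share rounds to 25%, in line with the text.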

We ran a linear mixed model with type of legal decision as fixed effect and participants as random intercept (see rMarkdown file on OSF for model comparisons) and found that legal professionals indicate that more memory errors are needed to have a judge in their jurisdiction rule the eyewitness’s evidence inadmissible compared with other legal decisions, β = 3.78, SE = .26 (Judge [x16], see Table 9). Post-hoc pairwise comparisons using Tukey's HSD test showed that statistically significantly more memory errors were permitted before a judge would rule eyewitness evidence inadmissible compared with all the other legal decisions. The intraclass correlation coefficient indicated that 64% of the variance in memory errors permitted for the various legal decisions can be attributed to the individual legal professional. This suggests that the legal professionals had a certain threshold in terms of memory errors they permitted before making legal decisions and this mainly deviated when they were asked about whether judges would judge the eyewitness’s evidence inadmissible or when they would indicate never.

Table 9.
Summary statistics linear mixed model Scenario 3.

Fixed effects β SE df t p

Unreliable (x13)—Intercept 2.59 .28 120 9.12 < .001

Inadmissible (x14) .11 .27 250 .40 .69

Challenge (x15) .19 .27 251 .71 .48

Judge (x16) 1.18 .26 249 4.56 < .001

Expert (x17) .36 .26 250 1.41 .16

Average Person (x18) .33 .25 248 1.35 .18

Random effects Variance SD

Participants—intercept 3.18 1.78

Residual 1.79 1.34

ICC .64

R²—Nakagawa (fixed) .03 [.01, .07]

R²—Nakagawa (total) .65 [.54, .76]
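For a random-intercept model, the intraclass correlation coefficient is conventionally the between-participant variance divided by the sum of that variance and the residual variance. The sketch below assumes this definition and uses the rounded variance components from Table 9; the function name is illustrative.

```python
def icc(var_between, var_residual):
    """Share of random variance attributable to participants."""
    return var_between / (var_between + var_residual)


# Variance components copied from Table 9 (Study 1, Scenario 3).
icc_scenario3 = icc(var_between=3.18, var_residual=1.79)
print(f"ICC = {icc_scenario3:.2f}")
```

This reproduces the reported ICC of .64. Because the inputs are rounded to two decimals, the same calculation on other tables can differ from the reported value in the last digit.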

Agreement among legal professionals for the third scenario was visualized in Figure 3. For Scenario 3, many legal professionals indicated that one such memory error is sufficient to take various legal decisions or actions, and the majority indicated 1–3 memory errors. However, many legal professionals would never make any legal decisions or actions even when eyewitnesses would misremember an incriminating detail like a black gun.

Figure 3.
Histogram visualizing the agreement of legal professionals for Scenario 3.

Discussion

In Study 1, we examined how many memory errors eyewitness testimony may contain before legal professionals in Belgium and the Netherlands would take certain legal decisions or actions regarding this testimony. Across all three scenarios, most legal professionals deemed 1–5 memory errors sufficient to affect legal decisions (e.g., deem testimony unreliable). The misremembering of an entirely new detail (i.e., the yellow hat) was deemed least problematic by the legal professionals. Although we explicitly stated that the eyewitness made a memory error, the legal professionals might have interpreted the situation differently, perhaps assuming that the perpetrator could have put on the hat at a later time, thereby not necessarily indicating a true memory error. When only considering the misremembering of the color of the jacket (Scenario 1), it seems that 1–3 memory errors were allowed before participants would take certain legal decisions. Interestingly, for the misremembering of the black gun, many participants indicated that one such memory error was already sufficient to take legal decisions and actions. This finding suggests that the type of memory error and the detail it concerned matter greatly for legal professionals when evaluating eyewitness testimony. Specifically, the misremembering of incriminating details, such as the presence of a black gun, was regarded as the most problematic memory error. However, we found no statistically significant differences in the number of memory errors permitted between Scenario 1 (i.e., misremembering the color of the jacket) and Scenario 3 (i.e., misremembering a black gun). It is possible that the legal professionals perceived the former memory error as a direct contradiction (Smeets et al., 2004) and therefore deemed it as problematic as misremembering an incriminating detail. This supports the notion that legal professionals tend to rely heavily upon contradictions to assess the validity of eyewitness testimony (Fisher et al., 2009).

When examining the variability among responses for each participant, it seemed that each participant had a certain threshold they adhered to. For instance, if they responded that two memory errors would lead them to view the statement as unreliable, they reported similar amounts of memory errors for the other legal decisions (e.g., retain an expert witness to inform the factfinder about the functioning of human memory). This result implies that whether or not certain legal decisions are made might depend on the individual legal professional and their personal threshold for how many mistakes they allow before making certain decisions. Put differently, legal professionals did not tend to apply different thresholds depending on the type of legal decision, but rather used a singular threshold across decisions. Legal professionals only deviated from this threshold because they thought more memory errors were allowed before a judge would rule the eyewitness’s evidence inadmissible compared with the other legal decisions. Further, a substantial number of participants expressed that they would never take a certain legal decision based on the presented types of memory errors. Specifically, deviations from the legal professionals’ personal threshold mainly arose if they responded that they would never take a legal decision or action based on such memory errors. One possible explanation is that legal professionals needed more information before taking certain legal decisions.

The findings of Study 1 suggest that the threshold legal professionals apply before they take certain legal decisions differs based on the type of memory error. However, the Study 1 sample consisted mainly of defense lawyers, which limits our conclusions. Hence, in Study 2, we recruited prosecution lawyers and judges in the Netherlands who attended a training on legal psychology to examine when they would take certain legal decisions or actions after encountering eyewitness memory errors.

Study 2

Method

Participants

We recruited 28 legal professionals who attended a training on legal psychology given by the second author. Of these participants, 12 were male and 16 were female, and the average age was 38.6 years (SD = 8.2; range 26–57). Seventeen worked for the district attorney's office (OM; Standing Magistrate), 10 worked as a judge (ZM; Sitting Magistrate), and 1 worked for the Ministry of Justice and Police. On average, they had 9.5 years (SD = 6.4; range 1–21) of work experience. The study was conducted in person before the training started. Participation in the study was completely voluntary.

Design and materials

The design and materials were the same as in Study 1 except for two changes. First, we only presented Scenarios 2 and 3 in Study 2 in order not to burden the participants, as they were also participating in a training. Second, participants completed the study on paper. This means that the scenarios were presented in fixed order with Scenario 2 presented first. Moreover, the questions were presented in fixed order (in the order presented in the “Design and materials” section of Study 1). Because participants completed the study on paper, some added explanations with their answers. The provided explanations were analyzed descriptively.

Procedure

The study was conducted in person and participants were asked to give their informed consent before starting. Participants were informed that they would receive two different legal criminal cases for which their professional opinion (i.e., how they would respond if they were the legal professional handling this case) was requested. Subsequently, they were first presented with Scenario 2 and then 3 and responded to the six questions for each scenario. Afterwards, they were debriefed and thanked for their participation.

Results

We present the results of the entire sample together (see summary statistics of prosecution lawyers and judges separately at https://osf.io/nt82u/). We conducted the same analyses as in Study 1. Supplementary materials, data, code, and rMarkdown files for the data analysis with interactive plots and additional analyses can be found at https://osf.io/nt82u/.

Differences between scenarios

In total there were 28 participants. For each scenario there were six questions, meaning that there were 168 responses in total (28 * 6 = 168; see Table 10).

Table 10.
Summary statistics for each scenario.

Scenario N Mean SD Median Never No Response

Scenario 2 55 4.33 2.59 3.00 47 66

Scenario 3 71 2.32 1.71 2.00 39 58

Note. A total of 168 responses were possible as there were 28 participants and 6 questions per scenario.

A linear mixed model with the type of scenario as fixed effect and random slope and participants as random intercept (see rMarkdown file on OSF for model comparisons) indicated that legal professionals permitted more memory errors such as misremembering a yellow hat (Scenario 2; β = 4.09, SE = .56; see Table 11a) before taking legal actions or decisions compared with misremembering a black gun (Scenario 3; β = 2.26, SE = .44), t(15) = −4.20, p < .001. The intraclass correlation coefficient showed that 65% of the variance in how many memory errors are permitted by legal professionals before taking legal actions or decisions is attributed to the individual legal professional.

Table 11a.
Summary statistics linear mixed model examining memory error differences between scenarios.

Fixed effects β SE df t p

Scenario 2—intercept 4.09 .56 18 7.32 < .001

Scenario 3 −1.83 .44 15 −4.20 < .001

Random effects Variance SD

Participants—intercept 4.83 2.20

Scenario 3 2.11 1.45

Residual 1.65 1.28

ICC .65

R²—Nakagawa (fixed) .15 [.03, .31]

R²—Nakagawa (total) .70 [.55, .81]

Scenario 2: Eyewitness misremembers entirely new detail (i.e., yellow hat)

For Scenario 2, out of the 168 responses, 55 responses were a number of memory errors (33% of decisions), 47 “never” responses (28% of decisions), and 66 no responses (39% of decisions; e.g., participant left the question open or provided a rationale; see Table 10). For the 55 responses in terms of memory errors, the overall average was 4.33 (SD = 2.59; median = 3.00). The summary statistics for each legal decision separately are presented in Table 11b.

Table 11b.
Summary statistics for each question for Scenario 2.

Legal decisions N Mean SD Median Never No Response

Unreliable (x7) 10 4.00 2.36 3.00 6 12

Inadmissible (x8) 8 4.88 3.27 3.00 10 10

Challenge (x9) 7 4.29 2.69 3.00 12 9

Judge (x10) 8 4.62 1.60 4.50 9 11

Expert (x11) 7 5.86 3.02 5.00 6 15

Average Person (x12) 15 3.40 2.50 3.00 4 9
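The counts in Table 11b can be checked against the Scenario 2 totals in Table 10: every question was put to all 28 participants, so each row should sum to 28 and the columns should sum to the reported totals. The sketch below performs that bookkeeping on the table values; the names are illustrative.

```python
PARTICIPANTS = 28

# (numeric N, "never", no response) per legal decision, copied from Table 11b.
table11b = {
    "Unreliable (x7)": (10, 6, 12),
    "Inadmissible (x8)": (8, 10, 10),
    "Challenge (x9)": (7, 12, 9),
    "Judge (x10)": (8, 9, 11),
    "Expert (x11)": (7, 6, 15),
    "Average Person (x12)": (15, 4, 9),
}

# Each row should account for every participant exactly once.
rows_complete = all(sum(counts) == PARTICIPANTS for counts in table11b.values())

# Column totals: numeric, "never", and no-response counts across questions.
column_totals = [sum(column) for column in zip(*table11b.values())]

possible_responses = PARTICIPANTS * len(table11b)
shares = [round(100 * total / possible_responses) for total in column_totals]

print(rows_complete, possible_responses)
print(column_totals, shares)
```

The rows are complete, the column totals are 55, 47, and 66 out of 168 possible responses, and the corresponding shares round to 33%, 28%, and 39%.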

A linear mixed model with type of legal decision as fixed effect and participants as random intercept (see rMarkdown file on OSF for model comparisons) suggested that more memory errors were needed before an expert witness would be consulted compared with other legal decisions, β = 5.75, SE = .69 (Expert [x11], see Table 12). However, post-hoc pairwise comparisons using Tukey's HSD test only found evidence for a statistically significant difference between the fifth legal decision (Expert [x11]; M = 5.75, SE = .79) and the sixth legal decision (Average Person [x12]; M = 3.20, SE = .65), t(46) = 3.77, p = .006, Cohen's d = 2.02, 95%CI [.86; 3.18] (see OSF for all pairwise comparisons). The intraclass correlation coefficient showed that 75% of the variance in the memory errors permitted for the various legal decisions was attributed to the individual legal professional. Hence, as observed in Study 1, this suggests that legal professionals had a certain threshold of memory errors they adhered to for the various legal decisions or actions.

Table 12.
Summary statistics linear mixed model Scenario 2.

Fixed effects β SE df t p

Unreliable (x7)—Intercept 4.04 .69 33 5.84 < .001

Inadmissible (x8) .90 .61 38 1.48 .15

Challenge (x9) .89 .64 38 1.40 .17

Judge (x10) .38 .62 39 .62 .54

Expert (x11) 1.71 .69 40 2.49 .02

Average Person (x12) −.84 .57 41 −1.48 .15

Random effects Variance SD

Participants—intercept 4.76 2.18

Residual 1.59 1.26

ICC .75

R²—Nakagawa (fixed) .10 [.06, .25]

R²—Nakagawa (total) .78 [.58, .89]

To examine the consensus among the participants, we provided histograms of the six legal decisions together in Figure 4. The histogram shows that most participants would never make a legal decision based on these types of memory errors or they provided an explanation for why they did not provide a response (see section below about rationales). However, when participants indicated an amount of memory errors, it seems that they allowed 2–5 memory errors before taking a legal decision or action.

Figure 4.
Histogram visualizing the agreement of legal professionals for Scenario 2.

Scenario 3: Eyewitness misremembers entirely new incriminating detail (i.e., black gun)

Of the 168 possible responses, 71 were an amount of memory errors (42% of decisions), 39 “never” responses (23% of decisions), and 58 no responses (35% of decisions; see Table 10). The overall average for the 71 responses in terms of memory errors was 2.32 (SD = 1.71; median = 2.00). The summary statistics for each legal decision are presented in Table 13.

Table 13.
Summary statistics for each question for Scenario 3.

Legal decisions N Mean SD Median Never No Response

Unreliable (x13) 13 2.00 1.41 2.00 5 10

Inadmissible (x14) 11 2.64 2.11 2.00 9 8

Challenge (x15) 9 1.89 0.78 2.00 10 9

Judge (x16) 13 2.46 1.56 2.00 6 9

Expert (x17) 9 3.33 2.83 3.00 6 13

Average Person (x18) 16 1.94 1.24 1.50 3 9

Note. A total of 168 responses were possible as there were 28 participants and 6 questions per scenario.

As in Scenario 2, we ran a linear mixed model with type of legal decision as fixed effect and participants as random intercept (see rMarkdown file on OSF for model comparisons), which indicated that more memory errors were needed before an expert witness would be consulted by the legal professionals compared with other legal decisions, β = 3.26, SE = .42, t(55) = 3.03, p = .004 (Expert [x17], see Table 14). However, post-hoc pairwise comparisons using Tukey's HSD test only found evidence for a statistically significant difference between the fifth legal decision (Expert [x17]; M = 3.26, SE = .43) and the sixth legal decision (Average Person [x18]; M = 1.78, SE = .41), t(61) = 3.43, p = .013, Cohen's d = 1.63, 95%CI [.64; 2.62] (see OSF for all pairwise comparisons). The intraclass correlation coefficient showed that 69% of the variance in memory errors permitted can be attributed to the individual legal professional. This again highlights that legal professionals had a certain threshold of memory errors they adhered to for the various legal decisions or actions.

Table 14.
Summary statistics linear mixed model Scenario 3.

Fixed effects β SE df t p

Unreliable (x13)—Intercept 1.98 .42 39 4.75 < .001

Inadmissible (x14) .66 .38 53 1.74 .09

Challenge (x15) .56 .40 53 1.38 .17

Judge (x16) .45 .36 53 1.24 .22

Expert (x17) 1.29 .42 55 3.03 .004

Average Person (x18) −.20 .36 56 −.54 .59

Random effects Variance SD

Participants—intercept 1.88 1.37

Residual .83 .91

ICC .69

R²—Nakagawa (fixed) .08 [.03, .21]

R²—Nakagawa (total) .72 [.58, .85]

The consensus among the participants is visualized in Figure 5. The histogram shows that the majority of participants permitted 1–3 of such memory errors before taking any legal decision or action. However, a substantial number of participants indicated that they would never make a decision based on such memory errors or provided an explanation (see section below about rationales).

Figure 5.
Histogram visualizing the agreement of legal professionals for Scenario 3.

Rationales for no response

Out of 28 participants, 18 provided rationales for when they did not provide a numerical or “never” response. A total of 33 different rationales were given (some participants gave multiple rationales). The rationales were thematically coded by the first author and presented in Table 15 (see Excel file on https://osf.io/nt82u/ for all rationales).

Table 15.
Rationales for non-response categorized.

Rationale N %

Depends on other factors for the errors (e.g., type, amount, circumstance, relevance, importance) 9 27.3

Depends on other factors concerning the statement (e.g., length, importance, delay, and events between statements) 8 24.2

Depends on the crime 4 12.1

Depends on other evidence (not specified) 3 9.1

Depends on consistency 2 6.1

Depends on whether witness clearly lies 2 6.1

Other 5 15.2
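The percentages in Table 15 are each category's share of the 33 coded rationales. As a small sketch with shortened, illustrative category labels, they can be recomputed from the counts as follows.

```python
# Counts copied from Table 15; the labels are shortened for readability.
rationale_counts = {
    "error factors": 9,
    "statement factors": 8,
    "crime": 4,
    "other evidence": 3,
    "consistency": 2,
    "witness clearly lies": 2,
    "other": 5,
}

total = sum(rationale_counts.values())

# Percentage of all rationales per category, rounded to one decimal.
percentages = {
    label: round(100 * count / total, 1)
    for label, count in rationale_counts.items()
}

print(total)
for label, pct in percentages.items():
    print(f"{label}: {pct}")
```

The counts sum to 33, and the largest category (factors surrounding the errors) accounts for 27.3% of the rationales.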

Discussion

In Study 2, most participants either indicated that they would never draw strong conclusions based on these types of memory errors or did not provide a response. Among those who did provide a numerical response, the pattern of results was similar to that in Study 1. Specifically, most participants indicated that 1–5 memory errors for the second scenario (i.e., misremembering a yellow hat) and roughly 1–3 memory errors for the third scenario (i.e., misremembering a black gun) were necessary before they would take any legal decisions or actions. This finding highlights that the type of detail that is misremembered is important for legal professionals. This was further supported by the rationales of participants who did not give a numerical response (i.e., number of memory errors) or a “never” response. That is, approximately 27% (9/33) of the rationales indicated that the professionals’ legal decisions would depend on factors surrounding the error, such as its type or importance. However, it has been argued that it is difficult to decide beforehand which details are of importance for a specific case (Otgaar et al., 2022b). Take, for instance, the case of Anna Lindh, in which eyewitnesses misremembered the color of the perpetrator's jacket. This error led to delays in the investigation and initially set the police on the wrong track, an outcome that could have had serious consequences if pursued further (e.g., wrongful conviction or the perpetrator remaining at large). What seemed like an unimportant detail at first later proved to be a crucial piece of information in the investigation (Granhag et al., 2013). While a single critical memory error could, in principle, undermine the probative value of a testimony, legal professionals must often make judgments under conditions of uncertainty, where the importance of individual details is not known a priori. It is also important to note that not all errors are necessarily dispositive; assuming that any error renders testimony fundamentally flawed may be unwarranted. This may explain why legal professionals in our study tolerated more than one memory error, even when the errors concerned criminally related details (e.g., a black gun). Hence, although the type of detail and its importance play a crucial role in determining the validity of a statement, it is hard to determine what such a detail looks like without knowing the ground truth.

Although the pattern was not as robust as in Study 1, legal professionals in Study 2 also tended to specify a particular range of memory errors they would permit before they would consider any legal decision or action. That is, they expressed a consistent number of memory errors they would tolerate before taking any of the legal decisions or actions, wherein the type of legal decision did not alter the number greatly. In other words, rather than treating each legal decision as requiring a different threshold, our results suggest that many professionals rely on a single threshold for different kinds of decisions. Deviations from this range arose mainly when participants indicated that they would never take a legal decision or when they indicated that it depended on other information (i.e., provided an explanation). This implies that the threshold for when a legal professional takes a certain legal decision or action depends heavily on the individual legal professional and their personal threshold. Nevertheless, 1–3 memory errors concerning incriminating details (i.e., misremembering a black gun) seem to be considered problematic by legal professionals and can already have practical legal consequences.

General discussion

Requests for legal psychologists to provide expert witness testimony prompt the question of whether the present state of legal psychological research can be used to provide practical recommendations in the courtroom. A crucial aspect that needs to be examined when scientific evidence enters the courtroom is whether this evidence is practically relevant, besides being replicable and generalizable (Otgaar et al., 2022a; Riesthuis and Otgaar, 2024a; Tullett, 2022). One way to assess practical relevance is by establishing the SESOI. We approached this issue from a practical perspective by engaging stakeholders as has been done and recommended in medical research (e.g., Bonini et al., 2020; Byrne, 2019; Lemay et al., 2019; van der Heijde, 2001). Specifically, we examined when legal professionals would take certain legal decisions or actions due to eyewitness memory errors to estimate possible SESOIs for eyewitness memory research.

Across the two studies, we did not find a consensus among legal professionals for an exact number of memory errors that would lead them to take legal decisions or actions. One reason for the lack of consensus is that the number of memory errors permitted before taking legal decisions depended on the type of memory error that the witness made in the scenario. Specifically, for the contradiction of a previous detail (i.e., misremembering the color of a jacket) or the misremembering of an entirely new incriminating detail (i.e., black gun), 1–3 such memory errors were enough for the majority of legal professionals to take legal decisions or actions. This result suggests that legal professionals differentiate between the types of memory errors and the type of details such errors concern when examining the validity of a testimony. This was further supported by the rationales participants provided in Study 2, wherein several participants indicated that their responses depended on the importance and/or relevance of a detail and/or statement. This is interesting because legal professionals, for example in the United States, tend to scrutinize any type of inconsistency or contradiction in witnesses’ statements to question their validity (Fisher et al., 2009).

The finding that legal professionals permitted 1–3 memory errors such as misremembering incriminating details (e.g., a black gun) before they would take legal actions is roughly in line with a recent proposal for the SESOI for eyewitness memory research (Otgaar et al., 2022b, 2023). That is, Otgaar et al. (2022b) conducted a cost-benefit analysis and concluded that one memory error (in terms of a raw mean difference) could already be the SESOI for eyewitness memory research, in particular for the field of alcohol and memory. They argued that in several legal cases the misremembering of one detail yielded negative consequences in the legal process (e.g., see the case of Anna Lindh, Granhag et al., 2013; see the case of Ronald Cotton, Thompson-Cannino et al., 2009). In our study, a nontrivial number of legal professionals indicated they would permit one memory error before they would start certain legal actions, but this depended heavily on the type of memory error. Hence, a possible SESOI for eyewitness memory research could already be one detail forgotten (or not reported) or misremembered (or mistakenly reported) in terms of raw mean difference (Otgaar et al., 2022b, 2023). Importantly, this does not mean that eyewitness testimony should not be used if one memory error is made, because this would mean that many cases could not go to court. Specifically, even though eyewitness memory is prone to errors under suggestive or misleading conditions, it is heavily relied upon by the justice system because it is frequently the only available evidence (Howe and Knott, 2015). However, when a factor produces a difference of this size, for example when intoxicated eyewitnesses make, on average, one more memory error than sober eyewitnesses, this could be seen as a practically relevant memory-undermining effect and could serve as the SESOI for eyewitness memory studies (Otgaar et al., 2022b).

It is important to note that this SESOI is not fixed and might be better interpreted as a threshold. That is, certain manipulations such as asking only one suggestive question might lead to a lower effect size (raw mean difference of .5 details misremembered) than the SESOI of one detail. This would imply that the effect would not be considered practically meaningful. However, it could be that when these suggestive questions are repeated, or more suggestive questions are asked, the effect might surpass the threshold of one more detail misremembered and become practically relevant. There are various mechanisms that may increase or counteract effect sizes (for a comprehensive list see Anvari et al., 2023). For instance, the repetition of questions or the idea that misinformation can be spread to many people via social media (i.e., scaling up) can explain why an effect size smaller than the threshold is still practically relevant. It is also possible that effects diminish when participants, for example, habituate to the stimulus. Expert witnesses should contextualize, explain, and justify why their reported findings remain practically relevant when they are smaller than the justified threshold, or why effects greater than the threshold will not diminish due to, for example, habituation. Our present findings can guide the difficult task of estimating a contextualized SESOI or threshold, but we urge that discussions about the SESOI for eyewitness memory research continue so that consensus can be reached among legal professionals and psychologists and suitable thresholds and SESOIs can be defined.
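The threshold reading of the SESOI can be made concrete with a small sketch. It compares a raw mean difference with a SESOI of one detail and then, purely for illustration, assumes that each repeated suggestive question adds the same raw effect. That additive accumulation rule and the function names are hypothetical assumptions introduced here and are not taken from the studies; real effects may grow more slowly or diminish (e.g., through habituation).

```python
SESOI = 1.0  # one detail, expressed as a raw mean difference


def exceeds_sesoi(raw_mean_difference, sesoi=SESOI):
    """Return True when the effect is at least as large as the SESOI."""
    return abs(raw_mean_difference) >= sesoi


def cumulative_effect(per_question_effect, n_questions):
    """Hypothetical accumulation: assumes effects add up linearly."""
    return per_question_effect * n_questions


# Example value from the text: one suggestive question, .5 details.
single_question = 0.5
print(exceeds_sesoi(single_question))

# Illustrative assumption only: three questions, each adding .5 details.
repeated = cumulative_effect(single_question, n_questions=3)
print(repeated, exceeds_sesoi(repeated))
```

Under these assumptions a single question stays below the threshold while three repetitions exceed it, which is the kind of contextual argument an expert witness would need to justify rather than assume.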

Besides the type of memory error and detail, occupation also seemed to influence the threshold of the legal professionals. Specifically, prosecution lawyers and judges (Study 2) tended to permit more memory errors or would never take a legal decision or action when it concerned the misremembering of a nonincriminating detail (i.e., yellow hat) but were stricter when the memory error concerned an incriminating detail (i.e., black gun) than legal professionals in Study 1 (mostly defense lawyers). Moreover, defense lawyers tended to think that judges are reluctant to rule eyewitness evidence inadmissible (Study 1), while prosecutors and judges were more hesitant about retaining an expert witness (Study 2). However, it is important to note that the sample size in Study 2 was rather small. Future research should investigate whether these differences between occupations exist. Such differences could shed light on whether different justice positions within such a system alter the threshold for legal psychological evidence to enter the courtroom.

Another explanation for why no general consensus was found is that a nontrivial number of legal professionals in both studies indicated that they would never make any legal decisions based on the presented memory errors. One possibility is that the legal professionals did not want to disregard an entire statement even when memory errors are present, given the typically limited amount of alternative evidence available in legal proceedings (e.g., Brainerd and Reyna, 2005). It is also possible that some legal professionals indicated “never” because they held the misconception that memory works like a video recorder and therefore did not deem it plausible that such memory errors could be included in an eyewitness’s statement (e.g., Simons and Chabris, 2011). Alternatively, as some legal professionals indicated in Study 2, they might have needed more information before they would take certain legal decisions. That is, some participants indicated that in order to decide how many memory errors they would allow for a certain decision, they needed more information regarding the length of statements and the delay between statements. This is related to the idea that the estimate of the SESOI depends on the context (Riesthuis, 2024). Future research could examine how the length of statements and the delay between statements may alter the number of memory errors permitted before legal decisions are taken by legal professionals.

In contrast to what we expected, many judges and prosecutors (Study 2) were more reluctant to consult a memory expert to further examine the statements’ validity after different types of memory errors occurred compared with the other legal decisions or actions. We expected that legal professionals would, at the very least, be willing to examine the validity of a statement after any of such memory errors has occurred to ensure that the eyewitness testimony can still be used reliably in court. Some legal professionals in Study 2 highlighted that consulting an expert can always be done (i.e., irrespective of the number of mistakes) but did not specify when they would do so. It is possible that legal professionals generally examine the validity of memory-based testimonies themselves and only consult expert witnesses in extreme cases which may be problematic considering that faulty eyewitness memory has played a major role in wrongful convictions (The National Registry of Exonerations, 2025).

Limitations and considerations

One limitation is that in the second study, participants completed the questionnaire on paper and the scenarios and questions were not randomized. This means that there could be an order effect (Schuman and Presser, 1996). That is, it is possible that the question about whether legal professionals would consult an expert witness would have been answered differently if it had been presented first rather than last. In Study 1, all scenarios and questions were fully randomized, which minimizes the risk of order effects. However, using pseudo-randomization to ensure a balanced presentation order across participants would have been an even more rigorous approach. Interestingly, the results of Study 1 and Study 2 were rather similar, suggesting that any order effects might be minimal.

Another limitation is that because Study 2 was completed on paper, some participants did not provide an answer in line with the instructions (i.e., amount of memory errors or “never”), resulting in a loss of data. Interestingly, some provided rationales for why they were unable to give a specific answer which provided new insights. For example, they indicated that their response depended on the type of crime and that they would not consult an expert witness for a robbery. Our decision for the unarmed robbery scenario was because it facilitated the explanation that the eyewitness made a memory error as there was also ground truth from the camera. Nonetheless, this could indicate that legal professionals might change their threshold depending on the type of crime in a legal case.

Our study also presents issues of generalizability. That is, our samples consisted of legal practitioners in different jurisdictions and although the similarity in findings across the samples in Studies 1 and 2 lends some support for generalizability within civil law jurisdictions, we did not examine common law or accusatorial jurisdictions. In such contexts, defense practitioners may be more motivated to challenge witnesses based on memory deficiencies. Moreover, the SESOI may change based on the legal context. For instance, a lower SESOI might be warranted in criminal cases where the consequences can be detrimental (i.e., wrongful conviction leading to imprisonment of many years; see The National Registry of Exonerations, 2023) compared to civil cases.

Our findings regarding individuals’ response patterns suggest that beliefs about how many memory errors warrant legal action are highly particularized. As a result, aligning research with practitioners’ views may require tailored studies that test phenomena under the exact conditions relevant to the practitioners of interest. This does not mean that surveys must be conducted for every type of expert testimony; however, researchers should justify the relevance of the SESOI they adopt, including when they use the SESOI derived from this research. Moreover, our study focused on one specific aspect of eyewitness memory research, and we encourage applied memory researchers to initiate similar discussions and to conduct studies that estimate SESOIs (e.g., via anchor-based approaches, cost-benefit analyses, or consensus methods) in related domains, such as eyewitness identification. For research intended to inform legal proceedings, we further encourage collaboration with legal professionals (Chin, 2024). Our study, in combination with the cost-benefit analysis by Otgaar et al. (2023), proposes a SESOI of one detail, which may serve as a reference point for related areas where the contextual factors align (Camilleri et al., 2022).
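
As an illustration of how a SESOI of one detail might be used once it has been justified, the sketch below compares a confidence interval for a mean difference against that threshold, in the spirit of minimum-effect and equivalence testing. It is a simplified example under stated assumptions, not an analysis from the present studies: the function name is hypothetical, the inputs are made-up numbers, and a normal approximation stands in for the t distribution.

```python
from statistics import NormalDist


def sesoi_decision(estimate, se, sesoi=1.0, alpha=0.05):
    """Classify an effect against a SESOI using a (1 - 2 * alpha) confidence interval.

    estimate: mean difference in number of details (hypothetical input)
    se:       standard error of that difference (hypothetical input)
    sesoi:    smallest effect size of interest, here one detail
    """
    z = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value (normal approximation)
    lower, upper = estimate - z * se, estimate + z * se
    if lower > sesoi or upper < -sesoi:
        return "exceeds SESOI"   # minimum-effect test: whole interval beyond the SESOI
    if -sesoi < lower and upper < sesoi:
        return "equivalent"      # equivalence test: whole interval inside the SESOI bounds
    return "inconclusive"


# Made-up inputs, purely for illustration.
print(sesoi_decision(2.0, 0.4))
print(sesoi_decision(0.2, 0.3))
print(sesoi_decision(1.2, 0.5))
```

With small samples, the critical value would ordinarily come from the t distribution with the model's degrees of freedom, which widens the interval.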

Conclusion

In the present studies, we examined the practical relevance of eyewitness memory through the lens of legal professionals. Although the legal professionals did not reach a clear consensus on a specific number of memory errors that would lead them to take certain legal decisions or actions, 1–3 memory errors were generally deemed consequential, depending on the type of memory error, the type of detail, and the respondent’s occupation. However, a substantial number of legal professionals indicated that they would never take any legal action after an eyewitness made memory errors. This highlights the need to further establish what the SESOI should be for eyewitness memory research, as its implications can have far-reaching consequences, such as the prevention of wrongful convictions.

Characteristic	N	%
Sex
Female	34	49.3
Male	34	49.3
Other	1	1.4
Country
Belgium	24	34.8
Netherlands	45	65.2
Occupation
Judge	1	1.4
Lawyer	48	69.6
Other*	9	13.0
Paralegal	6	8.7
Prosecutor	5	7.2
Background
Civil law	11	15.9
Criminal law	51	73.9
Other	7	10.1

Scenario	N	M	SD	Median	Never
Scenario 1	307	3.21	2.13	2.90	107
Scenario 2	292	3.79	2.28	3.10	122
Scenario 3	311	2.79	2.22	2.00	103

Fixed effects	Β	SE	df	t	p
Scenario 2—intercept	3.75	.24	63.9	15.65	< .001
Scenario 1	−.45	.21	59.1	−2.18	.034
Scenario 3	−.91	.21	62.2	−4.41	< .001

Random effects	Variance	SD
Participants—intercept	3.15	1.77
Scenario 1	1.60	1.27
Scenario 3	1.65	1.28
Residual	2.05	1.43
ICC	.59
R²—Nakagawa (fixed)	.03 [.01, .05]
R²—Nakagawa (total)	.60 [.49, .66]

Legal decisions	N	Mean	SD	Median	Never
Unreliable (x1)	59	2.92	1.96	2.80	10
Inadmissible (x2)	49	3.20	2.05	3.00	20
Challenge (x3)	50	3.22	2.05	2.95	19
Judge (x4)	50	3.80	2.59	3.00	19
Expert (x5)	44	2.89	1.80	2.50	25
Average Person (x6)	55	3.23	2.18	2.90	14
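
The "Never" column can be read as a share of all respondents. The sketch below computes that share from the counts in the table above, assuming that every participant gave either a numeric answer (N) or "never" for each legal decision. The variable and function names are illustrative.

```python
# (numeric responses, "never" responses), copied from the table above.
rows = {
    "Unreliable (x1)": (59, 10),
    "Inadmissible (x2)": (49, 20),
    "Challenge (x3)": (50, 19),
    "Judge (x4)": (50, 19),
    "Expert (x5)": (44, 25),
    "Average Person (x6)": (55, 14),
}


def never_share(numeric, never):
    """Proportion of respondents who answered 'never' for one legal decision."""
    return never / (numeric + never)


# Percentage of "never" responses per legal decision, rounded to one decimal.
shares = {label: round(100 * never_share(n, k), 1) for label, (n, k) in rows.items()}

for label, pct in shares.items():
    print(f"{label}: {pct}% never")
```

Under that assumption each row sums to 69 respondents, and the share of "never" responses ranges from roughly 14% (Unreliable) to roughly 36% (Expert).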

Fixed effects	Β	SE	df	t	p
Unreliable (x1)—Intercept	2.95	.27	137	10.74	< .001
Inadmissible (x2)	.38	.28	247	1.38	.17
Challenge (x3)	.41	.27	247	1.50	.14
Judge (x4)	1.06	.27	247	3.89	< .001
Expert (x5)	.08	.29	252	.29	.77
Average Person (x6)	.34	.27	246	1.27	.21

Legal decisions	N	Mean	SD	Median	Never
Unreliable (x7)	57	3.59	2.17	3.10	12
Inadmissible (x8)	46	3.57	2.31	3.00	23
Challenge (x9)	47	3.92	2.27	3.90	22
Judge (x10)	46	4.56	2.71	4.00	23
Expert (x11)	42	3.41	2.04	3.00	27
Average Person (x12)	54	3.73	2.11	3.05	15

Fixed effects	Β	SE	df	t	p
Unreliable (x7)—Intercept	3.63	.29	119	12.45	< .001
Inadmissible (x8)	.18	.27	236	.66	.51
Challenge (x9)	.51	.27	237	1.86	.06
Judge (x10)	1.00	.27	236	3.64	< .001
Expert (x11)	−.33	.28	236	−1.18	.24
Average Person (x12)	.09	.26	236	.33	.74

Legal decisions	N	Mean	SD	Median	Never
Unreliable (x13)	60	2.56	2.06	1.95	9
Inadmissible (x14)	47	2.44	2.06	1.80	22
Challenge (x15)	43	2.42	1.97	1.90	26
Judge (x16)	50	3.48	2.71	2.75	19
Expert (x17)	52	2.96	2.28	2.00	17
Average Person (x18)	59	2.85	2.10	2.00	10

Fixed effects	Β	SE	df	t	p
Unreliable (x13)—Intercept	2.59	.28	120	9.12	< .001
Inadmissible (x14)	.11	.27	250	.40	.69
Challenge (x15)	.19	.27	251	.71	.48
Judge (x16)	1.18	.26	249	4.56	< .001
Expert (x17)	.36	.26	250	1.41	.16
Average Person (x18)	.33	.25	248	1.35	.18

Scenario	N	Mean	SD	Median	Never	No Response
Scenario 2	55	4.33	2.59	3.00	47	66
Scenario 3	71	2.32	1.71	2.00	39	58

Fixed effects	β	SE	df	t	p
Scenario 2—intercept	4.09	.56	18	7.32	< .001
Scenario 3	−1.83	.44	15	−4.20	< .001

Random effects	Variance	SD
Participants—intercept	4.83	2.20
Scenario 3	2.11	1.45
Residual	1.65	1.28
ICC	.65
R²—Nakagawa (fixed)	.15 [.03, .31]
R²—Nakagawa (total)	.70 [.55, .81]

Legal decisions	N	Mean	SD	Median	Never	No Response
Unreliable (x7)	10	4.00	2.36	3.00	6	12
Inadmissible (x8)	8	4.88	3.27	3.00	10	10
Challenge (x9)	7	4.29	2.69	3.00	12	9
Judge (x10)	8	4.62	1.60	4.50	9	11
Expert (x11)	7	5.86	3.02	5.00	6	15
Average Person (x12)	15	3.40	2.50	3.00	4	9

Fixed effects	β	SE	df	t	p
Unreliable (x7)—Intercept	4.04	.69	33	5.84	< .001
Inadmissible (x8)	.90	.61	38	1.48	.15
Challenge (x9)	.89	.64	38	1.40	.17
Judge (x10)	.38	.62	39	.62	.54
Expert (x11)	1.71	.69	40	2.49	.02
Average Person (x12)	−.84	.57	41	−1.48	.15

Legal decisions	N	Mean	SD	Median	Never	No Response
Unreliable (x13)	13	2.00	1.41	2.00	5	10
Inadmissible (x14)	11	2.64	2.11	2.00	9	8
Challenge (x15)	9	1.89	0.78	2.00	10	9
Judge (x16)	13	2.46	1.56	2.00	6	9
Expert (x17)	9	3.33	2.83	3.00	6	13
Average Person (x18)	16	1.94	1.24	1.50	3	9

Fixed effects	β	SE	df	t	p
Unreliable (x13)—Intercept	1.98	.42	39	4.75	< .001
Inadmissible (x14)	.66	.38	53	1.74	.09
Challenge (x15)	.56	.40	53	1.38	.17
Judge (x16)	.45	.36	53	1.24	.22
Expert (x17)	1.29	.42	55	3.03	.004
Average Person (x18)	−.20	.36	56	−.54	.59

Rationale	N	%
- Depends on other factors for the errors (e.g., type, amount, circumstance, relevance, importance)	9	27.3
- Depends on other factors concerning the statement (e.g., length, importance, delay, and events between statements)	8	24.2
- Depends on the crime	4	12.1
- Depends on other evidence (not specified)	3	9.1
- Depends on consistency	2	6.1
- Depends on whether witness clearly lies	2	6.1
- Other	5	15.2

Footnotes

ORCID iDs

Paul Riesthuis

Eric Rassin

Funding

The current manuscript has been supported by a FWO post-doctoral fellowship grant to the first author (1203824N), a FWO PhD fellowship grant (11K3121N) to the third author, and a FWO Research Project grant (G0D3621N) awarded to the last author.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability statement

All data, code, materials, and rMarkdown files for the data analysis with interactive plots and additional analyses are openly available on the Open Science Framework at https://osf.io/nt82u/.

References

Anvari, Kievit, Lakens, et al. (2023) Not all effects are indispensable: Psychological science requires verifiable lines of reasoning for whether an effect matters. Perspectives on Psychological Science 18: 503–507.

Anvari and Lakens (2021) Using anchor-based methods to determine the smallest effect size of interest. Journal of Experimental Social Psychology 96: 104159.

Bakker, Van Dijk and Wicherts (2012) The rules of the game called psychological science. Perspectives on Psychological Science 7: 543–554.

Bartoš, Maier, Shanks, et al. (2023) Meta-analyses in psychology often overestimate evidence for and size of effects. Royal Society Open Science 10: 230224.

Bates, Mächler, Bolker, et al. (2015) Fitting linear mixed-effects models using lme4. Journal of Statistical Software 67(1): 1–48.

Bonini, Di Paolo, Bagnasco, et al. (2020) Minimal clinically important difference for asthma endpoints: an expert consensus report. European Respiratory Review 29(156): 190137.

Brainerd and Reyna (2005) The Science of False Memory. Oxford University Press.

Byrne (2019) Increasing the impact of behavior change intervention research: Is there a role for stakeholder engagement? Health Psychology 38(4): 290–296.

Camilleri, Beribisky and Cribbie (2022) The minimally meaningful effect size: A vital component of pre-registrations. PsyArXiv. https://doi.org/10.31234/osf.io/jbgtm.

Chin (2014) Psychological science’s replicability crisis and what it means for science in the courtroom. Psychology, Public Policy, and Law 20: 225–238.

Chin (2023) Law and psychology must think critically about effect sizes. Discover Psychology 3: 3.

Chin (2024) Why applied psychologists should consider their work’s value-laden context. In: Forscher and Schmidt (eds) A Better How: Notes on Developmental Meta-Research. Busara, pp.168–174.

Chin and Neal TMS (2023) Further caution is required on what memory experts can reliably say. Forensic Science International: Mind and Law 4: 100113.

Cohen (1988) Statistical Power Analysis for the Behavioral Sciences, 2nd ed. Erlbaum.

Correll, Mellinger, McClelland, et al. (2020) Avoid Cohen’s “small”, “medium”, and “large” for power analysis. Trends in Cognitive Sciences 24: 200–207.

Fisher, Brewer and Mitchell (2009) The relation between consistency and accuracy of eyewitness testimony: Legal versus cognitive explanations. In: Bull, Valentine and Williamson (eds) Handbook of Psychology of Investigative Interviewing: Current Developments and Future Directions. Wiley Blackwell, pp.121–136.

Granhag, Ask, Rebelius, et al. (2013) ‘I saw the man who killed Anna Lindh!’ An archival study of witnesses’ offender descriptions. Psychology, Crime & Law 19: 921–931.

Gruijters and Peters GJY (2020) Meaningful change definitions: Sample size planning for experimental intervention research. Psychology & Health 37: 1–16.

Howe and Knott (2015) The fallibility of memory in judicial processes: Lessons from the past and their modern consequences. Memory 23(5): 633–656.

Jaeschke, Singer and Guyatt (1989) Measurement of health status: Ascertaining the minimal clinically important difference. Controlled Clinical Trials 10(4): 407–415.

Klein, Vianello, Hasselman, et al. (2018) Many Labs 2: Investigating variation in replicability across samples and settings. Advances in Methods and Practices in Psychological Science 1(4): 443–490.

Lakens, McLatchie, Isager, et al. (2020) Improving inferences about null effects with Bayes factors and equivalence tests. The Journals of Gerontology: Series B 75: 45–57.

Lakens, Scheel and Isager (2018) Equivalence testing for psychological research: A tutorial. Advances in Methods and Practices in Psychological Science 1: 259–269.

Lemay, Tulloch, Pipe, et al. (2019) Establishing the minimal clinically important difference for the hospital anxiety and depression scale in patients with cardiovascular disease. Journal of Cardiopulmonary Rehabilitation and Prevention 39: E6–E11.

Loftus (2005) Planting misinformation in the human mind: A 30-year investigation of the malleability of memory. Learning & Memory 12: 361–366.

McGlothlin and Lewis (2014) Minimal clinically important difference: Defining what really matters to patients. JAMA 312(13): 1342–1343.

Meehl (1967) Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science 34: 103–115.

Murphy and Myors (1999) Testing the hypothesis that treatments have negligible effects: Minimum-effect tests in the general linear model. Journal of Applied Psychology 84: 234–248.

Nosek, Hardwicke, Moshontz, et al. (2022) Replicability, robustness, and reproducibility in psychological science. Annual Review of Psychology 73: 719–748.

Otgaar, Howe and Dodier (2022a) What can expert witnesses reliably say about memory in the courtroom? Forensic Science International: Mind and Law 3: 100106.

Otgaar, Riesthuis, Neal TMS, et al. (2023) If generalization is the grail, practical relevance is the nirvana: Considerations from the contribution of psychological science of memory to law. Journal of Applied Research in Memory and Cognition 12: 176–179.

Otgaar, Riesthuis, Ramaekers, et al. (2022b) The importance of the smallest effect size of interest in expert witness testimony on alcohol and memory. Frontiers in Psychology 13: 980533.

Panzarella, Beribisky and Cribbie (2021) Denouncing the use of field-specific effect size distributions to inform magnitude. PeerJ 9: e11383.

Radvansky, Doolen, Pettijohn, et al. (2022) A new look at memory retention and forgetting. Journal of Experimental Psychology: Learning, Memory, and Cognition 48(11): 1698–1723.

Riesthuis (2024) Simulation-based power analyses for the smallest effect size of interest: A confidence-interval approach for minimum-effect and equivalence testing. Advances in Methods and Practices in Psychological Science 7(2). https://doi.org/10.1177/25152459241240722.

Riesthuis, Mangiulli, Broers, et al. (2022) Expert opinions on the smallest effect size of interest in false memory research. Applied Cognitive Psychology 36: 203–215.

Riesthuis and Otgaar (2024a) An overview of the replicability, generalizability and practical relevance of eyewitness testimony research in the Journal of Criminal Psychology. Journal of Criminal Psychology 15(2). https://doi.org/10.1108/JCP-04-2024-0031.

Riesthuis and Otgaar (2024b) The nature of ROC practices in eyewitness memory research. PsyArXiv. https://doi.org/10.31234/osf.io/qpmh5.

Saks and Koehler (2005) The coming paradigm shift in forensic identification science. Science 309: 892–895.

Scheel, Tiokhin, Isager, et al. (2021) Why hypothesis testers should spend less time testing hypotheses. Perspectives on Psychological Science 16: 744–755.

Schuman and Presser (1996) Questions and Answers in Attitude Surveys: Experiments on Question Form, Wording, and Context. Sage.

Simons and Chabris (2011) What people believe about how memory works: A representative survey of the US population. PLoS ONE 6(8): e22757.

Smeets, Candel and Merckelbach (2004) Accuracy, completeness, and consistency of emotional memories. The American Journal of Psychology 117(4): 595–609.

Staniszewska, Haywood, Brett, et al. (2012) Patient and public involvement in patient-reported outcome measures: Evolution not revolution. The Patient - Patient-Centered Outcomes Research 5: 79–87.

Takarangi, Parker and Garry (2006) Modernising the misinformation effect: The development of a new stimulus set. Applied Cognitive Psychology 20: 583–590.

The National Registry of Exonerations (2025) Exonerations in the United States Map. Available at: www.law.umich.edu/special/exoneration/Pages/Exonerations-in-the-United-States-Map.aspx.

Thompson-Cannino, Cotton and Torneo (2009) Picking Cotton: Our Memoir of Injustice and Redemption. Macmillan.

Tullett (2022) The limitations of social science as the arbiter of blame: An argument for abandoning retribution. Perspectives on Psychological Science 17(4): 995–1007.

van der Heijde, Lassere, Edmonds, et al. (2001) Minimal clinically important difference in plain films in RA: Group discussions, conclusions, and recommendations. OMERACT imaging task force. The Journal of Rheumatology 28: 914–917.

Yarkoni (2022) The generalizability crisis. Behavioral and Brain Sciences 45: e1.

Random effects	Variance	SD
Participants—intercept	2.54	1.60
Residual	1.98	1.41
ICC	.56
R²—Nakagawa (fixed)	.03 [.01, .06]
R²—Nakagawa (total)	.57 [.45, .70]
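
For a model with only a random intercept for participants, the ICC in the table above is the share of total variance that lies between participants. The sketch below recomputes it from the rounded variance components, assuming the standard random-intercept formula applies to these models; the function name is illustrative.

```python
def icc(var_between, var_residual):
    """Intraclass correlation for a random-intercept model.

    var_between:  variance of the participant intercepts
    var_residual: residual (within-participant) variance
    """
    return var_between / (var_between + var_residual)


# Variance components copied from the table above.
print(round(icc(2.54, 1.98), 2))  # 0.56
```

Because the inputs are rounded to two decimals, a value recomputed this way can differ from a reported ICC by .01. Models that also include random slopes need a different calculation.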

Random effects	Variance	SD
Participants—intercept	3.09	1.76
Residual	1.86	1.36
ICC	.63
R²—Nakagawa (fixed)	.03 [.02, .07]
R²—Nakagawa (total)	.64 [.55, .74]

Random effects	Variance	SD
Participants—intercept	3.18	1.78
Residual	1.79	1.34
ICC	.64
R²—Nakagawa (fixed)	.03 [.01, .07]
R²—Nakagawa (total)	.65 [.54, .76]

Random effects	Variance	SD
Participants—intercept	4.76	2.18
Residual	1.59	1.26
ICC	.75
R²—Nakagawa (fixed)	.10 [.06, .25]
R²—Nakagawa (total)	.78 [.58, .89]

Random effects	Variance	SD
Participants—intercept	1.88	1.37
Residual	.83	.91
ICC	.69
R²—Nakagawa (fixed)	.08 [.03, .21]
R²—Nakagawa (total)	.72 [.58, .85]

Through the lens of legal professionals: Examining the smallest effect size of interest for eyewitness memory research
