
Abstract

Previous research has indicated that investigative interviews with adult victims and witnesses are often of low quality. Earlier findings also indicate that interviewing skills improve when interviewers are given feedback on their performance. Technological innovations make it possible to implement such an approach in a scalable manner. Simulated interviews with child avatars have shown that repeated feedback improves the proportion of recommended questions in those interviews. We created adult witness avatars (AWA) to simulate investigative interviews with adult victims and witnesses. We examined whether avatar interviews coupled with feedback (vs no feedback) would result in improvements in interview quality. Of the 60 participants, half received process feedback after each of four simulated interviews. The avatars revealed predefined memories and made errors as a function of algorithms formulated based on previous empirical research on the response behavior of adult witnesses in experimental studies. Results showed that receiving feedback after the simulated interviews increased the proportion of recommended questions (free recall and open questions) in avatar interviews compared with not receiving feedback (90.3% vs 72.6%, respectively). There was also a significant correlation between the question types and accuracy of details, even when analyzed separately for feedback and control groups. We demonstrated that with feedback after each AWA interview, the proportion of recommended questions was significantly higher in the feedback group than in the control group. The implications for practice are discussed.

Keywords

Serious games; training with virtual reality; interview training; investigative interviewing; avatars

Witness and victim statements (hereafter witness statements for the sake of brevity) are frequently the sole evidence available to authorities (Powell et al., 2005). The criminal justice system considers eyewitness testimony largely factual despite psychological research showing that witness reports may be misleading even while appearing credible (Brewer et al., 2005). Irrespective of the witnesses’ age and background, their statements may be influenced by inappropriate interviewing style, possibly resulting in miscarriages of justice.

When witnesses have to give evidence, they are asked to reconstruct and recount personally experienced episodes that have occurred in a particular temporal and spatial context (Tulving, 1993). The reconstructive nature of the memory leaves recollections susceptible to external influences, such as questions asked, as shown by Loftus and Palmer (1974). In their study (Experiment 1), participants were shown several films of traffic accidents. Afterwards, the participants were asked to estimate the speed of the cars in the films. However, the researchers manipulated the wording of the question (using words like “smashed” or “contacted” among three other words) and found that this affected the participants’ speed estimates. Specifically, when the word “smashed” was used, participants estimated higher speeds than when the word “contacted” was used. Thus, the wording and framing of questions are crucial in eliciting accurate information, because they can significantly influence the reconstruction of memories.

Research so far has identified certain elements of police interviews, including questions asked, that promote factual and detailed eyewitness statements (Bull and Blandon-Gitlin, 2019). To increase the quality and quantity of information elicited from eyewitnesses, it is recommended to use free recall questions (“Tell me what happened!” or cued recall “Tell me more about the car!”), open-ended questions (“Where were you?”) and “facilitators” (e.g., “Go on!”) with both children (Orbach et al., 2000; Roberts et al., 2004) and adults (Geiselman et al., 1985; Valentine and Maras, 2011; Webster et al., 2021). Conversely, the use of option-posing (“Did he push you once or twice?”), repeated (repeating a question with the exact same wording), suggestive (“You were there, weren’t you?”) or misleading questions (“The car was green, wasn’t it?”; Lamb et al., 2018) is not recommended in investigative interviews (Powell et al., 2005). Although it may seem straightforward which types of questions are recommended for use in police interviews, the categorization of the question types can be complex because of the various ways of distinguishing between them (Oxburgh et al., 2010). Question types can be dichotomized (Nunan et al., 2020) as either productive or unproductive (Griffiths and Milne, 2006), and appropriate or inappropriate (Phillips et al., 2012). For example, closed questions can be both appropriate and inappropriate, depending on the context. They are appropriate when used at the end of a topic where open and probing questions have been exhausted. However, they can be inappropriate if used prematurely in the interaction, resulting in a closing down of the range of responses (Nunan et al., 2020).

In addition to general recommendations, it is crucial to understand how different question types affect the responses of adults. Studies have shown that the answers of adults are affected by question types (Lipton, 1977; Oxburgh et al., 2010; Sutherland and Hayne, 2001). It has been determined that, in general, adults are most accurate during free recall (Allwood et al., 2008; Cassel et al., 1996; Lipton, 1977), and compared with non-recommended questions that reduce accuracy (Kebbell and Johnson, 2000), open questions lead to more accurate accounts (Ibabe and Sporer, 2004; Lipton, 1977). More research with adults has been done on the effect of non-recommended and leading questions. Regarding non-recommended questions, it has been shown that option-posing (Ibabe and Sporer, 2004), repeated (Sharps et al., 2012), negative and double-negative (Wade and Spearing, 2022), and complex question forms (Chrobak et al., 2015; Jack and Zajac, 2014; Valentine and Maras, 2011) impair accuracy of answers. Adults even tend to speculate and provide answers to specific questions that are unanswerable (Waterman et al., 2001) or that they have no information on (Poole and White, 1991). Although the effect of misleading and suggestive questions on the accuracy of answers is stronger in children (Cassel et al., 1996; Roebers and Schneider, 2000), and older adults (Saunders and Jess, 2010; but see West and Stone, 2014), adults are also susceptible to misleading information in questions (Molyneaux and Larsen, 1992; Roebers and Schneider, 2000; Shapiro et al., 2005). Both closed and open presumptive questions have been shown to generate misinformation effects (Bowles and Sharman, 2014), but open leading questions (e.g. “Tell me about the shotgun” when a shotgun has not been mentioned by the witness) may be even more dangerous (Brubacher et al., 2020; Sharman et al., 2015). Thus, police officers need to be mindful of the questions asked in interviews with adults as well.

Launay and Py (2015) point out that relatively little work so far has focused on the methods and goals for investigative interviewing of adult witnesses. Fisher et al. (1987) conducted the first systematic study of such interviews, analyzing 11 interviews conducted by police officers with adults and children. They found that the interviews had little structure and that the interviewers used many closed as well as negative or leading questions. Launay and Py (2015) in their overview of eight studies of investigative interviews with adult witnesses noted that free recall was rarely used (see Ginet and Py, 2001; Wright and Alison, 2004) and that the collection of facts typically began with focused Wh- (e.g. Who, What, Where) questions. They concluded that interviewers continue to use more questions that are deleterious to witnesses’ memory recall than non-deleterious ones, even after being trained in interviewing techniques. This suggests that investigative interviewing practices may be difficult to modify, with investigators maintaining strategies and tactics learned in the field using a series of specific, mostly closed, questions. Thus, it seems that transferring theoretical knowledge of best-practice guidelines into real-life interviews is surprisingly difficult (Sternberg et al., 2001).

Theory-based training remains the most common training format for investigators (Pompedda, 2018). Although the traditional classroom-based training model improves participants’ knowledge of interviewing skills, it is not successful in translating this theoretical knowledge into practical skills (Lamb, 2016). To overcome this problem, serious games in virtual environments combined with feedback have been proposed as a solution to improve the quality of child sexual abuse interviews (Benson and Powell, 2015; Powell et al., 2016).

Two components foster learning: high realism and immediate feedback. Virtual environments are computer simulations that represent activities at a high degree of realism (see Witmer and Singer, 1998). In investigative psychology, simulations in virtual environments have involved three-dimensional (3D) avatars on a computer screen (Pompedda et al., 2022) as well as virtual reality (VR) in which participants interact with 3D avatars in a 3D virtual environment using a VR headset (Baugerud et al., 2021; Taylor and Dando, 2018). In the interviewing of witnesses, Taylor and Dando (2018) have found advantages of gathering witness information in virtual environments where episodic performance improved with a significant reduction in errors. In addition, research into the use of structured interview protocols with children has found training to be effective when feedback is provided on a continuous basis and is detailed and immediate (Sternberg et al., 2002). It seems that learning the skills to elicit a narrative account effectively requires time and practical exercises that include personalized feedback (Sternberg et al., 2002).

Regarding immediate and detailed feedback, Pompedda and colleagues (2022) conducted a mega-analysis of the effects of feedback on the quality of simulated child sexual abuse interviews with avatars. Their analysis, involving 2208 interviews, showed that feedback increased recommended questions and decreased non-recommended questions, improving the quality of details elicited from the avatar, and resulted in more correct conclusions regarding the suspected “abuse”. Thus, it provided overall strong support for immediate feedback during avatar training. In the studies included in this mega-analysis, interview quality was considerably improved in just one hour when the interviewers were given feedback on their performance after each completed interview with child avatars (Haginoya et al., 2020; Krause et al., 2017; Pompedda, 2018; Pompedda et al., 2015, 2017, 2021).

Importantly, Pompedda et al. (2021) found evidence for transfer with those interviewers who were provided feedback during the avatar training: these interviewers asked more recommended questions in interviews with actual children who had experienced a mock event. Similarly, Kask and colleagues (2022) found evidence for transfer to actual police interviews with child victims. Because investigative interviewing is cognitively demanding, serious games are ideal to increase training effect transfer in the light of theoretical frameworks for understanding how these effects can be facilitated (Blume et al., 2010). Instead of reading a guideline or hearing an explanation from an expert, participants learn about the negative effects of using suggestive questions and the positive effects of using open questions through their questioning and the feedback given to the questions they ask. Taken together, serious games in virtual environments that lead to high realism together with feedback may be more successful than traditional training models in training practical skills.

Current study

Because previous research with child avatars has shown that repeated feedback in simulated investigative interviews improves the proportion of recommended questions used, and that the quality of investigative interviews with adults is lacking, we decided to develop a computer-based solution to train investigators (police officers, prosecutors, judges, and lawyers) in interviewing adult witnesses (see Tohvelmann and Kask, 2022 for a review). In addition, two recent guidelines, the Mendez Principles (Principles on effective interviewing for investigations and information gathering, 2021) and the United Nations' manual on investigative interviewing for criminal investigation (United Nations, 2024), both indicate that asking questions that elicit more accurate answers is extremely important in implementing investigative interviewing best practices all over the world (similarly to the PEACE model in England and Wales, and similar models elsewhere; Halley et al., 2023).

In this study, we examined whether training with adult witness avatars (AWAs) coupled with feedback would result in improvements in interview quality compared with receiving no feedback. To achieve our aim, we posed two hypotheses and a study question. First, we hypothesized that receiving feedback regarding questions asked results in a larger proportion of recommended questions asked compared with not receiving feedback. More specifically, (a) receiving feedback results in a larger number of recommended questions asked compared with not receiving feedback; and (b) receiving feedback results in a smaller number of non-recommended questions asked compared with not receiving feedback (Hypothesis 1). Second, we hypothesized that over four consecutive interviews, those subjects receiving feedback after each interview will have a larger increase in the proportion of recommended questions compared with those not receiving feedback (Hypothesis 2). We also posed a study question in which we expected recommended questions to be associated with more correct and fewer incorrect details elicited from the avatars and non-recommended questions to have the reverse pattern. Although not a substantive finding, this pattern would show that the algorithms driving the avatar responses are working as they should, because this is the pattern found in actual witness interviews (Tohvelmann and Kask, 2022). Finally, we conducted analyses using age, gender and education as covariates to ensure that any significant effects were not due to possible random differences in participants’ background or previous experiences in the two groups given the relatively small sample size.

Method

The desired sample size, included variables, hypotheses and planned analyses for the study were pre-registered on AsPredicted.com (https://aspredicted.org/blind.php?x=JQN_DX3) prior to any data being collected. The data set containing all primary study variables can be obtained from the authors.

Participants

A total of 60 participants (age M = 33.53, SD = 10.47, range 19–61, 18 male) took part in the study. The interviews were conducted in Estonian; 59 participants were native Estonian speakers and 1 was a native Russian speaker who spoke Estonian fluently. The highest level of education was basic education for 2 participants (3.3%), secondary education for 15 (25%), vocational education for 12 (20%), a bachelor's degree for 21 (35%), and a master's degree for 10 (16.7%). The participants were recruited by posting advertisements on university internal and external communication channels including social media. The participants had no background in either psychology or law, and had conducted no prior investigative interviews. The participants did not receive any reward for their participation. The feedback and control groups did not differ significantly in terms of participant gender, χ²(1) = .00, p = 1.000, age, t(58) = .811, p = .20 or education level, χ²(4) = 2.45, p = .69.

Power analysis [repeated-measures analysis of covariance (ANCOVA) within–between interaction] on the entire sample (N = 60) was conducted using the software package G*Power (Faul et al., 2009) with the recommended effect sizes being small (w = 0.10), medium (w = 0.25), and large (w = 0.40; Cohen, 1988). The alpha level used for this analysis was .05. The analyses indicated a statistical power of .41 to detect a small effect, .99 to detect a medium effect, and 1.00 to detect a large effect. In sum, there was adequate power at the medium and large effect sizes, but less than adequate statistical power to detect a small effect size.

Study design

We used a between-subjects design with two groups: (a) the feedback group (n = 30), in which participants received process feedback after interviewing each avatar; and (b) the control group (n = 30), in which participants did not receive any process feedback. Each participant performed four interviews with avatars. The four avatars were the same for each participant, but we randomized the order of avatars between the participants.

Materials

For this study, four (two male and two female; three victims and one bystander) AWAs were created based on real-life crime scenarios in which either a victim or a bystander witnessed a criminal event where the offenders were either familiar or not familiar to the witness or victim. In all events, the victims and witnesses were sober. The scenarios were of different types of crimes (Table 1) retrieved from the county court verdicts (Court Verdict Database, 2023; https://www.riigiteataja.ee/kohtulahendid/koik_menetlused.html). The scenarios were anonymized. For each scenario we created a list of details based on the information described in the court verdicts. These constituted the AWAs’ memories.

Table 1.
Adult witness avatar case scenarios.

| Crime type | Scenario |
| --- | --- |
| Domestic violence | A woman reports domestic violence in the apartment next door in a large apartment building, alleging that a girl's health and well-being is in danger |
| Fraud | An older man reports that he cannot pay with his debit card because there is not enough money on his account and he has also discovered that his driver's license is missing from his wallet |
| Physical violence | A witness reports that a woman has been physically attacked on the street |
| Robbery | A security guard reports that two men left a store without paying for a teddy bear, then one of them punched the security guard in the face |

AWA images were created by morphing different images of real adults. Using the website “SitePal”, audio clips for the avatar answers were first recorded in Estonian by three persons (two male, one female) and then merged with the images to create animated video clips containing all the predefined answers of the AWAs.

We created response algorithms based on a systematic review of experimental studies about how actual adult witnesses behave during interviews. The algorithms were finalized based on 53 studies identified in the systematic review as containing relevant empirical findings; see Table 2 for examples (Allwood et al., 2006, 2008; Bärthel et al., 2017; Bjorklund et al., 2000; Bonham and González-Vallejo, 2009; Boon and Noon, 1994; Boon et al., 2020; Brackmann et al., 2017; Brubacher et al., 2020; Buratti et al., 2014; Carol et al., 2021; Cassel and Bjorklund, 1995; Cassel et al., 1996; Collins et al., 2002; Dahl et al., 2015; Dando et al., 2011; De La Fuente Vilar et al., 2020; Eastwood et al., 2019; Eisen et al., 2002; Evans and Fisher, 2011; Gawrylowicz et al., 2014, 2019; Ginet et al., 2014; Hagsand et al., 2013; Hope et al., 2014; Ibabe and Sporer, 2004; Jack et al., 2014; Karlen et al., 2017; Kebbell and Johnson, 2000; Knutsson et al., 2011; Krix et al., 2014, 2016; Ma et al., 2021; Matsuo and Miura, 2016; Murnikov and Kask, 2021; Nahouli et al., 2021; Rechdan et al., 2017; Roebers, 2002; Roebers and Fernandez, 2002; Roebers and Howie, 2003; Roebers and McConkey, 2003; Roebers and Schneider, 2000, 2001, 2005; Roebers et al., 2001, 2007; Saraiva et al., 2020; Sarwar et al., 2011; Schreiber Compo et al., 2017; Scoboria et al., 2008, 2013; Wang et al., 2014; Wysman et al., 2014). Use of the algorithms allowed a realistic simulation of how a real adult would respond to a certain question type (i.e., free or cued recall, open, closed, forced-choice, suggestive or misleading questions). For each question, potential responses were created: an answer containing correct details (mentioned by the victims and witnesses based on the information from actual court verdicts) and one or two alternative answers containing incorrect details.
For example, based on the calculations regarding accurate responses to certain question types from the literature review, if an interviewer asks a free recall question, then in 89% of the cases the algorithm would launch a correct answer (e.g., “I did not see them at first because they were not passing by the cashiers”) and in 11% of the cases the algorithm would launch one of the two incorrect answers (e.g., “I had seen them earlier in the store” or “I didn’t see him as he was near the cashiers”).
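The probabilistic selection described above can be illustrated with a minimal sketch. It is not the authors' software: the function name `select_response` and its parameters are hypothetical, only the 89% free recall accuracy rate is taken from the text, and the random choice between incorrect answers is a simplification (in the study the operator chose between them according to the storyline).

```python
import random

# Accuracy rate for free recall questions as stated in the text.
# Rates for other question types are not given here, so none are assumed.
FREE_RECALL_ACCURACY = 0.89


def select_response(p_correct, correct_answer, incorrect_answers, rng):
    """Return (answer, is_correct), launching the correct answer with probability p_correct.

    Hypothetical helper for illustration only.
    """
    if not incorrect_answers:
        raise ValueError("At least one incorrect answer is required")
    if rng.random() < p_correct:
        return correct_answer, True
    # Simplification: pick an incorrect option at random. In the study the
    # operator made this choice manually based on the storyline.
    return rng.choice(incorrect_answers), False


rng = random.Random(2024)  # seeded so the illustration is reproducible
answer, is_correct = select_response(
    FREE_RECALL_ACCURACY,
    "I did not see them at first because they were not passing by the cashiers",
    [
        "I had seen them earlier in the store",
        "I didn't see him as he was near the cashiers",
    ],
    rng,
)
```

Over many free recall questions, roughly 89 out of 100 launched answers would be the correct one under this sketch.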

Table 2.
Examples of details of the studies on which the algorithms are based.

| Reference | Question type | Accuracy (%) | No. of participants | Age of participants | Delay in days | Source of video | Source of recall |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Ibabe and Sporer (2004) | Open | 74.00 | 20 | 19 | 0 | Video | Written |
| | Forced-choice | 66.50 | 22 | | | Video | Written |
| | Closed | 73.00 | 20 | | | Video | Written |
| Murnikov and Kask (2021) | Leading | 76.00 | 30 | 26 | 7 | Live | Oral |
| | Misleading | 76.00 | 30 | | | Live | Oral |
| Nahouli et al. (2021) | Free recall | 87.00 | 20 | 23 | 1 | Video | Oral |
| | Cued recall | 66.00 | 20 | | | Video | Oral |

Procedure

Participants were tested individually in Tallinn University Experimental Psychology Laboratory. All participants signed an informed consent form before taking part in the study. The participants were informed that “the aim of the study is to examine the AWA software which is created to increase the quality of investigative interviews. The results of the study will provide us information about if and how effective this software could be in increasing interviewers’ interviewing skills of adult witnesses in criminal justice system”. The participants were informed that they could withdraw from the study anytime during the experiment.

Participants were seated in front of a 13.3-inch Lenovo ThinkPad X13 laptop computer where the videos of the AWAs were displayed. First, participants had to read a description of the best practices of investigative interviewing of adults and answer two control questions about what they read. If they answered incorrectly, the same questions were asked again until both questions were answered correctly (Appendix A).

Each participant interviewed four different avatars. Before interviewing each avatar, participants read a short description of the criminal case of the AWA to be interviewed (e.g., “a security guard reported that two men left a store without paying for a teddy bear, then one of them punched the security guard in the face”). The participants then had 10 minutes to interview the AWA to find out what happened. If they were satisfied that they had found out what had happened, they could finish the interview earlier. The participants’ questions were audio recorded. To ask a question, participants had to press a red button that initiated recording of the question. After asking a question, they had to press a black square that finished recording the question (Figure 1). The operator then manually coded the question type (free or cued recall, open, closed, forced-choice, suggestive or misleading questions) and the detail type the question addressed [general information (yes, no, don’t know, etc.), avatar personal information (name, etc.), location, time, action (who did what), objects (e.g., knife), offender characteristics, victim/witness characteristics and the way the crime was committed (e.g., he hit with a knife)]. The software then launched a video clip with the avatar's response, which included either correct or incorrect details, depending on the question asked. If there were two possible incorrect answers available, the operator manually chose which answer to play depending on the storyline (but not accuracy) of previous AWA answers. At the end of the interview, the participants had to report what they believed had happened to the AWA.

Figure 1.
Adult witness avatar working mechanism.

After interviewing each avatar, the control group (n = 30) proceeded to learn the story of the next avatar and started the next interview. Participants in the feedback group (n = 30) received feedback on four questions (two recommended and two non-recommended questions) they had used during that interview. Feedback was selected to cover as many question types as possible: if the interviewer used several question types for the first time during the interview, these were prioritized in the feedback session. Where possible, feedback was provided for two recommended and two non-recommended questions. However, if the interviewer asked no questions of one kind at all (either recommended or non-recommended), feedback was given on only two questions. As a result, the number of questions receiving feedback could vary among participants. Example feedback to recommended questions would be “You asked the avatar, ‘Where were you?’ which is an open question; carry on with these types of questions”. For non-recommended questions, feedback might be “You asked ‘Was there one or two persons?’ which is an option-posing question; try to phrase this question in a more open manner next time”.
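One possible reading of this selection rule is sketched below. The text does not specify the exact procedure the operator followed, so everything here is an assumption made for illustration: the function names, the tuple layout, and the three-pass priority order (question types new to the participant first, then other distinct types, then any remaining questions).

```python
# Question-type groupings follow the coding scheme used in the study;
# the string labels themselves are hypothetical.
RECOMMENDED = {"free_recall", "cued_recall", "open"}
NON_RECOMMENDED = {"closed", "forced_choice", "suggestive", "misleading"}


def _pick(candidates, covered_types, limit):
    """Pick up to `limit` (text, type) pairs, favoring new and distinct types."""
    picked_idx, seen_types = [], set()
    passes = (
        lambda t: t not in covered_types and t not in seen_types,  # new types first
        lambda t: t not in seen_types,                             # then distinct types
        lambda t: True,                                            # then anything left
    )
    for accept in passes:
        for i, (_, qtype) in enumerate(candidates):
            if len(picked_idx) == limit:
                break
            if i not in picked_idx and accept(qtype):
                picked_idx.append(i)
                seen_types.add(qtype)
    return [candidates[i] for i in picked_idx]


def select_feedback_questions(asked, covered_types, per_group=2):
    """Select questions for feedback from `asked`, a list of (text, type) pairs.

    `covered_types` holds question types already discussed in earlier feedback.
    If no question of one kind was asked, only the other kind is returned.
    """
    recommended = [q for q in asked if q[1] in RECOMMENDED]
    non_recommended = [q for q in asked if q[1] in NON_RECOMMENDED]
    return (
        _pick(recommended, covered_types, per_group)
        + _pick(non_recommended, covered_types, per_group)
    )
```

With this sketch, an interview containing only recommended questions yields feedback on two questions, which matches the case described in the text where the number of feedback items falls below four.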

The experiment lasted about 90 minutes. At the end of the experiment, all participants were debriefed about the aim of the study.

Coding of interviews and participants’ reports

During the interviews with the AWAs, the operator (first author) coded all the questions asked by the participants. The schema in Table 3, which is based on the literature review, was used. The coded question types were saved by the program together with the correct and incorrect details that the AWAs produced in response to the questions.

Table 3.
Question categories with definitions and examples.

| Category | Definition | Examples |
| --- | --- | --- |
| Recommended questions | | |
| Free recall | These are invitations that help the interviewee to provide a free recall response, without any influence by the interviewer | “Tell me everything”; “Tell me all you remember” |
| Cued recall | These invitations are similar to free recall, but related to a previous statement elicited from the interviewee | “You mentioned a car. Tell me everything about the car” |
| Open | Open-ended non-suggestive questions that focus the interviewee's attention on a detail and ask for a focalized explanation (usually Wh- questions) | “What does it mean?”; “Where did you go with her?” |
| Non-recommended questions | | |
| Closed | These questions focus on the details the interviewee may have mentioned. Closed questions do not imply a particular type of response, although they can be answered “yes” or “no” | “Did it happen in the kitchen?” |
| Forced-choice | These questions provide possible choices that the interviewee has to choose from | “Did it happen once or twice or three times?” |
| Suggestive | The interviewer strongly communicates what kind of response is expected using details that the interviewee may have mentioned before | “She touched you, didn’t she?”; “I know that someone touched you, tell me who it was!” |
| Misleading | The interviewer strongly communicates what kind of response is expected but mentions a detail that does not emerge from the interviewee's previous responses | “So I understand that car was red” (if the color of the car was not mentioned before and is not known) |

Note. Modified from table 4 in Pompedda (2018).

After the interviews, an independent research assistant re-coded the question type classifications (recommended or not recommended) for a random sample of 40 interviews. Transcripts of the interviews were used for this purpose. The interrater reliability of the codings was Cohen's κ = .91 (95% CI .86 to .96), p < .001.
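For readers unfamiliar with the statistic, Cohen's κ compares the observed agreement between two coders with the agreement expected by chance from each coder's category frequencies. The sketch below shows the standard calculation; the function name is hypothetical, the example codes are invented for illustration, and this is not the authors' analysis script.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders' categorical codes of the same items."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("Both coders must code the same non-empty set of items")
    n = len(codes_a)
    # Proportion of items on which the two coders agree.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement from the product of each coder's marginal frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical category throughout
    return (observed - expected) / (1 - expected)


# Invented example: "R" = recommended, "N" = not recommended.
operator_codes = ["R", "R", "N", "R", "N", "N", "R", "R", "N", "R"]
assistant_codes = ["R", "R", "N", "R", "N", "R", "R", "R", "N", "R"]
kappa = cohens_kappa(operator_codes, assistant_codes)
```

In this invented example the coders agree on 9 of 10 questions (observed agreement .90) while chance agreement is .54, so κ = (.90 − .54) / (1 − .54), or about .78.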

Participants’ reports of what had happened to the AWAs were also coded as accurate (the details reported by the participant matched the details reported by the avatars) or inaccurate (the participant reported something not mentioned in the accurate avatar answer files). The interrater reliability for the accuracy codings of these reports was calculated using 10 randomly selected participants’ reports. These were coded by two coders, and reliability was analyzed using Kendall's tau; the interrater reliability was τ = .988, p < .01.

We counted the number of correct and incorrect responses produced by the avatars for each participant, and also calculated the proportion of correct avatar responses out of all of the potential correct avatar responses that were available in the system.

Statistical analyses

The dependent measure for testing Hypotheses 1 and 2 was the proportion of recommended questions. In addition, the dependent measures for Hypothesis 1 were the number of recommended questions and the number of non-recommended questions. Free recall, cued recall, and open questions were grouped as recommended questions; closed, forced-choice, suggestive, and misleading questions were grouped as non-recommended questions. The proportions of recommended questions were calculated by dividing the number of recommended questions by the sum of recommended and non-recommended questions in each interview.
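The grouping and the per-interview proportion can be written out as a short sketch. The string labels and the function name are hypothetical, and returning `None` for an interview without any coded question is an assumption, since the text does not say how such a case would be handled.

```python
# Grouping of coded question types as described in the text.
RECOMMENDED = {"free_recall", "cued_recall", "open"}
NON_RECOMMENDED = {"closed", "forced_choice", "suggestive", "misleading"}


def proportion_recommended(question_codes):
    """Proportion of recommended questions among all coded questions in one interview."""
    n_recommended = sum(code in RECOMMENDED for code in question_codes)
    n_non_recommended = sum(code in NON_RECOMMENDED for code in question_codes)
    total = n_recommended + n_non_recommended
    if total == 0:
        return None  # assumption: undefined when no question was coded
    return n_recommended / total


# Invented interview with three recommended and one non-recommended question.
example_interview = ["free_recall", "open", "closed", "open"]
example_proportion = proportion_recommended(example_interview)
```

For the invented interview above, three of the four questions are recommended, giving a proportion of .75 (75%).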

To evaluate how receiving feedback affects the questions asked, we conducted three separate 2 (Group: Feedback vs. Control; Between-Subjects) × 4 (Time: Interviews 1–4; Within-Subjects) repeated-measures ANCOVAs with the proportion of recommended questions, the number of recommended questions, and the number of non-recommended questions as dependent variables. When Mauchly's test of sphericity was significant, we used a Greenhouse–Geisser correction. The analyses were conducted using SPSS version 27 with the significance level set at p < .05. To test Hypothesis 1, we were interested in the Group (Feedback vs Control) main effect. To test Hypothesis 2, we were interested in the interaction effect of Group (Feedback vs Control) and Time (Interviews 1–4) on the proportion of recommended questions. When the interaction effect was statistically significant, we conducted pairwise comparisons with a Bonferroni correction. All these analyses controlled for gender, age, and level of education to explore the robustness of the findings. Although not in our preregistration report, we also conducted three 2 (Group: Feedback vs. Control; Between-Subjects) × 4 (Time: Interviews 1–4; Within-Subjects) mixed analyses of variance with the aforementioned dependent variables without controlling for covariates. We report these analyses first to create a baseline for examining whether the demographic variables (age, gender, education) influence the results.

To analyze the study question (associations between question types and accuracy of the details) we used generalized estimating equations with a binomial logit model given that this allowed us to control for dependencies between the behavior of the same participant over all four interviews and also for the behavior of the same participant within one interview. We performed these analyses separately with the feedback and control groups.

Ethics

The Board of Research Ethics at Tallinn University (Estonia) approved the study before the data collections commenced. The study was conducted following the guidelines of the Declaration of Helsinki.

Results

Descriptive analyses

The average length of the avatar interviews from the first to the last question asked was 564 s (SD = 48, range 310–600 s). There were no differences in interview length between the feedback and control groups, t(238) = .70, p = .242 (feedback M = 566, SD = 48 vs control M = 562, SD = 47).

Effect of feedback on question type without controlling for covariates

The descriptive statistics for the number of recommended and non-recommended questions and the proportion of recommended questions in both the feedback and control group and for each interview are presented in Table 4.

Table 4.
Differences in interview quality between feedback and control group.

Interview	Group	Recommended, M	Recommended, SD	Not recommended, M	Not recommended, SD	Proportion of recommended (%), M	Proportion of recommended (%), SD
1	Feedback	15.17	4.20	3.57	3.37	81.20	15.68
	Control	13.13	4.30	5.40	3.84	71.34	18.91
	Total	14.15	4.34	4.48	3.70	76.27	17.93
2	Feedback	17.57	3.44	1.80	2.16	91.33	9.50
	Control	14.27	4.42	4.83	3.19	75.68	14.71
	Total	15.92	4.26	3.32	3.10	83.51	14.59
3	Feedback	19.70	1.99	1.40	1.71	93.86	6.60
	Control	15.07	4.93	6.13	3.47	70.70	16.58
	Total	17.38	4.40	3.77	3.61	82.28	17.11
4	Feedback	20.53	3.87	1.13	1.52	94.95	6.75
	Control	15.37	4.61	5.67	3.19	72.57	16.59
	Total	17.95	4.96	3.40	3.37	83.76	16.88
Overall	Feedback	18.24	4.02	1.97	2.47	90.33	11.56
	Control	14.46	4.59	5.51	3.42	72.57	16.67
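As a reading aid for Table 4, the short sketch below recomputes the "Overall" rows as the unweighted average of the four per-interview means. It assumes equal numbers of interviews per cell, which the balanced design suggests; the dictionary layout and the function name `overall_means` are illustrative and not part of the original analysis.

```python
# Per-interview group means copied from Table 4.
# Columns: recommended (count), non-recommended (count), proportion recommended (%).
table4 = {
    "Feedback": [(15.17, 3.57, 81.20), (17.57, 1.80, 91.33),
                 (19.70, 1.40, 93.86), (20.53, 1.13, 94.95)],
    "Control":  [(13.13, 5.40, 71.34), (14.27, 4.83, 75.68),
                 (15.07, 6.13, 70.70), (15.37, 5.67, 72.57)],
}


def overall_means(rows):
    """Average each column over the four interviews (equal n per interview assumed)."""
    n = len(rows)
    return tuple(sum(column) / n for column in zip(*rows))


overall = {group: overall_means(rows) for group, rows in table4.items()}

for group, (rec, nonrec, prop) in overall.items():
    print(f"{group}: recommended {rec:.3f}, "
          f"non-recommended {nonrec:.3f}, proportion {prop:.3f}%")
```

Up to rounding, the averages agree with the tabulated overall means, which is consistent with the overall rows being simple averages across interviews.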

First, we analyzed the results concerning the number of recommended questions. Mauchly's test of sphericity was not significant, so sphericity was assumed. The analysis showed that there was a significant main effect of Group, F(1, 58) = 27.39, p = .001, η_p² = .32, 1 – β = .99, and of Time, F(3, 174) = 15.01, p = .001, η_p² = .206, 1 – β = 1.00. As can be seen from Table 4, participants in the feedback group used more recommended questions overall (M = 18.24) than participants in the control group (M = 14.46); the corresponding proportions of recommended questions were 90.3% and 72.6%. Although the overall interaction between Group and Time was not significant, F(3, 174) = 2.57, p = .056, η_p² = .042, 1 – β = .626, pairwise comparisons indicated differences between the groups within Interviews 2, 3 and 4 (p = .002).
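For readers who want to check the reported effect sizes, partial eta squared can be recovered from an F ratio and its degrees of freedom. The worked equation below uses the standard identity and the values reported for the number of recommended questions; it is a consistency check, not an additional analysis.

```latex
\eta_p^2 = \frac{F \cdot df_{\text{effect}}}{F \cdot df_{\text{effect}} + df_{\text{error}}}

% Group main effect, F(1, 58) = 27.39
\eta_p^2 = \frac{27.39 \cdot 1}{27.39 \cdot 1 + 58} = \frac{27.39}{85.39} \approx .32

% Time main effect, F(3, 174) = 15.01
\eta_p^2 = \frac{15.01 \cdot 3}{15.01 \cdot 3 + 174} = \frac{45.03}{219.03} \approx .206

% Group x Time interaction, F(3, 174) = 2.57
\eta_p^2 = \frac{2.57 \cdot 3}{2.57 \cdot 3 + 174} = \frac{7.71}{181.71} \approx .042
```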

Next, we analyzed the results concerning the number of non-recommended questions. Because Mauchly's test of sphericity was significant, we used a Greenhouse–Geisser correction. The analysis showed that there was a significant main effect of Group, F(1, 58) = 33.64, p = .001, η_p² = .37, 1 – β = .99, and of Time, F(2.312, 134.076) = 4.27, p = .012, η_p² = .069, 1 – β = .78. Overall, participants in the control group used non-recommended questions more than twice as often as participants in the feedback group. There was also a significant interaction between Group and Time, F(2.312, 134.076) = 7.03, p = .001, η_p² = .108, 1 – β = .948. For the Group and Time interaction, pairwise comparisons indicated differences between the groups within Interviews 2, 3 and 4 (p = .001).

Finally, we analyzed the results concerning the proportion of recommended questions. Because Mauchly's test of sphericity was significant, we used a Greenhouse–Geisser correction. The analysis showed that there was a significant main effect of Group, F(1, 58) = 38.85, p = .001, η_p²= .40, 1 – β = 1.00 and Time, F(2.138, 124.017) = 7.75, p = .001, η_p²= .124, 1 – β = .96. As with the number of recommended questions, the proportion of recommended questions used was higher for participants in the feedback group than participants in the control group. There was also a significant interaction between Group and Time, F(2.138, 124.017) = 6.12, p = .002, η_p²= .095, 1 – β = .897. For Time and Group interaction, pairwise comparisons indicated differences between the Feedback and Control group within Interviews 2, 3 and 4 (p = .001).
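The fractional degrees of freedom above come from the Greenhouse–Geisser correction, which multiplies both the effect and the error degrees of freedom by an estimate ε. The sketch below recovers ε from the reported values, assuming the uncorrected degrees of freedom are 3 and 174 (as in the analyses where sphericity was assumed, with 60 participants in 2 groups and 4 interviews). The function name `gg_epsilon` is illustrative.

```python
def gg_epsilon(df_effect_corrected, df_error_corrected,
               levels, n_participants, n_groups):
    """Recover the epsilon implied by corrected dfs in a mixed design.

    Returns one estimate from the effect df and one from the error df;
    the two should agree if the reported values are internally consistent.
    """
    df_effect = levels - 1                                  # uncorrected within-subject df
    df_error = (levels - 1) * (n_participants - n_groups)   # uncorrected error df
    return df_effect_corrected / df_effect, df_error_corrected / df_error


# Corrected dfs reported for the proportion of recommended questions.
eps_effect, eps_error = gg_epsilon(2.138, 124.017,
                                   levels=4, n_participants=60, n_groups=2)
print(f"epsilon from effect df: {eps_effect:.3f}")
print(f"epsilon from error df:  {eps_error:.3f}")
```

Both estimates come out at roughly .71, so the two reported degrees of freedom are consistent with a single correction factor.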

Effect of feedback on question style while controlling for covariates

As described in our preregistration, we conducted the analyses with gender, age, and level of education as covariates. Figure 2 shows the differences in our dependent variables between feedback and control groups in all four interviews.

Figure 2.
Estimated marginal means (EMM) of the difference in four adult witness avatar interviews between feedback and control groups in the number of recommended questions (A), the number of non-recommended questions (B) and the proportion of recommended questions (C). Error bars show 95% confidence intervals.

First, we analyzed the results concerning the number of recommended questions. Mauchly's test of sphericity was not significant, so sphericity was assumed. The analysis showed that there was a significant main effect of Group, F(1, 55) = 27.22, p = .001, η_p² = .33, 1 – β = .99, but not of Time, F(3, 165) = 2.06, p = .107, η_p² = .036, 1 – β = .52. As can be seen from Figure 2A, participants in the feedback group used more recommended questions than participants in the control group. There was a significant interaction between Group and Time, F(3, 165) = 2.86, p = .038, η_p² = .049, 1 – β = .677. Pairwise comparisons indicated differences between the groups within Interviews 2, 3 and 4 (p = .001).

Next, we analyzed the results concerning the number of non-recommended questions. Because Mauchly's test of sphericity was significant, we used a Greenhouse–Geisser correction. The analysis showed that there was a significant main effect of Group, F(1, 55) = 33.87, p = .001, η_p² = .38, 1 – β = 1.00, but not of Time, F(2.248, 123.621) = .91, p = .414, η_p² = .016, 1 – β = .22. Figure 2B shows that the number of non-recommended questions was higher for the control group than for the feedback group. There was a significant interaction between Group and Time, F(2.248, 123.621) = 7.21, p = .001, η_p² = .116, 1 – β = .95. Pairwise comparisons indicated differences between the groups within Interviews 2, 3 and 4 (p = .001).

Finally, we analyzed the results concerning the proportion of recommended questions. Because Mauchly's test of sphericity was significant, we used a Greenhouse–Geisser correction. The ANCOVA indicated a significant main effect of Group, F(1, 55) = 37.23, p = .001, η_p² = .40, 1 – β = 1.00. The proportion of recommended questions used was significantly higher in the feedback group than in the control group (Figure 2C). There was no significant main effect of Time, F(2.088, 114.832) = 1.11, p = .34, η_p² = .02, 1 – β = .25. However, there was a significant interaction between Group and Time, F(2.088, 114.832) = 6.33, p = .002, η_p² = .103, 1 – β = .900. Pairwise comparisons indicated differences between the groups within Interviews 2, 3 and 4 (p = .001). Figure 2C illustrates that the proportion of recommended questions increased in the feedback group with each subsequent interview, whereas in the control group it did not.

Associations between question types and the accuracy of details

In addition to the analysis stated in our preregistration, we used Pearson correlation analyses to examine the associations between the numbers of the different question types used. We found significant correlations between the number of recommended and non-recommended questions, r(238) = −.542, p < .001, the number of recommended questions and the proportion of recommended questions, r(238) = .688, p < .001, and the number of non-recommended questions and the proportion of recommended questions, r(238) = −.963, p < .001.

As stated in our preregistration, we looked at the association between question type (recommended or non-recommended) and the accuracy of the details the question produced (correct vs incorrect). Using non-recommended questions was negatively related to the accuracy of the responses given by the avatars (B = −0.403, SE = 0.084, Wald χ²(1) = 23.143, p < .001). This was also the case when the analysis was run separately in the feedback (B = −0.585, SE = 0.164, Wald χ²(1) = 12.671, p < .001) and control (B = −0.320, SE = 0.100, Wald χ²(1) = 10.137, p = .001) groups.
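To make the logit coefficients easier to interpret, the sketch below converts each reported B into an odds ratio and recomputes the Wald statistic and its p value from B and SE. It assumes each coefficient compares non-recommended with recommended questions and that the Wald test has one degree of freedom; the function name `wald_summary` is illustrative. Because B and SE are rounded in the text, the recomputed Wald values differ slightly from those reported.

```python
import math


def wald_summary(b, se):
    """Return odds ratio, Wald chi-square (df = 1) and p value for one logit coefficient."""
    z = b / se
    wald = z * z
    # For 1 df, P(chi-square > wald) equals the two-sided normal tail for |z|.
    p = math.erfc(abs(z) / math.sqrt(2))
    return math.exp(b), wald, p


# (B, SE) pairs as reported for the overall sample and each group.
estimates = {
    "overall": (-0.403, 0.084),
    "feedback": (-0.585, 0.164),
    "control": (-0.320, 0.100),
}

results = {label: wald_summary(b, se) for label, (b, se) in estimates.items()}
for label, (odds_ratio, wald, p) in results.items():
    print(f"{label}: OR = {odds_ratio:.2f}, Wald = {wald:.2f}, p = {p:.4f}")
```

Under these assumptions, an odds ratio below 1 means that the odds of a correct detail are lower after a non-recommended question than after a recommended one.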

Additional analyses

Using generalized estimating equations, we found that feedback and control group participants did not differ in the number of correct details they reported in their descriptions of what had happened to the avatars, χ²(1) = .32, p = .57 (feedback M = 40.43, SD = 19.25 vs control M = 38.22, SD = 17.21). Finally, we calculated the proportion of correct responses produced by the avatars out of all possible correct responses available in the system. Generalized estimating equations indicated that the feedback and control groups did not differ, χ²(1) = 2.78, p = .096 (feedback M = 15.61, SD = 4.51 vs control M = 17.95, SD = 5.65).

Discussion

In this study, we examined whether the quality of interviews with adults can be increased by training with AWAs coupled with feedback compared with training with no feedback. Overall, we found that combining feedback with simulated avatar training is effective in improving the quality of interviews in terms of the questions used. We now discuss each hypothesis in turn.

Our first hypothesis was supported: compared with receiving no feedback, receiving feedback about the questions asked resulted in a larger proportion of recommended questions, a larger number of recommended questions, and a smaller number of non-recommended questions. This held in all of our analyses, with or without gender, age and education included as covariates, and is similar to what has been found with Empowering Interviewer Training (EIT) software for simulated child witness interviews (Pompedda et al., 2022). Thus, we confirmed that an experiential serious game that includes personalized feedback (Sternberg et al., 2002) also improves the use of recommended question types in interviews with adult avatars. Our results indicate that if the requirements of the task of interviewing adult witnesses are introduced to the participants and immediate feedback is provided, the performance of novice interviewers may benefit.

We also found support for our second hypothesis, which stated that over four consecutive interviews, the participants receiving feedback after each subsequent interview will have a larger increase in the proportion of recommended questions than those not receiving feedback. In both our analyses (with and without controlling for gender, age and education), we found an interaction effect of the presence of feedback and the interview number. Because the first time the participants in the feedback group received feedback was after the first interview, there were no differences in the proportion of recommended questions between the groups in the first interview. However, we found that for participants who received feedback about their performance, but not for participants who did not receive feedback, the proportion of recommended questions increased with each subsequent interview. Thus, we replicated previous results that a training session with just four avatars can improve the participants’ interview performance with avatars (Haginoya et al., 2020; Krause et al., 2017; Pompedda, 2018; Pompedda et al., 2015, 2017, 2021). If the effects of this training transfer to practitioners as they do with child avatars (Kask et al., 2022), the relatively short training time can be of huge benefit to police investigators. It has been shown that police investigators use a large number of non-recommended questions (Launay and Py, 2015) in investigative interviews with adults, which leads to decreased quality of responses in adults (Ibabe and Sporer, 2004; Jack and Zajac, 2014; Molyneaux and Larsen, 1992; Roebers and Schneider, 2000; Shapiro et al., 2005; Sharps et al., 2012; Wade and Spearing, 2022). Because police officers may be under constant time pressure when collecting evidence, training with AWAs could potentially result in an increase in performance in a short amount of time.

Finally, we established that different question types (recommended and non-recommended) were associated with each other, and that using non-recommended questions was negatively related to the accuracy of the responses given by the avatars, both overall and when the analysis was run separately in the feedback and control groups. These results show that the algorithms were working as expected (see Tohvelmann and Kask, 2022).

However, there were no significant differences between the feedback and control groups in the number of correct details that participants reported in their descriptions of what had happened to the avatars, or in the proportion of correct responses produced by the avatars out of all possible correct responses. These findings probably arise because, under the algorithms, the likelihood of eliciting a correct rather than an incorrect response does not differ greatly between recommended and non-recommended questions (e.g., 89% of answers were correct when a free recall question was asked). It is therefore particularly important to conduct a transfer study in which the participants also interview a human mock witness.

Strengths and limitations

One of the main strengths of this study is that the case scenarios are based on court verdicts, which makes the cases realistic. Furthermore, the algorithms we used in the AWA software were based on a literature review of how accurately participants in previous research have responded to different question types. In this way, we can say that the AWAs behaved as real humans would have behaved under similar circumstances. However, different papers operationalize question types in various ways. Therefore, we used examples of questions in these papers to determine whether our categorization of question types in creating the algorithm was similar to that presented in the papers. Despite controlling for participants’ backgrounds in law and psychology, a potential limitation of the study is that participants may still have had some prior interviewing experience. Thus, our use of a convenience sample, together with the fact that participants’ specific fields of education are unknown to us, could have influenced our results. This is especially true when considering the ceiling effects in the feedback group when asking recommended questions. Nonetheless, we have to bear in mind that the interviewers were novices in the field. They did their best to adhere to the guidelines of best-practice investigative interviews. Therefore, their approach to interviewing may differ from that of police officers, especially when it comes to asking questions intended to gather evidence (where, when and what happened with whom; Code of Criminal Procedure, 2023).

AWA training was conducted by a single experimenter. This could raise the question of whether the question types were categorized in a uniform manner. To reduce this risk, we performed a post-experiment interrater reliability analysis, which demonstrated good reliability.

The feedback group almost reached a ceiling in the proportion of recommended questions asked. One way to interpret this (and to compare with previous results with EIT avatar software) could be that in EIT (compared with AWA), the case scenarios that participants received before interviewing the avatar contained more facts about what may have happened. This could have led the participants to investigate whether the facts were correct or not. However, in AWA training we provided the participants with only a short crime report notice with the instruction to find out what happened. Because it did not contain many facts, this may have resulted in a more generic description of the crime compared with what may be important for police officers to ask a victim/witness (Code of Criminal Procedure, 2023). Rivard et al. (2015) also noted that if the interviewers were “blind” about the case before interviewing, then they were more likely to begin their interviews with a non-suggestive question than the informed interviewers. In addition, in EIT training, the time frame of 10 minutes per interview was chosen because younger children (four- and six-year-olds) could get tired after a long interview; however, because interviews with adults can last longer (an average interview with child witnesses lasted 30 minutes in Kask et al., 2022 but 89 minutes with adult suspects in Oxburgh et al., 2014), 10 minutes could have been too short for asking more specific questions in AWA interviews.

Feedback on question types was given to the participants on two recommended and two non-recommended questions. However, if a participant did not ask any questions from one broader category (either recommended or non-recommended), or asked only one type of question from one broader category, then feedback was given on fewer than four questions. In future studies, it is important to record this type of information in more detail because there could have been some variability between the participants in this respect.

Conclusion

To our knowledge, this is the first study that has used web-based software to train the skill of interviewing adult witnesses. We demonstrated that with feedback after each AWA interview, the proportion of recommended questions asked was significantly higher than the proportion of recommended questions asked by the control group by the end of the fourth interview. This study shows great potential for applying the methods of serious gaming to training in the investigative interviewing not only of children, but also of adult witnesses.

Training programs are often logistically complicated and expensive, in addition to requiring a lot of time from those participating (Pompedda, 2018). The AWA solution could contribute to training interviewers in a structured witness interviewing method such as the cognitive interview (Fisher et al., 1987). It could help interviewers practice the skill of asking questions of a witness (and rephrasing those that are not recommended) at their work location, at a time most convenient to them, and receive immediate feedback on their questioning skills.

Further research is needed to examine whether this effect also transfers to increasing the proportion of recommended questions in real-life interviews with adult witnesses. The next step in validating the effectiveness of AWA would be to test whether a transfer effect is present (i.e., whether the AWA software would also be effective among police officers conducting real adult witness interviews). We anticipate that there would be an increase in the proportion of recommended questions in not only AWA interviews, but also interviews conducted with real adult witnesses.

Supplemental Material

Supplemental material, sj-docx-1-psm-10.1177_14613557241310014 for Providing feedback in simulated investigative interviews with adult witness avatars increases the use of free recall and open questions by Mari-Liis Tohvelmann, Kristjan Kask, Annegrete Palu, Shumpei Haginoya and Pekka Santtila in International Journal of Police Science & Management

Reference	Question type	Accuracy (%)	No. of participants	Age of participants	Delay in days	Source of video	Source of recall
Ibabe and Sporer (2004)	Open	74.00	20	19	0	Video	Written
	Forced-choice	66.50	22			Video	Written
	Closed	73.00	20			Video	Written
Murnikov and Kask (2021)	Leading	76.00	30	26	7	Live	Oral
	Misleading	76.00	30			Live	Oral
Nahouli et al. (2021)	Free recall	87.00	20	23	1	Video	Oral
	Cued recall	66.00	20			Video	Oral

Category	Definition	Examples
Recommended questions
Free recall	These are invitations that help the interviewee to provide a free recall response, without any influence by the interviewer	“Tell me everything” “Tell me all you remember”
Cued recall	These invitations are similar to free recall, but related to a previous statement elicited from the interviewee	“You mentioned a car. Tell me everything about the car”
Open	Open-ended non-suggestive questions that focus the interviewee's attention on a detail and ask for a focalized explanation (usually Wh- questions)	“What does it mean?” “Where did you go with her?”
Non-recommended questions
Closed	These questions focus on the details the interviewee may have mentioned. Closed questions do not imply a particular type of response, although they can be answered “yes” or “no”	“Did it happen in the kitchen?”
Forced-choice	These questions provide possible choices that the interviewee has to choose from	“Did it happen once or twice or three times?”
Suggestive	The interviewer strongly communicates what kind of response is expected using details that the interviewee may have mentioned before	“She touched you, didn’t she?” “I know that someone touched you, tell me who it was!”
Misleading	The interviewer strongly communicates what kind of response is expected but mentions a detail that does not emerge from the interviewee's previous responses	“So I understand that car was red” (if the color of the car was not mentioned before and is not known)

Footnotes

Acknowledgments

The authors wish to thank Tatsuro Ibe for programming the AWA and Emily Patterson for English proofreading.

Data availability

The data set containing all primary study variables can be obtained from the authors.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Tallinn University grant number TF/2221.

Supplemental material

Supplemental material for this article is available online.

Author biographies

Mari-Liis Tohvelmann, MA in Psychology. She is junior researcher at Tallinn University, Estonia. She is interested in investigative interviewing of victims and witnesses.

Kristjan Kask, PhD, is associate professor of legal psychology at Tallinn University, Estonia. His field of research is eyewitness testimony and investigative interviewing. He has conducted adult trainings in best practices of investigative interviewing child and adult witnesses in Estonia and abroad.

Annegrete Palu, MA in Psychology, is a junior lecturer in Experimental Psychology at the University of Tartu, Estonia. Her research interests include eyewitness identification and investigative interviewing. She has also conducted training sessions on investigative interviewing of child witnesses in Estonia.

Shumpei Haginoya, PhD, is junior associate professor in psychology at Meiji Gakuin University, Japan. In his research he is specializing in investigative psychology, investigative interviewing (victims, eyewitnesses, and suspects) training, polygraph test (memory detection), offender/geographic profiling, and crime linkage.

Pekka Santtila, PhD, is professor of Psychology in New York University Shanghai. He has conducted research in various fields of legal psychology including eyewitness identification as well as child witness and suspect interviewing.

References

Allwood

Helene Innes-Ker

Homgren

, et al. (2008) Children’s and adults’ realism in their event-recall confidence in responses to free recall and focused questions. Psychology, Crime & Law 14: 529–547.

Allwood

Knutsson

Granhag

(2006) Eyewitnesses under influence: how feedback affects the realism in confidence judgements. Psychology, Crime & Law 12(1): 25–38.

Bärthel

Wessel

Huntjens

RJC

, et al. (2017) Collaboration enhances later individual memory for emotional material. Memory 25(5): 636–646.

Baugerud

, et al. (2021) Multimodal virtual avatars for investigative interviews with children. In: Proceedings of the 2021 Workshop on Intelligent Cross-Data Analysis and Retrieval (ICDAR ‘21). New York: Association for Computing Machinery. doi:https://doi.org/10.1145/3463944.3469269.

Benson

Powell

(2015) Evaluation of a comprehensive interactive training system for investigative interviewers of children. Psychology, Public Policy, and Law 21: 309–322.

Bjorklund

Cassel

Bjorklund

, et al. (2000) Social demand characteristics in children's and adults’ eyewitness memory and suggestibility: the effect of different interviewers on free recall and recognition. Applied Cognitive Psychology 14: 421–433.

Blume

Ford

Baldwin

, et al. (2010) Transfer of training: a meta-analytic review. Journal of Management 36: 1065–1105.

Bonham

González-Vallejo

(2009) Assessment of calibration for reconstructed eye-witness memories. Acta Psychologica 131(1): 34–52.

Boon

Noon

(1994) Changing perspectives in cognitive interviewing. Psychology, Crime & Law 1: 59–69.

10.

Boon

Milne

Rosloot

, et al. (2020) Demonstrating detail in investigative interviews—an examination of the DeMo technique. Applied Cognitive Psychology 34: 1133–1142.

11.

Bowles

Sharman

(2014) The effect of different types of leading questions on adult eyewitnesses with mild intellectual disabilities. Applied Cognitive Psychology 28: 129–134.

12.

Brackmann

Otgaar

Roos af Hjelmsäter

, et al. (2017) Testing a new approach to improve recall in different ages: providing witnesses with a model statement. Translational Issues in Psychological Science 3(2): 131–142.

13.

Brewer

Weber

Semmler

(2005) Eyewitness identification. In: Brewer

Williams

(eds) Psychology and Law: An Empirical Perspective. New York: Guilford Press, 177–221.

14.

Brubacher

Sharman

Scoboria

, et al. (2020) The effect of question type on resistance to misinformation about present and absent details. Applied Cognitive Psychology 34: 1323–1334.

15.

Bull

Blandon-Gitlin

(2019) The Routledge International Handbook of Legal and Investigative Psychology. New York: Routledge. https://doi.org/10.4324/9780429326530.

16.

Buratti

Allwood

Johansson

(2014) Stability in the metamemory realism of eyewitness confidence judgments. Cognitive Processing 15(1): 39–53.

17.

Carol

Kieckhaefer

Johnson

, et al. (2021) Being a good witness: the roles of benevolence and working memory capacity in rapport’s effect on eyewitness memory. Journal of Applied Social Psychology 51(7): 730–745.

18.

Cassel

Bjorklund

(1995) Developmental patterns of eyewitness memory and suggestibility: an ecologically based short-term longitudinal study. Law and Human Behavior 19(5): 507–532.

19.

Cassel

Roebers

CEM

Bjorklund

(1996) Developmental patterns of eyewitness responses to repeated and increasingly suggestive questions. Journal of Experimental Child Psychology 61(2): 116–133.

20.

Chrobak

Rindal

Zaragoza

(2015) The impact of multifaceted questions on eyewitness accuracy following forced fabrication interviews. The Journal of General Psychology 142(3): 150–166.

21.

Code of Criminal Procedure (2023) Riigi Teataja, Available at: https://www.riigiteataja.ee/kohtulahendid/koik_menetlused.html (accessed 24 February 2023).

22.

Cohen

(1988) Statistical Power Analysis for the Behavioral Sciences. 2nd edn. Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.

23.

Collins

Lincoln

Frank

(2002) The effect of rapport in forensic interviewing. Psychiatry, Psychology and Law 9(1): 69–78.

24.

Court Verdict Database (2023) Retrieved on January 2nd 2025 from: https://www.riigiteataja.ee/kohtulahendid/koik_menetlused.html

25.

Dahl

Allwood

Scimone

, et al. (2015) Old and very old adults as witnesses: event memory and metamemory. Psychology, Crime & Law 21(8): 764–775.

26.

Dando

Wilcock

Behnkle

, et al. (2011) Modifying the cognitive interview: countenancing forensic application by enhancing practicability. Psychology, Crime & Law 17(6): 491–511.

27.

de la Fuente Vilar

Horselenberg

Stromwall

, et al. (2020) Effects of cooperation on information disclosure in mock-witness interviews. Legal and Criminological Psychology 25(2): 133–149.

28.

Eastwood

Snook

Luther

(2019) Establishing the most effective way to deliver the sketch procedure to enhance interviewee free recall. Psychology, Crime & Law 25(5): 482–493.

29.

Eisen

Morgan

Mickes

(2002) Individual differences in eyewitness memory and suggestibility: examining relations between acquiescence, dissociation and resistance to misleading information. Personality and Individual Differences 33: 553–571.

30.

Evans

Fisher

(2011) Eyewitness memory: balancing the accuracy, precision and quantity of information through metacognitive monitoring and control. Applied Cognitive Psychology 25(3): 501–508.

31.

Faul

Erdfelder

Buchner

, et al. (2009) Statistical power analyses using G*power 3.1: tests for correlation and regression analyses. Behavior Research Methods 41: 1149–1160.

32.

Fisher

Geiselman

Raymond

, et al. (1987) Enhancing enhanced eyewitness memory: refining the cognitive interview. Journal of Police Science & Administration 15: 291–297.

33.

Gawrylowicz

Memon

Scoboria

(2014) Equipping witnesses with transferable skills: the self-administered interview. Psychology, Crime & Law 20(4): 315–325.

34.

Gawrylowicz

Scoboria

Teodorini

, et al. (2019) Intoxicated eyewitnesses: the effect of a fully balanced placebo design on event memory and metacognitive control. Applied Cognitive Psychology 33(3): 344–357.

35.

Geiselman

Fisher

MacKinnon

, et al. (1985) Eyewitness memory enhancement in the police interview: cognitive retrieval mnemonics versus hypnosis. Journal of Applied Psychology 70: 401–412.

36.

Ginet

(2001) A technique for enhancing memory in eye witness testimonies for use by police officers and judicial officials : the cognitive interview. Le Travail Humain 64: 173–191.

37.

Ginet

Colomb

(2014) The differential effectiveness of the cognitive interview instructions for enhancing witnesses’ memory of a familiar event. Swiss Journal of Psychology 73(1): 25–34.

38.

Griffiths

Milne

(2006) Will it all end in tiers: police interviews with suspects in Britain. In: Williamson

(ed.) Investigative Interviewing Rights, Research, Regulation. Cullompton, UK: Willan, 167–189.

39.

Haginoya

Yamamoto

Pompedda

, et al. (2020) Online simulation training of child sexual abuse interviews with feedback improves interview quality in Japanese university students. Frontiers in Psychology 11: 998.

40.

Hagsand

Hjelmsäter

Granhag

, et al. (2013) Bottled memories: on how alcohol affects eyewitness recall. Scandinavian Journal of Psychology 54(3): 188–195. Epub 2013 Feb 6. PMID: 23384077.

41.

Halley

Walsh

Myklebust

, et al. (2023) Structured models of interviewing. In: Oxburgh

Myklebust

Fallon

Hartwig

(eds) Interviewing and Interrogation: A Review of Research and Practice Since World War II. Brussels: Torkel Opsahl, 257–281.

42.

Hope

Gabbert

Fisher

, et al. (2014) Protecting and enhancing eyewitness memory: the impact of an initial recall attempt on performance in an investigative interview. Applied Cognitive Psychology 28(3): 304–313.

43.

Ibabe

Sporer

(2004) How you ask is what you get: on the influence of question form on accuracy and confidence. Applied Cognitive Psychology 18: 711–726.

44.

Jack

Leov

Zajac

(2014) Age-related differences in the free-recall accounts of child, adolescent, and adult witnesses. Applied Cognitive Psychology 28. DOI: https://doi.org/10.1002/acp.2951.

45.

Jack

Zajac

(2014) The effect of age and reminders on witnesses’ responses to cross-examination-style questioning. Journal of Applied Research in Memory and Cognition 3: 1–6.

46.

Karlén

Roos Af Hjelmsäter

Fahlke

, et al. (2017) To wait or not to wait? Improving results when interviewing intoxicated witnesses to violence. Scandinavian Journal of Psychology 58(1): 15–22. PMID: 28054379.

47. Kask, Pompedda, Palu, et al. (2022) Transfer of avatar training effects to investigative field interviews of children conducted by police officers. Frontiers in Psychology 13: 753111. DOI: https://doi.org/10.3389/fpsyg.2022.753111.

48. Kebbell, Johnson (2000) Lawyers’ questioning: the effect of confusing questions on witness confidence and accuracy. Law and Human Behavior 24: 629–641.

49. Knutsson, Allwood, Johansson (2011) Child and adult witnesses: the effect of repetition and invitation-probes on free recall and metamemory realism. Metacognition and Learning 6: 213–228.

50. Krause, Pompedda, Antfolk, et al. (2017) The effects of feedback and reflection on the questioning style of untrained interviewers in simulated child sexual abuse interviews. Applied Cognitive Psychology 31: 187–198.

51. Krix, Sauerland, Gabbert, et al. (2014) Providing eyewitnesses with initial retrieval support: what works at immediate and subsequent recall? Psychology, Crime & Law 20(10): 1005–1027.

52. Krix, Sauerland, Raymaekers LHC, et al. (2016) Eyewitness evidence obtained with the self-administered interview is unaffected by stress. Applied Cognitive Psychology 30(1): 103–112.

53. Lamb (2016) Difficulties translating research on forensic interview practices to practitioners: finding water, leading horses, but can we get them to drink? American Psychologist 71: 710–718.

54. Lamb, Brown, Hershkowitz, et al. (2018) Tell Me What Happened: Questioning Children about Abuse, 2nd edn. Chichester, UK: Wiley.

55. Launay (2015) Methods and aims of investigative interviewing of adult witnesses: an analysis of professional practices. Pratiques Psychologiques 21: 55–70.

56. Lipton (1977) On the psychology of eyewitness testimony. Journal of Applied Psychology 62: 90–95.

57. Loftus, Palmer (1974) Reconstruction of automobile destruction: an example of the interaction between language and memory. Journal of Verbal Learning & Verbal Behavior 13: 585–589.

58. Paterson, Temler (2021) The effects of immediate recall and subsequent retrieval strategy on eyewitness memory. Psychiatry, Psychology & Law 29(5): 788–805.

59. Matsuo, Miura (2016) Effectiveness of the self-administered interview and drawing pictures for eliciting eyewitness memories. Psychiatry, Psychology & Law 24(5): 643–654.

60. Molyneaux, Larsen (1992) Acceptance of misleading information by children and adults. Psychological Reports 71: 267–274.

61. Murnikov, Kask (2021) Recall accuracy in children: age vs. conceptual thinking. Frontiers in Psychology 12: 686904.

62. Nahouli, Dando, Mackenzie, et al. (2021) Rapport building and witness memory: actions may ‘speak’ louder than words. PLoS One 16(8): e0256084.

63. Nunan, Stanier, Milne, et al. (2020) Source handler telephone interactions with covert human intelligence sources: an exploration of question types and intelligence yield. Applied Cognitive Psychology 34(6): 1473–1484.

64. Orbach, Hershkowitz, Lamb, et al. (2000) Assessing the value of structured protocols for forensic interviews of alleged child abuse victims. Child Abuse & Neglect 24: 733–752.

65. Oxburgh, Ost, Morris, et al. (2014) The impact of question type and empathy on police interviews with suspects of homicide, filicide and child sexual abuse. Psychiatry, Psychology and Law 21: 903–917.

66. Oxburgh, Myklebust, Grant (2010) The question of question types in police interviews: a review of the literature from a psychological and linguistic perspective. International Journal of Speech, Language & the Law 17(1): 45–66.

67. Phillips, Oxburgh, Gavin, et al. (2012) Investigative interviews with victims of child sexual abuse: the relationship between question type and investigation relevant information. Journal of Police and Criminal Psychology 27(1): 45–54.

68. Pompedda (2018) Training in investigative interviews of children: Serious gaming paired with feedback improves interview quality. Doctoral dissertation, Åbo Akademi University, Finland. Available at: https://pdfs.semanticscholar.org/14d9/45841824f7a003eee3bb38cf1dac0a26a1ed.pdf (accessed 2 May 2024).

69. Pompedda, Antfolk, Zappalà, et al. (2017) A combination of outcome and process feedback enhances performance in simulations of child sexual abuse interviews using avatars. Frontiers in Psychology 8: 1474. DOI: https://doi.org/10.3389/fpsyg.2017.01474.

70. Pompedda, Kask, Palu, et al. (2021) Transfer of simulated interview training into interviews with Italian and Estonian children exposed to a mock event. Nordic Psychology 73: 43–67.

71. Pompedda, Zappalà, Santtila (2015) Simulations of child sexual abuse interviews using avatars paired with feedback improves interview quality. Psychology, Crime & Law 21: 28–52.

72. Pompedda, Zhang, Haginoya, et al. (2022) A mega-analysis of the effects of feedback on the quality of simulated child sexual abuse interviews with avatars. Journal of Police and Criminal Psychology 37: 485–498.

73. Poole, White (1991) Effects of question repetition on the eyewitness testimony of children and adults. Developmental Psychology 27(6): 975–986.

74. Powell, Fisher, Wright (2005) Investigative interviewing. In: Brewer, Williams (eds) Psychology and Law: An Empirical Perspective. New York, NY: Guilford Press, 11–42.

75. Powell, Guadagno, Benson (2016) Improving child investigative interviewer performance through computer-based learning activities. Policing and Society 26: 365–374.

76. Principles on effective interviewing for investigations and information gathering (2021) Available at: https://interviewingprinciples.com/ (accessed 11 October 2024).

77. Rechdan, Sauer, Hope, et al. (2017) Computer mediated social comparative feedback does not affect metacognitive regulation of memory reports. Frontiers in Psychology 8: 1433.

78. Rivard, Pena, Schreiber Compo (2015) Blind interviewing: is ignorance bliss? Memory 24: 1256–1266.

79. Roberts, Lamb, Sternberg (2004) The effects of rapport-building style on children’s reports of a staged event. Applied Cognitive Psychology 18: 189–202.

80. Roebers (2002) Confidence judgments in children’s and adults’ event recall and suggestibility. Developmental Psychology 38(6): 1052–1067.

81. Roebers, Fernandez (2002) The effects of accuracy motivation on children’s and adults’ event recall, suggestibility, and their answers to unanswerable questions. Journal of Cognition and Development 3(4): 415–443.

82. Roebers, Howie (2003) Confidence judgments in event recall: developmental progression in the impact of question format. Journal of Experimental Child Psychology 85(4): 352–371.

83. Roebers, McConkey (2003) Mental reinstatement of the misinformation context and the misinformation effect in children and adults. Applied Cognitive Psychology 17(4): 477–493.

84. Roebers, Moga, Schneider (2001) The role of accuracy motivation on children’s and adults’ event recall. Journal of Experimental Child Psychology 78(4): 313–329.

85. Roebers, Schneider (2000) The impact of misleading questions on eyewitness memory in children and adults. Applied Cognitive Psychology 14: 509–526.

86. Roebers, Schneider (2001) Memory for an observed event in the presence of prior misinformation: developmental patterns of free recall and identification accuracy. British Journal of Developmental Psychology 19(4): 507–524.

87. Roebers, Schneider (2005) The strategic regulation of children’s memory performance and suggestibility. Journal of Experimental Child Psychology 91(1): 24–44.

88. Roebers, von der Linden, Howie (2007) Favourable and unfavourable conditions for children’s confidence judgments. British Journal of Developmental Psychology 25(1): 109–134.

89. Saraiva, Hope, Horselenberg, et al. (2020) Using metamemory measures and memory tests to estimate eyewitness free recall performance. Memory 28(1): 94–106.

90. Sarwar, Allwood, Innes-Ker (2011) Effects of communication with non-witnesses on eyewitnesses’ recall correctness and meta-cognitive realism. Applied Cognitive Psychology 25: 782–791.

91. Saunders, Jess (2010) The effects of age on remembering and knowing misinformation. Memory 18: 1–11.

92. Schreiber Compo, Carol, Evans, et al. (2017) Witness memory and alcohol: the effects of state-dependent recall. Law and Human Behavior 41(2): 202–215.

93. Scoboria, Mazzoni, Kirsch (2008) “Don’t know” responding to answerable and unanswerable questions during misleading and hypnotic interviews. Journal of Experimental Psychology: Applied 14(3): 255–265.

94. Scoboria, Memon, Trang, et al. (2013) Improving responding to questioning using a brief retrieval training. Journal of Applied Research in Memory and Cognition 2(4): 210–215.

95. Shapiro, Blackford, Chen C-F (2005) Eyewitness memory for a simulated misdemeanor crime: the role of age and temperament in suggestibility. Applied Cognitive Psychology 19: 267–289.

96. Sharman, Boyd, Powell (2015) Disrupting the encoding of misinformation delivered in closed specific and open presumptive questions. Psychiatry, Psychology and Law 22: 535–541.

97. Sharps, Herrera, Dunn, et al. (2012) Repetition and reconfiguration: demand-based confabulation in initial eyewitness memory. Journal of Investigative Psychology and Offender Profiling 9: 149–160.

98. Sternberg, Lamb, Davies GM, et al. (2001) The memorandum of good practice: theory versus application. Child Abuse and Neglect 25: 669–681. DOI: https://doi.org/10.1016/S0145-2134(01)00232-0.

99. Sternberg, Lamb, Esplin, et al. (2002) Using a structured protocol to improve the quality of investigative interviews. In: Eisen, et al. (eds) Memory and Suggestibility in the Forensic Interview. Mahwah, NJ: Erlbaum, 409–436. Available at: http://nichdprotocol.com/wp-content/uploads/2017/09/InteractiveNICHDProtocol.pdf (accessed 24 February 2023).

100. Sutherland, Hayne (2001) The effect of postevent information on adults’ eyewitness reports. Applied Cognitive Psychology 15: 249–263.

101. Taylor, Dando (2018) Eyewitness memory in face-to-face and immersive avatar-to-avatar contexts. Frontiers in Psychology 9: 507.

102. Tohvelmann M-L, Kask (2022) From child to adult victims and witnesses: ways of improving the quality of investigative interviews. Juridica International 31: 136–146.

103. Tulving (1993) What is episodic memory? Current Directions in Psychological Science 2: 67–70.

104. United Nations (2024) A Manual on Investigative Interviewing for Criminal Investigation. Available at: https://resourcehub01.blob.core.windows.net/$web/Policy%20and%20Guidance/corepeacekeepingguidance/Thematic%20Operational%20Activities/Police%20and%20Law%20Enforcement/2024.01%20Manual%20on%20Investigative%20Interviewing%20for%20Criminal%20Investigation%20(2024).pdf (accessed 11 October 2024).

105. Valentine, Maras (2011) The effect of cross-examination on the accuracy of adult eyewitness testimony. Applied Cognitive Psychology 25: 554–561.

106. Wade, Spearing (2022) The effect of cross-examination style questions on adult eyewitness accuracy depends on question type and eyewitness confidence. Memory 31: 163–178.

107. Wang, Paterson, Kemp (2014) The effects of immediate recall on eyewitness accuracy and susceptibility to misinformation. Psychology, Crime & Law 20: 619–634.

108. Waterman, Blades, Spencer (2001) Interviewing children and adults: the effect of question format on the tendency to speculate. Applied Cognitive Psychology 15: 521–531.

109. Webster, Oxburgh, Dando (2021) The use and efficacy of question type and an attentive interviewing style in adult rape interviews. Psychology, Crime & Law 27: 656–677.

110. West, Stone (2014) Age differences in eyewitness memory for a realistic event. The Journals of Gerontology. Series B, Psychological Sciences and Social Sciences 69: 338–347.

111. Witmer, Singer (1998) Measuring presence in virtual environments: a presence questionnaire. Presence 7: 225–240.

112. Wright, Alison (2004) Questioning sequences in Canadian police interviews: constructing and confirming the course of events? Psychology, Crime & Law 10: 137–154.

113. Wysman, Scoboria, Gawrylowicz, et al. (2014) The cognitive interview buffers the effects of subsequent repeated questioning in the absence of negative feedback. Behavioral Sciences and the Law 32(2): 207–219.

Supplementary Material


Domestic violence	A woman reports domestic violence in the apartment next door in a large apartment building, alleging that a girl's health and well-being are in danger.
Fraud	An older man reports that he cannot pay with his debit card because there is not enough money in his account; he has also discovered that his driver's license is missing from his wallet.
Physical violence	A witness reports that a woman has been physically attacked on the street.
Robbery	A security guard reports that two men left a store without paying for a teddy bear and that one of them then punched the security guard in the face.