Abstract
Trials of international crimes frequently rely on a complex type of witness: insiders or accomplices. While harnessing essential knowledge, insiders pose serious challenges to the decision-makers assessing their credibility. Prior research suggests that judges dismiss a sizeable proportion of insider testimony during trials of international crimes. While some reasons might lie with the witnesses, a closer look at the professional practices is warranted. This study aimed to examine the process of insider witness statement assessments by international criminal justice professionals and to analyze how they resolve the tension between the concerns about witness truthfulness and the quality of the testimony. One hundred sixty practitioners took part in an experimental vignette survey. Results of qualitative analyses demonstrate that the assessments of the witness and the statement contents are interrelated: across all experimental conditions, respondents drew inferences about the quality of the testimony based on their assessment of the witness and vice versa. Furthermore, the same indicators were given various, at times contradictory, meanings, highlighting individual differences in professional practice and the noise in decision-making.
Introduction
Despite the increasing use of digital, forensic, and documentary sources of proof (Dubberley et al., 2019; Freeman, 2018), witness testimony is still frequently used in (international) criminal proceedings. Witness evidence, however, continues to challenge fact-finding. On the one hand, questions are raised regarding witnesses’ capabilities to provide accurate accounts of the events they had observed or experienced (Paulo et al., 2019; Wade et al., 2018). On the other, there is an increasing examination of the fact-finders’ abilities to reliably determine the witness's honesty and accuracy (McDermott, 2017; Sagana, 2018; Simon, 2012; Wistrich & Rachlinski, 2017). This determination is further complicated where fact-finders rely on accomplices or otherwise involved witnesses (Cryer, 2014; Nicolson & Auchie, 2017), commonly referred to as “insiders” in international crime cases (Kelsall, 2009; Whiting, 2009).
Assessing the accuracy, completeness, and objectivity of evidence provided by insider witnesses is a formidable task. Insiders are highly valuable. They harness privileged, unique information which might be crucial to establishing individual criminal responsibility of higher-ranking accused, commonplace at the International Criminal Courts and Tribunals (ICCTs) (Del Ponte, 2006; Fry, 2014; Wald, 2002). In this sense, insiders appear as quasi-experts on the organization in question, often providing evidence on the military and political structures, de facto functioning of the groups, and actions of the accused (Chlevickaitė et al., 2020; Wald, 2002). On the other hand, insiders present specific concerns regarding their objectivity and trustworthiness. The motivation of insider witnesses to testify, especially in the absence of plea agreements (Cook, 2005; Harmon, 2009), is a crucial issue in determining whether a witness can be expected to provide a complete and truthful account (Chlevickaitė et al., 2020; Stepakoff et al., 2014). Unlike victim-witnesses, insiders might, for instance, have been involved in the commission of crimes, have personal relationships with other members of the criminal organization, including the accused, or have related security concerns that might influence the extent to which they are willing or able to tell the whole truth (Cryer, 2014; Stover, 2005). Hence, the decision-makers are walking a tightrope between their concerns about the honesty of the witness and the significance of the evidence they might provide.
It is difficult, if not impossible, to estimate whether practitioners’ decisions about witness evidence are accurate, as no ground truth is readily available. Nevertheless, prior research has shown that judges find issues with up to 50% of insider witness evidence at ICCTs, to the extent that its probative value is profoundly compromised (Chlevickaitė et al., 2021). The implications of a loss of evidence of this magnitude at trial are considerable. Not only does it impact case outcomes, but it also means that many insider witnesses testify unnecessarily, without contributing to fact-finding, which is both expensive to the ICCTs and potentially life-altering to the witnesses. 1 Multiple explanations for this state of affairs are possible. Some are external: among other things, investigative challenges in uncooperative environments (Çakmak, 2017), the multilingual and cross-cultural nature of the assessments (Kelsall, 2009; Nistor et al., 2020), time lapses (Bradfield, 2019), and trauma (Smeulers & Grunfeld, 2011). Others are internal: for example, the mismatch between investigative, prosecutorial, and judicial witness assessment practices, or inaccurate, inconsistent, or biased decision-making. While external factors are mainly out of the hands of ICCT practitioners, internal processes and standards of witness evidence assessment can be studied and evaluated.
Assessment of Source: (Insider) Witness Credibility
Credibility assessments in criminal justice contexts consist of evaluating a witness's objectivity (or honesty) and competence (Delisle, 1978; Schum & Morris, 2007; Sluiter et al., 2013). Similar to domestic settings (Brodsky et al., 2010; Cohen, 2013; Kane, 2007), at the ICCTs, objectivity is assessed in reference to the witness's character and background information on prior relationships, membership in particular groups, identity (ethnic, national, other), the harm suffered, and incentives to testify (Chlevickaitė et al., 2020). Furthermore, though largely discouraged based on current scientific knowledge (Snook et al., 2017; Vrij et al., 2019), objectivity assessments continue to take into account the witness's behavior on the stand as an indicator of truthfulness as well (Chlevickaitė et al., 2021). While, arguably, nonverbal communication might serve other informative purposes during fact-finding (Denault et al., 2019), the risks of inappropriate assessments in cross-cultural settings are substantial (Johannesson, 2012; Vrij et al., 2011).
Assessments of insider witness credibility also differ from those of other witnesses: victims, experts, or overview witnesses (ICC, n.d.a). This is evidenced by the ICCT judges developing insider-specific credibility criteria: conditions of the plea agreement, criminal record, circumstances of confessions, detention or case status, involvement in the crimes, and others (Chlevickaitė et al., 2020). These indicators, as compared to the general witness assessments, are focused on examining witness motivations to avoid telling the truth overall or regarding specific areas of the testimony (e.g., personal involvement in the crimes or the involvement of the comrades) and indicate the specific focus on insiders’ reasons for not being truthful.
Another dimension of witness assessment is competence, characterized by the witness's ability to perceive, understand, and report the events witnessed. It includes evaluation of the witness's memory, medical issues, observation conditions, the ability to understand the language spoken, and related concerns (Agirre Aranburu, 2020; Brodsky et al., 2010). Researchers focusing on ICCT evidence have uncovered indications of serious witness competence issues arising from their conflict-related experiences, time lapse before the testimony, and similar reasons (Combs, 2009; Kelsall, 2009; Perrin, 2016; Swigart, 2017). However, judicial assessments of insider witnesses seldom mention competence concerns (Chlevickaitė et al., 2021).
The difficulty inherent to assessing witness credibility is the essentially subjective nature of the process (Brodsky et al., 2010) and the difficulty in distinguishing between matters of truthfulness and competence issues. While truthfulness is assessed to identify a witness's inclination to tell deliberate lies, competence might provide honest explanations for the shortcomings in witness evidence that might otherwise be attributed to lying.
Assessment of Information: Testimonial Quality
Assessment of whether the information provided by the witness is accurate and complete, also known as evidence reliability, can also be divided into two aspects: external and internal validation (Chlevickaitė et al., 2021). External validation relates to consistency with prior statements by the same witness and corroboration or contradiction with other evidence in the case. Since external validation requires additional evidence, it is not always feasible. Accordingly, practitioners must assess witness statements in and of themselves (internal validation), which is most relevant for the current study.
ICCT researchers have uncovered a widespread lack of detail, implausibilities, inconsistencies, and related shortcomings of international witness testimonies (Cohen, 2013; Combs, 2017; Kelsall, 2009). While, at least initially, judges seemed to be willing to explain some of these deficiencies away by reference to competence issues (Combs, 2009), recent studies uncovered a stricter approach towards witnesses, where issues in testimonial quality are closely linked to the negative outcomes of judicial witness assessments (Chlevickaitė et al., 2021; Combs, 2017). Such assessments tend to focus on the extent of detail, basis of knowledge, and consistency, both internally and as compared to other evidence in the case (Combs, 2010; Kelsall, 2009). Furthermore, judgments commonly refer to the “plausibility” of testimony, indicating that a witness's account is, on the face of it, reasonable and compelling (see, e.g., Prosecutor v Ndindabahizi, 2004: §23; Prosecutor v Bemba, 2016: §230). To note, the assessment of what is “plausible” is especially complicated in cross-cultural settings (Granhag et al., 2017; Maegherman et al., 2018).
Indicators of high-quality testimony thus depend on both the contents of the testimony and the context of each case. Where additional evidence is available, decision-makers might focus on its corroboration and contradiction by external sources (Combs, 2017). Where such evidence is unobtainable, factfinders have only internal quality factors to rely upon. Furthermore, while reliability factors appear to be more objective, and thus fewer differences between insider and non-insider witnesses may be expected, legal decision-makers might have different, or higher, expectations of the type and quality of information to be provided by insider witnesses due to their rank, involvement, and other aspects of their profile (Agirre Aranburu, 2009).
Source and Contents: Inevitable Dependencies?
The assessments of credibility, and those of reliability, may not be easily separated. Research has demonstrated that source credibility impacts the persuasiveness of the message (Brodsky et al., 2010; Mondak, 1990): the more credible the source is considered to be, the more persuasive the message appears (Pornpitakpan, 2004; Smith et al., 2013). Furthermore, legal reasoning is first and foremost reasoning by inference (Roberts & Redmayne, 2007), whereby the credibility of the evidence source “forms the foundation for cascaded reasoning” (Schum & Martin, 1982, p. 114), also known as “inference networks” (Schum, 2009, p. 198). Here, the assessment of each fact is dependent on the assessment of the witness's credibility and the inference is directed from the witness to the information, as depicted in Figure 1 (De Smet, 2020, p. 627).

Figure 1. Components of trustworthiness.
While the assessment begins with observing the information provided, reliance on this information depends on whether the witness is considered to be trustworthy. Thus, source assessment appears to mediate or otherwise influence the assessment of information. Importantly, such an understanding of reasoning from evidence introduces the possibility that factors other than the relevance/quality of the information may determine the decision-maker's confidence in the truthfulness of the account (Schum & Martin, 1982).
The opposite inference is also possible, whereby the source is assessed based on the information provided. According to Sobel's theory of credibility, “someone becomes credible by consistently providing accurate and valuable information or performing useful services” (Sobel, 1985, p. 557). This link is explicit in some of the intelligence analysis models (UNODC, 2011; US Army, 2012), and has been found in communications research: message quality can directly affect and partially mediate the effects of initial credibility assessments on subsequent source credibility assessments (Pornpitakpan, 2004). Prior research on insiders’ assessments at ICCTs also found instances of witnesses who had serious trustworthiness concerns that were alleviated by providing highly relevant, (self-)incriminatory information (Chlevickaitė & Holá, 2016; Kelsall, 2007).
The relationship between the assessment of the source (the witness) and the information in (international) criminal justice settings is unclear. On the one hand, the jurisprudence of ICCTs supports a dual approach to witness evidence assessments, whereby a credible witness may provide inaccurate information and vice versa (Kunarac et al., 2000: §8; Ntaganda, 2019: §53). Hence, in theory, the credibility of the witness ought not to determine the overall assessment of the information and the other way around. However, whether practitioners follow this approach is not known. Judges at ICCTs are allowed “free evaluation of evidence” (Caianiello, 2011; McIntyre, 2014), which unbinds them from any regulations on the factors or aspects of witnesses or their evidence to take into account. Others, such as prosecutors, defence, victims’ lawyers, investigators, and analysts, may resort to internal guidelines, if available, or rely on their individual, professional experience and expertise. The only publicly known existence of analysis guidelines in practice at ICCTs is the International Criminal Court (ICC) Office of the Prosecutor model for analytical source assessment (Agirre Aranburu, 2020). 2 In line with the jurisprudence, the guidelines divide the assessment criteria between source- and information-related, to be assessed independently from one another.
While the division of source and information assessments might be desirable to structure the decision-making process, 3 both the possibility and the practicality of independent source and information assessments in criminal justice contexts are questionable. Irwin and Mandel found that assessors instructed to score information and source separately tended to pair source and information accuracy and to base decisions about accuracy more on information than on the source (2019). Conversely, where no other data on information reliability was provided, evaluators tended to base their content rating on source credibility, assuming that credible sources tend to produce reliable information (Irwin & Mandel, 2019; Volbert & Steller, 2014). Moreover, Samet showed that analysts estimate accuracy less reliably when basing their decisions on separate reliability and credibility metrics than when accuracy estimate is based on a single measure combining the two metrics (Samet, 1975). Finally, the assessors may well be unable to disregard the qualities of the source while assessing information and vice versa due to common heuristics and bounded rationality of human cognition (Cook et al., 2003; Nisbett & Wilson, 1977; Simon, 1990). Such difficulties are supported by studies on instructing juries to disregard specific evidence after it had been introduced (Lieberman, 2000; Steblay et al., 2006).
This study is a first attempt to assess the extent to which such a separation, or a lack of it, can be observed in international criminal justice practitioners’ assessments of insider witness statements by employing an experimental vignette study.
Methodology
Participants
One hundred sixty current and former practitioners of international criminal law completed the vignette survey. Respondents, all individuals with professional experience with witness evidence in at least one ICCT, 4 were recruited via purposive and snowball sampling. While the contact with individuals was personal (thus, the details of the respondents are known to the author), the survey responses were collected and shall be reported anonymously. Table 1 presents an overview of the demographics of the respondents. The sample is nearly balanced among genders (42.5% female and 52.5% male). The majority of the respondents have either legal (Prosecution, Defence, Chambers, and Legal Representatives of Victims) or investigative (Investigator and Analyst) experience. The duration and range of professional experience are reflected in questions on institutional backgrounds and years of practice.
Table 1. Overview of Respondents’ Characteristics.
Materials
An experimental vignette study with a 2×2 factorial design was used, in which two independent variables, source quality (credibility) and information quality (reliability), were manipulated in text-based vignettes depicting excerpts of fictitious insider witness statements in a hypothetical situation. Each vignette included basic witness information, an explanation of the witness's involvement in the armed forces/group, a description of the context, and a potentially criminal incident. Each respondent was exposed to two vignettes (thus response N = 320); therefore, two comparable witness statements were created: one depicting a military insider witness, the other a rebel group insider witness (see Appendix). Table 2 presents a visual overview of the factors.
Table 2. Vignette Factors and Levels.
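The factorial structure described above can be illustrated with a minimal sketch. The variable names are hypothetical; only the level labels (S0/S1, I0/I1) and the respondent and vignette counts are taken from the text.

```python
from itertools import product

# Level labels as used in the study; the variable names are illustrative only.
SOURCE_LEVELS = ("S0", "S1")       # source quality: low / high
INFORMATION_LEVELS = ("I0", "I1")  # information quality: low / high

# The four cells of the 2x2 design.
cells = ["".join(pair) for pair in product(SOURCE_LEVELS, INFORMATION_LEVELS)]
print(cells)  # ['S0I0', 'S0I1', 'S1I0', 'S1I1']

# Each of the 160 respondents assessed two vignettes.
respondents = 160
vignettes_per_respondent = 2
total_responses = respondents * vignettes_per_respondent
print(total_responses)  # 320
```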
Vignettes were chosen as the most appropriate method since witness assessments, and specifically the assessment of insider witness evidence, are a sensitive matter for practitioners (Aguinis & Bradley, 2014; Hughes & Huby, 2004). Furthermore, vignettes allow for manipulation of the evidence characteristics; thus, causal inferences may be drawn (Aguinis & Bradley, 2014). To ensure that the vignettes were true-to-life (Hughes & Huby, 2004), they were developed based on authentic witness statements retrieved from the evidence databases of the ICC and the International Criminal Tribunal for the Former Yugoslavia (ICTY) (ICC, n.d.b; ICTY, n.d.). To further test the internal validity, realism, and clarity (Taylor, 2006), the vignettes were piloted twice with four expert practitioners from ICCTs. The experts who took part in the pilot sessions were not invited to participate in the study. The vignettes were revised based on their feedback and piloted for the third time with a group of 10 researchers at the Netherlands Institute for the Study of Crime and Law Enforcement (NSCR).
Factors: Source and Information Quality. To manipulate the two factors, quality of the source and quality of information, the most prevalent criteria appropriate for a text-based statement assessment were selected based on the literature and prior analyses of ICCT case law (Chlevickaitė et al., 2021; Combs, 2010). Regarding source (S0/S1, source quality low/high), the focus is on potential bias: motives, personal relationships, and (risk of) self-incrimination. For information (I0/I1, information quality low/high), the amount and extent of detail, as well as privileged information, were manipulated. Table 3 presents the source/information factor characteristics represented in the different settings. Notably, certain aspects of the statements’ quality were constant throughout all the conditions in order for the assessors not to dismiss them outright. Hence, the following characteristics were kept stable: coherence, the extent of detail not directly related to the conduct of the superiors, description of contextual events, direct observation, and insider status/role in the group.
Table 3. Construction of Vignette Factors.
Procedure
The study was designed with the online survey software LimeSurvey. Eight conditions, with two vignettes each, were created, and participants were randomly assigned to one of them while maintaining a balance in the sample (20 respondents per condition).
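One way to obtain a random yet balanced allocation of this kind is to shuffle a list holding an equal number of slots per condition. The sketch below is an assumption for illustration only: it does not reproduce the survey software's own randomization mechanism, and the condition labels C1 to C8, the participant identifiers, and the function name are hypothetical.

```python
import random
from collections import Counter

def assign_balanced(participant_ids, conditions, seed=None):
    """Randomly assign participants so that every condition gets the same count."""
    if len(participant_ids) % len(conditions) != 0:
        raise ValueError("participants must divide evenly across conditions")
    per_condition = len(participant_ids) // len(conditions)
    # One slot per participant, with each condition repeated equally often.
    slots = [c for c in conditions for _ in range(per_condition)]
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    rng.shuffle(slots)
    return dict(zip(participant_ids, slots))

# Hypothetical identifiers: 160 respondents, eight conditions.
participants = [f"P{i:03d}" for i in range(1, 161)]
conditions = [f"C{k}" for k in range(1, 9)]

assignment = assign_balanced(participants, conditions, seed=42)
counts = Counter(assignment.values())
print(sorted(counts.items()))  # every condition appears 20 times
```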
The respondents were first asked to complete an informed consent form, 5 after which they answered a set of demographic questions. The instructions for the vignettes and the situation context followed. After that, respondents were presented with Vignette A alongside the questions (on the same page). To explore the process and the focus of witness assessments, practitioners were asked first to provide a numerical answer to the question: Q1: Indicate how useful you consider this witness statement to be for further fact-finding in this situation (investigation/trial) on a scale from 1 (not at all useful) to 10 (extremely useful), followed by an open-ended explanation of the score given. Following that, respondents were asked to list three problematic areas in the statement excerpt that they would like to follow up on during further investigative activities (question: Q2: Could you indicate the problematic areas in the statement excerpt that you would like to acquire additional information/clarifications on from the witness? Try to list at least three). The respondents could progress to Vignette B only after answering the questions, and they were not allowed to return to Vignette A, to avoid revisions and direct comparisons.
Analytical Approach
The answers to the open questions (Q1 and Q2) were analyzed qualitatively, using the theoretical thematic analysis approach (Braun & Clarke, 2006). All responses were read and coded by the author, using Atlas.ti software. The broadest level of themes was in line with the framework set out in the introduction: the analytical focus of the participants (manifest, semantic level) and the inferential framework, that is, the relationships between the assessment of the source and the content (latent analysis) (Kleinheksel et al., 2020). Additionally, the responses were coded for assessment factors, source, and information characteristics, mentioned by the respondents. After the initial coding and grouping of themes were completed, the themes and codes were reviewed, similar codes and sub-themes were grouped, and codes were verified for internal coherence, consistency and distinctiveness (Braun & Clarke, 2006).
Following qualitative analysis and coding, the themes and codes were quantified. In subsequent sections, the most common themes (profile of the witness, information provided) and codes are listed in Figures 2–3. Quantification was used for descriptive and comparative purposes and does not necessarily represent the most important themes.

Figure 2. Inferences of bias and knowledge across conditions.

Figure 3. Assessment of contents across conditions.
Results
Profile of the Witness: The Same Characteristics, Opposite Conclusions
This sub-section presents the results related to the assessment of witness profile: insider status and role/rank in the group, personal involvement in potentially criminal events, and personal relationship with a commander. The insider status and personal involvement in the events were kept stable across the conditions; personal relationship was included only in low source quality conditions. Figure 2 presents the distribution of the factors mentioned by the respondents across the conditions. The y-axis denotes the frequency at which a certain factor was mentioned across the 320 responses (in %). The responses were coded for the factors indicated (0 = not mentioned, 1 = mentioned). Source quality is denoted by S0/S1 (Low/High), information quality: I0/I1 (Low/High). The analysis begins with the factors present across all experimental conditions.
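The binary coding and percentage frequencies described above can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout, the factor name, and the toy data are hypothetical, and whether the figures use all 320 responses or the responses within each condition as the denominator is not specified here, so both are computed.

```python
from collections import Counter

def mention_percentages(responses, factor):
    """Return (overall %, {condition: % within that condition}) for one 0/1 coded factor."""
    overall = 100 * sum(r[factor] for r in responses) / len(responses)
    totals = Counter(r["condition"] for r in responses)
    hits = Counter(r["condition"] for r in responses if r[factor] == 1)
    by_condition = {c: 100 * hits[c] / n for c, n in totals.items()}
    return overall, by_condition

def share(mentions, total=320):
    """Percentage of all coded responses that mention a factor."""
    return 100 * mentions / total

# Toy data only; these are not the study's responses.
toy_responses = [
    {"condition": "S0I0", "insider_profile": 1},
    {"condition": "S0I0", "insider_profile": 0},
    {"condition": "S1I1", "insider_profile": 1},
    {"condition": "S1I1", "insider_profile": 1},
]

overall, by_condition = mention_percentages(toy_responses, "insider_profile")
print(overall)        # 75.0
print(by_condition)   # {'S0I0': 50.0, 'S1I1': 100.0}

# A count of 208 mentions out of 320 responses corresponds to 65%.
print(share(208))     # 65.0
```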
Assessment of Role/Rank: Knowledge or Bias? All statements assessed by the participants featured an insider witness, that is, a witness involved in the alleged perpetrator group, whose rank was explicitly mentioned. In total, 208 (65%) responses mention the insider profile as a factor in decision-making. As expected, responses demonstrate the dilemma of assessing the relevance of the information vis-à-vis the potential bias of the witness: respondents ascribed two different, competing meanings to the profile. Two-thirds (67.3%) of insider profile mentions were positive: identifying the witness as an insider, an individual in a position of authority and in command of others, was found to suggest additional, unreported knowledge of the structures, hierarchy, and individuals involved in the crime, familiarity with military matters, and other privileged information, for instance: he is a high enough officer or in a position to have reliable information. (D24: S1I1 ID32) 6
in particular because of his position within the DAF, where he could have been privy to information which may not be obtainable through other sources. (D113: S0I0 ID102)
The above suggest that the person would know the structure and membership of the armed group in question, have information about its modus operandi, insider knowledge about various military operations etc. (D6: S1I1 ID22)
While witness rank for many was indicative of additional, high-value information that the witness could provide, a third of insider profile mentions expressed concern regarding potential bias. For instance, the witness's membership in an armed group was seen as a direct indication of a lack of neutrality, and described in unequivocal terms: clearly not a neutral witness (D67: S0I0 ID67)
his account of the events is clearly one-sided. (D113: S0I0 ID102)
This confidence in profile-based inferences was a common feature in the responses concerning witness rank/role, though some respondents provided for the possibility that the witness was not objective, but genuine. This observation links back to the jurisprudence on witnesses being credible (perceived as willing to tell the truth), but unreliable (not having or being able to convey accurate or complete information): witness is a senior member of the ZTI, and therefore may have provided biased information
The recognition that the witness might be mistaken, or unconsciously biased, was a present, but infrequent occurrence in the responses. Furthermore, even where the witness was considered not objective, some respondents observed explanatory factors, or other reasons that would outweigh the possible bias. One of these was the transparency with which group membership was reported: assumptions are made in DAF favor (e.g., no real knowledge of prisoner fate, but willing to say he ‘doesn't think anyone was killed’) - but
Finally, the challenging task of assessing insider witnesses is epitomized by contradictory inferences by the same respondents. On multiple occasions, respondents mentioned the dual nature of identifying the witness as an insider: both as a positive, and as a negative factor. From the quantitative overview in Figure 2, the positive inference is visibly more prevalent across all conditions: inferences of high knowledge (N = 209) were more common than inferences of bias (N = 113). However, without further analyses, it is difficult to tell whether it also had a stronger influence on the assessment outcomes. Based on the responses, respondents differ in both the conclusions drawn from the information presented and the relative confidence in their conclusions. Similar patterns were found regarding other factors.
Personal Involvement: Risk of Self-Incrimination, Inferring Honesty, and Direct Knowledge. Personal involvement in the events, as reported by the witness, might be a good indicator that the witness's knowledge is direct. Hence, where the witness acknowledged personal involvement, some assessors saw it as an additional positive sign regarding the immediacy and authenticity of the witness's knowledge. In 56 responses (17.5% of all responses) this was interpreted as indicating further knowledge and additional information that could be acquired from the witness. Apparently, he was present for several hours and should be able to provide more details about the people involved, the precise conduct carried out and the identity of the victims. (D21: S0I1 ID31)
The witness was commanding a unit involved in fighting in NESRIDE and can provide a direct account of the events. (D123: S0I1 ID108)
The other side of reporting involvement in potentially criminal events was the inference of self-incrimination fear, which would reduce the likelihood that the witness would report the facts objectively and exhaustively. Though observed less commonly (6% of all responses), it provides another example of the duality of certain factors: since he also participated in the commission of the crimes, his statement should be taken with the utmost caution as he could be undermining the reality to undermine his own responsibility. (D146: S1I1 ID122)
Hence, like the assessment of witness's rank and membership in the organization, personal involvement in crimes was assessed inconsistently. To some respondents, it indicated extensive knowledge, thus potentially highly relevant information. To others, the quality of the information notwithstanding, personal involvement meant the witness was not likely to deliver a truthful account due to self-incrimination fears and must be approached with caution. Finally, some respondents expressed the possibility that both inferences could be correct, directly demonstrating the complexity of the decision-making in this area.
Personal Relationships. Parallel to the opposing meanings ascribed to witness rank and involvement in potentially criminal actions, the assessment of a personal relationship with a senior commander in the group 7 was twofold.
Out of 39 mentions of a personal relationship (12% of all responses), two-thirds (26/39) of the inferences were negative. For these respondents, a family/personal relationship with a commander of a group potentially under investigation clearly indicated bias and created an expectation that the witness will or may minimize the commander's responsibility for the events in question. Again, for some respondents, the relationship was clear and linear. Yet, the majority of the respondents tempered the conclusion of this type of bias, indicating a lower degree of confidence compared to the bias inferred from the witness's profile as an insider in the group and potential culpability for the crimes. He is very close to LEFBEN, and would certainly be inclined to defend him. (D161: S0I0 ID134)
LEFBEN is the witness’ uncle, and the very reason he joined the military. He may be loyal to a person who is both his boss and a family member, and not inclined to incriminate the commander. (D221: S0I1 ID180)
One-third of all mentions inferred a different meaning from the personal relationship. First, it indicated that the witness was forthcoming, as the assessor would not expect the witness to volunteer information about family relations: voluntary disclosure of family relations to Brig. LEFBEN (aka LETO) positively influences the score. (D99: S0I0 ID94)
Secondly, the relationship might lead the witness to be in a better position to acquire valuable information, and thus positively influence the assessment of the witness's knowledge: I do think that this could be a relevant witness as his relationship to MOSO and role as an area chief places him in a very good position to know what happened. (D125: S0I1 ID109)
Again, the same factor: a personal relationship with the accused, depending on the respondent, was seen as an indicator of the witness's honesty, knowledge of relevant facts, or potential bias, parallel to the findings concerning other witness credibility-related indicators.
Information Provided: Who Sets the Standard?
This sub-section presents the results related to the assessment of information quality, specifically where the quality of the statement contents is linked to conclusions about the source (witness): admissions of being personally involved in the commission of a crime, lack and type of detail provided, hesitations (expressing uncertainty in the statement), and implausible or unclear assertions. Figure 3 below presents the distribution of factors mentioned by the respondents across experimental conditions (S0/S1—Source quality Low/High, I0/I1—Information quality Low/High).
Admission of Involvement in the Commission of Crime. As described above, indications that the witness was personally involved in the events were perceived as indicative of high-quality knowledge or a possible cause for lack of objectivity. However, a third inference was found where the witness went beyond indicating personal involvement and into admitting that crimes were committed. Where the witness recounted events in first person language and/or provided detail of ordering, planning, or otherwise being directly involved in criminal events, some assessors (in 61/320, or 19% of all responses) inferred that the witness was forthcoming, “willing to talk about the events” (D68: S1I1 ID67), and that his recollection was “sincere” (D152: S1I1 ID127). This inference of willingness to admit involvement in crime, though manifestly included only in S1 (high source quality) scenarios, was observed across all four conditions: He also
Noticeably, the respondents’ suspicion of witness's dishonesty increased with decreasing quality of information, from readily taking the self-incriminatory information as a voluntary admission or truthfulness in the high source and information quality condition, to “accidental” admission in the low source and information quality condition. Again, this indicates a link between the contents of the statement and the perceived credibility of the witness.
Assessing Hesitation. An interesting phenomenon occurred when respondents were faced with knowledge limitations and hesitations expressed by the witness. Certain qualities of speech, such as passive voice, correcting oneself, and qualifying one's confidence in the information shared, gave rise to two contrasting impressions. Hesitations were mentioned in 35 responses (11% of total responses). For some respondents (N = 9), hesitations indicated nuance or admissions that the person is unsure:
witness openly admits own uncertainty (D216: S1I0 ID176)
Recognises limitations in language terminology (think/maybe) (D142: S1I1 ID120)
nuance provided by terms such as (‘I think’) (D99: S0I0 ID94)
When interpreted in this light, uncertainty and limitations were seen to help the assessor know which parts of the reported facts the witness was confident of and which ones he was more hesitant about. Conversely, some respondents inferred hesitations to mean speculation and being passive about the events witnessed:
Sometimes he speculates – ‘I think…’ and ‘I don't think’ which I do not like in a statement which should be about things he knows (D107: S0I1 ID99)
he uses phrases such as ‘I think’ and ‘maybe’ which leads to uncertain in regard to what he is saying (D118: S1I0 ID105).
Why does he use the word 'maybe' twice in the last two sentences? To be explored (D221: S0I1 ID181)
Once again, this demonstrates the divergent inferences, where the same words are ascribed quite different meanings, with contradictory consequences. It also reflects the subjectivity in assessing the language and contents of the testimony. Furthermore, since these divergent conclusions were found across experimental conditions, the direction of this inference appears to depend on the individual decision maker.
Detail and Plausibility: Evasive or Genuine? Mentions of the level of detail, especially the lack of it, were relatively frequent across the responses (147/320, or 46% of total responses). Importantly, indications that the statement lacks detail were found across conditions, though the actual detail mentioned differed only across high and low information quality conditions (I0, I1). This suggests that the respondents might be considering additional factors when assessing the extent of detail provided, or might assess the same extent of detail differently. Similarly, information was determined to be implausible across conditions, again pointing toward the subjective interpretation of what is “implausible.” Beyond observing the presence of these issues, the respondents also considered whether they were intentional or genuine: He
information provided by the witness is unclear –
Deciding whether omissions are genuine or intentional based on only the information provided is a difficult task. Hence, respondents sometimes relied on the witness's profile: role/rank, relationship, and involvement to make a decision. Some respondents (in 29/320, or 9% of all responses), faced with a shortcoming in the information provided by the witness, clearly searched for the explanation elsewhere, and combined the information quality with the source characteristics to decide between genuine or deliberate causes: Most likely, the witness's family ties with Brig. LEFBEN and self-apprehensiveness of being involved in the commission of heinous crimes in April 2014
Witness is the direct participant in the events, so he is clearly trying to protect himself/his superiors and subordinates,
While it could be true that the witness does not possess the requisite knowledge and details, the respondents found the source profile sufficiently informative to make conclusions about the information provided. However, the difficulties did not end here. Even after deciding that the observed shortcomings in the information provided were likely due to deliberate actions by the witness, the assessors had to decide whether all information was to be dismissed, or whether some degree of truth remained:
It cannot be discounted that some of the account is true even if he is minimising his role or withholding some incriminating information. (D241: S0I1 ID198)
The responses offered no further explanation of how this assessment was made, that is, how assessors determined which parts of the statement are true despite the witness's deliberate omissions. While the experimental conditions did not allow for additional information to be consulted, this is one of the instances where, without supplementary sources, an accurate decision regarding the reasons for the lack of detail is doubtful.
Meeting the Assessor's Expectations
The sections above demonstrate that assessments of witness statements go beyond analyzing the information on the page. The profile of the witness, characterized by their rank/role in the group, personal relationships, and involvement in the events, appears to inform a “standard” for the quality of information expected of the witness. Hence, it can be cautiously concluded that there is a relationship between the assessment of the witness and the assessment of information, in line with the model depicted in Figure 1. Some respondents, perhaps based on their prior experience, demonstrated clear expectations of the type and amount of detail to be delivered by a witness with an insider profile: Statement does not seem
Witness
Should he not be fully aware, in his position, who was in charge? (D221: S0I1 ID180)
At times, this standard appeared to be based on the profile alone, while other respondents combined the rank/role with reported involvement. The extent of detail expected from the witnesses also differed across the responses. Some respondents expected quite extensive, verifiable evidence: He was overseeing the arrest and transmitted the order to his men, mentioned some beatings, his brigade also helped with the interrogation,
Other respondents did not define this information in such strict terms, but rather expressed dissatisfaction with the level of detail provided and indicated the need for more. For instance, some respondents felt that a witness related to a commander should have provided more information about their relationship, the commander himself, and his actions, failing which, the witness appeared in a more negative light: He is a commanding officer. He
His uncle is a commander in the area and
This combination, or direct mention, of the witness's profile and the assessor's expectations is a rather problematic aspect of witness assessments. The responses above indicate a lack of a standard for what is considered to be a “detailed” statement overall, and such assessment appears to be related to the profile of the witness providing the information. The flexibility is understandable, and likely necessary, as each witness will have different abilities, extent of memory, and capability to testify in detail. However, basing expectations on a subjective evaluation of a witness is of questionable accuracy and likely inconsistent across different assessors. Finally, as touched upon above, the divergences in what the respondents considered to be a vague or low-detail statement show that different assessors might evaluate similarly detailed statements differently, based on their subjective expectations and tendencies (Bond & DePaulo, 2008).
Insider Witness Evidence Assessment Model
The insider witness evidence assessment process, including the inferential relations between the different assessment factors found in the vignette responses, is expressed visually in Figure 4. This model builds upon and incorporates prior quantitative analyses of judicial insider witness assessments (Chlevickaitė et al., 2021) and the formal inferential models described in the introduction (De Smet, 2020; Schum, 2009). Importantly, this model elucidates the process assessors seem to follow when determining whether the limitations observed in witness statements are genuine or not by introducing the “expectations of quality” determinant. While prior models have identified the distinct factors constituting witness objectivity (truthfulness) or competence, they have not explained the process of evidence assessment, thus missing an area key to understanding decision-making.

Figure 4. Model of insider witness evidence assessments.
Based on the study findings, insider witness evidence assessment starts either directly with the assessment of statement quality or with witness objectivity and competence. In the former process, any issues identified in the statement are followed by the determination of the reasons for these deficiencies: the qualities of the witness. In the latter process, witness objectivity and competence are assessed first. Then, based on assessing witness objectivity and competence, the assessors determine, whether explicitly or not, their expectations of statement quality and compare these expectations with the information provided by the witness. The expectations of quality also feed back into the determination of whether there are issues with quality or not, as demonstrated by the lowest arrow in the model. For instance, where initially lack of detail is observed (issue with quality), but the assessment of the witness indicates to the assessor that more detail cannot be expected of the particular witness (e.g., due to indirect observation, memory issues), the lack of detail is not taken to be an issue that needs to be addressed. Hence, expectations of quality form an integral part of witness evidence assessment. However, these expectations also introduce a layer of subjectivity, as everyone might have personal expectations based on their interpretation of witness objectivity and competence, as was demonstrated throughout this paper.
The model also demonstrates the links between the assessment of the witness and the assessment of the information provided. Where no issues with the evidence are found (where evidence is of high quality), this is shown to positively impact the assessment of witness objectivity and competence and/or directly lead to the acceptance of witness evidence without explicit assessment of the witness qualities. Conversely, where issues are found with witness objectivity and/or competence, these link back, via the expectations of quality, to the determination of whether the evidence provided has deficiencies.
This model explains the process of witness assessments based on the data available to date. Understanding the process allows for its modification in practice, but it does not determine the approach that would lead to the most accurate, consistent, or reliable decision-making. Further studies exploring the causal relationships between the distinct factors and their clusters, as well as the comparative weight assigned to them, would be useful in determining which steps in the process are the most predictive of the final acceptance of witness evidence.
Discussion
The assessments of insider witness statements illuminate the challenging task practitioners face in reconciling their concerns of witness motivation, or potentially tainted credibility, with the quality of information contained in the testimony and vice versa. This study found clear indications of a bi-directional relationship between the quality of the witness (credibility) and the quality of the information provided (reliability) in the respondents’ decision-making. Inferences were also drawn from certain (observed) qualities of the witness to other (unobserved) qualities of the witness.
Regarding decision-makers’ focus, frequent mentions of potential motives or self-incrimination bias were found, which is unsurprising considering the profile of the witnesses and prior research findings (Chlevickaitė et al., 2021; Combs, 2017). Not only did respondents focus on the witness's insider profile and personal involvement, but these factors were also assigned multiple, contrasting meanings. Insider profile, personal involvement, and personal relationships were chiefly interpreted to imply bias, but also high-level knowledge or demonstrated honesty, uncovering a complexity that is obscured in jurisprudence analyses.
Similar observations were made with regard to the assessment of statement contents. Lack of detail and hesitations were assessed not only as indicators of lower statement quality but also linked to witness credibility. Most prominently, respondents inferred either evasiveness or genuinely deficient knowledge from a lack of detail and hesitations. Here, the inference appeared to depend on the prior assessment of the witness's objectivity: where the witness presented objectivity concerns to the respondent, shortcomings in the statement could be attributed directly to bias and evasiveness. Relatedly, what constituted a shortcoming was also informed by the respondents’ assessments of the witness profile. The indications of the witness's rank in the group and role in the events were linked to a certain (individual) standard or expectation for the quality of the information to follow. Where witness statements failed to meet that standard, the failure tended to be attributed to evasiveness.
These findings tend to support the assertion that the assessments of the source (witness) and information (testimony) are not independent and have a complex relationship. While some legal decision-making models include witness trustworthiness as a mediating factor between a fact asserted and a fact confirmed (“if witness A trustworthy, information trustworthy”) (De Smet, 2020; Schum, 2009), the inferences appear to go the other way around as well: if information is trustworthy, witness A is trustworthy. This confirms prior research in communications studies, demonstrating that the quality of the source influences the persuasiveness of the message and vice versa (Pornpitakpan, 2004; Smith et al., 2013). It further confirms prior studies on judicial assessments of insider witnesses, where insiders of questionable credibility were relied upon due to the high quality of the information provided (Chlevickaitė & Holá, 2016; Kelsall, 2007). Importantly, these findings also support Sobel's theory of credibility, whereby “someone becomes credible by consistently providing valuable information” (Sobel, 1985, p. 557). Furthermore, the different approaches demonstrate that the formal analytical guidelines or formal, unidirectional, inferential networks, with the source being either a separate or a mediating factor in the assessment of information, might appear rather different in applied settings.
The reliance on inferences, as well as a certain extent of subjectivity, is an inevitable and largely unproblematic feature of legal reasoning. However, it becomes problematic where the assessment of the same information is widely inconsistent, and inferences are drawn in multiple directions. Whichever approach is taken by the assessors (source and content evaluated separately or not), like evidence should be treated alike, and different evidence should be treated differently. Likewise, certain factors, for instance, witness background, should have similar implications across the assessors. In other words, the expectation is of little subjectivity and of a discernible pattern of decision-making, which was not found regarding several salient assessment factors: for example, profile, involvement, personal relationships, hesitations, and lack of detail. This diversity of assessments indicates a certain degree of subjectivity and noise, which in the model is accounted for by “expectations of quality” criteria, hence formalizing the subjectivity inherent to the process. This finding is in line with prior research demonstrating that where information is complex, and decision-making is unstructured, it is vulnerable to bias and noise (Greene & Ellis, 2007; Kahneman et al., 2019; Sagana, 2018). “Noisy” decisions are observed in the diversity, or spread, of decision outcomes when different individuals are faced with a similar problem (Kahneman et al., 2016). Such decisions are prone to heuristic, subjective thinking, especially where multiple pieces of information have to be assessed concurrently (Dunstall & Reeson, 2009; Kahneman, 2003; Kahneman et al., 2016, 2019). Considering the complexity of the task and the diversity of ways the practitioners approached it, it appears that assessments of witnesses and their statements might be a perfect setting for sub-optimal decision-making (Brehmer, 1992; Chermack, 2004; Edland & Svenson, 1993).
This research has two main practical implications. First, the results show that assessments of the source and the information are, to an extent, not independent. Thus, the standard operating procedures or analytical guidelines used by organizations should either explicitly instruct the assessors on the extent to which such inferences are acceptable and in what circumstances they can be relied upon, or implement procedures where source and information can be assessed independently, if that is desirable. Ignoring the intrinsic links between the assessments of the witnesses and their statements is not the solution, as the assessors are still likely to base their decisions on both, in an inter-related manner, without making it explicit and thus not subject to review.
A second practical implication relates to the diversity of decisions and inferences observed. Based on the analyses presented above, it appears that witness statement assessments might be worryingly “noisy.” Such a range of inferences is not desirable in situations where consistent and discernible decision-making is expected and might also be a precursor for individual subjectivity and bias. It could be useful to conduct an audit of diversity, or spread, of decision outcomes when different individuals are faced with a similar problem, to assess the consistency in their judgments (Kahneman et al., 2016). Unlike the assessment of bias, the assessment of noise does not require establishing ground truth, so such an audit should be feasible in many criminal justice settings. The outcomes of the audit could inform the development of improved source evaluation guidelines and training techniques.
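To make the idea of such an audit concrete, the following is a minimal sketch of how the spread of judgments could be summarized when several assessors rate the same statement. It is an illustration only and not a procedure used in this study: the function name `noise_audit`, the 1 to 5 credibility scale, and the example scores are all hypothetical assumptions. The sketch needs no ground truth, only the ratings themselves.

```python
from itertools import combinations
from statistics import mean, pstdev


def noise_audit(ratings_by_case):
    """Summarize the spread of ratings that different assessors gave to the same case.

    ratings_by_case maps a case label to a list of numeric ratings,
    one rating per assessor. No "correct" rating is needed.
    """
    report = {}
    for case_id, ratings in ratings_by_case.items():
        if len(ratings) < 2:
            raise ValueError(f"{case_id}: at least two assessors are needed")
        # Absolute difference between every pair of assessors on this case.
        pairwise = [abs(a - b) for a, b in combinations(ratings, 2)]
        report[case_id] = {
            "mean": mean(ratings),
            "std_dev": pstdev(ratings),           # spread around the case mean
            "mean_pairwise_diff": mean(pairwise)  # average disagreement between two assessors
        }
    return report


# Hypothetical credibility scores on an assumed 1-5 scale (not study data).
example = {
    "vignette_A": [4, 4, 4, 4],  # assessors agree: no noise
    "vignette_B": [1, 5, 2, 4],  # assessors diverge: noisy judgments
}
report = noise_audit(example)
for case_id, stats in report.items():
    print(case_id, stats)
```

A larger standard deviation or mean pairwise difference for a given statement would point to less consistent judgments across assessors. Which statistic is most appropriate, and what level of spread is acceptable, would have to be decided by the organization conducting the audit.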
Finally, this research was not without limitations. First, the statement excerpts presented to the respondents were necessarily shorter than most of the real-world international criminal justice witness statements. This might have made the manipulated factors more explicit and easier to spot, though all effort was put into maintaining realism. It is also likely that the respondents did not pay as much attention to or consider all the factors to the extent they would in the real-world task of witness evidence assessment. The shorter attention span was mitigated by assigning respondents just two vignettes and including a progress bar to reduce anxiety about the length of the process. Even with the mitigation of these risks, some authors suggest that vignettes are artificial and may lack external and ecological validity (Sauer et al., 2014; Taylor, 2006). However, vignettes also allowed for manipulation and control of the factors presented to the respondents, which should outweigh the possible concerns with realism. To evaluate the extent to which these experimental conditions were realistic, it would be ideal to conduct further research into real-world witness assessments across parties and organs of the ICCTs. It would also allow uncovering whether there are systematic differences between diverse types of practitioners or the parties they represent. Finally, it is also possible that the diversity of responses was due to the respondents’ cultural and linguistic profiles. While this may be the case, this limitation is also present at the ICCTs, which is the context of interest. Additional research with homogeneous samples of ICCT practitioners could be conducted to assess whether cultural diversity indeed affects respondents’ perceptions.
Acknowledgements
I would like to thank my PhD supervisors Dr Barbora Hola and Prof Catrien Bijleveld for their extensive support in conducting this research. I am also grateful to all the respondents who took part in the study, the experts, and the NSCR colleagues who participated in the pilot of the vignettes, and everyone who helped reach the participants. Finally, my gratitude goes to the three anonymous reviewers for their considerate, insightful, and by all means, helpful, suggestions and comments.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article. This work was supported by the Nederlandse Organisatie voor Wetenschappelijk Onderzoek, (grant number 406.17.519).
Appendix Sample Vignettes
Author Biography
), an interdisciplinary research center at the VU Amsterdam and a fellow at the Netherlands Institute for the Study of Crime and Law Enforcement (NSCR) in Amsterdam, where she conducted her NWO-funded PhD research in 2017–2021. In 2013–2017, Gabriele was an analysis assistant at the International Criminal Court.
