Abstract
The testimonies of insider witnesses are often key to prosecutions of international crimes, despite significant trustworthiness concerns. However, we know little about the practice of judicial assessments of insider testimonies, that is, which factors the judges consider relevant to relying on insider testimony. With this article, we set out to provide a comprehensive, explorative examination of the insider witness assessment factors used by the trial judges at the International Criminal Tribunal for the former Yugoslavia, the International Criminal Tribunal for Rwanda and the International Criminal Court in 1996–2019. By using multiple correspondence analysis, we show that the factors related to insider witness assessment outcomes are generally similar across the tribunals and tend to focus on the contents of the testimonies, with less attention given to credibility or competence concerns. This research constitutes the first systematic quantitative analysis and cross-institutional comparison of insider witness assessment practice at an international level.
Introduction
International criminal courts and tribunals (ICCTs) are set up to investigate and prosecute cases of large-scale, systematic criminality organized and committed by extraordinarily complex, unconventionally structured groups (Rusch, 2014). 1 Deciphering the organizational ties and identifying those most responsible for the criminal acts is a formidable task for any judicial body, and even more so for ICCTs, which largely lack the contemporaneous investigative tools available to domestic law enforcement (Whiting, 2009). Thus, documentary or forensic evidence is rarely available, or its collection in conflict and post-conflict settings proves to be too challenging (Sluiter, 2005; Wald, 2002). Identifying the individuals who had planned or ordered the crimes in this context necessitates the involvement of witnesses from the inside of the criminal organization, privy to information on the acts and omissions of the group’s leadership. Insider witnesses are in a unique position to provide evidence linking the organizers, order-givers or planners of the atrocities to the crimes committed on the ground, which is essential for assessing the individual responsibility of high-ranking officials (Combs, 2018: 259). 2 This unique knowledge makes insiders indispensable to international criminal investigations and prosecutions (Chlevickaite and Hola, 2016).
Because of their links with the criminal organizations, insiders often present special concerns regarding their credibility and the reliability of their testimony (Chlevickaite and Hola, 2016: 675; Chlevickaite et al., 2020; Combs, 2017). Furthermore, insiders occupy a particular position as quasi-experts, providing overview testimony and placing various individuals and their actions into context, which increases the chances that an insider can influence the overall narrative of the case (Roth, 2016: 782). Hence, heavy reliance on insiders may lead to increased factual uncertainty, a well-documented issue in both domestic and international criminal proceedings (Cryer, 2014; Kelsall, 2009; Roth, 2016). The difficulties in both procuring credible insiders and effectively assessing their testimonies constitute a fundamental problem at the ICCTs: major cases have fallen apart at least partly owing to insider credibility concerns, hindering the ability of the courts to get to the truth and hold those most responsible to account (for example, Prosecutor v. Kenyatta, 2014; Chlevickaite and Hola, 2016).
The key role of insider witness testimony and the prevalence of credibility and reliability concerns stand in contrast to the limited empirical research into the practice of international witness assessments. How do the judges determine whether an insider is providing an honest version of events? Which factors do they consider indicative of trustworthiness, and which ones prompt doubt? Existing scholarship, though constrained to individual institutions, cases and specific aspects of witness testimony, has identified the challenges faced by those evaluating international witnesses, for example, cultural barriers, linguistic misunderstandings, time lapses, a lack of material for independent corroboration (Combs, 2018; Kelsall, 2009). Further, researchers have found indications of a lack of attention by the judges to factors specific to international crime (such as trauma and time lapse) in relation to insider witnesses (Chlevickaite and Hola, 2016) and a strong focus on testimonial consistency (Combs, 2017). In relation to the general approach to witness assessments at the ICCTs, a recent examination uncovered a relatively consistent framework of criteria emphasized in the judgments at the modern ICCTs, containing factors related to witness truthfulness, competence and the quality of the information provided (Chlevickaite et al., 2020). However, because no systematic study of the actual witness assessments and their outcomes has been conducted so far, we do not know how this framework is operationalized during judicial decision-making.
Hence, this article provides a comprehensive, explorative examination of the insider witness assessment factors used by the trial judges at the International Criminal Tribunal for the former Yugoslavia (ICTY), the International Criminal Tribunal for Rwanda (ICTR) and the International Criminal Court (ICC) in 1996–2019. We carry out in-depth quantitative analysis of trial judgments to identify the criteria employed by the judges to assess the credibility of insider witnesses and the reliability of their evidence. By using multiple correspondence analysis, we uncover the relationships between these criteria and the assessment outcomes, that is, the eventual (lack of) reliance on a particular witness. We evaluate whether the practice of insider witness assessments aligns with the approach spelled out by the trial judges in the judgments and whether the practice has been consistent across the modern ICCTs.
In the following section, we describe the law governing witness assessments at the ICCTs and the principles that have emerged in relation to insider witnesses in the jurisprudence of these institutions. We then present our methodology and findings. In the conclusion, we discuss the results in the context of the broader discussion of fact-finding at the ICCTs, and the role of insider witnesses in criminal prosecutions.
Assessments of (insider) witnesses at the ICCTs
Witness assessments during international criminal trials are firmly in the hands of the judiciary. Because proceedings at the ICCTs are considerably adversarial, the parties bring the witnesses to the stand and elicit their testimony, primarily orally in court (Bonomy, 2007: 350; Doherty, 2013: 943). The trial judges are responsible for evidence assessment and the consequent assignment of evidentiary weight to make findings of fact. In this process, the judges adhere to the principle of free evaluation of evidence. They are allowed to take into consideration any factors deemed important for determining the evidentiary weight of a particular witness’s testimony, unburdened by formal rules or procedures (Boas, 2001: 84; Caianiello, 2011: 302). The statutory frameworks of the ICCTs provide remarkably little guidance, and they are silent on the matter of evidence evaluation apart from the instruction to take into account witness credibility and the reliability of their evidence (ICC Rules, 2013: Rule 140(2)(b)). As pointedly observed by Stone (1991), ‘in principle and in practice, fact-finding, including the assessment of credibility, does not depend on legal rules; it begins where the law ends’. This flexibility has been criticized for compromising legal predictability (Behrens, 2011: 661) and encouraging judges’ overconfidence in their ability to weigh competing pieces of evidence in their specific context (Tilley, 2011). However, judicial freedom is formally constrained by the requirement to provide a reasoned opinion, thereby allowing for a review on appeal (McDermott, 2017).
General witness assessments
Despite the lack of formal rules, analysis of the ICCTs’ jurisprudence reveals a relatively consistent approach to witness evidence evaluation, as presented in simplified form in Figure 1. Prior to or after a witness’s testimony in court, the judges may rule on the admissibility of the testimony. The admissibility assessment, in content and procedure, varies across institutions (Klamberg, 2013: 375–80) but it commonly refers to potential prejudice, (prima facie) relevance, probative value and authenticity (ADC-ICTY, 2011: 76–80). During and after the testimony, in order to assign evidentiary weight, the judiciary generally pays attention to three sets of interrelated factors: the quality of the information the witness is providing (reliability), whether the witness is objective and truthful (credibility), and whether the witness is competent to provide the evidence (see the Methodology section for an in-depth description). A systematic exploratory review of the ICTY, ICTR and ICC judgments revealed general consistency in the factors that the judges include in the testimonial evaluations of credibility, reliability and competence (Chlevickaite et al., 2020).

Figure 1. The process of witness assessments at the ICCTs.
Approach to insiders
Regardless of the relative uniformity of the general factors to be taken into consideration in any witness evaluation, the approach to assessing insider, or accomplice, evidence is poorly defined. Even though one ICTR judge has confidently stated that the ICTR chambers have ‘developed a detailed approach to determining the reliability of accomplice evidence’ (McIntyre, 2014: 7), this approach was neither clearly delineated nor consistently applied in subsequent case law. At the core of the ICTR approach to insiders was the requirement of ‘caution’, which obliged the trial chamber to question whether the witness had a motive or incentive to implicate the accused, in which case their testimony was to be checked for corroboration (McIntyre, 2014: 8). Uncorroborated accomplice testimony, although prohibited in some domestic systems (Acconcia et al., 2014), can be, and is, used at the ICCTs. However, the judges are supposed to approach it with more scrutiny (Prosecutor v Kordić & Čerkez, 2001: §630; Prosecutor v Niyitegeka, 2003: §48; Prosecutor v. Nchamihigo, 2010: §§39–49). This ‘cautious’ approach in assessing insiders has been mentioned, to a varying extent, in a majority of subsequent ICTY, ICTR and ICC judgments, to emphasize that more care was to be taken with regard to accomplices. Yet, neither investigating the motives nor checking for corroboration is a strict requirement for any trial chamber (Buisman, 2012: 338; Prosecutor v Katanga, Minority Opinion, 2014: §133, §152; Prosecutor v. Lubanga Dyilo, Dissenting Opinion, 2014: §§40–44).
Institutional contexts
Apart from the varied approaches to caution, we should remain conscious of the fact that the ICTY, the ICTR and the ICC differ in substantial ways depending on their mandates, geopolitical circumstances and prosecutorial strategies, which may affect the way in which insider witnesses are assessed at trial. Although all three institutions were set up to investigate and prosecute genocide, crimes against humanity and war crimes, they did so in different time periods and locations, and they dealt with a vastly diverging number of cases (Chlevickaite et al., 2020: 824–6). Therefore, although the three institutions are comparable as different versions of the same phenomenon, we do not consider them to be identical.
Methodology
We analysed all the ICTY, ICTR and ICC trial judgments up to 1 September 2019 (N = 93). 3 Trial judgments are lengthy documents, some exceeding thousands of pages, covering both the legal and factual findings of the case. Witness assessments are not constrained to any particular section of the judgments and are dispersed through the main text and footnotes. We thus first read the judgments to identify insiders from all the witnesses mentioned, based on the witness’s position during the events in question, the nature of their evidence, and the subject areas covered in their testimonies. Our database includes all the publicly assessed insider witnesses (N = 1359). Next, we captured all the assessment factors mentioned regarding the witness or their testimony in a relational database. The coding instrument was developed iteratively based on case law. Initially, we captured all the factors as they were mentioned in the text. This resulted in a large number of sometimes closely related concepts, which we later categorized by their substantive meaning. Using this process, we arrived at 32 witness assessment factors, which we coded per insider witness as 0 (not mentioned) or 1 (mentioned). An overview of the variables and their frequencies is presented in Table 1 in the Results section. In addition, we collected available biographical and case-related data with respect to each witness, such as gender, affiliation (civilian/military) and detention status. 4
Table 1. Frequencies of variables in the trial judgments (percentage of all insider assessments mentioning the factor).
a. Includes bias: background and bias: personal relationships.
b. Includes positive case status/plea agreement, no motive.
c. Includes bias: involvement, negative case status/plea agreement, motive/self-interest.
Witness assessment outcomes were analysed at the level of an individual insider witness. We coded the witness assessment outcome as a variable with three categories: negative, partial and positive. 5 The negative category comprises witnesses whom judges found to be completely non-credible and unreliable, as well as those whose credibility and reliability were damaged to such an extent that their evidence was to be accepted only if corroborated, and major parts of the testimony were dismissed or relied upon only for details not relating to the conduct of the accused. The partial category encompasses witnesses who were found to be credible and reliable for portions of their testimony (including its central elements and the conduct of the accused), but where some of the information they provided was dismissed. Finally, the positive category consists of witnesses who were found to be wholly credible and reliable.
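The coding scheme described above can be pictured as one record per insider witness, holding a three-category outcome and a set of mentioned factors that is flattened into 0/1 variables. The following is a minimal sketch under stated assumptions, not the authors' instrument: the `Assessment` class, the four factor labels and the four sample records are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# The three outcome categories named in the text.
OUTCOMES = ("negative", "partial", "positive")


@dataclass
class Assessment:
    """One insider witness assessment (hypothetical record layout)."""
    witness_id: str
    tribunal: str
    outcome: str
    factors: set = field(default_factory=set)  # factors mentioned by the judges

    def __post_init__(self):
        if self.outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {self.outcome!r}")


def to_binary_row(assessment, factor_names):
    """Code each factor as 1 (mentioned) or 0 (not mentioned)."""
    return [1 if name in assessment.factors else 0 for name in factor_names]


def factor_frequencies(assessments, factor_names):
    """Percentage of assessments mentioning each factor."""
    n = len(assessments)
    return {
        name: 100.0 * sum(name in a.factors for a in assessments) / n
        for name in factor_names
    }


def mean_factors_per_witness(assessments):
    """Average number of factors mentioned per assessed witness."""
    return sum(len(a.factors) for a in assessments) / len(assessments)


# Hypothetical factor labels and records, invented for this sketch.
FACTORS = ["inconsistent", "contradicted", "corroborated", "self_interest"]
SAMPLE = [
    Assessment("W1", "ICTY", "negative", {"inconsistent", "contradicted"}),
    Assessment("W2", "ICTY", "partial", {"contradicted", "self_interest"}),
    Assessment("W3", "ICTR", "positive", {"corroborated"}),
    Assessment("W4", "ICC", "negative",
               {"inconsistent", "contradicted", "self_interest"}),
]

if __name__ == "__main__":
    for a in SAMPLE:
        print(a.witness_id, a.outcome, to_binary_row(a, FACTORS))
    print(factor_frequencies(SAMPLE, FACTORS))
    print(mean_factors_per_witness(SAMPLE))
```

Storing the mentioned factors as a set and deriving the binary row on demand keeps the record close to the judgment text while still producing the 0/1 matrix that the later analysis needs.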
We strove to stay as close as possible to the language employed by the judges for the precise meaning of the indicators, but some inference was necessary at times in relation to specific variables. 6 Furthermore, the judgments were not uniform in the amount of information they contained, owing to either a lack of transparency or a lack of attention to witness assessment issues, of which we cannot be certain. However, bearing in mind the trial judges’ duty to provide a reasoned opinion, we consider the judgments to be relatively exhaustive documents, as they would be the basis for any subsequent appeal proceedings. As such, we expect no major gaps in the assessments, especially in relation to such significant witnesses as insiders.
Assessment indicators
We categorized the 32 witness assessment indicators by their focus: witness objectivity, competence and testimonial quality.
Witness objectivity
The assessment of witness objectivity, in essence, is the evaluation of whether a witness has a reason to be dishonest. It rests upon the witness’s behaviour on the stand (for example, demeanour, attitude) and any background information that could suggest motives for not telling the whole truth.
The preference to hear witnesses in court is premised, in part, on the belief that observing an individual’s behaviour helps with determining whether they are telling the truth. This preference has been reiterated throughout the case law of the ICCTs (for example, Prosecutor v. Mucić et al., 1998: §597; Prosecutor v. Šainović et al., 2009: §60). The practice of assessing truthfulness on the basis of demeanour is widely criticized by cognitive psychologists, who have repeatedly demonstrated the lack of diagnostic power in most of the behavioural cues (Gravett, 2018; Snook et al., 2017; Vrij et al., 2019). Recent research concluded that there are no gestures, facial expressions or bodily movements that would be conveyed only by a lying individual (Vrij and Turgeon, 2018). The issue is even more complex when we consider the cross-cultural, multilingual, interpreter-assisted nature of most witness assessments at the ICCTs, since cultural or linguistic distance increases the risk of misunderstandings and erroneous impressions of honesty (EASO, 2015: 11, 42; Leal et al., 2018; UNHCR, 2013: 189).
Objectivity assessments also pertain to the individual circumstances of each witness, potentially revealing the perceived reasons to be untruthful. Among them, motives are singled out as the most prominent (Prosecutor v. Kordić & Čerkez, 2001: §629; Prosecutor v. Nchamihigo, 2008: §17; Prosecutor v. Ndahimana, 2011: §51). Insiders, owing to their involvement in or familiarity with the criminal activities, are perceived as likely to be motivated by the avoidance of self-incrimination, loyalty to the armed forces or political groups to which they belonged, or discontent towards their former comrades (Chlevickaite and Hola, 2016). Further, owing to their involvement and relationships, insiders may be more vulnerable to pressure from third parties (Cryer, 2014: 195; Mahony, 2010: 26) or the fear of retaliation by their former comrades (Combs, 2010: 137).
Although the judges have identified potentially relevant factors, the challenge lies in deciding how to interpret their effects: saying that a witness has a reason to lie cannot be directly taken to mean that a witness is incapable of telling the truth (Prosecutor v. Kordić & Čerkez, 2001: §629). Thus, the ICCT chambers found it appropriate to accept parts of the insider testimony and to dismiss other parts, to require corroboration or to dismiss the testimony entirely, depending on the extent of their concerns about witness truthfulness (Chlevickaite and Hola, 2016: 696).
Witness competence
Witness competence is the ability to observe, remember and communicate, which is out of the witness’s control (owing, for example, to their physical and mental state, or the trauma or stress caused by the events). Just like witness objectivity, competence can be assessed based both on what is observed on the stand, for example linguistic difficulties, and on the information that is available to the fact-finders, for example a medical condition.
Researchers have uncovered several prominent competence issues in relation to witness testimonies at the ICCTs: trauma, time lapse, language and interpretation, memory, culture and educational status (Combs, 2010; Kelsall, 2009; Perrin, 2016; Swigart, 2017). In addition, the jurisprudence contains references to a witness’s age or vulnerability, the circumstances of the observation, the circumstances of the interview and their medical condition as potentially distorting a witness’s testimony or reducing its reliability (Chlevickaite et al., 2020). Judges have recognized the prevalence of, for example, trauma and time lapse, and have accepted minor discrepancies or other minor issues without diminishing the probative value of the account (Prosecutor v. Prlić et al., 2013: §285; Prosecutor v. Mrkšić et al., 2007: §14).
These ‘innocent causes’ of issues in the testimony (Combs, 2010: 189) may reduce the apparent credibility of the witness or the reliability of the evidence, especially if they go unnoticed by the judiciary (for example, if a traumatized witness is unable to provide a high level of detail; Prosecutor v Lubanga, 2012: §105).
Testimonial quality
The factors of testimonial quality are employed to assess whether the evidence provided is accurate and supported by other evidence in the case. Principally, the quality factors can be divided into internal and external ones.
Internal quality depends on the knowledge demonstrated and the coherence of the testimony. When assessing knowledge, the judges consider its immediacy (whether the account is based on direct observation or secondary sources) and its extent, which is based on, inter alia, the witness’s familiarity with and length of involvement in the events in question. Internally coherent testimony is logical, clear, articulate and internally consistent. Furthermore, precision, or the amount of detail, is taken into consideration to assess the witness’s familiarity with the events, to help decision-makers find supporting or contradicting evidence, or to indicate whether the witness was indeed a direct observer.
External quality factors, on the other hand, relate to consistency with prior statements/testimony by the same witness, as well as with other evidence in the case. Inconsistencies with prior statements are claimed to be both the most prevalent and most serious impediment to fact-finding at the ICCTs (Combs, 2017: 50). Their reported prevalence might be related to the fact that, by the time witnesses appear at an international criminal trial, many of them have already provided multiple statements on different occasions (Combs, 2017: 49–50). The difficulty of providing a consistent testimony is further compounded by the time lapse between the events in question, the provision of the statements and the eventual trial testimony (Combs, 2018: 276). Moreover, consistency with other evidence in the case, or corroboration, is another prominent feature of testimonial assessments. As discussed above, demonstrating a ‘cautious’ approach to potentially biased witnesses initially included the assessment of corroboration, though it is not a requirement in the rules of the ICCTs. The final, and perhaps most speculative, aspect of external validation is the assessment of plausibility, or whether a witness’s account is believable, compelling and reasonable on the face of it (Prosecutor v Ndindabahizi, 2004: §23; Prosecutor v Bemba, 2016: §230). Plausibility as an indicator of reliability has been highly criticized for its subjectivity and potential to be based on the decision-makers’ intuition, personal frames of reference or gut feelings (Granhag et al., 2017: 49–50; Maegherman et al., 2018: 38–39). It is even more questionable when the witnesses and their assessors come from different backgrounds and experiences (Fujii, 2010; Vrij et al., 2016: 279), leading a number of international organizations to warn against relying on perceived plausibility for assessing individuals from different cultural backgrounds (Gyulai et al., 2013; UNHCR, 2013).
Methods
In order to explore the relationships among the criteria the judges claim to apply in witness evaluations and their outcomes, we used multiple correspondence analysis (MCA). We chose MCA because it is explorative, data driven and assumption free, yet can reveal patterns in complex datasets, which fitted both our research aims and the data that we have collected. First, our data consist of a large number of categorical variables that do not meet the distributional assumptions underlying parametric statistical methods. Second, we sought to explore all the salient relationships among the variables simultaneously, without prior assumptions or hypotheses influencing the choice of variable pairs to compare, which is the main advantage of this method. MCA groups variables according to their co-occurrence (similarity) and identifies multiple interactions/relationships well (Greenacre and Blasius, 2006). The results are represented in a two-dimensional plot and interpreted based on the relative positions of the variables and their distribution along the axes: as categories co-occur more often, they are placed closer to each other in the solution (Bijleveld and Smit, 2006: 201–2). All 1359 insider witness assessments were included in the MCA to investigate the relationships between 23 assessment factors, 7 as well as the party calling the witness. We included the three assessment outcomes as supplementary variables, so that they would not influence the solution. 8 Further, we accompanied the MCA with bivariate correlations to check the significance of any relationships uncovered. The models for the three institutions, the ICTY, the ICTR and the ICC, were built separately.
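To make the mechanics of this method concrete, the sketch below runs an MCA as a correspondence analysis of the indicator (one-hot) matrix and then projects the outcome categories as supplementary points, so that they receive coordinates without shaping the axes. It is an illustrative sketch only: the use of NumPy, the function names `indicator` and `mca`, the three factor labels and the eight toy records are assumptions made here, and the source does not state which software or MCA variant the authors used.

```python
import numpy as np


def indicator(columns):
    """One-hot encode each categorical variable; return matrix and labels."""
    blocks, labels = [], []
    for name, values in columns.items():
        cats = sorted(set(values))
        blocks.append(
            np.array([[1.0 if v == c else 0.0 for c in cats] for v in values])
        )
        labels += [f"{name}={c}" for c in cats]
    return np.hstack(blocks), labels


def mca(active, supplementary, n_dims=2):
    """Correspondence analysis of the indicator matrix of `active`;
    `supplementary` categories are projected afterwards."""
    Z, labels = indicator(active)
    P = Z / Z.sum()                      # correspondence matrix
    r = P.sum(axis=1)                    # row masses (one per witness)
    c = P.sum(axis=0)                    # column masses (one per category)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    row_std = U[:, :n_dims] / np.sqrt(r)[:, None]        # row standard coords
    active_coords = (Vt.T[:, :n_dims] * s[:n_dims]) / np.sqrt(c)[:, None]
    # Supplementary categories: average the row standard coordinates of the
    # witnesses in each category (the transition formula), so these points
    # are placed on the map without influencing the solution.
    Zs, sup_labels = indicator(supplementary)
    sup_coords = (Zs / Zs.sum(axis=0)).T @ row_std
    return {
        "labels": labels, "active_coords": active_coords,
        "sup_labels": sup_labels, "sup_coords": sup_coords,
        "row_std": row_std, "singular_values": s,
    }


# Hypothetical toy data: 8 witnesses, 3 binary factors, 1 outcome variable.
ACTIVE = {
    "inconsistent":  [1, 1, 1, 1, 0, 0, 0, 0],
    "corroborated":  [0, 0, 0, 1, 1, 1, 1, 0],
    "self_interest": [1, 1, 0, 0, 1, 0, 0, 0],
}
SUPPLEMENTARY = {
    "outcome": ["negative", "negative", "negative", "partial",
                "partial", "positive", "positive", "partial"],
}

if __name__ == "__main__":
    result = mca(ACTIVE, SUPPLEMENTARY)
    for label, xy in zip(result["labels"], result["active_coords"]):
        print(f"{label:18s} {xy[0]: .3f} {xy[1]: .3f}")
    for label, xy in zip(result["sup_labels"], result["sup_coords"]):
        print(f"{label:18s} {xy[0]: .3f} {xy[1]: .3f}  (supplementary)")
```

Categories that tend to be mentioned for the same witnesses end up with similar coordinates, which is what the plots visualize. The sign of each axis is arbitrary, so only relative positions should be read from the output.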
Results
Before delving into the relationships between the assessment factors and the outcomes, we offer several general observations based on the frequencies of the factors found in the trial judgments, presented in Table 1.
First, the three ICCTs under examination differ widely in the number of factors mentioned in relation to the witnesses, reflecting variations in either transparency or attention to witness assessment issues. In this regard, the ICTY scores the lowest, with an average of 3.2 factors per insider witness, as compared with 6.2 at the ICC and 4.3 at the ICTR. Thus, the ICC stands out as either the most transparent or the most attentive to insider credibility and reliability issues. Second, the ICC judgments contain the greatest number of positive factors, such as high levels of knowledge or detail, positive corroboration, or coherence of the testimony, which were mentioned least often by the ICTY trial chambers, suggesting a more negative framework of assessment at the ICTY. At the same time, the proportion of wholly unreliable and non-credible insiders was also the highest at the ICC (47.0% of all insiders, compared with 28.2% at the ICTR and 17.5% at the ICTY). For the ICTR and ICTY, the most common assessment outcome is partial (55.6% ICTR, 71.5% ICTY), demonstrating a less categorical approach or less problematic insider witnesses. Finally, we note the lack of witness competence issues, most of which are barely mentioned in the insider assessments. Overall, the ICC appears to be the most cognizant of competence concerns, with at least some attention paid to interview-related difficulties, memory issues, time lapse and trauma. With this backdrop, we move to the analysis of how all these factors relate to each other and the assessment outcomes.
Exploring the factors and the outcomes of international witness assessments
Figures 2–4 demonstrate the results of the MCA: a spatial distribution of the witness assessment factors and outcomes at the ICTY, ICTR and the ICC. 9 The colours denote whether an indicator is positive (green), negative (orange) or neutral (black), with purple representing the assessment outcomes. The closer the factors are to each other on the plots, the more often they appear together in the assessments, and the more closely related they are. Furthermore, in all our plots, Dimension 1 (horizontal) is dominant, thus the distances between the variables on the first dimension are more representative of the relationships in the dataset. 10

Figure 2. ICC multiple correspondence analysis plot.

Figure 3. ICTR multiple correspondence analysis plot.

Figure 4. ICTY multiple correspondence analysis plot.
First, the plots show a certain separation of positive and negative assessment indicators as well as of the assessment outcomes. In general, witnesses who receive positive assessments have different and contrasting characteristics from those with negative outcomes. The ICC figure appears to be the most spread out, though the horizontal spacing from the most negative indicators on the left (bias: relations, performance: weak) to the more positive ones on the right is still quite clear. Furthermore, regarding the partial outcomes, at the ICTY/ICTR they appear to be distinct from the positive or negative ones, demonstrating that there are identifiable differences between testimonies that are considered completely reliable/unreliable and those that are considered to be so only for certain sections. However, at the ICC, partial and positive outcomes appear closer in space and thus signal less distinction between the two. Finally, in all three figures we find defence and prosecution relatively far away from each other, indicating differences in assessments of prosecution and defence insiders.
Quality indicators: Consistency and corroboration matter
As seen in Figures 2–4, the MCA revealed relatively similar patterns between testimonial quality indicators and assessment outcomes at the ICTY/ICTR, and a somewhat different approach at the ICC. However, the indicators clustered together are similar at all three ICCTs, signalling relatively similar patterns in the quality factors associated with positive, negative or partial outcomes. 11
Regarding the negative outcomes, the MCA plots and bivariate correlation analyses show that insiders are commonly found non-credible and unreliable owing to inconsistencies between their statements 12 and to the availability of contradictory evidence (ICC, ICTR). 13 Inconsistencies and contradictions are also the most frequently mentioned factors in the testimonial assessments. As expected, contradiction is most prevalent at the ICTY (69.78% of all assessments), which had the most extensive collection of documentary and forensic evidence to check the testimonies against, as well as the highest number of witnesses to provide contradicting evidence (Chlevickaite et al., 2019). The ICC assessments also frequently refer to contradiction (56.00%). However, the most commonly mentioned factor at the ICC is inconsistencies with prior statements (62.00%). At the ICTR, inconsistency and contradiction by other evidence were each found in about a third of testimonies (34.36% and 30.96%, respectively). Surprisingly, the assessments do not frequently refer to a lack of corroboration (ICTY: 7.32%; ICTR: 24.64%; ICC: 19.00%), which has a significant relationship with negative outcomes only at the ICTR. 14 As seen in the ICTY and ICC MCA plots, the uncorroborated category is positioned in between the positive, negative and partial outcomes, indicating that this factor can be found in assessments with any of the three outcomes.
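A bivariate check of the kind reported here can be illustrated with the phi coefficient, which is the Pearson correlation between two 0/1 variables. This is a sketch under assumptions: the source does not say which correlation coefficient or significance test was used, and the function name `phi_coefficient` and the six toy observations are hypothetical.

```python
from math import sqrt


def phi_coefficient(x, y):
    """Phi coefficient between two equally long 0/1 sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    # Cells of the 2x2 contingency table.
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        raise ValueError("phi is undefined when a variable is constant")
    return (n11 * n00 - n10 * n01) / denom


# Hypothetical toy data: factor mentioned (1/0) and negative outcome (1/0).
inconsistent = [1, 1, 1, 0, 0, 0]
negative_outcome = [1, 1, 0, 0, 0, 1]

if __name__ == "__main__":
    print(round(phi_coefficient(inconsistent, negative_outcome), 3))
```

A positive value indicates that the factor is mentioned more often for witnesses with a negative outcome than for the rest. Whether such a value is statistically significant would need a separate test on the full sample.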
Insiders receiving a positive evaluation demonstrate a cluster of quality-related factors that largely overlap across the tribunals (see the green circles in Figures 2–4). Insiders who demonstrate high levels of knowledge, provide plausible, coherent and detailed accounts and are strong on the stand appear to have the most positive outcomes at all three ICCTs, though some variations are present. The ICC judges more frequently mention the positive aspects of the testimonies, though that does not automatically mean correlation with the outcomes. As seen in Figure 2, the witnesses positively evaluated in the ICC sample are less distinct and may overlap with the partial ones. When it comes to the ICTY and ICTR, both the MCA plots and correlation analyses show a clearer picture. Although positive factors are mentioned less frequently here than at the ICC, a number of them are significantly correlated with positive assessment outcomes: high knowledge, high coherence, strong performance, plausibility and corroboration. 15
Objectivity indicators: Cross-institutional (in-)consistencies
In contrast to the relative similarities in the treatment of quality indicators, we observe more cross-institutional differences in the patterns of objectivity indicators. First, based on the MCA, objectivity indicators appear to be more closely associated with the outcomes at the ICTY and the ICTR compared with the ICC. Second, the most common objectivity issues differ between the institutions. Whereas self-interest is by far the most common objectivity factor at the ICTY (29.00% of the assessments) and the ICTR (50.08%), it is not frequent at the ICC (17.00%). Both indications of self-interest and a lack of it are significantly related to assessment outcomes at the ICTY/ICTR, 16 as is also visible in Figures 3–4. Finally, the analysis reveals that demeanour is an important indicator of a witness’s truthfulness, or a lack of it. The MCA and bivariate correlations reveal that demeanour, both positive and negative, is related to assessment outcomes at all three institutions, 17 but it is especially prevalent at the ICC. Here, a negative impression of the witness’s behaviour on the stand is both the most common objectivity factor and the one with the strongest relation to negative assessment outcomes. 18 The second most common factor significantly related to negative outcomes at the ICC is contamination, 19 which is rarely present in the dataset of the ICTY (2.65%) and the ICTR (4.54%).
Another noteworthy finding is that several objectivity indicators tend to co-occur with certain quality factors. For instance, as we have already discussed, demeanour does not appear to be a useful indicator of whether a witness is telling the truth and thus whether the testimony should be relied upon. However, we find negative demeanour very close to implausible (ICTY/ICTR) or low coherence, contradicted, inconsistent (ICC) testimony, and positive demeanour close to high coherence, no self-interest (ICTY) indicators. This points to the possibility that the judicial decision-makers are influenced by the subjective impression of the witness’s behaviour on the stand, which carries over into the assessment of the quality of the testimony, or the other way around: poor quality of the testimony may affect the appraisal of a witness’s behaviour on the stand. Moreover, indicators of knowledge commonly co-occur with evaluating testimony as corroborated or high quality (ICTR, ICC), which shows the interactions between the external and internal quality factors.
Prosecution and defence – same standards, different outcomes?
To explore whether insiders called by the prosecution and insiders called by the defence are approached similarly, we included the calling party in the MCA. As mentioned before, all three MCA plots show the variables defence/prosecution on opposing sides of the axes, with prosecution falling closer than defence to the positive assessment indicators.
Looking at the numbers, prosecution (OTP) witnesses are indeed assessed positively more often at all three institutions, as demonstrated in Figure 5. We see that the difference is most pronounced at the ICTY, where OTP insiders were assessed positively four times more frequently compared with those for the defence (DEF). The difference is smaller at the ICTR and the ICC, where OTP witnesses are found credible and reliable 1.3 to 1.6 times more often than those for the defence. In terms of negative outcomes, the difference is the largest at the ICTR, where OTP insiders receive a negative determination twice as often as the DEF ones. For the ICC, the ratio is around 1.3, and at the ICTY negative outcomes are similar in relation to both parties. Overall, the assessments at the ICC appear to be the most balanced and independent from the party that is calling the witness.

Figure 5. Insider assessment outcomes per calling party: ICTR, ICTY, ICC.
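The "times more often" comparisons between prosecution and defence insiders are ratios of two outcome rates. A minimal sketch of that arithmetic follows; the function name and the counts are hypothetical and chosen only to illustrate the calculation, not taken from the study's data.

```python
def outcome_rate_ratio(otp_with_outcome, otp_total, def_with_outcome, def_total):
    """Ratio of the OTP outcome rate to the DEF outcome rate.

    Computed by cross-multiplication, (a / b) / (c / d) == (a * d) / (b * c),
    so that only one division is performed.
    """
    if otp_total <= 0 or def_total <= 0:
        raise ValueError("each party needs at least one assessed witness")
    if def_with_outcome == 0:
        raise ValueError("ratio is undefined when the defence rate is zero")
    return (otp_with_outcome * def_total) / (otp_total * def_with_outcome)


# Hypothetical counts: 40 of 100 OTP insiders and 10 of 100 DEF insiders assessed positively.
ratio = outcome_rate_ratio(otp_with_outcome=40, otp_total=100,
                           def_with_outcome=10, def_total=100)
print(ratio)  # a ratio above 1 means the outcome is more frequent for OTP insiders
```

A ratio of this kind describes relative frequency only; it says nothing about the underlying quality of the witnesses each party called.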
The relatively high number of prosecution insiders found wholly credible and reliable by the ICTY judges, in comparison with defence insiders, may raise concerns about fair and equal treatment, although our calculations cannot show the factual quality of the witnesses presented. Considering the factors mentioned in relation to these outcomes, we find that insiders called by the defence at the ICTY are more frequently contradicted by other evidence (77.53% DEF, 56.90% OTP), which could be related to the fact that the prosecution carries the burden of proving the case and thus collects more evidence than the defence. However, we also find that the ICTY judges are more often concerned with the behaviour of defence witnesses (33.1% DEF, 9.0% OTP), which is a more subjective factor. Comparing that with the ICTR, where OTP witnesses are more frequently dismissed as non-credible and unreliable, we find more contradictions by other evidence (26.07% DEF, 35.67% OTP) and inconsistencies (17.82% DEF, 50.32% OTP), which are serious quality deficiencies. We find no differences of such magnitude in the truthfulness factors. Thus, at the two institutions where differences exist between the approach to the prosecution witnesses and to the defence witnesses, it seems that the negative outcomes correspond to the number of negative factors cited by the judges, such as contradictions with other evidence or inconsistencies.
Discussion and conclusions
Determination of the truthfulness and accuracy of witnesses in criminal proceedings is by definition a balancing act, a result of an interplay of various factors relating to each individual witness, assessed in the context of the evidentiary record as a whole. With this study, we set out to uncover precisely which factors the judges of three major criminal courts and tribunals – the ICTY, ICTR and ICC – mention in determining the reliability and trustworthiness of insider witnesses, and how the factors relate to the outcomes of judicial assessments. Our findings highlight the significance of the issue: close to one-third of insiders called by the ICTR and almost half of those called by the ICC were deemed seriously lacking in credibility and reliability and thus were either dismissed, used in a limited manner or used only if corroborated. This high rate counteracts the assumption held by some observers that ‘international courts are often willing to give international witnesses the benefit of the doubt with respect to meeting relevance and credibility requirements’ (Fyfe, 2018: 163), at least as far as it relates to insider witnesses.
The overall practice of assessing insider witnesses at the three ICCTs emerges as broadly consistent, especially as regards the negative assessments. Negative factors are mentioned more frequently, and have more significant relationships with the assessment outcomes, than do positive factors. This is found both for objectivity and for quality indicators. In all three plots, the negative outcomes are closely clustered together with the negative witness and testimonial characteristics, in contrast to the positive and partial outcomes. Further, factors that could be seen as two sides of the same coin (for example, positive and negative demeanour) do not seem to have mirror effects on the outcomes. It appears that positive outcomes are to an extent defined by the lack of negative factors, rather than by an abundance of positive ones. As such, the judges seem to have a stronger understanding of an insider witness who is ‘beyond belief’, but less of a clear-cut approach towards making a positive determination.
Besides this, it is clear that judicial decision-makers value external validation above internal quality and focus on verifiable aspects, such as (lack of) corroboration, contradiction by other evidence or inconsistencies with prior statements. This approach is consistent with holistic means of evidence assessment (Byrne, 2007: 637–8; McDermott, 2017: 688) and with prior research findings on the prominence of inconsistencies in witness assessments (Combs, 2017). The variations in the extent to which the chambers engaged in external validation may also depend on the different contexts of the institutions, not all of which had significant amounts of evidence to compare insider testimonies against. We find that contradiction by other evidence has a more prominent role at the ICTY/ICC than at the ICTR. This could be the result of the ICTR’s lack of access to a documentary or otherwise diverse evidence base, compared with the ICTY or the ICC (Combs, 2018: 235).
Moreover, we note the comparatively lower prominence of objectivity concerns as compared with quality factors, especially at the ICC. This finding confirms prior research on the topic showing that the content of the message is more important than the credibility of the source, and that positive indicators of quality may outweigh potential truthfulness concerns (Chlevickaite and Hola, 2016: 698; Mondak, 1990). In other words, where the judges decide to rely on a particular insider, in whole or in part, it is mainly because of the quality of the contents of the testimony itself, and not the (lack of) objectivity, which is somewhat in contrast to the cautious approach proclaimed by the ICCTs. This finding indicates that judges follow the logic of legal fact-finding: in situations where witness testimony presents a content-based concern (for example, inconsistency or contradiction), the assessor is prompted to consider whether the reasons behind this issue are honest or malicious. Namely, is the witness honestly mistaken, confused or misleading the court? The answer may have implications for how the rest of the testimony is appraised. In such a situation, the decision-maker may be aided by the presence or absence of objectivity concerns: are there any known reasons for the witness to be motivated to lie? This question is less likely to arise where no quality issues are detected, which helps to explain the primacy of quality indicators found in the analysis.
The most unexpected findings, in contrast to both previous research on the topic and the witness assessment framework set out by the judges, concern competence indicators. First and foremost, they were mentioned with such low frequency that it was difficult to account for them in the calculations overall. Hence, although international criminal justice researchers highlight the potential for cross-cultural mistakes and issues with trauma and witness memory (Fyfe, 2018; Zahar, 2010), or even claim that serious testimonial deficiencies or lying are attributed ‘to innocent causes that do not impact the witness’s credibility’ (Combs, 2010: 189), it appears that such excuses are disregarded in relation to insiders. This relative disdain towards competence factors demonstrates a lack of attention to witnesses’ background and experiences and a rather narrow focus on obtaining information useful for ascertaining the facts of the case. Our findings confirm the view that trauma in international criminal proceedings ‘is an experience that belongs to victims’ (Mohamed, 2015: 1157) or, at most, to child soldiers (Mohamed, 2015: 1189–90). We also see that, where we find mentions of trauma at the ICC, it is exclusively related to former child soldiers appearing as witnesses. Further, not only trauma but also concerns about time lapse, language or even memory barely appear in the judgments in relation to insiders. Clearly, these issues, if present for crimes-based witnesses, are unlikely to be completely inconsequential for insiders – individuals coming from the same backgrounds and talking about events from the same time period. Ignoring competence-related issues overlooks the fact that, when it comes to evidence assessment, such issues may hinder the understanding of the credibility of the witness, or may explain certain shortcomings or omissions in their testimony.
Our findings contain several areas of concern for the practice of international criminal law. First, they underline the need for more rigorous external validation, thorough examination of the confirmatory but, even more so, of the contradictory evidence, and utmost prudence in re-interviewing witnesses, as inconsistencies are still a major impediment to testimonial reliability in the eyes of the judges. We also need to acknowledge that the root of this problem might be double sided. On the one hand, it could be that investigative and prosecutorial strategies, mentioned above, need improving – to focus extensively on training and developing procedures to reduce re-interviewing and, insofar as possible, to introduce transcript instead of summary statements. Furthermore, training and procedures to avoid tunnel vision/confirmation bias (de-biasing techniques) could contribute to more rigorous examination of contradictory evidence. On the other hand, it could be that the judges’ approach is overly scrupulous and does not permit inconsistencies that are a natural product of human memory (Lacy and Stark, 2013), though some research indicates that incidences of inconsistency are serious (Combs, 2018). Second, the findings uncovered a serious gap in taking witness competence factors into consideration. Again, the reasons for this may be manifold – inter alia, the (adversarial) courtroom style, a lack of training on the vulnerability of witnesses on the part of the lawyers (who then do not demonstrate its significance in court), or a lack of training of judges. A better understanding, through appropriate training, of the role that witness competence plays in recall and communications could be another step towards improving the assessment of insider witnesses.
This research constitutes the first systematic analysis and cross-institutional comparison of insider witness assessment practice at an international level. We acknowledge the limitations inherent in our research. First, we are relying on the ‘reasoned opinions’ of the trial chambers and, as such, we can assess only what the judges say they do. This approach cannot uncover the biases inherent in the decision-making process or factors that are not explicitly mentioned in the judgments. Further, some aspects of witness assessments, especially factors related to a witness’s own criminal involvement in the events or their relationships with the accused or other witnesses, are sometimes redacted for security purposes and thus omitted from the analysis. However, regarding both of these issues we expect the trial judgments to be true to their purpose and provide a reasoned, transparent opinion to inform both the public and the potential appeals proceedings. Moreover, quantitative analysis detaches the individual factors mentioned in relation to witness testimonies from the broader setting of the judicial argumentation in a specific case. Narrower, in-depth qualitative research into judicial narrative construction as it relates to fact-finding has the potential to add another layer of explanation to our findings. Finally, we did not enquire into individual or type-based differences among insider assessments, nor did we look into who the insider witnesses under examination were. A deeper analysis of the characteristics of insider witnesses (based on their authority and position within the organization or their role, for example) across the tribunals could provide a more nuanced understanding of the patterns observed in their assessments.
Despite these limitations, we have shown that the factors related to insider witness assessment outcomes are generally similar across the tribunals, and they are in line both with the approach set out by the judges and with findings from other research. Further studies that we are planning to conduct will involve an experimental approach to elicit the views of practitioners; this could be useful in explaining the findings and the ‘invisible’ side of witness assessments, as well as comparing the approach towards witness assessments between different organs of the courts: investigations, prosecutions and the judiciary.
Supplemental Material
sj-pdf-1-euc-10.1177_1477370821997343 – Supplemental material for Suspicious minds? Empirical analysis of insider witness assessments at the ICTY, ICTR and ICC
Supplemental material, sj-pdf-1-euc-10.1177_1477370821997343 for Suspicious minds? Empirical analysis of insider witness assessments at the ICTY, ICTR and ICC by Gabrielė Chlevickaitė, Barbora Holá and Catrien Bijleveld in European Journal of Criminology
Appendix
Contribution from assessment indicators to total inertia and to Dimensions 1–2.
| Category | Indicator | Overall inertia (percent): ICTR | Overall inertia (percent): ICTY | Overall inertia (percent): ICC | Dimension 1 (percent): ICTR | Dimension 1 (percent): ICTY | Dimension 1 (percent): ICC | Dimension 2 (percent): ICTR | Dimension 2 (percent): ICTY | Dimension 2 (percent): ICC |
|---|---|---|---|---|---|---|---|---|---|---|
| Objectivity | Bias: relationships | .010* | .039 | .038 | .000 | .013 | .043 | .016 | .157 | .000 |
| | Character | .015 | .034 | .010* | .013 | .038 | .000 | .019 | .042 | .001 |
| | Contamination | .013 | .011* | .036 | .015 | .004 | .015 | .009 | .022 | .009 |
| | Demeanour negative | .011 | .049 | .011 | .001 | .050 | .001 | .025 | .073 | .025 |
| | Demeanour positive | .014 | .057 | .014 | .010 | .071 | .010 | .029 | .019 | .029 |
| | No self-interest | .021 | .033 | .016 | .031 | .034 | .015 | .018 | .014 | .024 |
| | Self-interest | .036 | .045 | .024 | .014 | .011 | .006 | .062 | .212 | .052 |
| Competence | Competence | .048 | .022 | .038 | .095 | .028 | .052 | .000 | .017 | .005 |
| Quality | Corroborated | .049 | .057 | .040 | .100 | .086 | .057 | .012 | .001 | .024 |
| | Contradicted | .029 | .026 | .020 | .000 | .030 | .003 | .066 | .020 | .064 |
| | Detail high | .033 | .015 | .027 | .057 | .016 | .030 | .022 | .001 | .033 |
| | Detail low | .018* | .011* | .012 | .000 | .006 | .001 | .011 | .003 | .010 |
| | Implausible | .023 | .034 | .016 | .001 | .026 | .001 | .054 | .064 | .054 |
| | Inconsistent | .079 | .028 | .066 | .108 | .000 | .087 | .069 | .137 | .044 |
| | Knowledge high | .038 | .025 | .030 | .068 | .034 | .038 | .015 | .003 | .026 |
| | Knowledge low | .018* | .016* | .014* | .029 | .000 | .002 | .006 | .013 | .007 |
| | Performance strong | .058 | .034 | .045 | .105 | .041 | .056 | .025 | .004 | .042 |
| | Performance weak | .040 | .023 | .031 | .010 | .002 | .019 | .083 | .113 | .073 |
| | Plausible | .030 | .025 | .023 | .050 | .026 | .025 | .021 | .003 | .031 |
| | Coherence high | .054 | .067 | .045 | .103 | .085 | .061 | .027 | .022 | .044 |
| | Coherence low | .029 | .021 | .022 | .024 | .011 | .022 | .045 | .054 | .035 |
| | Uncorroborated | .041 | .012 | .048 | .031 | .011 | .053 | .073 | .000 | .060 |
| Party | Prosecution | .114 | .158 | .096 | .165 | .186 | .121 | .000 | .000 | .060 |
| | Defence | .114 | .160 | .088 | .165 | .190 | .107 | .000 | .000 | .063 |
Note: *indicates variables that are poorly represented in a given dimension (sqcorr < .5) and thus need to be interpreted with caution.
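To make the "overall inertia" columns more concrete, the sketch below computes each category's share of total inertia from a 0/1 indicator matrix, using the standard correspondence-analysis definition of inertia. It is an illustration under stated assumptions: the tiny dataset is invented, the function name is hypothetical, and the statistical software used for the article may apply a different MCA variant (for example, one based on the Burt matrix with adjusted inertias), so the numbers are not comparable to the table.

```python
def category_inertia_shares(indicator_rows):
    """Share of total inertia contributed by each column of a 0/1 indicator matrix.

    Each row is one assessment; each column is one category (e.g. 'outcome positive').
    Assumes every row and every column contains at least one 1.
    """
    n_cols = len(indicator_rows[0])
    grand_total = sum(sum(row) for row in indicator_rows)
    row_mass = [sum(row) / grand_total for row in indicator_rows]
    col_mass = [sum(row[j] for row in indicator_rows) / grand_total
                for j in range(n_cols)]

    col_inertia = []
    for j in range(n_cols):
        inertia = 0.0
        for i, row in enumerate(indicator_rows):
            expected = row_mass[i] * col_mass[j]   # share expected under independence
            observed = row[j] / grand_total        # observed share of the grand total
            inertia += (observed - expected) ** 2 / expected
        col_inertia.append(inertia)

    total_inertia = sum(col_inertia)
    return [value / total_inertia for value in col_inertia]


# Invented example with two variables, each split into two categories.
# Columns: outcome positive, outcome negative, demeanour positive, demeanour negative.
toy_assessments = [
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 0, 1],
]
shares = category_inertia_shares(toy_assessments)
print([round(share, 3) for share in shares])
```

In this indicator-matrix formulation, rarer categories carry a larger share of the total inertia; this is a property of the method itself and not a finding about the data.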
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is funded by the NWO (Dutch Organisation for Scientific Research) Research Talent grant for Gabriele Chlevickaite, grant number 406.17.519.