Sage Journals: Discover world-class research

Abstract

Objective: The criminal justice system relies on the opinions of expert witnesses to assist in decisions about fitness to stand trial (FST) and verdicts of not guilty by reason of mental illness (NGMI). The aim of the present study was to assess the level of agreement between experts about these legal issues using a consecutive series of serious criminal matters in New South Wales.

Methods: Pairs of reports from 110 consecutive criminal matters completed by the New South Wales Office of the Director of Public Prosecutions between 2005 and 2007 were examined. The opinions of experts about FST and NGMI were recorded.

Results: Agreement about FST was fair–moderate (experts engaged by opposite sides, κ = 0.293; experts engaged by the same side, κ = 0.471), although there was a higher level of agreement in homicide matters. Agreement about NGMI was moderate–good (experts engaged by opposite sides, κ = 0.508; experts engaged by the same side, κ = 0.644) and there was a higher level of agreement when the experts also agreed about the diagnosis of schizophrenia. Further analysis using generalized estimating equations did not find a higher level of agreement about FST or NGMI in pairs of reports containing the opinion of experts from the same side.

Conclusions: Little evidence was found for bias in expert opinions about either FST or NGMI, but the comparatively low level of agreement about FST suggests the need for reform in the way that FST is assessed.

Keywords

agreement, bias, expert evidence, fitness to stand trial, not guilty due to mental illness

In most jurisdictions, criminal courts rely on evidence from mental health experts to determine whether a person is fit to stand trial (FST) or not guilty due to mental illness (NGMI) [1]. An incorrect decision about fitness (also referred to as competence or capacity) to stand trial can deny a defendant the opportunity to put their case, or conversely can result in an unfit defendant receiving an unfair trial. An incorrect finding about NGMI can result in a criminal conviction instead of appropriate hospital treatment, or allow a malingerer to escape an appropriate verdict.

FST and NGMI are legal issues to be decided by the courts, but in practice courts rely on the opinions of mental health experts [2–4]. Hence, in many jurisdictions expert witnesses are allowed to express opinions about the ‘ultimate issue’ before the court, namely whether the accused is FST or might be NGMI.

There are only a few studies that have examined the reliability of the opinions of mental health professionals about FST or NGMI [5–7], and two of those used opinions about hypothetical cases [6, 7]. The lack of empirical studies is surprising because it has often been alleged that expert evidence is biased [8–12], and in many jurisdictions the opinions of experts are presented in the form of written reports that can be readily compared [13, 14].

The accusation that opinions about FST are biased may have stemmed from the finding of an association between demographic variables and FST [15], although subsequent studies found that clinical factors, rather than sex or race, were the main predictors of opinion about FST [2, 16]. None of these studies went on to assess the level of agreement between the experts. Skeem et al. found that agreement between experts about FST in Utah was satisfactory (κ = 0.64) but did not report if there was evidence of bias arising from whether the expert was engaged by the defence or by the prosecution [5].

Although the level of capacity required to be considered fit to participate in a trial is unlikely to vary much between English-language jurisdictions, there are differences in the way FST is determined. In some North American jurisdictions and in the UK, semi-structured interviews have been introduced in an attempt to overcome a perceived lack of reliability in opinions about FST [17–19]. These instruments, however, may not capture all the relevant clinical information and do not address variation in the capacity required to participate in proceedings of differing complexity, and hence are unlikely to replace an expert's opinion [20]. Semi-structured interviews for the assessment of NGMI have not been developed, probably because the effect of delusional beliefs on moral reasoning is difficult to quantify. Hence, in every jurisdiction the opinion of experts is considered when making decisions about NGMI.

In Australia FST is assessed according to the common law standard in the Victorian case of R v Presser, which describes what an accused needs to understand in order to be tried [21], and R v Kesavarajah, in which the High Court of Australia determined that a person must remain fit for the duration of the trial [22]. Assessments for fitness using the Presser criteria examine the defendant's knowledge of the procedure followed in adversarial trials, intellectual function and communication skills. FST is often assessed some months prior to a trial, which requires the expert to make a prediction about the defendant's mental performance in the future.

By contrast, an opinion about whether an accused is NGMI is based on a retrospective assessment of the person's state of mind at the time of the offence and is usually assisted by documents such as medical records and contemporaneous descriptions of the person's behaviour. In most English-language jurisdictions the definition of the mental illness defence is derived from the 19th-century English case of R v McNaghten, which turns on the extent to which a psychotic illness affected the person's ability to understand that their actions were wrong [23]. In Australia the McNaghten rules were extended by the decision in R v Porter, which considered the effect of disorganized thinking on moral reasoning [24]. Hence, at common law, or under the criminal codes of some Australian States, a defence of mental illness is based on (i) the diagnosis of a severe form of mental illness that (ii) resulted in a false belief or grossly disorganized thinking, which (iii) prevented the person from understanding that their behaviour was morally wrong.

In the present study we examined the level of agreement in expert opinions about FST and NGMI using a consecutive series of criminal matters in which there were written reports by two or more mental health experts. We developed three a priori hypotheses.

The first hypothesis was that agreement about FST and NGMI would be greater in pairs of experts engaged by the same side than in pairs of experts from opposite sides. A higher level of agreement about FST or NGMI between experts engaged by the same side would suggest that the role of experts in adversarial proceedings was a source of bias.

The second hypothesis was that there would be an association between agreement about FST and agreement about the diagnosis of disorders that often affect FST. If confirmed, this would suggest that opinions about FST were associated with clinical factors.

The third hypothesis was that there would be an association between agreement about NGMI and pairs of reports in which there was also agreement about the diagnosis of schizophrenia. If confirmed, this would suggest that agreement about NGMI was associated with the diagnosis of serious mental illness.

Methods

Sample of reports

Copies of reports from a consecutive series of 110 criminal cases concluded between 2005 and 2007, in which there were two or more reports written by psychiatrists or psychologists, were made available by the Office of the Director of Public Prosecutions (ODPP) in New South Wales (NSW), Australia. There were a total of 270 reports, 226 by 30 psychiatrists and 44 by 15 psychologists. Defence experts wrote 148 reports and prosecution experts wrote 122 of the reports.

The 110 cases all involved serious offences that were dealt with in the higher courts. They included 30 charges of murder or attempted murder (referred to as homicide offences), 35 charges of wounding or serious assault, 14 charges of sexual assault, 12 serious property offences, 10 drug cases and the remaining eight cases included fraud, kidnapping, arson and firearms offences. Permission to perform the study was obtained from the Justice Health Research and Ethics Committee and the NSW Director of Public Prosecutions.

Data collection

The data points collected were: (i) whether the expert was engaged by the defence or prosecution; (ii) whether the expert was a psychiatrist or a psychologist; (iii) the defendants’ age, gender, marital status, occupation and criminal convictions; (iv) whether the charge was of a homicide offence; (v) whether the main diagnosis was intellectual disability, acquired brain injury or schizophrenia spectrum psychosis (defined as schizophrenia, schizophreniform disorder, schizoaffective disorder, delusional disorder or psychotic disorder not otherwise specified); and (vi) the expert's opinion about FST and NGMI.

All three authors independently rated the reports from 28 cases, with one disagreement (about NGMI in a case in which automatism was raised). Each of the remaining 82 cases was rated by two authors, with no disagreements.

Statistical analysis

Analysis of agreement between three or more raters presents a methodological dilemma arising from the number of degrees of statistical freedom [25]. For example, when a defendant is assessed by three experts (A, B and C) there are three pairs (AB, AC and BC). If two pairs of reports agree then the third pair must also agree.
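The constraint can be checked by brute force. The following is a minimal Python sketch, assuming each expert's opinion is coded as a binary value (for example FST = True, not FST = False); A, B and C are the hypothetical experts of the example above, and the function names are illustrative. It confirms that agreement in AB and AC forces agreement in BC, and that three binary opinions can produce only one or three agreeing pairs, never two, so the three pairs from one case do not vary freely.

```python
from itertools import product


def pair_agreement(a: bool, b: bool, c: bool) -> dict[str, bool]:
    """Agreement within each of the three report pairs for binary opinions."""
    return {"AB": a == b, "AC": a == c, "BC": b == c}


def third_pair_is_forced() -> bool:
    """Check every opinion combination: do AB and AC agreeing force BC to agree?"""
    for a, b, c in product([True, False], repeat=3):
        pairs = pair_agreement(a, b, c)
        if pairs["AB"] and pairs["AC"] and not pairs["BC"]:
            return False
    return True


def possible_agreeing_pair_counts() -> set[int]:
    """Numbers of agreeing pairs that can occur among three binary opinions."""
    return {
        sum(pair_agreement(a, b, c).values())
        for a, b, c in product([True, False], repeat=3)
    }


print(third_pair_is_forced())           # True
print(possible_agreeing_pair_counts())  # {1, 3}
```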

There are statistical techniques that allow for multiple raters, but in the present study the number of assessors varied between cases. Solutions that were considered included examining only some of the reports in each case or excluding cases with three or more reports [26, 27]. The arbitrary omission of cases or report pairs, however, could alter the level of agreement. Uncertainty about the effect of degrees of statistical freedom prevented an assessment of the overall level of agreement about FST and NGMI in the present study, but did not affect the examination of report pairs from opposing sides. For example, if report writers A and B are both engaged by the defence and the prosecution engages C, the pairs AC and BC are independent because AB is omitted. A lack of statistical independence does occur in the few cases with three reports by experts from the same side, which inflates the overall agreement [27], so a second analysis of agreement was performed after omitting the third report pair (chosen on the basis of the date of the interview) in those cases.
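The pair bookkeeping in this example can be sketched in a few lines of Python. The sketch assumes each case is stored as a mapping from a hypothetical expert label to the side that engaged that expert; the function name is illustrative. It enumerates every report pair in a case and separates same-side from opposite-side pairs. For the example above, AB is the only same-side pair, and in this example neither opposite-side pair (AC, BC) is determined by the other.

```python
from itertools import combinations


def split_pairs_by_side(roles: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (same-side pairs, opposite-side pairs) for the reports in one case."""
    same_side, opposite_side = [], []
    for x, y in combinations(sorted(roles), 2):
        pair = x + y
        if roles[x] == roles[y]:
            same_side.append(pair)
        else:
            opposite_side.append(pair)
    return same_side, opposite_side


same, opposite = split_pairs_by_side({"A": "defence", "B": "defence", "C": "prosecution"})
print(same)      # ['AB']
print(opposite)  # ['AC', 'BC']
```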

Kappa statistics [28] were used as an initial measure of the level of agreement about FST and NGMI. The level of agreement indicated by kappa has been classified as poor, <0.2; fair, 0.2–0.4; moderate, 0.4–0.6; good, 0.6–0.8; and very good, 0.8–1.0 [29].
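For two raters and a binary opinion, kappa compares the observed agreement with the agreement expected by chance from each rater's marginal rates. The following is a minimal Python sketch of Cohen's kappa together with the verbal bands quoted above; the function names and the counts in the example are hypothetical. It cannot be used to recompute Table 1, because that table pools both kinds of disagreement into one column and kappa depends on how the disagreements split.

```python
def cohen_kappa(a: int, b: int, c: int, d: int) -> float:
    """Cohen's kappa for two raters making a binary judgement.

    a: both experts say "yes"; d: both say "no";
    b: first expert "yes", second "no"; c: first "no", second "yes".
    """
    n = a + b + c + d
    observed = (a + d) / n
    # Chance agreement from each expert's marginal "yes" and "no" rates
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (observed - expected) / (1 - expected)


def kappa_label(kappa: float) -> str:
    """Verbal label using the cut-points quoted in the text."""
    for upper, label in [(0.2, "poor"), (0.4, "fair"), (0.6, "moderate"), (0.8, "good")]:
        if kappa < upper:
            return label
    return "very good"


k = cohen_kappa(a=20, b=5, c=5, d=10)  # hypothetical counts for 40 report pairs
print(round(k, 3), kappa_label(k))     # 0.467 moderate
```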

The proportions of experts from each side who found the defendant FST or NGMI were compared using the χ² test. Definitive conclusions about the tendency of experts to offer particular opinions about NGMI or FST, however, cannot be drawn using this method or from the kappa analysis, because some defendants were seen by multiple experts from the same side.

Univariate and multivariate generalized estimating equations (GEEs) were used because they allow the inclusion of clustered data (in this case clustered by case), in which observations are not statistically independent. The defendant was used as the subject variable and the report pair was the within-subject variable. GEEs were used to assess the association between report pairs with the same expert role and the dependent variables of agreement about FST or NGMI. Demographic variables, the expert's profession, the defendant's criminal record and the presence of a homicide charge were controlled for in a multivariate GEE main-effects model if their univariate GEE p-value was below 0.1. Computational options included a binomial probability distribution with a logit link function and an unstructured correlation matrix.
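With a logit link, each GEE coefficient B is a change in the log-odds of agreement, so exponentiating B and its Wald confidence limits gives an odds ratio. The following is a short Python sketch, assuming the B values in Tables 2 and 3 are on that log-odds scale, as the computational options above imply; the function name is illustrative. It uses the multivariate homicide row of Table 2 (B = 0.327, 95% CI 0.185–0.468).

```python
import math


def to_odds_ratio(b: float, lower: float, upper: float) -> tuple[float, float, float]:
    """Exponentiate a log-odds coefficient and its confidence limits."""
    return math.exp(b), math.exp(lower), math.exp(upper)


odds, lo, hi = to_odds_ratio(0.327, 0.185, 0.468)
print(f"OR {odds:.2f} (95% CI {lo:.2f}-{hi:.2f})")  # OR 1.39 (95% CI 1.20-1.60)
```

On that assumption, a homicide charge would correspond to roughly 1.4 times the odds of two experts agreeing about FST, adjusted for the other factors in the model.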

A power analysis was performed prospectively to estimate the sample size needed for the GEEs [30]. We assumed that (i) one-third of report pairs would disagree about FST and one-third would disagree about NGMI; (ii) two-thirds of report pairs that agreed about FST and two-thirds of report pairs that agreed about NGMI would be written by experts from the same side; and (iii) one-third of report pairs that disagreed about FST and one-third of report pairs that disagreed about NGMI would be written by experts from opposite sides. This determined that the study would require 26 report pairs with disagreement and 79 pairs with agreement to have an 80% chance of detecting an association between agreement and the roles of the experts at a significance level of 0.05.

Statistical analyses were performed using SPSS for Windows version 15.0 (SPSS, Chicago, IL, USA).

Results

Opinions regarding FST

An opinion about FST was found in 198 reports about 82 defendants. In 54 cases there were two reports, in 22 cases there were three, and in six cases there were four reports. Hence there were 156 report pairs (54+[22×3]+[6×6]).
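The pair count follows from the fact that a case with k reports contributes k(k − 1)/2 report pairs. A small Python sketch of that arithmetic (the function name is illustrative), taking a mapping from reports per case to number of cases:

```python
from math import comb


def count_report_pairs(cases_by_report_count: dict[int, int]) -> tuple[int, int, int]:
    """Return (cases, reports, report pairs) from {reports per case: number of cases}."""
    cases = sum(cases_by_report_count.values())
    reports = sum(k * n for k, n in cases_by_report_count.items())
    pairs = sum(comb(k, 2) * n for k, n in cases_by_report_count.items())
    return cases, reports, pairs


print(count_report_pairs({2: 54, 3: 22, 4: 6}))  # (82, 198, 156)
```

The same arithmetic applied to the NGMI opinions reported below ({2: 43, 3: 12, 4: 6}) gives 61 defendants, 146 reports and 115 report pairs.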

Defence experts were less likely to find that the accused was FST (58%, 59/101 reports) than prosecution experts (77%, 75/97 reports; χ² = 8.08, p = 0.004). There were four pairs of reports in which the prosecution expert found a defendant unfit for trial after the defence expert had found the defendant FST, and 10 pairs of reports in which the defence expert found that the defendant was not FST but the prosecution expert found them FST. Although these results appear to suggest that defence experts were less likely to find the defendant FST, they might be explained by the opinions about defendants who were assessed by more than one expert from the same side.
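The reported χ² can be reproduced from the report counts. The following is a minimal Python sketch, assuming a Pearson χ² test without continuity correction (the text does not state which variant was used, but the uncorrected value matches); the function name is illustrative. Like the test in the text, it treats each report as independent, which the Methods caution against.

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) and p-value for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of the chi-square distribution, 1 df
    return chi2, p


# Rows: defence, prosecution; columns: FST, not FST
chi2, p = pearson_chi2_2x2(59, 101 - 59, 75, 97 - 75)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")  # chi2 = 8.08, p = 0.004
```

Applied to the NGMI counts reported later (defence 39 of 79, prosecution 38 of 67), the same function gives χ² ≈ 0.79 with p close to the reported 0.375.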

Agreement about FST in report pairs from opposite sides was fair, and agreement in report pairs written by experts from the same side was moderate. Kappas for experts from the same side were modestly increased by four cases with three expert opinions from the same side (Table 1). GEE indicated that experts from the same side were not more likely to agree about FST than experts from opposite sides. In univariate GEEs, agreement about FST was higher in homicide matters, in report pairs written by experts from the same profession and in pairs that agreed about a diagnosis of intellectual disability (Table 2), but a multivariate GEE found that the only factor independently associated with agreement about FST was the presence of a homicide charge.

Table 1.

Agreement between experts from the same and opposite adversarial sides about FST and availability of an NGMI defence

Report pairs	N	Agree present	Agree absent	Disagree	κ (95% CI)
Same side FST†	48	20	14	14	0.471 (0.142–0.690)
Opposite side FST	108	61	14	33	0.293 (0.134–0.451)
Same side NGMI‡	35	11	18	6	0.644 (0.315–0.973)
Opposite side NGMI	80	30	30	20	0.508 (0.295–0.720)

CI, confidence interval; FST, fitness to stand trial; NGMI, not guilty by reason of mental illness. †Kappa 0.352 (95% CI 0.056–0.647) after the removal of one report pair from each of the four cases with three same-side pairs; ‡kappa 0.577 (95% CI 0.225–0.929) after the removal of one report pair from each of the four cases with three same-side pairs.

Table 2.

GEE analysis of factors associated with agreement about FST

Factor	B	SE	95% Wald CI lower	95% Wald CI upper	Wald χ²	df	p
Univariate
Age	0.000	0.0025	−0.005	0.005	0.001	1	0.970
Male	−0.191	0.0942	−0.376	−0.006	4.116	1	0.042
Employed	−0.067	0.0923	−0.248	0.114	0.532	1	0.466
Married	0.063	0.0997	−0.133	0.258	0.397	1	0.529
Prior Convictions	−0.021	0.0931	−0.203	0.162	0.050	1	0.823
Homicide matter	0.394	0.0619	0.273	0.516	40.553	1	0.000
Same adversarial role	0.021	0.0791	−0.134	0.176	0.070	1	0.791
Same profession	0.241	0.0748	0.095	0.388	10.402	1	0.001
Schizophrenia spectrum psychosis	−0.081	0.1352	−0.346	0.184	0.358	1	0.550
Acquired brain injury	−0.167	0.0978	−0.358	0.025	2.906	1	0.088
Intellectual disability	0.375	0.1275	0.125	0.625	8.646	1	0.003

Multivariate
Intercept	0.923	0.0573	0.811	1.036	259.391	1	0.000
Same adversarial role	0.058	0.0794	−0.098	0.213	0.530	1	0.467
Male	−0.068	0.0578	−0.181	0.045	1.381	1	0.240
Homicide	0.327	0.0723	0.185	0.468	20.430	1	0.000
Same profession	0.112	0.0735	−0.032	0.256	2.340	1	0.126
Acquired brain injury	−0.095	0.0762	−0.244	0.054	1.562	1	0.211
Intellectual disability	0.216	0.1327	−0.044	0.477	2.659	1	0.103

CI, confidence interval; FST, fitness to stand trial; GEE, generalized estimating equation.
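The columns of Tables 2 and 3 are linked: the Wald 95% CI is B ± 1.96 SE, the Wald χ² on 1 df is (B/SE)², and p is the two-sided normal tail probability of B/SE. The following is a minimal Python sketch (the function name is illustrative) that recomputes these quantities from B and SE, using the univariate same-profession row of Table 2 (B = 0.241, SE = 0.0748). Small differences in the last decimal place are expected because B and SE are printed rounded.

```python
import math


def wald_summary(b: float, se: float, z_crit: float = 1.96) -> dict[str, float]:
    """Wald 95% CI, Wald chi-square (1 df) and two-sided p-value for one coefficient."""
    z = b / se
    return {
        "lower": b - z_crit * se,
        "upper": b + z_crit * se,
        "wald_chi2": z ** 2,
        "p": math.erfc(abs(z) / math.sqrt(2)),  # two-sided normal tail probability
    }


res = wald_summary(0.241, 0.0748)
print(f"{res['lower']:.3f} to {res['upper']:.3f}, chi2 = {res['wald_chi2']:.2f}, p = {res['p']:.3f}")
# 0.094 to 0.388, chi2 = 10.38, p = 0.001
```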

Opinions regarding NGMI

An opinion about NGMI was found in 146 reports about 61 defendants. There were 43 cases with two reports, 12 cases with three reports, and six cases in which there were four reports, a total of 115 report pairs, (43+[12×3]+[6×6]).

Defence experts found that the accused was NGMI in 39 of 79 reports (49%) and prosecution experts concluded that the defendant was NGMI in 38 of 67 reports (57%, χ²=0.789, p = 0.375).

Agreement about NGMI between experts from opposite sides was moderate, and agreement between experts from the same side was good. Kappas for experts from the same side were increased by four cases with three expert opinions from the same side (Table 1). A GEE showed that experts from the same side were not more likely to agree about NGMI, and agreement about the diagnosis of schizophrenia-spectrum psychosis was the only factor that predicted agreement between experts about NGMI (Table 3).

Table 3.

GEE analysis of factors associated with agreement about NGMI

Factor	B	SE	95% Wald CI lower	95% Wald CI upper	Wald χ²	df	p
Univariate
Age	0.003	0.0022	−0.002	0.007	1.606	1	0.205
Male gender	0.193	0.1212	−0.045	0.431	2.537	1	0.111
Employed	−0.116	0.0933	−0.067	0.299	1.542	1	0.214
Married	−0.085	0.1163	−0.313	0.143	0.530	1	0.466
Prior convictions	0.079	0.0959	−0.109	0.267	0.685	1	0.408
Homicide matter	−0.265	0.1445	−0.549	0.018	3.337	1	0.066
Same adversarial role	0.086	0.0863	−0.083	0.256	1.005	1	0.361
Same profession	−0.136	0.0757	−0.254	0.013	3.206	1	0.073
Schizophrenia spectrum psychosis	0.507	0.1654	0.182	0.831	9.382	1	0.002

Multivariate
Intercept	0.718	0.1247	0.473	0.962	33.119	1	0.000
Same adversarial role	0.050	0.0732	−0.093	0.184	0.474	1	0.491
Schizophrenia spectrum psychosis	0.460	0.1625	0.142	0.779	8.020	1	0.005
Homicide matter	−0.125	0.1308	−0.382	0.131	0.918	1	0.338
Same profession	−0.033	0.0799	−0.189	0.124	0.166	1	0.583

CI, confidence interval; FST, fitness to stand trial; NGMI, not guilty by reason of mental illness.

Discussion

The main findings were as follows: (i) agreement about FST was fair or moderate and agreement about NGMI was moderate or good; (ii) there was little evidence of bias arising from the expert's adversarial role; (iii) agreement about FST was higher in homicide matters; and (iv) agreement about NGMI was higher when there was agreement about the diagnosis of schizophrenia-spectrum psychosis.

Although the kappa statistics suggested that experts engaged by the same side were more likely to agree about FST and NGMI than experts from opposite sides, the 95% confidence intervals of the same-side and opposite-side kappas overlapped substantially. A further analysis using GEEs found that the adversarial role of the experts was not a significant factor determining agreement about FST or NGMI, and being engaged by the same side was a weak predictor of agreement about FST and NGMI, with B coefficients below 0.1. Hence, the negative finding was unlikely to be a type II error, and a larger study would be unlikely to find evidence of significant bias in opinions about FST or NGMI.

The non-significant increase in agreement between experts from the same side found in the kappa and GEE analyses might be explained by some experts tending to agree with earlier experts from the same side in cases with three or more reports. This raises a limitation of the present study, which was that the experts were aware of the opinions of previous experts, and the prosecution usually ordered a report only after being served with a report by the defence. The lack of blinding to the opinion of other raters is an unavoidable consequence of the use of real, rather than hypothetical, cases. It did not seem to affect the opinions of experts in a study of reports prepared for civil matters, however, in which there was a low level of agreement between experts [27]. A second limitation of the present observational study is that some defence reports may not have been served on the prosecution. Reports withheld by the defence, however, would be more likely to have agreed with prosecution reports, and hence the inclusion of those reports would probably have lowered agreement between experts from the same side and increased agreement between experts from opposite sides.

The second hypothesis was partially confirmed because report pairs in which there was agreement about the presence of intellectual disability were more likely to agree about FST. Agreement about the diagnosis of schizophrenia-spectrum psychosis and acquired brain injury, however, was not associated with agreement about FST. In addition, the association between agreement about the diagnosis of intellectual disability and FST was not independent of the association between homicide and FST.

The third hypothesis was confirmed, because agreement about NGMI was associated with agreement about the diagnosis of schizophrenia-spectrum psychosis.

The higher level of agreement about FST in homicide matters could have been due to the use of more experienced experts in those matters. The 14 experts who wrote reports in the homicide matters wrote an average of 13 reports each in the present study, whereas the remaining 31 experts wrote an average of fewer than three reports each. Furthermore, the experts who provided fewer than three reports were less likely to refer to the Presser criteria, suggesting that they were less familiar with the legal test for FST.

The lower level of agreement about FST compared with opinions about NGMI may be due to the unsatisfactory nature of the Presser criteria for determining FST, which emphasize the defendant's legal knowledge and do not attempt to set a uniform threshold for FST. Moreover, assessments of FST are often performed many months apart, and disagreement about FST could be due to fluctuation in the defendant's condition or to the difficulty the expert faces in predicting the person's mental state at some future date.

Conclusions

After taking into consideration the limitations arising from an observational study of expert opinion, we found little evidence of bias in expert opinions about FST or NGMI. The high level of agreement about NGMI suggests that although the legal test seems to have little relationship to modern concepts of mental illness, the standard is well understood by mental health experts and is applied in a reliable way. The level of agreement about FST, however, was only modest and suggested the need for reform to the procedure for assessing FST, both in the criteria to be applied and the way in which the assessments are performed.

The present findings demonstrate the value of using quantitative methods to establish the extent to which courts in similar jurisdictions can rely on expert opinion.

Footnotes

Acknowledgements

We would like to thank the New South Wales Director of Public Prosecutions, Mr Nicholas Cowdrey QC, for permission to conduct the study, Mr Craig Hyland at the ODPP, for his assistance in collating the reports, and Dr Peter Arnold for his editorial assistance in the preparation of this paper. Dr Nielssen has provided expert opinion for the prosecution and defence in criminal cases in NSW.

References

1. Stone. Mental health and law: a system in transition. National Institute of Mental Health, Rockville, MD 1976.

2. Hart, Hare. Predicting fitness to stand trial: the relative power of demographic, criminal, and clinical variables. Forensic Rep 1992; 5: 53–65.

3. Reich, Tookey. Disagreements between court and psychiatrist on competency to stand trial. J Clin Psychiatry 1986; 47: 29–30.

4. Williams, Miller. The processing and disposition of incompetent mentally ill offenders. Law Hum Behav 1981; 5: 245–261.

5. Skeem, Golding, Cohn, Berge. Logic and reliability of evaluations of competence to stand trial. Law Hum Behav 1998; 22: 519–547.

6. Beckham, Annis, Gustafson. Decision making and examiner bias in forensic expert recommendations for not guilty by reason of insanity. Law Hum Behav 1989; 13: 79–87.

7. Plotnick, Porter, Bagby. Is there bias in the evaluation of fitness to stand trial? Int J Law Psychiatry 1998; 21: 291–304.

8. Dattilio, Commons, Adams, Gutheil, Sadoff. A pilot Rasch scaling of lawyers’ perceptions of expert bias. J Am Acad Psychiatry Law 2006; 34: 482–491.

9. Freckelton, Reddy, Selby. Australian judicial perspectives on expert evidence: an empirical study. Australian Institute of Judicial Administration, Melbourne 1999.

10. Faust, Ziskin. The expert witness in psychology and psychiatry. Science 1988; 241: 31–35.

11. Beck. The hired gun expert witness. Mo Med 1994; 91: 179–182.

12. Mossman. “Hired guns,” “whores,” and “prostitutes”: case law references to clinicians of ill repute. J Am Acad Psychiatry Law 1999; 27: 414–425.

13. Steadman. Beating a rap: defendants found incompetent to stand trial. Chicago University Press, Chicago, IL 1979.

14. Gudjonsson. Results from the 1995 survey: psychological evidence in court. Psychologist 1996; 9: 213–217.

15. Rogers, Gillis, McMain, Dickens. Fitness evaluations: a retrospective study of clinical, criminal and sociodemographic characteristics. Can J Behav Sci 1988; 20: 192–200.

16. Cooper, Zapf. Predictor variables in competency to stand trial decisions. Law Hum Behav 2003; 27: 423–436.

17. McDonald, Nussbaum, Bagby. Reliability, validity and utility of the Fitness Interview Test. Can J Psychiatry 1991; 36: 480–484.

18. Pinals, Tillbrook, Mumley. Practical application of the MacArthur competence assessment tool-criminal adjudication (MacCAT-CA) in a public sector forensic setting. J Am Acad Psychiatry Law 2006; 34: 179–188.

19. Akinkunmi. The MacArthur Competence Assessment Tool–Fitness to Plead: a preliminary evaluation of a research instrument for assessing fitness to plead in England and Wales. J Am Acad Psychiatry Law 2002; 30: 476–482.

20. Hoge, Bonnie, Poythress, Monahan. The MacArthur competence assessment tool: criminal adjudication. Psychological Assessment Resources, Odessa, FL 1999.

21. R v Presser [1958] VR 45.

22. R v Kesavarajah [1994] 181 CLR 230.

23. R v McNaghten [1843] 10 Cl & Fin 200.

24. R v Porter [1933] 55 CLR 182.

25. Fleiss. Measuring nominal scale agreement among many raters. Psychol Bull 1971; 76: 378–382.

26. Large, Nielssen. An audit of medico-legal reports prepared for claims of psychiatric injury following motor vehicle accidents. Aust N Z J Psychiatry 2001; 35: 535–540.

27. Large, Nielssen. Factors associated with agreement between experts in evidence about psychiatric injury. J Am Acad Psychiatry Law 2008; 36: 515–521.

28. Sim, Wright. The kappa statistic in reliability studies: use, interpretation, and sample size requirements. Physical Ther 2005; 85: 257–268.

29. Landis, Koch. The measurement of observer agreement for categorical data. Biometrics 1977; 33: 159–174.

30. Fleiss. Statistical methods for rates and proportions, 2nd edn. John Wiley and Sons, New York 1981.