Abstract
For more than 100 years, courts have accepted the concept of psychiatric injury following traumatic events such as transport accidents, injuries at work and violent crime [1]. The shortcomings of psychiatric opinions have been reviewed by Ziskin [2] and Shea [3]; their criticisms include the brevity of the assessments, the reliance on recollections rather than corroborated information, the motivations of the subject and the conscious and unconscious bias of the assessor. A recent survey of judicial views revealed concern about bias in most types of expert evidence [4]. Psychiatric and psychological evidence has been subjected to particular criticism in the medical defence literature [5] and the popular press [6].
The role of experts in the adversarial legal system is to ‘present inferences and conclusions from the facts which the judge and jury, for lack of specialised knowledge, cannot draw themselves’ [7]. In its recent practice directions for expert witnesses, the Federal Court of Australia has restated the previously expressed view that the first duty of an expert witness or a witness to fact is to the court, rather than to either party [8]. The duties of a treating practitioner include a duty to provide evidence if necessary [9], despite the possibility of ethical conflict arising from a treating practitioner being asked to give evidence about the condition of a patient [10].
The most serious criticism of experts in the adversarial system is that their opinion may be swayed by the financial rewards of writing opinions helpful to the referring solicitor. Media coverage of expert evidence has tended to equate differences in opinion with conscious bias. However, there are a number of factors other than conscious bias that may produce divergence in the opinions of treating practitioners (TP), plaintiffs' experts (PE) and defendants' experts (DE).
It is likely that the doctor–patient relationship influences the treating doctor to act as an advocate for their patient [3]. Patients may make informed choices about who would be a suitable practitioner to see for treatment, and hence who should be the author of any treating practitioners' reports. Once a patient is receiving treatment, the treating practitioner has duties to the patient that include duties of care, consent and confidentiality. In addition, treating practitioners may be less aware of the extent of their duty to the courts.
Experts and practitioners may also form different opinions because they are provided with different sets of documents, or are asked different questions by the instructing solicitors. Lawyers may select experts on the basis of opinions expressed in previous cases, as psychiatric reports are expensive, and reports prepared for the defendant are subject to discovery by the plaintiff. It is also possible that patients provide different information to a practitioner in whom they trust when compared with that provided to an expert who they may see only once or who they know has been engaged by a solicitor acting for either party.
It has been observed that the adversarial legal system relies on testing the opinions of experts in order to settle disputes [11]. Cases in which there are marked differences of expert opinion are more likely to go to trial, as claims in which the experts agree are often settled without litigation. Under the current rules, plaintiffs' reports that are unhelpful to them need not be served. Hence it would be expected that differences of opinion between practitioners acting in different roles would be produced by the adversarial system. There are very few published studies of the content of medico-legal reports. A review of publications in English found only one relevant study [12] in which Cornes and Aitken demonstrated major deficiencies in a high proportion of medico-legal reports. Their study only included five reports written by psychiatrists and did not address the influence of the role of the report writer on the opinions expressed.
The aim of this study was to determine whether differences in the content of reports were associated with the role of the practitioner preparing the report.
Method
Data collected
The archived documents from 559 sequentially examined personal injury claims following motor vehicle accidents in New South Wales between 1989 and 1994 were made available by a compulsory third-party insurer (NRMA Insurance). The study excluded routine clinical examinations and neuropsychological reports, and a sequence of 25 claims was missing.
The following data were collected from each report: (i) role of the writer (TP, PE or DE); (ii) delay between the accident and the date on the report in months; (iii) length of the report in pages; (iv) ratings from a novel 20-item scale (the RRS); (v) psychiatric diagnoses attributed to the accident; and (vi) the qualifications of the report writer.
The principal diagnosis was defined using a hierarchical system. Brain damage was selected over posttraumatic stress disorder (PTSD), PTSD over depressive disorders, and depression over adjustment, somatoform and other anxiety disorders. Terms such as major depression, dysthymia, depression and reactive depression were all included as depressive disorders. Malingering was included in the category of ‘no diagnosis’, while grief was included with adjustment disorders.
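A hierarchy of this kind can be sketched as a simple ranked lookup. The category names and their ordering below follow the description above, but the code itself is an illustrative reconstruction, not the authors' procedure:

```python
# Ordering follows the hierarchy described in the text; the exact
# category labels are illustrative assumptions.
HIERARCHY = [
    "brain damage",
    "PTSD",
    "depressive disorder",
    "adjustment disorder",     # grief was grouped here
    "somatoform disorder",
    "other anxiety disorder",
    "no diagnosis",            # malingering was grouped here
]

def principal_diagnosis(diagnoses):
    """Return the highest-ranked diagnosis attributed in a report."""
    ranked = [d for d in HIERARCHY if d in diagnoses]
    return ranked[0] if ranked else "no diagnosis"
```

For example, a report attributing both PTSD and a depressive disorder to the accident would be coded with PTSD as the principal diagnosis.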
The report rating scale
A report rating scale (RRS) was designed to assess the content of medico-legal reports written with respect to psychiatric injury following motor vehicle accidents. The items were compiled after the first author (ML) examined a separate sample of reports and attempted to define items that were present in more complete reports. The scale contained 22 items, which were scored 1 if present and 0 if absent. The raters were given the general instruction to score doubtful items as 1.
Item one was scored as present if the process of referral was clearly documented and item two was scored as present if the report contained any statement about ethical duties or consent to prepare the report. Items three to seven rated, respectively, the adequacy of the history of the accident, the injuries received, the past medical history, a history of previous accidents, and a history of previous psychiatric disorder and substance use.
Item eight related to the use of primary documentation such as police and ambulance reports and item nine to secondary documentation such as other expert opinions. Items 10–13 rated the presence of an account of the current social circumstances, a personal history, an account of the patient's psychological symptoms and the presence of a number of observations indicating an adequate mental state examination. Items 14–16 assessed the evidence for the psychiatric diagnosis used, the discussion of a differential diagnosis and whether the report writer demonstrated that the patient had met criteria set out in either the DSM or ICD classification system.
Items 17–19 related to the use of rating scales, the presence of an opinion regarding prognosis and treatment and the report writer's assessment of the veracity or corroboration of the patient's history. The last three items (20–22) assessed the presence of an estimate of the degree of impairment of functional capacity, an opinion regarding the cause of any psychiatric condition found to be present, and the overall balance and internal consistency of the report.
Steps were taken to minimize the effect of the patient's history on the RRS score. For example, a report about a patient who could not recall the accident could obtain a score of 1 on item three, relating to the history of the accident, if the amnesia was documented. In items one to 19 the information required in order to score the item as present was specified in some detail, and in some cases with examples. Items 20–22 proved more difficult to define succinctly and were included without a specific definition.
A study of interrater reliability was performed using a selected sample of 42 reports. In this study, the second rater was blinded to the report writer's identity and role in the claim.
The statistical analysis was performed using the Statistical Package for the Social Sciences (SPSS; SPSS Inc., Chicago, IL, USA) [13].
Results
Interrater reliability
The interrater reliability was assessed for only 21 items, as item one was deleted to facilitate blind rating. The interrater reliability was estimated by the proportion of reports receiving the same rating for each item, and by a kappa score (Table 1). Nineteen of the 21 items had kappa scores of greater than 0.20, defined by Landis and Koch [14] as indicating fair agreement, and 13 of these had a kappa of greater than 0.40, indicating moderate agreement. The two items relating to causation and balance were excluded from the scale because of kappa values of less than 0.2. The sum of the remaining 19 items was calculated for each report. The correlation between the raters for the sum of the 19 items was 0.77 (Spearman rank), indicating a high level of agreement about report completeness.
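The kappa statistic corrects raw agreement for the agreement expected by chance. A minimal sketch of Cohen's kappa for two raters' binary item scores follows; the study used SPSS, and this illustrative version ignores the degenerate case where chance agreement equals 1:

```python
def cohen_kappa(rater1, rater2):
    """Cohen's kappa for two raters' categorical ratings of the same items."""
    n = len(rater1)
    # Observed proportion of agreement.
    po = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement from each rater's marginal proportions.
    categories = set(rater1) | set(rater2)
    pe = sum((rater1.count(c) / n) * (rater2.count(c) / n) for c in categories)
    return (po - pe) / (1 - pe)
```

Two raters who agree no more often than chance predicts receive a kappa near zero even if their raw percentage agreement is high, which is why kappa rather than simple agreement was used to screen the items.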
Interrater reliability of RRS items (42 reports)
227 reports
The 559 files included 129 claims that contained a report written by a psychologist or a psychiatrist. There were 227 reports (Table 2) completed by 113 practitioners. Eight experts acted as PE and DE, four acted as PE and TP, one acted as TP and DE, and one as TP, PE and DE. Psychiatrists wrote 162 reports and psychologists wrote 65 reports. Fifty-four reports were written by psychologists with Masters degrees, and 31 reports were written by nine psychologists or two psychiatrists holding doctoral qualifications.
Report characteristics, RRS scores and Diagnosis (227 reports)
Treating practitioners wrote significantly shorter reports (Table 2) than PEs, while DEs wrote the longest reports. There was an average delay of more than 2 years between the date of the accident and the writing of the report for all groups, but the delay was significantly longer for defendants' experts (Table 2).
RRS scores
There were significant differences in the ranked RRS scores received by the TP, PE and DE groups, mainly due to less complete TP reports. This finding was significant for both psychiatrists' and psychologists' reports, although reports by psychologists in the role of TP received the lowest median RRS score and reports by psychologists acting as DE the highest (Table 2).
The percentage of reports for which an item received a score of 1 was calculated for each item in each group. There were significant differences between groups in 11 of the 20 items, and eight of these were significant at p ≤ 0.001, making it unlikely that the results were false positives.
The differences were mainly due to TPs omitting important parts of the history, examination and opinion (Table 2), although not all TP reports received low scores, and not all experts' reports received high scores.
Diagnoses
Posttraumatic stress disorder, depression and additional diagnoses were made significantly more frequently by TPs and PEs than by DEs. The most frequent category used by DEs was ‘no diagnosis’ (Table 2). These findings were significant, with p-values well below 0.01. The odds ratios (OR) for a TP or PE making a diagnosis were 1.82 for PTSD, 2.64 for depression and 0.20 for ‘no diagnosis’ when compared with the DE group.
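An odds ratio of this kind compares the odds of a diagnosis being made in one group of reports with the odds in another, from a 2×2 table of diagnosis by role. A minimal sketch, using hypothetical counts rather than the study's data:

```python
def odds_ratio(a, b, c, d):
    """OR from a 2x2 table:
                     diagnosis present | diagnosis absent
        TP/PE reports        a         |        b
        DE reports           c         |        d
    """
    return (a / b) / (c / d)

# Hypothetical counts, for illustration only: odds of 40/20 = 2.0
# among TP/PE reports versus 10/20 = 0.5 among DE reports.
print(odds_ratio(40, 20, 10, 20))  # -> 4.0
```

An OR above 1 means the diagnosis was more often made by TPs and PEs; an OR below 1, such as the 0.20 reported for ‘no diagnosis’, means the category was used more often by DEs.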
Measures to control for sample bias
In order to ascertain whether the lower level of completeness of treating practitioners' reports, or the significant differences in the diagnoses used, were a result of sampling bias, three subsamples of claimants who had seen two report writers were analysed.
The first subsample was selected in order to determine whether the difference in total RRS scores between the treating practitioners and experts was due to bias associated with patient selection. The RRS scores of 43 TP reports and 63 PE or DE reports about the same claimants were compared. In this subsample the median RRS score for the treating practitioners was 6, which was lower than the median total RRS score of 10 for the experts. This difference was significant using a Mann–Whitney U-test (Z = −4.39, p < 0.001).
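The Mann–Whitney U-test compares two groups by ranking all scores together, which suits ordinal data such as RRS totals. A minimal standard-library sketch of the U statistic and its normal approximation (the tie correction to the variance, which SPSS applies, is omitted for brevity):

```python
import math

def mann_whitney_u(a, b):
    """U statistic for sample a vs b, using mid-ranks for ties."""
    pooled = sorted((v, i) for i, v in enumerate(a + b))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid = (i + j) / 2 + 1          # average rank of a tied run
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = mid
        i = j + 1
    n1, n2 = len(a), len(b)
    r1 = sum(ranks[:n1])               # rank sum of the first sample
    u = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma               # normal approximation to U
    return u, z
```

A large negative Z, as reported for the TP versus expert comparison, indicates that the first sample's scores rank systematically below the second's.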
The second subsample examined 65 pairs of reports written by practitioners engaged by opposing sides in the adversarial system using a kappa score to assess the level of agreement about the principal diagnosis. The DE reports were paired with one of 23 TP or 42 PE reports.
The third subsample examined 37 pairs of reports written by practitioners engaged by the same side, using the same method to assess the level of agreement about the principal diagnosis. The 37 pairs consisted of nine pairs of DE reports, 10 PE pairs, 12 mixed PE and TP pairs and six pairs of TP reports. Cases in which two reports could potentially be matched with a third were paired by giving priority firstly to matching experts with experts and secondly to matching the profession of the author.
There was poor agreement about the category ‘no diagnosis’ between practitioners engaged by opposing sides (κ = 0.111, p = 0.144), but when the experts engaged by the same side were considered there was good agreement (κ = 0.604, p < 0.001). In contrast, there was moderate agreement about brain damage between experts engaged both by opposing sides (κ = 0.373, p = 0.001) and by the same side (κ = 0.473, p = 0.002). When the diagnosis of PTSD was considered, the agreement between experts from opposing sides was fair (κ = 0.254, p = 0.027), but no better if the experts were engaged by the same side (κ = 0.224, p = 0.173). The diagnosis of depression was rarely agreed on by experts from opposing sides (κ = 0.041, p = 0.715). In contrast, the kappa value for agreement about depressive disorders between experts from the same side was moderate (κ = 0.431, p = 0.004).
An analysis of the reliability of the diagnoses of depression and PTSD across adversarial roles was performed to control for the possible confounding effects of the diagnostic hierarchy. When both the principal and any additional diagnoses were considered, the kappa value for PTSD was 0.309 (p = 0.01) and for depression 0.131 (p = 0.236), confirming a low level of agreement about both diagnoses. In the paired sample of practitioners engaged by opposing sides, the OR for a TP or PE making a diagnosis was 1.8 for PTSD, 2.14 for depression and 0.15 for ‘no diagnosis’, similar to the ratios derived from the unmatched sample.
Analysis of role and qualifications
Multivariate analysis (logistic regression) was used to determine whether the influence of role on the RRS score and on the use of the category ‘no diagnosis’ was due to differences in training. All the information that had been collected about the report writers' qualifications and roles was entered in the form of five independent categorical variables: (i) expert witness (non-treating); (ii) defendant's expert; (iii) Masters qualification in psychology; (iv) psychiatrist; and (v) doctoral level degree.
Logistic regression was used to examine the influence of role and training on the probability of a report achieving an RRS score equal to or greater than the median score of 10. In this analysis only expert status (OR = 4.4, 95% CI = 1.9–10.0, p = 0.001) and holding a doctoral qualification (OR = 4.7, 95% CI = 1.7–13.1, p = 0.003) were associated with a greater probability of completing a report with an RRS score of 10 or more. None of the other factors had a significant independent influence on the probability of the report scoring at or above the median: psychiatrist qualification (OR = 1.64, 95% CI = 0.45–6.0, p = 0.45), a Masters degree in psychology (OR = 0.67, 95% CI = 0.15–2.9, p = 0.59) or DE status (OR = 1.4, 95% CI = 0.78–2.7, p = 0.23).
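Odds ratios and confidence intervals of this kind are derived from a logistic regression coefficient β and its standard error as exp(β) and exp(β ± 1.96·SE). A minimal sketch; the standard error below is an assumed value for illustration, not taken from the study's model:

```python
import math

def or_with_ci(beta, se, z=1.96):
    """Odds ratio and 95% CI from a logistic regression coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

beta = math.log(4.4)   # coefficient whose OR matches the reported 4.4
se = 0.42              # assumed standard error, for illustration only
odds, lower, upper = or_with_ci(beta, se)
```

Because the interval is symmetric on the log-odds scale, the CI is asymmetric around the OR itself, which is why reported intervals such as 1.9–10.0 are skewed above the point estimate.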
A second logistic regression was used to examine the influence of role and training on the probability of a report using the category ‘no diagnosis’. Only defendant expert status (OR = 6.6, 95% CI = 2.9–15.0, p < 0.001) was associated with a greater probability of using this category. None of the other factors had a significant independent influence on the probability of the report using the category ‘no diagnosis’: psychiatrist qualification (OR = 1.9, 95% CI = 0.22–17.2, p = 0.55), a Masters in psychology (OR = 1.1, 95% CI = 0.09–11.9, p = 0.95), holding a doctorate (OR = 0.50, 95% CI = 1.2–2.0, p = 0.33) or being in the role of an expert (OR = 2.46, 95% CI = 0.50–12.1, p = 0.26). Reports using the category ‘no diagnosis’ were written an average of 26.6 months after the accident, compared with 28.6 months for other reports. This non-significant difference indicates that the greater use of this category by defendants' experts was not due to a greater delay between the accident and the report.
There was a greater proportion of psychologists in the TP group. Treating psychologists and treating psychiatrists were equally likely to use the diagnosis of PTSD (χ2 = 0.310, df = 1, p = NS) but treating psychiatrists were more likely to diagnose depression (χ2 = 4.92, df = 1, p = 0.026).
Discussion
This study demonstrates that it is possible to reliably measure aspects of the content of reports regarding psychological injury. However, the low kappa scores for aspects of the conclusion indicate that the RRS is a measure of the completeness, rather than a direct measure of the quality of a report.
The RRS scores in this series reveal that many of the reports written by treating practitioners, and some written by those purporting to be medico-legal experts, were incomplete. This is consistent with the findings of Cornes and Aitken [12] and the conclusions of Mendelson [15]. The greater completeness of reports prepared by experts and by those with doctoral qualifications suggests an association between the duration of training and professional experience of the author and the completeness of medico-legal reports. This finding supports Mendelson's recommendation for specialized training in report writing [15].
An important finding of the study is that although the reports prepared by PEs and DEs were similar in completeness, the diagnoses made were significantly different. Hence further training and the introduction of guidelines for report writing may not necessarily result in a greater agreement between experts acting for the parties in adversarial disputes.
The analysis of the reliability of diagnosis was based on a subset of claims in which a number of opinions were sought. In addition, some PE reports in which no diagnosis was made would not have been available, as unhelpful reports need not be served. Thus the results should be viewed with some caution. With that caveat, the study suggests that brain damage is diagnosed with moderate reliability by experts engaged by either side. It also suggests that when experts are engaged by the same side there is considerable agreement about the presence of depressive disorders and the category ‘no diagnosis’, which is not evident when the experts are engaged by opposing sides. In contrast, there seems to be disagreement about the presence of PTSD between experts, irrespective of the side that has engaged them, suggesting that the disorder is not reliably diagnosed in compensation cases.
This study was unable to determine which factors associated with the report writer's role (conscious or unconscious bias, ethical conflict, patient behaviour or the selective use of experts or reports) were important in producing the differences in the diagnoses. However, it does provide a quantitative measure of the extent to which the sum of these factors influenced the diagnosis made by experts in the roles of TP, PE and DE. Treating practitioners and PEs were twice as likely as DEs to diagnose PTSD or a depressive disorder, while DEs were up to eight times more likely than a TP or PE to find no psychiatric disorder in the same patient.
The small degree of overlap among the experts appearing for the plaintiffs and the defendants in this study is troubling, as it suggests that most of the experts only act for one side. A study that examined claims documentation from more than one defendant could confirm this.
A review of the reform of rules for expert evidence in the Federal Court of Australia and the NSW Supreme Court may show that measures such as the compulsory disclosure of reports, guidelines for the content of written evidence, and a requirement for experts to refer to their findings to justify their opinions lead to a convergence of opinion between experts. A requirement for formal training, accreditation and peer review for report writers may also lead to an improvement in the quality of reports. Nevertheless, experts will continue to be engaged and instructed by the parties, drawn from what appear to be separate pools of experts for plaintiffs and defendants.
The US Supreme Court, after the case of Daubert vs Merrell Dow Pharmaceuticals [16], approached the perceived problem of unsubstantiated expert testimony by replacing the standard of ‘general acceptance’ for the admissibility of such evidence with rules requiring that expert evidence be of ‘an adequate scientific standard’. This takes into account factors such as reliability, error rate, and publication in peer-reviewed journals. These measures do not directly address the issue of bias, but set a higher standard for the acceptance of expert opinion.
The degree to which bias influences opinion could be tested in a review of the system recently adopted in the UK for civil cases [17], in which experts are agreed to by both parties but are instructed by the court. Differences in expert evidence should then be more likely to be a result of legitimate differences in professional opinion. A discussion of the relative merits of the use of expert evidence in the adversarial and inquisitorial legal systems is beyond the scope of this paper. However, the main concern about experts being appointed by the court is that instead of being influenced by the parties who engage them, the experts may instead be influenced by the perceived wishes of the court.
Acknowledgements
The authors thank NRMA Insurance for making the reports available and Victor Kelly of Abbott Tout Solicitors for his organization of the files. We thank Timothy Heath for his statistical suggestions and Ian Freckelton for his review of the manuscript. The RRS is available from the first author.
