Abstract
New technologies in pathology and molecular biology have created opportunities for novel and powerful investigations into the neurobiology of psychiatric disorders [1]. The validity of these investigations relies upon a foundation of accurate clinical and psychopathological diagnosis. In the absence of clinicopathological correlates in psychiatry, the validation of diagnoses is difficult [2]. Several researchers have suggested that the stability of diagnosis over an individual's lifetime may indicate its validity, because stable diagnoses are likely to reflect basic and consistent pathological processes [3–5].
In the absence of prospective research interviews, the post-mortem assignment of psychiatric diagnoses presents considerable challenges [6]. These challenges arise primarily from the need to review the clinical symptoms of a post-mortem population retrospectively. Confirming the psychiatric diagnosis of brain tissue donated to research requires particular care, and two key approaches to post-mortem diagnosis predominate today: (i) retrospective review of medical records, often used as the primary diagnostic technique within brain banking; and (ii) psychological autopsy interviews (i.e. post-mortem family interviews), used either in isolation or in combination with medical record review to provide more comprehensive data. Although current consensus indicates that combining both approaches is optimal in the absence of prospective research data, many brain banks use only one of these techniques because of the financial, ethical and/or time constraints involved in the clinical characterization of post-mortem tissue [7].
Where medical records are evaluated, a common approach is consensus diagnosis by trained clinicians. A less common approach applies a diagnostic assessment instrument, such as the Diagnostic Evaluation After Death (DEAD) [8], the Diagnostic Instrument for Brain Studies (DIBS) [9], or the Item Group Checklist (IGC) of the Schedules for Clinical Assessment in Neuropsychiatry (SCAN) [10], to medical record narratives. A review of the diagnostic methods used in more than 50 of the most recent post-mortem neurobiological studies of schizophrenia, bipolar disorder and major depressive disorder found that fewer than 25% of these had used one of the aforementioned instruments. This review also found that post-mortem family interviews were used to supplement medical record review in only 27% of studies. Despite a growing recognition of the importance of standardization in the diagnostic process, there is limited research on the reliability or validity of post-mortem clinical diagnosis. The primary aim of the present study was to investigate the degree of concordance between predominant ante-mortem psychiatric diagnoses indicated in medical records and post-mortem diagnoses derived through two structured diagnostic instruments, the DIBS and the IGC of the SCAN.
Methods
Subjects
All human brain specimens were obtained at autopsy, through the Department of Forensic Medicine (DOFM) in Sydney, Australia. Consents and authorizations were acquired in accordance with New South Wales State legislation, at the time of donation (Coroners Act 1980, Human Tissue Act 1983). Specimens received from May 2002 onwards were obtained through recorded verbal consent given during a telephone call by members of the New South Wales Tissue Resource Centre (TRC), to next of kin (NOK) on the day of autopsy, with subsequent written consent received from all NOK. The procedure for these calls is described in further detail by Azizi et al. [11]. Ethics approval for the current protocol was granted by Sydney South West Area Health Service.
Sixty-three cases with a post-mortem psychiatric diagnosis collected between July 1994 and December 2005 were initially reviewed for the present study. Extensive clinical histories for five of these cases were no longer available at the time of review; these cases were therefore excluded from further analysis. A final sample of 58 cases was included, consisting of 42 men (mean age = 47.05 years, SD = 14.44, range = 19–76 years) and 16 women (mean age = 52.00 years, SD = 13.44, range = 31–73 years). Twenty-three of the 58 subjects (39.7%) had committed suicide and 28 (48.3%) died of natural causes. The median age of illness onset for the entire sample was 23.50 years (range = 12–58 years) and the average duration of illness was 22.63 years (SD = 14.28, range = 1–53 years).
Assessment
Post-mortem diagnosis
The post-mortem Axis I clinical diagnosis of each case was determined primarily through extensive review of medical records by one of two independent clinicians (both psychiatric nurses). Where available, data from pathology reports and neuropsychological test results were also collected. Information relating to the donor's psychiatric history, developmental history, family history of mental illness, and drug, alcohol and medical treatment history was collated into a structured treatment summary, as previously described [12]. Where possible, interviews with general practitioners and psychiatric specialists were also conducted.
In 24 of the 58 cases (characterized prior to 2001), the IGC of the SCAN (Version 2.0) was systematically applied to each treatment summary to confirm the psychiatric diagnosis of the case. The SCAN is a semi-structured instrument developed by the World Health Organization (WHO), and designed for administration by a trained clinician, using classifications of psychopathology according to DSM-III-R [13] and ICD-10 [14] criteria. Post-mortem diagnoses in this study assigned through the IGC of the SCAN were generated specifically through the application of DSM-III-R criteria, in view of its wide current use by researchers. Previous studies have used the SCAN as a gold standard in psychiatric diagnosis and its reliability and validity have been favourably reported [15–17].
In the remaining 34 cases (characterized from 2001 onwards), this process was revised to incorporate the DIBS instead. The DIBS is a newer diagnostic instrument, also semi-structured in design, and intended specifically for post-mortem psychiatric assessment using medical records and, where available, informants. The diagnostic summary of the DIBS generates a diagnosis of schizophrenia on ICD-10, DSM-III-R, DSM-IV [18], Research Diagnostic Criteria [19], Schneider [20] and Feighner et al. [21] criteria. Mood disorders and other psychoses are diagnosed using DSM-IV and ICD-10 criteria [22]. In the present study, post-mortem diagnoses generated through the DIBS were assigned using DSM-IV criteria only, in light of its wide diagnostic use in Australia. Other advantages of the DIBS include its demonstrated reliability [23] and its support for diagnosis at the sub-syndrome and symptom level [24].
Predominant ante-mortem diagnosis
The predominant ante-mortem diagnosis of each case was selected by an independent clinician, blind to the post-mortem diagnosis, through careful examination of all available medical records. This diagnosis was defined as the Axis I disorder that best represented the individual's lifetime illness course. Decisions were based on the number of times the diagnosis was assigned during the individual's lifetime and/or the length of medical observation or review prior to its assignment.
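The selection described above was a clinical judgement, but its two stated criteria can be illustrated. The following is a minimal, hypothetical sketch that assumes a record has been reduced to a list of assigned diagnoses, each with the observation time preceding it, and that treats assignment frequency as the primary criterion with cumulative observation time as a tie-breaker. The study does not specify how the two criteria were weighted, and `RecordEntry` and `predominant_diagnosis` are illustrative names only.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class RecordEntry:
    """One diagnosis found in a medical record (hypothetical structure)."""
    diagnosis: str
    months_observed: float  # observation/review time preceding this assignment


def predominant_diagnosis(entries: list[RecordEntry]) -> str:
    """Return the diagnosis assigned most often, breaking ties by total observation time."""
    if not entries:
        raise ValueError("no diagnoses recorded")
    counts = Counter(e.diagnosis for e in entries)
    observed: dict[str, float] = {}
    for e in entries:
        observed[e.diagnosis] = observed.get(e.diagnosis, 0.0) + e.months_observed
    # Rank by (number of assignments, cumulative observation time); the highest wins.
    return max(counts, key=lambda d: (counts[d], observed[d]))


# Illustrative record: two assignments of each diagnosis, so observation time decides.
history = [
    RecordEntry("schizophrenia", 6),
    RecordEntry("schizoaffective disorder", 12),
    RecordEntry("schizophrenia", 2),
    RecordEntry("schizoaffective disorder", 1),
]
print(predominant_diagnosis(history))  # schizoaffective disorder
```

In practice a clinician may weigh the criteria differently for each case; the fixed ordering here is only one explicit way of encoding them.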
Statistical analysis
Kappa coefficients [25] were calculated to determine the level of agreement between predominant ante-mortem psychiatric diagnoses indicated in clinical records, and post-mortem diagnoses derived through the structured diagnostic instruments of the DIBS and the IGC of the SCAN. Statistical analyses were conducted using SPSS 11.5 (SPSS, Chicago, IL, USA).
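For readers unfamiliar with the statistic, Cohen's kappa compares the observed agreement p_o with the agreement p_e expected by chance from each source's marginal label frequencies: kappa = (p_o - p_e) / (1 - p_e). The sketch below is a plain-Python illustration of that formula, not a reproduction of the SPSS procedure used in the study; the function name and the toy labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two sets of nominal labels assigned to the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    # Observed agreement p_o: proportion of cases given the same label by both sources.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement p_e: product of the two sources' marginal counts, summed over labels.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / n**2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Toy example with invented labels (not study data).
ante_mortem = ["SZ", "SZ", "SA", "MDD"]
post_mortem = ["SZ", "SA", "SA", "MDD"]
print(round(cohen_kappa(ante_mortem, post_mortem), 3))  # 0.636
```

Here three of four cases agree (p_o = 0.75) and chance agreement is 5/16, giving kappa = 7/11.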
Results
The kappa coefficient for inter-rater reliability across all diagnostic categories (n = 58) was 0.66. For the purpose of calculating agreement, cases with a comorbid substance use diagnosis were collapsed into their Axis I psychiatric disorder category. The data were also adjusted to accommodate one case with an ante-mortem diagnosis of delusional disorder (post-mortem diagnosis of schizophrenia), as recommended by DeCoster [26]. Kappa coefficients are displayed in Table 1.
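The collapsing of comorbid substance use diagnoses can be sketched as a label-normalization step applied before agreement is calculated. The label format (a disorder joined to a substance diagnosis by "+") and the set of substance use terms are assumptions for illustration, drawn from the comorbidities listed in the Table 1 footnotes; `collapse_comorbid` is a hypothetical name.

```python
# Hypothetical label format: a psychiatric disorder optionally joined by "+" to a substance diagnosis.
SUBSTANCE_USE_LABELS = {"alcohol abuse", "alcohol abuse disorder", "alcohol dependence"}


def collapse_comorbid(label: str) -> str:
    """Drop any comorbid substance use diagnosis and keep the Axis I psychiatric disorder."""
    parts = [part.strip() for part in label.split("+")]
    psychiatric = [p for p in parts if p.lower() not in SUBSTANCE_USE_LABELS]
    if len(psychiatric) != 1:
        raise ValueError(f"expected exactly one psychiatric disorder in {label!r}")
    return psychiatric[0]


print(collapse_comorbid("schizophrenia + alcohol abuse"))  # schizophrenia
```

Raising an error when zero or several psychiatric disorders remain keeps ambiguous records from being silently assigned to a category.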
Table 1. Diagnoses within diagnostic categories for combined DIBS and SCAN cases (n = 58)
AM, ante-mortem; DIBS, Diagnostic Instrument for Brain Studies; PM, post-mortem; SCAN, Schedules for Clinical Assessment in Neuropsychiatry
†The schizophrenia cohort included one subject with a comorbid diagnosis of schizophrenia and alcohol abuse disorder (ante-mortem and post-mortem), and a second subject with the same diagnosis ante-mortem only.
‡The major depressive disorder cohort included one subject with a comorbid ante-mortem diagnosis of major depressive disorder and alcohol dependence.
§An adjustment was made to the data to accommodate this subject, therefore the collapsed mood and psychotic disorder cohorts at the base of the table sum to 57 in the ante-mortem column [26].
Of the 39 cases comprising the schizophrenia cohort (with an ante-mortem and/or post-mortem diagnosis of schizophrenia), there were 28 cases with diagnostic agreement (Table 1). Of the 11 cases with diagnostic disagreement in this category, one case was alternatively diagnosed with delusional disorder, seven with schizoaffective disorder and three with bipolar disorder.
Eleven subjects were assigned a diagnosis of schizoaffective disorder, and eight of these had diagnostic disagreement. Of these eight, seven were alternatively diagnosed with schizophrenia and one with major depressive disorder.
Of the 12 subjects diagnosed with major depressive disorder, only one had diagnostic disagreement, receiving a post-mortem diagnosis of schizoaffective disorder.
Finally, seven subjects were assigned a diagnosis of bipolar disorder and within this cohort there were three cases involving disagreement; each of these subjects received an alternate diagnosis of schizophrenia.
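The text does not state how the individual cohort kappas were obtained. One common approach, assumed here, dichotomizes each category (that diagnosis vs all others) and computes Cohen's kappa on the resulting 2×2 table. Applied to the major depressive disorder counts reported above, this sketch gives 0.95, matching the value cited in the Discussion, although that match does not confirm that this was the study's procedure. `binary_kappa` is a hypothetical name.

```python
def binary_kappa(both_yes: int, am_only: int, pm_only: int, both_no: int) -> float:
    """Cohen's kappa for one category scored as present/absent by both sources."""
    n = both_yes + am_only + pm_only + both_no
    p_o = (both_yes + both_no) / n
    am_yes = (both_yes + am_only) / n  # proportion with the ante-mortem diagnosis
    pm_yes = (both_yes + pm_only) / n  # proportion with the post-mortem diagnosis
    p_e = am_yes * pm_yes + (1 - am_yes) * (1 - pm_yes)
    return (p_o - p_e) / (1 - p_e)


# Major depressive disorder vs all other diagnoses, from the counts reported above:
# 11 agreements, 1 ante-mortem case re-diagnosed as schizoaffective post-mortem,
# no post-mortem-only cases, and the remaining 46 cases with neither diagnosis.
kappa_mdd = binary_kappa(both_yes=11, am_only=1, pm_only=0, both_no=46)
print(round(kappa_mdd, 2))  # 0.95
```

For this cohort only one case disagrees, so p_o = 57/58 while chance agreement is about 0.68, which is why the kappa is so high.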
According to Fleiss, from a simple empirical perspective, kappa values below 0.40 represent poor agreement, values between 0.40 and 0.75 represent moderate agreement, and values above 0.75 reflect excellent agreement [27]. These results indicate moderate–excellent inter-rater reliability for three of the four individual cohorts in this sample (Table 1). The lowest level of concordance was found within the schizoaffective disorder cohort, with a kappa of 0.35, reflecting poor inter-rater agreement. For comparison with previous research, two additional kappa coefficients were calculated. The psychotic disorder cohort consisted of a collapsed group of all schizophrenia and schizoaffective disorder cases, and the mood disorder cohort consisted of the bipolar disorder and major depressive disorder cohorts. As shown in Table 1, both of these kappas were within the excellent range of agreement.
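Fleiss's bands and the collapsing used for the comparison cohorts can be expressed as two small lookups. This is a hypothetical sketch: the function names are illustrative, the band boundaries follow the values quoted in the text (with 0.75 itself treated as moderate, since only values above 0.75 are called excellent), and delusional disorder is deliberately left unmapped because the study handled that case through a separate adjustment [26].

```python
def fleiss_band(kappa: float) -> str:
    """Label a kappa using Fleiss's rule-of-thumb bands as quoted in the text."""
    if kappa > 0.75:
        return "excellent"
    if kappa >= 0.40:
        return "moderate"
    return "poor"


# Collapsed cohorts used for comparison with previous research.
COLLAPSED_COHORT = {
    "schizophrenia": "psychotic",
    "schizoaffective disorder": "psychotic",
    "bipolar disorder": "mood",
    "major depressive disorder": "mood",
}


def collapse(label: str) -> str:
    """Map one of the four individual categories to its collapsed cohort.

    Raises KeyError for any other category (e.g. delusional disorder, which the
    study handled through a separate adjustment).
    """
    return COLLAPSED_COHORT[label]


for name, k in [("schizoaffective", 0.35), ("schizophrenia", 0.61), ("collapsed psychotic", 0.80)]:
    print(name, fleiss_band(k))
```

The kappa values in the loop are those reported for the individual and collapsed cohorts; they print as poor, moderate and excellent respectively.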
Furthermore, to compare the inter-rater reliability of cases characterized by the DIBS with that of cases characterized by the IGC of the SCAN, four additional kappas were calculated: the collapsed psychotic and mood disorder cohorts reported here, each divided by diagnostic instrument. Collapsed cohorts were necessary for this analysis because of the small number of cases in some diagnostic categories. As shown in Table 2, these kappas indicate excellent reliability for both the psychotic and mood disorder cohorts of cases characterized by the IGC of the SCAN, whereas cases characterized by the DIBS demonstrate only moderate reliability for both cohorts.
Table 2. Diagnoses in DIBS and SCAN cases separately (n = 58)
AM, ante-mortem; DIBS, Diagnostic Instrument for Brain Studies; PM, post-mortem; SCAN, Schedules for Clinical Assessment in Neuropsychiatry.
Discussion
The primary aim of the present study was to examine the level of agreement between predominant ante-mortem psychiatric diagnoses obtained from medical records and post-mortem diagnoses derived through the DIBS and the IGC of the SCAN. Agreement differed across the four individual diagnostic categories examined, with a surprisingly high level of agreement for the major depressive disorder cohort and a low level for the schizoaffective disorder cohort. The collapsed mood and psychotic disorder cohorts (of combined DIBS and SCAN cases) each showed excellent agreement.
These results are broadly consistent with those of Keilp et al. [28], who compared consensus diagnoses with existing medical chart diagnoses and found a somewhat higher kappa for the schizophrenia cohort (0.74, versus 0.61 in the present study). Keilp et al.'s category of ‘other psychotic disorders’, which encompassed schizoaffective disorder, delusional disorder and psychotic disorder not otherwise specified, received a kappa of 0.40, only slightly higher than the present kappa of 0.35 for the schizoaffective disorder cohort alone. A third category, mood disorders (including both major depressive disorder and bipolar disorder), received a kappa of 0.79 in the Keilp et al. study [28], comparable with the kappa of 0.84 for the present collapsed mood disorder cohort (DIBS and SCAN cases combined).
Deep-Soboslay et al. compared the reliability of diagnoses obtained from psychiatric record reviews with post-mortem family interviews [7]. Although those findings are not directly comparable to the present results, it is interesting to note that an exceptionally high kappa (of 0.94) was obtained for the schizophrenia cohort (a collapsed category including all subtypes of schizophrenia and schizoaffective disorder), and only moderate kappas were obtained for the major depressive disorder and bipolar disorder cohorts.
Perhaps the most striking finding in the present study is the contrast between the individual kappas obtained for both the schizophrenia (0.61) and schizoaffective disorder (0.35) cohorts, with their collapsed kappa of 0.80. This collapsed value is of particular significance because it highlights the frequent diagnostic pairing of the two psychotic disorders within this sample. In seven of the eight instances of diagnostic mismatch for schizoaffective disorder, the alternate diagnosis was schizophrenia. Conversely, in seven of the eleven instances of diagnostic mismatch for schizophrenia, the alternate diagnosis was schizoaffective disorder. These results are unsurprising given previous research showing that diagnostic shifts between schizophrenia and schizoaffective disorder are far from rare [3], [6].
The poor kappa for the schizoaffective disorder cohort may be attributable to several additional factors. First, although the concept of ‘schizoaffective psychosis’ was introduced by Kasanin in 1933, this category has remained somewhat controversial to the present day [3], [29]. Schizoaffective disorder was first operationalized with the publication of DSM-III-R; prior to this, it was retained without diagnostic criteria ‘for those instances in which the clinician is unable to make a differential diagnosis with any degree of certainty between affective disorder and either schizophreniform disorder or schizophrenia’ (DSM-III, p. 202) [30]. Research indicates that ambiguity remains in the application of the current DSM-IV criteria for schizoaffective disorder, and that the inter-rater reliability of these criteria is unsatisfactory [31], [32].
Second, it is likely that positive symptoms of psychosis, such as auditory hallucinations and paranoid delusions, are more easily identified and better documented than subtle affective symptoms [7]. This complicates the post-mortem diagnosis of schizoaffective disorder, which often relies upon adequate recorded symptom description, and may explain why three subjects in the present study were assigned a predominant ante-mortem diagnosis of schizoaffective disorder but did not meet criteria for this diagnosis during post-mortem analysis of medical records, leading to the alternate diagnosis of schizophrenia.
Third, we examined the possibility that the poor kappa for the schizoaffective disorder cohort was simply an artefact of the method used in the present study, in which post-mortem diagnoses were compared with predominant lifetime diagnoses (rather than final ante-mortem record diagnoses, as chosen by previous researchers; e.g. Deep-Soboslay et al. [7]). In other words, was it unfair to compare older diagnoses assigned using outdated (or undefined) criteria, with diagnoses derived from current DSM criteria at the time of post-mortem diagnosis? Close scrutiny of the dataset indicated two subjects assigned a predominant ante-mortem diagnosis of schizophrenia, consistently diagnosed with schizoaffective disorder in medical records, during only the latter years of their lifetime, subsequent to the operationalization of the schizoaffective disorder criteria. Re-evaluation of the data, with these two cases taken into consideration, would raise the kappa coefficient for the schizoaffective disorder cohort to the very low end of the moderate range, suggesting that late-arriving diagnostic criteria for schizoaffective disorder may have only marginally reduced the kappa coefficient for the original cohort. There were no other cases for which the late-arriving diagnostic criteria may have negatively impacted the kappa.
Within the data there were three additional instances of diagnostic mismatch between a predominant ante-mortem diagnosis of schizophrenia and a post-mortem diagnosis of schizoaffective disorder. It is possible that disagreement for these cases arose from the opportunity for more objective and comprehensive analysis of recorded mood symptoms after death, allowing their incorporation into the diagnostic process. It is interesting to note that in each of these three cases, depressive symptoms were often noted alongside ‘schizophrenia’ during the individual's lifetime, but these were never incorporated into the diagnosis as ‘schizoaffective disorder’, even though each of these patients survived for a significant period of time following the operationalization of schizoaffective disorder in DSM-III-R.
DSM criteria have shifted not only for schizoaffective disorder but also for schizophrenia, bipolar disorder and major depressive disorder over successive revisions of the DSM. It is therefore important to explain the rationale for choosing the predominant (vs the final) ante-mortem diagnosis in the present study. As introduced earlier, previous research suggests that the stability of diagnosis over time may indicate its validity, with stable diagnoses reflecting fundamental pathological processes [3–5]. For the purpose of the present study, it was thought that a longitudinal approach to the choice of ante-mortem diagnosis would yield diagnoses with greater potential validity that better reflected the individual's lifetime illness course. Many of the present subjects had undergone two or more diagnostic changes during their lifetime, and it was considered more appropriate to compare post-mortem diagnoses with ante-mortem diagnoses of demonstrated stability than simply with the final diagnosis indicated in the records. This was particularly important because the symptoms contributing to post-mortem diagnoses obtained through the DIBS and the SCAN were rated across the individual's lifespan, rather than only during the months preceding death.
The exceptionally high kappa of 0.95 for the major depressive disorder cohort also merits discussion. According to Deep-Soboslay et al., the post-mortem diagnosis of mood disorders may be more challenging than that of psychotic disorders, because many patients with mood disorders return to baseline functioning between episodes and have shorter inpatient hospital admissions, leading to brief inpatient records [7]. Mood symptoms may also be less pronounced, and thus more difficult for treatment professionals to record reliably [7]. A heightened awareness of these characteristics is likely to have led TRC staff to exercise greater caution in selecting mood disorder cases to pursue for brain donation. Subjects with a longer history of affective illness, and therefore anticipated better documentation (as indicated in police reports to the coroner's office), were more likely to be pursued, which may account for the greater diagnostic reliability demonstrated for this cohort.
In addition to the central analysis discussed here, a further series of kappas was calculated to compare inter-rater reliabilities for the DIBS and SCAN cases separately, using collapsed cohorts. As indicated earlier, cases characterized by the SCAN demonstrated excellent inter-rater reliability, whereas cases characterized by the DIBS had only moderate reliability. This disparity may be accounted for by a number of factors, including the higher thresholds required to attain positive symptom ratings in the SCAN; a more detailed and comprehensive symptom base within the SCAN than within the DIBS; and, for the SCAN, a well-defined glossary, an instruction manual and a comprehensive training schedule at WHO-designated training centres. The SCAN system has been developed over several decades from work beginning in the late 1950s, and has also been used as a gold standard for psychiatric diagnosis in previous research [15–17]. In contrast, the DIBS is a newer diagnostic instrument, with less well-defined instructions and a smaller symptom base, possibly making it a less reliable instrument than the SCAN.
Given the established difficulty in validating both ante-mortem and post-mortem psychiatric diagnoses, it may be pertinent for future research to compare the reliability of diagnoses made through various instruments, such as the DIBS, the DEAD and the IGC of the SCAN, with that of prospective donor assessments conducted with living subjects. This approach may assist with the verification of diagnoses made after death [7]. Furthermore, research comparing the reliability of both the final and the predominant ante-mortem diagnoses of donors against their corresponding post-mortem diagnoses may provide further information about the potential influence of prevailing diagnostic criteria on the diagnoses patients receive. Although beyond the scope of the current paper, these issues could be explored in greater detail in future studies.
In conclusion, although the inter-rater reliability reported in this research is of a moderate–excellent level for three of the four individual diagnostic cohorts, there is sufficient disagreement, particularly in the schizoaffective cohort, to suggest the value of applying a reliable, standardized instrument such as the IGC of the SCAN to record actual clinical symptoms. A standardized approach would likely increase the reliability and validity of post-mortem diagnosis, and ultimately, prospective tissue-based research.
In addition, the use of data from multiple sources (i.e. prospective ante-mortem interviews, medical records, forensic autopsy reports, police reports and post-mortem interviews with medical professionals and family members) could provide a post-mortem research team with a clearer global picture of a person's psychiatric functioning and symptomatology, leading to more accurate diagnosis [7].
The present study also highlights the importance of accurate and detailed medical record keeping at a symptom-based level across all mental health professions. In the absence of clear and adequate symptom-based detail, the reliability of both ante-mortem and post-mortem diagnosis is likely to be compromised, jeopardizing the quality of care received by patients during their lifetime, and the ultimate validity of research outcomes.
Acknowledgements
This work was supported by the Schizophrenia Research Institute, utilizing infrastructure funding from NSW Health. The NSW TRC is supported by The University of Sydney, the Schizophrenia Research Institute, the National Institute on Alcohol Abuse and Alcoholism (NIAAA), Sydney South West Area Health Service (SSWAHS) and the National Health and Medical Research Council (NHMRC). We wish to thank Donna Sheedy for conducting initial data extraction for this study from the TRC database, and Drs Cheryl Cordery and Antony Harding for their comments during manuscript development. We also extend our sincere appreciation to the Department of Forensic Medicine, Glebe, and the NSW State Coroner's Court. On behalf of the current members of the NSW TRC we would like to acknowledge the efforts of previous staff, and the kind support of family members who generously donated the tissue specimens of their loved ones.
