Abstract
A common methodology in psychiatric research has been the retrospective assignment of diagnoses from case notes and other sources, often buttressed by claims that such methods provide a valid and reliable basis for classification. This has been an attractive and potentially cost-effective approach which avoids the time and expense associated with case-finding and prospective psychopathological assessment. It can also be applied to pre-existing data sets, complemented by case records, to assign diagnoses according to more modern diagnostic criteria. However, given the potential problems associated with rating psychopathology and assigning valid psychiatric diagnoses [1–4], a number of conditions need to be satisfied before accepting the validity of such methods, particularly when they are used in studies aiming to clarify the aetiology of, and risk factors for, specific disorders such as schizophrenia.
The first clear method for making the most of incomplete or patchy pre-existing clinical material was the PSE syndrome checklist, developed as one of the elements of the 9th edition of the Present State Examination (PSE) [5]. This was a positive initial step, developed for pragmatic reasons, but it was complicated both by the idiosyncrasy of the data reduction rules underpinning the PSE and by the lack of careful examination of its validity as a method. More recently, a more systematic and soundly based clinical tool known as the Operational Criteria Checklist (OPCRIT) for psychotic illness has been developed [6] and used in a series of clinical and biological studies [7–11]. This instrument represents an attempt to develop a standardised polydiagnostic approach to existing clinical data sets. The authors sought to combine the ‘top-down’ approach, the major aim of which is to produce operational diagnoses, with the ‘bottom-up’ method, which involves careful definition of a range of clinical phenomena. It has demonstrated good inter-rater reliability, particularly at the full diagnostic level. Earlier checklists used by the authors on data from the Maudsley Schizophrenia Twin Series [12] produced polydiagnostic assignments which manifested different levels of heritability [13,14], providing some evidence of the external validity of the criteria sets and, in the process, of the method itself.
A recent study by Craddock et al. [15] investigated the concurrent validity of OPCRIT against consensus best-estimate lifetime diagnoses and found good to excellent concordance. However, this study had a number of serious methodological flaws. First, the raters who completed the OPCRIT checklists also derived the consensus diagnosis, which may have influenced the diagnoses arising from the consensus procedure. Second, the study compared OPCRIT to only one other diagnostic procedure (the consensus best-estimate lifetime diagnosis), in which the two raters discussed the available information, notably case note information, and then agreed on a diagnosis. The conclusion that OPCRIT has good concurrent validity compared to other diagnostic procedures is thus somewhat premature, as it remains unknown how concordant OPCRIT is with interview-based methods of diagnostic assignment.
McGorry et al. [16] recently showed that alternative procedures for assigning common sets of operational criteria have only moderate concordance. Hence it may be argued that the conclusions drawn from any study requiring rigorous diagnostic classification are critically dependent on the validity of the procedure employed. Unfortunately, this important limitation has been consistently overlooked both by developers of new diagnostic procedures and by the researchers using them. The impact of diagnostic misclassification can be significant and may constitute an obstacle to progress in aetiological research [17].
The term ‘procedural validity’ was originally coined by Spitzer and Williams [18] and refers to the extent to which a new diagnostic procedure yields results similar to existing diagnostic procedures. Failure to assess procedural validity is an important latent source of classification error and contributes directly to misclassification rates. Although the term assumes that some procedures are more valid than others, it remains unclear what criteria should be used to determine this.
The present study examines the procedural validity of the most rigorous retrospective case note diagnostic method, the OPCRIT system, by separately rating the case notes and clinical abstracts of 50 patients admitted to the Early Psychosis Prevention and Intervention Centre (EPPIC) (Melbourne, Australia) who participated in the DSM-IV field trial for psychotic disorders. A previous study by McGorry et al. [16] examined the procedural validity of four methods of assigning a DSM-III-R diagnosis, and the current study compares the OPCRIT method of assigning diagnoses with these methods. The original study examined the concordance of two validity-oriented, interview-based methods of assigning diagnoses, with kappa values ranging from 0.53 to 0.67 [16], and highlighted the problem of misclassification even with kappa values in the moderate to good range. As a secondary focus, ‘historical’ diagnoses assigned by both OPCRIT and the Royal Park Multidiagnostic Instrument for Psychosis (RPMIP) are compared. It is hypothesised that the OPCRIT method of retrospective diagnostic assignment, using either files or abstracts, will show even greater pairwise divergence than the other pairs of diagnostic procedures, and hence be found to seriously lack validity.
Method
The case notes and clinical abstracts of 50 first-episode patients consecutively admitted to EPPIC were requested. These were the same 50 patients who had participated in the procedural validity study of the DSM-IV field trial instrument [19], the RPMIP [2,3], the Munich Diagnostic Checklists [20,21] and a consensus diagnostic procedure [16]. The 50 consecutively admitted patients with first-episode psychotic illness were treated at EPPIC [22], a specialist first-episode psychosis program with a catchment area of 800 000 people, and were recruited during 1992. The mean age of the study group was 26.3 years (SD = 6.8, range = 18–45). There were more men (n = 31, 62%) than women (n = 19, 38%), the majority (n = 42, 84%) had never married, and 56% (n = 28) were unemployed at the time of index assessment. The mean number of years of education was 11.1 (SD = 2.2). Organic aetiology and mental retardation were exclusion criteria. Written informed consent was obtained from all subjects. The study group approximated an incidence sample of first-episode psychosis for a defined area of Melbourne. The study formed part of the multicentre DSM-IV Field Trial for Schizophrenia and Related Psychotic Disorders [19], which examined the reliability and concordance of three sets of options for diagnosing DSM-IV psychotic disorders plus the criteria from DSM-III, DSM-III-R and ICD-10. Forty-six abstracts and 45 sets of case notes were collected for the present study; the remainder could not be located during the study period. An attrition rate of approximately 10% of case notes is comparable to other studies of this nature [6].
OPCRIT is described in detail by McGuffin et al. [6]; it is essentially a checklist built from operational criteria defined by a comprehensive glossary. The items are mainly psychopathology ratings, with some historical and course ratings. A computer algorithm is then applied to the ratings to assign diagnoses according to the operational criteria. There are relatively few complex criteria, which is somewhat surprising given the importance of rating the sequence and prominence of symptoms for accurate diagnostic assignment in modern diagnostic systems. OPCRIT is reported to be reliable when used with case notes, clinical abstracts, and even correspondence between clinicians. Hence, it is implied that it can be used as a stand-alone procedure, though it is also characterised as an ‘accessory’ or complementary tool [6].
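To illustrate the general shape of such a checklist-plus-algorithm approach, a minimal sketch in Python is given below. The item names and the single simplified duration rule are hypothetical illustrations only; they are not the actual OPCRIT items or algorithm.

from dataclasses import dataclass

@dataclass
class ChecklistRatings:
    # Psychopathology items (glossary-defined in a real checklist);
    # these names are illustrative, not actual OPCRIT items.
    delusions: bool = False
    hallucinations: bool = False
    prominent_mood_syndrome: bool = False
    # Course/historical item
    duration_weeks: int = 0

def assign_diagnosis(r: ChecklistRatings) -> str:
    # Apply simplified operational criteria to the ratings. A real
    # polydiagnostic system would evaluate many criteria sets
    # (DSM-III-R, RDC, Feighner, ...) over a large set of items.
    psychotic = r.delusions or r.hallucinations
    if psychotic and r.duration_weeks >= 26 and not r.prominent_mood_syndrome:
        return "schizophrenia (simplified)"
    if psychotic:
        return "psychotic disorder NOS (simplified)"
    return "no psychotic diagnosis"

print(assign_diagnosis(ChecklistRatings(delusions=True, duration_weeks=30)))
# prints: schizophrenia (simplified)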
Three raters (CM, CMcF and SR) shared the task of rating the case notes and clinical abstracts of the study sample. Before the study began, the raters established a baseline level of concordance by rating, and subsequently discussing, the case notes and abstracts of four patients who were not part of the study sample. The rating of the case notes and abstracts of the study sample occurred in two stages. Stage 1 consisted of clinical abstract ratings alone: 20 randomly selected abstracts were rated by all three raters for the purpose of measuring inter-rater reliability, and the remaining 26 abstracts were then distributed equally between the raters. Each rater therefore rated approximately 30 abstracts.
In stage 2, each rater rated approximately 15 sets of case notes. To maintain independence of ratings, it was ensured that the case note and abstract ratings for each subject used in the final pairwise comparisons were not completed by the same rater. The method used in the four other diagnostic procedures is described in McGorry et al. [16].
Results
The OPCRIT DSM-III-R diagnostic profile of the study sample is shown in Table 1.
Table 1. Operational Criteria (OPCRIT) diagnoses (DSM-III-R) derived from case notes and clinical abstracts
Inter-rater reliability for item-by-item agreement between the three raters was assessed and found to be satisfactory (median κ = 0.84, semi-IQR = 0.19). Levels of agreement between the two data sources for assigning OPCRIT diagnoses (clinical abstract versus case notes), together with pairwise comparisons of the OPCRIT case note and clinical abstract derived diagnoses with each of the four DSM-III-R diagnostic procedures used in McGorry et al. [16], are shown in Table 2.
Table 2. Pairwise agreement between the OPerational CRITeria (OPCRIT) procedure applied to clinical abstracts (n = 46) and case notes (n = 45), and four alternative methods of assigning DSM-III-R diagnoses in a sample of first-episode psychosis
RPMIP, Royal Park Multidiagnostic Instrument for Psychosis.
Pairwise comparison of OPCRIT with each of the four diagnostic procedures produced an 8 × 9 matrix for each pair. Cohen's unweighted nominal kappa and its associated standard error [23] are presented as the index of agreement for each of the eight pairwise comparisons, together with the per cent agreement for each pair.
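For orientation, Cohen's unweighted kappa for a pair of procedures is defined as $\kappa = (p_o - p_e)/(1 - p_e)$, where $p_o$ is the observed proportion of agreement and $p_e = \sum_i p_{i\cdot}\, p_{\cdot i}$ is the agreement expected by chance from the marginal proportions of the cross-classification. A common large-sample approximation to its standard error, given here for reference only (the estimator of [23] may differ in detail), is $SE(\kappa) \approx \sqrt{p_o(1 - p_o) / [N(1 - p_e)^2]}$, where $N$ is the number of jointly rated cases.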
Pairwise kappa values and per cent agreement between OPCRIT and each of the four comparison diagnostic procedures ranged from poor to moderate. Full diagnostic concordance for DSM-III-R diagnosis between the OPCRIT case note procedure and all four other procedures occurred in only 35.6% of cases; the corresponding figure for the OPCRIT clinical abstract procedure was 32.6%. By contrast, the pairwise comparisons in our initial study [16] were considerably better: kappa values ranged from 0.53 to 0.67, unadjusted agreement between pairs of procedures ranged from 66% to 76%, and full diagnostic concordance between the four procedures occurred in 54% of cases. The level of concordance between the two sets of OPCRIT ratings, applied to the case notes and the clinical abstracts respectively, was also only poor to moderate, and neither data source was associated with better concordance when OPCRIT was compared with the other diagnostic procedures.
Since OPCRIT and the RPMIP are both polydiagnostic procedures, kappa values for common historical diagnoses, including Feighner, RDC, Schneider, and Taylor and Abrams (schizophrenia), and Bouffée Délirante (Licet-S) [2,3], were also derived. These again proved very poor, with the highest agreement occurring between the OPCRIT case note rating and the RPMIP for Schneider's classification of schizophrenia (κ = 0.38, SE = 0.14).
Discussion
This study has demonstrated that DSM-III-R diagnoses assigned by OPCRIT alone, using two forms of case material, show poor pairwise concordance with DSM-III-R diagnoses assigned by four other methods, substantially lower than the pairwise concordance among the remaining four methods themselves. It is acknowledged that one of the original four methods, the consensus procedure, was to some extent retrospective, like OPCRIT. However, that procedure always included one clinician who knew the patient well and could answer specific questions regarding missing data. The overall results are of some concern and have important implications.
The sources of discordance are relatively simple to identify and principally involve information variance. A case record or abstract is a clinical tool, not a research database, as emphasised by Lützoft et al. [24]. Features not recorded in the abstract or case notes are not necessarily absent: they may have been missed, not elicited, not considered important for diagnosis or management, or interpreted differently. The sequence and prominence of symptoms, critical for diagnostic assignment, may not have been clearly recorded, and those features that were recorded may not have been rated according to strict glossary definitions.
A clue to the weakness of the data source and the method can be found in earlier OPCRIT studies, in which 20–40% of cases met criteria for atypical psychosis (DSM-III) or Psychotic Disorder NOS (DSM-III-R); the rate in the present study was somewhat lower, especially where the complete case notes were used. In the earlier work, however, the authors inappropriately blamed the diagnostic system rather than the retrospective methodology, which is compromised by missing or limited data [25]. This is particularly the case with clinical abstracts, though OPCRIT applied to detailed case notes did not achieve any higher concordance in this study. A further issue contributing to discordance could relate to the content of the algorithms, which often combine low-quality, elemental data (even when the data set is complete). This relates to the complexity and the quality of the checklist itself.
This raises the question of what level of quality of information (if any) would suffice for OPCRIT or any similar approach to be used as a stand-alone procedure. OPCRIT was developed in an academic unit which prides itself on producing exemplary case notes. However, the situation was confused somewhat in the original research [6], since the authors were not clear about the extent to which case notes were augmented by PSE ratings, and they included OPCRIT ratings made from correspondence with general practitioners. This suggests that the quality of the data set must have been extremely variable, though in many cases buttressed by PSE ratings. Our study was also conducted in a very active research environment with a 12-year focus on diagnostic research, including participation in both the ICD-10 and DSM-IV field trials for psychotic disorders. We believe the quality of our case notes and clinical abstracts to be very good, and more than adequate for the purpose for which they were intended, namely clinical management. They have also functioned as a useful accessory source of information, contributing to psychopathological ratings in our research and feeding into the RPMIP procedure [2,3].
In our opinion, the results of this study indicate that the role of case notes and related material in diagnostic assignment should be restricted to an adjunctive one. Consequently, procedures such as OPCRIT should function solely as accessories in a more global task of diagnostic assignment centred on direct subject interview, an option the authors of OPCRIT themselves suggested (though they indicated it can also be used on its own) [6]. This appears to be the approach adopted in the Schedules for Clinical Assessment in Neuropsychiatry (SCAN) [26], where the Item Group Checklist (IGC) functions in a complementary manner to the PSE 10. Given that the dominant research strategy in psychosis is based around the reduction of heterogeneity within psychotic disorders [27], minimising the risk of misclassification seems essential. The findings of the present study must bring into question the conclusions of any studies that have relied solely on OPCRIT or similar procedures to assign diagnoses retrospectively using only case record material.
Acknowledgments
The authors would like to acknowledge the statistical support of Susan Harrigan and Paul Dudgeon and the financial support of VicHealth (Victorian Health Promotion Foundation) and the National Health and Medical Research Council.
