Abstract
The diagnosis of personality disorders is among the most controversial and problematic of all psychiatric diagnostic categories [1]. Much of this controversy surrounds the issue of whether personality disorder diagnoses are even useful [2], and also whether there should be categorical or dimensional methods of classifying personality disorders [3]. Despite these controversies, the current classification system for personality disorders in psychiatry is categorical, and this study concerns itself with investigating the accuracy of categorical diagnosis in a clinical setting.
In clinical settings the standard assessment used for diagnosis is clinical interview; however, clinical interviews have been shown to have poor reliability, for example with median kappas of around 0.25 between interviewers [4]. Consequently, advancements in both research and clinical practice in the area of personality disorders are potentially hindered by the questionable accuracy of clinical diagnosis. This is important because in recent years there have been advancements in evidence-based interventions for particular personality disorders; for example, dialectical behaviour therapy has been shown to be effective in the treatment of borderline personality disorder [5]. Thus, accuracy of diagnosis has implications both for the advancement of research into particular disorders and for tailoring effective treatments to accurately diagnosed disorders.
Semistructured interviews represent an alternative assessment method for diagnosis of personality disorders. They have the advantage of being systematic, enabling the clinician to assess criteria through a set of consistently applied questions that are scored in a replicable way. This study aimed to compare the diagnostic concordance between standard clinical assessment by clinical interview, and two structured interviews: the International Personality Disorder Examination (IPDE) [6] and the Mini International Neuropsychiatric Interview (MINI) [7]. The primary interest in the study was to investigate the reliability of personality disorder diagnosis, as the research was a part of improving clinical practice in a specialist personality disorder treatment programme. However, the MINI was included so that all disorders could be compared, as it is well known that individuals with personality disorders often have many other comorbid diagnoses [8]. The IPDE was chosen in this study as the diagnostic measure for personality disorders as it is an international tool that was tested in a large international WHO field trial, which provided an unusually exacting test of reliability to which no other interview for personality disorders has been subjected. The aim was to examine whether the use of structured measures would produce different diagnoses from those made as usual in a clinical setting.
Method
Participants
Thirty-three patients participated in the study as part of their attendance at a specialist outpatient group therapy programme for personality disorders, the CHANGES programme, at the Inner City Mental Health Service, Royal Perth Hospital. The average age of patients was 34 years, 73% were female, 76% had an education level of high school graduation or less, 79% were single and 91% were unemployed.
Measures
Mini International Neuropsychiatric Interview (MINI)
The MINI is a short structured interview that took an average of 30 min to administer, and screens for all psychiatric disorders except for personality disorders [7]. These psychiatric disorders will hence be referred to as ‘symptom disorders’ for ease of communication in comparison to personality disorders. The MINI has both DSM-IV and ICD-10 versions; the latter version was used because the ICD-10 is the official diagnostic manual that is used in clinical practice within the health department of Western Australia. It has 2–4 screening questions per disorder. Additional symptom questions are asked only if the screen questions are positively endorsed. The MINI has been shown to have good concordance with other diagnostic measures [9]. The MINI also has good inter-rater reliability with kappa coefficients ranging between 0.88 and 1.0, and good test–retest reliability with coefficients ranging between 0.76 and 0.93 [7].
International Personality Disorder Examination (IPDE)
The IPDE is a 65-item semistructured interview designed to diagnose personality disorders [6]. It was developed out of a joint initiative started in 1979 between the World Health Organization (WHO) and the US Alcohol, Drug Abuse and Mental Health Administration to develop a battery of standardized diagnostic instruments for use worldwide to increase diagnostic reliability. The IPDE was developed as the official WHO tool for personality disorder diagnosis. Reliability of the IPDE was tested by administering the interview to over 700 patients in several different languages by a large number of clinicians across 11 countries in North America, Europe, Africa and Asia. The inter-rater reliability was good, ranging from 0.73 to 0.91 with a mean of 0.83, and temporal stability was also good, ranging from 0.62 to 1.0 with a mean of 0.80 [10]. There are both DSM-IV and ICD-10 versions available: the latter was used. It took an average of 210 min to administer.
The interview aims to get an optimal balance between a spontaneous clinical interview and the requirements of standardization and objectivity. The questions are arranged under six thematic areas: work; self; interpersonal relationships; affects; reality testing; and impulse control. The examiner is required to use a standard set of probes to question interviewees beyond their initial positive responses to a question to get convincing examples to support it. Standard probes must also be asked about onset and duration of traits or behaviour as the IPDE requires that the trait or behaviour must be present for at least 5 years. It is also required that at least one criterion of a disorder must have been fulfilled prior to the age of 25 years to satisfy onset as being in childhood, adolescence or early adulthood. A standard set of probes about the frequency of the trait or behaviour must also be asked, and this forms the basis of scoring each item which is done during the interview.
Procedure
To determine diagnosis by structured tools, existing patients of the CHANGES programme were interviewed using the MINI and IPDE by two interviewers, each with several years of experience in diagnosis. The first author, a clinical psychologist, conducted 26 of the IPDE interviews and seven were conducted jointly with the third author, a consultant psychiatrist. In the joint interviews each clinician asked an equal number of questions and scored each question independently, and at the end of the interview the rate of agreement on scores was calculated. The average agreement rate between the interviewers was 93%. Official training in the IPDE was given by a trainer from the WHO, and several practice interviews were conducted as a part of training prior to the study. The first author conducted all MINI interviews, and several practice interviews were conducted prior to the study.
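The agreement rate between the two interviewers can be illustrated with a short sketch. It assumes simple percent agreement (the share of items given an identical score by both raters); the text does not state which agreement statistic was used, and the function name and item scores below are hypothetical, not study data.

```python
from typing import Sequence


def percent_agreement(scores_a: Sequence[int], scores_b: Sequence[int]) -> float:
    """Percentage of items on which two raters gave the identical score."""
    if len(scores_a) != len(scores_b) or len(scores_a) == 0:
        raise ValueError("both raters must score the same non-empty set of items")
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return 100.0 * matches / len(scores_a)


# Hypothetical item scores for one joint interview (illustration only).
rater_1 = [0, 1, 2, 2, 0]
rater_2 = [0, 1, 2, 1, 0]
agreement = percent_agreement(rater_1, rater_2)  # 4 of the 5 items match
```

Under this assumption, an average across the seven joint interviews would be the mean of the per-interview percentages.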
To determine diagnoses on the basis of standard clinical assessment, diagnoses were gathered from the patients' referral forms to the programme. These were completed by the treating doctors referring the patient, who were primarily consultant psychiatrists or psychiatric registrars. The referral form requires that all ICD-10 diagnoses are listed for the patient. Each patient's referral form was examined, and all diagnoses were recorded. These diagnoses were all made by clinical interview, and were conducted either while the patient was an inpatient on a psychiatric ward or as part of a routine outpatient psychiatry appointment.
Statistical analysis
Diagnostic concordance between the two assessment methods of structured interviews and standard clinical assessment was estimated in part by ‘diagnostic sensitivity’ and ‘diagnostic specificity’, following the methodology of Janca et al. [11]. This involved using structured interview diagnosis as the standard, and clinical assessment diagnosis as the test. Diagnostic sensitivity was the percentage of all positive structured interview diagnoses with an identical diagnosis made by standard clinical assessment. Diagnostic specificity was the percentage of all negative structured interview diagnoses also identified as negative by standard clinical assessment. These calculations use diagnoses as the base rather than individuals [11]. Concordance between instruments was calculated for both symptom disorder and personality disorder diagnosis overall. For the symptom disorders, each individual was evaluated for 17 diagnoses on the MINI, so the potential number of diagnoses was 561 (17 × 33). For the personality disorders, each individual was evaluated for nine diagnoses on the IPDE, so the potential number of diagnoses was 297 (9 × 33).
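The two definitions above can be made concrete with a minimal sketch that counts over patient-by-diagnosis pairs, so that diagnoses rather than individuals form the base. The function name, data layout and example patients are hypothetical and are not taken from the study or from Janca et al. [11].

```python
from typing import Dict, Sequence, Set, Tuple


def diagnostic_sensitivity_specificity(
    structured: Dict[str, Set[str]],
    clinical: Dict[str, Set[str]],
    diagnoses: Sequence[str],
) -> Tuple[float, float]:
    """Return (sensitivity %, specificity %) of clinical assessment,
    treating structured interview diagnoses as the standard."""
    tp = fn = fp = tn = 0
    for patient, standard_dx in structured.items():
        test_dx = clinical.get(patient, set())
        for dx in diagnoses:  # every patient is evaluated for every diagnosis
            in_standard, in_test = dx in standard_dx, dx in test_dx
            if in_standard and in_test:
                tp += 1
            elif in_standard:
                fn += 1
            elif in_test:
                fp += 1
            else:
                tn += 1
    sensitivity = 100.0 * tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = 100.0 * tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical data: 3 patients x 3 diagnoses = 9 potential diagnoses.
diagnoses = ["major depression", "social phobia", "alcohol dependence"]
structured = {
    "p1": {"major depression", "social phobia"},
    "p2": {"major depression"},
    "p3": {"alcohol dependence", "social phobia"},
}
clinical = {"p1": {"major depression"}, "p2": {"major depression"}, "p3": set()}
sens, spec = diagnostic_sensitivity_specificity(structured, clinical, diagnoses)
```

In this made-up example clinical assessment records only two of the five structured interview diagnoses (sensitivity 40%) but makes no diagnosis that the structured interview lacks (specificity 100%). A test that records few diagnoses can therefore score perfectly on specificity while missing most positives.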
In addition to calculating concordance overall, specific diagnostic groups within each of the symptom disorders and personality disorders were assessed. For symptom disorder diagnosis the groups were: substance use disorders (drug and alcohol dependence); mood disorders (major depression, bipolar disorder, dysthymia); neurotic disorders (generalized anxiety disorder, social phobia, post-traumatic stress disorder, agoraphobia, panic disorder, specific phobia, obsessive–compulsive disorder, body dysmorphic disorder, dissociative disorder, adjustment disorder); and eating disorders (bulimia nervosa, anorexia nervosa). For personality disorder diagnosis, the diagnoses were separated into: cluster A (paranoid, schizoid, dissocial); cluster B (emotionally unstable impulsive type and borderline type, histrionic); and cluster C (anankastic, anxious, dependent). Although the division of personality disorders into clusters was developed for DSM and results of studies validating this division have been varied [12], the cluster structure of personality disorders is still used in clinical practice. Thus, the division into clusters in this study was to enable a more meaningful calculation of concordance given that some specific personality disorders had a low incidence in the sample that precluded calculation of diagnostic sensitivity and specificity for each specific disorder. Examination of diagnostic concordance was also done by calculating kappa coefficients as a measure of concordance between standard clinical assessment and structured interview diagnoses.
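For readers unfamiliar with kappa, the following sketch shows how a chance-corrected agreement coefficient can be computed for a single present/absent diagnosis. It assumes Cohen's kappa for two raters; the text does not specify which kappa variant was used, and the function name and example ratings are hypothetical.

```python
from typing import Sequence


def cohens_kappa(method_a: Sequence[bool], method_b: Sequence[bool]) -> float:
    """Cohen's kappa for two methods giving present/absent judgements."""
    if len(method_a) != len(method_b) or len(method_a) == 0:
        raise ValueError("both methods must rate the same non-empty set of cases")
    n = len(method_a)
    observed = sum(a == b for a, b in zip(method_a, method_b)) / n
    p_a = sum(method_a) / n  # proportion rated 'present' by method A
    p_b = sum(method_b) / n  # proportion rated 'present' by method B
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    if expected == 1.0:
        return float("nan")  # undefined when both methods give one constant rating
    return (observed - expected) / (1 - expected)


# Hypothetical ratings for four patients (True = diagnosis present).
clinical_dx = [True, True, False, False]
structured_dx = [True, False, False, False]
kappa = cohens_kappa(clinical_dx, structured_dx)
```

Here the methods agree on three of four patients (observed agreement 0.75), chance agreement is 0.5, and kappa is 0.5. Kappa is lower than raw agreement because it discounts the agreement expected by chance alone.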
Results
Symptom disorder diagnosis
Diagnostic concordance between clinical ICD-10 diagnosis and MINI diagnosis is shown in Table 1. There were more diagnoses made on the MINI than by standard clinical assessment. Diagnostic sensitivity of standard clinical assessment was very low, and diagnostic specificity was slightly higher though still poor. Diagnostic concordance between the MINI and standard clinical assessment was very poor.
Table 1. Overall diagnostic concordance between clinical ICD-10 diagnosis and MINI
When the specific diagnostic groups are compared in Table 2, it is apparent that the highest rate of agreement between the assessment measures was on mood disorders, with a high kappa value. Diagnostic sensitivity and specificity were also satisfactorily high on standard clinical assessment in comparison to the MINI for mood disorders. However, the other specific diagnostic groups had very poor rates of agreement between the measures, particularly substance use and neurotic disorders, which had extremely low kappa values.
Table 2. Diagnostic concordance between clinical ICD-10 diagnosis and MINI for specific ICD-10 categories
Personality disorder diagnosis
Diagnostic concordance between clinical ICD-10 diagnosis and IPDE diagnosis is shown in Table 3. Diagnostic sensitivity and specificity of standard clinical assessment on the presence or absence of any personality disorder were high. There was an excellent rate of agreement between standard clinical assessment and the IPDE on the presence or absence of a personality disorder, with a kappa value of 0.96.
Table 3. Overall diagnostic concordance between clinical ICD-10 diagnosis and IPDE
However, as seen in Table 4, diagnostic concordance between the measures was not high when compared for each personality disorder cluster. Concordance was poorest for cluster A personality disorders, with a very low kappa value and poor sensitivity of standard clinical assessment.
Table 4. Diagnostic concordance between clinical ICD-10 diagnosis and IPDE for personality disorder cluster categories
Diagnostic concordance was also poor for cluster B personality disorder diagnoses, with a low kappa value. However, diagnostic sensitivity and specificity of standard clinical assessment for cluster B personality disorders were excellent. Diagnostic concordance between measures for cluster C personality disorders was better than for clusters A and B, but was still relatively low (kappa = 0.66).
Although diagnostic numbers were too low to allow calculation of diagnostic sensitivity and specificity between measures for all specific personality disorder diagnoses, those for which kappa coefficients could be calculated are shown in Table 5.
Table 5. Number of patients identified by clinical interview and IPDE with each ICD-10 category of personality disorder and level of agreement
Table 5 shows a very low rate of agreement between measures on the diagnoses of schizoid and emotionally unstable personality disorder, borderline type (BPD). However, agreement between measures on the diagnosis of dependent personality disorder was relatively good. Given that the lowest rate of agreement was on the diagnosis of BPD, and that standard clinical assessment diagnosed this disorder far more frequently (20 diagnoses) than other personality disorders, it was examined further. It is interesting to note, as seen in Table 5, that on the IPDE there were a large number of patients who fell one criterion short of meeting diagnosis for BPD. To further examine why this discrepancy may be occurring between the measures, the subgroup of patients receiving a ‘probable’ BPD diagnosis on the IPDE was examined on each specific criterion for diagnosis. It was found that the majority of these patients met certain diagnostic criteria, yet few met other criteria. For example, 93% met the criterion for recurrent threats or acts of self-harm, yet only 7% met the criterion for disturbances in self-image, aims or internal preferences.
Discussion
The results have shown that overall there was a poor rate of concordance between standard assessment and the structured interviews. There were also poor rates of diagnostic sensitivity with standard clinical assessment. In comparison to the diagnoses that were made through standard clinical assessment, the use of the structured interviews resulted in a more comprehensive diagnostic picture.
If the results of symptom disorders are considered first, it is evident that for the diagnosis of the presence or absence of any symptom disorder, standard clinical assessment had very low diagnostic sensitivity and specificity. There was also very poor agreement between standard clinical assessment and the MINI. These results are due to there being many more diagnoses made on the MINI than by standard assessment.
It was evident that while standard clinical assessment was good at diagnosing mood disorders, as seen by the high diagnostic sensitivity, it was poor at diagnosing other disorders. This is particularly true of the neurotic and substance use disorders, where standard assessment showed diagnostic specificity of 100%. Given that diagnostic specificity was the percentage of all negative MINI diagnoses also identified as negative by standard clinical assessment, this finding reflects under-reporting of substance abuse and neurotic disorders by standard clinical assessment: a method that records very few diagnoses will rarely record one that the MINI does not. This is supported by the average number of diagnoses being much lower on standard assessment than on the MINI. Thus, standard assessment was good at diagnosing mood disorders such as major depression, but very poor at reporting any other disorders such as anxiety disorders or alcohol abuse.
However, one alternative explanation of these findings is worth noting: while clinicians using clinical interviews were not providing certain diagnoses such as anxiety and substance use disorders, this also means that their false-positive rate was extremely low. Consequently the clinicians could not be criticised for falsely making a diagnosis when it was not present. This is important to note because one assumption of the study is that the structured tools are the ‘gold standard’ for diagnosis, and thus this alternative explanation for clinical interview results also needs to be considered.
In examining the diagnosis of personality disorders, standard clinical assessment was very good at stating whether a personality disorder overall was present or absent, with an excellent rate of agreement with the IPDE and good diagnostic sensitivity. However, it was obvious from the examination of personality disorder clusters that standard clinical assessment was much poorer at making distinctions between specific personality disorder categories. This was seen by the very low rates of concordance between the measures, and low rates of diagnostic sensitivity, especially for cluster A and C disorders. As seen with symptom disorder diagnosis, diagnostic specificity for these clusters was 100%; however, this again indicated that standard clinical assessment was missing diagnoses. For example, standard clinical assessment made no diagnoses of paranoid or anankastic personality disorder, yet these made up nine diagnoses on the IPDE.
Interestingly, diagnostic sensitivity of standard clinical assessment was high for cluster B personality disorders, being 100%. Standard clinical assessment made 20 diagnoses of BPD, compared to only five on the IPDE. There were, however, a large number of patients who were one criterion short of meeting BPD on the IPDE. It was found that most of these patients met the criterion for recurrent self-harm but not other criteria. These results suggest an over-diagnosis of BPD through standard clinical assessment. Thus, standard clinical assessment reported BPD but few other specific personality disorders.
These results highlight one of the major problems with clinical interviews, where clinicians can fall into the trap of focusing on certain salient criteria such as recurrent self-harm in BPD. Clearly, the presence of repetitive self-harm is not a sufficient criterion for a diagnosis of BPD. Indeed, it may be the case that the individual meets criteria for other personality disorders, as seen in these results where the largest diagnostic group made by the IPDE was anxious personality disorder. It has been shown in other studies that many BPD patients have clinically significant traits or meet full criteria for other personality disorders such as avoidant and dependent [13].
Clinical interviews are well recognized as being highly susceptible to idiosyncrasies, inaccurate assumptions, or gender, cultural or ethnic biases [14]. For example, there are more females than males diagnosed with borderline, dependent, and histrionic personality disorders, and this over-diagnosis may be due to sex bias [15]. These biases in clinical interviews are often due to heuristics that the interviewer uses. Heuristics are shortcuts that individuals use to reduce complex problem-solving tasks through the use of simple rules rather than evaluating all aspects of a situation [16]. In this study, those using clinical interviews may have been subject to what is known as the availability bias [16], where certain information becomes more readily available cognitively because it stands out more. The self-harm behaviour across most patients is certainly a salient piece of information, and it may have made BPD criteria more cognitively available and biased the seeking of information towards them rather than towards other personality disorder categories with less salient criteria, such as avoidance of social situations.
The current findings indicate that using standard clinical assessment results most often in the diagnosis of depression and BPD, but misses many other important diagnoses such as alcohol and anxiety disorders as well as other personality disorders such as anxious personality disorder. One criticism that could be made of this conclusion is that clinicians may have been aware of the complex psychopathology of their patients, but chose only to record the principal diagnosis on the referral form. This may have been the case, but it is important to note that, as the ICD-10 states: ‘Modern clinical practice has moved away from the principle of selecting only one diagnosis from a hierarchically ordered list of disorders for a single patient’ [17]. Rather than choosing one diagnosis, clinicians could potentially utilize their time more effectively by providing all diagnoses rather than spending time forcing possibly misleading selections between diagnoses in patients who present a very complex diagnostic picture.
The other implication of choosing to record only one principal diagnosis for a patient is that it might affect subsequent treatment choices. This was certainly the case clinically within the personality disorder treatment programme that the study was based in, where referral diagnoses were used to make initial treatment choices. It was recognized over time however, that these choices were not always clinically sound on the basis of the referral diagnosis as other important diagnoses were missed, and thus the current study was designed to improve diagnostic accuracy so that all diagnoses were recognized in the initial treatment planning stage. Consequently, all diagnoses warrant being recognized and treated, and as such need to be assessed thoroughly through reliable assessment tools like the MINI and IPDE.
There are three main limitations of the study. First, the low sample size meant that each personality disorder could not be examined separately to determine agreement on each specific diagnosis. Second, the low sample size also meant that personality disorders needed to be grouped into clusters to enable calculation of statistics, and although the idea of cluster structures is used clinically, the research evidence for it is variable. Finally, the sample consisted of current patients in a personality disorders treatment programme rather than a random sample from a psychiatric clinic, so it may not be representative of all patients with personality disorders.
Despite these limitations, the results have shown that the use of structured interviews was beneficial in providing a more comprehensive diagnostic picture in comparison to what would have otherwise been available through standard clinical assessment. This has highlighted the important clinical implications of this study in showing the advantage of using structured assessment tools for diagnosis, as it forces clinicians to examine closely all diagnostic criteria and assess these in a consistent and reliable way. The accuracy of diagnosis also has important implications for the effective treatment of individuals with a personality disorder. One of the defining features of personality disorders is that they are very difficult to treat. Treatment is always long-term, and patients typically show resistance to standard treatments for other disorders such as major depression. If all diagnoses that a patient meets are recognized, then treatment can be planned to target these. It is known that certain aspects of current treatments are particularly useful for certain disorders. For example, cognitive behaviour therapy is efficacious in the treatment of many neurotic disorders [18], and dialectical behaviour therapy [19] has been shown to be effective in the treatment of BPD [5]. Without accurate diagnosis and consequent treatment planning for all disorders, treatment may be less effective.
Of course, it must be acknowledged that diagnosis is only one of the aspects of case formulation in treatment planning. Traditional behavioural approaches suggest that formulation and treatment planning should not be based on diagnosis but instead on a functional analysis of behaviour that consists of the predisposing, precipitating and, in particular, maintaining factors for the individual's problems [20]. However, current approaches suggest that including diagnosis in the overall formulation of the individual's problems can be useful as a short-cut, and because there are well recognized efficacious treatments for certain disorders [21] as outlined previously. Despite acknowledging other approaches such as behavioural formulation, diagnosis is still used in psychiatry as the main method through which clinicians communicate their understanding of patients' problems in a summarized manner. Consequently, making an accurate diagnosis is one method that can help to enrich overall formulation to result in effective treatment planning.
Not only is accurate diagnosis crucial for effective treatment planning, but it is also important to understand the prevalence of personality disorders. Given the poor reliability of clinical interviews, it is important that accurate diagnostic tools such as the IPDE are utilized in epidemiological studies to understand the true prevalence of personality disorders. This is very important for further developing our understanding of the aetiology of personality disorders.
One criticism that could be made of this study is that too much emphasis is placed on distinguishing between and placing patients into discrete personality disorder categories. It has long been argued in the literature that the distinction between discrete personality disorder categories is flawed, with reviews finding little empirical data to support categorical approaches [22]. A more valid approach to diagnosis in the future would be one that provides both categorical and dimensional personality disorder diagnoses. Several authors have proposed models for dimensional approaches to personality disorder diagnosis [3, 23, 24]. Probably the best approach is one that rates the extent to which a patient displays each personality disorder. In this approach, dimensional ratings could be made based on the number of criteria that a patient met for the particular disorder. For example, based on a DSM-IV diagnosis for BPD this could range from: absent (0 criteria met); traits (1–3 criteria met); subthreshold (4 criteria met); threshold (5–6 criteria met); moderate (7–8 criteria met); and extreme (9 criteria met) [14]. This would mean that each patient could be given a measure of severity for each diagnosis; for example a patient may be ‘moderate avoidant with borderline traits’.
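The example rating scale above can be expressed as a simple lookup on the number of criteria met. The sketch below uses the cut-offs quoted in the text for DSM-IV BPD (nine criteria); the function name is hypothetical, and applying the same cut-offs to other disorders would be an assumption, since their criterion counts differ.

```python
def bpd_dimensional_rating(criteria_met: int) -> str:
    """Map the number of DSM-IV BPD criteria met (0-9) to a severity label,
    using the cut-offs quoted in the text."""
    if not 0 <= criteria_met <= 9:
        raise ValueError("DSM-IV BPD has nine criteria; expected a count from 0 to 9")
    if criteria_met == 0:
        return "absent"
    if criteria_met <= 3:
        return "traits"
    if criteria_met == 4:
        return "subthreshold"
    if criteria_met <= 6:
        return "threshold"
    if criteria_met <= 8:
        return "moderate"
    return "extreme"


# Label for every possible criterion count, 0 through 9.
ratings = [bpd_dimensional_rating(n) for n in range(10)]
```

On this scale a patient who falls one criterion short of the categorical threshold is rated ‘subthreshold’ rather than simply negative, which keeps that information in the record.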
This type of system could be easily used to supplement current categorical approaches with assessment tools such as the IPDE, which provides dimensional ratings for each personality disorder category. This would aid the ease of communication of the very complex personality disorder cases that clinicians treat, and a dimensional system would also encourage the thorough assessment of all personality disorders. Dimensional approaches may also be able to offer advantages for treatment planning, and tailoring of empirically supported treatments for the particular disorders that the patient may meet at a trait through to a moderate or severe level.
In conclusion, this study has highlighted the importance and benefit of using reliable assessment tools for diagnosis. It challenges clinicians to review their current diagnostic practices and to increase their accuracy and comprehensiveness of diagnosis. This is a step towards important future aims for personality disorder assessment to become ‘objective, systematic, comprehensible, reproducible and consistent’ [14]. It is only with such reliable assessment and diagnosis that clinicians will be able to further develop their understanding and effective treatment of the complex clinical picture of personality disorders.
Acknowledgements
We acknowledge the contributions of Aleksander Janca and Clare Rees. The Health Department of Western Australia funded the project.
