Abstract
The purpose of this article is to provide a description and discussion of the evidence-based assessment of personality disorder. Considered herein is the assessment of the Section II personality disorders included within the fifth edition of the American Psychiatric Association’s (APA) Diagnostic and Statistical Manual of Mental Disorders (5th ed., text rev.; DSM-5-TR), within Section III of DSM-5-TR, and within the 11th edition of the World Health Organization’s (WHO) International Classification of Diseases (ICD-11). The recommendation for an evidence-based assessment is for a multimethod approach: first administer a self-report inventory to alert the clinician to maladaptive personality functioning that might not have otherwise been anticipated, followed by a semi-structured interview to verify the personality disorder’s presence. The validity of this multimethod strategy can be improved further by considering the impact of other disorders on the assessment, documenting temporal stability, and establishing a compelling, empirical basis for cutoff points.
There is a substantial literature on the development of evidence-based diagnosis and assessment (e.g., Bornstein, 2017; Hollon, 2017; Hunsley & Mash, 2020; Jensen-Doss & Hawley, 2010; Youngstrom et al., 2017). The purpose of this article is to provide a description and discussion of the evidence-based assessment of personality disorders. Included is the assessment of the Section II personality disorders within the fifth edition of the American Psychiatric Association’s (APA) Diagnostic and Statistical Manual of Mental Disorders (5th ed., text rev.; DSM-5-TR; American Psychiatric Association [APA], 2022), within Section III of DSM-5-TR, and within the 11th edition of the World Health Organization’s International Classification of Diseases (ICD-11; World Health Organization [WHO], 2019).
Multimethod Assessment
Multiple efforts have been made to improve the validity of personality disorder assessment. One notable achievement was the development of specific and explicit criterion sets (Kendler et al., 2010). Prior to their presence, personality disorder diagnosis was notoriously unreliable, with clinicians providing their subjective judgments in matching what they knew about a patient (on the basis of unstructured interviews) to a narrative, paragraph description of a prototypic case. Clinicians were free to focus on any particular part of the narrative when formulating a diagnosis.
Clinicians, however, do not closely follow the criterion sets in clinical practice (Hollon, 2017). They tend to diagnose personality disorders hierarchically, focusing on a subset of criteria and failing to assess additional symptoms once they reach a conclusion that a particular personality disorder is present (Miller et al., 2012). To ensure that each diagnostic criterion is assessed, researchers developed semi-structured interviews that require a set of explicit questions for each diagnostic criterion. Clinicians, however, do not administer semi-structured interviews, in part because of the amount of time they require (Jensen-Doss & Hawley, 2010) and in part because of concerns that they might undermine clinical rapport. Clinicians cannot devote hours to assessing the personality disorders. A proposal for the Diagnostic and Statistical Manual of Mental Disorders (4th ed.; DSM-IV; APA, 1994) was to simplify the criterion sets to make them more feasible for clinical practice. However, in the end, this was done for only one personality disorder: antisocial, particularly with respect to its childhood variant, conduct disorder.
Another proposal for DSM-IV was to provide the personality disorder criteria in a descending order of diagnostic value (Widiger et al., 1995). Because clinicians confine their attention to just a few of the diagnostic criteria, it would be helpful to inform them as to the most informative criteria (Youngstrom et al., 2017). However, no mention of this ordering was provided in DSM-IV for a number of reasons. There was insufficient empirical research to rank the newly added diagnostic criteria, and the authors for two of the personality disorders (borderline and schizotypal) declined the proposal.
A proposal for the Diagnostic and Statistical Manual of Mental Disorders (5th ed.; DSM-5; APA, 2013) was to include structured assessments (self-report inventory or semi-structured interview) within the criterion sets: “At present, results of psychological testing are not included in DSM-IV diagnostic criteria, with the exception of IQ testing and academic skills . . . [and] this exception points the way for research that could lead to incorporation of psychological test results as diagnostic criteria for other disorders” (Rounsaville et al., 2002, p. 24).
This proposal, however, was not implemented.
Meyer et al. (2001) distinguish between diagnosis (the process of assigning one or more formal labels to describe a patient’s primary psychopathology) and assessment (the process of administering a battery of psychological tests to understand a person’s functioning). Meyer et al. (2001) and Bornstein (2017) recommend that the optimal approach to assessment is through multiple methods, as different assessment methods can provide uniquely informative input. Indeed, Widiger and Samuel (2005) suggested that assessment in clinical practice include both a self-report inventory and a semi-structured interview. One first administers a self-report inventory to alert oneself to the potential presence of particular personality disorders followed by a semi-structured interview to verify their presence. One need not then administer the entire interview as one can focus just on the two to three personality disorders that were elevated on the self-report inventory.
One potential concern with regard to this multimethod approach is that self-report inventories and semi-structured interviews both rely heavily on self-report and are therefore not entirely distinct methods of assessment. However, semi-structured interviews can include considerable input and interpretation by the interviewer. In any case, there are no other methodologies for the assessment of personality disorder that have the empirical support comparable to self-report inventories and semi-structured interviews.
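The two-stage procedure described above (screen with a self-report inventory, then interview only for the elevated scales) can be sketched as follows. This is a minimal illustration, not a published scoring rule: the function name, the scale labels, the score values, and the cutoff of 65 are all hypothetical, and an actual application would use the scoring conventions of whichever inventory is administered.

```python
from typing import Dict, List


def select_interview_modules(
    screen_scores: Dict[str, float],
    cutoff: float,
    max_modules: int = 3,
) -> List[str]:
    """Return the personality disorder scales to follow up by interview.

    Scales at or above the (hypothetical) cutoff are ranked from most to
    least elevated, and at most `max_modules` of them are retained, in
    keeping with focusing the interview on two to three disorders.
    """
    elevated = [
        (name, score) for name, score in screen_scores.items() if score >= cutoff
    ]
    elevated.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in elevated[:max_modules]]


# Hypothetical screening profile; the numbers are illustrative only.
example_scores = {
    "borderline": 78.0,
    "avoidant": 71.0,
    "dependent": 66.0,
    "narcissistic": 52.0,
    "schizoid": 48.0,
}

modules = select_interview_modules(example_scores, cutoff=65.0)
print(modules)  # ['borderline', 'avoidant', 'dependent']
```

Only the interview sections for the returned scales would then be administered, which is what keeps the interview time manageable.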
Semi-Structured Interviews and Self-Report Inventories
DSM-5-TR Section II Personality Disorders
There are currently five semi-structured interviews that assess the DSM-5-TR Section II personality disorders: (a) Diagnostic Interview for Personality Disorders (DIPD; Zanarini et al., 1987), (b) International Personality Disorder Examination (IPDE; Loranger, 1999), (c) Personality Disorder Interview-IV (PDI-IV; Widiger et al., 1995), (d) Structured Clinical Interview for DSM-IV Axis II Personality Disorders (SCID-II; First & Gibbon, 2004), and (e) Structured Interview for DSM-IV Personality Disorders (SIDP-IV; Pfohl et al., 1997).
Each of the semi-structured interviews has advantages and disadvantages relative to one another (Miller et al., 2012). The IPDE includes a brief screening questionnaire that can be used to avoid the need to administer the entire interview (Loranger, 1999), consistent with the multimethod approach recommended in this article. The SIDP-IV, SCID-II, and IPDE have more empirical support than either the DIPD or the PDI-IV. The PDI-IV has been used in the fewest studies. The manuals for the SIDP-IV and SCID-II are relatively limited in the amount of information provided for scoring, whereas the manual for the PDI-IV is the most thorough and detailed, providing the history, rationale, and major assessment issues for each of the DSM-5-TR Section II personality disorder diagnostic criteria.
There are many alternative self-report inventories to choose from. The most popular inventory in clinical practice is the Millon Clinical Multiaxial Inventory-IV (MCMI-IV; Millon et al., 2015). Its popularity is due in part to the presence of Theodore Millon on the Diagnostic and Statistical Manual of Mental Disorders (3rd ed.; DSM-III; APA, 1980) personality disorder work group and the release of the first edition coordinated with the publication of DSM-III. However, the DSM-III criterion sets were not well coordinated with the MCMI scales. The MCMI-IV is also a commercially published measure that would be expensive to include in research.
The most popular self-report measure in research is the freely available Personality Diagnostic Questionnaire-4 (PDQ-4; Hyler et al., 1988). However, its popularity is also due to the fact that it is a relatively brief measure, including only one to two items for each diagnostic criterion, which may not be providing sufficient fidelity. There are many additional alternatives. The Wisconsin Personality Disorder Inventory-IV (WISPI; Klein et al., 1993) assesses the personality disorders from the perspective of psychodynamic theory. The Five Factor Model Personality Disorder scales (FFMPD; Widiger et al., 2012) provide an assessment from the perspective of the Five Factor Model of general personality structure. There are also relatively brief screening measures for individual personality disorders (such as the McLean Screening Instrument for Borderline Personality Disorder).
Another approach is to first administer a broad-band self-report measure that includes the assessment of other disorders (e.g., mood, anxiety), along with personality disorders, such as the Minnesota Multiphasic Personality Inventory-2 (MMPI-2) or the Personality Assessment Inventory (PAI). The MMPI-2 instruments can be scored for the DSM-5-TR Section II personality disorders (e.g., Somwaru & Ben-Porath, 1995), although the PAI is confined to the borderline and antisocial personality disorders. A potential advantage of these broad-band measures is that many are already in use in clinical practice; a disadvantage is their substantial length.
DSM-5-TR Section III Personality Disorders
The above measures assess the DSM-5-TR Section II personality disorders, which are the officially recognized personality disorders of the APA and the personality disorders of primary interest to most clinicians. However, DSM-5-TR also includes Section III for “emerging measures and models” (APA, 2022, p. 837), which provides an Alternative Model for Personality Disorder (AMPD) consisting of a Level of Personality Functioning Scale (LPFS) and a five-domain dimensional trait model. The development of the AMPD is in recognition of the inadequate empirical support for the DSM-IV categorical syndromes (Clark, 2007; Widiger & Trull, 2007). Indeed, the future of personality disorder assessment may be concerned with the Section III personality disorders.
DSM-5-TR Section III AMPD retained six of the DSM-IV personality disorders (APA, 2022). The proposal is then conservative in that clinicians would still be assessing for six of the DSM-IV personality disorders. The innovation of Section III is arguably confined to replacing the specific and explicit criterion sets of DSM-IV with the components of the LPFS and maladaptive traits. Of interest for future research is whether the Section III criterion sets are assessed as reliably as the specific and explicit criterion sets of Section II. An additional concern is whether the LPFS and traits are sufficiently distinct from one another to warrant their joint inclusion (Widiger & Hines, 2022).
The dimensional trait model consists of five broad domains of negative affectivity, detachment, psychoticism, antagonism, and disinhibition, along with 25 underlying facets. The principal measure for the assessment of the trait model is provided by the Personality Inventory for DSM-5 (PID-5; Krueger et al., 2012). Additional self-report measures for the assessment of maladaptive trait models include the Computerized Adaptive Test-Personality Disorder Static Form (CAT-PD-SF; Wright & Simms, 2014) and the FFMPD scales, which can be scored for the maladaptive trait domains of the FFM (Widiger et al., 2012). The MMPI-2 and PAI may also be keyed to include assessments of the AMPD trait model.
The LPFS was derived from the psychodynamic theory that distortions in the sense of self and interpersonal relatedness are central to personality disorders (Sharp & Wall, 2021). The principal measure for the LPFS is provided by the Level of Personality Functioning Scale-Self Report (LPFS-SR; Morey, 2017). Additional measures include the DSM-5 Levels of Personality Functioning Questionnaire (DLOPFQ; Huprich et al., in press), the Level of Personality Functioning Scale-Brief Form (LPFS-BF; Hutsebaut et al., 2016), and the Anderson and Sellbom (2018) Criterion A scales. Deficits in the sense of self and interpersonal relatedness are also assessed by the General Assessment of Personality Disorders (GAPD; Livesley, 2006) and the Severity Indices of Personality Problems (SIPP-118; Verheul et al., 2008). Interview measures of the LPFS include the Structured Clinical Interview for the DSM-5 Alternative Model for Personality Disorders Module I (SCID-5-AMPD-I; Bender et al., 2018) and the Semi-Structured Interview for Personality Functioning (STiP-5.1; Hutsebaut et al., 2017).
ICD-11 Personality Disorders
A more radical shift than DSM-5-TR Section III is provided in ICD-11 (WHO, 2019). ICD-11 includes only one of the prior syndromes (borderline), placing more emphasis on the dimensional trait model for clinical description, consisting of the domains of negative affectivity, detachment, dissociality, disinhibition, and anankastia. The ICD-11 domains of negative affectivity, detachment, dissociality, and disinhibition align closely with (albeit are not entirely equivalent to) the DSM-5-TR domains of negative affectivity, detachment, antagonism, and disinhibition, respectively.
Bach et al. (2017) identified 16 of the 25 PID-5 scales that aligned conceptually with the domains of the ICD-11 trait model. A limitation of the Bach et al. (2017) measure, however, is that it cannot be used to study empirically the convergence of the DSM-5-TR and ICD-11 trait models given its use of the PID-5 items (Bach et al. [2017] excluded six PID-5 scales due to presumed differences in how the respective domains were defined in ICD-11). Independent measures of the ICD-11 trait model are provided by the Personality Inventory for ICD-11 (PiCD; Oltmanns & Widiger, 2018), the Preliminary Scales for ICD-11 (Clark et al., in press), and the Personality Assessment Questionnaire for ICD-11 (PAQ-11; Kim et al., 2021). There is not yet an interview for the assessment of the ICD-11 trait model, but Tyrer et al. (2019) recommend using the Personality Assessment Form, which was developed to assess the trait model of Tyrer from which the ICD-11 trait model was derived.
Validation of Assessment Instruments
There is extensive research concerning the convergent validity among the many Section II self-report measures. The convergent validity coefficients are typically large in effect size, with one exception provided by the assessment of obsessive-compulsive personality disorder (OCPD) by the MCMI-IV, which often obtains a negative correlation with other self-report measures (Miller et al., 2012). This finding is largely due to the inclusion of items assessing for adaptive conscientiousness within the MCMI-IV scale.
Thirty-one studies have reported the convergent validity of Section II self-report measures with semi-structured interviews (Miller et al., 2012). The results have typically yielded moderate effect sizes, lower than those obtained when the method is confined to self-report inventories, with a median value ranging from .30 for histrionic to .56 for avoidant. The one exception occurred again for the MCMI-IV assessment of OCPD, which routinely obtained weak to nonsignificant convergence with the structured interviews.
Meta-analyses of PID-5 research have indicated strong coverage of the DSM-5-TR Section II personality disorders with the Section III trait model, with the exception of OCPD (Rojas & Widiger, 2017; Watters et al., 2019). The initial version of the trait model had included a domain of compulsivity. Its eventual deletion is the likely explanation for the inadequate PID-5 coverage of OCPD. Both the CAT-PD-SF and FFMPD scales have demonstrated strong convergence with respective PID-5 scales (Crego & Widiger, 2016; Wright & Simms, 2014). The FFMPD provides better coverage of OCPD (Crego et al., 2015).
Concerns have also been raised with respect to the discriminant validity of the PID-5 assessment of the trait domains (Crego et al., 2015), which is somewhat surprising given its empirical construction via factor analysis. However, the factor analytic construction of the PID-5 did not include a consideration of the discriminant validity among the domain scales.
There is strong convergent validity among measures of the LPFS. However, a common finding is a lack of discriminant validity for the assessment of the four components (Sleep et al., 2019). Morey (2017), for example, reported no discernible differences in the relationship of the four subscales of the LPFS-SR with a variety of criterion measures. McCabe et al. (in press) reported that the subscales of four alternative measures of the LPFS correlated more highly with the other subscales within the same measure than with the measures of the same construct within a different measure (e.g., LPFS-SR Identity correlated more highly with the LPFS-SR Intimacy, Empathy, and Self-Directedness than with the identity scales from three other measures of the LPFS). Morey (2017), however, suggests that a lack of discriminant validity supports the validity of the LPFS-SR because the four subscales “are all considered to be indicators of a single, global dimension of personality dysfunction” (Morey, 2017, p. 1306).
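The pattern described above, in which subscales correlate more highly within a measure than with the same construct in another measure, is what would be expected if scores reflect a single general dimension plus measure-specific variance, with little construct-specific variance. The sketch below illustrates this with simulated data only; it does not reproduce any reported result. The measure labels "A" and "B", the variance components, and the sample size are assumptions chosen for illustration.

```python
import math
import random
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_subscales(n: int = 2000, seed: int = 0) -> Dict[str, List[float]]:
    """Simulate three subscale scores for n hypothetical respondents.

    Assumed model: each score = general dysfunction + variance specific to
    the measure it belongs to + random error. No construct-specific
    (identity vs. intimacy) variance is included.
    """
    rng = random.Random(seed)
    scores: Dict[str, List[float]] = {
        "A_identity": [], "A_intimacy": [], "B_identity": []
    }
    for _ in range(n):
        general = rng.gauss(0, 1)   # shared severity dimension
        method_a = rng.gauss(0, 1)  # variance specific to hypothetical measure A
        method_b = rng.gauss(0, 1)  # variance specific to hypothetical measure B
        scores["A_identity"].append(general + method_a + rng.gauss(0, 0.5))
        scores["A_intimacy"].append(general + method_a + rng.gauss(0, 0.5))
        scores["B_identity"].append(general + method_b + rng.gauss(0, 0.5))
    return scores


scores = simulate_subscales()
same_measure = pearson(scores["A_identity"], scores["A_intimacy"])
same_construct = pearson(scores["A_identity"], scores["B_identity"])
print(f"different constructs, same measure: r = {same_measure:.2f}")
print(f"same construct, different measures: r = {same_construct:.2f}")
```

Under these assumed variance components the population correlations are 2/2.25 (about .89) within a measure and 1/2.25 (about .44) across measures, so the within-measure correlation exceeds the same-construct correlation even though the subscales carry different names.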
A major distinction between the PiCD and the PID-5 assessments of the ICD-11 trait model is that the PiCD Anankastia and Disinhibition scales align on opposite poles of the same factor, consistent with how the trait model was defined by Mulder et al. (2016). PAQ-11 Anankastia (Kim et al., 2021) likewise correlates positively with FFM conscientiousness and negatively with Disinhibition, although the correlation was weak. When the DSM-5 trait model included compulsivity, it was also placed opposite to disinhibition (Skodol, 2012). The failure to obtain the bipolarity with the PID-5 probably reflects the relatively weak assessment of anankastia by the two PID-5 Rigid Perfectionism and Perseveration scales (Widiger & Crego, 2019).
The reproduction of the nomological network that should occur for a particular personality disorder provides a compelling validation of a respective measure. Miller and Lynam (2012) conducted a meta-analysis of 49 studies concerning the nomological network of the Psychopathic Personality Inventory (PPI; Lilienfeld & Widows, 2005). They reported a weak to poor relationship of the Fearless-Dominance factor of the PPI with central criterion variables, including (for example) criminal behavior. Similar results, though, would be obtained for the Emotional Stability factor of the Elemental Psychopathy Assessment (Lynam et al., 2011).
Issues for Further Research
There are a number of issues with regard to the assessment of personality disorder that should be addressed in further research. Three issues considered within this article are the impact of other disorders, temporal stability, and the development of cutoff points. Each will be discussed briefly in turn.
Impact of Other Disorders
There is a substantial body of research to indicate that the presence of mood, psychotic, and other disorders can artifactually inflate the scores on personality disorder scales (Miller et al., 2012). This can produce quite anomalous results. For example, some personality disorder scales, such as narcissistic and histrionic, will increase over time secondary to the treatment of a mood or anxiety disorder due to the inclusion of adaptive self-confidence, assertiveness, self-esteem, and gregariousness items. Semi-structured interviews can be more resilient to the distortion in self-image secondary to these other disorders, but there are studies that have reported comparable increases in personality disorder scales over the course of treatment of anxiety and mood disorders (e.g., Loranger et al., 1991). The apparent decrease in personality disorder scores during the course of treatment for another disorder (or even personality disorder treatment) may also be comparably artifactual. In sum, it is advisable not to administer a self-report inventory at the beginning of a treatment for a mood or psychotic disorder, but it is at the beginning of treatment that most personality disorder measures are administered.
Temporal Stability
Personality disorders have an onset in late childhood or early adulthood, typically predating the onset of other disorders and decreasing over time through adulthood (Cooper et al., 2014). However, self-report inventories typically ignore the requirement to document temporal stability since late childhood and, even if there is some attempt to do so, it is unlikely that the respondent adheres to this requirement. Semi-structured interviews make a more concerted and explicit effort to assess for temporal stability, but they vary considerably in the extent of that effort.
The least effective effort is provided by the DIPD (Zanarini et al., 1987), whose assessment is confined to the last 2 years. The DIPD was used in the widely published Collaborative Longitudinal Personality Disorders Study (CLPS; Gunderson et al., 2003). One of the more intriguing findings was the extent to which persons failed to maintain personality disorder symptomatology. For example, 23 of 160 persons (14%) diagnosed with borderline personality disorder (BPD) at the study’s baseline assessment met two or fewer of the nine diagnostic criteria 6 months later (Gunderson et al., 2003). It is surprising that so many persons who had met the diagnostic criteria for BPD since late childhood, and had continued to manifest these symptoms throughout their adult life, would suddenly change soon after the onset of the study. Few, if any, were in treatment for their personality disorder. The purportedly valid diagnoses include one person whose original symptoms were determined to be secondary to the use of a stimulant for weight reduction: “the most dramatic improvement following a treatment intervention occurred when a subject discontinued a psychostimulant she had used the year prior to baseline for purposes of weight loss . . . Discontinuation was followed by a dramatic reduction of her depression, panic, abandonment fears, and self-destructiveness” (Gunderson et al., 2003, p. 116).
Many of the other changes reflected the fact that, during the 2 years covered by the DIPD assessment, participants were in highly stressful but temporary life circumstances.
Cutoff Points
The semi-structured interviews provide an explicit assessment of the DSM-5-TR Section II personality disorder diagnostic criteria. As such, they have a straightforward basis for diagnosis, adhering to the diagnostic thresholds provided in DSM-5-TR. However, these diagnostic thresholds are themselves arbitrary, based on rational judgment rather than empirical research. The thresholds are typically one more than half of the diagnostic criteria.
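The threshold rule described above can be written out explicitly. The sketch below takes "one more than half" to mean a simple majority of the criteria, which is an interpretive assumption, and the text says only that thresholds "typically" follow this rule, so an individual criterion set may differ. The function names and the example ratings are hypothetical.

```python
from typing import List


def majority_threshold(n_criteria: int) -> int:
    """Smallest count that exceeds half of the criteria (assumed reading)."""
    return n_criteria // 2 + 1


def meets_threshold(criteria_met: List[bool]) -> bool:
    """True if the number of criteria rated present reaches the threshold."""
    return sum(criteria_met) >= majority_threshold(len(criteria_met))


# Hypothetical interview ratings for a nine-item criterion set.
ratings = [True, True, False, True, False, True, True, False, False]

print(majority_threshold(9))     # 5
print(sum(ratings))              # 5
print(meets_threshold(ratings))  # True
```

A single criterion rated differently would move this hypothetical case from five to four and below the threshold, which illustrates why the placement of the threshold matters.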
Some self-report inventories (e.g., FFMPD scales) have no cutoff points, making them problematic for clinical use. Those inventories whose items assess for the diagnostic criteria (e.g., PDQ-4) can be scored in reference to the DSM-5-TR diagnostic thresholds, but their cutoff points will have the same arbitrary limitation as the semi-structured interviews (as well as suffering from the additional problem that persons tend to endorse more items on a self-report inventory than during the course of a semi-structured interview). Other self-report inventories, such as the MMPI-2, base their cutoff points on a deviation from a normative distribution. However, this option is sorely problematic as it presumes an equal prevalence rate for each personality disorder. The MCMI-IV cutoff points are coordinated with the prevalence rates that occurred in its derivation study but are not revised when the prevalence rate is different from the derivation sample. This has resulted in a substantial overdiagnosis in less dysfunctional settings, such as university counseling centers.
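Both problems noted above can be illustrated numerically. The sketch below assumes normally distributed scale scores, a cutoff 1.5 standard deviations above the normative mean, and a scale with sensitivity and specificity of .80; all of these values are hypothetical and are not properties of any named inventory. The first function shows that a deviation-based cutoff flags the same proportion of respondents on every scale; the second applies Bayes' theorem to show how the proportion of flagged cases that are true cases falls as the prevalence rate falls.

```python
from statistics import NormalDist


def fraction_flagged(z_cutoff: float) -> float:
    """Proportion of a normal normative sample scoring above the cutoff."""
    return 1.0 - NormalDist().cdf(z_cutoff)


def positive_predictive_value(
    sensitivity: float, specificity: float, prevalence: float
) -> float:
    """Probability that a person above the cutoff truly has the disorder."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# The same cutoff flags about 6.7% on every scale, whatever the disorder's
# actual prevalence, which is the equal-prevalence assumption.
print(f"{fraction_flagged(1.5):.3f}")

# Hypothetical accuracy values applied at two assumed prevalence rates.
ppv_high = positive_predictive_value(0.80, 0.80, prevalence=0.30)
ppv_low = positive_predictive_value(0.80, 0.80, prevalence=0.05)
print(f"{ppv_high:.2f}")  # 0.63
print(f"{ppv_low:.2f}")   # 0.17
```

With these assumed values, most persons above the cutoff in the low-prevalence setting would not have the disorder, which is the overdiagnosis pattern described for settings that differ from the derivation sample.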
The optimal basis for the placement of a cutoff point should perhaps be that point on the scale that indicates the presence of a clinically significant impairment in social or occupational functioning or personal distress, consistent with the definition of a personality disorder in DSM-5-TR Section II. However, even during the course of construction of the DSM criterion sets, no self-report inventory or semi-structured interview has included a derivation of a cutoff point on this basis. The failure to do so likely reflects the difficulty in compiling an authoritative set of social and occupational impairments and then determining what level is indicative of a clinically significant impairment. The context of an individual and their circumstances should always be considered in combination with self-report or interview-rated data; test scores are only one piece of the diagnostic puzzle.
Conclusions
Recommended herein for an evidence-based assessment of personality disorder in clinical practice is for the clinician to first administer a self-report inventory to alert the clinician to maladaptive personality functioning that might not have otherwise been anticipated, followed by a semi-structured interview to assess systematically the respective diagnostic criteria of the personality disorder(s) that were elevated on the self-report inventory. The validity of this strategy, however, can be improved by addressing further issues, including the impact of other disorders on the assessment, documenting temporal stability, and establishing a compelling, empirical basis for cutoff points.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
