Abstract
Diagnostic issues have become increasingly critical for mental health services and research. Revisions of the American Psychiatric Association's [1] Diagnostic and Statistical Manual (the DSM) and the World Health Organization's [2] International Classification of Diseases (ICD) have brought new emphases on explicit diagnostic criteria. Increasing concerns for accountability in providing and evaluating services have mandated more explicit documentation of people's problems and of the appropriateness of interventions. Certain diagnoses, such as attention deficit hyperactivity disorder (ADHD), have become increasingly common. They have also spawned enormous controversy and have spread well beyond their original realms, as exemplified by the widespread application of ADHD diagnoses to adults. Whereas diagnosis was previously of only passing interest to mental health professionals and had little impact on their work, it is now one of the most publicized mental health topics.
Meanings of diagnosis
Confusion and controversy may arise from the multiple meanings of diagnosis, which are outlined in the following sections.
Diagnosis in the narrow sense: formal diagnosis
In its narrow meaning, diagnosis is ‘the medical term for classification’, according to Samuel Guze, a leading psychiatric diagnostician [3, p.53]. This is the meaning of diagnosis that is represented by the categories of nosologies such as the DSM and ICD. A diagnosis in the sense of assignment to nosological categories is a formal diagnosis.
Diagnosis in the broad sense: diagnostic formulations
In addition to referring to classification, diagnosis has a broader meaning which can be defined as ‘a statement or conclusion concerning the nature or cause of some phenomenon’ [4, p.313]. This is the meaning of diagnosis that is represented by diagnostic formulations, which are comprehensive integrations of data on which to base plans for services. Diagnostic formulations typically employ more diverse data about children, their families and other considerations than do formal diagnoses.
Diagnosis as data gathering: diagnostic processes
Formal diagnoses and diagnostic formulations depend on data that are gathered about cases. The gathering of data is also known as diagnosis, in the sense of diagnostic processes. Neither diagnostic classifications nor diagnostic formulations can be any better than the data on which they are based.
Distinguishing among the meanings of diagnosis
To avoid confusion among the multiple meanings of diagnosis, it is helpful to use other terms. In place of ‘diagnostic processes’, I will use assessment to refer to gathering data with which to identify the distinguishing features of each case. I will use the term taxonomy in reference to grouping cases according to their distinguishing features. And, I will use case formulation in reference to comprehensive summaries of cases.
Relations between assessment and taxonomy
If we view assessment as the identification of the distinguishing features of cases and taxonomy as the grouping of cases according to their distinguishing features, it is clear that assessment and taxonomy are interdependent in multiple ways. Assessment data are needed (i) to construct taxonomies that capture the important distinguishing features of individual cases, and (ii) to assign cases to the taxonomies that have been constructed.
Assessment is most meaningful when it is linked to taxonomies that (i) guide the selection of features to be assessed, and (ii) provide conceptual foci for assessment and its products.
To illustrate variations in how assessment and taxonomy are related to mental health services and research, I will outline two approaches. I will also outline ways in which the two approaches can jointly contribute to services and research. The first approach is embodied in the DSM paradigm, whereas the second is embodied in the empirically based paradigm for assessment and taxonomy of psychopathology.
The DSM paradigm
The DSM and other nosologies for psychopathology are constructed by committees of experts who pool their experience to formulate diagnostic concepts as a basis for diagnostic categories. The diagnostic categories are selected and refined through a process of negotiation and review. Diagnostic criteria are then generated for determining what features are required for cases to qualify for each diagnosis. DSM-III [1] introduced explicit diagnostic criteria and decision rules for determining who qualifies for each diagnosis. Field trials were used to evaluate some of the draft DSM-III-R and DSM-IV criteria for childhood disorders [1].
The ‘top-down’ approach
The DSM approach can be described as working from the ‘top down’, as illustrated in Figure 1. The top-down approach starts with diagnostic concepts as a basis for categories of disorders. Experts then select particular symptoms to define the disorders. Examples are ‘doesn't listen’ and ‘fidgets’, listed for ADHD in Figure 1. The cut-off points for the number of required symptoms, as well as the criteria for age of onset and duration, are the same for both genders and for all ages.
Figure 1. The ‘top-down’ approach to assessment and taxonomy of psychopathology. (Copyright T.M. Achenbach. Reproduced with permission.)
As an example, for any individual to meet DSM-IV criteria for ADHD, six or more symptoms of inattention or hyperactivity-impulsivity must ‘have persisted for at least 6 months to a degree that is maladaptive’ [1]. The ADHD criteria also specify that ‘some hyperactive-impulsive or inattentive symptoms that caused impairment were present before age 7 years’, ‘some impairment from the symptoms is present in two or more settings (e.g. at school [or work] and at home)’, and there must be ‘clear evidence of clinically significant impairment in social, academic, or occupational functioning’ [1]. To qualify for a diagnosis of ADHD, a 4-year-old boy and a 17-year-old girl must both meet the same criteria. The inclusion of impairment in work settings and occupational functioning allows for diagnosing ADHD in adults, but the diagnostic criteria are otherwise the same for adults and children.
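The fixed character of these criteria can be made concrete with a minimal sketch. It assumes that the symptom requirement is read as six or more symptoms in either the inattention group or the hyperactivity-impulsivity group, and that a diagnostician has already judged each feature as present or absent. The class and field names are hypothetical and are not part of the DSM.

```python
from dataclasses import dataclass


@dataclass
class AdhdEvidence:
    """Yes-or-no judgments already made by a diagnostician (hypothetical field names)."""
    inattention_symptoms: int            # symptoms judged present for at least 6 months
    hyperactive_impulsive_symptoms: int  # symptoms judged present for at least 6 months
    impairing_symptoms_before_age_7: bool
    settings_with_impairment: int        # e.g. school (or work) and home
    clinically_significant_impairment: bool


def meets_adhd_criteria(e: AdhdEvidence) -> bool:
    """Apply one fixed rule to every case; age and gender are not inputs."""
    enough_symptoms = (
        e.inattention_symptoms >= 6 or e.hyperactive_impulsive_symptoms >= 6
    )
    return (
        enough_symptoms
        and e.impairing_symptoms_before_age_7
        and e.settings_with_impairment >= 2
        and e.clinically_significant_impairment
    )


# Invented cases: the same rule is applied to a 4-year-old boy and a 17-year-old girl.
boy_aged_4 = AdhdEvidence(6, 2, True, 2, True)
girl_aged_17 = AdhdEvidence(5, 5, True, 2, True)

print(meets_adhd_criteria(boy_aged_4))    # six inattention symptoms
print(meets_adhd_criteria(girl_aged_17))  # ten symptoms, but fewer than six in either group
```

Because the function takes no age or gender argument, the sketch mirrors the point that the same thresholds apply to every case. How each input is established is left open, just as it is in the DSM.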
The DSM criteria and decision rules are quite explicit. However, the DSM does not specify assessment procedures for determining which symptoms are present and whether the symptoms cause impairment. Diagnosticians may therefore vary greatly in how they determine whether each symptom is present and whether it causes ‘clinically significant’ impairment in particular settings.
DSM assessment procedures
Respondent-based interviews
One approach to assessment of DSM criteria has been to develop structured interviews that ask respondents whether each symptom is present. These interviews are called ‘respondent-based’ because the respondents' affirmative and negative replies to questions about symptoms are the basis for determining whether diagnostic criteria are met. Adult respondent-based interviews, such as the Diagnostic Interview Schedule (DIS) [5], are administered primarily to the adults who are being assessed. Similar highly structured respondent-based interviews have been developed for children, including the Diagnostic Interview Schedule for Children (DISC) [6] and the Diagnostic Interview for Children and Adolescents (DICA) [7].
When such interviews were first tested, it was discovered that children's reports of their symptoms did not agree with clinical evaluations or with other data [8]. It was also discovered that children's reports had low test–retest reliability, especially owing to large declines in the number of symptoms reported from the first administration to the second administration of the same interview 1–2 weeks later [9]. This test–retest attenuation effect has also been found in adult interviews, but is typically smaller than in child interviews [10]. Another problem with the structured respondent-based interviews for children is that even normal children fail to understand large proportions of the questions, especially questions that concern the timing of symptoms [11].
Because it became clear that children's self-reports of symptoms do not provide an adequate basis for DSM diagnoses, structured interviews were developed to obtain parents' reports of their children's symptoms. It was then found that parents' reports of DSM symptoms often failed to agree with their children's reports [6, 8]. Low cross-informant agreement is not restricted to interviews, as it has been found in meta-analyses of many other assessment procedures as well [12]. However, low cross-informant agreement is especially challenging for assessment and taxonomy that depend on yes-or-no decision rules about the presence of each criterial feature, such as each symptom, the number of symptoms, age of onset, duration and impairment caused by symptoms.
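To illustrate what low agreement on yes-or-no symptom reports can look like, the following sketch computes Cohen's kappa, a common chance-corrected agreement index. The studies cited above are not claimed to have used kappa; it is introduced here only for illustration. The parent and child reports are invented.

```python
def cohen_kappa(reports_a, reports_b):
    """Chance-corrected agreement between two lists of yes/no (True/False) reports."""
    if len(reports_a) != len(reports_b) or not reports_a:
        raise ValueError("both informants must report on the same non-empty set of symptoms")
    n = len(reports_a)
    observed = sum(a == b for a, b in zip(reports_a, reports_b)) / n
    p_yes_a = sum(reports_a) / n
    p_yes_b = sum(reports_b) / n
    # Agreement expected if the two informants answered independently.
    expected = p_yes_a * p_yes_b + (1 - p_yes_a) * (1 - p_yes_b)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is perfect")
    return (observed - expected) / (1 - expected)


# Invented reports on eight symptoms (True = symptom reported as present).
parent = [True, True, False, False, True, False, False, False]
child = [True, False, False, True, False, False, False, True]

kappa = cohen_kappa(parent, child)
print(round(kappa, 3))
```

In this invented example the two informants agree on half of the symptoms, which is no better than would be expected by chance, so kappa is close to zero.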
Interviewer-based interviews
As an alternative to structured interviews that basically record the respondents' yes-or-no answers to diagnostic questions, an ‘interviewer-based’ protocol was developed in the UK for making DSM diagnoses [13]. Called the Child and Adolescent Psychiatric Assessment (CAPA), it resembles respondent-based structured interviews in using precisely specified questions to obtain reports of DSM symptoms. However, the CAPA is ‘interviewer-based’ in the sense that it requires the interviewer to ensure that subjects (i) understand the questions; (ii) provide clear information relevant to each symptom; and (iii) have each symptom at a level of severity specified in a 300-page glossary. The CAPA requires much more interviewer training and sophistication than ‘respondent-based’ interviews, such as the DISC, which tend to take respondents' answers at face value. However, like the DISC, the CAPA appears to be vulnerable to large test–retest attenuation effects and low agreement with data from other sources [14].
Rating forms for DSM symptoms
Another approach to assessing children for DSM diagnoses is to have raters fill out forms on which they make ratings of DSM symptoms. An example is the ADHD Rating Scale [15]. Parent and teacher versions request raters to score each of the 18 DSM ADHD symptoms on a four-point scale defined as: 0, never or rarely; 1, sometimes; 2, often; and 3, very often. These ratings are summed to yield a total score for ADHD. Cut-off points are provided on normative distributions of scores for boys and girls of different ages.
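The scoring just described can be sketched as follows. The four-point scale and the count of 18 symptoms come from the description above. The cut-off values and the age band are hypothetical placeholders, not published norms, and the function names are illustrative only.

```python
RATING_LABELS = {0: "never or rarely", 1: "sometimes", 2: "often", 3: "very often"}

# Hypothetical cut-offs keyed by (gender, age band); NOT the published norms.
HYPOTHETICAL_CUTOFFS = {
    ("boy", "5-7"): 30,
    ("girl", "5-7"): 24,
}


def total_score(ratings):
    """Sum the 0-3 ratings of the 18 DSM ADHD symptoms."""
    if len(ratings) != 18:
        raise ValueError("expected one rating for each of the 18 symptoms")
    if any(r not in RATING_LABELS for r in ratings):
        raise ValueError("each rating must be 0, 1, 2 or 3")
    return sum(ratings)


def exceeds_cutoff(ratings, gender, age_band):
    """Compare the total with the cut-off for the child's gender and age band."""
    return total_score(ratings) >= HYPOTHETICAL_CUTOFFS[(gender, age_band)]


# Invented ratings: nine symptoms rated 'often' and nine rated 'sometimes'.
ratings = [2] * 9 + [1] * 9

print(total_score(ratings))
print(exceeds_cutoff(ratings, "boy", "5-7"))
print(exceeds_cutoff(ratings, "girl", "5-7"))
```

With these placeholder cut-offs, the same total is above the cut-off for a girl but not for a boy. This is how gender-based norms can lead to different conclusions from identical ratings.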
DSM rating scales can capture possible variations in the degree to which children manifest each symptom, as seen by relevant informants, such as parents and teachers. By providing age- and gender-based norms, they also take account of developmental and gender variations. However, no single cut-off point on the ADHD scale scores has been found to agree well with DSM diagnoses of ADHD [15].
In summary, the DSM paradigm starts with concepts of diagnostic categories and then defines criteria for identifying individuals who qualify for each category. Each criterial feature must be judged as present or absent. If all the requisite features are judged to be present, then the disorder is concluded to be present. Various approaches have been taken to standardizing assessment procedures for making DSM diagnoses. To be consistent with the DSM paradigm, the DSM-based assessment procedures are required to yield yes-or-no decisions about each criterial feature and diagnosis. This requirement makes it difficult to deal with methodological challenges such as test–retest attenuation and discrepancies among data from different sources, as well as developmental and gender variations in the base rates and clinical significance of symptoms.
The empirically based paradigm
The DSM paradigm represents the prevailing approach to official nosologies. Such nosologies must cover diverse conditions and must serve many masters. Because much remains to be learned and many different views are relevant, diagnostic categories and criteria inevitably involve numerous compromises. However, official nosologies such as the DSM tend to dictate thought and practice. When the diagnostic categories and criteria of such nosologies dominate mental health literature, training, research and billing, it may be difficult to view psychopathology from other perspectives. Nevertheless, to advance knowledge and services, we need to consider multiple approaches to assessing and conceptualizing psychopathology.
The empirically based paradigm has been developed to derive taxonomic constructs of psychopathology from assessment data on large samples of people. After taxonomic constructs have been derived in this way, they provide foci for assessment of new cases. The derivation of taxonomic constructs from assessment data and the use of such constructs to guide subsequent assessment promote a continuing interplay between assessment and taxonomy, as outlined in the following sections.
The ‘bottom-up’ approach
As illustrated in Figure 2, the empirically based paradigm starts with data on problems such as ‘can't concentrate’ and ‘can't sit still’. Many of these problems have counterparts among DSM criteria. However, rather than being selected to define predetermined diagnostic categories, the problem items are selected to span a broad spectrum of maladaptive functioning not restricted to predetermined categories.
Figure 2. The ‘bottom-up’ approach to assessment and taxonomy of psychopathology. (Copyright T.M. Achenbach. Reproduced with permission.)
Assessment instruments
The problem items are incorporated into assessment instruments completed by various informants on the basis of their own knowledge of the subjects' functioning. For ages 1½–18 years, assessment instruments are available for completion by parents, teachers, daycare providers, classroom observers, clinical interviewers and psychological examiners, as well as for adolescents to complete about themselves. For ages 18–90, assessment instruments are available for obtaining self-reports and reports by significant others, such as family members, conjugal partners, and friends [15].
The assessment instruments request informants to rate problem items on scales such as 0, not true; 1, somewhat or sometimes true; and 2, very true or often true, based on a particular rating period, such as 2 months. Instruments that obtain ratings of specific samples of behaviour, such as forms completed by classroom observers, clinical interviewers and psychological examiners, are rated on four-point scales that assess very slight and ambiguous manifestations of problems, in addition to clear manifestations of problems, during the observation period.
Derivation of syndromes
To identify sets of problems that tend to occur together, ratings of large samples of individuals by different kinds of informants are subjected to multivariate statistical analyses. The sets of co-occurring problems thus identified are called syndromes in the descriptive sense of things that are found to go together. The syndromes derived from ratings by particular types of informants, such as parents, serve as operational definitions of taxonomic constructs. The constructs represent patterns of functioning, such as attention problems and aggressive behaviour, that may be measured somewhat differently by ratings obtained from different kinds of informants. The reasons why certain problems tend to co-occur may include genetic influences, physical abnormalities, learning, stressful experiences and other factors. Some of the syndromes may reflect disorders, while others reflect traits, reactions to stress or situationally specific modes of adaptation.
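The logic of finding problems that go together can be illustrated with a deliberately simplified sketch. It is not the multivariate analysis used to derive the actual syndromes. Instead, it computes Pearson correlations between item ratings and groups items whose correlations with one another exceed a hypothetical threshold. The ratings are invented, and the function names are illustrative only.

```python
from math import sqrt

# Invented 0-1-2 ratings of eight children on four problem items.
RATINGS = {
    "can't concentrate": [2, 2, 1, 0, 0, 1, 2, 0],
    "can't sit still":   [2, 1, 1, 0, 0, 1, 2, 0],
    "bullies":           [0, 2, 0, 2, 0, 2, 0, 2],
    "fights":            [0, 2, 0, 2, 1, 2, 0, 1],
}


def pearson(x, y):
    """Pearson correlation between two equally long lists of ratings."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def group_items(ratings, threshold=0.6):
    """Greedy grouping: an item joins the first group in which it correlates
    at or above the (hypothetical) threshold with every current member."""
    groups = []
    for item, scores in ratings.items():
        for group in groups:
            if all(pearson(scores, ratings[member]) >= threshold for member in group):
                group.append(item)
                break
        else:
            groups.append([item])
    return groups


syndromes = group_items(RATINGS)
print(syndromes)
```

In these invented data, the two attention items fall into one group and the two aggression items into another. Analyses of real data involve many more items, far larger samples and formal multivariate methods, but the underlying idea of letting co-occurrence in the data define the groupings is the same.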
Some empirically based syndromes have clear counterparts among nosological categories. For example, statistical analyses of problem ratings have yielded a syndrome designated as attention problems that is analogous to the DSM ADHD diagnostic category and that correlates significantly with DSM diagnoses. Statistical analyses of the attention problems syndrome derived from teachers' ratings have yielded subgroups of problems analogous to the DSM-IV hyperactive-impulsive and inattentive types of ADHD [16]. The analyses that yielded the subgroups of attention problems also revealed that the following problems were strongly associated with both subgroups of attention problems: ‘can't concentrate, can't pay attention for long’; ‘difficulty following directions’; ‘messy work’; and ‘inattentive, easily distracted’. The finding that these four items were strongly associated with both the hyperactive-impulsive and inattentive patterns indicated that these kinds of problems underlie and link patterns of hyperactivity-impulsivity and inattention.
In addition to syndromes that resemble DSM categories, the empirically based approach has also yielded (i) syndromes that do not have counterparts among the DSM categories and (ii) syndromes of problems that are empirically found to be separate but that DSM combines. For example, Figure 2 refers to a syndrome designated as aggressive behaviour that includes ‘bullies’ and ‘fights’ as well as other overtly aggressive behaviours (not shown). Figure 2 also refers to a syndrome designated as delinquent behaviour that includes ‘lies’ and ‘steals’, as well as other unaggressive violations of social mores (not shown). Aggressive and delinquent behaviours have been found to form separate syndromes in many statistical analyses of children's behaviour problems [17]. However, both kinds of problems are combined in the DSM-IV conduct disorder (CD) category. Thus, for example, one child may qualify for CD on the basis of three overtly aggressive behaviours, such as bullying, fighting and cruelty. A second child may qualify for a diagnosis of CD on the basis of three unaggressive behaviours, such as lying, stealing and running away from home. A third child may qualify for a diagnosis of CD on the basis of both aggressive and unaggressive behaviour problems. The empirically based approach, by contrast, has shown that aggressive and unaggressive behaviour problems form two distinct syndromes.
Profiles for displaying empirically based findings
To help users quickly and easily see the results of empirically based assessment in terms of both the scores for individual items and scores for syndromes, the results are displayed on profiles, as illustrated in Figure 3. The computer-scored profile shown in Figure 3 is scored from the Child Behaviour Checklist for Ages 1½–5 (CBCL/1½–5) [18] completed for 5-year-old Alex by his mother. Hand-scored profiles are also available.
Figure 3. Profile of empirically based syndromes scored for 5-year-old Alex from CBCL/1½–5 completed by his mother. B, borderline clinical range; C, clinical range. (Copyright T.M. Achenbach. Reproduced with permission.)
By looking at Figure 3, you can see the following seven syndrome scales: emotionally reactive, anxious/depressed, somatic complaints, withdrawn, sleep problems, attention problems, and aggressive behaviour. These syndrome scales were derived from factor analyses of CBCL/1½–5 forms completed for 1728 children by their parents or parent-surrogates. To identify syndromes that are evident outside the family context, the analyses were coordinated with analyses of the Caregiver-Teacher Report Form (C-TRF) completed for 1½–5-year-old children by daycare providers and preschool teachers. All the CBCL/1½–5 syndromes except sleep problems were found to have counterparts in analyses of the C-TRF. The C-TRF is scored on a profile that parallels the CBCL/1½–5 profile shown in Figure 3, except for the sleep problems syndrome and some other problems that are specific to family versus daycare and preschool contexts.
As you can see in Figure 3, the problem items of each syndrome scale are listed in abbreviated form. For example, on the left side of the profile, the first item in the emotionally reactive syndrome is ‘21. Dist Change’. On the CBCL/1½–5, this is item 21, whose full wording is ‘disturbed by any change in routine’. The number 1 to the left of item 21 in Figure 3 indicates that Alex's mother scored this item 1, indicating that it was somewhat or sometimes true of Alex. The total score for the emotionally reactive syndrome is obtained by summing the 0-1-2 scores for the items of the syndrome. The computer-scoring program automatically sums the item scores, prints the location of the syndrome scores in the graphic display in relation to scores for normative samples, and prints the percentile of the syndrome score based on normative samples. (The same information is displayed somewhat differently on hand-scored versions of the profiles.)
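The two computations described here, summing item scores and locating the sum in a normative distribution, can be sketched as follows. Only the score of 1 on item 21 comes from the text. The other item labels, their scores and the normative sample are invented for illustration, and the percentile definition (percentage of normative scores at or below the obtained score) is an assumption rather than the scoring program's documented method.

```python
def syndrome_score(item_scores):
    """Sum the 0-1-2 scores of the items that make up one syndrome."""
    for label, score in item_scores.items():
        if score not in (0, 1, 2):
            raise ValueError(f"item {label!r} must be scored 0, 1 or 2")
    return sum(item_scores.values())


def percentile(score, normative_scores):
    """Percentage of normative scores at or below the obtained score (assumed definition)."""
    at_or_below = sum(1 for s in normative_scores if s <= score)
    return 100.0 * at_or_below / len(normative_scores)


# Item 21 is scored 1, as in the text; the remaining entries are placeholders.
emotionally_reactive_items = {
    "21. Dist Change": 1,
    "hypothetical item A": 2,
    "hypothetical item B": 0,
    "hypothetical item C": 2,
}

# Invented normative sample of syndrome scores.
normative_sample = [0, 0, 1, 1, 2, 2, 3, 4, 5, 7]

total = syndrome_score(emotionally_reactive_items)
pct = percentile(total, normative_sample)
print(total, pct)
```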
The broken lines printed across the profile in Figure 3 demarcate a borderline clinical range. Scores below the bottom broken line are in the normal range, whereas scores above the top broken line are in the clinical range. By looking at the profile in Figure 3, you can quickly see that Alex obtained scores in the clinical range (above the top broken line) on the emotionally reactive, sleep problems, and attention problems syndromes. He obtained scores in the borderline range (between the broken lines) on the anxious/depressed and aggressive behaviour syndromes. And he obtained scores in the normal range (below the bottom broken line) on the somatic complaints and withdrawn syndromes.
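Reading a profile in this way amounts to comparing each syndrome score with two cut-off points. The sketch below assumes that scores are expressed as standard scores and that a score falling exactly on a broken line belongs to the range above it. The cut-off values 65 and 70 and Alex's numerical scores are placeholders chosen so that the output matches the ranges described above; they are not taken from Figure 3.

```python
BORDERLINE_CUTOFF = 65  # placeholder for the bottom broken line
CLINICAL_CUTOFF = 70    # placeholder for the top broken line


def classify(score, borderline_cutoff=BORDERLINE_CUTOFF, clinical_cutoff=CLINICAL_CUTOFF):
    """Assign a syndrome score to the normal, borderline or clinical range."""
    if score >= clinical_cutoff:
        return "clinical"
    if score >= borderline_cutoff:
        return "borderline"
    return "normal"


# Invented scores that reproduce the ranges reported for Alex.
alex_scores = {
    "emotionally reactive": 72,
    "anxious/depressed": 66,
    "somatic complaints": 55,
    "withdrawn": 58,
    "sleep problems": 74,
    "attention problems": 71,
    "aggressive behaviour": 67,
}

alex_ranges = {name: classify(score) for name, score in alex_scores.items()}
for name, label in alex_ranges.items():
    print(f"{name}: {label}")
```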
Cross-informant comparisons
Because children's functioning often varies from one context and interaction partner to another, no single informant can serve as a ‘gold standard’. To make use of the multiple perspectives that are needed for comprehensive assessment, the empirically based approach obtains ratings and other data from multiple informants. Each informant's ratings are displayed on profiles like the one shown in Figure 3. For example, comprehensive assessment of 5-year-old Alex could include a CBCL/1½–5 completed by his father and one completed by his grandmother, as well as by his mother. It could also include a C-TRF completed by his preschool teacher and one completed by his daycare provider. Using either hand-scored or computer-scored profiles, you can compare the results from all the informants to identify particular problems and syndromes on which the various informants agree or disagree. The computer software facilitates more detailed comparisons by printing side-by-side comparisons of scores on each problem item and each scale, scored from up to eight forms per child. This enables you to quickly see which problems are reported by all informants compared with those that are reported by only one informant or a particular subset of informants.
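A side-by-side comparison of this kind can be sketched in a few lines. The sketch is not the published software. The informants' ratings are invented, the function name is hypothetical, and a problem is treated as ‘reported’ when it is scored 1 or 2, which is an assumption made for illustration.

```python
MAX_FORMS = 8  # the text mentions comparisons of up to eight forms per child


def compare_informants(forms):
    """Return (problems reported by all informants, problems reported by only one)."""
    if not 1 <= len(forms) <= MAX_FORMS:
        raise ValueError("expected between one and eight forms")
    # For each informant, collect the items scored above 0.
    reported = {
        informant: {item for item, score in scores.items() if score > 0}
        for informant, scores in forms.items()
    }
    every_item = set().union(*reported.values())
    by_all = set.intersection(*reported.values())
    by_one = {
        item for item in every_item
        if sum(item in items for items in reported.values()) == 1
    }
    return by_all, by_one


# Invented 0-1-2 ratings from three informants.
forms = {
    "mother":  {"can't concentrate": 2, "can't sit still": 1, "bullies": 0, "fights": 0},
    "father":  {"can't concentrate": 1, "can't sit still": 0, "bullies": 0, "fights": 0},
    "teacher": {"can't concentrate": 2, "can't sit still": 2, "bullies": 1, "fights": 0},
}

reported_by_all, reported_by_one = compare_informants(forms)
print("All informants:", sorted(reported_by_all))
print("One informant only:", sorted(reported_by_one))
```

In this invented example, concentration problems are reported by every informant, whereas bullying is reported only by the teacher, which may point to a problem that is specific to the preschool context.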
DSM-oriented scales
Whereas the DSM paradigm starts at the top with concepts of diagnostic categories, the empirically based paradigm starts at the bottom with data from which syndromes are derived. DSM diagnoses involve categorical yes-or-no decisions, whereas the empirically based syndromes are scored quantitatively. However, categorical decisions can also be based on quantitative scores by using the cut-off points for normal, borderline and clinical ranges. The borderline range alerts users to cases that may not be clearly normal or clearly deviant but that should be evaluated further on the basis of additional data, either now or after enough time has elapsed to determine whether a child is getting worse, better or not changing. Users who require dichotomous decisions can include borderline cases in the clinical category.
Despite their differences, there are important points of contact between the DSM and empirically based paradigms. One point of contact is that they both assess taxonomic constructs according to lists of explicitly stated problems, such as attention problems, fighting and lying.
A second point of contact is that the symptom criteria for several DSM categories resemble the problems that comprise some of the empirically based syndromes. A related point of contact is that several studies have shown significant associations between DSM diagnoses and scores on empirically based syndromes [19–22].
To facilitate better coordination of DSM-based and empirically based assessment and taxonomy, the 21st century versions of the empirically based assessment instruments include profiles for scoring DSM-oriented scales from the same set of problem items as the empirically based syndromes [18]. The DSM-oriented scales were constructed by having highly experienced psychiatrists and psychologists from numerous cultures rate the degree to which empirically based problem items are consistent with particular DSM-IV diagnoses. Items that a substantial majority rated as very consistent with particular DSM diagnostic categories were used to construct scales analogous to those for the empirically based syndromes.
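The selection step can be sketched as a simple majority rule. The text does not state the exact rating scale or the proportion that counted as a ‘substantial majority’, so the three-point scale (0, not consistent; 1, somewhat consistent; 2, very consistent), the threshold of 0.6 and the experts' ratings below are all hypothetical.

```python
VERY_CONSISTENT = 2            # assumed top category of the experts' rating scale
HYPOTHETICAL_THRESHOLD = 0.6   # assumed meaning of 'substantial majority'


def select_items(expert_ratings, threshold=HYPOTHETICAL_THRESHOLD):
    """Keep items that a sufficient proportion of experts rated as very consistent."""
    selected = []
    for item, ratings in expert_ratings.items():
        proportion = sum(r == VERY_CONSISTENT for r in ratings) / len(ratings)
        if proportion >= threshold:
            selected.append(item)
    return selected


# Invented ratings by five experts of how consistent each item is with ADHD.
adhd_ratings = {
    "can't concentrate": [2, 2, 2, 1, 2],
    "can't sit still":   [2, 2, 2, 2, 2],
    "bullies":           [0, 0, 1, 0, 0],
    "fights":            [2, 1, 0, 1, 0],
}

adhd_scale_items = select_items(adhd_ratings)
print(adhd_scale_items)
```

Once the items of a scale have been selected in this way, the scale is scored like a syndrome, by summing the 0-1-2 ratings of its items.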
Profiles for displaying DSM-oriented scales
Figure 4 illustrates a profile of DSM-oriented scales scored from the C-TRF completed by a preschool teacher for 5-year-old Alex, whose profile of empirically based CBCL syndrome scales was shown in Figure 3. By looking at Figure 4, you can see that each DSM-oriented scale is scored quantitatively by summing the 0-1-2 scores of the items that comprise the scale. Like the profiles for empirically based syndromes, the profiles for DSM-oriented scales display two broken lines that demarcate a borderline clinical range. As you can see from Figure 4, Alex's scores on the Attention Deficit/Hyperactivity Problems and Oppositional Defiant Problems scales were in the clinical range, above the top broken line. His scores on the Affective Problems and Anxiety Problems scales were in the borderline range, between the two broken lines. And his score on the Pervasive Developmental Problems scale was in the normal range, below the bottom broken line. To take account of significant differences that were found between DSM-oriented scale scores for boys and girls in large normative samples, the cut-off points, percentiles and standard scores (T-scores) differ for the two genders.
Figure 4. Profile of DSM-oriented scales scored for 5-year-old Alex from C-TRF completed by his teacher. (Copyright T.M. Achenbach. Reproduced with permission.)
Because empirically based and DSM-oriented scales are both scored from the same assessment instruments, users can quickly and efficiently evaluate individuals in terms of both kinds of scales. In addition, because the computer software prints side-by-side comparisons between empirically based syndrome scores and also between DSM-oriented scale scores for up to eight assessment forms per subject, users can quickly identify syndromes and DSM-oriented scales that show cross-informant consistency or inconsistency. However, users should note that high scale scores are not necessarily equivalent to DSM diagnoses. Instead, to make DSM diagnoses, users should consult the DSM to see whether the DSM criteria are met for particular diagnoses.
Implications for clinical services and research
To advance services and research, we need to make optimal use of assessment data (i) to identify the distinguishing features of each individual; (ii) to link individual patterns of functioning with taxonomic constructs that can help us apply previously accumulated knowledge to each case; and (iii) to make accurate case formulations. Nosological approaches work from the top down by having experts formulate diagnostic categories and criteria that are then used to classify individual cases. Empirically based approaches work from the bottom up by deriving syndromes from data on large samples of subjects rated by different kinds of informants. The nosological and empirically based approaches are not mutually incompatible. Instead, the time may be especially ripe for integrating concepts, methods and findings from both approaches. A possible route toward integration was outlined in terms of DSM-oriented scales that are scored from the same pool of assessment items as are used to derive empirically based syndromes. The DSM-oriented scales are scored quantitatively and normed by age, gender and type of informant. If desired, they can be used to make categorical decisions by employing clinical cut-off points on the distributions of scale scores.
