Abstract

A majority of Canadian mental health care professionals rely on the Diagnostic and Statistical Manual of Mental Disorders (DSM) to inform their diagnostic formulations. The American nosology is taught across Canada in undergraduate and graduate programs, including in residencies in psychiatry and clinical psychology. The DSM rose to prominence in Canada and elsewhere with the introduction of the DSM-III, 1 which offered the first glimpse of a scientifically informed classification and provided clinical researchers with a reliable way to form homogeneous participant groups. However, the DSM has become so entrenched in our way of conceptualising psychopathology that despite well-documented concerns about the reliability, validity, and safety of its latest revision, 2 Canadian clinicians appear to accept without question that it is the only viable system of classification for the purposes of clinical care as well as medical and allied health care professional education. Canadians actually have an excellent alternative, the World Health Organization’s (WHO’s) International Classification of Diseases and Related Health Problems (ICD), 3 which is currently being revised and comprehensively field-tested in multiple languages for an anticipated approval by the World Health Assembly in 2018.
The purpose of the present perspective is to highlight several important distinctions between the DSM-5 and the forthcoming ICD-11 to argue for adoption of the ICD in Canada for teaching, training, clinical practice, and research. The choice of a classification system has important implications for patient care, health policy, and other civil and forensic decisions. Unfortunately, the current state of knowledge about mental disorders does not permit a fully scientifically validated or etiologically grounded approach to classification. Until such time as we are able to reliably categorise disorders according to, for example, specific biomarkers, psychiatric diagnosis must rely on descriptive phenomenological nosologies such as the DSM-5 and ICD-11 to accomplish the pragmatic tasks of compiling health statistics, making treatment decisions, communicating effectively, and allocating resources. We argue that Canadians should be using and teaching the ICD-11, particularly in light of emerging evidence that it is a valid, reliable, safe, and clinically useful classification. Importantly, the ICD, unlike the DSM, is a nosology conceived and developed for the betterment of public health that is available to users at minimal cost.
Background on the ICD
The WHO is the international authority for directing and coordinating health, including mental health, within the United Nations. A WHO constitutional responsibility is publication of classifications used by its 194 member states, including Canada, as a framework for health information, policy, funding, and reporting. Last published in 1992, the ICD ensures a shared understanding of morbidity and mortality case definitions. In Canada, use of the ICD for morbidity statistics is supported by the Canadian Institute for Health Information (CIHI), a pan-national organisation with a mandate to collect and disseminate standardised, comparable health system information. 4 CIHI developed and oversaw the implementation of a Canadian-specific clinical modification of the ICD-10 (ICD-10-CA). The WHO hopes that the conceptual architecture and functionality of the ICD-11 will remove the need for national modifications of the classification in the future. Although CIHI is currently evaluating the ICD-11, at the time of writing, it is not known if or when it will be adopted in Canada. In 2015, the United States implemented a country-specific adaptation of the ICD-10 for health statistics reporting, thereby requiring clinicians and health systems to use ICD codes. In anticipation of this policy change, the American Psychiatric Association (APA) included ICD-10 equivalent codes for each disorder category in their classification. Thus, the ICD, and not the DSM, is the current standard for health information coding and reporting in Canada and the United States. Notwithstanding the possibility of a ‘cross-walk’ between ICD and DSM at the coding level, case definitions between the classifications are considerably different. These differences have real-world implications such as whether a patient is deemed to have a mental disorder and, if so, which one. In other words, case definitions affect treatment decisions and ultimately efficient stewardship of precious health sector resources.
There are 4 separate versions of the ICD to address the different needs for health classifications: a statistical version for health information coding, a primary care version, a research version, and a comprehensive specialist version for mental and behavioural disorders. The latter version, known as the Clinical Descriptions and Diagnostic Guidelines (CDDG) and the focus of the present article, offers guidance on the essential features of each disorder, information on differentiating disorder from normality and other disorders, and additional information to assist clinicians with differential diagnosis. The CDDG is used by approximately 75% of the world’s clinicians to diagnose mental disorders.
It is assumed that the reader is familiar with the DSM-5 2 as the mental health classification published by the APA used for clinical practice, research, coding, and teaching.
A key difference between the CDDG and DSM-5 criteria sets is the approach to case definitions. Under the ICD-11, cases are defined according to whether a patient presents with characteristic features of a disorder, that is, those symptoms or signs expected to be present in every case. 5 Under the DSM-5, although some criteria must be present in all cases, positive cases often require a specified minimum number of symptoms from a list of polythetic items that have been present for a precise duration. The bases for these thresholds are rarely supported by empirical evidence. 6 While useful in research contexts for forming homogeneous groups of participants, this approach seldom reflects how clinicians assign psychiatric diagnoses.
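The heterogeneity that a polythetic threshold permits can be made concrete with a little counting. The following minimal Python sketch assumes a hypothetical rule of "at least 5 symptoms from a list of 9"; the rule, the function names, and the numbers are illustrative assumptions and are not drawn from any specific criteria set in either classification.

```python
from math import comb


def polythetic_profiles(n_items: int, min_required: int) -> int:
    """Count the distinct symptom combinations that satisfy an
    'at least min_required of n_items' rule."""
    return sum(comb(n_items, k) for k in range(min_required, n_items + 1))


def guaranteed_shared_symptoms(n_items: int, min_required: int) -> int:
    """Smallest number of symptoms that any two positive cases must share
    (pigeonhole bound on two subsets of size min_required)."""
    return max(0, 2 * min_required - n_items)


# Hypothetical rule for illustration only: at least 5 of 9 listed symptoms.
profiles = polythetic_profiles(9, 5)
overlap = guaranteed_shared_symptoms(9, 5)
print(f"Distinct qualifying symptom profiles: {profiles}")
print(f"Symptoms two positive cases are guaranteed to share: {overlap}")
```

Under this assumed rule, many different symptom profiles lead to the same diagnosis, and two patients with that diagnosis need share only a single symptom. An essential-features definition, by contrast, expects the same core features in every case.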
The Revision Processes
In 2007, representatives of the APA, WHO, and National Institute of Mental Health (NIMH) met to discuss a plan to harmonise the classification of mental and behavioural disorders to avoid unintentional differences and consider developing a single classification system. 7 At that meeting, significant agreement was achieved on the structure of the highest-level disorder groupings (e.g., neurodevelopmental disorders). However, the APA and WHO subsequently followed separate paths to develop and test their respective revised classifications.
DSM-5
The initial plan for the DSM-5 was an ambitious updating of psychiatric nosology that was going to move away from defining mental disorders solely on the basis of phenomenological descriptors and incorporate dimensionality within categories, consider diversity more comprehensively, reduce the need for ‘not otherwise specified’ diagnoses, address high rates of comorbidity, and consider the inclusion of neuroscientific and genetic findings. 8,9 Consideration was also given to improving the clinical utility of the classification, that is, how well it assists with communication, treatment selection, prognostication, and identification of those in need of treatment. 9 –11 Aspirations for a paradigm-shifting classification became increasingly tempered as time marched closer to the publication date and pressures to complete draft proposals for field trials mounted. Concerns about the DSM revision process emerged both during the revision and after the publication of DSM-5 in 2013, 12 namely, the potential for conflicts of interest because of task force members maintaining ongoing relationships with pharmaceutical companies 12,13 as well as inconsistent use of systematic reviews to inform proposed changes to disorder criteria. 14,15 Additional concerns were raised about the lack of transparency of work group and task force activities, that is, how and whether their work was critically evaluated and how feedback from public review was addressed. 12,16,17 First 14 speculated that the desire on the part of the APA to innovate the field of psychiatric classification might have usurped time and resources that should have been devoted to following a more rigorous empirical process. Thomas Insel, former director of the NIMH, implored the research community to abandon DSM categories in favour of NIMH Research Domain Criteria (RDoC), a dimensional model based on brain-behaviour relationships. 18 RDoC is proposed as a means of avoiding reification of psychiatric disorder categories and informing the design of studies to advance our understanding of mental illness with the promise of a neuroscience-based nosology.
ICD-11
The Department of Mental Health and Substance Abuse was tasked with updating the Mental and Behavioural Disorders chapter of the ICD as part of the larger revision of the WHO classification. Decisions about the objectives, processes, and outcomes for the revision process were taken in light of recommendations and advice proposed by the International Advisory Group for the Revision of ICD-10 Mental and Behavioural Disorders (AG). The AG, composed of international experts from various mental health professions, examined the uses, settings, and scope of the ICD to propose a set of objectives for the revision that would be most useful for patients and their families, clinicians, researchers, educators, health information custodians, and public health policy makers in member states. 19 The AG recommended that the revision focus on improving clinical utility of the classification to address one of the most serious shortcomings of previous versions of mental health classifications. 5,20,21 Thus, the WHO sought to develop a classification that would align with its mission of reducing global disease burden by equipping professionals with a practical tool to identify patients in need of care. It was considered particularly important to ensure its utility in identifying patients globally both in highly resourced specialist settings and in primary care, where most patients make contact with health care systems. The AG also determined that neuroscientific research was not sufficiently advanced to be used as a basis for diagnosis. Given that acceptability to clinicians is one important feature of clinical utility, 10 formative field trials were conducted in collaboration with international and national mental health professional organisations to ensure that the overall structure and superordinate categories were commensurate with cognitive constructs of mental health classification. 22 Informed by these findings, the AG then struck specific working groups according to broad diagnostic categories to comprehensively review the literature as well as to evaluate DSM-5 proposals for global applicability. Working group members comprised a linguistically, nationally, and professionally diverse set of experts who were formally vetted by the WHO for conflicts of interest. Working groups summarised their findings using a standardised referenced form that ensured that the content for proposed diagnostic entities would be uniform across the classification and include best available evidence. 5 The WHO secretariat then used this information to develop proposed guidelines for extensive field testing. The proposed structure and brief definitions that incorporate essential features of disorders are available for public viewing and comment at http://apps.who.int/classifications/icd11/browse/l-m/en, reflecting WHO’s desire for a fair, inclusive, and transparent process.
Field Trials for DSM-5 and ICD-11
Field testing of revised classifications is a critical step to ensure that proposed definitional changes to diagnoses operate as intended and represent an improvement over previous conceptualisations. Field testing can take the form of developmental field trials used to gather data that inform further revision of the classification prior to publication. In contrast, evaluative field trials establish expected psychometric properties and are frequently implemented in varied clinical settings after a nosology is published. 23 Results of developmental field trials and the extent to which data are used to make subsequent revisions provide pivotal information about a classification’s reliability, validity, clinical utility, and safety.
DSM-5
Although the APA stated its intention to conduct developmental field trials with proposed DSM-5 criteria, these field trials never materialised due to delayed start dates and a desire by the task force to meet their publication target. 24 Instead, only evaluative field trials lacking an iterative revision process were conducted to establish diagnostic reliability, as well as perceived clinical utility and feasibility of diagnostic criteria for 33 conditions. 25 –29 A number of methodological improvements were made in the implementation of the DSM-5 field trials, in particular regarding the data-analytic methods. 25 However, only overall indices of diagnostic agreement were measured (i.e., kappa statistics that control for chance agreement), which, when found to be lower than expected or poorer than that for the DSM-IV-TR equivalent categories, failed to provide usable information regarding which aspects of diagnostic criteria were not functioning as anticipated, 23 significantly limiting the impact of results to inform further revisions. 1 Thus, even if there had been sufficient time to make additional revisions, the various DSM work groups would not have had usable data to address poor reliability. Furthermore, primarily severe cases were evaluated because field studies were only conducted in tertiary care hospitals. Lack of threshold cases makes it difficult to ascertain how well proposed criteria differentiate between psychopathology and normality. 24
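For readers less familiar with chance-corrected agreement, the sketch below shows how a kappa statistic is computed for two sets of binary diagnoses. It uses Cohen's kappa as a simple stand-in; the DSM-5 field trials reported intraclass kappa, which is a related but not identical statistic. The ratings are invented for illustration and do not come from any field trial.

```python
def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length lists of binary diagnoses
    (1 = disorder judged present, 0 = judged absent)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Proportion of patients on whom the two ratings agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, given each rater's base rate.
    p_a = sum(ratings_a) / n
    p_b = sum(ratings_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Invented ratings for 10 hypothetical patients (illustration only).
first_rating = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
second_rating = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

raw_agreement = sum(
    a == b for a, b in zip(first_rating, second_rating)
) / len(first_rating)
print(f"Raw agreement: {raw_agreement:.2f}")
print(f"Cohen's kappa: {cohen_kappa(first_rating, second_rating):.2f}")
```

In this made-up example, raw agreement is 0.80 but kappa is about 0.52, because much of the agreement on absent cases would be expected by chance. The single summary number also says nothing about which criterion produced the two disagreements, which is the limitation described above.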
The APA decided to adopt a test-retest rather than an interrater methodology for their evaluative field trial. Kraemer et al. 26 argue that such an approach better reflects how classifications are used in clinical settings (i.e., DSM-5 field trials have better external validity). However, test-retest reliability statistics reflect attributes of both the classification and the consistency of patient report at 2 time points tested. 23 DSM-5 field trials did not evaluate the reliability of the classification independently from patient report. It is therefore unclear what proportion of poor reliability can be attributed to the classification per se. Finally, field trials did not directly compare DSM-IV to DSM-5 to determine whether proposed changes improve diagnostic judgment.
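The confound between classification reliability and patient report can be illustrated with a small simulation. The Python sketch below is a toy model under stated assumptions: every parameter (prevalence, the probability that a patient's presentation differs at a given visit, the clinician error rate) is hypothetical, and the model is not a reconstruction of the DSM-5 or ICD-11 field trial designs.

```python
import random


def kappa(x, y):
    """Cohen's kappa for two equal-length lists of binary ratings."""
    n = len(x)
    observed = sum(a == b for a, b in zip(x, y)) / n
    p_x, p_y = sum(x) / n, sum(y) / n
    expected = p_x * p_y + (1 - p_x) * (1 - p_y)
    return (observed - expected) / (1 - expected)


def noisy(value, error_rate, rng):
    """Return the binary value flipped with probability error_rate."""
    return 1 - value if rng.random() < error_rate else value


def simulate(n_patients=5000, prevalence=0.30, report_flip=0.15,
             clinician_error=0.10, seed=0):
    """Return (interrater_kappa, test_retest_kappa) under assumed error rates."""
    rng = random.Random(seed)
    inter_a, inter_b, retest_1, retest_2 = [], [], [], []
    for _ in range(n_patients):
        true_status = 1 if rng.random() < prevalence else 0
        # What the patient presents or reports can differ between visits.
        visit_1 = noisy(true_status, report_flip, rng)
        visit_2 = noisy(true_status, report_flip, rng)
        # Interrater design: two clinicians rate the same visit.
        inter_a.append(noisy(visit_1, clinician_error, rng))
        inter_b.append(noisy(visit_1, clinician_error, rng))
        # Test-retest design: different clinicians rate different visits.
        retest_1.append(noisy(visit_1, clinician_error, rng))
        retest_2.append(noisy(visit_2, clinician_error, rng))
    return kappa(inter_a, inter_b), kappa(retest_1, retest_2)


interrater, test_retest = simulate()
print(f"Interrater kappa:  {interrater:.2f}")
print(f"Test-retest kappa: {test_retest:.2f}")
```

With identical clinician error in both designs, the test-retest kappa in this toy model comes out lower than the interrater kappa solely because patient report varies between visits. A low test-retest kappa therefore cannot, on its own, be attributed to the classification.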
Evaluation of the proposed criteria for generalised anxiety disorder (GAD) illustrates some of the problems with the DSM-5 field trials. One of the key changes proposed for GAD was the addition of behavioural criteria. Although the conceptualisation of behaviours associated with GAD was criticised for lack of specificity, 30 the change represented an attempt by developers to improve differentiation of GAD from other disorders. The proposed criteria yielded a test-retest intraclass kappa of 0.20, interpreted by the authors as ‘questionable’. 29 Notwithstanding methodological differences between DSM-IV and DSM-5 trials, this result is vastly inferior to the interrater reliability kappa of 0.67 reported for DSM-IV field trials. 31 With only the kappa statistic to rely on, the APA would not have been able to determine whether addition of behavioural criteria resulted in poorer reliability and, if so, which of the criteria led to greater clinician disagreement. It is not known how the APA task force used these data (further evidence of lack of transparency), but ultimately it published criteria identical to those in DSM-IV. Others have suggested that this outcome is preferable given the low reliability of the proposed criteria as well as ongoing controversy in the field about how best to conceptualise GAD. 32 Regardless, without criterion-level data, this result represents a missed opportunity to incorporate over 2 decades of research on the phenomenology of GAD.
In sum, the DSM-5 field trial results do not provide users with estimates of how reliable the classification is as distinct from variability in patient report, lack information about the impact of changes to diagnostic thresholds, and are not informative for making decisions about which criteria should be considered for further revision.
ICD-11
The ICD-11 proposals are the object of considerable developmental field testing in multiple languages, 33,34 the results of which will inform modifications of the proposed guidelines prior to finalisation. The WHO decided to take a 2-stage approach to developmental field testing. At both stages, a novel methodological approach to evaluating clinical utility based on clinician responses was employed. 34 The first set of field trials, concluding in 2017, focuses on testing specific conceptual changes proposed for the ICD-11 using clinical vignettes that evaluate whether diagnostic consistency and clinical utility are improved over the ICD-10. These so-called case-controlled field trials are implemented worldwide over the Internet, drawing participants from approximately 13,000 mental health professional members of the Global Clinical Practice Network (gcp.network). 33 When results suggest that an ICD-11 category has poor diagnostic consistency or clinical utility, feature-level data can be used to determine what aspects of the guidelines should be considered for further revision.
The second stage of field testing is taking place in diverse settings across the world, including in Canada at the Royal Ottawa Health Care Group and University of Ottawa Institute for Mental Health Research led by the authors. These field trials employ an interrater reliability methodology designed to evaluate diagnostic consistency and clinical utility of the guidelines. WHO chose an interrater methodology because they sought to directly evaluate reliability of the classification independent of variability in patient report. These so-called ecological implementation field trials will determine how ICD-11 proposals perform with patients in the settings where the classification will be used. The main focus is on disorders with the highest global disease burden as well as those that had unacceptable diagnostic consistency in the case-controlled trials. Further revision prior to publication will be guided by the findings of these studies.
In summary, ICD-11 guidelines are being rigorously field tested in a 2-stage process that will provide data at each stage to make further refinements to the classification prior to publication in 2018. Nearly 300 peer-reviewed publications have appeared about the ICD-11 since the beginning of the revision process, including those that describe in detail and transparently the proposals for the ICD-11, 35 –40 as well as the results of the case-controlled field trials. 41 Growing evidence suggests that the proposals for ICD-11 align well with global mental health professionals’ conceptualisations of the disorders they diagnose in clinical practice, produce acceptable levels of diagnostic agreement, are considered clinically useful by clinicians, and are applicable globally in multiple languages, including in Canada’s official languages.
Reliability, Validity, and Safety
The aforementioned issues with the overall DSM-5 revision process, including lack of transparency, conflicts of interest, delayed and then incomplete developmental field trials, disappointing reliability, and lack of useful information to further revise the proposed criteria, should raise serious concerns about whether the APA’s nosology is appropriate for clinical practice and educating future professionals in Canada.
Poor reliability of tested diagnostic categories is a crucial finding because acceptable reliability is a necessary (but not sufficient) psychometric feature for diagnostic validity. There are varied opinions on whether the reliability of the DSM-5 criteria sets is in fact unacceptably low 42 or whether it is lower than in DSM-IV field trials as a result of the test-retest reliability methodology. 26 For example, Chmielewski et al. 43 showed that reliability for DSM-5 categories is significantly better when employing an interrater methodology using semi-structured interviews (e.g., major depressive disorder: interrater kappa = 0.92, 1-week test-retest kappa = 0.60). However, the test-retest findings by Chmielewski et al., which largely replicated DSM-5 field trial methodology, were considerably higher than in the field trials (e.g., major depressive disorder: intraclass kappa = 0.60 versus 0.28 in the DSM-5 field trial), raising concern about whether the semi-structured interview, rarely used in regular clinical practice, might explain the results. To explain poor reliability of DSM-5 categories, Kraemer et al. 26 claim that field trial results are similar to diagnostic tests in other specialty areas of medicine and therefore the threshold for acceptable reliability should be lowered. However, as Hunsley and Mash 44 point out, examination of clinical instruments with the expected stability of a psychiatric diagnosis across a 1-week test-retest period within psychiatry and psychology suggests that a kappa of 0.70 is a minimum requirement. Applying this evidence-based standard, only 1 disorder in the DSM-5 field trial, major neurocognitive disorder, has acceptable reliability. Irrespective of whether kappas were acceptable, DSM-5 field trial methodology did not reveal which changes contributed to poor reliability.
There are also important conceptual changes to the DSM-5 made on the basis of weak scientific support that raise additional concerns about the classification’s validity. For example, to address the steady but unsubstantiated increase in pediatric bipolar disorder (PBD) diagnoses and use of antipsychotic medications in children and youth, the DSM introduced a new category of disruptive mood dysregulation disorder (DMDD) to distinguish chronic irritability and anger outbursts from irritability more consistent with mania. However, comprehensive reviews of the literature on chronic irritability and anger suggest that such symptomatology can be associated with oppositional defiant disorder (ODD) and predictive of the development of depressive and anxiety disorders later in life. 45 The addition of DMDD creates a diagnostic category that, rather than guarding against unjustified psychopharmacological treatment, results in a novel pharmaceutical target. Other questionable examples include expansion of symptom presentations of posttraumatic stress disorder resulting in a massively polythetic construct to accommodate heterogeneity such as that seen in individuals with a history of multiple traumas, 46,47 rejection of a dimensional model of personality disorders and reversion to DSM-IV personality disorder categories despite concerns about their validity, 48,49 and broadening of the entry-level criteria of major depressive disorder to include hopelessness. 50 These are but a few highlights.
In contrast, ICD-11 proposals follow the prevailing evidence and, to illustrate with the same examples, provide clinicians with the opportunity to code for a chronic irritability and anger presentation within the existing category of ODD rather than introducing DMDD 45 ; introduce complex PTSD (CPTSD) 37 as a phenomenologically distinct but related condition to PTSD consistent with studies supporting that CPTSD captures a distinct patient population, namely those with a long history of severe or repeated traumas 38 ; simplify the diagnostic landscape of personality disorders by introducing a dimensional model akin to the dimensional model proposed by the DSM-5 Work Group for Personality and Personality Disorders that was ultimately relegated to Section III of DSM-5 (i.e., ‘emerging measures and models’) 40 ; and maintain low mood and loss of interest/pleasure as entry-level symptoms for depression. 51
Evidence for poor reliability and validity as well as lowering of the threshold for entry into various disorders behooves us to ask whether diagnoses made with the DSM-5 are safe for Canadian patients or will lead to overdiagnosis or misdiagnosis with their attendant consequences.
Conclusion
The issue of the suitability of the DSM-5 for use in practice and training has been raised by our colleagues in South Africa 13 and Australia. 51 We join this chorus and conclude that the problems in the overall revision process, poor reliability, questionable validity, and concerns about safety of DSM-5 diagnostic definitions suggest that adoption of the classification in Canada is not in the best interest of our patients. The ICD-11 provides an alternative classification, which is being developed with stakeholder input, relies on systematic review of the literature by international experts, has been subjected to extensive multistage and multilingual developmental field testing, and is accruing significant empirical support demonstrating its reliability, validity, and clinical utility. Furthermore, an accompanying semi-structured clinical interview will allow researchers to form homogeneous participant groups if so desired.
We acknowledge that adoption of the ICD-11 in Canada for mental health diagnosis will require significant investment of time and resources on the part of administrators, clinicians, and students. However, it is our opinion that Canadian patients deserve an evidence-based classification intended to improve public health that is published by the world health authority, rather than a classification published by another country’s professional organisation that has been the object of sustained criticism.
Footnotes
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Dr. Kogan reports a grant from the University of Ottawa Medical Research Foundation during the conduct of the study, and Dr. Kogan is a consultant to the World Health Organization on the development of the ICD-11 Mental and Behavioural Disorders chapter. His travel to meetings of the WHO Field Studies Coordination Group was reimbursed, but he received no fees for consultation. Dr. Paterniti reports a grant from the University of Ottawa Medical Research Foundation during the conduct of the study. The opinions expressed in this article are those of the authors and do not represent the policies or positions of the World Health Organization.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
