Abstract

As a co-director of the WHO World Mental Health (WMH) Survey Initiative of which the New Zealand Mental Health Survey (NZMHS) is a part, I am familiar with the work of the research teams in each of the nearly 30 countries that have carried out WMH surveys. (For a complete list of participating countries and publications, see http://www.hcp.med.harvard.edu/wmh.) None of these teams has come close to matching the extraordinary early productivity of the NZMHS team in the 11 papers reported in this issue. In addition to papers on 12 month and lifetime prevalence and treatment, which are being prepared by all participating WMH teams, very useful additional papers are presented here on comorbidity, suicidality, disability and ethnic differences. Taken together, these results confirm the broad predictions that mental disorders are common, that they are often comorbid with each other and with physical conditions, that the disability associated with these disorders is often substantial, and that treatment is often either absent or delayed. Although only scratching the surface of the rich NZMHS dataset, these results provide a useful foundation for more in-depth analyses and, as noted by Professor Andrews in his commentary, raise a number of provocative questions about early intervention and other important policy issues.
In his comments on these 11 papers, Professor Jorm asserted that the results they report are more ‘useful myths’ than scientific facts. Although five ‘myths’ were mentioned, two types of underlying problem need to be distinguished. The first type involves conceptual problems with the current diagnostic system: categorical classification rules that obscure the fact that syndromes of interest are often better characterized in dimensional than dichotomous terms; and the proliferation of diagnoses in recent DSM editions, leading to an artificial appearance of high comorbidity. The second type involves measurement: that Composite International Diagnostic Interview (CIDI) diagnoses do not approximate diagnoses made by specialist clinicians; that the accuracy of diagnosis differs from one country to the next and, within countries, varies by sociodemographic status; and the likelihood that reports about treatment are imperfect.
I think it is fair to say that the NZMHS team, along with all the other WMH collaborators (and anyone else who has given the matter any serious thought), will agree with Professor Jorm's twin concerns about diagnostic thresholds and proliferation of diagnoses. Interestingly, though, the paper by Judd et al. [1] on subthreshold depression that Professor Jorm cited as evidence for the problem of arbitrary diagnostic thresholds made heavy use of results from epidemiological surveys based on earlier versions of the CIDI. This fact should make it clear that the CIDI is not the source of the problem, but rather a vehicle for investigating the problem. To this end, the NZMHS and other WMH surveys removed the diagnostic skip rules contained in previous versions of CIDI in an explicit attempt to collect data on subthreshold manifestations of disorders. In addition, fully structured versions of standard symptom severity scales were included in all WMH surveys to provide dimensional assessments of 12 month disorders. For example, the Quick Inventory of Depressive Symptomatology Self-Report [2] was included to assess the severity of 12 month major depressive episodes and the Panic Disorder Severity Scale to assess the severity of 12 month panic disorder [3]. In each such case, CIDI subthreshold cases were assessed in addition to full cases in order to allow for inaccuracies in CIDI diagnostic thresholds. These data were not reported in the papers presented in this issue because they require focused evaluation of individual disorders that goes beyond the scope of these initial papers [4]. I mention the existence of these data, though, to make it clear that one of Professor Jorm's suggestions for future alternatives – to include dimensional assessment rather than only categorical assessment of key syndromes – was implemented in the NZMHS data collection and is a topic for future investigation.
The problem of the proliferation of comorbid diagnoses is more difficult to address, as indirectly indicated by the fact that Professor Jorm offered no recommendation for future alternatives to deal with it. One way the NZMHS and other WMH surveys tried to do so was by including questions about overlapping symptoms in the diagnostic assessments of individual disorders. Respondents who reported 2 weeks of dysphoria or anhedonia, for example, were administered not only a complete set of questions about the symptoms of DSM-IV and ICD-10 major depressive episode, but also questions about intercurrent symptoms of worry, panic and social anxiety. Information about temporal overlap in these syndromes during the 12 months before interview was also obtained for the same purpose. The NZMHS team and other WMH collaborators are actively involved in empirical studies of such symptom overlaps.
Professor Jorm's understandable concerns with CIDI measurement problems begin with the assertion that CIDI diagnoses do not ‘approximate those made by specialist clinicians’. This is an incorrect characterization of CIDI 3.0, the much-improved version of CIDI used in the WMH surveys. Clinical reappraisal studies carried out by WMH collaborators in three Western European countries and the US have documented good concordance between diagnoses based on CIDI 3.0 and diagnoses based on blinded clinical re-interviews, with area under the receiver operating characteristic curve in the range 0.73–0.93 for lifetime anxiety-mood disorders and 0.83–0.88 for 12 month anxiety-mood disorders [5].
Lifetime diagnoses are, of course, likely to be less accurate than recent diagnoses in both clinical interviews and CIDI interviews. It is important to recognize, though, that this problem is not unique to CIDI, and CIDI lifetime diagnoses have good agreement with clinical lifetime diagnoses. We also know that bias in lifetime prevalence estimates is likely to be conservative, positively related to age at interview, and negatively related to severity and chronicity [6]. In addition, useful modelling strategies are available to generate plausible estimates of lifetime prevalence in the presence of recall bias [7]. In light of these facts, Professor Jorm's suggestion that lifetime assessment should be abandoned strikes me as overly rigid. Caution is certainly needed in interpreting lifetime prevalence data, but the NZMHS results regarding variation in age of onset distributions and broad aspects of illness course (e.g. number of years in episode) nonetheless strike me as very plausible and useful. It is noteworthy that the only realistic way to improve on retrospective lifetime assessment is to carry out long-term longitudinal studies, which suffer from small sample sizes, attrition bias and being at the mercy of the ideas about important risk factors that were in vogue at the time of baseline data collection. A better course is to accept that long-term retrospective and long-term prospective studies both have weaknesses and to embrace thoughtful data collection and cautious interpretation of results based on both.
Turning back to the issue of CIDI validity, it is noteworthy that CIDI concordance with clinical diagnoses is even better when CIDI symptom-level data are used to predict clinical diagnoses [5]. In the latter case, CIDI diagnoses are transformed into dimensional predicted probabilities of specific clinical diagnoses. As discussed in detail elsewhere [8], this transformational approach yields considerably more accurate estimates of clinical diagnoses for a fixed investment of resources than a survey carried out exclusively by clinical interviewers. This statement might seem counterintuitive, but the critical insight is that the number of clinical interviews that could be completed for a fixed investment would be considerably smaller than the number of CIDI interviews with subsample clinical reappraisal interviews that could be completed for the same investment. The CIDI symptom-level data are able to reproduce clinical diagnoses with enough accuracy that the increased precision related to the larger sample size more than offsets the fact that clinical diagnoses are based on predictions from a clinical reappraisal subsample rather than on direct interviews with all respondents.
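The logic of this prediction-based approach can be illustrated with a deliberately simplified sketch. Everything below is a hypothetical illustration, not the actual WMH methodology (which uses richer prediction models [5, 8]): a clinical reappraisal subsample is used to estimate the probability of a clinical diagnosis within crude strata of CIDI symptom data, and those predicted probabilities are then applied to the full survey sample to estimate prevalence.

```python
from collections import defaultdict

def recalibrated_prevalence(reappraisal, full_sample, stratum):
    """Estimate clinical prevalence in a full CIDI sample by calibrating
    against a (much smaller) blinded clinical reappraisal subsample.

    reappraisal: list of (symptom_data, clinical_dx) pairs, clinical_dx a bool
    full_sample: list of symptom_data for every survey respondent
    stratum:     function mapping symptom_data to a calibration stratum
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for symptoms, dx in reappraisal:
        s = stratum(symptoms)
        totals[s] += 1
        hits[s] += int(dx)
    # Per-stratum predicted probability of a clinical diagnosis.
    prob = {s: hits[s] / totals[s] for s in totals}
    # Prevalence estimate = mean predicted probability over the full sample
    # (strata unseen in the reappraisal subsample contribute 0 here).
    preds = [prob.get(stratum(sym), 0.0) for sym in full_sample]
    return sum(preds) / len(preds)

# Toy illustration: stratify respondents by endorsed symptom count.
band = lambda n_symptoms: "high" if n_symptoms >= 2 else "low"
reappraisal = [(3, True), (2, True), (3, False), (0, False), (1, False), (0, False)]
survey = [3, 2, 1, 0]
estimate = recalibrated_prevalence(reappraisal, survey, band)
```

The design choice the prose describes falls out directly: because only the reappraisal pairs require costly clinical interviews, the same budget buys a far larger `full_sample`, and the gain in sampling precision can outweigh the error introduced by predicting rather than directly observing each clinical diagnosis.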
In order for the transformational approach to be used, a substantial clinical reappraisal interview subsample (up to 10% of the full sample) has to be included as a core part of the CIDI survey. All WMH collaborators were encouraged to include this kind of robust clinical reappraisal component in their surveys. The fact that this was not done in the NZMHS is a major weakness, although one that could still be corrected with a supplemental CIDI survey that included clinical reappraisal interviews with all respondents who reported symptoms judged clinically significant in the CIDI and a probability subsample of others. The importance of completing this sort of assessment in each survey, rather than merely relying on the strong clinical reappraisal results in Western Europe and the US, relates to Professor Jorm's concern about comparative validity (‘the myth of national and sociodemographic differences’). The enormous variation that exists both between and within countries in the illness schemas used to make sense of emotional problems, variation in the extent to which CIDI questions translated into local languages tap into these schemas, and variation in the willingness of respondents to report such problems when they are recognized all make it exceedingly unlikely that CIDI diagnostic accuracy is consistent either across all countries or across all sociodemographic groups within a single country. It is possible to detect such variation and correct for it by recalibrating CIDI diagnoses, but only if a clinical reappraisal subsample exists.
As a follow-up to the last paragraph, it is worth noting that Professor Jorm's comment about CIDI prevalence estimates being implausibly low in some countries is not lost on the investigators in these countries [9, 10]. Variation of this sort was anticipated in designing the WMH surveys. We are using the CIDI samples in the countries with low prevalence (as well as in some countries with higher prevalence) as natural laboratories to carry out clinical reappraisal interviews and follow-up debriefing interviews having two aims: to determine the best way to recalibrate the current CIDI data to approximate clinical diagnoses (a task that is facilitated by the abovementioned fact that WMH interviews assessed subthreshold CIDI cases, yielding rich data for respondents below the CIDI diagnostic threshold for purposes of recalibration); and to determine ways in which the CIDI questions can be changed in future studies to improve concordance with clinical assessments.
The work on future modifications in CIDI question wording as well as survey design is motivated by a recognition of the importance of improving the standard ways we currently go about doing psychiatric epidemiological research. I agree with Professor Jorm that the unprecedented level of coordination achieved in the WMH Survey Initiative, which has resulted in an enormous number of CIDI surveys being carried out in all regions of the world, could have a negative impact on the field if we were to ‘sweep the uncertainties under the carpet’ and convert research on psychiatric epidemiology into research on CIDI epidemiology. We have no such intention, though, as we recognize the limitations of the current version of CIDI as well as of the WMH surveys. Rather than lament these inevitable limitations, though, we are doing something about them by carrying out thoughtful methodological studies aimed at reducing these limitations in future studies.
It is noteworthy in this regard that the WMH surveys are unprecedented in their inclusion of innovations to address the limitations noted by Professor Jorm. A number of these innovations have already been mentioned: (i) subthreshold diagnostic assessments aimed at avoiding the exclusion of clinically significant syndromes by relying on rigid categorical distinctions that might not apply equally in all countries; (ii) special question series to examine overlap among diagnoses that are perhaps incorrectly separated in the current diagnostic system; (iii) fully structured versions of standard clinical severity measures to allow comparison between dimensional and categorical classifications; and (iv) clinical reappraisal studies in a number of WMH countries to recalibrate CIDI diagnostic assessments. The WMH surveys also contain much more information about role impairment and disability than previous psychiatric epidemiological surveys, again allowing comparisons between threshold and subthreshold cases. The impairment and disability associated with a select group of chronic physical conditions are also assessed for purposes of rank ordering the burdens of specific mental and physical disorders. Although the initial NZMHS papers did not make use of these innovations, it is important to recognize that the innovations exist in the NZMHS data as a basis for refinement of initial results in future analyses.
The last of Professor Jorm's myths concerns one of the most interesting aspects of the initial NZMHS findings: unmet need for treatment. To say that the finding of substantial unmet need for treatment is a ‘myth’ is a bit too much. It is doubtlessly true that some of the CIDI non-cases in treatment were misclassified. To imply, as Professor Jorm does, though, that this applies to the great majority of CIDI non-cases in treatment is mere conjecture. People with spontaneous panic attacks do not wait for their attacks to become recurrent and for a month of persistent worry (both of which are requirements for a diagnosis of panic disorder) before finding their way to the emergency department for fear of having a heart attack. As Professor Jorm himself noted earlier in his commentary, there is no reason to believe that any categorical cut-point on the underlying dimensions of psychiatric diagnoses will cleanly capture all the people with sufficient distress, impairment, insight and belief in the effectiveness of medical treatment to seek treatment.
A different set of issues arises in considering people classified by the CIDI as having serious disorders who reported not seeking treatment. An important innovation in the NZMHS and other WMH surveys was the inclusion of sufficient information on distress and impairment to classify cases in terms of 12 month clinical severity. What do we make of a respondent who reported major depression lasting for 6 months with severe role impairment and suicidality in the absence of treatment? Is this not an instance of unmet need for treatment? Professor Jorm sidesteps this question by citing one of his own papers to the effect that self-reports of medical care (not mental health care) utilization generally overestimate true utilization compared with record checks [11]. But there is a much larger literature than this single paper on the accuracy of health-care utilization self-reports. A recent review of over 40 such studies [12] showed that accuracy varies significantly with length of recall, type of question (e.g. any utilization vs number of visits), intensity of treatment (with cases receiving an adequate course of treatment much more likely to report utilization than those receiving less intensive treatment) and the use of helpful memory prompts of the sort that were included in CIDI 3.0. (A detailed discussion of the CIDI 3.0 memory prompts is presented elsewhere [13].)
Based on the accumulated evidence in these numerous record check studies, including the finding that serious mental illness is associated with much more overreporting than under-reporting of treatment [14], one would expect that the vast majority of the 40% of NZMHS respondents classified as serious 12 month DSM-IV/CIDI cases who reported not receiving treatment in the past 12 months represent instances of true unmet need for treatment. Access to medical and pharmacy utilization records would have made this conclusion more definitive. However, great complexities exist in obtaining, processing and linking such archival data even in countries with centralized records. These tasks are virtually impossible in other countries. Given the competing demands on fixed study resources faced by the NZMHS team and their counterparts in other countries, we continue to believe that the decision not to encourage this type of investigation in the WMH countries was the correct one.
While I hope the above remarks make it clear that I judge the 11 papers reported in this issue to be a remarkable achievement in the short time since the NZMHS data have been available, I want to be clear that I also consider these papers no more than an initial set of preliminary reports creating a foundation for a much larger and more complex series of future investigations of the NZMHS data that will address concerns of the sort raised by Professor Jorm as well as by other thoughtful critics [15]. The NZMHS will not be able to resolve all uncertainties, of course, but I feel quite sure that careful analysis of the data collected in the survey, in conjunction with parallel analyses and follow-up targeted data collection in the other WMH surveys, will advance our understanding of prevalence, diagnostic boundaries, societal costs and barriers to treatment in ways that will have important implications for mental health care policy for years to come. Such analysis will also frame our understanding of remaining limitations in such a way that the next generation of psychiatric epidemiological studies will be able to build on this generation and advance our understanding even further.
