Abstract
Supporting a national drive to improve the quality of mental health care is a key responsibility for a professional body such as the Royal Australian and New Zealand College of Psychiatrists (RANZCP). The mechanisms available for improving quality are multidimensional and can include practices variously described as evidence-based medicine, total quality management, accreditation and accountability, professional development and consumer and carer empowerment [1]. Development of specific treatment guidelines, and their ongoing review and updating, are recognized as two of the most important processes that contribute to a credible quality improvement framework.
While Parker's critique of the RANZCP's clinical practice guidelines for the treatment of depression largely repeats his previous and widely reported concerns about current international classifications of depression [2], we consider that many of his detailed criticisms simply reflect a difference of opinion on how data should be evaluated or interpreted. Importantly, we make no apology for utilizing an evidence-based approach to this project, reflecting international and National Health and Medical Research Council (NHMRC) practice. The general limitations intrinsic to guidelines development have long been recognized and are well articulated by others [1], [3].
Evidence-based guidelines
Clinical practice guidelines (CPGs) can be of variable quality and can be particularly unhelpful if they are not based on the best evidence available, are not developed systematically and/or present only the opinions of selected individuals, research groups or other vested academic or commercial interests. Such issues relating to the assessment of the quality of guidelines need to be differentiated, however, from the inevitable gap between the available evidence base and the complex clinical decisions that professionals need to make every working day. Guidelines seek to highlight those areas where there is clear evidence available to inform key clinical decisions, but equally emphasize those areas where there is insufficient information to make definitive statements. That is, guidelines are ‘powerful tools… when they are well developed and implemented’ [1] but have their specific limitations and do not, on their own, represent a comprehensive clinical approach.
Until an improved, alternative or modified paradigm arises in academic or clinical medicine, we consider that the profession should continue to embrace and promote the process of evidence-based guideline development. Currently, it does represent an internationally accepted standard for summarizing a wide range of data, and has largely replaced the idiosyncratic or highly selective academic review of a topic area that preceded it. We regret that Parker has not followed up his earlier similar criticisms [2] with an operational description of a demonstrably more powerful methodology.
We share Parker's concern that the treatment of depression is a most important matter. We recommended that it be based on a thorough evaluation of the individual, mindful of their particular circumstances, in the context of a sound therapeutic relationship. Appropriate treatment should be selected collaboratively with the depressed person, continued for an adequate period and the risk of relapse addressed actively. We agree with Parker on the essence of treatment.
The RANZCP CPGs follow the ‘guidelines for guidelines’ [4], which are aligned with international models. These draw heavily, but not exclusively, on randomized controlled trials. These have well recognized drawbacks as well as strengths, as outlined in the introduction to the CPG series [5]. While we agree with Parker that guidelines should meld supportive evidence and clinical wisdom, the balance between these two essential elements will always be controversial.
Randomized controlled trials (RCTs) have been the bedrock for evaluating new treatments since the early post-war period. They have well-known limitations. The recent re-examination of the role of selective serotonin re-uptake inhibitors (SSRIs) in depressed adolescents [6] has raised serious concerns about bias in drug company sponsored treatment trials and the effectiveness of the regulatory agencies in ensuring trial integrity. Clinical impression of benefit has also had its pitfalls (e.g. ‘evaluation’ of psychosurgery and deep sleep therapy) and the influence of advertising on clinical behaviour is increasingly recognized.
Even when there is agreement on the data, its interpretation will vary between different clinicians. This is not a problem unique to psychiatry or depression. Differences in emphasis and interpretation underpin disagreement between guidelines for the use of anticoagulants [7], treatment for breast and ovarian cancer [8] and the management of diabetes [9].
Classification of depression
Parker disparages reliance on RCTs because of reservations about the homogeneity of DSM-IV ‘major depression’; a perception that illness severity was used as the only defined subcategorization of depression in the CPG; and because RCTs ‘favour rapid responders’.
The correct subdivision of depression remains controversial. Would that we could ‘carve nature at the joints’ in this area, as Kendell enjoined us [10]. There is clear agreement that bipolar disorder differs from unipolar depression, and that psychotic depression differs too. Parker has argued persuasively for the existence of a melancholic subgroup [11], although Angst and Merikangas regard depression as a continuum [12] and Kendell and Jablensky note the lack of clarity of natural boundaries to most psychiatric diagnoses [13]. The classification and criteria for mood disorders therefore remain controversial.
These guidelines relate to the current concepts of DSM-IV, the most widely used classification system in Australasia, which maps closely to the officially sanctioned ICD-10. This utilitarian approach may lack intellectual rigour, but when some of the best minds in psychiatry cannot agree on the most appropriate classification, a pragmatic approach is required. We chose to use DSM-IV. It allows the current research base (built, for better or worse, on this classification) to be summarized using constructs familiar to most practitioners. We expect greater consensus on, and sophistication in, classification in the future. We expect this will, in time, lead to significant advances in the evidence base.
Parker claims that we clumped all forms of depression together indiscriminately. We explicitly acknowledged the existing subdivisions in the DSM-IV classification, commenting separately on the different evidence related to psychotic depression, atypical depression and more severe non-psychotic depression. Of these, atypical depression is the least secure category (indeed it is another focus of criticism by Parker [14]), yet it remains in the classification. Parker has argued for the utility of the diagnosis of ‘CORE’ defined melancholic depression [15]. We did not include evidence summaries on melancholic depression because there are few treatment trials sharing a common definition of this syndrome. Severe depression generally includes significant numbers of melancholic symptoms. We did distinguish ‘severe’ from ‘moderate’ depression, to reflect specialist practice and to examine the differential efficacy of tricyclics and SSRIs in severe depression. The data from this comparison did not support a significant body of clinical opinion. In the absence of agreed mechanisms for determining the relative value of clinical opinion, we chose to prefer the evidence base.
Randomized controlled trials
Parker states that RCTs tend to recruit subjects who are more prone to respond than patients treated in routine clinical practice. Clinical trials usually exclude people with significant co-morbid conditions. Co-morbidity, common in specialist practice, is often associated with poorer outcomes. More recent trials often include a ‘wash-out’ period to exclude ‘early responders’. For all that, the placebo response rate in ‘severe depression’ for the studies described in Table 3 was 31%. ‘Placebo’ is not ‘no treatment’, but includes therapeutic engagement and provision of hope and encouragement.
Parker argues that meta-analyses of RCTs indicate that placebo is as effective in treating depression as any other treatment and thus any attempt to use such data is pointless. Among other selected examples, he quotes a portmanteau consideration of Food and Drug Administration (FDA) data which reaches the provocative conclusion that antidepressants are no better than placebo [16]. ‘Junk in, junk out’ is a truism of meta-analysis. We note that the findings of our meta-analyses, and those of the comparable American Psychiatric Association [17] and British Association for Psychopharmacology guidelines [18], based on careful selection of quality studies, agree that antidepressant treatments are superior to placebo. The similarity of response rate is perhaps not so surprising when nearly all these agents have been developed based on similar theories of action, tested in similar animal models.
The criteria for inclusion of studies in our summary tables were that they described well-conducted trials of adequate ‘doses’ of treatment over an adequate period of time (e.g. treatment with imipramine 150 mg or fluoxetine 20 mg for at least six weeks, minimum Hamilton Depression scores of 17 for moderate and 23 for severe depression). We also required that there was sufficient outcome data on the primary outcome measure to calculate ‘intent to treat’ NNTs (numbers needed to treat).
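For readers less familiar with the measure, the number needed to treat follows the standard definition: the reciprocal of the absolute difference in response rates between the two arms. The rates below are purely illustrative, not drawn from the guideline's tables:

```latex
\mathrm{NNT} \;=\; \frac{1}{p_{\text{active}} - p_{\text{placebo}}}
% Illustrative example: if p_active = 0.50 and p_placebo = 0.31,
% NNT = 1 / (0.50 - 0.31) = 1 / 0.19 \approx 5.3,
% conventionally rounded up to 6 patients treated per additional responder.
```

An ‘intent to treat’ NNT simply applies these response rates to all randomized participants, rather than only to those completing the trial.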
Specific issues
Parker appears unclear about the process by which these guidelines were developed. The process is described on pp. 638–639 of the introduction to the series [5] and briefly recapitulated on p. 391 of the guideline [19]. The criteria for including or excluding the studies were stated. The search terms were listed, the databases and other sources indicated, the basis of response was defined and the statistical methodology referenced. We chose not to clutter the text with specific levels of evidence for every phrase. These were indicated clearly in colour in the figures showing treatment recommendations and the text is referenced conventionally.
He comments that there is ambiguity in the terms used to describe depression. References to the results of epidemiological studies in the introduction necessarily used the terms of those studies. We used the definitions in Fig. 3, those of DSM-IV, to classify trials and recommend treatments.
Parker states that a recommendation to use clinical rating scales is naive. We, like his Mood Disorders Unit, consider that clinical judgement should be supplemented by formal measurement. The data presented reflect particular cut points on certain rating scales and it is thus logical to encourage their use so that practitioners can relate their patients' problems to the database.
He argues that the tables should not have excluded drugs where few published trials are available and should not have attempted any related statistical analysis, at least not without explicitly commenting on the risk of a type II error (the risk of the comparison failing to show a difference between two treatments when such a difference in fact exists. The most common reason for this is reliance on inadequately powered studies, with too few subjects [20]). We make no apology for summarizing the data. We presume readers of the Journal are statistically aware and recognize the limitations of single rather than multiple studies on a given drug. It is implicit in science based on a positivist paradigm that one can disprove a hypothesis but never prove it. It is always possible that a larger study, or one with different inclusion criteria, will demonstrate a previously unrecognized difference. The issue for the pragmatic clinician is whether the difference between two treatments is clinically significant, in terms of either benefit or burden, and whether that data exists today. We would argue that once the NNT comparing two treatments exceeds 15 or 20, most clinicians and their patients would make their choice based on other factors, assuming the data are relevant to their particular situations.
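As a rough arithmetic check on the threshold suggested above (the figures are illustrative only), an NNT in the range of 15–20 corresponds to an absolute difference in response rates of only about five to seven percentage points between two treatments:

```latex
\mathrm{NNT} = 15 \;\Rightarrow\; \Delta p = \tfrac{1}{15} \approx 6.7\%
\qquad
\mathrm{NNT} = 20 \;\Rightarrow\; \Delta p = \tfrac{1}{20} = 5\%
```

Differences of that magnitude are small enough that, in practice, tolerability, cost and patient preference will often dominate the choice between agents.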
Parker raises the issue of publication bias, resulting in ‘negative’ studies not reaching publication and thus biasing the results. The Cochrane Collaboration seeks to identify all such studies by contacting active researchers, reviewing conference proceedings, and the like. It is our view that this is not particularly effective and is honoured more in the breach than the observance. We did not have the resources to pursue this model. One could argue that the inclusion of only published peer-reviewed studies provides an element of quality control. It is a recognized but unsolved problem in evidence-based research.
We are grateful to Professor Parker for drawing attention to a typographical error in Table 3. The asterisk in this table should be against the comparison between venlafaxine and the SSRIs as a group, as indicated in the text and the legend.
Parker is critical of our recommendations on psychotherapy. A key difference between the sources of data we considered and some of the reviews he references is that we actively excluded studies incorporating data on people with mild depression. This resulted in fewer, but we would argue more relevant, studies. Parker refers to Wampold's re-evaluation [21] of Gloaguen's meta-analysis [22], questioning the specific advantages of cognitive behaviour therapy, yet this in turn has been criticized [23], as its conclusions rest on the exclusion of a single study, selected as an outlier. The interpretation of the National Institute of Mental Health study by Elkin [24] has also been revised, following recognition of the different outcome of those with atypical depression [25]. Far from excluding other forms of psychotherapy, we were careful to acknowledge the role of dynamically informed therapy and drew attention to the difference between structured psychotherapy as practised routinely and as encountered in a clinical trial.
Different study selection, of those with a focus on at least moderate depression, also underlies our minor differences on St John's wort. Indeed, one major review he quotes explicitly notes the lack of evidence on the use of St John's wort in severe depression [26].
We could continue. We are criticized for including nefazodone when its withdrawal from the Australian market was imminent. This was said to be on commercial grounds, and as it remains available in the US, we presume it may return, unless safety concerns preclude this. We are criticized for providing relatively more data on pindolol than longer established augmenting agents. The place of the latter is no longer controversial, whereas pindolol's role in augmentation remains controversial and of current interest [27–29]. It is not appropriate to quote every review in an evidence summary, so we did not include the ‘local publication’ [30], although this does suggest a mechanism which may account for the varying results to date on augmentation. In contrast, the critique castigates discussion of transcranial magnetic stimulation (TMS) and omega-3 fatty acids for their brevity. Our comments on these approaches, though brief, echo the relevant current Cochrane reviews, which consider the majority of studies in these areas to be of poor quality. We stated: ‘There is scant evidence of benefit but research samples have been small’ regarding TMS and ‘While omega-3 fatty acid levels are low in depression, there is no evidence that they improve depression. Further research is warranted.’ Brevity should not be mistaken for disinterest. Comment on atypical antipsychotics in the treatment of depression would be appropriate now, but when we completed our literature review in late 2003, evidence was scant on their use in major depression, in contrast to bipolar disorder. Emergent positive findings from an open-label study were indexed in the penultimate week of 2003 [31] (after our literature review). An earlier review reported that two larger scale replications had failed to confirm initial promise [32].
Conclusion
We could continue to take issue with each point raised in Parker's critique. We hope it is clear to the impartial reader that our preparation of these guidelines was far from a thoughtless or careless process. A full rebuttal would make this commentary longer than the guideline itself. As we indicated earlier, clinicians and academics across medicine interpret the same data differently. Readers, as always, must make their own judgements on the value of the message and act accordingly. These guidelines were discussed widely, were placed on the College web site for an extended period, drew valued written comments from colleagues and were the focus of heated discussion at the College Congress on the recommendation on psychotherapies for depression. There were significant efforts to inform and engage Fellows and trainees in the overall process and the consultation on specific guidelines.
We see the core of Parker's concerns as lying more in the issue of how best to generate such guidelines. While there are embryonic models for evaluating qualitative research on treatment, there is very little, beyond forms of the Delphi consensus, on how to formally incorporate clinical experience, uncontaminated by commercial pressures, research allegiances, or force of personality of current leaders, and not restricted by the limitations of current service provision models. We used the framework of evidence-based medicine. In doing so, we have included the basic constructs that underpin high-quality guideline development as defined by others [3] and the RANZCP CPG process.
