Consorting with scoundrels? The perils of trials by checklist

Abstract

Someone is bound to have claimed the randomised controlled trial (RCT) as statistics’ greatest gift to medicine. Psychiatry was, perhaps surprisingly, most receptive to this gift, with trials of early antipsychotics and antidepressants adopting RCT methodology to varying extents. However, psychiatry’s real innovation has been the conduct of RCTs to evaluate psychotherapies, behavioural and other non-pharmacological interventions. This was surely revolutionary. Today, trials provide evidence for the effectiveness of the spectrum of interventions in mental health ranging from the treatment of individuals to programmes aimed at improving the health of populations.

Over half a century of trials have seen many advances in their design, conduct and reporting. All aspects of trials are becoming ever more regulated (see, for example, International Conference on Harmonisation [ICH], 1996; Schulz et al., 2010), most of which is highly beneficial. However, increasingly, one sees proposals for ‘CONSORT compliant’ trials! Simplistic prescription risks promoting the idea and practice that doing an RCT is simply a matter of adopting routine approaches to the design and conduct of trials, ticking boxes and reaping the evidence. Pressing into service a pro forma of what needs to be reported as a guide to what needs to be done diverts attention from optimising the design of individual trials and curtails discussion of broader issues concerning the way we conduct mental health trials.

What follows is a sampler – a personal, incomplete and possibly idiosyncratic one – of issues that are germane to the current conduct of trials in psychiatry, with particular focus on non-pharmacological treatments.

Into the trial …

The complete CONSORT

Most researchers will be familiar with the Consolidated Standards of Reporting Trials (CONSORT) diagram, which shows the path of participants through a trial. What should be the first box of a complete flow chart is typically missing – the one showing the population from which participants were sampled. This may be because so few trials involve any formal sampling, whether from the community or defined patient groups, and therefore cannot report this information. Trialists are often creative and resourceful in soliciting participation, but the end result is that little is known of how representative those recruited are of patients who might be candidates for the treatments under investigation. In addition, little insight can be gained into the acceptability of the trial or its interventions to potential participants or recipients of treatment, as it is not possible to calculate the proportion of the eligible population exposed to an invitation to participate who are ultimately recruited. For instance, mid-20th-century trials were dominated by the captive populations of psychiatric hospitals. Today, one suspects that many trials are populated by ‘seekers’: those left unsatisfied by previous treatment. There is nothing inherently wrong with this, but overrepresentation of cases of refractory illness in a trial may risk obscuring the demonstration of effectiveness of a treatment in those with incident or recent-onset illness.

Lasagna’s Law

Psychiatric disorders are highly prevalent until one commences a trial, at which time the pool of potential participants seems to evaporate. Many trials fail to meet already suboptimal recruitment targets or have extended recruitment phases. This increases costs, inflates Type II error and reduces the value of the trial concerned.
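The effect of a recruitment shortfall on Type II error can be illustrated with a rough calculation. The sketch below uses a normal approximation for a two-arm comparison of means. The standardised effect size of 0.5, the target of 63 participants per arm and the 70% recruitment rate are illustrative assumptions, not figures from any particular trial.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def approx_power(effect_size, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided, two-arm comparison of means.

    Normal approximation with equal arms; effect_size is the standardised
    mean difference. The negligible chance of rejecting in the wrong
    direction is ignored.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * (n_per_arm / 2) ** 0.5
    return _STD_NORMAL.cdf(noncentrality - z_crit)


# Hypothetical planning values (assumptions for illustration only).
planned_n = 63
achieved_n = round(0.7 * planned_n)  # the trial recruits 70% of its target

for label, n in (("planned", planned_n), ("achieved", achieved_n)):
    power = approx_power(0.5, n)
    print(f"{label}: n per arm = {n}, power = {power:.2f}, "
          f"Type II error = {1 - power:.2f}")
```

Under these assumptions, a trial planned for about 80% power ends with roughly 65%, so the chance of missing a genuinely effective treatment rises from about one in five to about one in three.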

Intention to be treated

While patients and subjects have become consumers and participants, the dominant RCT paradigm remains intention to treat (ITT), an inherently clinician-centred viewpoint. In psychiatry, perhaps more than in other branches of medicine, intention to be treated needs to be considered. People have very polarised views of psychiatric treatment, even if they have never received any: drugs, psychotherapy, exercise or ehealth may be seen as either essential or unimaginable. In simple treatment comparisons, this may not be a problem: those who find the intervention unacceptable will simply not agree to participate. However, many trials involve comparisons between types of treatment. For instance, psychotherapy may be compared to a drug (with appropriate placebos in place). Psychotherapy may be acceptable to some patients who do not wish to risk being exposed to a drug. Others may not want to risk missing out on drug treatment. What type of patients ultimately volunteer for such a trial and how representative are they of patients for whom the treatments being trialled might be suitable? Who knows!

Intention to be treated becomes even more complex when it changes during the course of a trial. This is manifest as non-compliance or withdrawal and will be discussed later.

There is no simple or single answer to all the issues surrounding recruitment to trials. However, poor rates of participation appear almost universal. We need better ways of recruiting participants into trials, not just better reporting. Greater literacy and understanding in patients and the community of the role of trials in psychiatry (and medical research in general) would promote increased and more informed involvement in trials. An open, transparent national internet recruitment portal for trials in mental health would have the potential to reach people currently unlikely to participate. Such a portal, together with other supporting activities, would increase literacy about trials in the general population and willingness to become involved in them.

The trial

Most trials in psychiatry are of relatively simple design. This is usually appropriate: the nature of most non-pharmacological treatments precludes designs like crossover trials, and advances like adaptive allocation and premature stopping are useful when outcomes are likely to be clear. However, even within the constraints of simple designs, there are important matters to consider.

Phases

The classification of pharmaceutical trials as Phase 1, 2 or 3 (and sometimes 4) is widely recognised and reflects the need to demonstrate that a novel molecule is not toxic and is tolerable (Phase 1); investigate whether it offers any benefits and, if so, at what doses (Phase 2); before mounting a large-scale trial of its effectiveness in clinical settings (Phase 3). Phases are less clearly defined for non-pharmacological interventions but are equally important. Many newly developed psychotherapeutic and behavioural interventions have, as their first trial, a self-described ‘pragmatically oriented Phase 3 RCT’. When they are undertaken, Phase 2 trials are often simply underpowered Phase 3 trials. Better understanding of the mechanisms of action of non-pharmacological treatments can only increase their potential benefits, by refinements to effective components, removal of ineffective elements and enhancement of their accessibility and acceptability. Given that these therapies are often resource intensive, optimising them should have a high priority.

There is a need for more early-phase, developmental trials of interventions in psychiatry. One suspects that the paucity of such research does not reflect lack of enthusiasm on the part of clinicians and researchers but the priorities of funding bodies who prefer later stage work which will purportedly yield results that are ready to translate into practice. It would be helpful to have a better organised system of classifying phases of non-pharma trials.

Power and precision

Much effort is applied to these statistical fantasies required by funders and ethics committees. Forecasted effect sizes are frequently aspirational rather than realistic. Researchers seem prepared to launch major studies even when the odds of concluding that a genuinely beneficial therapy is effective are as low as 4:1 (which is what 80% power corresponds to). It is not uncommon for trials to have multiple outcomes with none identified as the primary outcome or, nearly as bad, where all are treated as primary. The inflating effect on Type I error in these situations would surprise many readers and even the researchers involved.
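The arithmetic behind these two claims is easy to check. The sketch below converts power to odds and computes the chance of at least one false-positive result when several outcomes are each tested at the 5% level. It assumes the outcomes are independent, which is the worst case; correlated outcomes inflate the error rate less.

```python
def power_to_odds(power):
    """Odds (success : failure) of detecting a genuinely effective treatment."""
    return power / (1 - power)


def familywise_error(k, alpha=0.05):
    """Chance of at least one false positive among k independent tests."""
    return 1 - (1 - alpha) ** k


print(f"80% power -> odds of {power_to_odds(0.80):.0f}:1")
for k in (1, 3, 5, 10):
    print(f"{k:2d} primary outcomes -> Type I error {familywise_error(k):.2f}")
```

With five independent outcomes each treated as primary, the chance of at least one spurious ‘significant’ finding is about 23%; with ten it is about 40%.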

Even when there are good grounds to expect a large effect, the modest sample justified by power analysis is bound to yield estimates of effectiveness of very low precision: confidence intervals will range from just above zero to unrealistically large. Even if the outcome is statistically significant, such a trial is relatively uninformative. In contrast, precise estimates are informative even if the outcome is not statistically significant.
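A rough calculation shows how wide such intervals are. The sketch below takes a two-arm comparison of means, finds the sample size per arm that a conventional power analysis would justify, and then computes the approximate half-width of the 95% confidence interval for the standardised effect. It relies on a normal approximation that ignores uncertainty in the estimated standard deviation, and the effect sizes of 0.8 and 0.5 are illustrative assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def n_per_arm(effect_size, power=0.80, alpha=0.05):
    """Participants per arm for a two-sided test (normal approximation)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


def ci_half_width(n, alpha=0.05):
    """Approximate CI half-width for a standardised mean difference."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return z_alpha * sqrt(2 / n)


for d in (0.8, 0.5):  # hypothetical anticipated effect sizes
    n = n_per_arm(d)
    hw = ci_half_width(n)
    print(f"d = {d}: n per arm = {n}, "
          f"95% CI if observed as anticipated: {d - hw:.2f} to {d + hw:.2f}")
```

In this approximation the half-width is about 70% of the anticipated effect whatever its size (the ratio 1.96/(1.96 + 0.84)). A trial with 80% power that observes exactly the effect it was designed to detect therefore yields a 95% interval running from roughly 0.3 to 1.7 times that effect.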

Type II error is a menace in the development of new treatments: it risks condemning new treatments prematurely to the waste bin. Researchers may be optimistic about their newly developed intervention but must be realistic about planning its trial. Where a trial has multiple outcomes, significance tests and power analyses must allow for them. Precision of the estimates of effect that a trial will yield should be considered as being equally as important as power.

Pre-test versus post-test is no test

Participants embark on a potentially life changing event – an experimental treatment – that offers the possibility of alleviating their psychiatric disorder. It is proposed to ascertain their status before treatment and after it has concluded, by which time 20–30% of those who entered treatment will no longer be available and the loss will almost certainly be non-random. This is the pre-test/post-test design. It is a self-evidently inadequate approach to collecting data, yet it dominates non-pharmacological trials. The reason for this may lie, in part, in the conceptualisation of many forms of psychotherapy as ‘packages’ where little point is seen in assessing outcomes before completion. However, the result is the collection of the most minimal information about response to treatment. Problems are exacerbated when there is dropout. Even when the mechanism of missingness is benign, knowing only the status of participants before treatment means the chances of valid estimation of effectiveness are substantially reduced.
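A small simulation illustrates why non-random loss matters in a pre-test/post-test design. The sketch below is hypothetical throughout: improvement scores are drawn from a normal distribution with mean 5 and standard deviation 10, and participants who deteriorate are assumed to drop out with probability 0.50 against 0.15 for everyone else. These values were chosen only to give an overall loss within the 20–30% range mentioned above.

```python
import random
from statistics import mean


def simulate_completers(n=10_000, seed=1):
    """Compare the true mean change with the mean among completers only."""
    rng = random.Random(seed)
    all_changes, observed = [], []
    for _ in range(n):
        change = rng.gauss(5, 10)  # true pre-to-post improvement
        # Hypothetical dropout mechanism: those who worsen leave more often.
        p_dropout = 0.50 if change < 0 else 0.15
        all_changes.append(change)
        if rng.random() >= p_dropout:
            observed.append(change)  # post-test obtained for this participant
    dropout_rate = 1 - len(observed) / n
    return mean(all_changes), mean(observed), dropout_rate


true_mean, completer_mean, dropout_rate = simulate_completers()
print(f"dropout rate:            {dropout_rate:.2f}")
print(f"true mean improvement:   {true_mean:.2f}")
print(f"completers-only mean:    {completer_mean:.2f}")
```

Because the post-test is the only follow-up measurement, nothing in the data reveals the status of those who left, and the completers-only mean overstates the benefit. Intermediate assessments would at least record participants' trajectories up to the point of withdrawal.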

Except where it is absolutely unavoidable, the pre-test/post-test design should be consigned to history. Participants should be assessed during the course of a trial: even one or two intermediate assessments are worthwhile. Where resources limit the number of occasions of measurement, consideration should be given to timing these to coincide with stages of greatest change or when risk of withdrawal is highest. Assessment schedules should be chosen to maximise participant acceptability and to avoid precipitating otherwise avoidable dropout.

The randomness of randomisation

Random assignment of participants to treatment arms guarantees unbiased allocation with regard to factors that may influence outcomes, regardless of whether they are known or not. It does so in the long run and for large samples. But an individual trial is not the ‘long run’ and, unfortunately, too few trials are large. Alternatives to random assignment exist: minimisation is occasionally used in small studies. Most larger RCTs rely on stratified randomisation to ensure balance on prognostic variables. This allows only a limited number of variables to be included, and there seems to be a routine ‘menu’ of stratification variables considered regardless of their actual prognostic status. This may reflect the use of stratification to ensure representativeness of participants in each arm of the trial rather than any attempt at improving efficiency. This unofficial role may well be a reasonable one. Few people would take much notice of the outcomes of a trial if bad luck in randomisation meant that young males were predominant in one group with older females prevalent in the other, even if age and gender were not related to outcome.

It would be a brave person who deviated from the path laid down by Hill in defining the R in RCT (Medical Research Council, 1948). Nevertheless, alternatives to conventional randomisation should be actively considered when small samples are involved. This will almost always be the case in cluster randomised trials. Even in larger trials, methods of assignment should be constrained to ensure balance between arms, and generalisability should be explored.
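As an illustration of one alternative, the sketch below implements a simple marginal-totals form of minimisation. Each new participant is assigned, with high probability, to the arm that currently holds the fewest earlier participants sharing their characteristics. The function name, the factor names and the 0.8 probability of following the preferred arm are assumptions made for the example; this is not a validated allocation system.

```python
import random
from collections import defaultdict


def minimise(participants, factors, arms=("A", "B"), p_best=0.8, seed=0):
    """Allocate participants sequentially by marginal-totals minimisation."""
    rng = random.Random(seed)
    # counts[arm][(factor, level)] = participants so far with that level
    counts = {arm: defaultdict(int) for arm in arms}
    allocation = []
    for person in participants:
        # Score each arm by how many earlier participants share this
        # person's level on each factor; lower means better balance.
        scores = {arm: sum(counts[arm][(f, person[f])] for f in factors)
                  for arm in arms}
        lowest = min(scores.values())
        best = [arm for arm in arms if scores[arm] == lowest]
        if len(best) == len(arms) or rng.random() < p_best:
            arm = rng.choice(best)  # preferred arm (random among ties)
        else:
            arm = rng.choice([a for a in arms if a not in best])
        for f in factors:
            counts[arm][(f, person[f])] += 1
        allocation.append(arm)
    return allocation, counts


# Hypothetical small trial: 30 participants, two prognostic factors.
rng = random.Random(42)
participants = [{"sex": rng.choice("MF"),
                 "age_group": rng.choice(["<40", "40+"])}
                for _ in range(30)]
allocation, counts = minimise(participants, factors=("sex", "age_group"))
for arm in ("A", "B"):
    print(arm, allocation.count(arm), dict(counts[arm]))
```

Retaining a random element (p_best below 1) keeps the next allocation from being fully predictable. Setting p_best to 1 gives deterministic minimisation apart from tie-breaking.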

The treatment and the patient versus ‘the treatment and patient’

The purest trial would be one in which each participant, in an identical state, is exposed to each of the interventions being evaluated. Analysis would ‘condition out’ participant effects, leaving the effect of the interventions alone apparent. This is largely statistical sophistry but underlines that trials are largely about the intervention under investigation (We want to be able to say ‘It works’.), while clinical practice is about patients (We want to be able to say ‘They’re better’.).

Personalised medicine is the new frontier of medical treatment. It largely relies on genotyping. Applications are emerging in drug treatments in psychiatry, notably regarding dosing according to patients’ genetically determined ability to metabolise particular drugs. The potential for non-drug treatments to exploit these developments is less clear, although one very basic role may be identifying patients likely to respond to current pharmaceuticals so that other treatment can be offered to non-responders. Regardless of the promise of genes, trials aimed at determining the type of treatment that will work (best) for individual patients should be a high priority.

There is great variability in response to all forms of psychiatric treatment. The search for predictors of response has been limited, but patient attributes at presentation with strong enough predictive power seem unlikely to emerge. The path of patients through treatment is most likely to be dictated by their response or non-response to what is offered to them. This dictates that non-pharmacological trials marrying treatment and patient are likely to be staged or stepped sequences of treatments. Due to the inherent diversity of paths taken by participants through such trials and the non-random subgroups formed, mounting, analysing and interpreting them is extremely demanding in terms of design, trial conduct and participant numbers required. A particular danger is that an envelope of interventions that is judged to be effective overall may include components of no value to patients.

Out of the trial

One of the most critical events in the course of a trial occurs when a participant discontinues their assigned treatment (non-compliance) or the trial itself (dropout/withdrawal). Trials in psychiatry routinely lose one-fifth to one-third of those who commence (see, for example, Hans and Hiller, 2013). This compromises any conclusions drawn from the study.

Withdrawal as an event

Discontinuation of treatment is a routine part of medical practice. It may occur due to response, non-response, side effects and a range of other factors that may or may not be related to treatment or to the person being treated. Discontinuation in trials is either ignored or treated as a nuisance despite its high prevalence. Abandoning the allocated treatment or the trial itself has the potential to substantially distort the outcomes of a trial, either in favour of or against an intervention. This applies particularly when discontinuation occurs at different rates and for different types of participants in each arm.

Where a substantial rate of discontinuation of trial treatments is likely, this should be allowed for in the trial design and, most likely, be considered as an outcome. Discontinuation of treatment should be separable from discontinuing involvement with the trial itself. The former is relevant to the interventions and to clinical practice. The latter applies principally to the unique circumstances of a trial. The reason for discontinuation and the participant’s status on outcome variables when it occurs should be established. Methods for the joint analysis of events such as discontinuation and other trial outcomes are emerging and warrant wider consideration.

Missing data

Modern statistical approaches to missing data are rocket science or, at least, are derived from astrophysics. While this may have helped astronauts return from space, nothing can bring back missing observations that were never actually collected. The best that can be done is to make optimal use of all data available. Anything that goes beyond this must add conjecture – often sanctified as statistical assumptions – to that data. Speculations that participants who drop out did not improve, are like those who remained, or showed any other pattern of change are just that – speculations.

The availability of sophisticated statistical procedures to deal with missing data often improves estimates of outcomes from trials and may reduce bias. However, reliance on them risks diverting effort and attention from the much more important pursuit of minimising the occurrence of missing data in the first place. Trials should be designed with maximising retention in mind. This may involve ensuring that participants really do have the intention of being treated and that intervention and assessment protocols are not unduly burdensome. Speculatively, one notes that during the course of a trial, participants willingly enter into binding contracts for mobile phones, gym memberships and wedlock. Is it too much to ask that, whatever their engagement with the intervention and course through a trial, they commit to providing data for its primary outcomes? And that they are appropriately compensated for their commitment?

After the trial

Is the analysis ITT?

Only the bravest researcher would seek funding for a trial without asserting that analyses will be undertaken on an ITT basis. As usually implemented, ITT estimates conflate two concepts. The first is analysis ‘as randomised’: almost always a good idea. The second reflects a pragmatic (real-life) orientation of a trial: sometimes a good idea. Real ITT estimates of the effectiveness of a treatment can be obtained only when the entire trial is appropriately designed and run: participants must be a sample of patients one would intend to treat, treatment must be delivered as it would be in routine practice and the comparator must be a real alternative. Statistical analysis alone does not make a trial or its outcomes ITT. Other estimates of effect can address important questions about a new treatment. For instance, it can be informative to estimate the effect of an intervention in participants who have received it in full or who have, at least, been exposed to a reasonable part of it. Note that doing this is not necessarily straightforward: comparing those individuals who complete the new treatment with those who complete the comparator risks confounding treatment effects and subsample differences. Nevertheless, methods do exist to make such estimates.

From a purely statistical perspective, many valid estimates of the effect of an intervention can be derived from a trial. Analysing outcomes ‘as randomised’ is almost always essential but it may be as important to address other questions. These must be carefully framed and implemented.

The trail forward

Transparency and wastage

Registration of trials, publication of protocols, adherence to ICH guidelines and guidelines for protocol and trial reporting have greatly improved the dissemination of information about trials, but further improvements are necessary. Registration does not guarantee that outcomes will be reported accurately and in a timely manner, if at all. The results of many registered trials are never published. Many believe that this reflects publication bias. This is almost certainly partially true: it seems easier to get small- to moderate-sized trials published when their outcomes are positive than if they are negative or inconclusive. However, researchers themselves are also likely to be at fault. Enthusiasm often wanes when a trial does not produce a positive outcome, and there is always the next big idea and next grant to pursue. When results are published, the time from conclusion of the trial to submission of an academic paper, and then to its ultimate publication, is often lengthy. In an era when amateurs can disseminate their views almost instantly to millions of people, this is surely unacceptable. And what of the results themselves? We rely heavily on the assurance of the authors of papers that data have been appropriately analysed and reported. ‘Trust me, I’m a doctor’ may still carry some weight, but ‘Trust me, I’m a statistician’ never has! Reflecting the meagre statistical workforce in Australia, many trials are analysed by researchers with limited statistical knowledge. This does not imply that results are necessarily unreliable but reinforces as highly desirable the ability for external researchers to access trial data in order to confirm reported findings, extend findings or resolve other matters.

Efforts to ensure that outcomes of all registered trials become accessible in a timely manner are underway on a number of fronts. So too are initiatives by funding and governance bodies, and by journal publishers, aimed at making data from trials accessible (see especially World Health Organization, 2015). The focus of many initiatives is drug trials but, eventually, all trials will be covered. This is something to applaud, not to fear, but everyone who works in trials must be prepared for fuller scrutiny of their work and for any repercussions of failure to communicate the outcomes of work proposed and undertaken.

Networks

It should be apparent from many of the issues raised above that answering key questions regarding new treatments in psychiatry with sufficient reliability and precision will almost invariably require larger samples than are available to individual research teams. Trials initiated by a network of researchers and clinicians can achieve this goal, with the added benefit of including the full spectrum of expertise required to plan and execute a trial. This goes well beyond the strategy of pharmaceutical companies that recruit multiple sites but where site participation is largely limited to recruitment. It might be hoped that networks of established researchers would engender the confidence of funding bodies to support the high cost of good trials.

In contrast to other branches of medicine, the formation of trial networks in psychiatry in Australasia is almost unknown. There are no obvious impediments to this; local researchers form smaller collaborations and participate in foreign networks.

Major psychiatry trials need to be undertaken within networks. Professional, research and funding bodies should encourage their formation. Barriers to the development of networks should be identified and addressed.

Prospective meta-analysis

This may sound like an oxymoron, but many – perhaps even most – good trials will end up in a meta-analysis. So why not plan for this from the start? Governance and reporting guidelines certainly help, but meta-analysts must still attempt to aggregate what are inconsistent and incompatible data sources. As an antidote to this, imagine a ‘Trial-in-a-box’. It comprises everything necessary to mount a trial: libraries of measurement instruments, training videos for those delivering particular interventions, documentation regarding governance and quality control, a ready-to-install extensible database/trial management system or Internet access to one, and scripts for data checking and analysis. Such a system would allow researchers to mount a trial locally with very modest resources. They might retain only core measures or add others, and side studies could be accommodated. The local group would ‘own’ their study, but contribute data to combined analyses.

Much of the infrastructure and many of the components of Trial-in-a-box already exist, but in an environment where the most highly regarded funding is competitive rather than cooperative, ‘crowd sourced’ trials may require the creation of an antipodean National Institute of Mental Health. Even partial adoption of standardised trial components, such as consensus outcome measures, would dramatically enhance the ability to aggregate trial outcomes and reach conclusions about treatment effectiveness faster and more confidently.

Conclusion

None of the above should detract from the outstanding trials undertaken in psychiatry. The conduct of trials is as evolutionary as every other field of research. However, we must never lose sight of why trials are conducted: they are not ends in themselves but are essentially filters, separating out interventions with genuine benefits from those that seemed like a good idea at the time. Current treatments offer benefits to many patients, but we have no cures, and many needs are left unmet. There is a desperate need for innovative and creative approaches to the treatment and prevention of mental illness. The pursuit of methodological rigour in trials must not be at the expense of the development of new treatments.

Footnotes

Acknowledgements

The author would like to thank Susy Harrigan for helpful comments on an earlier draft of this manuscript.

Declaration of interest

The author reports no conflicts of interest. The author alone is responsible for the content and writing of the paper.

Funding

This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

References

Hans and Hiller (2013) Effectiveness of and dropout from outpatient cognitive behavioral therapy for adult unipolar depression: A meta-analysis of nonrandomized effectiveness studies. Journal of Consulting and Clinical Psychology 81: 75–88.

International Conference on Harmonisation (1996) ICH harmonised tripartite guideline. Guideline for good clinical practice E6(R1&R2). Available at: http://www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Efficacy/E6/E6_R1_Guideline.pdf

Medical Research Council (1948) Streptomycin treatment of pulmonary tuberculosis. A Medical Research Council investigation. British Medical Journal 2: 769–782.

Schulz, Altman and Moher (2010) CONSORT 2010 Statement: Updated guidelines for reporting parallel group randomised trials. BMC Medicine 8: 18.

World Health Organization (2015) WHO statement on public disclosure of clinical trial results. Available at: www.who.int/ictrp/results/reporting/en/