Sage Journals: Discover world-class research

Abstract

The randomised controlled trial of antipsychotic reduction and discontinuation addresses essential questions with regard to continued use of antipsychotics in schizophrenia, pertaining to social functioning and continued antipsychotic use. However, significant methodological issues stated in the trial protocol have the potential to confound interpretation of any findings. These include use of a non-blinded outcome measure, treatment as usual comparator and possible sample size issues.

Keywords

Antipsychotics randomised controlled trial schizophrenia

We read with interest the trial protocol for Research into Antipsychotic Discontinuation and Reduction (RADAR), a randomised controlled trial (RCT) of gradual antipsychotic reduction and discontinuation (Moncrieff et al., 2019), and look forward to publication of its findings.

In our opinion, two design issues may affect interpretation of trial results,

Use of treatment as usual comparator, non-blinded primary outcome

The study design is an open-label RCT, with treatment as usual (TAU) as comparator, the primary outcome being social functioning, measured by the social functioning scale (SFS) (Birchwood et al., 1990). This asks questions of the participant, with no inference made by the investigator. Participants are randomised to either antipsychotic reduction/discontinuation or TAU (maintenance treatment). In the intervention arm they appear to see a study psychiatrist, who one presumes may have contact with their treating team.

A non-blinded primary outcome measure (the SFS) is used despite the fact that clinician-assessed (and therefore potentially blinded) scales such as the Personal and Social Performance are available and used in clinical trials of antipsychotic medication (Kahn et al., 2018). Although the assessor may be blinded, the SFS score itself will be non-blinded as it relies wholly on subjective report of the participant, who will know if they are in the intervention arm or not.

In RADAR the intervention arm sees their treating team every 2 months and will therefore receive more input than TAU.

Three sources of bias (placebo effect, demand characteristics and investigator allegiance) could thus affect SFS scores. Unlike regression to the mean and random variation, the placebo effect is a non-specific psychological response to an intervention, with its own mechanisms. It is invoked by stimuli such as differential attention and expectation, good quality relationships and can generate statistically significant differences (Leucht et al., 2017; Wager and Atlas, 2015).

As participants are aware of what the researchers are investigating, what their intended findings are and how participants are expected to behave (McCambridge et al., 2012; Orne, 2017), they may modify their behaviour, in an attempt to be “good subjects”, as recognised in the Psychology literature for 60 years. This could be exacerbated by allegiance bias, where the study investigators might have a potential interest in the intervention.

Therefore, the study design makes it impossible to tell if any change in SFS is due to intervention (antipsychotic discontinuation or reduction), or non-blinded outcome measure and potential effects on the intervention arm, which have not been controlled for.

Whilst it may have been difficult to incorporate the ideal psychological placebo (or antipsychotic placebo), contacts with the community mental health team could have been matched, as could contact with the study psychiatrist. Moreover, there may be potential nocebo effects within the TAU group.

Power calculations

Adequate power is essential for interpretation of study results. As well as theoretical issues (Button et al., 2013) practical studies have shown low power exaggerates effects of bias (Kossmeier et al., 2020). As written, we could not understand which power calculation is informing the sample size calculation, specifically whether the authors intend both a superiority or non-inferiority analysis, and what assumptions were made. In the protocol they describe a sample size calculation estimating N = 134–206 based on a 4–5-point Minimal Clinically Important Difference (MCID) for SFS but do not make clear how they derived this, though the use of a MCID is sensitive to method choice (Woaye-Hune et al., 2020). The authors reference three studies to indicate sensitivity to change sufficient to identify ‘good and poor outcome’ (Hellvin et al., 2010; Jaracz et al., 2015; Leeson et al., 2012). Only one cited study measured good and poor outcome (Jaracz et al., 2015). The other cited studies do not demonstrate a relevant significant difference in SFS. One was a validation of a Norwegian version of the SFS, with comparison between bipolar disorder, schizophrenia and healthy controls, showing a difference between groups (Hellvin et al., 2010), the other was in people with schizophrenia who used and did not use cannabis, where no difference was found at follow-up (Leeson et al., 2012). They also cite four studies as showing a difference in SFS between antipsychotics (Ciudad et al., 2006; Jang et al., 2011; Koshikawa et al., 2016; Sheridan et al., 2015); only two examined antipsychotics (Ciudad et al., 2006; Koshikawa et al., 2016). They state the scale differentiates those with short and long duration of untreated psychosis, though no difference was seen in total score (Barnes et al., 2008). The authors also appear to have used Mean and Standard Deviation values for sample size calculation from different study populations, with no clear explanation given.

There is a non-inferiority sample size calculation for severe adverse events, reporting N = 402 for a 10% margin. With recruitment completed, their final sample size is N = 253 (University College, London, 2020).

Current European guidance recommends significant adverse events be given priority in power calculations (Anonymous, 2018: 1). The figure of 253 is less than either their own non-inferiority estimate or recommended range (300–600) in the guidance. According to the published protocol, summary review of serious adverse effects will only take place at 3 months. As the intention is to reduce medication every 2 months for around a 12-month period, the ability to detect adverse events before study completion appears limited.

The study design has two unusual features, both of which can affect power. First, comparing two maintenance regimes, one reducing and one not. For both regimes, the treating clinician will be making changes based on the participants’ clinical presentation. This will affect outcome measures in RADAR and imposes a within-participant correlation on RADAR’s repeated measures of progress.

The authors are, quite reasonably, anticipating their postulated improvement in social functioning might be associated with deterioration in symptomatology, as a result of medication reduction. However, and irrespective of clinical significance of these symptoms, studies they cite identify associations between symptom scores and SFS scores. Thus, the study design imposes a negative bias on SFS scores, relative to studies used for their power calculations (none of which involved antipsychotic reduction or withdrawal). They will thus potentially over-estimate differences that RADAR might expect, leading to potential power over-estimation. One study they cite (Hellvin et al., 2010) reports the SFS correlates −0.12 (non-significant) with positive PANSS, and −0.28 for negative PANSS. The explained variance (square of the correlation coefficients) might significantly affect power and sample size needed.

We undertook our own power calculations, to test these theoretical concerns. We estimated power and confidence intervals for their final sample size (253), together with sample size for a power of 0.9 as per their protocol. Despite the odd number of participants, we assumed a 1:1 intervention: control ratio, as this was their original design, and we could not be sure which arm the odd participant was assigned to. We compared uncorrelated and correlated within-participant measurements, using the online package GLIMMPSE, which can model within-subjects correlation using Linear Exponents of type 1 AutoRegressive (LEAR) models, with parameters for both initial correlation and subsequent decay. We examined the treatment effect assuming a superiority design, a 5-point mean difference at 24 months, smoothly increasing, and the authors’ 8.8 SD in SFS, with 0.05 probability of type 1 error. We used both the default LEAR parameters recommended in GLIMMPSE, and parameters assuming the initial correlation would be the same as for repeat reliability of the SFS (0.78) with the maximum decay recommended by the authors of GLIMMPSE as ‘compatible with biological systems’. We believe this covers the plausible range of within-subject correlations and decay over study duration. We considered unadjusted differences, and notional 10% shrinkage in mean difference to account for the overlap with worsening symptomatology noted above. We estimated confidence intervals for power based on their 5-point difference. We tabulate our results below (Table 1). The first column gives relevant power for RADAR’s sample size, the second and third columns give values of lower (0.025) and upper (0.975) bounds of the values’ 95% confidence intervals. The fourth column (assuming equal group sizes) gives group sample size necessary to achieve 0.9 power under different LEAR parameterisations, and the last two columns give postulated time-dependent intercorrelations in within-subject scores and their decay.

Table 1.

Sample size calculation.

Power for RADAR sample size	0.025 Confidence Interval	0.975 Confidence Interval	Group sample size to achieve 0.9 power	LEAR adjustment
Power for RADAR sample size	0.025 Confidence Interval	0.975 Confidence Interval	Group sample size to achieve 0.9 power	Correlation	Decay
0.99	0.66	1	138	0	0
0.75	0.1	1	386	0.5	0.3
0.60	0.05	0.99	538	0.78	0.5
10% Shrinkage in mean difference
0.99	0.34	1	140	0	0
0.74	0.05	1	392	0.5	0.3
0.59	0.05	0.99	546	0.78	0.5

LEAR: Linear Exponents of type 1 AutoRegressive.

The table shows that, while shrinkage had less effect, the uncertainty in their estimated power increases sharply as the LEAR parameters increase, as do the predicted sample sizes, while power decreases. Therefore, the sample size of 253 is possibly insufficient to detect the MCID identified, given the study design. It is well-understood that interpretation of negative findings is impaired in an underpowered trial: here, negative findings could relate to symptomatic deterioration or adverse events. However, lack of power also impairs the interpretation of positive findings. RCTs do not compare probability distributions, but alternative models, whose relative likelihood may be assessed by the Bayes factor (Dienes and Mclatchie, 2018). A criterion for preferring H₁ of p < 0.05 for H₀ translates to a Bayes factor of about three, that is, the model generating H₁ is about three times as likely as that generating H₀. Thus, the chances of being struck by ‘survivor’s curse’ in an underpowered design is much greater than the 1:20 the typical p < 0.05 cutoff suggests.

Conclusion

This is a much-needed trial, though the design (use of essentially a self-report outcome measure, TAU comparator with resultant effects on primary outcome measure) and issues with power makes interpretation of findings difficult.

Footnotes

Declaration of conflicting interests

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: SJ has received honoraria for educational talks given for Janssen and Sunovian, and his employer (King’s College London) has received honoraria for educational talks he has given for Lundbeck. PF-P has received consulting fees from Lundbeck, Angelini, Menarini, and Sunovian, and honoraria for lectures given for Angelini and Menarini. DF declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Sameer Jauhar

References

Anonymous (2018) ICH E1 Population exposure: the extent of population exposure to assess clinical safety, European medicines agency. Available at: https://www.ema.europa.eu/en/ich-e1-population-exposure-extent-population-exposure-assess-clinical-safety (accessed 4 July 2021).

Barnes

TRE

Leeson

Mutsatsa

(2008) Duration of untreated psychosis and social function: 1-year follow-up study of first-episode schizophrenia. Br J Psychiatry 193: 203–209. DOI: 10.1192/bjp.bp.108.049718.

Birchwood

Smith

Cochrane

, et al. (1990) The social functioning scale. The development and validation of a new scale of social adjustment for use in family intervention programmes with schizophrenic patients. Br J Psychiatry: J Ment Sci 157: 853–859. DOI: 10.1192/bjp.157.6.853.

Button

Ioannidis

JPA

Mokrysz

, et al. (2013) Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci 14: 365–376. DOI: 10.1038/nrn3475.

Ciudad

Olivares

Bousoño

, et al. (2006) Improvement in social functioning in outpatients with schizophrenia with prominent negative symptoms treated with olanzapine or risperidone in a 1 year randomized, open-label trial. Prog Neuro-Psychopharmacol Biol Psychiatry 30: 1515–1522. DOI: 10.1016/j.pnpbp.2006.05.010.

Dienes

Mclatchie

(2018) Four reasons to prefer Bayesian analyses over significance testing. Psychon Bull Rev 25: 207–218. DOI: 10.3758/s13423-017-1266-z.

Hellvin

Sundet

Vaskinn

, et al. (2010) Validation of the Norwegian version of the social functioning scale (SFS) for schizophrenia and bipolar disorder. Scand J Psychol 51: 525–533. DOI: 10.1111/j.1467-9450.2010.00839.x

Jang

, et al. (2011) ‘Longitudinal patterns of social functioning and conversion to psychosis in subjects at ultra-high risk’, The Australian and New Zealand Journal of Psychiatry, 45(9), pp. 763–770. Available at: https://doi.org/10.3109/00048674.2011.595684.

Jaracz

Górna

Kiejda

, et al. (2015) Psychosocial functioning in relation to symptomatic remission: a longitudinal study of first episode schizophrenia. Eur Psychiatry 30: 907–913. DOI: 10.1016/j.eurpsy.2015.08.001.

10.

Kahn

Winter van Rossum

Leucht

, et al. (2018) Amisulpride and olanzapine followed by open-label treatment with clozapine in first-episode schizophrenia and schizophreniform disorder (OPTiMiSE): a three-phase switching study. Lancet Psychiatry 5: 797–807. DOI: 10.1016/S2215-0366(18)30252-9.

11.

Koshikawa

Takekita

Kato

, et al. (2016) The comparative effects of risperidone long-acting injection and paliperidone palmitate on social functioning in schizophrenia: a 6-month, open-label, randomized controlled pilot trial. Neuropsychobiology 73: 35–42. DOI: 10.1159/000442209.

12.

Kossmeier

Tran

Voracek

(2020) Power-enhanced funnel plots for meta-analysis. Z Psychol 228: 43–49. DOI: 10.1027/2151-2604/a000392.

13.

Leeson

Harrison

Ron

, et al. (2012) The effect of cannabis use and cognitive reserve on age at onset and psychosis outcomes in first-episode schizophrenia. Schizophr Bull 38: 873–880. DOI: 10.1093/schbul/sbq153.

14.

Leucht

Huhn

, et al. (2017) Sixty years of placebo-controlled antipsychotic drug trials in acute schizophrenia: systematic review, bayesian meta-analysis, and meta-regression of efficacy predictors. Am J Psychiatry 174: 927–942. DOI: 10.1176/appi.ajp.2017.16121358.

15.

McCambridge

de Bruin

Witton

(2012) The effects of demand characteristics on research participant behaviours in non-laboratory settings: a systematic review. PLoS One 7: e39116. DOI: 10.1371/journal.pone.0039116.

16.

Moncrieff

Lewis

Freemantle

, et al. (2019) Randomised controlled trial of gradual antipsychotic reduction and discontinuation in people with schizophrenia and related disorders: the RADAR trial (Research into Antipsychotic Discontinuation and Reduction). BMJ Open 9: e030912. DOI: 10.1136/bmjopen-2019-030912.

17.

Orne

(2017) On the social psychology of the psychological experiment: with particular reference to demand characteristics and their implications. Am Psychologist 17: 776–783.

18.

Sheridan

Drennan

Coughlan

, et al. (2015) Improving social functioning and reducing social isolation and loneliness among people with enduring mental illness: report of a randomised controlled trial of supported socialization. Int J Soc Psychiatry 61: 241–250. DOI: 10.1177/0020764014540150.

19.

University College, London (2020) Research into antipsychotic discontinuation and reduction (RADAR): a randomised controlled trial. Clinical trial registration NCT03559426. clinicaltrials.gov. https://clinicaltrials.gov/ct2/show/NCT03559426 (accessed 1 July 2021).

20.

Wager

Atlas

(2015) The neuroscience of placebo effects: connecting context, learning and health. Nat Rev Neurosci 16: 403–418. DOI: 10.1038/nrn3976.

21.

Woaye-Hune

Hardouin

Lehur

, et al. (2020) Practical issues encountered while determining minimal clinically important difference in patient-reported outcomes. Health Qual of Life Outcomes 18: 156. DOI: 10.1186/s12955-020-01398-w.

Off the RADAR;questions regarding the trial protocol of a randomised controlled trial of antipsychotic reduction and discontinuation