Abstract
Background
The Treatment for Adolescents with Depression Study (TADS) continues to influence international clinical practice guidelines for adolescent depression.
Objective
We assessed TADS’ effectiveness and harms through a Restoring Invisible and Abandoned Trials (RIAT)-led reanalysis.
Methods
TADS was a phase-three multicentre, parallel four-arm randomised controlled superiority trial that randomised adolescents (n = 439) to fluoxetine alone (FLX), CBT alone (CBT), open-label fluoxetine plus CBT (COMB) or placebo pill alone (PBO) over 12 weeks. COMB and CBT groups were unblinded to their treatment allocation. Primary outcomes were the Children’s Depression Rating Scale-Revised (CDRS-R) and Clinical Global Impression-Improvement scale (CGI-I).
Results
Our ITT reanalysis showed a significant decrease in CDRS-R at 12 weeks in COMB compared with PBO (−6.65; p < 0.001) and CBT (−7.63, p < 0.001), but not FLX (−4.07, p = 0.063). There was no difference between FLX, CBT and PBO. There was a higher proportion of CGI-I responders in COMB (73.72% [SE 4.43]) compared with all other arms: FLX (64.16% [SE 4.76]), CBT (51.06% [SE 5.29]) and PBO (38.76% [SE 4.87]; all p < 0.001); FLX had more responders than PBO (p = 0.004). 369 adverse events were reported for 171 participants, with 66% occurring in those taking fluoxetine. We found 32 serious adverse events (22 in those taking fluoxetine), which varied from original authors’ reporting.
Conclusions
Our reanalysis replicated the original investigators’ reporting that COMB demonstrated the most robust outcomes and that FLX was not superior to PBO. In contrast to the original TADS Team’s reporting, there was a higher level of harm uncovered in allocation groups taking fluoxetine, including 11 unreported suicide-related adverse events. Overall, the marked increase in harms among participants taking fluoxetine warrants further circumspection when prescribing it to adolescents.
Introduction
Rates of prescribing of antidepressants to adolescents continue to increase globally,1,2 despite uncertainty about their benefits and harms. 3 Furthermore, some poor-quality research has misled doctors and the public about the effectiveness and safety of antidepressants. 4 The publication arising from the Treatment for Adolescents with Depression Study (TADS; ClinicalTrials.gov Identifier: NCT00006286) is arguably the most influential paper on the use of selective serotonin reuptake inhibitors (SSRIs) for adolescents diagnosed with MDD, being cited over 1900 times to date. 5 Although superiority was only demonstrated for TADS’ unblinded COMB arm, the publication was seminal in shaping guidelines and establishing fluoxetine internationally as the mainstay ‘evidence-based’ treatment for adolescent MDD.
TADS was a multicentre RCT ‘examining the comparative effectiveness of established treatments for adolescents with major depressive disorder (MDD)’ 6 (p. 6) conducted by the Duke Clinical Research Institute. It was a response to a National Institute of Mental Health (NIMH) Request for Proposals (NIMH-98-DS-0008) that set out questions about the long-term effectiveness of pharmacological treatment and specific psychotherapy in adolescents. TADS’ protocol stated the study would contrast ‘the degree and durability of improvement obtained across four treatment strategies’: fluoxetine alone (FLX), MDD-specific cognitive-behavioural therapy (CBT), fluoxetine plus CBT (COMB) and a single control condition, placebo pill (PBO) 7 (p. 6). The protocol specified primary effectiveness outcomes as improvements on depression rating scales, with the number of adverse events (AEs) specified as a secondary outcome of the study. 6 TADS’ protocol argued a multicentre RCT was necessary to replicate and extend generalisability of findings from previous studies that found CBT8–12 and FLX13–15 were both independently efficacious in treating depression in children and adolescents.
TADS’ methodology has been criticised in more detail elsewhere for not including a comparison of CBT-plus-fluoxetine versus CBT-plus-placebo, 16 and because two of the study arms were unblinded.17,18 Previously published TADS analyses had several flaws, including the approach used to deal with missing data and the lack of correction for multiple comparisons, a correction that is appropriate when the treatment arms being compared are related. 19 Hence, we aimed to determine the effectiveness of each treatment when these and other potential flaws were addressed.
The Restoring Invisible and Abandoned Trials (RIAT) initiative, established in 2013, provides an important mechanism for accountability in clinical trials. 20 RIAT’s systematic methodology addresses selective and misleading outcome reporting in randomised controlled trials (RCTs) through rigorous reanalysis of trial data. RIAT have established strict requirements for restoring a trial, 20 including that data should be reanalysed and reported according to the original protocol to enable an objective review of findings previously reported by trial investigators. The overall objective of our reanalysis was to accurately report protocol-specified primary and secondary effectiveness outcomes and harms in TADS to clarify whether TADS justifies the widespread use of antidepressants prescribed without CBT for adolescents with a diagnosis of MDD.
This paper reports on our reanalysis of acute phase outcomes from TADS, which were originally reported by March et al. 5 We issued a call to action in May 2018 to provide the original TADS Team investigators an opportunity to address any misleading reporting or undertake corrective measures. 21 There have been no responses from the investigators published to date.
Methods
We used TADS data from the first stage of the trial that is reposited on the National Database for Autism Research (NDAR; https://nda.nih.gov/about.html; Collection ID: 2145). 22 We were unable to obtain a clinical study report or individual study case report forms (see commentary 23 ).
Our RIAT reanalysis was informed by the original TADS protocol and its revisions.6,24 Where analytic methods were not clearly specified, we used standard approaches deemed most statistically and clinically robust. We used an RIAT audit record (RIATAR) to document original study reporting (Appendices 1 and 2). We acknowledge the rigour of the RIAT process compromises the readability of this manuscript and have reported fine details and replicated or secondary findings in Appendices.
Trial design
TADS was a phase-three, parallel four-arm superiority trial consisting of treatment arms randomised to FLX, CBT, COMB or PBO, in a 1:1:1:1 allocation ratio.
Participants
TADS Team investigators enrolled 439 adolescents aged 12–17 years who met DSM-IV criteria for MDD between ‘spring 2000 and summer 2003’ across 13 US sites (ClinicalTrials.gov Identifier: NCT00006286). 25
Participants could be screened over a maximum period of 12 weeks, with stability and pervasiveness of their MDD determined by clinical interview. 24 Interviews required the participant and a primary caretaker (‘parent’) to report on dysphoria and/or anhedonia, and on the presence of these symptoms ‘in at least two of three contexts: at home, at school or with friends’ 24 (p. 25). We could not determine complete characteristics of all adolescents screened for TADS from the dataset. Protocol-specified inclusion and exclusion criteria for TADS are summarised in Appendix 3, Table S1.
Patient and public involvement
There was no indication that participants or their parents were involved in trial design; such non-involvement was common practice at the time.
Interventions
Participants were randomised into one of four treatment arms, with the acute phase of TADS lasting 12 weeks. Study drugs were provided to FLX, COMB and PBO participants during medication visits administered by a study ‘pharmacotherapist’ on six occasions over 12 weeks. Bottles of fluoxetine (Prozac) and matching inactive placebo pills were supplied as 10 mg doses by the study co-sponsor Lilly Inc. 5 (p. 819). Participants were dispensed bottles containing enough 10 mg pills for 1 week 6 (p. 40), which ranged between 7 and 28 pills per bottle (up to a maximum dose of 40 mg per day) 26 (p. 17).
Dosing began at 10 mg for the first week (one pill daily) and increased to 20 mg (two pills daily) during week 2 ‘assuming no limiting side effects’ 26 (p. 14). From week 4, the pharmacotherapist increased doses in 10 or 20 mg increments (up to a maximum of 60 mg by week 12) based on participants’ CGI-Severity scores (CGI-S) 6 (p. 88) 26 (pp. 9-10). Participant treatment compliance was monitored by their parents and pharmacotherapists 26 (p. 7).
CBT aimed to build ‘social cognitive skills for coping with stress’ and provide psychosocial education to participants’ parents 6 (pp. 42-3). Participants in COMB and CBT arms were offered a total of 14 hours of CBT sessions over 12 weeks 6 (p. 43).
Sample size
The TADS protocol only reported power calculations for the CGI-I response (i.e. a score of 1 or 2) 6 (p. 19). It did not account for any loss to follow-up or protocol non-compliance.
Randomisation
Allocation sequences were generated by a ‘randomisation phone number’ service, which used a ‘computerized randomization algorithm’ to allocate treatment assignment 6 (p. 64). Separate randomisation schedules were created within each site, stratified by gender using initial random block sizes of four, and subsequently, four or eight 5 (p. 808). Participants were allocated to either FLX (n = 109), CBT (n = 111), COMB (n = 107) or PBO (n = 112).
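The allocation procedure described above can be illustrated with a minimal sketch of stratified permuted-block randomisation. This is not the TADS algorithm, which was not available to us: the seed, the site labels, the stratum size and the rule for choosing between later block sizes are all assumptions made for illustration.

```python
import random

ARMS = ["FLX", "CBT", "COMB", "PBO"]


def permuted_block(block_size, rng):
    """Return one shuffled block in which every arm appears equally often."""
    if block_size % len(ARMS) != 0:
        raise ValueError("block size must be a multiple of the number of arms")
    block = ARMS * (block_size // len(ARMS))
    rng.shuffle(block)
    return block


def stratum_schedule(n_participants, rng, first_block=4, later_blocks=(4, 8)):
    """Build the allocation list for one site-by-gender stratum.

    The first block has size four; later blocks have size four or eight.
    Choosing the later block size at random is an assumption.
    """
    schedule = permuted_block(first_block, rng)
    while len(schedule) < n_participants:
        schedule += permuted_block(rng.choice(later_blocks), rng)
    return schedule[:n_participants]


rng = random.Random(2000)  # hypothetical seed, for reproducibility only
schedules = {
    (site, gender): stratum_schedule(12, rng)
    for site in ("site_01", "site_02")  # hypothetical site labels
    for gender in ("female", "male")
}
```

Within each stratum the first four allocations contain every arm exactly once, which is what keeps the arms balanced within site and gender as recruitment proceeds.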
Blinding
Participants, TADS personnel and pharmacotherapists were blinded to whether drug kits contained fluoxetine or placebo, for pill-only arms (FLX and PBO). Participants in CBT and COMB were aware of their treatments. Independent evaluators (IEs) were blinded to all treatment arms and guessed participant treatment allocation at weeks 6 and 12 post-randomisation. The effect of the TADS blinding protocol on treatment outcome is reported in further detail in another analysis by our TADS-RIAT team. 27 Blinding was reportedly broken for participants who experienced Serious Adverse Events (SAEs) 28 (p. 9). TADS statisticians were not blinded to treatment assignment 6 (p. 65).
Outcomes
TADS outcome measures were evaluated using clinician-rated and self-reported psychometric scales for different measures (Appendix 3, Table S2). Assessment visits were deemed separate from treatment visits, although both could take place on the same day 6 (p. 39). Treatment visits were conducted by a CBT therapist and/or pharmacotherapist who also rated some secondary outcome measures.
IEs conducted blinded assessments and were different from clinicians that determined dosing or administered treatments. Full assessment batteries were administered by the IE at baseline and week 12 (all questionnaires) and minor assessment batteries (including primary outcomes) were conducted at week 6 6 (p. 102). We did not have access to quality control data to re-analyse inter- and intra-rater reliability, which are reported elsewhere by the TADS team (see 5).
Primary outcome variables
Primary efficacy variables specified by the protocol were the IE-rated CDRS-R and CGI-I measures from randomisation across 12 weeks of treatment, recorded at baseline, week 6 and week 12 5 (p. 17).
The CDRS-R was rated by IEs based on participants’ symptoms throughout the prior week. The protocol refers to an official CDRS-R manual for exact instructions on conducting these interviews. 29 A 17-item ‘Best Description’ total score was calculated for the CDRS-R, which was used in the present reanalysis.
The CGI-I is a seven-point, clinician-rated scale determining the ‘MDD-specific severity of illness and degree of improvement since baseline’ 30 (p. 4) that was rated compared to participants’ ‘clinical status’ at baseline 30 (p. 8). According to the protocol, a ‘response’ at 12 weeks was defined as a score ≤2 on the IE-rated CGI-I 6 (p. 19). Although the protocol considered loss to follow-up as a non-response on the CGI-I, our reanalysis either used multiple imputation (MI) for the intention-to-treat (ITT) analysis or excluded such cases for the per protocol (PP) analysis. We used MI because we assumed the missing outcome data were missing at random (MAR) and because treating these cases as treatment non-responders would lead to conservative treatment effect estimates. In such circumstances, the use of MI leads to less biased results. 31
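To make the handling of missing CGI-I ratings concrete, the sketch below contrasts the protocol's rule (loss to follow-up counted as non-response) with a complete-case calculation of the kind used for the PP analysis. The scores are invented for illustration and are not trial data; multiple imputation is not shown here.

```python
def is_responder(cgi_i_score):
    """Protocol definition of response: an IE-rated CGI-I score of 1 or 2."""
    return cgi_i_score <= 2


def rate_missing_as_nonresponse(scores):
    """Protocol rule: a missing week-12 rating (None) counts as non-response."""
    responders = sum(1 for s in scores if s is not None and is_responder(s))
    return responders / len(scores)


def rate_complete_cases(scores):
    """Complete-case rule: participants with a missing rating are excluded."""
    observed = [s for s in scores if s is not None]
    return sum(1 for s in observed if is_responder(s)) / len(observed)


# Invented week-12 CGI-I scores; None marks a missing rating.
example_scores = [1, 2, 3, None, 5, 2, None, 4]
protocol_rate = rate_missing_as_nonresponse(example_scores)  # 3 of 8
complete_case_rate = rate_complete_cases(example_scores)     # 3 of 6
```

The two rules differ only in the denominator, so counting missing ratings as non-response can only lower the estimated response rate.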
Secondary outcome variables
Pre-specified secondary outcomes and their raters are listed in Appendix 3, Table S3. The protocol also specified outcomes regarding a change over time in ‘non-MDD behavioural/symptomatic and functional outcome domains’ and ‘an unweighted composite score covering all domains of outcome’ 6 (p. 18). As these were unclearly defined, we interpreted ‘non-MDD behavioural/symptomatic and functional’ outcomes as being informed by the ADS mania sub-scale, ADS function sub-scale, K-SADS-PL (for ‘non-MDD’ diagnostic criteria), PESQ and PQLQ, and have reanalysed and reported these.
Attrition and protocol violations
There were two types of attrition recorded in TADS: drop out and premature termination from the trial. Study dropouts were defined as TADS participants who ‘[withdrew] consent following randomisation’ 6 (p. 36) or ‘[withdrew] consent from the assessment portion of the study’ and were therefore ‘not eligible for TADS treatment’ 6 (p. 57). These participants were considered as lost to follow-up 6 (p. 53).
Protocol-defined ‘premature terminators’ were participants whose condition deteriorated or who experienced clinical crisis 6 (p. 56). These participants could continue to receive treatment designated by their randomised treatment arm, switch to another treatment (e.g. discontinue PBO and change to COMB) or discontinue study treatment altogether. Unlike dropouts, they would continue to participate in study assessments.
An Adjunct Services and Attrition Prevention (ASAP) Panel provided ‘a consistent vehicle for sites to handle situations that require additional evaluation and/or intervention beyond that provided for in the study protocol’ 28 (p. 20). The TADS Team previously reported that ASAP use varied by site and inconsistent data collection occurred. 32 We report on the frequency of ASAPs for each treatment arm and used ASAP records in our reanalysis of harms (see below section on harms).
Suicidality monitoring
The manual of procedures for managing suicidality outlines that, at ‘every office and phone visit’, pharmacotherapists and CBT therapists should ‘inquire about suicide risk’, referring to ‘tracked measures’ that occurred ‘on a reasonably frequent basis’ 33 (pp. 3; 25,31). These measures consisted of individual items from scales listed in Appendix 3, Table S3.
The Suicidal Ideation Questionnaire (SiQ-Jr) is a 10-item self-reported scale also administered at baseline and each treatment visit 6 (p. 90). Its change over time is reported using descriptive statistics.
Harm endpoints
Source of data on AEs
The definition of an AE from the AE/ASAP Manual is as follows: ‘any unfavorable medical change that occurs during or after beginning the study that may or may not be related to or caused by study drug or CBT treatments. A medical event is defined as a clinically significant change in physical and/or mental health status’ 28 (p. 4).
Data on AEs in this reanalysis came from the ‘AESAE’ dataset, supplemented with additional data from the ASAP dataset. The TADS Team also indicated whether they thought AEs were related to the drug, but we were only able to report on whether the participant was taking fluoxetine at the time of the event.
Reanalysis of AE data
The protocol stated that AE reporting would be descriptive and did not specify a statistical approach to these data 6 (p. 18), although the TADS Team undertook such analyses in all subsequent publications.
In this reanalysis, all available verbatim terms for events in the ‘AESAE’ dataset were reviewed by JLN, who was blinded to treatment allocation and to the TADS Team’s coding. JLN’s review was audited by NA, JK and JJ, who were not blinded. Some matters were resolved by consensus after JLN’s blind was removed (Appendix 9, Tables S2-4).
The Medical Dictionary for Regulatory Activities (MedDRA) is endorsed by the FDA and was used to code verbatim terms (see https://www.meddra.org/). JLN is trained in the use of MedDRA and has previous experience of RIAT reanalysis.34,35 Verbatim terms in the TADS AESAE dataset were allocated a MedDRA ‘preferred term’ (PT) grouped according to a primary ‘system organ classification’ (SOC). In some cases, MedDRA also provides a potential secondary SOC, which may be more appropriate depending on the clinical history of the particular AE. The choice of classification for some of these preferred terms can make a considerable difference to the overall AE profile of any given drug (see 34). For example, the PT ‘insomnia’ is allocated under the primary SOC ‘psychiatric’, but it could also be classed as ‘nervous system’. In most cases, the primary SOC was used. However, PTs falling under the primary SOCs of ‘infection’ and ‘injury’ were allocated to their secondary SOC rather than the more generic primary SOC. For example, ‘influenza’ has ‘infection’ as its primary SOC and ‘respiratory’ as its secondary SOC.
The term ‘suicidal event’ was defined by the TADS Team as ‘discrete episodes of suicidal ideation, suicidal attempts or preparatory acts towards an imminent attempt’ 36 (p. 743). March et al. stated ‘suicide-related events’ consisted of ‘either worsening suicidal ideation or make a suicide attempt, or both’ 5 (p. 810). Our RIAT analysis therefore presented any events labelled as suicidal ideation, suicidal gestures or suicide attempts in the dataset. Self-injury without suicidal intent was not classified as a ‘suicidal event’.
We used the following definitions to classify suicide-related events:
• ‘Suicidal ideation’ – thoughts, ideas and verbalisation of potential intent to purposefully end one’s own life, with or without a method or plan.
• ‘Suicidal gesture’ – translating thoughts/ideas into behaviours including making preparations and actions that could potentially lead to death whereby likelihood of death is low. The TADS suicide prevention manual noted a suicidal gesture constituted suicidal ideation with a method ‘in hand’ without action being taken, ‘c.f. threatened to take pills he/she was holding but didn’t actually do it’.
• ‘Suicide attempt’ – a formal attempt to end one’s life.
According to the protocol, AEs may or may not have initiated ASAPs, which were ‘primarily driven by whether out-of-protocol interventions are required to manage the AE’ 6 (p. 56). Where AEs initiated an ASAP, further details were documented in the ASAP database according to whether it was treatment emergent, initiated panel review and/or any changes in treatment, external treatment or a suicidality assessment (evaluating intent, method of suicide, desire to die, lethality of the incident, hopelessness and aggression).
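The SOC allocation rule described above can be summarised in a short sketch. The function name and the shortened SOC labels are illustrative assumptions and do not reproduce MedDRA's official terminology or hierarchy.

```python
# Primary SOCs treated as too generic; for these the secondary SOC was used.
GENERIC_PRIMARY_SOCS = {"infection", "injury"}


def allocate_soc(primary_soc, secondary_soc=None):
    """Return the SOC used for reporting a preferred term.

    The primary SOC is used unless it is 'infection' or 'injury' and a
    secondary SOC is available, in which case the secondary SOC is used.
    """
    if primary_soc in GENERIC_PRIMARY_SOCS and secondary_soc is not None:
        return secondary_soc
    return primary_soc


# The two examples given in the text.
influenza_soc = allocate_soc("infection", "respiratory")      # secondary SOC
insomnia_soc = allocate_soc("psychiatric", "nervous system")  # primary SOC
```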
AEs reported in the ASAP dataset that were missing from the original AESAE dataset are included in our reporting. The TADS AE/ASAP manual states ‘[e]ither mania or worsening MDD may trigger [an] ASAP; only mania is considered an AE, however’ 28 (p. 6). However, in the TADS dataset, worsening depression was indeed coded as an AE; therefore, ASAPs with worsening depression are also reported in our reanalysis.
We did not have data regarding the timing of administration of concomitant medications (including concomitant SSRIs). Therefore, we classified participants’ treatment at the time of an AE according to their randomised treatment arm unless they were prematurely terminated and another treatment was recorded at a known timepoint. In our reporting, the ‘fluoxetine’ group consists of FLX, COMB and premature terminators prescribed fluoxetine. The ‘no fluoxetine’ group consists of CBT, PBO and premature terminators with treatment recorded as ‘none’.
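The exposure classification described above can be sketched as follows. The function signature, the day-based timing and the choice to apply a switch from the recorded day onwards are assumptions for illustration; the dataset's actual variable names are not reproduced here.

```python
FLUOXETINE_TREATMENTS = {"FLX", "COMB"}
NO_FLUOXETINE_TREATMENTS = {"CBT", "PBO", "none"}


def exposure_group(randomised_arm, event_day, switch_day=None, switched_to=None):
    """Classify an AE as occurring on 'fluoxetine' or 'no fluoxetine'.

    The randomised arm is used unless the participant was prematurely
    terminated and another treatment was recorded at a known timepoint
    on or before the event (an assumed tie-breaking rule).
    """
    treatment = randomised_arm
    if switch_day is not None and switched_to is not None and event_day >= switch_day:
        treatment = switched_to
    if treatment in FLUOXETINE_TREATMENTS:
        return "fluoxetine"
    if treatment in NO_FLUOXETINE_TREATMENTS:
        return "no fluoxetine"
    raise ValueError(f"unrecognised treatment: {treatment!r}")
```

For example, an event on day 50 in a participant randomised to PBO who switched to FLX on day 30 would be counted in the ‘fluoxetine’ group, whereas an event on day 10 in the same participant would not.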
CONSORT guidelines recommend expected and unexpected AEs be reported separately to address the issue of ‘priming’ with medication side effects. 37 TADS participants were informed of potential side effects of fluoxetine during consenting procedures 6 (pp. 97–103) and by their pharmacotherapist 26 (p. 10). However, the AE dataset did not provide enough detail to assess whether a specific event could be associated with a priming effect. For example, the AE term ‘abdominal discomfort’ could either be due to dyspepsia (not an expected effect) or nausea (expected side effect).
The Columbia Rating Scale
Although not pre-specified by the protocol, a post-hoc reclassification of ‘all possibly suicidal events’ 38 (p. 1443) was undertaken by independent experts at the Columbia Suicidality Classification Group. 36 They were commissioned by the FDA to reanalyse ‘narratives for all possible suicidal and aggressive events’ using the Columbia Classification Algorithm of Suicidal Assessment. We were not able to gain access to these narratives. 23 Components of each event recorded as narratives were re-classified as ‘1 = Suicide attempt; 2 = Preparatory suicide behaviour; 3 = Self-injurious behaviour, unknown intent; 5 = Self-injurious behaviour, no suicidal intent; 6 = suicidal ideation; [or] 7 = other, no evidence of deliberate self-injurious behaviour’ 28 (p. 1). TADS publications only reported Columbia scores ‘where a subject had more than one suicide-related event’ and represented them once using the most severe code 38 (p. 1447). We therefore also report the most severe code for each event as recorded by the AE dataset (Appendix 9).
Statistical analysis
Our statistical reanalysis of primary and secondary outcomes was directed by the original TADS protocol 6 (pp. 17-18, 65-7). Baseline demographics, clinical characteristics and outcome measures are reported using descriptive statistics. The mean and standard deviation (SD) are reported for normally distributed continuous variables, and the median and inter-quartile range for non-normally distributed variables. Shapiro–Wilk tests of normality were used to assess the distribution of baseline data. Categorical variables are presented as count (n) and proportion (%).
TADS’ primary hypotheses were that FLX, CBT and COMB would demonstrate improvement on measures of MDD compared with PBO, and this improvement would be greater in COMB compared with FLX or CBT. 6 Although this suggests TADS was a superiority trial of COMB treatment, the protocol included conflicting definitions for their primary study treatment comparisons. We adjusted p-values to account for multiple comparisons between treatment arms, as recommended when treatment arms are related. 19 Adjustments were performed using the Bonferroni adjustment approach since the protocol did not consistently state which of the six possible treatment comparisons were of interest.
The protocol stated statistical modelling would be repeated in three steps for each outcome, using ITT analysis, and analysis with as-observed and with ‘completer’ cases 6 (p. 17). We used multiple imputation of raw total outcome scores to accommodate the ITT analysis. Consistent with the protocol, we defined observed cases (OCs) as those having data available for a specific study visit irrespective of drop out or premature termination. 6 The protocol also defined a population of ‘completers’ which could have data missing at week 12 and/or not complete all study assessments. We therefore used a definition of per protocol (PP) to include participants who did not drop out, were not prematurely terminated and had at least completed a single week-12 assessment.
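The three analysis populations defined above can be expressed as simple membership rules. The record structure and field names below are hypothetical and serve only to make the definitions explicit.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    dropped_out: bool
    prematurely_terminated: bool
    has_week12_assessment: bool


def in_itt(participant):
    """ITT: every randomised participant (missing scores are imputed)."""
    return True


def in_oc(participant, has_data_at_visit):
    """OC: data available at the visit, irrespective of attrition status."""
    return has_data_at_visit


def in_pp(participant):
    """PP: no drop out, no premature termination, and a week-12 assessment."""
    return (
        not participant.dropped_out
        and not participant.prematurely_terminated
        and participant.has_week12_assessment
    )


# A premature terminator who still attended the week-12 assessment.
example = Participant(dropped_out=False, prematurely_terminated=True,
                      has_week12_assessment=True)
memberships = (in_itt(example), in_oc(example, True), in_pp(example))
```

In this example the participant contributes to the ITT and OC analyses at week 12 but not to the PP analysis.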
All analyses were performed using Stata version 16.0 (StataCorp, USA). Complete statistical syntax for this reanalysis and examples are reproduced in Appendices 4 and 5, and imputation techniques are provided in Appendix 6. Our methodology and results were audited by an external statistician (Appendix 7).
Reanalysis of continuous variables
The primary outcome, CDRS-R, and secondary outcomes (ADS depression, ADS function, ADS mania, CGAS, HONOSCA, PESQ, RADS and PQLQ total scores) were analysed using ITT with multiple imputation used to impute missing scores at any of three time-points (baseline, week 6 and week 12). 39 The imputation model used Multivariate Imputation by Chained Equations (MICE) to impute missing CDRS-R outcome and CDRS-R baseline values with randomised arm, time, clinical trial site and gender as regular variables. The number of imputed datasets needed for the analysis was determined using Stata’s mcerror option 39 (see Appendix 6). The analysis model was a mixed-effects model. Fixed effects were the baseline outcome score (e.g. baseline CDRS-R), treatment arm, time as a categorical variable (baseline, week 6 and week 12), a group-by-time interaction, gender and site (as categorical). Participant ID was included as a random intercept to account for correlation within participants. Treatment effects were assessed for each of the three treatment arms versus placebo and for each of the three remaining pairwise comparisons between treatment arms. Bonferroni adjustments were performed to account for multiple comparisons (six comparisons between the four arms). 40 Bonferroni adjusted p-values <0.05 for differences between arms were considered statistically significant.
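The Bonferroni adjustment for the six pairwise comparisons can be sketched as below. The raw p-values are placeholders invented for illustration and are not results from TADS; the sketch multiplies each raw p-value by the number of comparisons and caps the result at 1.

```python
from itertools import combinations

ARMS = ["FLX", "CBT", "COMB", "PBO"]


def bonferroni(p_values):
    """Multiply each raw p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return {pair: min(1.0, p * m) for pair, p in p_values.items()}


# Four arms give six pairwise comparisons.
pairs = list(combinations(ARMS, 2))

# Placeholder raw p-values (NOT trial results), one per pair.
placeholder_raw = dict(zip(pairs, [0.30, 0.004, 0.20, 0.0001, 0.60, 0.012]))

adjusted = bonferroni(placeholder_raw)
significant = [pair for pair, p in adjusted.items() if p < 0.05]
```

With six comparisons, a raw p-value must fall below roughly 0.0083 to remain below 0.05 after adjustment, so the placeholder value 0.012 no longer counts as significant.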
The protocol is ambiguous about the rate of change in the CDRS-R score across time being a primary outcome (vs the change in CDRS-R at week 12), listing it as a primary outcome in the aims, 6 (p. 17) and as an alternative approach in statistical methods 6 (p. 67). To examine the change in CDRS-R across time, we log-transformed the continuous time variable using natural log (days since start of treatment+1) and included this in a similar mixed-effects model, except for natural log (time) replacing the categorical time variable. We estimated the marginal mean slopes for each treatment arm to calculate the mean change in CDRS-R between baseline and 84 days (week 12) and compared these changes between the four arms. A Bonferroni adjustment was performed for six between-group comparisons.
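The log transformation of time and the resulting change between baseline and day 84 can be illustrated as follows. The sketch assumes a model that is linear in ln(days + 1), so that the change over a period depends only on the slope; the slope value is hypothetical and is not an estimate from our analysis.

```python
import math


def log_time(days_since_start):
    """Natural log of (days since start of treatment + 1); day 0 maps to 0."""
    return math.log(days_since_start + 1)


def change_over_period(slope_per_log_day, start_day=0, end_day=84):
    """Change in the outcome implied by a slope on the log-time scale."""
    return slope_per_log_day * (log_time(end_day) - log_time(start_day))


hypothetical_slope = -5.0  # CDRS-R points per unit of ln(days + 1); illustrative
week12_change = change_over_period(hypothetical_slope)  # equals slope * ln(85)
```

Because ln(0 + 1) = 0, the baseline-to-week-12 change reduces to the slope multiplied by ln(85), which is approximately 4.44.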
OC and PP analyses were performed using the same methods.
Reanalysis of dichotomised variables at week 12
We assessed IE-rated and clinician-rated CGI-I responses (responder vs non-responder; primary outcome), and diagnosis of MDD on the K-SADS-PL at week 12 on ITT, OC and PP cases using chi-squared analysis and binary logistic regression. We performed both ITT and modified ITT analyses: chi-squared analysis without imputation (i.e. a modified ITT population) and binary logistic regression with multiple imputation (60 datasets; ITT population). The logistic regression analyses for response at week 12 (for both ITT and PP populations) were performed on a protocol-informed basis (i.e. including covariates) to adjust for potential confounders and quantify the effects of other covariates of interest, namely, age, gender, race and site. Finally, we also assessed possible effect modification for the treatment effects across sites using the PP population by adding a site-by-treatment-arm interaction term to the logistic regression model. We did this as an alternative to Mantel-Haenszel analysis, which requires a non-zero number of subjects in each site-by-treatment cell: because of small sample sizes at some sites, the cross-tabulation of treatment response by site contained cells with zero participants. All results were adjusted for multiple comparisons using Bonferroni adjustment (six comparisons).
Reanalysis of ordinal outcome measures
The clinician-rated CGI-S was analysed at week 12 using multiple imputation (25 datasets) and mixed-effects ordinal logistic regression that quantified the effects of the covariates: age, gender, race and site. For the unimputed ITT analysis, chi-square analysis was used.
Challenges to our RIAT reanalysis
Bias
Our RIAT reanalysis aims to present an independent, protocol-informed reanalysis of TADS data. We acknowledge that our TADS-RIAT team might be perceived as biased against the original study investigators’ conclusions because the audit process itself is based on questioning these results and some team members have previously published detailed criticisms of TADS.16–18,41–44 We tried to address this risk with in-depth documentation of our review process, as outlined in Appendices 1, 2 and 9. We have also made our statistical syntax and results available online for others to review and analyse with respect to the public online NDAR dataset (Appendices 4, 5, 6 and 8), and engaged a statistical expert to audit our approach and results (Appendix 7).
Data access
Duke University initially signed a data agreement to provide narratives regarding SAEs during TADS but subsequently withdrew on grounds that they could not be sufficiently de-identified. 23 After extensive negotiation with Duke University, we were unable to obtain either the original study CRFs (which were destroyed in 2017) or the SAE forms (which remain intact). TADS databases also state information on concomitant medications was not available because data were too complex (‘TADS Analysis Datasets’ in 45). This meant we could not reliably analyse the causation of AEs that may have been caused by withdrawal effects or adverse effects of other treatments.
Le Noury et al. 34 made clear there are many ways that AE data can be analysed and represented. There can be issues with idiosyncratic coding systems, failure to transcribe all AEs that are recorded, filtering data with statistical techniques, restriction of AEs to those occurring above a certain frequency, grouping of different AEs, insufficient consideration of severity and ignoring the effects of drug withdrawal or effects of concomitant drugs. It was difficult to determine other discrepancies because we did not have information directly associated with each AE report and we had to piece together information from other databases’ variables which would not reliably inform an AE (e.g. suicidality rating scale score and length of time on medication).
Interpreting the study protocol
RIAT analyses are guided by the original TADS protocol, which was conservatively re-interpreted where inconsistent or unclear in several key areas (see Appendix 3, Table S4).
Statistical challenges
Table. Key issues with interpreting the TADS protocol.
Results
Table. Characteristics of TADS participants at baseline, according to treatment arm.
Notes: Terminology used for gender and race is reproduced from NDAR dataset documentation. Excludes dysthymia. Anxiety disorders included panic disorder, social phobia, simple phobia, agoraphobia, generalised anxiety disorder, post-traumatic stress disorder and acute stress disorder. ODD = oppositional defiant disorder. Clinician-rated ADS and CGI-S scores are reported provided they were the first record and rated ≤7 days after randomisation.
Figure 1 shows TADS participants’ treatment allocations and study flow post-randomisation, where drop out or premature termination was not mutually exclusive. A total of 80 participants (18%) dropped out or were prematurely terminated (n = 48 [11%] dropouts and 42 [10%] premature terminators; with 10 [2%] of these being both dropouts and premature terminators; Appendix 3, Table S5). There was no difference in loss to follow-up by treatment arm (Appendix 3, Table S6). Participants lost to follow-up were more frequently male and had a shorter median duration for their first depressive episode. There were some cases for which we were unable to reliably determine reasons for dropout or premature termination because the ASAP dataset coding was unclear or absent (n = 35 without ASAP).
Figure 1. Flowchart of participants by treatment arm. Counts reflect data available for the CDRS-R, where ITT = intention-to-treat population, OC = observed cases, and PP = per protocol population (participants who did not drop out, were not prematurely terminated and had a week-12 CDRS-R rating). Drop out and premature termination may have occurred in the same participant and are not mutually exclusive in this figure.
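Because drop out and premature termination overlap, the total number of participants affected is obtained by subtracting those counted in both categories. The short check below reproduces the reported counts and percentages from the figures given in the text; the variable names are illustrative.

```python
N_RANDOMISED = 439

dropouts = 48
premature_terminators = 42
both = 10  # participants counted in both categories

# Inclusion-exclusion: participants in either category, counted once.
either = dropouts + premature_terminators - both


def percent_of_randomised(n):
    """Percentage of all randomised participants, rounded to a whole number."""
    return round(100 * n / N_RANDOMISED)


percentages = {
    "either": percent_of_randomised(either),
    "dropouts": percent_of_randomised(dropouts),
    "premature_terminators": percent_of_randomised(premature_terminators),
    "both": percent_of_randomised(both),
}
```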
Treatment compliance
Participants allocated to FLX, COMB and PBO arms were all titrated to 20 mg per day of study drug by the end of week 2 post-randomisation. The most commonly prescribed dose recorded throughout the trial was 20 mg of study drug per day. The maximum recorded dose was 50 mg for participants in drug-only arms (n = 3 FLX participants and 1 PBO participant) and 40 mg for participants in COMB (n = 15 participants).
Average compliance was consistent across groups assigned a study drug (Appendix 3, Table S7). However, a considerable number of participants (n = 167) only had doses documented for a median of 9.86 weeks (range 0.86–11.43 weeks).
Protocol violations
Subject-initiated protocol violations were poorly described by the trial protocol, its manuals and available data. A summary of these violations as recorded by the ASAP log (n = 100 ASAPs in 76 participants) is presented in Appendix 3, Table S8. The most common reasons for ASAPs were suicidality (n = 34) and worsening of depression (n = 33).
Concomitant treatments
The Study Screening Log indicated 22 participants entered the study taking concomitant medications, but only 10 were specified in the Concomitant Medication Log. Overall, 72 participants were administered concomitant treatments, including 31 who were neither dropouts nor premature terminators (at least seven of whom were taking concomitant antidepressants; Appendix 3, Table S9).
Twenty-nine participants changed treatment following premature termination, with six switching treatment to FLX (n = 5 originally allocated to PBO and 1 from COMB), five participants switching to COMB (from PBO) and 18 switching to ‘no treatment’ (n = 9 FLX, 3 CBT, 3 COMB and 3 PBO).
Blinding
With four allocation arms, adequately blinded IEs would be expected to correctly guess treatment allocation 25% of the time. However, TADS IEs correctly guessed participant treatment allocation in 171 of 352 guesses at week 6 (49% correct overall; 67% FLX, 53% CBT, 41% COMB and 32% PBO) and 170 of 352 guesses at week 12 (48% correct overall; 63% FLX, 51% CBT, 39% COMB and 40% PBO; see also: Appendix 3, Table S10). This finding raises the possibility of significant unblinding, which may have been sufficient to distort the results according to any bias IEs may have had (presumably favouring active treatment). We report further on the role of blinding and treatment expectancy in TADS elsewhere (see 27).
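To illustrate how far the observed guess accuracy departs from the 25% expected by chance, the sketch below computes an exact binomial upper-tail probability and a normal-approximation z statistic. It is a rough check rather than the analysis reported in this paper, and it assumes that guesses are independent and that each of the four arms is equally likely to be guessed under perfect blinding; neither assumption is verified here. The function names are hypothetical.

```python
from math import comb, sqrt


def binomial_upper_tail(successes, trials, p_chance):
    """Exact P(X >= successes) for X ~ Binomial(trials, p_chance)."""
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def z_score(successes, trials, p_chance):
    """Normal-approximation z statistic for an observed count versus chance."""
    expected = trials * p_chance
    sd = sqrt(trials * p_chance * (1 - p_chance))
    return (successes - expected) / sd


P_CHANCE = 0.25  # assumption: four arms, equally likely guesses
observed = {"week 6": (171, 352), "week 12": (170, 352)}

for label, (correct, guesses) in observed.items():
    share = 100 * correct / guesses
    tail = binomial_upper_tail(correct, guesses, P_CHANCE)
    z = z_score(correct, guesses, P_CHANCE)
    print(f"{label}: {share:.0f}% correct, z = {z:.1f}, P(X >= {correct}) = {tail:.2e}")
```

Because the same evaluators may have rated several participants, the independence assumption is likely optimistic, so the tail probability should be read as indicative only.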
Primary outcomes
The change in CDRS-R across 12 weeks
Missing data
Among participants with incomplete data at a given visit, there was one participant in the COMB arm missing baseline data on the CDRS-R scale. Five participants were missing scores at week 6 (n = 1 CBT, 2 COMB and 2 PBO) and three at week 12 (2 FLX and 1 COMB). When participants with missing visits were also included in the dataset, and prior to multiple imputation, there was one missing baseline score, 54 missing week-6 scores (n = 10 FLX, 15 CBT, 11 COMB and 18 PBO) and 63 missing week-12 scores (n = 13 FLX, 21 CBT, 13 COMB and 16 PBO).
Treatment effects
Mean CDRS-R scores at week 12. a
Observed mean (±SD) CDRS-R scores by treatment arm and time point and estimated treatment effects at week 12, using time as a categorical variable (baseline, week-6 and week-12) for ITT, OC and PP analyses.
Week-12 treatment effects are shown for pairwise treatment contrasts with Bonferroni adjustments, including 95% CIs and p-values.
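With four arms there are six possible pairwise contrasts. A minimal sketch of a Bonferroni adjustment over those contrasts follows. It assumes that all six contrasts form the comparison family, which may differ from the family used in the reanalysis, and the unadjusted p-values are hypothetical placeholders rather than TADS results.

```python
from itertools import combinations

ARMS = ["FLX", "CBT", "COMB", "PBO"]


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capping at 1."""
    m = len(p_values)
    return {contrast: min(1.0, p * m) for contrast, p in p_values.items()}


contrasts = list(combinations(ARMS, 2))  # all unordered pairs of arms

# Hypothetical unadjusted p-values, for illustration only (not TADS data).
raw_p = dict(zip(contrasts, [0.20, 0.004, 0.50, 0.0001, 0.90, 0.00005]))

adjusted = bonferroni(raw_p)
for contrast, p in adjusted.items():
    print(contrast, round(p, 4))
```

The adjustment is conservative: each contrast must clear a threshold six times stricter than an unadjusted test would require.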
The CGI-I at 12 weeks
Missing data
One participant was missing a week-12 outcome score in the CBT arm.
Treatment effects
Proportions (n, %) of treatment responders and non-responders, and predicted response rates on the CGI-I at 12 weeks following treatment.
Pairwise treatment contrasts for predicted response rates (%) at week 12 are shown with Bonferroni adjustments, including SE and p-values for each contrast.
Using a binary logistic regression analysis and MI (60 datasets), there were differences in the response rates at week 12 for COMB versus PBO (p < 0.001), FLX versus PBO (p = 0.004) and for COMB versus CBT (p = 0.033). Age, gender, race and site were not predictors of the response rate (p > 0.05 for all). PP and OC analyses showed similar patterns of treatment effects.
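Estimates from multiply imputed datasets are conventionally combined using Rubin's rules. The sketch below shows that pooling step, assuming each imputed dataset yields one point estimate and its squared standard error. The function name and the three-dataset input are hypothetical, and the sketch does not reproduce the 60-dataset analysis reported here.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, squared_ses):
    """Pool per-imputation estimates and variances using Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)        # pooled point estimate
    within = mean(squared_ses)      # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Hypothetical estimates from three imputed datasets (not TADS data).
estimate, se = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(f"pooled estimate = {estimate:.2f}, pooled SE = {se:.3f}")
```

The between-imputation term is what carries the uncertainty due to missing data into the pooled standard error.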
Secondary outcome measures
Secondary outcomes demonstrated treatment contrasts that were broadly consistent with each other, where COMB treatment demonstrated superiority compared with PBO on all but the PESQ scale, and FLX was only superior to placebo on clinician-rated CGI-I and K-SADS-PL measures (see Appendix 3, Tables S12-23). Specifically, ITT predicted response rates on the clinician-rated CGI-I and K-SADS-PL (‘current MDD' category) were statistically significantly higher in both COMB (both p-values <0.001) and FLX compared to PBO (p = 0.006 and p = 0.013, respectively; Appendix 3, Table S15).
Harms
Adverse events
Only AEs specific to the acute phase of TADS treatment were included in our reanalysis. Six cases that occurred pre-randomisation or at baseline (specifically: restlessness [in one FLX participant], nasal congestion [FLX], viral URTI [COMB], dizziness [COMB], jaw pain [COMB] and vomiting [COMB]) are not included in our reporting. There were 369 AEs documented in 171 participants post-randomisation.
Reclassification of AE data using MedDRA
Our MedDRA reclassification reviewed events reported in both AESAE and ASAP databases because, as noted above, not all events meeting criteria for AEs in ASAP were documented in AESAE (e.g. the reason for an ASAP was listed as ‘suicidality’ or ‘hypomania or mania’ but no AESAE entry corresponded to the event, either by participant ID [i.e. not recorded at all] or timing of the case [no record within the same week]). Our reclassification of AEs is recorded in further detail in Appendix 9.
Sixty-seven events not reported in AESAE were documented in ASAP and are summarised in Appendix 9, Table S5. We note that eight visits in the pharmacotherapist treatment log, which was updated weekly, recorded ‘yes’ for the question ‘Did the respondent report any adverse events related to the medication?’ but had no corresponding report in the AE or ASAP database. It was not possible to further clarify these cases, so they were not included in our reporting.
Summary of AEs during the acute phase of TADS. a
Events are organised according to whether the participant was taking fluoxetine at the time of the event and whether the event was an SAE, classified into primary MedDRA SOC terms, in order of event frequency.
List and count of SAEs during the acute phase of TADS according to whether participants were taking fluoxetine at the time of the event, reported by their MedDRA SOCs and PT names.
All AEs (n = 132 in 62 participants) with a primary MedDRA classification under ‘psychiatric disorders’ are reported by MedDRA PT name and by treatment at the time of the AE in Appendix 9, Table S6. Some of these events required joint adjudication by the TADS-RIAT team, as their classification in the AE database was unclear given additional information available from ASAP entries (Appendix 9, Tables S2-4). For example, an AE recorded as self-injurious behaviour without suicidal intent required inpatient psychiatric hospitalisation, and a subsequent ASAP entry approximately a week later was coded as a suicide attempt but was not Columbia coded. We documented this event once as a suicide attempt because it was unclear whether these two cases refer to the same event (e.g. whether self-injurious behaviour was later reclassified as an attempt in the ASAP) or to two separate events (e.g. self-injurious behaviour led to hospitalisation and there was a suicide attempt made while in hospital).
Post-hoc reclassification by the Columbia Suicidality Classification Group occurred for 70 AEs in 61 participants. Only 8 suicide attempts were associated with Columbia records (6 occurring on fluoxetine and 2 not on fluoxetine) and event coding appeared inconsistent. For example, one SAE was recorded as self-injurious behaviour with unknown intent according to Columbia codes; however, it was associated with an ASAP for a planned suicide attempt. Given this additional information, we reclassified this event as a suicide attempt. Columbia ratings are reported verbatim alongside our adjudication of the data in Appendix 9, Table S11.
Overall, our reanalysis uncovered 10 suicide attempts in 10 participants, 3 of which were recorded as such in ASAP records but not in the AESAE database. All self-harm related AEs are summarised in Appendix 9, Tables S3-4.
ITT reporting
We report harms by participants’ randomised treatment arms in Appendix 9, Tables S7-9. Overall, participants randomised to FLX had the most AEs recorded (n = 139 events), followed by PBO (n = 102) and COMB (n = 100). The CBT arm had fewer than a third as many AEs recorded as any other arm (n = 28 events). COMB and FLX had the same number of suicide attempts recorded (n = 4 in each arm).
Site-specific reporting of AEs
The number of AEs and number of participants experiencing AEs varied by site (see Appendix 9, Table S10), ranging from 1 out of 21 (5%) to 8 out of 10 participants (80%).
SIQ-Jr
SIQ-Jr scores decreased over time, with the greatest change occurring in the COMB arm (median decrease of 12 points; Appendix 3, Table S25).
Discussion
Summary of findings
On both primary outcomes at week 12 (the CDRS-R and the CGI-I), COMB participants improved significantly in comparison with PBO and CBT, and in comparison to FLX on the CGI-I. FLX was not superior to PBO or CBT on the CDRS-R but was superior to PBO on the CGI-I. CBT was not superior to PBO on either primary outcome measure. Reanalysis of secondary outcome measures demonstrated a statistically significant advantage of FLX compared to PBO only on two out of eleven measures, both of which were rated by clinicians who were unblinded.
Our reanalysis of harms revealed that AEs were more common than was reported by the TADS Team investigators, particularly among participants who were taking fluoxetine at the time of an AE. SAEs were also more common in participants on fluoxetine, particularly suicide attempts.
Comparison with original TADS publications
Our protocol-led reanalysis reflected March et al.’s findings that COMB was superior to other allocation groups on both primary outcomes (CDRS-R and CGI-I). 5 However, our findings for fluoxetine differed, whereby March et al. found it was superior to PBO on both primary outcomes. In addition, our findings for secondary outcomes, most of which were not analysed per-protocol in the originally reported study, 5 found inconsistent superiority for fluoxetine over placebo.
The original analysis reported superiority of COMB treatment. The 6.7-point difference on the CDRS-R between COMB and PBO needs to be considered in the light of TADS’ design. COMB participants received open-label fluoxetine, were informed they were receiving ‘gold-standard’ treatment,6,46 and also received more clinical attention than other treatment arms as they saw both the CBT clinician and pharmacotherapist weekly.46,47 More recent, albeit smaller trials, which double blinded all treatment arms, have failed to corroborate that the addition of fluoxetine to CBT is superior to placebo 48 or fluoxetine alone 49 in treating adolescent depression.
Benefits ascribed to fluoxetine in TADS were comparable to placebo on a majority of measures. FLX was superior to PBO on the CGI-I, as well as two other clinician-rated measures, which must be interpreted in the context of compromised blinding. 27 Specifically, we previously reported treatment guesses in TADS were more accurate than would have been expected by chance. 27 Overall, inconsistent benefits do not outweigh the harms identified in this study, which our reanalysis finds were more prevalent than initially reported.
Reporting of adverse events
Comparison of AE reporting for the acute phase of TADS by different publications compared to TADS RIAT and according to treatment at the time of the AE. a
Figures are restricted to psychiatric SAEs, suicide attempts or suicide-related events (suicide-related events comprise suicidal ideation, suicidal gestures and suicide attempts grouped together). The fluoxetine column represents participants taking fluoxetine at the time of the AE and either comprised those allocated to FLX or COMB, or who had been prematurely terminated previously and then prescribed fluoxetine.
Our reanalysis, despite our limited access to narratives, reveals lack of clarity and under-reporting of AEs. In neither the primary paper 5 nor the main safety paper 38 could we discern the number of SAEs being reported, but it seemed to be no more than 25. We found documentation of 32 SAEs in TADS datasets, including two that had not been designated as SAEs but were associated with hospitalisations. Within these SAEs, 10 suicide attempts in 10 participants (9 of whom were taking fluoxetine) were documented. In contrast, March et al. reported only seven (n = 2 in FLX, 1 CBT and 4 COMB) 5 and Emslie et al. reported five suicide attempts (n = 2 in FLX, 1 CBT and 2 COMB). 38
Overall, our findings demonstrate that harms were more prevalent, to a clinically significant degree, in participants taking fluoxetine than in participants in the CBT and PBO arms. The TADS Team downplayed concern about AEs by arguing that disentangling medication-related AEs from illness-related ones is difficult. This interpretation is invalid as unfavourable results in RCTs should not be explained away by attributing harms to the illness process.
Limitations
Our reanalysis was limited because it was a secondary analysis of already-processed data. We were also restricted to protocol-led parameters and ideally would have adjusted for confounders such as the duration of depressive episode at baseline, which varied significantly across treatment arms and was reported as a moderator of response in subsequent post-hoc analyses by the TADS Team. 46
Generalisability
With regard to the quality of reporting, and given our experience with Study 329 34 and the citalopram CIT-MD-18 study, 50 our findings of irregularities in TADS, including the underreporting of suicidality and other harms, appear generalisable.
Interpretation of our findings
Our reanalysis uses multiple imputation methods to reduce bias and improve precision, together with the appropriate inclusion of adjustment for multiple treatment comparisons. We found little support to justify the routine use of fluoxetine as treatment for adolescent depression, which is not concordant with the TADS Team’s conclusions or the way TADS results have generally been represented since publication. On balance, given the harms we report, the widely perceived benefit of prescribing fluoxetine for adolescents is questionable.
Implications for further research and practice
The findings of this as well as several other reanalyses suggest that antidepressant trial results cannot be taken at face value, and they underline the importance of full data and transparency in clinical research.
Conclusion
Our reanalysis confirms the originally reported finding that superiority over placebo was not demonstrated for fluoxetine. Contrary to the original TADS Team’s reporting, we have uncovered a higher, clinically significant level of harm, including 11 additional suicide-related adverse events.
Supplemental Material
Supplemental Material - Restoring TADS: RIAT reanalysis of the Treatment for Adolescents with Depression Study
Supplemental Material for Restoring TADS: RIAT reanalysis of the Treatment for Adolescents with Depression Study by Natalie Aboustate, Jon Jureidini, Richard Woodman, Joanna Le Noury, Julie Klau, Elia Abi-Jaoude and Melissa Raven in International Journal of Risk & Safety in Medicine.
Footnotes
Acknowledgements
This work was partly funded by a grant from the Laura and John Arnold Foundation administered by the RIAT Support Center, University of Maryland (Subaward No. 1802226). The funder had no role in the preparation of this manuscript. Prof Peter Doshi, who works for the sponsor (RIAT Support Center), facilitated some negotiations with Duke University in an attempt to obtain additional TADS data (see 23). Data and/or research tools used in the preparation of this manuscript were obtained from the NIH-supported National Database for Autism Research (NDAR). NDAR is a collaborative informatics system created by the National Institutes of Health to provide a national resource to support and accelerate research in autism. Dataset identifier DOI: 10.15154/1504156. This manuscript reflects the views of the authors and may not reflect the opinions or views of the NIH or of the submitters submitting original data to NDAR. These data are not available for dissemination through our organisation. A/Prof Mark Jones audited our statistical analysis and results, providing a statement of his opinion in
. Dr Emily Aldridge audited our reanalysis of self-harm-related adverse events. The late Dr Catalin Tufanaru critically reviewed our statistical methodology and reporting in an early draft of this manuscript.
Ethical consideration
As a secondary analysis using existing and non-identifiable data, this reanalysis was deemed exempt from ethical review by the University of Adelaide’s Human Research Ethics Committee (HREC reference: 33958).
Author contributions
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was partly funded by a grant from the Laura and John Arnold Foundation administered by the RIAT Support Center, University of Maryland (Subaward No. 1802226).
Declaration of conflicting interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: All authors have completed the Unified Competing Interest form (available on request from the corresponding author) and declare the grant from the Laura and John Arnold Foundation was used to support part of Aboustate, Jureidini and Klau’s salaries in the past. The authors have no financial relationships with any organisations that might have an interest in the submitted work in the previous 3 years and no other relationships or activities that could appear to have influenced the submitted work.
Supplemental Material
Supplemental material for this article is available online.
References
