Abstract
ANOVA testing (generalized linear model) using a model including effects of site, treatment and site × treatment interaction was applied. Otherwise we used only descriptive statistics.
Background
In 2013, in response to concerns about selective reporting of outcomes of randomized controlled trials, an international group of researchers called on funders and investigators of abandoned (unpublished) or misreported trials to publish undisclosed outcomes or correct misleading publications [1]. This initiative was dubbed ‘restoring invisible and abandoned trials’ (RIAT). The researchers identified many trials requiring restoration, and emailed the appropriate parties, asking them to signal their intention to publish the unpublished trials or publish corrected versions of misreported trials.
The RIAT researchers approached GlaxoSmithKline (GSK) (formerly SmithKline Beecham: SKB) and asked whether it intended to restore Study 329, a double-blinded randomized controlled trial comparing paroxetine and imipramine with placebo in the treatment of adolescent major depression. GSK did not signal any such intent.
The acute phase of Study 329 was originally reported in the Journal of the American Academy of Child and Adolescent Psychiatry in 2001 [2]. It was reanalysed and published under the RIAT initiative in 2015 in BMJ (Le Noury et al.) [3].
The acute phase of Study 329 was followed by a six-month continuation phase. This article represents a RIAT publication of the continuation phase.
The 1993/1994/1996 trial protocol [4] stated that the objectives for the continuation phase were: To provide information on the safety profile of paroxetine and imipramine when these agents are given to adolescents for an extended period of time; To estimate the rate of relapse among paroxetine, imipramine, and placebo responders who were maintained on treatment (p. 000547).
The clinical study report (CSR) [5] stated that the continuation phase ‘was not designed to determine whether paroxetine or imipramine are superior to placebo in preventing relapse’, but instead ‘to provide information on the relapse rates of responders over an extended period’ (p. 000023).
Study enrolment took place between April 1994 and March 1997. The final date on which the last patient took study medication during the continuation phase was 3 September 1997. In a small number of patients, 30-day follow-up data were collected into February 1998.
We have reanalysed Study 329 according to the RIAT recommendations. To this end, we have used the CSR [5], including Appendices A-G (publicly available on the GSK website [
Except where indicated, in accordance with RIAT recommendations, our methods are those set out in the protocol [4], as outlined in RIAT Appendix 1.
Participants
The acute phase participants comprised 275 adolescents between the ages of 12 and 18 years, meeting DSM-IV criteria [9] for a current episode of major depression of at least eight weeks’ duration (see Le Noury et al. [3] for details of eligibility criteria, standardization across sites, randomization, blinding, recruitment, screening, consent, demographic and baseline characteristics). In the continuation phase, patients who had responded to treatment were eligible to continue on the same medication at the same dosage for an additional six months.
Some participants were not able to progress to the continuation phase because of a shortage of study medication supplies, resulting from a slower-than-expected rate of enrollment, which led to some of the medication expiring before use (CSR, p. 000027). Amendment 2 of the Study 329 protocol (approved 28 October 1996) provided two options for these patients: treatment by a third party, who was provided with the identity of the study medication, or open-label paroxetine treatment for up to six months (after a one-week down-titration and washout period) by the study physician (p. 000538).
Interventions
Study medication was provided to patients in weekly blister packs. Patients were instructed to take the medication twice daily. There were six dosing levels. Over the first four weeks of the acute phase, all patients were titrated to level 4, corresponding to paroxetine 20 mg or imipramine 200 mg, regardless of response. Non-responders (those failing to reach responder criteria) could be titrated over the following four weeks up to level 5 or 6, corresponding to a maximum dose of paroxetine 60 mg or imipramine 300 mg.
Medication compliance was evaluated based on the number of capsules dispensed, taken, and empty blister packs returned. Non-compliance was defined as taking less than 80% or more than 120% of the number of capsules expected to be returned at two consecutive visits, and resulted in withdrawal from the study. Any patient missing two consecutive visits was also withdrawn.
Patients were provided with 45-minute weekly sessions of supportive psychotherapy [10], primarily for the purpose of assessing the treatment effects.
Taper phase
A discontinuation taper phase was recommended for all patients, whether they terminated the study early (during either the acute or the continuation phase), completed the acute phase but did not continue, or completed the six-month continuation phase. If the patient accepted a taper phase, the protocol recommended tapering medication/placebo in a linear fashion over a period of 7 to 17 days, with patient, family, and clinical and research personnel all remaining blind to medication assignment.
Not all patients agreed to a taper phase. For those who did taper their medication, it was difficult in some cases to be certain of the exact duration of exposure, because the date of last dose was left incomplete. We have used the exact number of days where available, and for other patients we have assumed an average taper phase of 2 weeks, unless there were indications to the contrary. The taper phase includes patients tapering during the acute phase as well as the continuation phase, so there are more patients in the taper phase than in the continuation phase.
Outcomes
Patients were evaluated every four weeks from week 12 to week 32 during the six months of the continuation phase.
Efficacy Endpoints
a. Percentage of Patients Who Relapsed
The protocol defined patients as relapsed if they no longer met the criteria for response. The protocol definition of response was having a Hamilton Depression Scale (HAM-D) score ≤8, or a 50% or greater reduction in HAM-D score relative to the baseline score (such patients were defined as ‘responders’ in the CSR). The CSR added a second indicator of response, ‘remission’, defined as having a HAM-D score ≤8, ‘in order to provide a rigorous anchor point in analyzing relapses in the continuation phase’ (p. 000050). We have accepted this departure from the protocol and applied this more conservative remission criterion to our analyses of relapses. This reduces the number of relapses.
In addition we have regarded as relapses any patients who, having previously responded, were discontinued from the study because of a suicide-related event. A number of these patients were hospitalized or discontinued from the study immediately at that point without a further HAM-D being completed. We have regarded such cases as relapses even though the most recently undertaken HAM-D score (prior to the event) might have been < 8. This seems to us a necessary modification of the protocol as such an outcome was in all likelihood not anticipated when the protocol was developed or at any point prior to the analysis of the data. The data are available for other researchers to analyse using other approaches.
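As an illustration only, the response, remission and relapse rules described above could be expressed as follows. This is a sketch in Python, not the code used in the reanalysis. The function and parameter names are hypothetical, and the handling of a missing final HAM-D score is an assumption made for the example.

```python
from typing import Optional

REMISSION_MAX_HAMD = 8  # CSR remission criterion: HAM-D score <= 8


def is_responder(hamd: float, baseline_hamd: float) -> bool:
    """Protocol response: HAM-D <= 8, or a 50% or greater reduction from baseline."""
    return hamd <= REMISSION_MAX_HAMD or hamd <= 0.5 * baseline_hamd


def is_remitted(hamd: float) -> bool:
    """CSR remission: HAM-D <= 8, regardless of the baseline score."""
    return hamd <= REMISSION_MAX_HAMD


def is_relapse(latest_hamd: Optional[float],
               suicide_related_discontinuation: bool) -> bool:
    """Relapse status for a patient who had previously remitted.

    A discontinuation for a suicide-related event counts as a relapse even
    if the most recent HAM-D score was low or no further score was taken.
    """
    if suicide_related_discontinuation:
        return True
    if latest_hamd is None:
        # Assumption for this sketch: no score and no event means no recorded relapse.
        return False
    return not is_remitted(latest_hamd)
```

Under these rules a patient with a baseline HAM-D of 24 and a current score of 10 is a responder but is not in remission, which is why the remission criterion is the more conservative of the two.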
b. Percentage of patients withdrawing because of lack of efficacy
The protocol called for the percentage of patients withdrawing because of lack of efficacy to be evaluated at the end of the continuation phase. We have included in this category those patients whose final HAM-D scores were consistent with a lack of efficacy, even if the stated reason for withdrawal was non-compliance or protocol violation or adverse events other than suicide-related events.
Safety endpoints
An adverse experience/event was defined in the protocol (p. 000564) as:
‘any noxious, pathologic or unintended change in anatomical, physiologic or metabolic functions as indicated by physical signs, symptoms and/or laboratory changes occurring in any phase of the clinical trial whether associated with drug or placebo and whether or not considered drug related.
This includes an exacerbation of pre-existing conditions or events, intercurrent illnesses, drug interaction or the significant worsening of the disease under investigation that is not recorded elsewhere in the case report form under specific efficacy assessments.’
Adverse events were to be elicited by the investigator asking a non-leading question such as: ‘Do you feel different in any way since starting the new treatment/the last assessment?’ (p. 000565). Details of treatment-emergent adverse events, including their severity, any change in study drug administration, investigator attribution to study drug, any corrective therapy given, and outcome status were documented. Attribution or relationship to study drug was judged by the investigator to be ‘unrelated’, ‘probably unrelated’, ‘possibly related’ or ‘related’ (p. 000565).
Sources of safety data
Adverse event data come from the CSR of the continuation phase lodged on GSK’s website, primarily Appendix D: Patient Data Listings of Adverse Experiences. Appendix B provides details of concomitant medications. Additional information was available from the summary narratives in the body of the CSR for patients who had adverse events that were designated as serious or led to withdrawal. However, a number of other patients, who discontinued because of adverse events that were not regarded as serious or because of lack of efficacy or protocol violations, did not generate patient narratives.
The tables in Appendix D of the CSR report the verbatim terms used by the blinded investigators along with preferred terms as coded by SKB using the Adverse Drug Events Coding System (ADECS) dictionary. Appendix D also includes ratings of severity and ratings of relatedness. We used the Medical Dictionary for Regulatory Activities (MedDRA®) to code the verbatim terms provided in CSR Appendix D. MedDRA terminology is the international medical terminology developed under the auspices of the International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH) (www.meddra.org). It has been endorsed by the United States Food and Drug Administration (FDA) and is now used by GSK [11].
Our analysis of the acute phase [3] had established that there are significant adverse event data missing from CSR Appendix D, so all CRFs for all patients entering the continuation phase were scrutinised for all adverse events occurring during the continuation phase. These adverse events were compared with those reported in CSR Appendix D. This review process identified additional adverse events that had not been recorded as verbatim terms in CSR Appendix D. It also led to recoding of a number of the reasons for discontinuation. The new adverse events and the reasons for changing discontinuation category are recorded in RIAT Appendix 2 accompanying this paper.
Coding of adverse events
The protocol (p. 000574) stated that adverse events were to be coded by body system and preferred terms, and compared using descriptive statistics, but did not specify a coding dictionary. The CSR (written after the study concluded) specified that adverse events were coded using ADECS, which SKB used at the time (p. 000044). ADECS was derived from a coding system developed by the FDA, Coding Symbols for a Thesaurus of Adverse Reaction Terms (COSTART), but ADECS is not itself a recognized system and is no longer available.
We coded adverse events using MedDRA, which has replaced COSTART for the FDA, because it is by far the most commonly used coding system today. For coding purposes, we have taken the original terms used by the clinical investigators as transcribed into the CSR Appendix D, and applied MedDRA codes to these descriptions. Information from Appendix D was transcribed into spreadsheets (available at Restoring Study329.org). The verbatim terms and the ADECS coding terms were transcribed first into these sheets, allowing all coding to be done before the drug names were added in. The transcription was carried out by a research assistant who was a MedDRA-trained coder, but took no part in the actual coding. All coding was carried out by JLN, and checked by DH, or vice versa.
All of our coding from the verbatim terms in the CSR Appendix D was done blind, as was coding from the CRFs.
In general, MedDRA coding stays closer to the original clinician description of the event than ADECS. Most coding was straightforward. The vast majority of the verbatim terms simply mapped onto coding terms in MedDRA. The main coding challenges arose in relation to suicide-related events in the acute phase; these are covered in Le Noury et al. [3].
Analysis of safety data
As in our acute study re-analysis, we took three steps in analysing the safety data. First, we present all adverse events rather than only those happening at a particular rate. Secondly, we have grouped events into broader system-organ-class (SOC) groups: psychiatric, cardiovascular, gastrointestinal, respiratory and other. Thirdly, we break down events by severity, selecting adverse events coded as severe, and utilising the listing in the CSR (Appendix G) of patients who discontinued for any reason.
In our acute study re-analysis [3], we laid out our categorization of suicidal events in detail. These were events whose coding in our opinion should have included a suicide-related code. There were also a considerable number of events coded under headings such as nightmares or abnormal thoughts. At the time this study was conceived and executed, few investigators or patients were familiar with the phenomenon of treatment-induced behavioral change, up to and including suicidality, and no rating instruments were included to ensure that such events were recorded systematically. Investigators and patients were apparently not briefed on this possibility. It is possible that a number of treatment-induced behavioral disturbances would have been communicated obliquely. We have therefore taken events that might conceal a suicide-related event and presented these together in Table 12a, along with more clearly suicide-related events. The details of all patients included in this way are laid out in Table 12b.
As the acute and continuation phases are of very different duration, and a significant number of patients dropped out in the course of the continuation phase, a simple listing of the adverse events from each phase risks misleading. We have therefore presented the total number of events but also estimated the rate at which events occurred by duration of exposure.
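The exposure adjustment can be illustrated with a short Python sketch. It assumes the rate is simply the number of events divided by the total weeks of exposure, scaled to 100 weeks. The function name and the numbers in the example are hypothetical and are not taken from the study data.

```python
def events_per_100_weeks(n_events: int, exposure_weeks: float) -> float:
    """Adverse events per 100 patient-weeks of exposure."""
    if exposure_weeks <= 0:
        raise ValueError("exposure_weeks must be positive")
    return 100.0 * n_events / exposure_weeks


# Made-up numbers: the same event count over different total exposure.
rate_short_phase = events_per_100_weeks(30, 600.0)   # shorter total exposure
rate_long_phase = events_per_100_weeks(30, 1500.0)   # longer total exposure
```

With identical raw counts, the phase with less total exposure has the higher rate, which is the distortion a simple listing of events would hide.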
Data access
We have made available on Study329.org all the data we have used, with the exception of the actual CRFs, which we do not have permission to share.
Statistical methods
No formal hypothesis testing was planned for the continuation phase. We applied ANOVA testing (generalized linear model) using a model including effects of site, treatment, and site×treatment interaction as per the Study 329 protocol. Otherwise we used only descriptive statistics.
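For readers unfamiliar with this kind of model, a two-factor layout with interaction is conventionally written as below. The notation is an assumption introduced here for illustration and is not quoted from the protocol: Y_{ijk} is the outcome for patient k at site i on treatment j, \alpha_i is the site effect, \beta_j the treatment effect, (\alpha\beta)_{ij} the site × treatment interaction, and \varepsilon_{ijk} the residual error.

```latex
Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}
```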
Results
Attrition due to non-response, dropout and relapse is shown in Table 1. Demographic data are laid out in Table 2. Table 3 has response and relapse data. Tables 4–13 have safety data.
Table 1 shows that only 43 of 275 patients completed the continuation phase.
Table 2 shows the demographic characteristics of those who entered the continuation phase.
Efficacy
The overall profiles of HAM-D scores for observed cases in both acute and continuation phases for all three arms of the study are presented in Fig. 1.
Although efficacy could be assessed for the acute phase, the dropout rates between the acute and continuation phases and within the continuation phase were too high to allow a standard efficacy analysis. In the continuation phase, the dropout rates were 30/49 [61%] for paroxetine; 27/39 [69%] for imipramine; and 18/31 [58%] for placebo.
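The dropout percentages quoted above follow directly from the counts given in the text, as this small Python check shows. The variable names are illustrative only.

```python
# (discontinued, entered continuation phase) per arm, as quoted in the text
dropouts = {
    "paroxetine": (30, 49),
    "imipramine": (27, 39),
    "placebo": (18, 31),
}

# Percentage of each arm that discontinued, rounded to the nearest whole number
dropout_pct = {
    arm: round(100 * discontinued / entered)
    for arm, (discontinued, entered) in dropouts.items()
}

# Total discontinuations across the three arms
total_discontinued = sum(discontinued for discontinued, _ in dropouts.values())
```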
Table 3 shows response (remission) and relapse rates, which ranged from 21% for placebo to 41% for paroxetine. We have included in Table 3 a category of potential completers, to take into account 13 patients who dropped out of the study although their HAM-D scores were well within the responder range (HAM-D of 2 or 3). Some of them were discontinued because further blinded treatment was unavailable; for others, no clear explanation for discontinuation was given.
Discontinuations
During the continuation phase, 75 patients discontinued (30 from paroxetine, 27 from imipramine and 18 from placebo). The reasons for discontinuation are given in Table 4. Table 4 in RIAT Appendix 2 gives a breakdown of when these dropouts occurred.
Following a review of the codes given for reasons for withdrawal from the study that were found in the CSR (Appendix G), along with a review of patient narratives and CRFs where applicable, we proposed changes to these reasons for withdrawal in a proportion of those discontinued. These proposed changes can be found in Table 5 in RIAT Appendix 2.
Safety
Table 5 gives the number of adverse events reported by SKB in their continuation phase CSR. The report only provides data on events happening at a 5% or greater rate.
This can be contrasted with the data in Table 6, which presents all adverse events found in both CSR Appendix D and the CRFs, summarised by System Organ Class (SOC). In MedDRA, some adverse events always fall within a particular SOC; others require that the coder choose between SOCs. A full summary and full breakdown of adverse events can be found in Tables 1 and 2 in RIAT Appendix 2.
Severity ratings
Designating adverse events as serious hinged on the judgement of the clinical investigator. We are therefore not able to make comparable judgements of seriousness, but there are two other methods to approach the issue of severity of adverse events. One is to look at those rated as severe rather than moderate or mild at the time of the event (see Table 7).
A full breakdown of the severe adverse events within each SOC can be found in Table 3 of RIAT Appendix 2. A second method is to look at those that led to dropouts (Table 4).
Adverse events by exposure
As there were a large number of discontinuations in the continuation phase, a simple listing of adverse events may tell a different story from an analysis of these events in proportion to the duration of exposure.
Table 8 shows the weeks of exposure for each group in each phase. Tables 9 and 10 show the numbers of adverse events and severe adverse events in each phase, with the rates per 100 weeks exposure displayed in Figs. 2 and 3.
We have also looked at all behavioral adverse events in each phase (Table 11 and Fig. 4). These events include: agitation, aggravated depression, akathisia, abnormal dreams, depersonalisation, disinhibition, feelings of despair, hallucinations, impulsive behavior, negative thoughts, neurosis, paranoia, psychosis, suicide attempt, suicidal ideation, suicidal gesture, self-harm, and self-injury.
Table 12 gives the numbers of suicidal adverse events in each phase, with the rates per 100 weeks exposure displayed in Fig. 5. A full listing of all patients who experienced potentially suicidal events is presented in Table 13.
Adverse event profile of patients entering the continuation phase
Patients who completed acute and continuation phases may belong to different cohorts. Accordingly we analysed the acute-phase adverse event profiles of patients who entered the continuation phase, compared with those who did not, shown in Table 13. A full breakdown of all these adverse events can be found in Table 12 in RIAT Appendix 2.
Discussion
The original Study 329 investigators are to be commended for undertaking a study that included a continuation phase for the purposes of providing longitudinal data on the treatment of adolescents with major depression. As one of the few bodies of data offering information on longer term treatment of adolescents with mood disorders, the study data are of value.
We analysed and reported the continuation phase according to the original Study 329 protocol (with approved amendments). RIAT Appendix 1 shows the sources of information used in preparing this paper, which should aid other researchers who wish to access the data, either to check our analysis or to interrogate it in other ways. We draw minimal conclusions regarding efficacy and harms, inviting others to offer their own analysis.
The number of patients relapsing was designated a secondary outcome in Study 329. The results, however, remain unpublished. In our analysis, although we used more stringent criteria (i.e. remission) for response, we found higher rates of response in all three treatment groups than were reported in Keller et al. or in the text of the CSRs prepared by SKB, because we followed the classification in the CSR Appendix D, where response is based on HAM-D scores only, regardless of whether the patient or investigator violated the protocol.
Our analysis revealed higher relapse numbers in the active treatment groups than in the placebo group. This is in part determined by our decision to include in the relapse category patients who had a significant adverse event in the behavioral domain, but the higher numbers hold whether or not these patients are included.
Relapse was not a primary endpoint of the trial, and cannot be analysed in a way that would allow a definitive statement about rates of relapse compared to placebo. Furthermore it can be difficult to distinguish between apparent relapse and an adverse drug reaction, requiring caution in the case of patients who fail to respond to active treatment. Some of the patients in this study appear to have become paranoid or manic, or to have had a depressive relapse, all of which might lead to further diagnoses and/or prescriptions (a prescribing cascade) when in fact the wisest course of action might be to withdraw treatment.
The data on adverse events controlled for duration of exposure point to the taper phase as the riskiest period of treatment. It was difficult to be confident of the exact duration of exposure in the taper phase in some patients, but our estimates of duration are not likely to have inflated adverse event figures.
The CSR argued that simply looking at relapses is not a good way to establish long-term comparative efficacy (p. 000023). It proposed a randomized discontinuation design as the best way forward (p. 000023). However, the data from this study point to a discontinuation syndrome associated with paroxetine use. If this is the case, a randomized discontinuation design would not work, and we would be left with a more naturalistic option like the present study.
With regard to adverse events, the continuation phase of the study stands out as a phase where fewer adverse events either happened or were recorded. This to some extent is not surprising. It might be expected that the acute phase would weed out those patients not suited to the treatment they were on. But simple explanations like this may not fully account for the data, in that the patients entering the continuation phase appeared to have as many adverse events during the acute phase as those patients who did not opt to continue with treatment.
There are no other studies in this age group that we are aware of with which this study can be compared.
In our reporting of the acute phase of Study 329, we suggested that researchers and clinicians should recognise the potential biases in published research, including the potential barriers to accurate reporting of harms to which this study pointed. We also urged regulatory authorities to mandate access to trial data. This analysis of the continuation phase of Study 329 adds further weight to this recommendation.
It also adds weight to our invitation to others to access the data we have used. We are very clear that the analyses offered here are not the only ones possible. Our understanding of this dataset can only be enhanced by input from others who may make differing calls regarding coding and/or apply different analytic tools to the data.
Trial Registration
Registration number and name of trial register: SmithKline Beecham study 29060/329.
Trial Protocol
SmithKline Beecham study 29060/329, Final Clinical Report (Acute Phase) [5], Appendix A, Protocol (from p. 000531) [4].
Trial Funding
SmithKline Beecham study.
Ethical approval
“The protocol and statement of informed consent were approved by an Institutional Review Board (IRB) prior to each center’s initiation, in compliance with 21 United States Code of Federal Regulations (CFR) Part 56. Written informed consent was obtained from each patient prior to entry into the study, in compliance with 21 CFR Part 50. Case report forms were provided for each patient’s data to be recorded.” (Final Clinical Report page 000030). The sample informed consent is provided in Appendix C of the protocol (pp. 000590-000594). No further information is available regarding the particular IRB that approved the study.
Funding for RIAT reanalysis
No funding received.
Data analysis protocol for RIAT reanalysis
Submitted to GSK on 28 October 2013. Approved by GSK on 4 December 2013.
Authorship
All authors meet ICMJE authorship criteria. Conception/design of the work: Healy, Jureidini, Nardo. Acquisition of data: Jureidini (negotiation with GSK); Tufanaru and Abi-Jaoude (RIATAR); Nardo (efficacy data using GSK online remote system); Le Noury (harms data using GSK online remote system). Data analysis: Nardo (efficacy); Le Noury and Healy (harms). Data interpretation: all authors. Drafting the work and revising it critically for important intellectual content, final approval of the version to be published: all authors. Agreement to be accountable for all aspects of the work: all authors (guarantor Jureidini).
Appendices
RIAT Appendix 1 – RIAT audit record (RIATAR)
RIAT Appendix 2 – Adverse event tables
