Abstract
Randomized controlled trials (RCTs) of a new investigational drug often include active as well as placebo control arms. The active arm, which comprises an approved treatment for the indication under study, and the placebo arm are together required to establish assay sensitivity. If the active treatment outperforms placebo, as expected, the results of the RCT can be further interpreted; if the active treatment is no better than placebo (for example, because of ceiling or floor effects), the RCT is a failed trial. The concepts involved are explained from scientific and ethical perspectives.
Randomized controlled trials (RCTs) of new investigational drugs (NEW) sometimes have three treatment arms: NEW, active comparator (ACTIVE; this is an existing drug that has been approved for the indication under study), and placebo comparator (PLACEBO). In such RCTs, two questions arise: (a) Why is ACTIVE required; that is, why is it not sufficient to have just PLACEBO and examine whether NEW is more effective than PLACEBO? (b) Why is PLACEBO required; why is it not sufficient to show, for example, that NEW and ACTIVE do not differ significantly in efficacy, and so conclude that the two must be equal?
These questions are scientifically important. They are also often asked in postgraduate examinations, and by members of ethics committees who consider the administration of placebo to be unethical. Answering them requires an understanding of ceiling and floor effects.
Ceiling and Floor Effects
Imagine an examination that is so simple that everybody scores full marks; that is, everybody reaches the ceiling. In such an examination, this ceiling effect prevents us from discovering who the brightest student is, and who is the least bright. Likewise, imagine an RCT in which the patient population sampled is so responsive to treatment that most patients improve considerably even with placebo; that is, response to placebo nears the ceiling. In such an RCT, even if NEW patients reach the ceiling for efficacy, PLACEBO patients may be so close to this ceiling that there may not be sufficient difference between the groups for statistical significance to be identified. Thus, in an atypically responsive sample, the ceiling effect can prevent the identification of a possible genuine difference in efficacy between NEW and PLACEBO.
Next, imagine an examination that is so hard that everybody scores zero; that is, everybody is at the floor. In such an examination, this floor effect prevents us from discovering who the brightest student is, and who is the least bright. Likewise, imagine an RCT in which the patient population sampled is so treatment-resistant that most patients do not improve with any treatment. In such an RCT, NEW and PLACEBO patients improve negligibly, and there is so little difference in response between groups that they may not differ to a statistically significant extent. Thus, in an atypically refractory sample, the floor effect can prevent the identification of a possible genuine difference in efficacy between NEW and PLACEBO.
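The two effects described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a model of any real trial: the 0 to 100 improvement scale, the standard deviation of 10, the true drug effect of 10 points, and the names `simulate_arm` and `observed_difference` are all hypothetical. Scores are drawn from a normal distribution and then clipped to the bounds of the scale, so that a sample whose placebo response sits near either bound shows a compressed difference between NEW and PLACEBO.

```python
import random
from statistics import mean

# Assumed bounds of a hypothetical improvement scale (not from the article).
SCALE_MIN, SCALE_MAX = 0.0, 100.0


def simulate_arm(latent_mean, n, rng, sd=10.0):
    """Draw n latent improvement scores and clip them to the scale bounds."""
    return [min(SCALE_MAX, max(SCALE_MIN, rng.gauss(latent_mean, sd)))
            for _ in range(n)]


def observed_difference(placebo_mean, true_effect=10.0, n=2000, seed=1):
    """Return the observed NEW-minus-PLACEBO difference in mean score."""
    rng = random.Random(seed)
    placebo = simulate_arm(placebo_mean, n, rng)
    new = simulate_arm(placebo_mean + true_effect, n, rng)
    return mean(new) - mean(placebo)


# The same true effect (10 points) is applied in three kinds of sample.
typical = observed_difference(placebo_mean=40.0)      # far from both bounds
responsive = observed_difference(placebo_mean=98.0)   # placebo near the ceiling
refractory = observed_difference(placebo_mean=-15.0)  # latent response below the floor

print(f"typical sample:    {typical:.2f}")
print(f"responsive sample: {responsive:.2f}")
print(f"refractory sample: {refractory:.2f}")
```

In the typical sample the observed difference stays close to the true effect, whereas in the responsive and refractory samples much of the same true effect is hidden by the bounds of the scale.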
Need for an Active Comparator Group in RCTs
In RCTs designed with NEW, ACTIVE, and PLACEBO arms, NEW and ACTIVE are not intended to be statistically compared. This is because if NEW is truly effective, then NEW and ACTIVE would perhaps not differ much, and so a very large sample would be necessary to demonstrate superiority of one drug over the other, or noninferiority of NEW to ACTIVE. Such three-arm RCTs, rather, are powered only to detect superiority of NEW over PLACEBO, and ACTIVE is required to establish assay sensitivity.
What is assay sensitivity? If statistical analysis shows that NEW is not superior to PLACEBO but ACTIVE is superior to PLACEBO, we can conclude that ACTIVE is effective and that ceiling or floor effects are not responsible for the failure of NEW to outperform PLACEBO. However, if NEW and ACTIVE are both no better than PLACEBO, depending on how large the response is, we conclude that a floor effect (all groups respond uniformly poorly) or ceiling effect (all groups respond uniformly well) has scuttled the RCT; and because we expected ACTIVE to outperform PLACEBO and ACTIVE did not, we conclude that such a study failed assay sensitivity, and that this was a failed trial.
Need for a Placebo Comparator Group in RCTs
Imagine that NEW is actually
Another Way to Understand the Subject
If the sample is treatment-refractory, ACTIVE will perform poorly; so if NEW also performs poorly, there could be two explanations: (a) NEW is truly ineffective. (b) NEW may be effective but performed poorly because the sample was refractory (floor effect). Without a placebo group, we would not know which possibility is right.
If we do have a placebo group, we might find that ACTIVE is statistically superior to PLACEBO, and that NEW is also better than PLACEBO. So, NEW is effective. Or we may find that ACTIVE is better than PLACEBO but NEW is not better than PLACEBO. So, NEW is ineffective. Or we may find that ACTIVE is no better than PLACEBO, indicating that this is a failed trial; “failed,” because it fails to show the superiority of an approved treatment. In failed trials, we cannot comment on whether or not NEW is effective. These are the reasons why we ideally need both an active and a placebo comparator group. The active comparator group is for
Alternatively, if the sample is very responsive, ACTIVE may perform very well; so, if NEW performs as well, there could be two explanations: (a) the good response is because of a placebo effect and (b) the good response is because of true efficacy. Has the “ceiling effect” prevented us from discovering whether or not NEW is effective?
We would only know the answer if we had included a placebo arm. If PLACEBO does as well as ACTIVE, we would know that this is a failed trial and that a ceiling effect may explain the observed efficacy of NEW. As already stated, in a failed trial, we cannot comment on whether or not NEW is effective. So, once again, the need for PLACEBO is apparent, and the presence of ACTIVE helps determine assay sensitivity.
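The reasoning in this section can be summarized as a small decision rule. The sketch below is illustrative only; the function name `interpret_three_arm` and the wording of the returned labels are hypothetical. It follows the text in treating any trial in which ACTIVE fails to separate from PLACEBO as a failed trial, whatever NEW does.

```python
def interpret_three_arm(new_beats_placebo: bool, active_beats_placebo: bool) -> str:
    """Map the two placebo comparisons of a three-arm RCT to a conclusion."""
    if not active_beats_placebo:
        # Assay sensitivity was not established: a ceiling or floor effect
        # may be at work, so nothing can be said about NEW.
        return "failed trial"
    if new_beats_placebo:
        return "NEW is effective"
    return "NEW is ineffective"


# Enumerate the four possible patterns of results.
for new_result in (True, False):
    for active_result in (True, False):
        conclusion = interpret_three_arm(new_result, active_result)
        print(f"NEW > PLACEBO: {new_result!s:5}  "
              f"ACTIVE > PLACEBO: {active_result!s:5}  ->  {conclusion}")
```

Without a PLACEBO arm, neither input to this rule is available, which is why a comparison of NEW with ACTIVE alone cannot distinguish these outcomes.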
Parting Note
Why not design a noninferiority trial that compares just NEW and ACTIVE? In such a trial, there is no need for PLACEBO and hence no ethical objection to the study. The difficulty is that noninferiority RCTs require a very large sample size. If NEW is ineffective, this would mean exposing a large number of patients to an ineffective treatment, which is ethically unjustified. In contrast, a much smaller sample would be required to show that NEW and PLACEBO do not separate in a superiority trial, meaning that fewer patients would be exposed to ineffective treatment. So, if NEW is ineffective, inclusion of PLACEBO is actually ethically desirable! More to the point, because of the absence of a placebo group, we can never determine whether or not a noninferiority RCT was a failed trial. This defeats the purpose of research. Interested readers are referred to Miller¹ for further discussion on the subject.
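The difference in sample size between the two designs can be made concrete with the usual normal-approximation formula for comparing two means, n per arm = 2(z_alpha + z_beta)^2 (SD / delta)^2. The numbers below are assumptions chosen purely for illustration and do not come from the article: a standardized NEW versus PLACEBO difference of 0.5, a noninferiority margin of 0.2 standardized units with a true NEW versus ACTIVE difference of zero, one-sided alpha of 0.025, and 80% power. The function name `n_per_arm` is likewise hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(delta, sd=1.0, alpha_one_sided=0.025, power=0.80):
    """Approximate patients per arm for a two-group comparison of means.

    delta is the difference to be detected (superiority) or the
    noninferiority margin when the true difference is assumed to be zero.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha_one_sided)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# Assumed standardized difference between NEW and PLACEBO.
superiority_n = n_per_arm(delta=0.5)
# Assumed noninferiority margin for NEW versus ACTIVE.
noninferiority_n = n_per_arm(delta=0.2)

print(f"superiority vs PLACEBO:   {superiority_n} per arm")
print(f"noninferiority vs ACTIVE: {noninferiority_n} per arm")
```

Under these assumed inputs, the calculation gives 63 patients per arm for the superiority comparison and 393 per arm for the noninferiority comparison; the exact figures will change with the assumptions, but the required sample grows with the inverse square of the margin.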
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
