Abstract
Regulatory trials rigorously test the ability of a treatment to impact a known outcome, such as seizure frequency, as measured in well-controlled trials. Such outcomes are called efficacy outcomes. Sometimes these measured outcomes do not reflect the true effectiveness of the drug in the clinic. This article provides some examples of how this can happen and also discusses trials intended to measure effectiveness.
In the past 25 years, there has been an explosion in antiepileptic drug (AED) development. Three quarters of the drugs we now use regularly have been developed since 1990. Drugs developed and brought to market during this time are often called “second and third generation.” Whereas the older (first-generation) AEDs were not subjected to the rigorous Food and Drug Administration (FDA) standards that are currently in place, the second- and third-generation AEDs have been evaluated in rigorous placebo-controlled trials designed to satisfy regulators, including the FDA and the European Medicines Agency (EMA). These trials provide an initial confirmation that a new AED has a beneficial effect on seizures. However, even after the drug is considered acceptable for approval, many questions remain with respect to clinical use. This review will highlight what is known about a drug as a result of regulatory trials, what remains to be learned, and how additional clinically relevant information can be uncovered.
The Regulatory Path of a New AED
Typically, regulatory authorities require two well-controlled clinical trials to be performed before they will approve a drug for use. The first trials are typically performed in adults with treatment-resistant partial onset seizures. Often, once efficacy has been demonstrated in one indication, approval for further indications (for example, another age group or another epilepsy syndrome) will only require a single trial. These regulatory studies typically enroll treatment-resistant patients who are already on one to three antiepileptic drugs, not including a vagus nerve stimulator, and are having at least three to four countable partial seizures per month. A 6- to 12-week baseline precedes randomization. Patients are then randomized to add-on placebo versus one to three fixed doses of the investigational drug. There is typically a 1- to 4-week titration period, followed by a 12-week maintenance period (1).
Efficacy versus “Effectiveness”
Do regulatory trials inform us as to whether a drug works? In order to answer this question, it is important to understand the concepts of “efficacy” versus “effectiveness” (2). Efficacy is the term that is used to describe the improvement measured in controlled clinical trials. The term was coined to remind us that this “improvement” may or may not be a reflection of what happens in real life, in the clinic. Why should that be? There are several reasons. For example, “efficacy” is measured with respect to one or two prespecified outcome measures, in a very specific population, over a limited time frame. These factors (selected outcome measure, population, and time frame) can significantly impact the generalizability and usefulness of the information derived from trials. Effectiveness, in contrast, describes the ability of a drug to be useful in the clinic. Regulatory clinical trials are very good for determining efficacy (that is, after all, what these trials are for) but may miss the mark in determining effectiveness, which is what clinicians are most interested in. Several illustrative examples directly related to epilepsy trials are useful to demonstrate this point.
Prespecified Outcome Measure
In order to perform a methodologically sound trial, an outcome measure should be selected in advance, and in fact the regulatory authorities require this. This prerequisite prevents “cherry-picking” one positive outcome from many outcomes, which may be positive, negative, or mixed. This does not mean that regulatory authorities ignore outcomes other than the primary, but the primary outcome is the one that is scrutinized the most, and the one that is ultimately conveyed in promotional literature and the package insert provided to patients. So, for example, a drug may be shown to cause 35% of patients to experience a 50% reduction in the occurrence of partial-onset seizures, with a statistically significant difference from placebo. But this number (the drug's efficacy) can be misleading for a number of reasons. Take, for example, the add-on trial in refractory partial-onset seizures performed to obtain regulatory approval for oxcarbazepine (3). In this trial, 50% of patients on the highest dose, 2,400 mg, had more than 50% reduction in seizure frequency compared with 13% for placebo, 27% for 600 mg, and 42% for 1,200 mg. This makes 2,400 mg the most efficacious dose. But other endpoints that were measured in the trial make it unlikely that this was the most effective dose. For example, the discontinuation rate was the highest for 2,400 mg. In fact, more patients discontinued (66.7%) at this dose than had a 50% reduction. This is possible because even patients who discontinued could have had a reduction in seizures, albeit short-lived, as a result of drug administration. Similarly, in the trials performed for regulatory approval of tiagabine, the most efficacious dose for partial-onset seizures in two placebo-controlled add-on studies was 56 mg. Overall, efficacy increased with increasing dose (4).
But this measured outcome does not tell the whole story, and in the case of tiagabine, part of the story may derive from an outcome measure that was never even evaluated in the pivotal trial. A number of years after the drug was approved, an analysis was undertaken looking at the number of patients who worsen when randomized to a new AED in comparison with placebo (5). It was presumed that since all the drugs examined in the study had been efficacious, fewer patients would worsen when randomized to active drug than those who were randomized to placebo, and this was correct for the other drugs examined (topiramate, gabapentin, and levetiracetam) but not for tiagabine. In this pooled analysis, the odds of at least a doubling of seizures increased by a factor of 1.4 (95% confidence interval [CI] 1.0 to 2.0, p < 0.05) for each increase in dose level above placebo (5). At the highest dose of 48 to 56 mg, 8.9% of patients had a doubling or more, compared with only 3.3% on placebo (5). So, while tiagabine made some patients better, it made others worse. This likely would impact the clinical effectiveness of the drug.
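To make the size of this worsening signal concrete, the two proportions quoted above (8.9% on the highest tiagabine dose versus 3.3% on placebo) can be converted into a few standard effect measures. The following is only a minimal sketch: the function names are illustrative, the calculation is a crude, unadjusted comparison of two percentages rather than the pooled model used in reference 5, and the number of dose levels passed to the compounding helper is a hypothetical input that the text does not specify.

```python
def odds(p):
    """Convert a proportion (0 < p < 1) into odds."""
    return p / (1.0 - p)


def compounded_odds_ratio(per_level_or, levels):
    """Odds ratio implied by a constant per-dose-level odds ratio.

    `levels` is a hypothetical input: the text does not state how many
    dose levels separate placebo from the highest dose.
    """
    return per_level_or ** levels


# Proportions of patients with at least a doubling of seizures,
# as quoted in the text (reference 5).
p_high_dose = 0.089  # tiagabine 48 to 56 mg
p_placebo = 0.033

risk_difference = p_high_dose - p_placebo               # absolute excess risk
relative_risk = p_high_dose / p_placebo                 # ratio of proportions
crude_odds_ratio = odds(p_high_dose) / odds(p_placebo)  # unadjusted odds ratio

print(f"Risk difference:  {risk_difference:.3f}")
print(f"Relative risk:    {relative_risk:.2f}")
print(f"Crude odds ratio: {crude_odds_ratio:.2f}")

# Per-level odds ratio of 1.4 (from the text), compounded over an
# assumed number of dose steps, shown for illustration only.
for assumed_levels in (1, 2, 3):
    print(assumed_levels, round(compounded_odds_ratio(1.4, assumed_levels), 2))
```

On these figures, roughly 5 to 6 additional patients per 100 would experience a doubling of seizures at the highest dose relative to placebo. This is an outcome that a responder-rate endpoint alone would not reveal.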
Population
Populations who are randomized in regulatory trials are a very select group and differ from the population as a whole. Their epilepsy is quite severe (trials for an indication in add-on partial seizures typically require 3–4 seizures/month), and they have typically failed many drugs before resorting to an investigational agent. Significant psychiatric illness, medical illness, or both often are exclusion criteria. The very old and very young are also excluded. These restrictions limit the generalizability of the trials. In most cases, we have no way of knowing whether patients who do not fit these constraints would fare similarly. Also, to date, regulatory trials have focused on populations that are easier to recruit into trials, namely, treatment-resistant partial-onset seizures, treatment-resistant generalized tonic-clonic seizures, and seizures associated with the Lennox-Gastaut syndrome. The effectiveness (or, for that matter, efficacy) of the drugs in other syndromes is often left to determination by trial and error in the clinic.
Time
Regulatory trials typically compare treatment arms for only 12 weeks, since patients cannot be maintained on placebo in a blinded trial for a longer time. There is great debate about whether this is really long enough to determine if a drug will work in the clinic. In fact, the difference between drug and placebo is often greatest in the first month of the trial, and by the third month, the responses in the placebo group and the treatment group are becoming more similar. Is it possible that, given enough time, the two lines would merge? Alternatively, it is possible that the treatment may become more effective over time (as has been suggested for the device therapies, including vagus nerve stimulation, deep brain stimulation, and cortical responsive stimulation) (6–8). If either is the case, we will have a great deal of trouble proving it because the control group is available for only a short time. Thus, for all we know, the efficacy we measure (over the short term) may be driving us to select drugs that are strong at the outset with no staying power, and to reject drugs that might be better in the long run.
If determination of efficacy may be misleading, why measure it at all? Why not measure effectiveness from the get-go? Unfortunately it is not that simple. There is an inherent tension between internal and external validity of a trial. By using randomization and an internal control such as placebo, one is able to limit the likelihood that chance, bias, or both have influenced trial outcome. This enhances the likelihood that results are “true” (or valid) and are definitely related to the intervention (the investigational drug). Yet, the very act of controlling and randomizing the study creates an artificiality that may limit the ability to generalize the results to a larger set of circumstances (external validity).
A number of trials have been performed specifically to assess the effectiveness of drugs. Among the most well-known are the Standard and New Antiepileptic Drugs (SANAD) studies. SANAD A compared the standard drug carbamazepine with the newer drugs topiramate, gabapentin, lamotrigine, and oxcarbazepine in patients thought to have partial-onset seizures (9). SANAD B compared the standard drug valproic acid with the newer drugs lamotrigine and topiramate in patients primarily thought to have idiopathic generalized epilepsy (10). The SANAD studies addressed many of the concerns mentioned above. The study was quite long, and outcome could be examined in many cases over years, not months. The population was varied with respect to age and other characteristics. Since patients were enrolled at the time of diagnosis, some were treatment sensitive and some ultimately were resistant to treatment. There were several outcome measures, but the primary outcomes were time to treatment failure and time to 1-year seizure remission. Of note, the SANAD study would not be considered as regulatory evidence of drug efficacy since, for pragmatic reasons (difficulty of masking so many medications, inability to adjust blinded doses), the study was randomized but not blinded. In addition, no placebo arm was included since the treatments were prescribed in monotherapy. In SANAD A, lamotrigine was determined to be more effective than carbamazepine, not due to superior efficacy (they were about the same), but due to a lower likelihood of drop-outs because of side effects. Of note, the most common adverse event associated with treatment failure was rash, and this occurred in 7% of patients allocated to carbamazepine and accounted for 21% of carbamazepine treatment failures. This result compared with a failure rate for rash on lamotrigine of 3%.
Rash related to carbamazepine may be associated with both choice of titration rates and dose; and, in the interest of simulating the “real world,” this was left to the physician's discretion. This information brings into question the certainty of selecting a “drug of choice” based on an effectiveness measure that includes tolerability issues, as tolerability may in fact be improved by altering how the drug is administered. Another example of how using a combined efficacy/tolerability measure can muddy the picture is a Veterans Affairs (VA) cooperative study that compared 600 mg of immediate-release carbamazepine with 150 mg of lamotrigine and 1500 mg of gabapentin in elderly patients with newly diagnosed seizures. Patients assigned to carbamazepine had a longer time to first seizure but were also more likely to drop out than patients in the other arms, and thus carbamazepine was declared the “loser.” But it is likely that many drop-outs might have been avoided with the use of sustained-release carbamazepine, or titration to a lower target dose (11, 12). These are examples of how, similar to measurement of efficacy, measurement of effectiveness may present issues of interpretation.
The regulatory agencies have drawn their line in the sand, stating their preference for a result that is indisputably “true,” even if it is not particularly clinically informative. And there is some merit to this perspective. One has to begin with a clear demonstration that a drug works at least in one circumstance. For example, in the SANAD study, since there was no substantial difference in the seizure outcomes for carbamazepine and lamotrigine, it could be argued either that neither worked or that both worked equally well. It is for this reason that regulators in the United States do not consider a “no-difference” outcome in a comparative study to be informative.
In summary, one could argue that neither “efficacy” nor “effectiveness” trials alone may be able to inform us about whether drugs are able to improve the symptoms of epilepsy in the clinic, that is, whether they truly “work” in all senses of the word. Relying on efficacy studies alone would be unsatisfying, as they bear so little relationship to most populations and situations that doctors deal with from day to day. Yet, they play a role in the overall determination of whether drugs “work.” It would be inherently dangerous to perform studies of clinical utility (effectiveness) without proof of efficacy first, so that we don't end up marketing placebo treatments. Evaluating data from both types of trials may eventually provide us with the best information upon which to make treatment decisions.
Highlights
The reduction in seizures relative to placebo measured in randomized, controlled regulatory trials may not always reflect the usefulness of a drug in the clinic.
The population that is enrolled in regulatory trials may not reflect the variety of patients seen in the clinic, and therefore trial results may not be generalizable.
Clinical regulatory trials give a snapshot of treatment effects over a relatively short duration (several months). Although we presume this will reflect longer term effects, this may not always be the case.
Results from “efficacy” (more artificial, more controlled) and “effectiveness” (more real-world, less controlled) trials need to be combined to see a true picture of treatment effects.
