Detecting meaningful changes in trials of headache treatments: Which outcome measure is best?

Abstract

A major premise of clinical drug trials in headache is that the outcome measures they use are valid and reliable; in other words that they closely and reproducibly gauge the burden of headache and associated symptoms. A third desirable psychometric characteristic is ‘responsiveness’, the ability of a measure to detect clinically meaningful changes in headache symptoms that occur as a result of treatment. These attributes, along with clinical examples and explanations, are listed in Table 1.

Table 1.

Selected psychometric attributes of outcome measures in clinical trials of treatments for acute headache.

Attribute	Definition	Example
Reliability	Is the measure reproducible?	When assessed at different times, do headache patients with steady pain levels provide the same or very similar ratings of pain intensity?
Responsiveness	Does it alter in response to clinically meaningful (relevant) changes in headache?	When patients report that a medication has produced a meaningful change in their headache, does the outcome measure reflect this change?
Validity	Does the question measure the aspect of headache it is intended to measure?	Do patients with headache agree that a 0–4 rating scale can represent headache pain, and agree that higher numbers are associated with higher pain levels? Do the 0–4 pain ratings correlate with other established, graded measures of pain, such as a visual analogue scale?

In this issue of Cephalalgia, Aicher et al. retrospectively analyze data from a large German study to examine the responsiveness of a variety of efficacy measures commonly used in headache research (1). This study was well conducted, and the analyses the authors present meet the stringent criteria proposed in the COSMIN checklist for evaluating the methodological quality of studies on measurement properties of health status measurement instruments (2). In an open pre-phase portion of the trial, 1734 subjects with tension-type or migraine headaches treated a single headache with their usual non-prescription medication. They were then randomized to one of six medication groups for blinded treatment of a subsequent headache. The groups received either the fixed combination of acetylsalicylic acid, paracetamol and caffeine or the combination without caffeine, the single drugs, or placebo.

The outcome measures listed in Table 2 were examined in both trial phases and compared to the subject’s global assessment of drug efficacy, which was selected by the researchers as a ‘gold standard’ measure. This global assessment was obtained within 12 hours of taking study medication by asking subjects ‘How do you assess the efficacy of your tablets?’ Possible answers were 1 (very good); 2 (good); 3 (less good) or 4 (poor). For the purposes of the responsiveness analysis, subjects rating the efficacy of their treatment as 1 or 2 were considered to be ‘satisfied’ and those rating the efficacy of their treatment as 3 or 4 were considered ‘non-satisfied’. This dichotomization allowed the researchers to evaluate responsiveness by generating receiver operating characteristic (ROC) curves, a method commonly used in studies of diagnostic accuracy. The cut-off points for each measure were chosen to balance sensitivity and specificity. The area under the curve (AUC) for each continuous measure quantifies its ability to discriminate between satisfied and non-satisfied subjects. Put another way, the AUC is an estimate of the probability that a randomly chosen ‘satisfied’ subject has a more favorable score on that measure than a randomly chosen ‘non-satisfied’ subject. For the non-continuous measure of ‘pain-free’, only a single pair of sensitivity and specificity values can be determined.
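To make the ROC approach concrete, the following is a minimal Python sketch, not the analysis code used by Aicher et al. It computes an AUC by comparing every satisfied subject with every non-satisfied subject, and then picks a cut-off. The scores are invented for illustration, and the use of Youden's index (sensitivity + specificity - 1) as the way to "balance" sensitivity and specificity is an assumption; the editorial does not say which criterion the authors applied. Higher scores are assumed to be better, as with %SPID; for time-to-relief measures the comparisons would be reversed.

```python
from typing import List, Tuple


def auc(satisfied: List[float], non_satisfied: List[float]) -> float:
    """Share of (satisfied, non-satisfied) pairs in which the satisfied
    subject has the higher score; ties count as one half."""
    wins = 0.0
    for s in satisfied:
        for n in non_satisfied:
            if s > n:
                wins += 1.0
            elif s == n:
                wins += 0.5
    return wins / (len(satisfied) * len(non_satisfied))


def best_cutoff(satisfied: List[float],
                non_satisfied: List[float]) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) maximizing Youden's index.

    A subject is predicted 'satisfied' when score >= cutoff.
    Youden's index is an assumed balancing criterion, not the authors'.
    """
    best = None
    for c in sorted(set(satisfied) | set(non_satisfied)):
        sens = sum(s >= c for s in satisfied) / len(satisfied)
        spec = sum(n < c for n in non_satisfied) / len(non_satisfied)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical %SPID-like scores (higher = more relief); not trial data.
satisfied_scores = [80.0, 72.0, 65.0, 90.0, 55.0, 61.0]
non_satisfied_scores = [30.0, 45.0, 62.0, 20.0, 58.0]

area = auc(satisfied_scores, non_satisfied_scores)
cutoff, sensitivity, specificity = best_cutoff(satisfied_scores,
                                               non_satisfied_scores)
print(f"AUC={area:.2f} cutoff={cutoff} "
      f"sens={sensitivity:.2f} spec={specificity:.2f}")
```

A dichotomous measure such as ‘pain-free at 2 hours’ has only one possible cut-off, which is why it yields a single sensitivity and specificity pair rather than a full curve.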

Table 2.

Outcome measures evaluated against the gold standard of retrospective satisfaction with efficacy.

Outcome measure	Method of determination
Time from treatment to 50% pain relief	Derived from ratings on a 0–100 mm visual analogue scale, assessed at 30 minutes, 1, 2, 3 and 4 hours
Time from treatment until reduction of pain intensity to 10 mm	Derived from ratings on a 0–100 mm visual analogue scale, assessed at 30 minutes, 1, 2, 3 and 4 hours
Weighted sum of pain intensity difference (%SPID)	Based on visual analogue scale assessment of the maximum extent of pain relief an individual patient could achieve; actual pain relief then expressed as a percentage of the maximum achievable pain intensity difference
Pain intensity difference at 2 hours	Difference between initial pain and pain at 2 hours as measured in millimeters on the visual analogue scale
Proportion of subjects pain-free at 2 hours after treatment	Patient report of no pain

Results showed that all of the evaluated outcomes were correlated with the subject’s global assessment of efficacy. Subjects who were non-satisfied tended to be those who had little pain relief, while those who were satisfied were more likely to report substantial and more rapid pain relief. Furthermore, when subjects were disaggregated into four groups based on their original efficacy ratings of poor, less good, good and very good, there were step-wise increases in satisfaction that correlated with the magnitude and speed of relief. The AUCs for the different outcomes, however, were remarkably similar within and between the two phases of the trial, ranging from 0.77–0.86 in the pre-phase and 0.84–0.89 in the treatment phase. In both the pre-phase and treatment phases of the trial, the weighted sum of pain intensity difference (%SPID) performed slightly better than the other end-points. This advantage was slim, though, and the authors wisely do not make much of this small variation. They conclude that the optimal cut-off for distinguishing between satisfied and non-satisfied subjects is a time to 50% pain relief between 70 and 90 minutes, a time to pain intensity reduction to 10 mm of just less than 3 hours, and a %SPID in the 60% range. They note that the outcome recommended by the International Headache Society Committee on Clinical Trials, pain-free at 2 hours, has a sensitivity in the low 60% range with a specificity in the 80% range, and observe that it makes ‘less use of the available information compared to the endpoint of time to 50% pain reduction’.

The results of this study are in line with previous evidence showing that patients value speedy, substantial reductions in pain (3,4). Based on these findings, the answer to the question ‘Which outcome measure is best at detecting change in treatment trials of acute headache?’ is ‘Actually, they are all pretty good – at least when judged by the relatively crude standard of satisfied or non-satisfied’. Even the best measure, however, misclassified almost 10% of subjects. Why is it that some subjects who had substantial or rapid relief of pain were not satisfied? There are several possible explanations. One is that although subjects were asked to provide a global assessment of efficacy not including adverse effects, some might have had difficulty separating the two ideas. Thus, even if pain relief goals were met, a subject who experienced unpleasant side effects from treatment might still report poor satisfaction. Another is that the global efficacy rating was obtained retrospectively, and subject recall of earlier events may be inaccurate or distorted. Finally, pain is only one unpleasant component of headache for many patients, who may also suffer from nausea, photo- or phonophobia. While for some subjects pain relief may be the most important determinant of efficacy, for others it might be improvement in other aspects of suffering. Unfortunately, this cannot be determined because all of the outcome measures evaluated in this study had to do with pain rather than the associated features of headache.

This study is therefore uninformative about the extent to which drug efficacy for individual non-pain components of headache might contribute to satisfaction. It would be necessary to study their contribution if these outcomes are ever intended for use to support a drug labeling claim, because FDA guidance for industry regarding patient-reported outcome measures states that ‘… if improvement in a score for a general concept (e.g., symptoms associated with a certain condition) is driven by a single responsive item (e.g., pain intensity improvement) whereas other important items (e.g., other symptoms) did not show a response, a general claim about the general concept (e.g., improvements in symptoms associated with the condition) cannot be supported’ (5).

The authors provide a convincing argument for the use of global efficacy ratings as a gold standard for the evaluation of treatment of acute headache. As a patient-reported outcome, this has face validity and is simple to measure. A global efficacy rating, however, does not incorporate information about adverse events. In clinical practice tolerability may play an important role in treatment satisfaction and adherence. This might not matter when testing a relatively tolerable non-prescription medication with minimal side effects, as in this study. Previous research about the importance of treatment side effects to migraine patients is mixed (6,7). Because of this, caution is warranted in assuming that the good performance of the efficacy-only outcome measure in this study is generalizable to studies of prescription medications with more burdensome side effects.

The binary nature of this outcome measure is another weakness: although few would dispute that it is clinically meaningful to move from the non-satisfied to the satisfied category, smaller changes within these categories might also be perceived as beneficial and influence treatment. A further limitation is that outcomes were assessed at the defined times of 30 minutes, 1, 2, 3 and 4 hours, rather than with a stopwatch or other continuous timing method, which introduces imprecision into the time-based results.

It is illuminating to consider the findings of this study in the context of a similar study by Friedman and colleagues, who assessed the properties of outcome measures in trials of emergency department treatment for acute migraine (7). The researchers in that case also chose a patient-reported global outcome measure as their gold standard criterion. A difference from the present study was that their measure of ‘would take again’ was a composite appraisal of both efficacy and side effects. This they defended as ‘a simple, dichotomous, clinically sensible outcome, which allows migraineurs to factor important intangibles of efficacy and adverse effects of treatment into an overall assessment of care’. Instead of using ROC methodology, the authors reported odds ratios with 95% confidence intervals for the association between each outcome measure and the gold standard criterion. Their results showed that most traditional outcome measures, such as pain-free at 2 hours, or sustained pain-free status, were modestly associated with the gold standard criterion, but even the best still incorrectly classified about 20% of subjects. The conclusions of Friedman et al. provide a caution relevant to the present study, namely that ‘… measuring pain alone, functionality alone, and certainly other migraine symptoms or adverse effects alone does not adequately summarize a patient’s experience with the migraine medication … Migraine clinical trials that focus exclusively on improvement in a pain intensity scale may not be measuring the most clinically relevant outcome’ (7).
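For readers less familiar with the odds-ratio approach, here is a small Python sketch of how an odds ratio, a 95% confidence interval, and a misclassification rate can be derived from a 2×2 table of a dichotomous outcome against a dichotomous gold standard. The counts are invented and are not taken from Friedman et al. The Wald (log-based) interval is an assumed method; the editorial does not state how the original intervals were calculated.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio with a Wald confidence interval from a 2x2 table.

    a: outcome met,     would take again
    b: outcome met,     would not take again
    c: outcome not met, would take again
    d: outcome not met, would not take again
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def misclassified_fraction(a: int, b: int, c: int, d: int) -> float:
    """Share of subjects whose outcome disagrees with the gold standard."""
    return (b + c) / (a + b + c + d)


# Hypothetical counts for illustration only.
a, b, c, d = 40, 10, 10, 40
odds_ratio, lower, upper = odds_ratio_ci(a, b, c, d)
error_rate = misclassified_fraction(a, b, c, d)
print(f"OR={odds_ratio:.1f} (95% CI {lower:.1f} to {upper:.1f}), "
      f"misclassified={error_rate:.0%}")
```

The example shows that a strong association (a large odds ratio) can coexist with a non-trivial share of misclassified subjects, which is the pattern described above.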

There is a paucity of research examining the psychometric properties of outcome measurements in headache. The study by Aicher and colleagues adds importantly to our understanding of the way in which changes in the clinical status of headache patients influence specific outcome measures. The authors are certainly correct to call for more attention to the construct of responsiveness by the International Headache Society Committee on Clinical Trials. Additional research is needed to see if the findings from this study of a tolerable non-prescription drug in a mixed population of non-treatment-seeking headache sufferers will apply in other populations and settings. As well as the use of ROC curves, other indices of responsiveness might be examined, such as standardized response means or the correlation between effect sizes of different measures.
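As an illustration of one of these alternative indices, the standardized response mean is conventionally computed as the mean change in a measure divided by the standard deviation of that change. The Python sketch below applies this definition to invented visual analogue scale ratings; the values and variable names are hypothetical and do not come from any of the studies discussed.

```python
from statistics import mean, stdev
from typing import List


def standardized_response_mean(baseline: List[float],
                               follow_up: List[float]) -> float:
    """Mean change divided by the sample standard deviation of the change.

    Change is defined as baseline minus follow-up, so a fall in pain
    gives a positive value.
    """
    changes = [b - f for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical pain ratings in mm on a 0-100 mm visual analogue scale.
pain_at_baseline = [60.0, 70.0, 55.0, 80.0, 65.0]
pain_at_2_hours = [20.0, 40.0, 30.0, 35.0, 25.0]

srm = standardized_response_mean(pain_at_baseline, pain_at_2_hours)
print(f"SRM = {srm:.2f}")
```

Unlike the ROC approach, this index needs no external gold standard, but it also does not indicate whether the measured change is one that patients regard as meaningful.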

References

1. Aicher, Peil, Diener H-C. Responsiveness of efficacy endpoints in clinical trials with over the counter analgesics for headache. Cephalalgia 2012; 32(13): 953–962 (this issue).

2. Mokkink, Terwee, Patrick. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res 2010; 19: 539–549.

3. Lipton, Hamelsky, Dayno. What do patients with migraine want from acute migraine treatment? Headache 2002; 42(Suppl 1): 3–9.

4. Davies, Santanello, Lipton. Determinants of patient satisfaction with migraine therapy. Cephalalgia 2000; 20: 554–560.

5. Food and Drug Administration. Draft Guidance for Industry on Patient-Reported Outcome Measures: Use in Medicinal Product Development to Support Labeling Claims. Federal Register 2006; 71: 5862–5863.

6. Gallagher, Kunkel. Migraine medication attributes important for patient compliance: concern about side effects may delay treatment. Headache 2003; 43: 36–43.

7. Friedman, Bijur, Lipton. Standardizing emergency department-based migraine research: an analysis of commonly used clinical trial outcome measures. Acad Emerg Med 2010; 17: 72–79.