Abstract

Selected psychometric attributes of outcome measures in clinical trials of treatments for acute headache.
In this issue of Cephalalgia, Aicher et al. retrospectively analyze data from a large German study to examine the responsiveness of a variety of efficacy measures commonly used in headache research (1). This study was well conducted, and the analyses they present meet the stringent criteria proposed in the COSMIN checklist to evaluate the methodological quality of studies on measurement properties of health status measurement instruments (2). In an open pre-phase portion of the trial, 1734 subjects with tension-type or migraine headaches treated a single headache with their usual non-prescription medication. They were then randomized to one of six medication groups for blinded treatment of a subsequent headache. The groups received either the fixed combination of acetylsalicylic acid, paracetamol and caffeine or the combination without caffeine, the single drugs, or placebo.
Outcome measures evaluated against the gold standard of retrospective satisfaction with efficacy.
Results showed that all of the evaluated outcomes were correlated with the subject’s global assessment of efficacy. Subjects who were non-satisfied tended to be those who had little pain relief, while those who were satisfied were more likely to report substantial and more rapid pain relief. Furthermore, when subjects were disaggregated into four groups based on their original efficacy ratings of poor, less good, good and very good, there were step-wise increases in satisfaction that correlated with the magnitude and speed of relief. The areas under the receiver operating characteristic (ROC) curve (AUCs) for the different outcomes, however, were remarkably similar within and between the two phases of the trial, ranging from 0.77 to 0.86 in the pre-phase and from 0.84 to 0.89 in the treatment phase. In both the pre-phase and the treatment phase of the trial, the weighted sum of pain intensity difference (%SPID) performed slightly better than the other end-points. This advantage was slim, though, and the authors wisely do not make much of this small variation. They conclude that the optimal cut-off for distinguishing between satisfied and non-satisfied subjects is a time to 50% pain relief between 70 and 90 minutes, a time to pain intensity reduction to 10 mm of just less than 3 hours, and a %SPID in the 60% range. They note that the outcome recommended by the International Headache Society Committee on Clinical Trials, pain-free at 2 hours, has a sensitivity in the low 60% range with a specificity in the 80% range, and observe that it makes ‘less use of the available information compared to the endpoint of time to 50% pain reduction’.
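For readers less familiar with these quantities, the following minimal sketch shows how an AUC and a cut-off with its sensitivity and specificity can be derived for one continuous outcome judged against a binary satisfied/non-satisfied criterion. The data are invented for illustration and are not taken from the trial. The choice of Youden's index (sensitivity + specificity − 1) to pick the cut-off is an assumption made here; the text above does not state which criterion the authors applied.

```python
# Hypothetical data for illustration only; not taken from the trial.
# One %SPID value per subject and that subject's global rating
# (1 = satisfied, 0 = non-satisfied).
spid_percent = [10, 20, 35, 40, 55, 60, 65, 70, 80, 90]
satisfied = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]


def roc_auc(scores, labels):
    """AUC as the probability that a randomly chosen satisfied subject
    has a higher score than a randomly chosen non-satisfied subject
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.
    A subject is classified as satisfied when score >= cutoff.
    Using Youden's J is an assumption of this sketch."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_j, best = -1.0, None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1.0
        if j > best_j:
            best_j, best = j, (cutoff, sens, spec)
    return best


auc = roc_auc(spid_percent, satisfied)
cutoff, sensitivity, specificity = youden_cutoff(spid_percent, satisfied)
print(f"AUC = {auc:.3f}")
print(f"cut-off = {cutoff}%, sensitivity = {sensitivity:.2f}, "
      f"specificity = {specificity:.2f}")
```

A dichotomous end-point such as pain-free at 2 hours yields only a single sensitivity/specificity pair, whereas a continuous end-point can be evaluated at every possible cut-off, which is one way to read the authors' remark about making use of the available information.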
The results of this study are in line with previous evidence showing that patients value speedy, substantial reductions in pain (3,4). Based on these findings, the answer to the question ‘Which outcome measure is best at detecting change in treatment trials of acute headache?’ is ‘Actually, they are all pretty good – at least when judged by the relatively crude standard of satisfied or non-satisfied’. Even the best measure, however, misclassified almost 10% of subjects. Why is it that some subjects who had substantial or rapid relief of pain were not satisfied? There are several possible explanations. One is that although subjects were asked to provide a global assessment of efficacy not including adverse effects, some might have had difficulty separating the two ideas. Thus, even if pain relief goals were met, a subject who experienced unpleasant side effects from treatment might still report poor satisfaction. Another is that the global efficacy rating was obtained retrospectively, and subject recall of earlier events may be inaccurate or distorted. Finally, pain is only one unpleasant component of headache for many patients, who may also suffer from nausea, photo- or phonophobia. While for some subjects pain relief may be the most important determinant of efficacy, for others it might be improvement in other aspects of suffering. Unfortunately, this cannot be determined because all of the outcome measures evaluated in this study had to do with pain rather than the associated features of headache.
This study is therefore uninformative about the extent to which drug efficacy for individual non-pain components of headache might contribute to satisfaction. It would be necessary to study their contribution if these outcomes are ever intended for use to support a drug labeling claim, because FDA guidance for industry regarding patient-reported outcome measures states that ‘… if improvement in a score for a general concept (e.g., symptoms associated with a certain condition) is driven by a single responsive item (e.g., pain intensity improvement) whereas other important items (e.g., other symptoms) did not show a response, a general claim about the general concept (e.g., improvements in symptoms associated with the condition) cannot be supported’ (5).
The authors provide a convincing argument for the use of global efficacy ratings as a gold standard for the evaluation of treatment of acute headache. As a patient-reported outcome, this has face validity and is simple to measure. A global efficacy rating, however, does not incorporate information about adverse events. In clinical practice tolerability may play an important role in treatment satisfaction and adherence. This might not matter when testing a relatively tolerable non-prescription medication with minimal side effects, as in this study. Previous research about the importance of treatment side effects to migraine patients is mixed (6,7). Because of this, caution is warranted in assuming that the good performance of the efficacy-only outcome measure in this study is generalizable to studies of prescription medications with more burdensome side effects.
The binary nature of this outcome measure is another weakness: although few would dispute that it is clinically meaningful to move from the non-satisfied to the satisfied category, smaller changes within these categories might also be perceived as beneficial and influence treatment. In addition, outcomes were assessed at defined times of 30 minutes and 1, 2, 3 and 4 hours, rather than through the use of a stopwatch or other methods, thus introducing imprecision into the results.
It is illuminating to consider the findings of this study in the context of a similar study by Friedman and colleagues, who assessed the properties of outcome measures in trials of emergency department treatment for acute migraine (7). The researchers in that case also chose a patient-reported global outcome measure as their gold standard criterion. A difference from the present study was that their measure of ‘would take again’ was a composite appraisal of both efficacy and side effects. This they defended as ‘a simple, dichotomous, clinically sensible outcome, which allows migraineurs to factor important intangibles of efficacy and adverse effects of treatment into an overall assessment of care’. Instead of using ROC methodology, the authors reported odds ratios with 95% confidence intervals for the association between each outcome measure and the gold standard criterion. Their results showed that most traditional outcome measures, such as pain-free at 2 hours, or sustained pain-free status, were modestly associated with the gold standard criterion, but even the best still incorrectly classified about 20% of subjects. The conclusions of Friedman et al. provide a caution relevant to the present study, namely that ‘… measuring pain alone, functionality alone, and certainly other migraine symptoms or adverse effects alone does not adequately summarize a patient’s experience with the migraine medication … Migraine clinical trials that focus exclusively on improvement in a pain intensity scale may not be measuring the most clinically relevant outcome’ (7).
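The odds-ratio approach described above can be illustrated with a small sketch for a single dichotomous outcome cross-tabulated against a dichotomous criterion. The counts are invented for illustration and do not come from either study. The confidence interval uses the standard log (Woolf) method, which is an assumption here, since the text does not say how the intervals were computed.

```python
import math

# Hypothetical 2x2 counts for illustration only; not from either study.
a = 40  # pain-free at 2 hours, would take again
b = 10  # pain-free at 2 hours, would not take again
c = 30  # not pain-free at 2 hours, would take again
d = 20  # not pain-free at 2 hours, would not take again


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate 95% confidence interval,
    computed on the log scale (Woolf method; an assumption here)."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def misclassified_fraction(a, b, c, d):
    """Share of subjects for whom the outcome and the criterion
    disagree (the two off-diagonal cells of the table)."""
    return (b + c) / (a + b + c + d)


odds_ratio, lower, upper = odds_ratio_ci(a, b, c, d)
misclassified = misclassified_fraction(a, b, c, d)
print(f"OR = {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
print(f"misclassified = {misclassified:.0%}")
```

The sketch also makes the point that an outcome can be significantly associated with the criterion while still misclassifying a sizeable share of subjects.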
There is a paucity of research examining the psychometric properties of outcome measurements in headache. The study by Aicher and colleagues adds importantly to our understanding of the way in which changes in the clinical status of headache patients influence specific outcome measures. The authors are certainly correct to call for more attention to the construct of responsiveness by the International Headache Society Committee on Clinical Trials. Additional research is needed to see if the findings from this study of a tolerable non-prescription drug in a mixed population of non-treatment-seeking headache sufferers will apply in other populations and settings. In addition to ROC curves, other indices of responsiveness might be examined, such as standardized response means or the correlation between effect sizes of different measures.
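As a brief illustration of one such index, the standardized response mean is the mean change in a measure divided by the standard deviation of that change. The sketch below uses invented pain intensity scores (mm on a visual analogue scale) before and after treatment; the values and variable names are hypothetical.

```python
import statistics

# Hypothetical pain intensity scores (mm) for illustration only.
baseline = [60, 70, 55, 80, 65]
follow_up = [30, 50, 45, 40, 35]


def standardized_response_mean(before, after):
    """Mean change divided by the sample standard deviation of change.
    Change is defined as before minus after, so that a reduction in
    pain gives a positive value."""
    changes = [b - f for b, f in zip(before, after)]
    return statistics.mean(changes) / statistics.stdev(changes)


srm = standardized_response_mean(baseline, follow_up)
print(f"SRM = {srm:.2f}")
```

Larger values indicate that the measure registers change more consistently relative to its variability across subjects, which allows different outcome measures to be compared on a common scale.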
