Abstract
Background
The sensitivity of patient-reported outcomes (PROs) to detect the effects of treatment change depends on the match between the change in items of the PRO and the change that takes place in a sample of people. The aim of this study is to compare the sensitivity of different PROs in detecting changes following the initiation of biologic treatment in asthma.
Keywords
Introduction
Patient-reported outcomes (PROs) form an increasingly important part of treatment evaluation because they measure the impact of treatment from the patient’s perspective. In this article, we compare four different types of PRO in their ability to detect change over time in severe asthma following initiation of biologic therapy: a symptom questionnaire measuring asthma control, two quality of life questionnaires and a questionnaire of perceived change. A PRO is sensitive to change to the extent that the items of the questionnaire capture the changes that occur in a specific population for a specific treatment. 1 Each of the four types of questionnaire captures different kinds of change and therefore may not be equally sensitive to treatment.
Asthma symptoms are, in part, associated with increased airflow obstruction leading to a lower forced expiratory volume in 1 s (FEV1). The Asthma Control Questionnaire (ACQ-6) is a validated questionnaire that measures asthma control by the severity of asthma symptoms and shows a strong correlation with lung function.2,3 The ACQ-6 should therefore provide a sensitive measure of improvements in lung function produced by pharmacological treatment.
Quality of life (QoL) is affected by symptoms but also by other factors such as personality, 4 economic status 5 and social support. 6 Small changes in symptoms could create large changes in QoL and the relevant questionnaires, as long as the QoL questionnaire measures aspects of life that change following improved asthma control.
There are two types of quality of life questionnaire, disease specific and generic. As their names suggest, disease specific questionnaires measure only deficits experienced in a particular disease whereas generic questionnaires measure deficits across a range of diseases. The severe asthma questionnaire (SAQ) 7 is designed to measure the QoL deficits that occur in severe asthma and therefore may be more sensitive to change compared to a generic questionnaire such as the EQ5D, 8 which is known to poorly represent the deficits of QoL in severe asthma 9 and therefore should be less sensitive to change.
Measures of perceived change, such as global rating of change (GRoC), require people to evaluate how much they feel they have improved. These are seldom used as primary outcome measures because of concerns about the accuracy of retrospective recall and other possible biases. 10 Retrospective recall of health events is known to be poor, including an underestimation of asthma symptoms, 11 and people create implicit narratives of their treatment that can create an over-estimation of the benefit of treatment. 10
The aim of this study is to compare the sensitivity to change as measured by effect size over several time points for four questionnaires: SAQ, ACQ-6, EQ5D and GRoC.
Methods
Design
An open label multi-centre sensitivity to change study, using the SAQ, other questionnaire measures and clinic data.
Participants
Participants were recruited from six UK specialist severe asthma centres (Royal Devon & Exeter, University Hospitals Plymouth NHS Trust, North Bristol NHS Trust, University Hospitals Birmingham, Royal Brompton and Harefield Hospitals, Manchester University NHS Foundation Trust) as part of usual care. All patients had undertaken a full multidisciplinary assessment according to UK severe asthma guidelines. 12 Inclusion criteria: patients with severe asthma (GINA step 4 & 5) commencing a biologic treatment in normal clinical care following NICE guidelines.
Participants were asked to complete questionnaires at baseline and at routine biologic administration clinic visits (0, 4, 8, 12 and 16 weeks).
Questionnaires
Severe Asthma Questionnaire: The SAQ is a validated specific scale for severe asthma (10) comprising 16 items measuring difficulties caused by asthma in 16 domains over a 2 week period. 7 Responses are given on a 7-point scale and are averaged to form the total SAQ score and three subscales, SAQ-MyLife, SAQ-MyMind and SAQ-MyBody. 13 In addition, the SAQ provides a SAQ-global score based on a single 100-point Borg-type scale. Higher scores indicate better quality of life. The minimum clinically important difference (MCID) is 0.5 for the SAQ and 11 for the SAQ-global.
Asthma Control Questionnaire-6: The ACQ-6 has six items, of which four measure symptoms, one measures short acting bronchodilator use and one measures activity limitations over a 7 day period. Responses are given on a 7-point scale and are averaged to form the ACQ-6 score. Lower scores indicate better asthma control, and a score of 0–0.75 is classified as well controlled asthma. 2 The MCID is 0.5.
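The ACQ-6 scoring rules described above can be illustrated with a minimal sketch. It assumes that the 7-point responses are coded 0 to 6, which is consistent with the 0–0.75 band for well controlled asthma but is not stated in the text. The function and constant names are hypothetical and are not part of any published scoring software.

```python
from statistics import mean

ACQ6_MCID = 0.5              # minimum clinically important difference (from the text)
WELL_CONTROLLED_MAX = 0.75   # upper bound of the "well controlled" band (from the text)


def acq6_score(item_responses):
    """Average six item responses (assumed to be coded 0-6) into the ACQ-6 score."""
    if len(item_responses) != 6:
        raise ValueError("The ACQ-6 has exactly six items")
    if any(not 0 <= r <= 6 for r in item_responses):
        raise ValueError("Responses are assumed to be coded 0 to 6")
    return mean(item_responses)


def is_well_controlled(score):
    """A score of 0-0.75 is classified as well controlled asthma."""
    return 0 <= score <= WELL_CONTROLLED_MAX


def improved_by_mcid(baseline_score, follow_up_score):
    """Lower ACQ-6 scores are better, so improvement is baseline minus follow-up."""
    return (baseline_score - follow_up_score) >= ACQ6_MCID
```

Because lower scores indicate better control on the ACQ-6 but higher scores indicate better quality of life on the SAQ, the direction of the subtraction would need to be reversed when the same check is applied to the SAQ.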
The Euroqol (EQ-5D-5L): The Euroqol is a generic health status questionnaire comprising five dimensions measured on a single day. 8 We used the index values of this scale rather than the average values, as index values are used in resourcing decisions. Index values were calculated using the 2012 value set for England. 14
Global rating of change Questionnaire: Patients were asked to rate the GRoC in terms of improvement by circling a statement ‘which best describes how you feel since starting your new treatment for your asthma’. The GRoC used in this study comprises 11 response options: no change, a little better, somewhat better, moderately better, a good deal better and a great deal better, with deterioration represented by the same quantifiers but using the word ‘worse’. 10
Procedure
After providing informed consent, patients completed baseline questionnaires (SAQ and ACQ-6) in the clinic before their first treatment. Patients completed questionnaires (SAQ, ACQ-6, GRoC) during subsequent routine clinic visits as part of usual care at 4 weeks, 8 weeks, 12 weeks and 16 weeks after starting treatment. Patients receiving benralizumab attended at 0, 4, 8 and 16 weeks and so did not attend clinic at 12 weeks due to the transition to an eight weekly dosing schedule. The EQ-5D-5L and FEV1 were assessed at baseline and at 16 weeks. Clinical records were used to provide demographic details. Exacerbations requiring OCS, healthcare utilisation and prednisolone dose (if on maintenance OCS) were documented at each clinic visit.
Ethical approval
This study received ethical approvals from the Research Ethics Committee/Health Research Authority (REC reference: 19/WA/0011, IRAS project ID: 250167) and was sponsored by University Hospitals Plymouth NHS Trust.
Statistics
We present the data for baseline, 4, 8 and 16 weeks only, because the numbers attending at week 12 were reduced: patients receiving benralizumab did not attend at that visit. The COVID pandemic coincided with the start of data collection, so fewer data were collected at 16 weeks because many of the health staff who were collecting data were reassigned to COVID-related duties. For this reason, we present two analyses, one intention to treat and one per protocol, and we used t-tests to compare the per protocol patients with the remaining intention to treat patients, that is, the intention to treat sample after removal of the per protocol patients. There is no imputation of missing data for the intention to treat sample: we report all valid data but only valid data. Although the sample size is reduced at later time points, the use of the two methods of analysis has the advantage of providing additional calculations of sensitivity to change.
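The construction of the two samples can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used in the study: it assumes that a per protocol patient is one with a valid score at every reported time point, which the text does not define explicitly, and the field names, function names and scores are all hypothetical.

```python
from statistics import mean

# Hypothetical field names for the four reported time points.
TIME_POINTS = ("week0", "week4", "week8", "week16")


def is_complete(patient):
    """True if the patient has a valid score at every time point (assumed per protocol rule)."""
    return all(patient.get(t) is not None for t in TIME_POINTS)


def split_samples(patients):
    """Return (per_protocol, remainder); the two lists are disjoint and together form the ITT sample."""
    per_protocol = [p for p in patients if is_complete(p)]
    remainder = [p for p in patients if not is_complete(p)]
    return per_protocol, remainder


def valid_mean(patients, time_point):
    """Mean and count of valid scores at one time point; missing values are skipped, not imputed."""
    values = [p[time_point] for p in patients if p.get(time_point) is not None]
    return (mean(values) if values else None), len(values)


# Made-up scores for illustration only (None marks a missed visit).
patients = [
    {"week0": 3.0, "week4": 4.0, "week8": 4.5, "week16": 5.0},
    {"week0": 2.0, "week4": 3.0, "week8": None, "week16": None},
    {"week0": 4.0, "week4": None, "week8": 5.0, "week16": 5.0},
]
per_protocol, remainder = split_samples(patients)
```

Keeping the per protocol patients and the remainder disjoint is what allows the two groups to be compared with an independent-samples t-test, as described above.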
Effect size was Cohen’s D calculated from the mean difference in score between baseline and follow-up divided by the pooled standard deviation, except in the case of GRoC where Cohen’s D was calculated from the mean score (i.e. the difference between the score and zero) divided by the standard deviation of the GRoC score. Cohen’s D provides a standardised measure of change between questionnaires and therefore permits comparison of questionnaires that have different scaling properties. Conventional interpretation of Cohen’s D is that an effect size of 0.2 is considered small, 0.5 is medium and 0.8 large. 15 We used Spearman’s rank correlations to examine associations between questionnaire scores and objective measures.
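The two effect size calculations described above can be sketched in a few lines. The text does not give the formula used for the pooled standard deviation, so the sketch assumes the common form weighted by degrees of freedom; the function names are hypothetical, and the label returned for effects below 0.2 is our own wording rather than part of the conventional interpretation.

```python
from math import sqrt
from statistics import mean, stdev


def pooled_sd(a, b):
    """Pooled standard deviation of two samples (assumed form, weighted by degrees of freedom)."""
    na, nb = len(a), len(b)
    var_a, var_b = stdev(a) ** 2, stdev(b) ** 2
    return sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))


def cohens_d_change(baseline, follow_up):
    """Mean difference between follow-up and baseline divided by the pooled SD."""
    return (mean(follow_up) - mean(baseline)) / pooled_sd(baseline, follow_up)


def cohens_d_groc(groc_scores):
    """GRoC variant: the mean score (its difference from zero) divided by its SD."""
    return mean(groc_scores) / stdev(groc_scores)


def effect_size_label(d):
    """Conventional bands: 0.2 small, 0.5 medium, 0.8 large (sign is ignored)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"
```

With this sign convention an improvement gives a positive value for the SAQ, where higher scores are better, and a negative value for the ACQ-6, where lower scores are better, so magnitudes rather than signed values should be compared across questionnaires.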
Results
Patient characteristics, means (95% confidence intervals).
Means and standard deviations for questionnaires for the intention to treat group (n) and the per protocol group (n = 22).
ACQ: Asthma Control Questionnaire; SAQ: Severe Asthma Questionnaire; GRoC: global rating of change; ITT: intention to treat; PP: per protocol.
Effect sizes (Cohen’s D) for change in questionnaire scores at three follow-up time points for intention to treat and per protocol samples.
ACQ: Asthma Control Questionnaire; SAQ: Severe Asthma Questionnaire; GRoC: global rating of change; ITT: intention to treat; PP: per protocol.
Means and standard deviations for SAQ subscale scores for the intention to treat group (n) and the per protocol group (n = 22).
SAQ: Severe Asthma Questionnaire; ITT: intention to treat; PP: per protocol.
Effect sizes (Cohen’s D) for change in SAQ subscale scores at three follow-up time points.
SAQ: Severe Asthma Questionnaire; ITT: intention to treat; PP: per protocol.
Correlation coefficients (rho) between questionnaire scores at baseline and objective measures.
ACQ: Asthma Control Questionnaire; SAQ: Severe Asthma Questionnaire.
Discussion
The sensitivity of a questionnaire to change depends on the items of the questionnaire, on the treatment and on the population studied. 1 In this study, sensitivity was assessed at three time points. The dropout rate for this study was high due to the pandemic, and as dropout can be non-random, we provided two analyses, intention to treat and per protocol. Although there was no significant difference, there is a trend for those in the per protocol sample to have higher effect sizes compared to those in the intention to treat sample. For both the intention to treat and per protocol groups, the MCID for the SAQ, SAQ-global and ACQ-6 was achieved as early as week 4.
As we performed a per protocol analysis in addition to an intention to treat analysis, it is possible to compare the relative effect sizes of the questionnaires in different samples. Our results reveal that the relative effect size between questionnaires varies as a function of the population and as a function of time. Sensitivity to change is not an absolute property of a scale but something that varies over time and with the population. Previous comparisons of the sensitivity of quality of life questionnaires have provided comparisons at only one time point and with only one group.16,17 The variability of effect size as a function of group and time should be recognised when comparing between questionnaires.
Our results provide six comparisons of effect size. Some questionnaires are consistently better or worse than others. Some questionnaires are sometimes better or worse depending on which comparison is made.
The results show that the relative sensitivity, as measured by effect size, of the different questionnaires changes to some extent between the six different comparisons, showing that comparisons of sensitivity drawn from only one group and at one time may not be generalisable. There is no evidence of important differences in sensitivity between the SAQ, SAQ-global and ACQ-6. The SAQ-global is slightly more sensitive on four of the six comparisons but, given the variability in relative effect sizes observed between these questionnaires, there is no clear evidence of difference between them.
The effect size of the EQ-5D-5L is consistently lower than that of all the other questionnaires in all comparisons. The effect size of the EQ-5D-5L is about half that of the SAQ, SAQ-global and ACQ-6, and it is therefore safe to conclude that the EQ-5D-5L is the least sensitive of the questionnaires studied. The comparative insensitivity of the EQ-5D-5L is to be expected from a scale that only partially captures the QoL deficits of severe asthma on a single day, 8 and the findings are also consistent with evidence that this scale is comparatively insensitive to rehabilitation. 18
The sensitivity of the EQ5D-VAS was slightly better than that of the EQ-5D-5L, but it was not as sensitive as the other scales and, in particular, was less sensitive than the SAQ-global. Although the EQ5D-VAS and SAQ-global are single item scales varying between 0 and 100, they differ in two respects. The EQ5D-VAS asks patients to rate health; the SAQ-global asks patients to rate quality of life specifically in relation to asthma. The EQ5D-VAS is a scale with only the end points specified. The SAQ-global is a Borg scale with additional quantifiers placed at empirically derived points along the scale. Borg scales have been shown to be more reliable than visual analogue scales. 19 Our data show that they may also be more sensitive.
The SAQ has three subscales made up from groupings of the 16 items that make up the SAQ. There was no consistent difference in sensitivity for these three subscales. The My Body subscale was the most sensitive of the three at all three time points for the per protocol group but not for the intention to treat group, where it was most sensitive only at week 8. These findings illustrate how small differences in sensitivity can arise from differences in population and time point, again showing that sensitivity is not an absolute property of a questionnaire, but relative to the population and time of assessment.
Although all three subscales of the SAQ are sensitive to the effects of treatment, only the My Life subscale correlates with baseline FEV1%. A larger data set has also shown that the correlation between FEV1% and the My Life subscale is larger than the correlations for the My Mind and My Body subscales. 13 As patients’ judgements of quality of life are determined by numerous factors (including dispositional mood, lung function, effects of treatment), these data suggest that the My Mind and My Body subscales are more affected by factors other than lung function. Examination of other correlations with other PROs demonstrated no relationship between sensitivity to change and FEV1%. The conclusion from these data is that the use of QoL as an outcome variable provides different information from change in lung function and that both PROs and objective measures should be used in clinical trials as they provide different kinds of information about the effects of treatment.
Our study measured change in an open label study rather than comparative change between placebo and active treatment. We found that a GRoC had the highest sensitivity to change, but it does not follow that it is most sensitive in comparing placebo with active treatment. The reason is that retrospective measures are known to be affected by biases, including that of implicit theory. 20 When patients receive a new treatment, they form a narrative that the treatment is likely to be effective, and their response, for both placebo and treatment, is therefore affected by this narrative.21,22
The high sensitivity of the SAQ-global but not the EQ5D-VAS shows that single item scales can sometimes be highly sensitive. The SAQ-global is derived from an earlier scale, the Global Quality of Life Scale. 18 Multi-item scales require the patient to consider the components of quality of life. These components are then aggregated, most commonly with no weighting but sometimes, as in the case of the EQ5D, with weightings from people who are not ill. As a result, the components of the multi-item quality of life scale are not aggregated using the patient’s own individual utilities or weights. By contrast, the SAQ-global requires patients to do this aggregation themselves, something that will take place fast, automatically and with implicit rather than conscious consideration of events in a person’s life. 23 Our findings indicate that this fast, implicit process can be no worse than that provided by longer questionnaires in detecting change.
Conclusion
The sensitivity of the ACQ-6, SAQ and SAQ-global to the effect of starting biologic treatment is similar. The EQ5D-VAS was less sensitive and the EQ-5D-5L was least sensitive to treatment. The comparative sensitivity of PROs varies slightly as a function of population and time of assessment, so comparisons with other data sets should be treated with caution.
Our study shows that in clinical practice, a rapid increase in quality of life can be detected by the SAQ and SAQ-global. The easy use of the SAQ-global makes it a suitable tool if time is limited. The EQ-5D should not be used in clinical practice to evaluate change in quality of life in severe asthma as it is comparatively insensitive to change, and the use of this questionnaire in economic decision making may lead to an underestimation of the benefit of biologic treatment.
Footnotes
Acknowledgements
This research was made possible by a non-promotional grant provided by GSK. JL is supported by the National Institute for Health Research (NIHR) Applied Research Collaboration (ARC) South West Peninsula. SF is supported by NIHR Manchester Biomedical Research Centre and the study supported by the NIHR Manchester Clinical Research Facility. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The work was supported by GlaxoSmithKline.
