Abstract
Background: Various patient reported outcome measures (PROMs) are used in idiopathic pulmonary fibrosis (IPF). We aimed to describe their psychometric properties, assess their relationship with 1-year mortality and determine their minimal clinically important differences (MCIDs). Methods: In a prospective multicentre study, participants with IPF completed the King’s Brief Interstitial Lung Disease Questionnaire (K-BILD), the modified Medical Research Council (mMRC) dyspnoea scale, St George’s Respiratory Questionnaire (SGRQ) and University of California, San Diego shortness of breath questionnaire (UCSD-SOBQ) at three-monthly intervals over a 12-month period. Forced vital capacity (FVC) was matched with questionnaires and mortality was captured. Anchor- and distribution-based methods were used to derive MCIDs. Results: Data were available from 238 participants. All PROMs had good internal consistency and a high degree of correlation with other tools (except that UCSD-SOBQ correlated poorly with FVC). There were significant associations with mortality for K-BILD (hazard ratio 16.67; 95% CI 2.38–100) and SGRQ (hazard ratio 4.65; 95% CI 1.32–16.62) but not with the other PROMs or FVC. The median MCID (range) for K-BILD was 6.3 (4.1–7.0), SGRQ was 7.0 (3.8–9.6), mMRC was 0.4 (0.1–0.5) and UCSD-SOBQ was 9.6 (4.1–14.2). Conclusions: The K-BILD was related to other severity measures and had the strongest relationship with mortality.
Introduction
Idiopathic pulmonary fibrosis (IPF) is a chronic progressive fibrotic lung disease with significant morbidity and mortality. 1 Assessing patient reported outcome measures (PROMs), including assessments of breathlessness and health-related quality of life (HRQOL), in people with IPF is important as they are outcomes of clinical trials 2 and part of clinical service specifications (www.nice.org.uk/CG163). However, as no universally accepted tool exists, different PROMs are used.
The St George’s Respiratory Questionnaire (SGRQ) 3 and King’s Brief Interstitial Lung Disease Questionnaire (K-BILD) 4 are the most commonly employed tools in IPF. However, the SGRQ was developed for use in people with chronic obstructive pulmonary disease (COPD), for which its minimal clinically important difference (MCID) is between 7 and 10, 5 and although it relates to physiological impairment in IPF, an IPF-specific version (SGRQ-I) has been developed and validated. 6 For assessing breathlessness in IPF, the University of California, San Diego shortness of breath questionnaire (UCSD-SOBQ) 7 and the modified Medical Research Council (mMRC) dyspnoea scale 8 are the most frequently used, both of which were also developed mostly in patients with COPD. The EuroQol five dimension five level (EQ-5D-5L) is a well-validated global health status instrument designed for use in clinical and health-economic trials. 9 Anxiety and depression are common in IPF and related to HRQOL 10 ; they can be assessed by the Hospital Anxiety and Depression Scale (HADS).
Assessment of the psychometric properties of PROMs in people with IPF is important to understand their suitability for use in clinical research and practice. The psychometric properties of SGRQ in people with IPF have been reviewed 11 and recently evaluated using data from trials of nintedanib.12,13 Likewise, K-BILD 14 and SGRQ-I 15 have each been validated separately in a Danish cohort of 150 people with IPF. However, neither the HADS nor the mMRC has been validated in IPF. Our aim, in this large prospective multicentre UK-wide study, was to describe the internal consistency, construct validity and known-group validity of the SGRQ, K-BILD, mMRC dyspnoea scale, UCSD-SOBQ, HADS and EQ-5D-5L simultaneously so that their qualities can be contrasted. We also aimed to determine their MCIDs and relationship to 1-year mortality.
Methods
This was an observational, prospective, longitudinal, multicentre study conducted over 12 months at seven specialist interstitial lung disease (ILD) centres throughout the UK (Bristol, Cambridge, Devon & Exeter, Newcastle, North Staffordshire, Norwich and Sheffield). Participants were identified by clinical staff and recruited by local researchers who administered the questionnaires. The questionnaires were self-completed unsupervised either in hospital reception rooms or at home. Data were collected between July 2014 and October 2016. The study was conducted according to the principles of Good Clinical Practice. It was approved by NRES Committee South West–Exeter (Reference number 14/SW0047) and registered on the clinicaltrials.gov database (Identifier NCT02176707).
Participants
Participants were identified by reviewing hospital databases, patient registries and medical notes. Participants were eligible for enrolment if they were aged more than 40 years, had IPF based on multidisciplinary consensus according to contemporaneous guidelines, 16 and were able to provide written informed consent. Participants were excluded if they had a significant co-existing respiratory disease, or a medical or psychiatric condition with a clinically relevant effect on symptoms, HRQOL and disease progression, as determined by the principal investigator. Participants with airflow obstruction, defined as a ratio of forced expiratory volume in 1 s (FEV1) to forced vital capacity (FVC) of less than 0.6 or a residual volume greater than 120% predicted, were also excluded.2,17
Outcome measures
The following PROMs were presented in the order below, but the order of completion was not monitored. EQ-5D-5L 9 contains five dimensions: mobility, self-care, usual activities, pain/discomfort and anxiety/depression. Each dimension is answered on a five-level scale (1, no problems; 5, severe problems/unable to do). Responses were converted to utilities using standard UK health state valuations. K-BILD 4 describes health status during the past 2 weeks in people with ILD. It contains 15 questions evaluating three dimensions (psychological, breathlessness and activity, and chest symptoms) on a seven-point Likert scale. The total score ranges from 0 (worst health status) to 100 (best health status). SGRQ 3 incorporates 50 items across three domains: symptoms, activity and impacts. The total score ranges from 0 (best health status) to 100 (worst health status). The SGRQ-I was generated from the SGRQ data. The mMRC 8 is commonly used to assess breathlessness, and the response is classified on a five-point scale (0, dyspnoea with strenuous exercise; 4, too breathless to leave the house or breathless when dressing). The UCSD-SOBQ 7 is a domain-specific tool assessing breathlessness associated with activities of daily living. It includes 24 questions, each rated from 0 (not at all breathless) to 5 (maximally breathless). The total score ranges from 0 (lesser dyspnoea) to 120 (greater dyspnoea). HADS 18 contains seven questions for anxiety and seven questions for depression. Scores can be added to measure psychological distress. Patient Global Impression of Change (PGIC) is used to quantify perceived changes in health status over time. We used a 15-point scale in this study. 19
Spirometry 20 and a six-minute walk test (6MWT) 21 were performed unless they had been undertaken for clinical reasons within the preceding 4 weeks. Additional physiological data were obtained from the medical notes, including static lung volumes and diffusing capacity of the lung for carbon monoxide (DLCO), if these had been measured as part of routine care within 4 weeks of questionnaire completion.
The visit schedule was aligned to routine care with follow-up measurements (PROMs and spirometry) collected approximately every 3 months. Participants unable or unwilling to attend clinic visits were mailed the questionnaires after their vital status had been confirmed. Questionnaires were matched if they were completed within 14 days of each other. They were also matched with physiological data if they were within 28 days of each other. Questionnaires with no reliable date were not included in the analysis. Mortality data were obtained from serious adverse event logs. Deaths were verified by the principal investigator at each site, and vital status was checked prior to mailing PROMs but was not cross-referenced to Office for National Statistics (ONS) data.
Data from all PROMs were captured even if the return date was not at 3-monthly intervals. All PROMs were electronically scanned using Formic optical character recognition software (Formic Limited, Uxbridge, Middlesex, UK), scored by the research team and analysed by the study statistician (AC).
Statistical analysis
A sample size of 200 has 99% power to detect a correlation of 0.3 at 5% significance for the convergent validity analysis, and a sample size of 150 provides 86% power to detect a difference of 0.5 of a standard deviation between the groups for the known groups analysis.
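The power figures above can be approximated with a short calculation. The sketch below is not the authors' calculation; it assumes a two-sided test, the Fisher z approximation for the correlation and a normal approximation with two equal groups of 75 for the known-groups comparison. The function names are illustrative. Under those assumptions the results come out close to the quoted 99% and 86%.

```python
from math import atanh, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def power_correlation(r, n, alpha=0.05):
    """Approximate two-sided power to detect a correlation r with n subjects,
    using the Fisher z transformation (an assumption, not stated in the text)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = atanh(r) * sqrt(n - 3)
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def power_two_groups(effect_size, n_per_group, alpha=0.05):
    """Approximate two-sided power to detect a standardised mean difference
    between two equal-sized groups (normal approximation to the t-test)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


# Correlation of 0.3 with 200 participants.
power_convergent = power_correlation(0.3, 200)
# Difference of 0.5 SD with 150 participants, assumed split 75 versus 75.
power_known_groups = power_two_groups(0.5, 75)
print(f"convergent validity power: {power_convergent:.3f}")
print(f"known-groups power:        {power_known_groups:.3f}")
```

An unequal split between the FVC groups would lower the second figure, so the equal-group assumption matters here.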
Internal consistency
Internal consistency was assessed using Cronbach’s α coefficient at baseline, and the confidence interval was estimated using a bootstrap with 500 replications. We considered a value greater than 0.7 to represent a homogeneous scale. 22
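As an illustration of this step, the sketch below computes Cronbach's α and a percentile bootstrap interval with 500 replications. It is a minimal example on synthetic item responses, not the study data or the study code; the percentile method, the resampling of whole respondents and all names are assumptions.

```python
import random
import statistics


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_variances = [statistics.variance([row[j] for row in rows]) for j in range(k)]
    total_variance = statistics.variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def bootstrap_ci(rows, n_boot=500, level=0.95, seed=0):
    """Percentile bootstrap interval, resampling respondents with replacement."""
    rng = random.Random(seed)
    estimates = sorted(
        cronbach_alpha(rng.choices(rows, k=len(rows))) for _ in range(n_boot)
    )
    lower = estimates[int((1 - level) / 2 * n_boot)]
    upper = estimates[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper


# Synthetic data: 60 respondents, 5 items driven by one latent trait plus noise.
data_rng = random.Random(42)
responses = []
for _ in range(60):
    trait = data_rng.gauss(0, 1)
    responses.append([trait + data_rng.gauss(0, 0.5) for _ in range(5)])

alpha_hat = cronbach_alpha(responses)
ci_low, ci_high = bootstrap_ci(responses)
print(f"alpha = {alpha_hat:.2f} (95% CI {ci_low:.2f}, {ci_high:.2f})")
print("homogeneous scale" if alpha_hat > 0.7 else "below the 0.7 threshold")
```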
Convergent validity
Convergent validity was evaluated by assessing the correlation between the different HRQOL and breathlessness tools and physiological measurements, including FVC % predicted and 6-min walk distance (6MWD), using Spearman correlation coefficients. Correlations were considered weak if <0.30, moderate if 0.30–0.60 and strong if >0.60. We hypothesised that the correlation between K-BILD and SGRQ would be strong 4 and that between K-BILD and FVC would be weak. 23 We expected inverse relationships between PROMs scored in opposite directions and expected low FVC to be associated with the poorer health states identified by the PROMs.
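The correlation and its classification can be sketched as follows. This is an illustrative implementation with made-up scores; applying the thresholds to the absolute value of the coefficient is an assumption made here because some PROMs are scored in opposite directions.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


def classify(rho):
    """Apply the thresholds from the text to the magnitude of the coefficient."""
    magnitude = abs(rho)
    if magnitude < 0.30:
        return "weak"
    if magnitude <= 0.60:
        return "moderate"
    return "strong"


# Hypothetical scores: K-BILD (higher is better) and SGRQ (higher is worse).
kbild = [35, 48, 52, 60, 71, 80]
sgrq = [78, 66, 60, 41, 35, 20]
rho = spearman(kbild, sgrq)
print(rho, classify(rho))
```

In this toy data the two scales move in exactly opposite directions, so the coefficient is negative and its magnitude is classified as strong.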
Known-group validity
Known-group validity was assessed using a two-sample t-test to compare PROM scores between people with FVC >75% predicted and people with FVC ≤75% predicted. Only the first measurement obtained from each person was used. 24
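A minimal sketch of the comparison is given below. The text does not say whether equal variances were assumed, so the pooled-variance (Student) form used here is an assumption, and the scores are hypothetical. The p-value would come from a t distribution with the returned degrees of freedom, which is left to a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, df


# Hypothetical first-visit K-BILD totals, one value per participant.
fvc_above_75 = [60, 62, 64, 66, 68]
fvc_75_or_below = [50, 52, 54, 56, 58]
t_stat, df = pooled_t_statistic(fvc_above_75, fvc_75_or_below)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```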
Relationship with 1-year mortality
The Kaplan–Meier curve was used to describe the survival time. A tertile analysis was undertaken to describe the relationship between 1-year mortality and each PROM. Individuals were divided into three equal groups according to the baseline total score of each questionnaire: group 1 (better health status), group 2 and group 3 (worse health status). A comparison of survival between tertile groups was made. Cox’s proportional hazards model was used to estimate the hazard ratio for each variable separately with no adjustment for potential confounding factors. The analysis was repeated using FVC % predicted, with patients divided into two groups based on lung function (FVC >75% or FVC ≤75%).
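The tertile grouping can be sketched as below. This is an illustration only: the function name, the handling of ties (kept in input order) and the example scores are assumptions. The `higher_is_better` flag reflects that K-BILD and SGRQ are scored in opposite directions, so that group 1 always denotes the better health status.

```python
def tertile_groups(scores, higher_is_better):
    """Assign each score to group 1 (better), 2 or 3 (worse health status).

    Groups are as equal in size as the sample allows; tied scores are split
    by their order in the input, which is an arbitrary choice in this sketch.
    """
    n = len(scores)
    best_to_worst = sorted(range(n), key=lambda i: scores[i], reverse=higher_is_better)
    groups = [0] * n
    for position, index in enumerate(best_to_worst):
        groups[index] = 1 + (position * 3) // n
    return groups


# Hypothetical baseline totals for six participants.
baseline = [80, 40, 60, 70, 30, 50]
kbild_groups = tertile_groups(baseline, higher_is_better=True)   # K-BILD direction
sgrq_groups = tertile_groups(baseline, higher_is_better=False)   # SGRQ direction
print(kbild_groups)
print(sgrq_groups)
```

The resulting group labels would then be used as the grouping variable for the Kaplan–Meier curves and the Cox model, which are not reproduced here.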
Minimal clinically important difference
An anchor-based method was used to derive the MCID of the PROMs. 25 To detect the MCID within patients (longitudinal), patients were stratified into two groups based on the difference in PGIC from the first visit to the second visit. 26 Data were restricted to those patients with visits less than 100 days apart. Participants were considered unchanged if their PGIC was −1, 0 or 1 and minimally changed if it was −2 or −3. Distribution-based methods were also used to derive estimates of the MCID. The standard error of measurement (SEM) method was used. The intraclass correlation coefficient for this method was estimated using data from participants who reported that their health did not change between visits. The 0.5 × SD method was also used, based on the SD of the measurement at the start of the study.
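The two distribution-based estimates can be written out as a short calculation. The sketch below uses the usual formula SEM = SD × √(1 − ICC), which is assumed here because the text does not spell it out; the input values are hypothetical and the anchor-based estimate is not covered.

```python
from math import sqrt


def sem_mcid(baseline_sd, icc):
    """One standard error of measurement: SD * sqrt(1 - ICC)."""
    if not 0 <= icc <= 1:
        raise ValueError("ICC must lie between 0 and 1")
    return baseline_sd * sqrt(1 - icc)


def half_sd_mcid(baseline_sd):
    """Half of the baseline standard deviation."""
    return 0.5 * baseline_sd


# Hypothetical inputs: baseline SD of 12 points and an ICC of 0.84 estimated
# in participants who reported no change in health between visits.
sem_estimate = sem_mcid(12.0, 0.84)
half_sd_estimate = half_sd_mcid(12.0)
print(f"SEM-based MCID:    {sem_estimate:.1f}")
print(f"0.5 x SD MCID:     {half_sd_estimate:.1f}")
```

With a reliable scale (ICC close to 1) the SEM estimate falls well below 0.5 × SD, which is the pattern a reader should expect when comparing the two methods.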
Results
Participants
Characteristics of participants (n = 238).
The mean and standard deviation or percentage for the demographic variables. BMI: body mass index, COPD: chronic obstructive pulmonary disease, GORD: gastroesophageal reflux disease, HRCT: high resolution computerised tomography, IHD: ischaemic heart disease.
Descriptive statistics of variables observed over 12 months.
SGRQ: St George’s Respiratory Questionnaire, K-BILD: King’s Brief Interstitial Lung Disease Questionnaire, mMRC: modified Medical Research Council dyspnoea scale, HADS: Hospital Anxiety and Depression Scale, UCSD-SOBQ: University of California, San Diego shortness of breath questionnaire, EQ-5D-5L: EuroQol five dimension five level, SGRQ-I: St George’s Respiratory Questionnaire–interstitial lung disease, FEV1: forced expiratory volume in 1 s, FVC: forced vital capacity, 6MWD: 6-min walk distance.
Internal consistency
All questionnaires showed good internal consistency with Cronbach’s alpha coefficients of >0.8. The Cronbach’s alpha and 95% CI were EQ-5D: 0.86 (0.84, 0.87), HADS: 0.91 (0.90, 0.92), K-BILD: 0.95 (0.94, 0.95), SGRQ: 0.94 (0.93, 0.94) and UCSD-SOBQ: 0.98 (0.97, 0.98).
Convergent validity
Convergent validity.
Correlation coefficients. EQ-5D-5L: EuroQol five dimension five level, FVC: forced vital capacity, HADS: Hospital Anxiety and Depression Scale, K-BILD: King’s Brief Interstitial Lung Disease Questionnaire, mMRC: modified Medical Research Council dyspnoea scale, SGRQ: St George’s Respiratory Questionnaire, SGRQ-I: St George’s Respiratory Questionnaire–interstitial lung disease, UCSD-SOBQ: University of California, San Diego shortness of breath questionnaire, 6MWD: 6-min walk distance. Correlations were considered weak if <0.30, moderate if 0.30–0.60 and strong if >0.60.
Known-group validity
Known group validity.
A comparison between those with a forced vital capacity (FVC) greater than 75% predicted and those with an FVC less than or equal to 75% predicted. EQ-5D-5L: EuroQol five dimension five level, HADS: Hospital Anxiety and Depression Scale, K-BILD: King’s Brief Interstitial Lung Disease Questionnaire, mMRC: modified Medical Research Council dyspnoea scale, SGRQ: St George’s Respiratory Questionnaire, SGRQ-I: St George’s Respiratory Questionnaire–interstitial lung disease, UCSD-SOBQ: University of California, San Diego shortness of breath questionnaire. All questionnaires other than UCSD-SOBQ were able to distinguish between subgroups. N: number of participants.
Relationship to 1-year mortality
There were a total of 26 (10.9%) deaths. The correlation between PGIC and PROMs was weak (r = 0.05–0.37). K-BILD showed a strong association with 1-year mortality. People in the tertile with the worst health status had a significantly higher risk of death than those in the tertile with the best health status for K-BILD (HR 16.67; 95% CI 2.38–100), SGRQ (HR 4.65; 95% CI 1.32–16.62), EQ-5D-5L (HR 3.70; 95% CI 1.23–11.11) and SGRQ-I (HR 3.19; 95% CI 1.03–9.89) but not for mMRC (HR 1.72; 95% CI 0.58–5.12), HADS (HR 3.11; 95% CI 0.82–11.71) or UCSD-SOBQ (HR 1.78; 95% CI 0.58–5.43) (Figure 1). There was no difference in survival between those with an FVC greater than 75% predicted and those with an FVC less than or equal to 75% predicted (HR 1.67; 95% CI 0.68–4.1).
Figure 1. Survival curves for patient reported outcome measures. Kaplan–Meier curves for survival (1-year mortality) for (a) SGRQ: St George’s Respiratory Questionnaire, (b) EQ-5D-5L: EuroQol five dimension five level, (c) mMRC: modified Medical Research Council dyspnoea scale, (d) HADS: Hospital Anxiety and Depression Scale, (e) K-BILD: King’s Brief Interstitial Lung Disease Questionnaire and (f) UCSD-SOBQ: University of California, San Diego shortness of breath questionnaire. Individuals were divided into three equal groups according to the baseline total score of each questionnaire: better, middle and worse health status. A comparison was made between tertile groups. Solid lines represent the better health status, dashed lines represent middle health status and dotted lines represent worse health status.
Minimal clinically important difference
The within patient minimal clinically important difference.
Minimal clinically important difference using anchor and distribution methods. EQ-5D-5L: EuroQol five dimension five level, FVC: forced vital capacity, HADS: Hospital Anxiety and Depression Scale, K-BILD: King’s Brief Interstitial Lung Disease Questionnaire, mMRC: modified Medical Research Council dyspnoea scale, SGRQ: St George’s Respiratory Questionnaire, SGRQ-I: St George’s Respiratory Questionnaire–interstitial lung disease, UCSD-SOBQ: University of California, San Diego shortness of breath questionnaire. SD: standard deviation, SEM: standard error of measurement.
Discussion
Summary of main findings
This is the first study to report the psychometric properties, MCID and relationship with mortality of seven commonly used PROMs concurrently. We have shown that all questionnaires investigated in this study had good internal consistency and a high degree of correlation with other tools, consistent with good convergent validity. However, the convergent validity between the physiological measures (FVC and 6-min walk distance) and the PROMs was poor, suggesting that physiological measurements only capture a small component of HRQOL or breathlessness. The SGRQ, SGRQ-I, EQ-5D-5L, mMRC, UCSD-SOBQ and K-BILD were able to discriminate individuals according to an accepted physiological measure of disease severity, in contrast to HADS, suggesting that these reflect disease severity better. K-BILD, EQ-5D-5L, SGRQ and SGRQ-I were associated with mortality at 1 year whereas mMRC, HADS, UCSD-SOBQ and FVC % predicted ≤75% were not. This suggests HRQOL is a major determinant of outcome; the multiple domains capture aspects of disease severity better than breathlessness or physiology alone. The median MCID (range) for K-BILD was 6.3 (4.1–7.0), SGRQ was 7.0 (3.8–9.6), mMRC was 0.4 (0.1–0.5), UCSD-SOBQ was 9.6 (4.1–14.2) and HADS was 2.4 (0.9–3.7).
Strengths and limitations
This was a prospective longitudinal study recruiting 250 patients varying in disease severity. We used a multicentre approach and wide entry criteria to reflect a real-world population. As a result, comorbidities were common in our population, with half of patients having at least one comorbidity, a rate similar to the German registry. 27 We have not considered comorbidities, and change in health status may have been due to factors, such as pulmonary rehabilitation or oxygen therapy, that were not captured in our study. However, including individuals with comorbidities increases the generalisability of our findings.
Participants’ outcome measures were relatively static during the study period and only a small number of individuals showed minimal change in PGIC (n = 36). We employed both anchor- and distribution-based methods as is recommended. 28 There was a large difference between the distribution-based MCIDs and the anchor-based MCIDs, with the distribution-based methods giving significantly larger values, especially the 0.5 standard deviation (SD) method. However, 0.5 SD represents a medium effect size, 29 giving greater values than the minimally important difference. For the standard error of measurement (SEM) approach, we used a threshold of one SEM, which is the value traditionally used in assessing MCID. 28
The main limitation of this study is that there was a large amount of missing data, especially for spirometry and the 6MWT, as many individuals agreed to complete the questionnaires because they were mailed to their home but declined to attend hospital for physiological tests. Furthermore, many participants did not complete questionnaires at pre-specified time intervals. We could not use supplemental oxygen status or DLCO % predicted in our analysis due to the small number of patients receiving oxygen and the small number of DLCO measurements. Additionally, only a small number of deaths occurred in our study population, and these were not cross-referenced to national databases. Therefore, a careful approach needs to be taken when interpreting our results, and further study on the role of PROMs in survival prediction is required. We might have achieved a stronger correlation between the physiological measurements and questionnaire data had we limited the analysis to a relevant questionnaire domain such as the activity domain of the SGRQ. The PGIC asked about change since the preceding visit, and therefore we were only able to report the MCID over a short timescale; there would have been more people with a change over a 1-year period, but there would also have been more patient withdrawals and recall bias. The correlations between the PGIC and most of the PROMs assessed were weak and less than the recommended correlation coefficient of >0.3 for determining MCIDs. Future studies should consider using alternative anchors such as the Patient Global Impression of Severity scale, which assesses severity of the condition in real time, as it is not susceptible to patients having to recall their previous health states.
This study was conducted shortly after National Institute for Health and Care Excellence (NICE) approval for pirfenidone and nintedanib; therefore, few people were receiving these treatments and we did not undertake subgroup analysis of those receiving anti-fibrotic therapy. Post hoc analysis of data from the INPULSIS trials showed that nintedanib treatment resulted in less decline in PROMs than in the placebo-treated group. 30 It is therefore likely that changes in PROMs will be different if patients are receiving anti-fibrotic medication.
Comparison with existing literature
The findings of our study are in keeping with a Danish IPF study which reported high internal consistency, good reliability and strong concurrent validity of the K-BILD 14 and the SGRQ-I. 15 The MCID of SGRQ reported here (4–10) was similar to that reported in people with COPD (7–10) 5,31 and connective tissue related ILD (4–13), 24 and the MCID of K-BILD (4–7) was similar to that reported in people with ILD (4–7) 32 and IPF (3–5). 33 The Danish team 34 analysed the MCID for improvement separately from deterioration (SGRQ-I Total [4 and 5] and K-BILD Total [5 and 3]), suggesting that people perceive improvement and deterioration differently.
Other studies have supported the use of PROMs in predicting mortality in IPF. Data from the IPF-PRO Registry showed that worse health status on the SGRQ total score was associated with mortality or lung transplant over 1 year, with a stronger relationship than FVC % predicted. 35 The relationship between K-BILD and survival prediction in ILD has been previously explored; a K-BILD total score of >34 was reported to be associated with longer estimated median survival (36.4 months) than a K-BILD total score of <34 (9.7 months, p = 0.02). 36
We did not find the UCSD-SOBQ to be associated with mortality at 1 year. However, a recent study of the German registry 27 demonstrated a greater change in UCSD-SOBQ scores from baseline (20.4 ± 1.5) in patients who died (n = 113) compared with survivors (7.0 ± 29.1). The HADS showed weak correlations with physiological measurements, which is not surprising as HADS aims to measure psychological distress. However, moderate to strong correlations between HADS and PROMs were observed, suggesting an association between depression and anxiety and HRQOL. Our findings are supported by previous studies of people with ILD that reported associations between depression and UCSD-SOBQ 37 and mMRC. 38 However, depression (defined as HADS-D ≥ 8) and anxiety (defined as HADS-A ≥ 8) were not associated with survival rate or hospital admission rate. 39
Implications for future research or clinical practice
All the questionnaires used are related but capture different aspects of the disease. In a clinical setting, the mMRC dyspnoea scale rather than the UCSD-SOBQ could be employed to assess breathlessness given its brevity and greater relationship to other measures of disease severity. Both the K-BILD and SGRQ were appropriate tools for evaluating HRQOL. In a clinical setting, K-BILD might be more appropriate given its stronger relationship to mortality and its brevity. The SGRQ-I was not superior to the SGRQ in our study. Further studies should consider examining the psychometric properties of each domain to guide researchers in selecting the PROMs most suitable to their needs.
Supplemental Material
Supplemental Material, sj-pdf-1-crd-10.1177_14799731211033925 for Psychometric properties of patient reported outcome measures in idiopathic pulmonary fibrosis by Jee Whang Kim, Allan Clark, Surinder S Birring, Christopher Atkins, Moira Whyte and Andrew M Wilson in Chronic Respiratory Disease
Footnotes
Acknowledgements
The authors thank the following investigators for recruiting participants into the study Prof. Spiteri, Dr Steve Bianchi, Prof. Ann Millar, Dr Ian Forest, Dr Helen Parfrey and Dr Michael Gibbons.
Declaration of conflicting interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: JWK has nothing to disclose. AC has nothing to disclose. SB reports payment to King’s College Hospital for use of K-BILD from Roche Pharma, Boehringer Ingelheim, and Galapagos outside the submitted work. CA received course fees and travel to attend conference/course; April 2018 (ERS ILD School, Heidelberg) and November 2019 (Learner to Leader Course) from Boehringer Ingelheim outside the submitted work. MW has nothing to disclose. AMW reports grants from InterMune during the conduct of the study.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This project was funded by an investigator-initiated grant from InterMune International Inc (now Roche Pharmaceuticals).
Supplementary material
Supplemental material for this article is available online.
References