Abstract
Background
The Multimorbidity Questionnaire (MMQ1) is a patient-reported outcome measure assessing quality of life (QoL) in people with multimorbidity. An English-language version was recently validated for use in the United Kingdom. This study examines: (1) Whether MMQ1 detects expected variations in QoL according to individual characteristics; (2) How MMQ1 compares with EQ-5D-5L in detecting such variations, and in discriminating between different levels of QoL.
Methods
A postal survey was distributed to 2753 patients with multimorbidity. Relationships between MMQ1 and EQ-5D-5L with six independent variables (long-term condition count, mental-physical multimorbidity, deprivation, self-rated QoL, age and sex) were examined using linear regression analyses. Discriminative ability was assessed using Receiver Operating Characteristic curves and sample size calculations with respect to consecutive classes of self-rated QoL.
Results
597 responses were received (22%). Respondents had a mean age of 69.5 years and 48% were men. Higher long-term condition count, the presence of mental-physical multimorbidity and increasing deprivation were associated with poorer QoL on both measures. In addition, three MMQ1 domains demonstrated age-related variations in QoL that were not detected using EQ-5D-5L. MMQ1 exhibited superior discriminative ability to EQ-5D-5L, especially in distinguishing between individuals with ‘Poor’ vs ‘Very Poor’ self-rated QoL, where EQ-5D-5L was particularly weak.
Conclusion
MMQ1 detected expected variations in QoL according to individual characteristics, supporting known-groups validity. It was superior to EQ-5D-5L in its ability to detect age-related variations in QoL and to discriminate between different levels of self-rated QoL. MMQ1 has the potential to improve the measurement of QoL in people with multimorbidity.
Introduction
There is a well-recognised need for patient reported outcome measures (PROMs) assessing quality of life (QoL) that are specifically tailored to people with multiple long-term conditions (also termed multimorbidity).1–4 In the absence of such bespoke PROMs, the measurement of QoL in this group has typically relied on generic utility measures such as EQ-5D-5L.5,6 Although EQ-5D-5L has well-established validity in many contexts, it was not designed for people with multimorbidity and has recognised psychometric limitations, including inadequate content validity.4,7,8 The reliance on generic QoL measures in multimorbidity research may be a contributory factor to the negative results of many multimorbidity intervention trials.5,6
To address this issue, the Multimorbidity Questionnaire (MMQ1) was recently developed and validated in Denmark. 9 Using a conceptual framework of needs-based QoL – which is widely used in other single-disease QoL PROMs10,11 – MMQ1 was designed with and for people with multimorbidity. It consists of 37 items divided into six separate scales: Physical Ability, Concerns and Worries, Limitations in Daily Life, Social Life, Personal Finances and Self-image. The original Danish MMQ1 has now been translated into English and adapted and validated for use in the United Kingdom (UK), demonstrating good psychometric properties including high content validity and high correlation with EQ-5D-5L. 12 By accounting for the complex and multifaceted impact of multimorbidity, MMQ1 aims to improve the measurement of QoL in people with multimorbidity compared with generic PROMs like EQ-5D-5L. 13 However, the comparative performance of these two measures is not known.
This study analysed data from the MMQ1 UK validation survey in order to address two objectives: 1) To examine whether MMQ1 is able to detect expected variations in QoL across groups based on health and sociodemographic characteristics, thereby demonstrating known-groups validity.
14
2) To compare MMQ1 with EQ-5D-5L in terms of (a) their respective ability to detect these variations across patient groups; and (b) their ability to discriminate between different levels of QoL as determined by a single-item global self-rating of QoL (Very Good, Good, Acceptable, Poor, Very Poor).
Methods
The methods for the cross-sectional validation survey have been published in full previously. 12 In summary, a postal survey including MMQ1 and EQ-5D-5L was sent to 2753 patients from eight primary care practices in the Lothian region of Scotland, UK. In participating practices, lists of registered patients were searched electronically to identify those on two or more long-term condition registers, or prescribed four or more repeat medications (a method used previously to identify patients with multimorbidity 15 ). Multimorbidity was defined as the presence of two or more long-term conditions, and mental-physical multimorbidity was defined as the coexistence of at least one mental illness with at least one physical condition. A total of 11,860 potentially eligible patients were identified across eight practices, from which a random sample of 2800 patients (350 per practice) was selected for screening by a general practitioner (GP) based at each practice. Patients deemed inappropriate for survey inclusion by their GP, such as those with dementia or approaching end-of-life, were removed from the list (47 in total), with 2753 patients invited to complete the survey by post between November and December 2023. Data collection stopped at the end of January 2024. No reminders were sent and no reimbursement provided.
Survey packs included a cover letter, a participant information sheet, the questionnaire and a pre-paid return envelope. The questionnaire contents included MMQ1, EQ-5D-5L, 16 two demographic items (age and sex), a single-item global self-rating of QoL, and a checklist of 17 common long-term conditions to assess multimorbidity,17,18 including free-text space to add conditions not listed (Supplemental Box 1). Each survey also included a code to identify the respondent’s GP practice. Practices were grouped according to whether they served areas of mainly low, mixed, or high deprivation, based on inspection of the distribution of Scottish Index of Multiple Deprivation for each practice’s registered population. 19
Analysis
Data was processed and analysed using R, version 4.4.1. EQ-5D-5L utility scores were calculated by converting the five item responses into a single summary value using a UK-specific value set, in accordance with guidance from the EuroQol group. 20 Scores ranged from −0.157 (the weighted score for a response pattern of level 5 on all 5 dimensions) to 1 (indicating no problems on all 5 dimensions), with scores of less than 0 representing health states valued as worse than death. MMQ1 scale scores were calculated for each of the six domains within the measure: Physical Activity (scale 0-18), Concerns and Worries (scale 0-18), Limitations in Daily Life (scale 0-30), Social Life (scale 0-18), Personal Finances (scale 0-9) and Self-image (scale 0-18). For the purpose of sensitivity analysis, a summed MMQ1 total score (scale 0-111), comprising the sum of the six scales, was also calculated for each respondent. As such, there were eight outcome scales included in the analyses for this study: EQ-5D-5L, the six MMQ1 domains, and MMQ1 total.
For MMQ1 total and its domains, a higher score indicates worse needs-based QoL, meaning the scale is in the opposite direction to that of EQ-5D-5L utility. For both measures, sociodemographic characteristics of non-completers were compared with those of completers, with no significant differences found (Supplemental Table). In line with previous psychometric evaluation of MMQ1, analysis was then conducted for complete responses only, which comprised 96-99% of responses for EQ-5D-5L and each MMQ1 domain.
Known-groups validity and association with health and sociodemographic characteristics
Based on existing literature,21–25 higher MMQ1 scores were expected to be associated with four known-group variables: increasing number of long-term conditions (LTCs), the presence of mental-physical multimorbidity, increased deprivation and poorer self-rated QoL. Significant differences were not expected for age or sex. To assess these hypotheses, firstly mean scores with 95% confidence intervals were calculated and plotted for all six independent variables (four known-group variables, plus age and sex). Secondly, simple linear regression models were fitted for all eight outcome scales, with each of the six independent variables as predictors (48 models in total). Effect sizes were reported using Cohen’s d for binary predictors (sex, mental-physical multimorbidity) and standardized β coefficients for continuous or ordinal predictors (age, LTC count, deprivation, self-rated QoL). Pearson correlation and R-squared were reported for each model. To aid interpretability, when reporting the output for EQ-5D-5L, the direction of effect was reversed to be consistent with MMQ1 (positive effect indicating worsening QoL). A p-value threshold of 0.01 was used to assess statistical significance, to account for multiple testing. If a relationship was demonstrated where it was not expected (i.e., for age or sex), this was explored using multiple linear regression to adjust for confounders.
Discriminative ability
Discriminative ability was assessed in two ways. Firstly, Receiver Operating Characteristic (ROC) curves were computed for each pair of consecutive categories of self-rated QoL (i.e. Very Good vs Good, Good vs Acceptable, Acceptable vs Poor, and Poor vs Very Poor). The area under the ROC curve (AUC) was calculated for each threshold, with values of 1.0 representing perfect classification, values >0.7 indicating good discriminative ability, and values of 0.5 indicating poor discrimination equivalent to random chance. An average ROC curve assessed overall discriminating ability across all categories of self-rated QoL. Secondly, a sample size calculation was performed to determine the number of individuals needed to distinguish between the consecutive response thresholds of self-rated QoL in a t-test with 5% significance and 80% power. A lower number of individuals needed to distinguish between consecutive response groups indicates a more discriminative scale. For sensitivity analysis, the discriminative ability of the EQ-5D-5L level sum score (the sum of responses without conversion into utility values) was also examined using both the ROC-AUC and sample size methods.
To further visualise discriminative ability, mean scores for each category of self-rated QoL were plotted along with the distribution of scores on EQ-5D-5L and MMQ1 total. Global self-rated QoL was used as the reference for assessing discriminative ability in order to maintain consistency with the approach used in the evaluation of the original Danish MMQ1 by Bissenbakker et al, 9 and because it provides a useful basis for future researchers to determine a minimally important difference.
Results
Characteristics of respondents.
LTC = Long-term condition.
aPatients who self-reported 0-1 conditions were included in the analysis on the basis of their GP electronic record.
Known-groups validity and association with health and sociodemographic characteristics
Figure 1 shows the variations in mean scores with increasing LTC count for each scale, demonstrating a clear dose-response relationship between increasing number of LTCs and worsening QoL. Expected differences were also demonstrated in figures for mental-physical multimorbidity, deprivation and self-rated QoL (Supplemental Figures 1-3). Relationship between mean outcome scores and number of self-reported long-term conditions (LTC). Note: Scale direction for EQ-5D-5L is opposite to that of MMQ1 and its subscales. Higher scores for MMQ1 indicate worse qulity of life, while higher scores on EQ-5D-5L indicate better quality of life. All MMQ scales have been standardised by diving mean score by number of items.
Effect sizes, p-value, correlation (r) and R2 output from 48 simple linear regression models.
Each cell contains the model effect size with 95% confidence interval, p-value, correlation coefficient (r) and R2. Effect sizes: Cohen’s d for binary predictors (sex, mental-physical multimorbidity), Standardized β for continuous predictors (age, LTC count, deprivation). LTC = long-term condition.
Direction for effect for EQ-5D-5L corrected to match MMQ1.
Green predictor variables on the left are those where an associated was expected, whereas no differences were expected for blue predictor variables.
Multiple linear regression model output showing adjusted effects for age across eight outcome scales.
All predictors included simultaneously. Values show standardized β coefficients from the adjusted model, with 95% confidence intervals in brackets. * p<0.01.
LTC = long-term condition.
Discriminative ability
Discriminative ability examined using i) area under the curve (AUC) for receiver operative characteristic curves, and ii) sample size (SS) calculations, for each pair of consecutive categories of self-rated QoL.
AUC: Area under the curve from the receiver operative characteristic (ROC) curves for consecutive categories of self-rated QoL. ROC curves are shown in Supplemental Figure 5. Higher values indicate better discriminate ability, with values above 0.7 considered good (1.0 = perfect classification, 0.5 = equivalent to random guessing).
SS: Sample size needed to discriminate between consecutive categories of self-rate QoL with 5% significance and 80% power. Lower values indicate more discriminating scales.
Rank gives the positional order (from 1-9) of each scale based on AUC and SS mean performance, in order to aid interpretation. Note that for AUC, higher values are better, whereas for SS, lower values are better.
At the level of the individual thresholds (pairs of consecutive categories of self-rated QoL), MMQ1 showed particularly strong discriminative ability at the Good vs Acceptable threshold (MMQ1 Total: AUC = 0.81, sample size = 24). Five of the six MMQ1 domains had an AUC >0.75, and sample size <50 at this threshold. EQ-5D-5L showed particularly weak discriminative ability at the Poor vs Very Poor threshold (AUC = 0.52, sample size = 7554). EQ-5D-5L was most discriminating at the Acceptable vs Poor threshold, with an AUC similar to that of MMQ1 Total (MMQ1 Total AUC = 0.78, EQ-5D-5L AUC = 0.77). However, all of the MMQ1 domain scales also demonstrated good discriminative ability at this threshold (AUC range = 0.69 to 0.77, sample size range = 34 to 82).
A further visual comparison of discriminative ability between MMQ1 and EQ-5D-5L is given in Figure 2, which shows the distribution of MMQ1 total and EQ-5D-5L utility scores for all 597 respondents, along with the position of the mean values for the five categories of self-rated QoL. While the mean values for MMQ1 total in these five groups are approximately evenly distributed along the scale, the five mean values for EQ-5D-5L are poorly separated, particularly between Poor and Very Poor, emphasising the weak discriminative ability of EQ-5D-5L at this threshold. Mean scores for five QoL categories across the distribution of responses for (A) MMQ1 and (B) EQ-5D-5L. Distribution of MMQ total and EQ-5D-5L scores, grouped by overall QoL rating (colours). Vertical dashed lines indicate the mean for each group. Poorly separated lines indicate weak discriminative ability. Note: Scale direction for EQ-5D-5L is opposite to that of MMQ1 and its scales. Higher scores for MMQ1 indicate worse quality of life, while higher scores on EQ-5D-5L indicate better quality of life.
Discussion
In this analysis of data from a cross-sectional validation survey in Scotland, MMQ1 demonstrated expected associations with number of long-term conditions, presence of mental-physical multimorbidity, deprivation and global self-rated QoL, thereby supporting the known-groups validity of this novel instrument. 14 The strongest association was for self-rated QoL, providing evidence of robust convergent validity for MMQ1. Across the other known-group variables, effect sizes were highest for mental-physical multimorbidity, indicating the substantial impact of mental health conditions on the burden of multimorbidity, particularly on the Self-image domain of QoL. More modest yet consistent effect sizes were seen for LTC count, with results indicating that for every one standard deviation increase in the number of LTCs, there was a 0.23-0.36 standard deviation increase in scale scores. Smaller effect sizes were seen with deprivation but given that this was measured using only three broad categories (low, mixed, high), these effects are still noteworthy, suggesting that MMQ is sensitive to socioeconomic gradients even when measured coarsely.
In addition, three MMQ1 scales (Concerns and Worries, Self-image, and Personal Finances) demonstrated unexpected and small but statistically significant age-related associations, suggesting that the negative impact of multimorbidity on these domains of QoL is greater in younger adults. These age-related differences were not detected using EQ-5D-5L, suggesting that the six-scale structure and needs-based conceptual framework of MMQ1 leads to greater sensitivity in capturing variations in QoL in people with multimorbidity than EQ-5D-5L.
Assessment of discriminative ability further supports this conclusion, finding that MMQ1 was generally superior to EQ-5D-5L in distinguishing between consecutive categories of self-rated QoL, and in particular between Poor and Very Poor, where EQ-5D-5L demonstrated especially weak discriminative ability. The certainty of these findings is strengthened by the concordance of results from the ROC-AUC and sample size methods.
The known-groups validity assessment in this study was underpinned by existing literature demonstrating a consistent association between multimorbidity and poorer quality of life, and in particular, associations between lower QoL and increasing number of LTCs, the presence of mental health conditions and increased deprivation are well established.21–25 Indeed, Jørgensen et al. recently demonstrated an almost linear dose-response relationship between number of conditions and MMQ1 scale scores using the original Danish version of this measure. 26 The relationship between age and QoL found in this study, while unexpected in the context of the known-group assessment, is supported by other studies which have reported a greater impact on QoL from multimorbidity in younger adults.21–25,27,28
Underlying differences between MMQ1 and EQ-5D-5L
There are several key differences between MMQ1 and EQ-5D-5L which underpin the comparative findings of this study. Firstly, MMQ1 has 37 items, compared with just five in EQ-5D-5L, the former allowing assessment of QoL across a wider spectrum and with greater granularity. While the five items in EQ-5D-5L represent distinct dimensions of QoL, this instrument is typically used to provide a single utility score or ‘profile’. By comparison, the 37 items in MMQ1 constitute six separate validated scales, providing six separate scores. This structure enhances MMQ1’s ability to detect subtle variations across different domains of QoL, which could be missed when using only one sum score – something evident in the age-related differences observed in this study.
Secondly, the conceptual and measurement frameworks of the two PROMs differ. MMQ1 is based on a reflective model, where the items within a scale reflect an underlying construct, whereas EQ-5D-5L is a composite measure, with its five dimensions combining to define the health state. This means that the measures have different theoretical relationships between observed items and the underlying construct. Furthermore, EQ-5D-5L measures health-related QoL, while MMQ1 measures needs-based QoL. The latter is a theory-driven, broader conceptualisation of QoL, which has an explicitly temporal, adaptive quality, and is rooted in the idea that QoL is determined by the extent to which an individual is able to meet their own needs and goals – something that changes over the life course. 27 This conceptual difference is particularly relevant to the fact that MMQ1 detected age-related variations where EQ-5D-5L did not. The fact that younger people with multimorbidity reported poorer QoL in particular domains of MMQ1 (notably those reflecting more psychosocial than physical dimensions of QoL), may be due to differences in expectations or perceived norms among younger adults, something that is not accounted for in the narrower, health-related conceptual framework of EQ-5D-5L. While a detailed examination of the pattern and implications of these age-related variations in QoL was beyond the scope of this paper, these findings highlight an important avenue for future research.
A third area of difference is in design and purpose. MMQ1 is a bespoke multimorbidity PROM, meaning that its items have been specifically tailored to address aspects of QoL deemed important by people with multimorbidity, and designed for use in multimorbidity intervention trials and observational studies. By contrast, EQ-5D-5L is a generic PROM (designed to be applicable to a range of health conditions and contexts), and a multi-attribute utility instrument (meaning that it can used to estimate utilities based on population preferences to calculate quality adjusted life years). As such, the two measures serve somewhat different purposes, and MMQ1 could not replace EQ-5D-5L in contexts where utility estimates or economic analyses are required. However, the head-to-head comparison provided by this study would be of interest to researchers who intend to measure quality of life as an outcome in people with multimorbidity, where previously only generic measures were available.
These differences between EQ-5D-5L and MMQ1 likely underpin their comparative performance in this study, including the marked superiority of MMQ1 in discriminating between those with Poor vs Very Poor global self-rated QoL rating. This particular finding suggests that trials targeting individuals with ‘complex multimorbidity’ (such as that involving both mental and physical conditions 29 ) may be especially better served using MMQ1 over EQ-5D-5L, given that lower baseline QoL would be expected, and therefore any improvement may often be in the Poor vs Very Poor QoL range. This finding also raises the hypothetical question of whether previous multimorbidity trials, such as the 3D study, 6 might have demonstrated improvements in QoL had MMQ1 been available to use as an outcome measure. That said, the choice of measurement instruments for any given context also depends on several wider factors including the design and purpose of the study, the mode of administration and need for economic analysis.
Strengths and limitations
In demonstrating known-groups validity, this study makes an important contribution to the psychometric validation of MMQ1. The use of regression analyses provided a robust assessment of known-group validity. The direct comparison between MMQ1 and EQ-5D-5L in terms of (a) variations across sociodemographic groups, and (b) discriminative ability, also provides insight into the comparative performance of these two instruments. Limitations include that the survey response rate in this study was relatively low (22%) and representativeness could not be assessed, although this would apply equally to MMQ1 and EQ-5D-5L. The paper-based mode of administration, which requires a certain level of function from respondents, may have introduced selection bias. This limitation would also apply to both measures, although the comparison does not account for the fact that response rates (and representativeness) may have differed had the two measures been distributed independently, given that EQ-5D-5L is considerably shorter and therefore has a lower respondent burden. The survey pack did not include the EQ-Visual Analogue Scale and the comparison focussed on the composite profile score of EQ-5D-5L, rather than individual dimensions. While this is consistent with how the instrument is most commonly reported, it does mean that there were important conceptual differences between the instruments being compared, as discussed above.
The analyses were intended to compare the psychometric properties of MMQ1 and EQ-5D-5L, rather than to make generalisable conclusions regarding QoL in people with multimorbidity. That said, the findings of this study are consistent with existing literature. For the purpose of comparison with EQ-5D-5L, this study included MMQ1 total scores as a crude aggregate of its six scales, but it is important to note that the intended use of MMQ1 is as six separate scales, used concurrently. In the context of a trial, this structure entails a more complicated approach to sample size calculation. However, the feasibility of using MMQ1 in a trial has been successfully demonstrated by the recent MM600 trial in Denmark, which used MMQ1 as its primary outcome measure and obtained a response rate of 22% from a study population of almost 160,000 patients.26,30
Finally, this was a cross-sectional study, assessing discrimination and known-group validity based on a single measurement timepoint. Repeated, longitudinal assessment would enable more robust psychometric evaluation of MMQ1 and comparison with EQ-5D-5L, in particular of responsiveness, which is arguably the single most important measurement property for intervention studies.
Conclusion
This study supports the use of MMQ1 as a robust measure of QoL in people with multimorbidity. In each of its six scales, it effectively demonstrated expected associations with number of LTCs, presence of mental-physical multimorbidity, deprivation and self-rated QoL. Moreover, it demonstrated superior performance to EQ-5D-5L in detecting age-related differences in QoL, and in discriminating between different levels of self-rated QoL. As such it has the potential to improve the measurement of QoL in people with multimorbidity, including clinical trials, prognostic studies and quality assessment. Future research examining the responsiveness of MMQ1 to changes in health status over time, such as post-interventions, would further substantiate its validity.
Supplemental Material
Supplemental Material - Variations in quality of life in people with multimorbidity: A cross-sectional survey comparing MMQ1 and EQ-5D-5L
Supplemental Material for Variations in quality of life in people with multimorbidity: A cross-sectional survey comparing MMQ1 and EQ-5D-5L by Kieran D. Sweeney, Dilek Coskun, Volkert Siersma, Lucy E. Stirland, Bruce Guthrie, Stewart W. Mercer, John Brodersen in Journal of Multimorbidity and Comorbidity.
Footnotes
Ethical considerations
North East (Newcastle & North Tyneside 2) Research Ethics Committee. Reference 23/NE/0053.
Consent to participate
Informed consent was obtained from all individual participants included in the study.
Funding
The primary study was funded by the Royal College of General Practitioners Scientific Foundation Board (SFB 2022-14), with this analysis supported by funding for Dr Kieran Sweeney from the Wellcome Trust Multimorbidity PhD Programme for Health Professionals (223499/Z/21/Z).
Declaration of conflicting interests
The authors declare no competing interests.
Data Availability Statement
Consent was not obtained for making survey data publicly available.
Supplemental Material
Supplemental material for this article is available online.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
