Abstract
Highlights
Advocates argue that some interventions, including but not limited to end-of-life (EOL) care, are valued by patients and the public but are systematically disadvantaged by the quality-adjusted life-year (QALY) framework, leading to an unfair and inefficient allocation of health care resources.
Using a discrete choice experiment, we find some support for this argument. Only a small proportion of public respondents prioritized survival in EOL scenarios, and most prioritized nonhealth aspects such as dignity and family relations.
Together, these results suggest that the QALY may be a poor measure of the value of EOL care, as it neglects nonhealth aspects of quality and well-being that appear to be important to people in hypothetical EOL scenarios.
Competing health care interventions are increasingly prioritized on the basis of relative cost-effectiveness, or the additional cost per unit of health benefit. This benefit is often measured in terms of quality-adjusted life years (QALYs) gained, which combine changes in quality of life and years of life into a single index measure. Any improvement in quality or survival, jointly or independently, is associated with a proportional increase in QALYs. Interventions that generate a greater number of QALYs for a given cost, or equivalently, have a lower cost per QALY gained, have greater priority for scarce funding. This evaluative framework has become known as “QALY maximization.” 1
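The arithmetic behind this framework can be sketched in a few lines. The example below is a minimal illustration, assuming the usual convention that a quality weight of 0 represents death and 1 represents full health; the function names and all numbers are hypothetical and are not taken from the study.

```python
def qalys(quality_weight, years):
    """QALYs as a quality weight (0 = dead, 1 = full health) times duration in years."""
    return quality_weight * years


def cost_per_qaly_gained(extra_cost, qalys_gained):
    """Additional cost divided by additional QALYs (lower means higher priority)."""
    if qalys_gained <= 0:
        raise ValueError("no QALY gain, so a cost per QALY gained is undefined")
    return extra_cost / qalys_gained


# Hypothetical comparison; every figure here is illustrative only.
baseline = qalys(quality_weight=0.6, years=2.0)
with_treatment = qalys(quality_weight=0.8, years=2.5)
gain = with_treatment - baseline

print(round(gain, 2))
print(round(cost_per_qaly_gained(16_000, gain)))
```

In this sketch the treatment improves both quality and survival, and either change alone would also raise the QALY total in proportion to its size.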
Advocates of end-of-life care, though, argue that this framework unfairly favors curative interventions.2,3 They note that improvements in quality at the end of life are, by definition, of very limited duration, and therefore, the QALY gains from such improvements will always be smaller than similar quality improvements to patients with a longer life expectancy. Indeed, as extending survival is not typically the primary objective of end-of-life care, 4 some suggest it is inappropriate to consider a time element in its evaluation. 5 At the same time, advocates object to the principle of strict “additivity,” or the constant value of different time periods within the QALY maximization framework. 6 Additivity holds that the value of a year of life is determined solely by the health-related quality of that year. Advocates, however, argue that some periods of time may be more valuable than others. More valuable periods may include milestone moments such as weddings or birthdays or, in this context, the time before death: as an individual knowingly approaches the end of their life, the value of each year, month, or day may become increasingly greater, regardless of the health-related quality of that time. 7 The assumption of additivity, similar to the assumption of constant proportional time tradeoff, 8 excludes the possibility of an increasing value of time at the end of life. In this context, the conventional QALY framework may underestimate the value of small survival gains, although 2 systematic reviews suggest that any such bias is minimal.9,10
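The duration argument can be made concrete with a small calculation. This is a sketch under the additivity assumption described above, in which a constant quality improvement is simply multiplied by the time over which it is enjoyed; the function name and the two patient profiles are hypothetical.

```python
def qaly_gain_from_quality(delta_quality, years_remaining):
    """QALY gain from a constant quality improvement sustained over the remaining years."""
    return delta_quality * years_remaining


# Two hypothetical patients receive the same improvement in quality weight (+0.2).
end_of_life_gain = qaly_gain_from_quality(0.2, 0.5)    # 6 months of life remaining
longer_horizon_gain = qaly_gain_from_quality(0.2, 10)  # 10 years of life remaining

print(round(end_of_life_gain, 3), round(longer_horizon_gain, 3))
print(round(longer_horizon_gain / end_of_life_gain))
```

Under these assumptions the identical quality improvement yields a QALY gain that scales directly with remaining life expectancy, which is the basis of the advocates' objection.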
Advocates also argue that the QALY, as currently constructed, neglects nonhealth dimensions of end-of-life care.3,11 Whereas the dimensions of the widely used EQ-5D instrument, which include pain/discomfort, anxiety/depression, mobility, self-care, and usual activities, may be suitable for acute or chronic health states, these may be less relevant in the context of end of life. In an end-of-life context, aspects such as dignity, spiritual and psychosocial well-being, and bereavement support may be more relevant. 3 Using an instrument insensitive to changes in the quality of end-of-life health states risks undervaluing improvements in those states.
Taken together, these arguments—sometimes referred to as the “QALY problem” 2 —suggest that the QALY maximization framework may be systematically discriminatory. They suggest that there is a subset of health interventions, including but not limited to end-of-life care, that is highly valued by patients and the public but is unable to demonstrate value within the conventional QALY maximization framework.5,6,12 Such a bias could lead to an unfair and societally inefficient allocation of health care resources.
In 2009, the United Kingdom’s (UK) National Institute for Health and Clinical Excellence (NICE) acknowledged in its advice for the appraisal of life-extending end-of-life treatments 13 that some of the benefits provided by these treatments are not, or not sufficiently, captured by the QALY-based reference case. Notwithstanding this recognized limitation, the QALY has been used as the primary measure of value in a number of economic evaluations and NICE guidance documents in these settings. This is particularly true for evaluations of palliative treatments in metastatic cancer,14–17 but the QALY has also been used in valuing more traditional palliative care services. 18 Equally problematically, however, a systematic review 19 suggested that concerns about the limitations of the QALY in this context have led most studies in this area to avoid the QALY and limit themselves to simple cost comparisons or cost-consequence analyses, where costs and palliative-specific outcome measures are presented separately but not combined into a summary measure. This makes it difficult to assess value in cases in which palliative care may not be cost-saving or to compare the relative efficiency of palliative care with other interventions.
To test some of the specific arguments that make up the QALY problem around end-of-life care, we conducted a stated preference elicitation to understand public preferences for different end-of-life care scenarios, focusing on the relative importance of survival, conventional health-related quality of life (HRQOL) dimensions (especially physical symptoms and anxiety), and less-conventional dimensions such as family relations, dignity, and sense of control. Our primary objective was to test whether respondents give relatively more importance to the non-HRQOL dimensions than the conventional health-related dimensions in EOL scenarios. As secondary objectives, we administered different versions of the questionnaire to test the impact of different elicitation methods and disease contexts on respondents’ choices. We describe these different versions in more detail below. We did not seek to address the issue of whether QALY gains at the end of life are valued more highly than QALY gains at other points in life.
Methods
We developed an online survey with 2 parts: a discrete choice experiment (DCE) and a best-worst scaling (BWS) exercise. For our primary objective, we report on the methods and results of the DCE, as this method is more common and better understood in health economics. A comparison of the DCE and less common BWS is a secondary objective of the study and will be described elsewhere.
DCEs are a quantitative approach to eliciting individuals’ preferences over different scenarios. Respondents are asked to choose their most preferred option from a choice set of 2 or more alternatives described in terms of a common set of attributes and differing attribute levels. By presenting each respondent a series of choice tasks, it is possible to estimate the relative desirability, or utility, of different attribute levels and the willingness to trade off between these attributes to achieve the greatest overall utility. DCEs have previously been applied in the context of end-of-life care20,21 as well as in other health care contexts.22,23
DCE Design
The attributes included in the DCE were derived from a targeted review of qualitative studies of attitudes toward different aspects of end-of-life care. We identified 35 studies that described 33 distinct concepts. Two investigators (C.S., A.C.) independently combined these concepts into broader themes and resolved discrepancies via discussion. In this manner, we identified 5 broad themes of concern: 1) control of physical symptoms, 2) sense of fear or anxiety, 3) sense of control, 4) sense of dignity, and 5) good relations with family/friends. In addition to these qualitative attributes, we included survival gain to understand preferences relative to conventional life-year or QALY gains. Our categorization of the individual concepts into themes was independently reviewed and confirmed by a palliative care physician and a palliative care coordinator. A summary of this review and categorization is included in our Supplemental Materials.
Each theme was included in the DCE as an attribute and was assigned 4 levels. Survival gains were coded as 0, 2, 4 or 6 mo to be plausible in the context of end-of-life, and the other attributes were coded on a qualitative scale of “never,” “sometimes,” “often,” and “always.”
We used SAS macros 24 to develop a D-efficient fractional factorial experimental design from the full factorial set of scenarios, assuming noninformative priors and a main effects model. This process produced a 64-set paired design. A sample DCE task is shown in Box 1.
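To illustrate why a fractional design is needed, the sketch below enumerates the full factorial implied by 6 attributes with 4 levels each. It is not the SAS procedure used in the study and does not compute D-efficiency; the short attribute names are shorthand chosen for this example.

```python
from itertools import product

QUALITATIVE_LEVELS = ("never", "sometimes", "often", "always")
SURVIVAL_MONTHS = (0, 2, 4, 6)

# Shorthand attribute names for this example only.
ATTRIBUTES = {
    "physical": QUALITATIVE_LEVELS,
    "anxiety": QUALITATIVE_LEVELS,
    "control": QUALITATIVE_LEVELS,
    "dignity": QUALITATIVE_LEVELS,
    "family": QUALITATIVE_LEVELS,
    "survival": SURVIVAL_MONTHS,
}

# Every possible scenario: one level chosen for each attribute.
profiles = [dict(zip(ATTRIBUTES, combo)) for combo in product(*ATTRIBUTES.values())]

# Number of distinct unordered pairs that could be formed from those scenarios.
n_pairs = len(profiles) * (len(profiles) - 1) // 2

print(len(profiles))
print(n_pairs)
```

With 4,096 possible scenarios and millions of possible pairs, presenting every comparison is infeasible, so a statistically efficient subset such as the 64-set paired design is used instead.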
Box 1. Sample Discrete Choice Experiment Task
Questionnaire Design
Each respondent was presented with 6 DCE tasks and 4 BWS tasks, for a total of 10 choice tasks, “dynamically” selected from the D-efficient experimental design. Under this dynamic approach, each respondent saw the 10 choice tasks with the fewest completed responses at that point in the data collection. Dynamic selection ensures that each task in the experimental design is seen a roughly equal number of times. Respondents were also asked for demographic details: gender, age group, and occupation.
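The dynamic selection rule can be sketched as follows. This is a simplified illustration, not the survey platform's implementation: it handles only the 6 DCE tasks drawn from a 64-set design, breaks ties by task ID (an assumption), and ignores respondents who start but do not finish.

```python
def select_tasks(completed_counts, n_tasks):
    """Return the n_tasks task IDs with the fewest completed responses so far.

    Ties are broken by task ID, which is an assumption made for this sketch.
    """
    ranked = sorted(completed_counts, key=lambda task_id: (completed_counts[task_id], task_id))
    return ranked[:n_tasks]


def record_completion(completed_counts, task_ids):
    """Increment the completed-response count for each task a respondent finished."""
    for task_id in task_ids:
        completed_counts[task_id] += 1


# Simulate 32 hypothetical respondents, each completing 6 tasks from a 64-set design.
counts = {task_id: 0 for task_id in range(64)}
for _ in range(32):
    tasks = select_tasks(counts, n_tasks=6)
    record_completion(counts, tasks)

print(min(counts.values()), max(counts.values()))
```

Because each respondent is always given the least-seen tasks, the counts across tasks never differ by more than one in this simulation, which is the balancing property described above.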
The questionnaire, the Participant Information Sheet, and our statistical analysis plan were reviewed and approved by the University of East Anglia Faculty of Medicine and Health Science Ethics Committee, Norwich UK (reference 2015/2016-95).
A small online pilot based on a convenience sample within the University of East Anglia was administered prior to the main survey to identify any issues in the length or comprehensibility of the questionnaire. We included 5-point Likert scales and asked respondents to rate the perceived length and difficulty of the questionnaire. We also included a free-text field for respondent feedback. These scales and free-text fields were not included in the main questionnaire. On the basis of these results, as well as feedback from a palliative care coordinator, we made minor changes to the wording and presentation of the questionnaire.
Survey Administration
An age-gender representative sample of the UK population was recruited through a survey panel company (Dynata). Individuals who had previously registered with Dynata received an email inviting them to learn more about the study. An accompanying link took them to an online participant information sheet that outlined the purpose of the study and provided a link to the questionnaire.
Respondents were randomly assigned to 1 of 4 versions of the questionnaire, defined by 2 factors: whether the DCE or the BWS tasks were presented first and whether the questionnaire was a “generic” or a “cancer” version. In the generic version, the introduction to the questionnaire explained that respondents should imagine an end-of-life scenario in which they have no more than 12 mo to live but did not specify the cause of this scenario. In the cancer version, respondents were given the same information but told that they should imagine that they had no more than 12 mo to live due to a diagnosis of terminal cancer. The generic and cancer versions were intended to test whether end-of-life preferences differed between cancer and other causes, particularly as there is some suggestion that cancer is a “dreaded disease” and therefore viewed differently than other conditions. 25
Statistical Analysis
Prior to modeling DCE preferences, we tested for nontrading or dominant preferences. Respondents with a dominant preference always choose the alternative that maximizes or minimizes the level of a particular attribute, such as survival, without regard to the level of other attributes. Strictly dominant preferences are inconsistent with the theory of compensatory decision making that underlies DCE methods, and an excessive proportion of dominant preferences may invalidate a DCE. However, such preferences are not irrational, and they are almost impossible to definitively identify in a fractional factorial design where respondents see only a subset of all possible attribute-level combinations.26,27 As such, we report the proportion of respondents with potentially dominant preferences but do not exclude these respondents from the analysis.
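A simple screen for potentially dominant preferences might look like the sketch below. It assumes each response is stored as a pair of dictionaries (chosen alternative, rejected alternative) with attribute levels coded numerically, and it treats tasks in which both alternatives share the same level as uninformative; these data structures and rules are assumptions for illustration rather than the study's actual procedure.

```python
def is_potentially_dominant(choices, attribute, prefer_higher=True):
    """Flag a respondent who always picked the better level of one attribute.

    choices: list of (chosen, rejected) dicts with numerically coded levels.
    Tasks where both alternatives have the same level are ignored.
    """
    informative = [(c, r) for c, r in choices if c[attribute] != r[attribute]]
    if not informative:
        return False  # nothing to judge from, so do not flag
    if prefer_higher:
        return all(c[attribute] > r[attribute] for c, r in informative)
    return all(c[attribute] < r[attribute] for c, r in informative)


# Hypothetical respondent: survival in months, dignity coded 0 (never) to 3 (always).
survival_first = [
    ({"survival": 6, "dignity": 0}, {"survival": 2, "dignity": 3}),
    ({"survival": 4, "dignity": 1}, {"survival": 0, "dignity": 3}),
    ({"survival": 2, "dignity": 3}, {"survival": 2, "dignity": 0}),  # tie on survival
]

print(is_potentially_dominant(survival_first, "survival"))
print(is_potentially_dominant(survival_first, "dignity"))
```

A flag from a check like this is only suggestive: with 6 tasks per respondent, a person who genuinely trades off attributes can still happen to choose the higher-survival option every time.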
Survival was included in the analysis as a continuous variable, and all other attributes were effects coded to allow for nonlinear preferences over the levels of the different attributes. The “never” level of each attribute was used as the reference level. For all attributes except anxiety, the DCE tasks were phrased such that “always” was expected to be preferred to “never,” whereas for anxiety, “never” was expected to be preferred to “always.” In the statistical analysis, the sign on the anxiety coefficient was reversed to make comparisons with the other attributes more straightforward.
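Effects coding with “never” as the reference level can be sketched as below. Under standard effects coding, the reference level is coded as -1 on every column, so its implied part-worth is the negative sum of the estimated coefficients for the other levels; the function names and coefficient values are hypothetical.

```python
LEVELS = ("never", "sometimes", "often", "always")


def effects_code(level, levels=LEVELS, reference="never"):
    """Return the effects-coded row for one attribute level.

    Non-reference levels get a 1 in their own column and 0 elsewhere;
    the reference level gets -1 in every column.
    """
    if level not in levels:
        raise ValueError(f"unknown level: {level!r}")
    non_reference = [name for name in levels if name != reference]
    if level == reference:
        return [-1] * len(non_reference)
    return [1 if level == name else 0 for name in non_reference]


def reference_part_worth(coefficients):
    """Implied utility of the reference level: minus the sum of the other coefficients."""
    return -sum(coefficients)


print(effects_code("never"))
print(effects_code("often"))
# Hypothetical coefficients for "sometimes", "often", "always".
print(round(reference_part_worth([0.1, 0.3, 0.5]), 2))
```

Because the part-worths within an attribute sum to zero under this coding, each level is interpreted relative to the attribute's average rather than relative to a single omitted level.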
In the first instance, we used a multinomial logit (MNL) model to estimate part-worth utilities for all respondents. We also tested the impact of seeing the DCE tasks before the BWS tasks and the impact of seeing the cancer version of the questionnaire, using separate MNL models with an interaction term to test separately the 2 impacts.
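For readers less familiar with the MNL, the sketch below shows how the model converts the utilities of the alternatives in a choice set into choice probabilities. It illustrates the standard logit formula only and is not the estimation code used in the study (which relied on R packages); the utility values are hypothetical.

```python
import math


def choice_probabilities(utilities):
    """Logit choice probabilities: exp(U_j) / sum over k of exp(U_k).

    The maximum utility is subtracted first for numerical stability;
    this does not change the resulting probabilities.
    """
    highest = max(utilities)
    exp_utilities = [math.exp(u - highest) for u in utilities]
    total = sum(exp_utilities)
    return [value / total for value in exp_utilities]


# Hypothetical paired task: alternative A has utility 1.0, alternative B has 0.0.
prob_a, prob_b = choice_probabilities([1.0, 0.0])
print(round(prob_a, 3), round(prob_b, 3))
```

In a paired task the probability of choosing one alternative depends only on the difference in utility between the two, which is why part-worths are identified only up to such differences.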
Subsequently, we used a latent class multinomial logit (LC-MNL) model to allow for preference heterogeneity in responses. A simple MNL assumes that preferences are homogeneous across all individuals. 28 However, if unobserved factors influence the choices made by an individual, particularly as a result of random taste variation or unobserved heterogeneity, treating individual responses as independent observations can lead to biased regression estimates. 29 LC-MNL assumes that there are 2 or more classes of respondents who share unobserved (latent) tastes or characteristics that affect their choices. Critically, preferences are assumed to differ between classes but to be homogeneous within classes.30,31
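The latent class structure can be illustrated with a short calculation. In this sketch, the likelihood of one respondent's sequence of choices is a share-weighted mixture of class-specific likelihoods, and Bayes' rule gives that respondent's predicted class shares; the inputs are hypothetical, and the functions illustrate the general model form rather than the GMNL package's internals.

```python
import math


def class_likelihoods(class_shares, chosen_probs_by_class):
    """Share-weighted likelihood of one respondent's choices under each class.

    chosen_probs_by_class[c][t] is the probability, under class c, of the
    alternative the respondent actually chose in task t.
    """
    return [share * math.prod(probs) for share, probs in zip(class_shares, chosen_probs_by_class)]


def sequence_likelihood(class_shares, chosen_probs_by_class):
    """Mixture likelihood: sum of the share-weighted class likelihoods."""
    return sum(class_likelihoods(class_shares, chosen_probs_by_class))


def posterior_class_shares(class_shares, chosen_probs_by_class):
    """Respondent-specific class shares via Bayes' rule."""
    weighted = class_likelihoods(class_shares, chosen_probs_by_class)
    total = sum(weighted)
    return [value / total for value in weighted]


# Hypothetical 2-class model and a respondent who completed 2 tasks.
shares = [0.4, 0.6]
chosen_probs = [[0.9, 0.8], [0.2, 0.3]]

print(round(sequence_likelihood(shares, chosen_probs), 3))
print([round(p, 3) for p in posterior_class_shares(shares, chosen_probs)])
```

Here the respondent's choices are far more likely under class 1, so the predicted share for class 1 rises well above its population share of 40%.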
We compared the goodness-of-fit of LC-MNL models with between 2 and 5 classes using the Bayesian information criterion (BIC). Based on the preferred model, we estimated part-worth utilities associated with changes in the levels of each attribute. The relative importance of each attribute was estimated as the absolute difference in utility between the most preferred and the least preferred levels of a particular attribute as a proportion of the sum of utility differences across all attributes. Under this approach, attributes with a greater absolute difference are considered relatively more important than attributes with a smaller absolute difference, conditional on the range of levels assigned to each attribute. We did not calculate marginal rates of substitution between survival and the other attributes, as using survival as a numeraire would preclude a direct understanding of its importance relative to the other attributes.
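Both calculations in this paragraph can be sketched briefly. The BIC function uses the standard definition (lower values indicate better fit after penalizing parameters); whether the number of observations should be respondents or individual choices is not stated in the text, so that input is left to the caller. The part-worth values are hypothetical and are not results from the study.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k * ln(n) - 2 * log-likelihood."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def relative_importance(part_worths):
    """Utility range of each attribute as a proportion of the sum of all ranges.

    part_worths maps attribute name -> {level: part-worth utility}.
    """
    ranges = {
        attribute: max(levels.values()) - min(levels.values())
        for attribute, levels in part_worths.items()
    }
    total = sum(ranges.values())
    if total == 0:
        raise ValueError("all attributes have zero utility range")
    return {attribute: value / total for attribute, value in ranges.items()}


# Hypothetical part-worths; for continuous survival, utility = coefficient * months.
example = {
    "survival": {0: 0.0, 6: 0.6},
    "dignity": {"never": -0.7, "always": 0.7},
}
importance = relative_importance(example)
print({name: round(value, 2) for name, value in importance.items()})
```

As the paragraph notes, these importance scores are conditional on the level ranges chosen: a wider survival range than 0 to 6 mo would mechanically increase the measured importance of survival.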
We probabilistically assigned each respondent to a specific latent class on the basis of predicted individual class shares. For each individual, we generated a random value between 0 and 1 and assigned them to the corresponding latent class on the basis of cumulative class shares (e.g., if an individual in a 2-class model had class shares of 40% and 60% for class 1 and class 2, respectively, we would assign that individual to class 1 if they had a random value ≤0.4 and otherwise to class 2). We used analysis of variance (ANOVA) to identify significant associations between the assigned latent class and respondent demographics and the questionnaire version (DCE/BWS first and generic/cancer version).
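The assignment rule in the worked example can be written as a short function. This is a minimal sketch of the cumulative-share rule described above; the function name and the seeded random generator are choices made for this illustration.

```python
import random
from itertools import accumulate


def assign_class(class_shares, draw):
    """Return the 1-based latent class whose cumulative share first covers the draw."""
    for class_index, cumulative in enumerate(accumulate(class_shares), start=1):
        if draw <= cumulative:
            return class_index
    return len(class_shares)  # guards against rounding error in the cumulative sum


# The 2-class example from the text: shares of 40% and 60%.
print(assign_class([0.4, 0.6], 0.4))
print(assign_class([0.4, 0.6], 0.41))

# In practice the draw would be random; a seeded generator keeps the run repeatable.
rng = random.Random(2017)
assigned = assign_class([0.4, 0.6], rng.random())
```

Because assignment is probabilistic rather than based on the single most likely class, the resulting class sizes are expected to mirror the estimated class shares.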
Analyses were conducted with R statistical software, version 4.1.0. 32 The MLOGIT 33 and GMNL 34 packages were used for the analysis of DCE responses, and GGPLOT2 35 was used to produce the figures.
Results
Eighteen respondents participated in the pilot survey. Ratings of length and difficulty did not indicate concerns, but we made minor changes to the introduction and task descriptions in response to free-text comments.
A total of 3,010 general population respondents completed the main questionnaire in October 2017. The age-gender distributions of the overall sample and the 2021 UK population are illustrated in Figure 1. The sample was broadly representative of the UK population by age and gender, although the 25- to 34-y age group was overrepresented relative to its population proportion, and the ≥55-y age group was slightly underrepresented. The age-gender distribution by questionnaire version is illustrated in the online supplement. The largest socioeconomic class among all respondents was C1 (senior administrative/clerical) at 30%, followed by class B (intermediate managerial) at 23%. Of the respondents, 8% reported class A (senior managerial/professional) and 16% reported class E (unemployed/retired). The mean completion time for the 6 DCE tasks was 3 min 1 s.

Figure 1. UK and sample (N = 3,010) age-gender distribution.
The MNL model testing the impact of seeing the DCE tasks before the BWS tasks indicated a statistically significantly weaker preference for full symptom control and a significantly stronger preference for greater survival. The only other statistically significant attribute levels were a weaker preference for “always have a sense of control” and a stronger preference for “always have a sense of dignity.” Respondents seeing the cancer version of the questionnaire had fewer statistically significant differences. Most notably, they had a slight but statistically significantly stronger preference for greater survival. The coefficients from these models, as well as a simple pooled MNL, are provided in the online supplement. Overall, we judged that these differences by order of the tasks or version of the questionnaire were relatively minor and did not preclude a pooled analysis of responses.
A 4-class LC-MNL had a better Akaike information criterion than the pooled MNL model and the best fit by BIC across 1- to 5-class LC-MNLs we tested. The coefficients of this 4-class LC model are shown in Table 1 and illustrated in Figure 2 by attribute and latent class.
Table 1. LC4-MNL Model Coefficients
LC4-MNL, 4-class latent class multinomial logit model; SE, standard error. Attribute labels: physical, control of physical symptoms; anxiety, sense of anxiety; control, sense of control; dignity, sense of dignity; family, good relations with friends and family; survival, additional survival. Level labels: reference level, an acceptable level of the attribute none of the time; .Some, an acceptable level of the attribute some of the time; .Most, an acceptable level of the attribute most of the time; .All, an acceptable level of the attribute all of the time.

Figure 2. Part-worth utilities by attribute and latent class.
The direction of preference was generally consistent across all latent classes, but the strength of preferences for levels within an attribute differed between classes, including some small and statistically insignificant preference reversals between the intermediate levels of some attributes. There were statistically significant differences in the strength of preferences over levels of anxiety, family relations, physical symptoms, and, most notably, survival gains. The part-worth utility of a 6-mo survival gain in LC4 (5.39) was more than 10 times greater than the utility in the next highest class (0.49; LC3), whereas the utility of survival gains was negative in classes 1 and 2.
Attribute relative importance, illustrated in Figure 3, demonstrates the differences between latent classes. Whereas the 13% of respondents probabilistically assigned to class 4 (LC4) strongly valued survival gains over other aspects, this attribute was much less important to most respondents, represented by the other classes. Individuals in LC3 (31%) gave relatively more importance to family relations and sense of dignity, whereas LC1 (31%) and LC2 (25%) deprioritized survival and balanced the other attributes.

Figure 3. Attribute relative importance by latent class.
In terms of the associations between latent class membership and respondent characteristics, ANOVA showed that relative to the other classes, the proportion of females was significantly higher in classes 2 and 3, whereas the proportion of males was significantly higher in classes 1 and 4. The proportion of elderly (≥65 y) respondents was significantly lower in class 1 than in the others, and the proportion of respondents seeing the DCE before the BWS tasks was highest in class 4, although there were statistically significant differences in this proportion across all 4 classes. There were no significant differences in class membership in the group of respondents seeing the cancer version of the questionnaire.
Discussion
This DCE reveals important heterogeneity in preferences for end-of-life care between latent classes of public respondents. We find that most respondents demonstrated a preference for what has been called “a good death” 36 over “a longer life.” LC4 showed a strong preference for survival over other aspects of the scenarios but represented only 13% of respondents. Most respondents gave more weight to aspects other than survival, such as relations with family, physical symptoms, and anxiety. At least some of these preferences appear to be correlated with demographics, as males were significantly more likely to be represented in LC4, whereas older respondents were less likely to be included in LC1.
The relatively low importance assigned to survival gains by most respondents compared with aspects such as control of physical symptoms, dignity, and family relations appears consistent with the notion expressed by Sinoff, 37 that most individuals fear the “dying process” rather than death itself. From this perspective, interventions that focus on extending survival at the expense of these other factors are unlikely to be valued by individuals at the end of life. At the same time, however, end-of-life interventions that focus on aspects such as dignity and family relations at the expense of survival or physical symptoms will not typically be valued within a conventional QALY maximization framework. This misalignment between preferences and value is the essence of the “QALY problem” 2 and appears to support the arguments of critics of the QALY in valuing EOL interventions.
From a technical perspective, we believe that the interpretability of the latent classes demonstrates the value of latent class analysis of DCEs. A latent class model allows for a fixed number of classes in which preferences are different between classes but identical within classes. In this sense, it is a compromise between the simple homogeneity of a pooled model, in which each respondent is assumed to have identical preferences, and the continuous heterogeneity of a random parameters model, in which each respondent can have unique preferences. 38 It is a simplified representation of individual heterogeneity, but we see it as a simplification that allows for a useful interpretation of heterogeneity.
Our results are broadly consistent with other studies of public and patient preferences at the end of life. Supiano et al. 39 found that a reluctance to burden others at the end of life was, by far, the most common value expressed by a sample of healthy older adults, followed by a sense of independence and control, including “controlling my own death.” Arguably, all 3 of these values align with the “sense of control” attribute we presented in the DCE. Avoiding pain and suffering was rarely mentioned, and extending survival was not mentioned outside the desire “to not live in pain.” Similarly, Steinhauser et al., 40 in asking what patients considered important at the end of life, found that the most commonly mentioned factors were associated with aspects of dignity and control. Pain control ranked ninth.
Engelberg et al., 41 in testing the correspondence between patient and surrogate preferences at the end of life, found that patients assigned the greatest importance to “time with spouse/partner” (average 9.62 out of 10), “ventilator/dialysis to prolong life” (9.23), “pain under control” (9.08), “avoiding worry about strain on loved ones” (9.03), and “keeping dignity/self-respect” (9.01). Prolonging life rated highly but lower than time with a spouse or partner and in a similar range to aspects such as pain control, strain on loved ones, and dignity/self-respect.
Bryce et al. 12 found that a general population sample was willing to sacrifice a median of 7.2 to 7.7 y of survival for a “better end-of-life experience,” including “empowerment to control over daily surroundings in the intensive-care unit,” “empowerment to participate in medical treatment and care decision,” “financial and emotional support for family members,” or up to 8.3 y for all three. Again, this suggests an emphasis on “the dying process” rather than extending life.
In a slightly different context, González-González 42 conducted a meta-analysis and found that only 21% of older patients with comorbidities were willing to receive “life-sustaining treatments” such as cardiopulmonary resuscitation, mechanical ventilation, or feeding tubes. Likewise, Liu et al. 43 found that only 8.2% of terminally ill cancer patients in Taiwan preferred “life-prolonging” end-of-life care compared with 48.4% who preferred “comfort-oriented” care and 33.2% who would accept their physician’s recommendation.
A limitation of our study is that we did not conduct primary qualitative research to inform the attributes of our DCE. However, given the extensive qualitative literature around the issue of values and preferences at the end of life, particularly the sources noted above, we felt that a small qualitative study as part of this primarily quantitative project would add little to our understanding of the issues. As noted, we validated our subjective identification of key themes with a palliative care physician and a palliative care coordinator. Given practical and ethical issues, we did not include any patients at end of life in our design or validation process.
Likewise, we did not (to our knowledge) elicit preferences from respondents at their end of life. Including such respondents would have presented substantial ethical and practical difficulties. Further, it is a principle of publicly funded health care systems such as the National Health Service in the United Kingdom that resource allocation decisions should be based on the values and preferences of society rather than patients. However, in light of the criticisms of additivity that we noted earlier, it is possible that the preferences of individuals closer to their end of life may be different from what we observed in this public sample. In particular, it may be the case that survival becomes increasingly more important as individuals knowingly approach death. In this context, however, we note that our latent class analysis found that older respondents were not significantly more likely to be represented in LC4, which held the strongest preference for survival.
Finally, we note that there were small preference reversals between the intermediate levels of some attributes in the latent class results. This may reflect ambiguity in wording between “sometimes” and “most times” as well as genuine indifference between these levels among some respondents. Overall, however, we see logical and statistically significant differences in preferences between the extreme ends of the attribute scales (“never,” “always”). It is these extreme levels that inform our estimates of attribute importance.
Our study builds on an extensive but primarily qualitative literature showing the importance of non-QALY considerations at the end of life and adds a quantitative understanding of the relative importance of these factors. We find that although some respondents strongly prioritized survival, most assigned at least as much importance to non-QALY factors such as good family relations and a sense of dignity and control as they did to aspects such as survival, physical symptoms, and anxiety. This appears to support the notion of “the QALY problem” 2 and suggests that valuing end-of-life care on the basis of QALY gains may systematically undervalue such interventions. To more appropriately capture the value of health care interventions that focus on aspects other than survival gains or the direct physical or mental symptoms of a condition—including but not limited to end-of-life interventions—we call for an approach that values broader well-being. We suggest moving beyond the relatively narrow HRQOL QALY and toward some version of a “well-being–adjusted life-year” (e.g., the “WALY” 44 ) that can account for broader dimensions of well-being while maintaining comparability between different technologies and even between interventions in different sectors.45,46
Supplemental Material
Supplemental material, sj-docx-1-mpp-10.1177_23814683241252425 for A Longer Life or a Quality Death? A Discrete Choice Experiment to Estimate the Relative Importance of Different Aspects of End-of-Life Care in the United Kingdom by Chris Skedgel, David John Mott, Saif Elayan and Angela Cramb in MDM Policy & Practice
Footnotes
Acknowledgements
The authors would like to acknowledge the assistance of Krishnali Parsekar with the targeted literature review.
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article. The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Financial support for this study was provided by a grant from the National Institute of Health Research, East of England Collaboration for Leadership in Applied Health Research and Care (EoE CLAHRC), reference HE-06.
Presentations
This work was presented as a virtual podium session at the ISPOR European Congress 2020.
Author Contributions
Chris Skedgel and Angela Cramb were responsible for study conception. Chris Skedgel and Saif Elayan conducted the data analysis. Chris Skedgel, Angela Cramb, Saif Elayan, and David Mott participated in data interpretation. Chris Skedgel and Saif Elayan drafted the initial manuscript, and Angela Cramb and David Mott made key editorial contributions. All authors approved the submission of the manuscript.
Most of this work was completed while C.S. and A.C. were with the Health Economics Group of the University of East Anglia, Norwich, UK, and S.E. was an MSc student. D.M. was with the Office of Health Economics (OHE). A.C. retired, C.S. moved to a position with the OHE, and S.E. moved to the University of Groningen over the course of producing this manuscript. S.E. is now at the Division of Pharmacoepidemiology and Clinical Pharmacology, Utrecht University, the Netherlands.
References