Abstract
Study Design
General population utility valuation study.
Objective
To develop a technique for calculating utilities from the Neck Disability Index (NDI) score.
Methods
We recruited a sample of 1200 adults from a market research panel. Using an online discrete choice experiment (DCE), participants rated 10 choice sets based on NDI health states. A multi-attribute utility function was estimated using a mixed multinomial-logit regression model (MIXL). The sample was partitioned into a training set used for model fitting and a validation set used for model evaluation.
Results
The regression model demonstrated fair predictive performance on the validation set with an AUC of .77 (95% CI: .76-.78). The regression model was used to develop a utility scoring rubric for the NDI. Regression results also revealed that participants did not regard all NDI items as equally important. The rank order of importance was (in decreasing order): pain intensity = work; personal care = headache; concentration = sleeping; driving; recreation; lifting; and lastly reading.
Conclusions
This study provides a simple technique for converting the NDI score to utilities and quantifies the relative importance of individual NDI items. The ability to evaluate quality-adjusted life-years using these utilities for cervical spine pain and disability could facilitate economic analysis and aid in the allocation of healthcare resources.
Keywords
Introduction
The number of cervical spine procedures performed in the United States for common pathologies such as cervical radiculopathy and cervical spondylosis has been steadily increasing since the mid-1990s.1,2 Given the potential risks of surgery, it is critical to demonstrate the value of these procedures to patients and policy makers. The ability to calculate quality-adjusted life-years (QALYs) for patients undergoing cervical spine surgery would help in this regard.
Quality-adjusted life-years analysis could help patients and clinicians jointly assess the trade-offs between prognosis, health-related quality-of-life (HRQoL) benefits, recovery, and potential complications to reach an optimal treatment decision.3,4 QALYs also aid in economic analysis, as economic decisions are based on the incremental cost-effectiveness ratio, which is the incremental cost per QALY gained. 3 QALYs are calculated using utilities, or HRQoL weights. A utility is a number, typically between 0 and 1, that quantifies the preference for (ie desirability of) a health state. 3 The utility of perfect health is set at 1 and the utility of a “dead” state is set at 0. If the current health state of a patient (Patient A) is measured at a utility of .7, it means that the general population would regard 10 years of life in Patient A’s health as equivalent to 7 years of life in perfect health (10 years × .7 = 7 years).
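The arithmetic in this paragraph can be sketched in a few lines. This is a minimal illustration, not part of the study: the function names and the cost figures in the usage note are hypothetical, and only the 10 years × .7 example comes from the text.

```python
def qalys(utility: float, years: float) -> float:
    """Quality-adjusted life-years: time in a health state weighted by its utility."""
    if not 0.0 <= utility <= 1.0:
        raise ValueError("this sketch assumes utilities on the 0 (dead) to 1 (perfect health) scale")
    return utility * years


def icer(cost_new: float, cost_old: float, qaly_new: float, qaly_old: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra QALY gained."""
    gained = qaly_new - qaly_old
    if gained == 0:
        raise ValueError("the ICER is undefined when both options yield the same QALYs")
    return (cost_new - cost_old) / gained


# Example from the text: 10 years lived at a utility of .7.
patient_a_qalys = qalys(0.7, 10)
```

For instance, with made-up costs of 30000 and 10000 and QALYs of 8.0 and 7.0, `icer(30000, 10000, 8.0, 7.0)` gives the cost per additional QALY for the more expensive option.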
Utility values provide the foundation for valuing and comparing health-care interventions. For example, how does the federal health agency of a country determine whether a carotid artery endarterectomy or an anterior cervical discectomy and fusion is more valuable to society and thus which procedure to prioritize for funding? To do this, one needs a metric to compare “apples and oranges,” and utilities and their translation to QALYs are currently the best available method of doing so. Although utilities can be calculated using generic outcome measures such as the SF-36, 5 there is concern that these generic measures have psychometric limitations.6,7 Furthermore, disease-specific measures may better capture smaller changes and are more sensitive and responsive for certain conditions.7-9
Utilities calculated from an instrument designed for neck pain and disability, such as the Neck Disability Index (NDI) score,10,11 could increase the sensitivity and specificity of HRQoL assessments for this condition. The NDI, modeled on the Oswestry Low Back Pain Index, 12 is the most widely used self-report instrument to assess neck pain and disability.13,14 Patients rate symptoms on a 10-item scale (pain intensity, personal care, lifting, sleeping, driving, recreation, headache, concentration, reading, work) with each item scored out of 5 for a maximum score of 50 (complete disability).10,11 The NDI score has been psychometrically validated across multiple cultural groups, has proved to be highly reliable and valid, and has established minimal clinically important difference values (3-5 points).11,13
In this paper we develop and validate a technique for directly calculating utilities for the NDI score using a discrete choice experiment with a general population sample.
Material and Methods
Subjects
Participants were recruited from an online market research panel (Toluna Influencers, Wilton, CT). 15 Panel members were recruited from across the United States (US) through random-digit-dialing, internet banner advertisements, and partnerships with corporations. 16 We did not provide an incentive for participating in our study; however, the market research company managing the panel does award monthly prizes to panel members based on the number and length of surveys completed. Quota sampling was used to ensure that the study sample was representative of the general US population with respect to region, gender, and age based on 2017 United States Census Bureau Population Estimates Program data. 17
Health States
Table 1. Neck Disability Index.
Discrete Choice Experiment Valuation Task
Utility valuation was conducted using an online self-administered discrete choice experiment (DCE) questionnaire. 18 DCE methodology is simpler than traditional utility valuation with standard gamble and time-trade-off methods and is therefore better suited for online studies.24,25 In the DCEs for this study, participants were presented with pairs of health states (choice sets) and asked to select the more desirable health state. Choice sets were presented in a table with differing attributes highlighted (Figure 1). 19
Figure 1. Choice set presentation in online discrete choice experiment. Differing attributes are highlighted in green.
Choice Set Selection
As there exist over 700 trillion unique choice sets, it was necessary to select a manageable subset for this study. A D-efficient collection of 120 non-dominated choice sets was organized into 12 blocks of 10 using the modified Fedorov algorithm with Ngene software (supplemental material Table S2).24,26 The design was developed using parameter values from a general population utility valuation study for the Spine Oncology Study Group Outcomes Questionnaire using DCE methodology. 27 To assess whether participants understood the DCE task, 1 dominated choice set (ie where 1 health state is clearly preferable) was added to each block as a logic check. To assess whether participants engaged in the DCE task and to test for internal consistency, 1 choice set was repeated in each block with the health state order reversed. Therefore, each block contained a total of 12 choice sets (10 experimental choice sets, 1 dominated choice set, and 1 repeated choice set). There were 3 levels of randomization in the survey. First, participants were randomized to 1 of the 12 blocks. Second, the order of choice sets in each block was randomized. Third, the health state order was randomized among the participants.
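The block structure and the three levels of randomization can be sketched as follows. This is an illustrative reading of the paragraph, not the survey software used in the study: `build_survey` and its arguments are hypothetical names, repeating the first set of the block is an arbitrary choice, and applying one left/right flip per participant is an assumption about what "randomized among the participants" means.

```python
import random

N_BLOCKS = 12        # blocks in the design
SETS_PER_BLOCK = 10  # experimental choice sets per block


def build_survey(experimental_sets, dominated_set, rng):
    """Return (block index, list of 12 choice sets) for one participant.

    experimental_sets: 120 (state_a, state_b) pairs, ordered block by block.
    dominated_set: one pair in which one state is clearly preferable.
    """
    if len(experimental_sets) != N_BLOCKS * SETS_PER_BLOCK:
        raise ValueError("expected 120 experimental choice sets")
    # Level 1: assign the participant to one of the 12 blocks.
    block = rng.randrange(N_BLOCKS)
    start = block * SETS_PER_BLOCK
    tasks = list(experimental_sets[start:start + SETS_PER_BLOCK])
    # Consistency check: repeat one set with the health state order reversed
    # (the first set is used here purely for illustration).
    first_a, first_b = tasks[0]
    tasks.append((first_b, first_a))
    # Logic check: add the dominated choice set.
    tasks.append(dominated_set)
    # Level 2: randomize the order of the choice sets within the block.
    rng.shuffle(tasks)
    # Level 3: randomize health state order per participant (assumed to be one
    # flip applied to every set, so the repeated set stays reversed).
    if rng.random() < 0.5:
        tasks = [(b, a) for a, b in tasks]
    return block, tasks
```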
Survey Procedures
The market research company sent panel members an e-mail invitation to participate in our study. Interested panel members were redirected to a secure website hosting the utility valuation exercise.15,28 Participants first read brief background information on neck pain and disability that had been scaled to a US grade 6 level. Next, they were provided with an explanation of DCEs and shown a worked example. Participants then completed a practice DCE and provided feedback before completing the study DCEs. At the end of the survey, participants were asked to provide a five-point Likert rating for the statement “this survey was difficult.”
Statistical Analysis
Participants who spent an average of at least 8 seconds per choice set (to screen out responses reflecting limited engagement), selected the clearly preferable alternative in the dominated choice set, and provided consistent responses for the repeated choice set were deemed to have engaged in and understood the DCE tasks. Only these participants were included in analyses.15,18,29
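The three inclusion criteria amount to a simple filter. A minimal sketch, assuming a hypothetical `Response` record per participant (the field names are illustrative; only the 8-second threshold and the two checks come from the text):

```python
from dataclasses import dataclass
from typing import List

MIN_MEAN_SECONDS = 8.0  # average time per choice set required by the text


@dataclass
class Response:
    seconds_per_set: List[float]  # time spent on each choice set
    chose_dominant: bool          # picked the clearly preferable state in the dominated set
    repeat_consistent: bool       # same state chosen in the original and the reversed repeat


def is_engaged(r: Response) -> bool:
    """True if the participant meets all three criteria for inclusion."""
    mean_seconds = sum(r.seconds_per_set) / len(r.seconds_per_set)
    return mean_seconds >= MIN_MEAN_SECONDS and r.chose_dominant and r.repeat_consistent


def analysis_sample(responses: List[Response]) -> List[Response]:
    """Keep only participants deemed to have engaged in and understood the task."""
    return [r for r in responses if is_engaged(r)]
```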
A multi-attribute utility function was estimated from DCE responses using a mixed multinomial-logit regression model (MIXL) fitted with the “mixl” library in the statistical programming language R.18,30‐33 The regression model incorporated the main survival duration effect and two-way interactions between survival duration and each NDI item. 18 Each parameter was treated as a random effect to account for participant heterogeneity in the repeated DCE tasks. The random effects were modeled with 1000 draws from a normal distribution. In the base regression model, all NDI items were coded as nominal categorical (dummy) predictors to avoid assumptions of linear or extra-linear effects. The base regression model was simplified by removing non-significant predictors and combining adjacent predictors to maintain a monotonic decreasing relationship. For example, for the reading attribute, no levels greater than 0 were significantly different from 0 and so were excluded (supplemental material Table S3). Model performance during the simplification procedure was monitored using McFadden’s pseudo-R².
To strengthen the generalizability of the regression analysis, we validated the model by allocating participants to a training set and a validation set in a 1:1 ratio. 35 Regression models were fit using only the training set. The performance of the simplified regression model was assessed via prediction accuracy for choice set selections by participants in the validation set using 1000 draws from the MIXL model. Prediction accuracy was quantified using the area under the receiver operating characteristic curve (AUC), interpreted using the following thresholds: excellent, .9 – 1; good, .8 – .9; fair, .7 – .8; poor, .6 – .7; and failed, .5 – .6. 36
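The validation procedure can be sketched in plain Python. This is not the authors' code: the function names are hypothetical, the AUC is computed with the standard pairwise (Mann-Whitney) definition rather than whatever routine the study used, and treating each threshold's lower bound as inclusive is an assumption because the text leaves the boundaries ambiguous.

```python
import random


def split_participants(participant_ids, rng):
    """Allocate participants (not individual responses) 1:1 to training and validation."""
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def interpret_auc(value):
    """Map an AUC to the labels quoted in the text (lower bounds assumed inclusive)."""
    for cutoff, label in [(0.9, "excellent"), (0.8, "good"), (0.7, "fair"),
                          (0.6, "poor"), (0.5, "failed")]:
        if value >= cutoff:
            return label
    return "below chance"
```

Splitting by participant keeps all of one person's choices on the same side of the partition, so the validation responses are independent of the data used for fitting.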
Regression coefficients quantify the impact of dysfunction in a particular NDI item on utility. Since the lowest level for all NDI items is non-dysfunctional, this level (0) imparts no change in utility. Under this scheme, utilities can be calculated by substituting the sum of the products of predictors and coefficients for each NDI item into Equation (1).
A worked example is provided in the Results section.
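The regression described above (a main effect of survival duration plus duration-by-item interactions) matches the commonly used DCE-with-duration specification. Under that assumption, and offered as a sketch rather than as a reproduction of the authors' Equation (1), the latent value of living for duration $t$ in a health state, and the utility obtained by anchoring on full health, could be written as below. The symbols are illustrative: $x_{ik}$ equals 1 when item $i$ is at level $k$ and 0 otherwise, $\beta_{t}$ is the duration coefficient, and $\beta_{ik}$ are the interaction coefficients, with level 0 as the reference.

```latex
\begin{aligned}
V(\mathbf{x}, t) &= \beta_{t}\, t + \sum_{i=1}^{10} \sum_{k \ge 1} \beta_{ik}\, x_{ik}\, t, \\
U(\mathbf{x}) &= \frac{V(\mathbf{x}, t)}{V(\mathbf{0}, t)}
             = 1 + \sum_{i=1}^{10} \sum_{k \ge 1} \frac{\beta_{ik}}{\beta_{t}}\, x_{ik}.
\end{aligned}
```

In this form a respondent with every item at level 0 has $U = 1$ (perfect health), zero duration has zero value (dead), and each dysfunctional level shifts utility by the rescaled coefficient $\beta_{ik} / \beta_{t}$.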
Since the MIXL model treats each coefficient as a normal (“bell-curve”) random variable, regression results consisted of a mean and standard deviation for each coefficient. In this way, MIXL techniques model heterogeneity (differences between individuals) in the utility impact of dysfunction in the NDI items. Thus, to predict how a single individual values each NDI item, a random draw is made from the normal distributions estimated by the MIXL model. The mean coefficient values are the expected values for a single individual. In accordance with best practices in health economics, an NDI utility scoring rubric was developed using mean values. 3 The importance of individual NDI items was quantified by calculating the difference in utilities between the best and worst levels of the attribute (importance score). 37
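The distinction between mean coefficients, individual draws, and importance scores can be illustrated with a short sketch. All numbers below are placeholders chosen for illustration, except that the worst-level values of .11 and .08 mirror the importance scores quoted in the Discussion; the names `MEAN_DECREMENT`, `importance`, and `draw_individual_decrement` are hypothetical.

```python
import random

# Hypothetical mean utility decrements by level (index 0 = no dysfunction).
# Only the worst-level values (.11, .08) echo figures quoted in the text.
MEAN_DECREMENT = {
    "pain": [0.0, 0.02, 0.05, 0.08, 0.08, 0.11],
    "personal_care": [0.0, 0.01, 0.03, 0.05, 0.06, 0.08],
}


def importance(item: str) -> float:
    """Utility at the best level minus utility at the worst level, other items fixed."""
    levels = MEAN_DECREMENT[item]
    return levels[-1] - levels[0]


def draw_individual_decrement(mean: float, sd: float, rng: random.Random) -> float:
    """One simulated individual's coefficient under a normal random effect."""
    return rng.gauss(mean, sd)


# Averaging many individual draws recovers the mean used in the scoring rubric.
rng = random.Random(1)
draws = [draw_individual_decrement(0.11, 0.05, rng) for _ in range(20000)]
population_mean = sum(draws) / len(draws)
```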
Sample Size Calculation
Three estimates of sample size were considered. S-efficiency is a measure of the minimum sample size to estimate statistically significant regression parameters at the 95% level. 38 Based on S-efficiency, the minimum sample size for the DCE design shown in Supplemental material Table S2 is 192 participants. Johnson and Orme proposed a simple rule of thumb that considers the number of attribute levels, number of choice sets, and number of alternatives. 39 Based on this rule, the minimum sample size is 600. Furthermore, as we planned to implement a training set and validation set in a 1:1 ratio, we required a total of 1200 participants.
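The Johnson and Orme rule of thumb is usually stated as N ≥ 500c / (t × a), where t is the number of choice tasks, a the number of alternatives per task, and c the largest number of levels of any attribute (or, when two-way interactions are estimated, the largest product of levels of any two attributes). The sketch below applies it; the value c = 24 is an assumption chosen because it reproduces the reported minimum of 600, since the text does not state which value was used.

```python
import math


def johnson_orme_min_n(c: int, n_tasks: int, n_alternatives: int) -> int:
    """Rule-of-thumb minimum sample size: N >= 500 * c / (t * a)."""
    return math.ceil(500 * c / (n_tasks * n_alternatives))


# t = 10 experimental choice sets and a = 2 health states per set come from the text.
# c = 24 is an ASSUMPTION (for example, 6 item levels x 4 duration levels for a
# two-way interaction); it is not stated in the source.
n_model = johnson_orme_min_n(c=24, n_tasks=10, n_alternatives=2)

# A 1:1 training/validation split doubles the number of participants needed.
n_total = 2 * n_model
```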
Results
Table 2. Respondent Demographic Characteristics.
Multiple adjacent coefficients for several NDI items were collapsed to simplify the base regression: pain 3/4; lifting 3/4/5; headache 1/2 and 3/4; concentration 2/3/4; work 1/2 and 3/4; driving 3/4/5; sleeping 4/5; recreation 3/4. Model simplification did not have an adverse effect on performance with the training set, as assessed by McFadden’s pseudo-R².
Table 3. NDI Utility Scoring Rubric.
To use this table, NDI (Neck Disability Index) responses by subcategory must be converted to numerical levels using Table 1. The appropriate values from this table are then substituted in Equation (1) to calculate utilities. Pain, Pain Intensity; PerC, Personal Care; Lift, Lifting; Read, Reading; Head, Headaches; Conc, Concentration; Work, Work; Drive, Driving; Sleep, Sleeping; Rec, Recreation.
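Applying a rubric of this kind is a table lookup followed by a sum. The sketch below shows the mechanics for three items only. Every numeric decrement is a placeholder and NOT a value from Table 3; the `RUBRIC` structure, the `utility` function, and the assumption that utility equals 1 minus the summed decrements are illustrative. The repeated values reflect the collapsed levels reported in the Results.

```python
# Placeholder decrements indexed by level 0-5; not the published rubric values.
RUBRIC = {
    "Pain": [0.00, 0.02, 0.04, 0.07, 0.07, 0.11],  # levels 3 and 4 collapsed
    "Lift": [0.00, 0.01, 0.02, 0.03, 0.03, 0.03],  # levels 3/4/5 collapsed
    "Read": [0.00, 0.00, 0.00, 0.00, 0.00, 0.00],  # no level differed from 0
}


def utility(levels: dict) -> float:
    """Utility for a set of item levels, assuming utility = 1 - sum of decrements."""
    total = 0.0
    for item, level in levels.items():
        if item not in RUBRIC:
            raise KeyError(f"unknown NDI item: {item}")
        if not 0 <= level <= 5:
            raise ValueError(f"level for {item} must be between 0 and 5")
        total += RUBRIC[item][level]
    return 1.0 - total


# A hypothetical respondent: severe pain, moderate lifting limits, worst reading level.
example = utility({"Pain": 4, "Lift": 3, "Read": 5})
```

Because collapsed levels share one coefficient, moving between them (for example, pain level 3 to level 4) leaves the computed utility unchanged.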
Discussion
In this study, we estimated a multi-attribute utility function for the NDI score for neck pain and disability for the US general population using DCE methods. We validated the regression model by assessing prediction accuracy on an independent set of DCE responses that were not used to develop the regression model. The regression model demonstrated fair prediction accuracy with an AUC of .77 (95% CI: .76-.78). This paper makes 2 clinically useful contributions.
First, we provide a technique for calculating utilities for NDI health states. We have shown a worked example for a hypothetical patient (Patient A) to illustrate how to calculate utilities using Equation (1) and Table 3. This utility value quantifies, from the perspective of the general population, the desirability of Patient A’s health state (pain 4, personal care 2, lifting 5, reading 1, headaches 3, concentration 0, work 3, driving 4, sleeping 2, recreation 3) relative to perfect health. An overall utility of .69 in our example means that the general population would regard 10 years of life in Patient A’s health as equivalent to 6.9 years of life in perfect health (10 years × .69 = 6.9 years). In other words, if given the option between living 10 years in Patient A’s health state or living only 6.9 years plus 1 day in perfect health, members of the general population would, on average, choose the latter option: a shorter life in better health. Since utilities are anchored on perfect health and dead, our data can be used to compare the value of health care interventions across diseases and conditions to aid in prioritization and resource allocation.
The second contribution is the quantification of the importance of each NDI item. Importance scores are listed in Supplemental material Table S3 and quantify how much individuals discount life in the worst level of each NDI item relative to the best. For example, an importance score of .11 for the pain intensity item means that individuals would be willing to trade 11% of their remaining life to reverse pain intensity from its worst state to its best state. In contrast, individuals are only willing to trade 8% of their remaining life to reverse personal care ability (importance score .08) from its worst state to its best state. It is important to note that for the reading attribute, because no levels greater than 0 were significantly different from 0 in terms of their effect on utility (ie Levels 0-5 all imparted no change in utility), it was determined to be the least important in the eyes of the general public. Based on our data, the general US population ranks the importance of the NDI items (from most important to least important) as pain intensity = work; personal care = headache; concentration = sleeping; driving; recreation; lifting; and lastly reading. Clinicians should heed these findings and offer treatments that maximize function in the most important attributes.
Utility conversions for the Neck Disability Index (NDI),40‐42 Oswestry Disability Index (ODI), 43 and Scoliosis Research Society 22-item (SRS-22r) 44 that have been previously developed use an indirect “cross-walk” protocol. This protocol involves collecting patient responses using both the condition-specific patient-reported outcome measure (PROM) and a generic PROM and fitting a regression model relating the 2 scores. A second regression model is then used to convert the predicted generic PROM score to a utility. 3 This cross-walk protocol has 2 important limitations. First, the technique is complicated and may introduce errors through the use of serial regression models. Second, by only considering the aggregate condition-specific score, the technique cannot differentially weight the importance of individual items in the condition-specific PROM.
It is important to appreciate that ex ante utilities, which are obtained from the general population, are not equivalent to ex post utilities obtained from patients who have experienced the health states. Patients tend to provide higher valuations than the general population for health states that predominantly affect physical health. 45 Previous work by Richardson et al. presented a regression model for translating NDI scores to ex post utility values (the study used a population who had previously undergone surgical treatment of cervical disc disease). 41 These utility values may not be appropriate for global healthcare decision making. Although it may seem that applying lower ex ante utilities may infringe on patient autonomy and deny care, healthcare system decision making impacts patients with various conditions. If the objective of healthcare decision making is to maximize the benefit to all patients, utilities across different diseases must be comparable to set priorities. Rawls argues that ex ante utilities can be used ethically if valued under a “veil of ignorance”. 46 If we assume that the general population providing ex ante utility valuations may eventually develop the condition of interest, out of self-interest, they should provide fair valuations. Utilities obtained from generic health surveys such as the EuroQol-5D, Short Form-6D, and Health Utilities Index 3 are actually ex ante valuations. 3 Therefore, the ex ante utilities derived in this study may not be appropriate for use for individual patient decisions because they do not quantify patient preferences, but they are highly appropriate for facilitating population level healthcare decision making. 47
One important limitation of our study is that these results are unlikely to be applicable to other countries, as the median inter-country utility difference for identical health states is over .4. 48 Although differences between value sets within geographic regions are smaller than differences between geographic regions, 49 attempts at explaining these differences through sociodemographic factors, methods of utility valuation, and cultural values have been unsuccessful. 50 Consequently, our results are applicable to the US general population only, and multi-attribute utility functions for the NDI need to be developed in other regions of the world for use in those areas. Another limitation of our study is that our methodology excludes people who do not have access to the internet. As of 2021, 7% of the United States population does not have access to the internet and therefore could not participate in this study. 51 Lack of internet access is associated with lower socioeconomic status and education. 52 Therefore, due to this “digital divide,” these demographic groups may be underrepresented in our sample.
We have quantified and validated a general population multi-attribute utility function for the NDI used for neck pain and disability. Equation (1) and Table 3 can be used together to convert NDI responses to utilities. The regression modeling exercise revealed the relative importance of NDI items to the general population. Together, these data can be used to inform population level healthcare decision making, such as the allocation of limited resources for specific treatments.
Supplemental Material
Supplemental Material for Calculating ex-ante Utilities From the Neck Disability Index Score: Quantifying the Value of Care For Cervical Spine Pathology by Eric X. Jiang, MD, Joshua P. Castle, MD, Felicity E. Fisk, MD, Kevin Taliaferro, MD, and Markian A. Pahuta, MD, PhD in Global Spine Journal
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Department of Orthopedic Surgery, Henry Ford Health System, Detroit, Michigan, USA.
IRB Approval/Exemption
Exemption was granted by the institutional review board at Henry Ford Health System, Detroit, Michigan, USA.
Supplemental Material
Supplemental material for this article is available online.
