Abstract
Objectives
Physician preference items (PPIs) are high-cost medical devices for which clinicians express firm preferences with respect to a particular manufacturer or product. This study aims to identify the most important factors in the choice of new PPIs (hip or knee prostheses) and to assess whether response biases may arise when using 2 alternative stated preference techniques.
Methods
Six key attributes with 3 levels each were identified based on a literature review and clinical experts’ opinions. An online survey was administered to Italian hospital orthopedists using type 1 best-worst scaling (BWS) and binary discrete choice experiment (DCE). BWS data were analyzed through descriptive statistics and conditional logit model. A mixed logit regression model was applied to DCE data, and willingness-to-pay (WTP) was estimated. All analyses were conducted using Stata 16.
Results
A sample of 108 orthopedists was enrolled. In BWS, the most important attribute was “clinical evidence,” followed by “quality of products,” while the least relevant items were “relationship with the sales representative” and “cost.” DCE results suggested instead that orthopedists prefer high-quality products with robust clinical evidence, positive health technology assessment recommendation and affordable cost, and for which they have a consolidated experience of use and a good relationship with the sales representative.
Conclusions
The elicitation of preferences for PPIs using alternative methods can lead to different results. The BWS of type 1, which is similar to a ranking exercise, seems to be more affected by acquiescent responding and social desirability than the DCE, which introduces tradeoffs in the choice task and is likely to reveal more about true preferences.
Highlights
Physician preference items (PPIs) are medical devices particularly exposed to physicians’ choice with regard to type of product and supplier.
Some established techniques of collecting preferences can be affected by response biases such as acquiescent responding and social desirability.
Discrete choice experiments, introducing more complex tradeoffs in the choice task, are likely to mitigate such biases and reveal true physicians’ preferences for PPIs.
Introduction
In several countries, policymakers are increasingly adopting evidence-based decision-making models such as health technology assessment (HTA) and value-based procurement to decide whether innovations enter routine practice.1–3 However, clinicians’ preferences in selecting health care technologies are still relevant, especially in the field of surgical devices. For example, a recent study 4 showed that in England, surgeons, rather than hospital purchasers, are the main drivers of the increase in laparoscopic colectomy use. Some medical technologies are more sensitive to physicians’ choices and therefore are called physician preference items (PPIs). For PPIs, it is the physician who chooses the product and the supplier, typically based on personal experience with the device and relationships with the vendor’s sales representative.5,6 PPIs usually include orthopedic prostheses (e.g., hip and knee replacement components), coronary stents, and some spine surgery devices, and they are responsible for up to 60% of a hospital’s total expenditures on supplies. 7 The medical decision-making process is inherently multidimensional, encompassing medical, technological, economic, and experiential elements. Therefore, a deep understanding of the factors that influence physicians’ preferences for PPIs is of paramount importance for policy makers and hospital managers, who must guarantee high-quality health care and efficient allocation of scarce resources at the same time. 8
In recent years, the literature has started exploring the variables that affect clinicians’ preferences for medical devices but mostly through qualitative interviews or Likert-type scales using direct questions (e.g., how important is cost?).8–12 A relevant issue in measuring noncognitive characteristics through Likert scales is that, under social pressure, respondents’ answers are susceptible to response biases 13 such as social desirability (i.e., the tendency of respondents to reply in a manner that will be viewed favorably by others) 14 and acquiescent responding (i.e., the tendency for survey respondents to agree with statements regardless of their content if disagreeing is seen as problematic).15,16 In PPI surveys, physicians may be influenced by their scientific community, hospital managers, and health care policy makers, with the risk of providing incorrect and misleading information to decision makers about the factors that truly influence their choices.
The objective of this study was to collect clinicians’ preferences on PPIs and, relying on the existing literature, to assess whether response biases may arise when using 2 alternative stated preference techniques, one of which makes socially desirable answers more evident to the respondent.
Methods
We designed an online survey using 2 different stated preference methods, namely, best-worst scaling (BWS) and discrete choice experiment (DCE). Their common basic assumption is that choices are made within the random utility model, according to which the frequency of choices between 2 (or more) alternatives provides an estimate of the utility associated with each element on a latent scale of preferences. 17 For both BWS and DCE, the options that are relevant to a choice situation are first described by their attributes. The BWS of type 1 (object case) simply asks the participant to identify the pair of attributes considered, respectively, “best” and “worst” (or “most important” and “least important”) within a single option.18,19 BWS is similar to Likert ratings but with the advantage that participants are not required to calibrate their responses to the scale range and there are no differences in scale use or interpretation between participants. 20 In this study, we compared DCE with BWS to provide a benchmark method that is less affected by other biases typically concerning Likert scales such as the tendency to choose the middle option or to avoid selecting the extreme ends of the scale (i.e., end-of-scale bias). Moreover, the BWS forces respondents to select explicitly both the best and the worst items, thus encouraging more careful consideration of the attributes and reducing the tendency to provide biased responses. This is particularly valuable when all attributes appear socially desirable and responses on the Likert scale might be skewed toward the upper extreme.
In the DCE, the attributes are further described by their associated levels, and participants are required to choose among 2 (or more) hypothetical choice options (or scenarios) including all attributes but with different levels. 21 By forcing respondents to trade off some elements for others, the DCE incorporates opportunity costs into the elicitation process 22 and reveals more about “true” preferences in the welfarist sense. 23 The DCE is even more effective than the BWS in mitigating response biases since the increased task complexity makes it difficult for the respondent to simply opt for the most socially desirable response. In preference elicitation exercises, indeed, the easier it is to identify a socially desirable behavior, the more likely responses are to be affected by this bias. Unlike the BWS (of type 1), which simply requires respondents to rank a list of items, the DCE adds levels to each attribute, thus reducing the influence of acquiescent responding. Moreover, preferences are expressed for a combination of attribute levels (scenarios) instead of single attributes, thus making respondents feel less exposed to judgment for their choices. So far, few studies have addressed the impact of the design of choice experiments on social desirability; among them, Huls et al. 24 found little evidence of social desirability when using a DCE.
In this study, we used, first, BWS (object case) to capture direct preferences (without tradeoffs), thus leaving physicians potentially exposed to the influence of socially desirable or acquiescent responding and, second, a binary DCE that, exposing participants to tradeoffs among choice dimensions, makes response biases less identifiable and might reduce their impact on stated preferences.
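The random utility assumption shared by both methods can be written compactly. The notation below is ours, not the article's: respondent n, alternative j in choice set C, attribute vector x, and coefficient vector β are illustrative symbols, and the formulas show the basic logit form under independent type I extreme value errors. The mixed logit applied to the DCE data additionally lets β vary across respondents.

```latex
\begin{aligned}
U_{nj} &= V_{nj} + \varepsilon_{nj}, \qquad V_{nj} = \beta^{\top} x_{nj} \\
P_n(j) &= \frac{\exp(V_{nj})}{\sum_{k \in C} \exp(V_{nk})} \\
P_n(A) &= \frac{1}{1 + \exp\{-(V_{nA} - V_{nB})\}} \quad \text{(binary choice between scenarios A and B)}
\end{aligned}
```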
Case Study
The experiment targeted the orthopedist community by focusing on hip and knee implants, which in 2019 accounted for 3.6% of public expenditures (182 million Euros) on medical devices in Italy. 25 Moreover, orthopedic clinicians in Italy are potentially exposed to well-identifiable pressures on cost containment by hospital managers and policy makers on one hand and, on the other, on independence from industry and evidence-based decision making by the scientific societies.26–29 The online survey was administered to Italian orthopedists over a 2-month period to identify the most relevant factors in the choice of a new hip or knee prosthesis. All participants were required to complete a 12-item questionnaire on professional information before responding to a single BWS question followed by a DCE task composed of 10 questions.
Selection of Attributes/Levels
The choice of the attributes (K) and corresponding levels (Lk) to be included in the experiment was inspired by previous studies on PPIs5,8,12,30–47 and drew particularly on some of the factors (i.e., clinical evidence, cost of implant, and physicians’ past experience with suppliers or device manufacturers) tested in US-based surveys.8,12,47 The list of attributes and levels was finalized after consultation with the Italian Society of Orthopaedics and Traumatology (SIOT) Directors’ Council and especially with one of its members, the clinical expert author (F.B.), who also piloted the survey. The final design was balanced, with each attribute (K = 6) having the same number of levels (Lk = 3), as reported in Table 1.
Table 1. Attributes and Levels
Study Design: BWS
The first task was designed as a type 1 BWS (object case) that does not involve the decomposition of attributes into levels and is frequently adopted to analyze the characteristics of a new product or service. In this survey, a single question provided the list of the 6 mutually exclusive attributes (K) identified together with a brief description of each, asking participants to indicate the 2 considered respectively the “best” (i.e., most important in adopting PPIs) and the “worst” (i.e., least important in adopting PPIs), that is, the best-worst pair of preference (Appendix Figure A1). 18
Study Design: DCE
The second task was designed as a binary DCE, in which respondents were invited to imagine themselves in a situation in which they had to decide whether to adopt a new hip or knee prosthesis and to choose their preferred option from a series of pairwise unlabeled alternatives (A and B) obtained by a unique combination of different attributes’ levels (Appendix Figure A2). The scenarios were conceived to reflect the previous literature8,12 and everyday clinical practice. A fractional factorial design was applied to obtain a manageable number of choice questions from all possible combinations of attributes and levels (L^K = 3^6 = 729). 48 The dcreate command was run in Stata 49 to create an efficient design using the modified Fedorov algorithm. 50 The choice set was reduced to 20 paired scenarios, split into 2 blocks of 10. The respondents were randomly assigned to each block using the function provided by the Web-based survey tool used for the survey administration. The questions within each block were randomized as well to rule out any possible effects that ordering may have on the estimation.
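As a rough illustration of the design arithmetic only, the sketch below enumerates the full factorial of 6 attributes with 3 levels each and deals 20 choice questions into 2 blocks of 10. It does not reproduce the modified Fedorov search run through dcreate: the 20 pairs are drawn at random purely as placeholders, and the names `N_ATTRIBUTES`, `N_LEVELS`, and `split_into_blocks` are hypothetical.

```python
import itertools
import random

N_ATTRIBUTES = 6   # K in the text
N_LEVELS = 3       # levels per attribute

# Full factorial: every combination of levels across the 6 attributes.
full_factorial = list(itertools.product(range(N_LEVELS), repeat=N_ATTRIBUTES))


def split_into_blocks(choice_sets, n_blocks, seed=0):
    """Shuffle the choice sets and deal them into equally sized blocks."""
    rng = random.Random(seed)
    shuffled = list(choice_sets)
    rng.shuffle(shuffled)
    return [shuffled[i::n_blocks] for i in range(n_blocks)]


# Placeholder choice sets: 20 random pairs of distinct profiles (assumption;
# an efficient design would select them with an optimisation algorithm).
rng = random.Random(42)
choice_sets = [tuple(rng.sample(full_factorial, 2)) for _ in range(20)]
blocks = split_into_blocks(choice_sets, n_blocks=2)
```

Each respondent would then be assigned one of the two blocks at random, with the order of the 10 questions shuffled again within the block.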
Data Collection
The study was approved by the Ethics Committee Review of Bocconi University on October 2, 2020. The online survey was designed via Qualtrics XM software (Qualtrics, Provo, UT) and consisted of a self-administered questionnaire. A mailing list was created by mapping all the orthopedic units in Italy through the analysis of the 2016 Italian National Hospital Discharge Records and manually searching hospitals’ Web site and Google to identify the orthopedists operating in each unit and their contacts. The potential respondents (n = 2,202) were invited to participate by an e-mail including a brief description of the research and the survey link. They were assured of confidentiality and anonymity and required to provide their informed consent. The survey was conducted between October and November 2020, with reminder e-mails being sent at 2- to 3-wk intervals.
Data Analysis: BWS
The completed questionnaires were downloaded in .csv format from Qualtrics, and the database was structured for statistical analysis conducted using Stata/SE (version 16, StataCorp LLC, College Station, TX). A P value below 0.05 was considered statistically significant.
First, the answers to the BWS question were analyzed through descriptive statistics by counting the number of times each attribute was selected, respectively, as the “most important” (i.e., best total score) and the “least important” (i.e., worst total score). Then, a best-minus-worst score (B-W score) was calculated as the difference between the best total score and the worst total score.
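The count-based B-W score can be sketched in a few lines. The function name `best_minus_worst` and the three toy answers are hypothetical and are not study data.

```python
from collections import Counter


def best_minus_worst(responses):
    """Return {attribute: best count - worst count} from (best, worst) pairs."""
    best = Counter(b for b, _ in responses)
    worst = Counter(w for _, w in responses)
    attributes = set(best) | set(worst)
    # Counter returns 0 for attributes never picked as best (or worst).
    return {a: best[a] - worst[a] for a in attributes}


# Hypothetical answers from three respondents (not study data).
toy_responses = [
    ("clinical evidence", "cost"),
    ("clinical evidence", "relationship with the sales representative"),
    ("quality of products", "cost"),
]
scores = best_minus_worst(toy_responses)
```

An attribute picked equally often as best and as worst ends up with a score of 0.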
Second, we ran a conditional logit model (clogit in Stata) that treats each best-worst pair as a possible outcome of the respondent’s decision-making process (model 1). In a question containing K = 6 attributes, there are K (K − 1) = 30 possible best-worst combinations to choose from. 51 The dependent variable is equal to 1 for the selected pair and 0 otherwise. The explanatory variables (i.e., the attributes) are coded, for each possible pair, as 1 for the best, −1 for the worst, and 0 otherwise. The attribute most frequently chosen as the least important is taken as a reference value (equal to 0) in the model, with respect to which all other coefficients must be interpreted. The statistically significant coefficients indicate the importance of each attribute in determining the overall utility for the participant. 52 Akaike information criterion (AIC), Bayesian information criterion (BIC), and conditional Akaike information criterion (CAIC) statistics were computed to assess models’ fit.
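The data layout described above can be sketched as follows: one row per possible best-worst pair, with the chosen pair flagged and each attribute coded 1, −1, or 0. This is an illustrative layout, not the authors' Stata code. The attribute labels are taken from the Results text, while the function name `bws_design_rows` and the chosen pair are hypothetical.

```python
from itertools import permutations

ATTRIBUTES = [
    "clinical evidence",
    "quality of products",
    "HTA recommendations",
    "previous experience",
    "relationship with the sales representative",
    "cost",
]


def bws_design_rows(attributes, chosen_best, chosen_worst, reference):
    """Build one respondent's rows: one per ordered (best, worst) pair."""
    rows = []
    for best, worst in permutations(attributes, 2):
        # The reference attribute gets no column (its coefficient is fixed at 0).
        coded = {a: 0 for a in attributes if a != reference}
        if best != reference:
            coded[best] = 1
        if worst != reference:
            coded[worst] = -1
        chosen = int(best == chosen_best and worst == chosen_worst)
        rows.append({"best": best, "worst": worst, "chosen": chosen, **coded})
    return rows


# Hypothetical respondent who picks "clinical evidence" as best and "cost" as worst.
rows = bws_design_rows(ATTRIBUTES, "clinical evidence", "cost", reference="cost")
```

With K = 6 attributes the function yields K(K − 1) = 30 rows per respondent, exactly one of which has `chosen` equal to 1.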
Lastly, a further model (model 2) was performed to investigate heterogeneity in preferences, by adding to the explanatory variables the participants’ characteristics (e.g., gender) that interacted with the experiment’s attributes. The interaction terms represented the additional utility of each attribute for the subgroup under consideration (e.g., males v. females). Continuous variables and categorical variables with 3 or more response options in the questionnaire were dichotomized to increase the subsamples’ size; for example, 2 classes were created for age and professional experience (in years) around their median value. A univariate regression analysis was preliminarily performed to identify the characteristics that showed at least 1 significant interaction. Then, a backward selection was applied to identify the final model including only significant interactions.
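The median split used to build subgroups might look like the sketch below. It assumes that values strictly above the sample median form the upper class; the text does not say how ties were handled, and the ages shown are hypothetical.

```python
from statistics import median


def dichotomize_at_median(values):
    """Return 1 for values strictly above the sample median and 0 otherwise."""
    cut = median(values)
    return [int(v > cut) for v in values]


ages = [38, 45, 52, 55, 61, 64]   # hypothetical ages, not study data
older = dichotomize_at_median(ages)
```

The resulting 0/1 indicator would then be multiplied by each attribute's code to form the interaction terms.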
Data Analysis: DCE
The DCE data were analyzed using a mixed logit regression model (gmnl command in Stata, with the specification mixl) in which the dependent variable was the dichotomous choice of the hypothetical scenarios, and independent variables were the factor/level combinations (model 1). One hundred iterations using Halton draws were applied. A dummy variable coding was applied to all attributes except device cost, which was treated as a continuous variable. The worst level of each attribute (e.g., the least robust clinical evidence, the lowest product quality) was considered as the reference case and omitted from the regression. The regression coefficients should be interpreted as the increment in utility associated with moving from the reference level of each attribute (the worst level) to the other levels. Under the assumption that clinicians may have heterogeneous preferences for a new device’s characteristics, we specified all factors as random parameters with normal distributions. As for BWS, interaction terms between the respondents’ observable characteristics and attribute levels were added to the model (model 2). AIC, BIC, and CAIC statistics were computed to assess the models’ fit. A likelihood ratio test was performed to assess whether the extended model (model 2) improved the explanatory power compared with model 1. The willingness-to-pay (WTP) for a change in each attribute level was computed as the ratio of mean attributes’ coefficients to cost coefficient from model 1. 53 Marginal rates of substitution between attributes were computed to investigate the rate at which clinicians would be willing to give up high-quality clinical evidence or high-quality products to obtain a gain in “softer” attributes (i.e., relationship with the supplier and experience of use). 
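The WTP ratio can be illustrated with a short sketch. It assumes the common sign convention in which WTP is minus the attribute coefficient divided by the cost coefficient, so that a negative cost coefficient gives a positive WTP for a desirable attribute. The coefficients below are hypothetical and are not the estimates reported in Table 4.

```python
def willingness_to_pay(attribute_coefs, cost_coef):
    """WTP per attribute level: minus the attribute coefficient over the cost coefficient."""
    if cost_coef == 0:
        raise ValueError("cost coefficient must be non-zero")
    return {name: -beta / cost_coef for name, beta in attribute_coefs.items()}


# Hypothetical mean coefficients (NOT the study's estimates); cost is per euro.
toy_coefs = {
    "robust clinical evidence": 0.9,
    "positive HTA recommendation": 1.8,
}
toy_cost_coef = -0.0005
wtp = willingness_to_pay(toy_coefs, toy_cost_coef)
```

A marginal rate of substitution between two non-cost attributes follows the same logic, with the second attribute's coefficient taking the place of the cost coefficient.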
The responses to DCE were analyzed along 2 dimensions to assess their internal validity: 1) frequency of choice of the same scenario (A or B) by each respondent and 2) attribute dominance (i.e., whether respondents choose the alternative with the better level of one attribute in all or nearly all choice questions). 54
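The two internal validity checks could be screened for as sketched below. The function names, the data layout, and the convention that a higher level code means a better level are all assumptions made for illustration.

```python
def always_same_alternative(choices):
    """True if a respondent picked the same label (A or B) in every question."""
    return len(set(choices)) == 1


def dominance_share(questions, choices, attribute):
    """Share of questions in which the chosen alternative has the strictly better
    level of `attribute`, among questions where the two alternatives differ on it."""
    informative = 0
    chosen_better = 0
    for question, choice in zip(questions, choices):
        a = question["A"][attribute]
        b = question["B"][attribute]
        if a == b:
            continue  # the attribute cannot drive the choice in this question
        informative += 1
        better = "A" if a > b else "B"
        chosen_better += int(choice == better)
    return chosen_better / informative if informative else 0.0


# Hypothetical respondent answering three questions (higher code = better level).
toy_questions = [
    {"A": {"evidence": 2}, "B": {"evidence": 0}},
    {"A": {"evidence": 1}, "B": {"evidence": 2}},
    {"A": {"evidence": 1}, "B": {"evidence": 1}},
]
toy_choices = ["A", "B", "A"]
same_label = always_same_alternative(toy_choices)
share = dominance_share(toy_questions, toy_choices, "evidence")
```

A share at or close to 1 for a single attribute would flag a respondent whose choices may be dominated by that attribute.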
Results
A total of 240 questionnaires were collected from the online survey. Of these, 121 were discarded because they were unfinished and a further 11 because participants did not give consent to participation and/or did not implant hip or knee prosthesis in the last year. Finally, 108 questionnaires were in usable form and robust to validity checks; therefore, they were all retained for the analyses. The completion rate was 45%. We collected responses from 85 Italian hospitals, mostly public (63), located in 18 regions (out of 21), and accounting for 22% of hip implants and 18% of knee implants performed in Italy in 2016 (based on 2016 Italian hospital discharge records).
Sample Description
The mean age of the participants was 52.8 (±10.1) years, and the great majority (93.5%) were men. The average postgraduate work experience was 25.9 (±10.4) years. Of the orthopedists, 42.6% were second-level medical managers (i.e., head of the orthopedics unit) and 45.4% were first-level medical managers (i.e., fixed-term clinicians in an orthopedics unit) in public hospitals or covered equivalent positions in private hospitals. In the last year, 47.2% and 34.3% performed more than 50 hip and knee implants, respectively. A small percentage had experience as a prosthesis designer or proctor (Table 2).
Table 2. Respondents’ Characteristics (N = 108)
This includes both first-level managers in public hospitals and similar professional figures working in private hospitals.
This includes both second-level managers in public hospitals and similar professional figures working in private hospitals.
Respondents were asked to express their perceived autonomy on a scale from 0 (no autonomy) to 5 (complete autonomy).
BWS Results
In BWS, the highest valued attribute was “clinical evidence” followed by “quality of products,” whereas the item with the lowest B-W score was “cost,” followed by “relationship with the sales representative.” The 2 remaining attributes obtained an equal number of “best” and “worst” responses (Appendix Table A1).
The conditional logit regression coefficients (model 1) were aligned with the BW frequency counts. The importance of each attribute was estimated relative to “cost,” which was most frequently selected as “worst.” All attributes had coefficients that were significantly different from 0 and of expected positive sign. The attribute presenting the highest utility coefficient was “clinical evidence” followed by “quality of products,” while “relationship with the sales representative” was the lowest rated item before “cost.” The conditional logit model with the addition of interaction terms (model 2) showed that “HTA recommendations” was particularly important for first-level medical managers and for those who implanted more than 50 hip prostheses in the past year. Conversely, “quality of products” and “previous experience” were valued as less important by clinicians who implanted more than 50 knee prostheses compared with those who implanted fewer than that (Table 3).
Table 3. Best-Worst Scaling Results from Conditional Logit Regression
AIC, Akaike information criterion; BIC, Bayesian information criterion; CAIC, conditional Akaike information criterion; CI, confidence interval; SE, standard error.
DCE Results
In the DCE, all attributes/levels (except for “average learning needed”) had a significant influence on the orthopedist’s decision to adopt a new hip or knee prosthesis (Table 4, model 1). The directions of the coefficients were in accordance with our hypotheses (e.g., positive sign for all attributes’ levels with respect to their reference level; negative sign for cost). In model 2 (with interaction terms), 2 categories of more experienced participants (i.e., those who implanted ≥50 prostheses over the past year and those acting as “proctor”) reported significantly different preferences in relation to “clinical evidence” and “HTA recommendations,” respectively (Table 4).
Table 4. Discrete Choice Experiment Results
AIC, Akaike information criterion; BIC, Bayesian information criterion; CAIC, conditional Akaike information criterion; CI, confidence interval; HTA, health technology assessment; RCT, randomized controlled trial; SE, standard error.
The sign of the estimated standard deviations is irrelevant; they should be interpreted as being positive.
Table 5 reports the mean WTP estimates for changes in attributes’ levels calculated from the restricted model (model 1). The marginal WTP for a device with robust clinical evidence (i.e., safety study + randomized controlled trial (RCT)/observational study with bias balance) compared with a device with only 1 safety study available was €1,829. The WTP was €1,733 for a high-quality product, €2,843 for a good relationship with the supplier’s sales representative, and €3,495 for a device with a positive HTA recommendation. These findings suggest that, in contrast with the BWS, the existence of a positive HTA recommendation and the relationship with the supplier’s sales representative are more important than having a high-quality product or a product with an RCT/observational study (with bias balance) of comparative efficacy. The differences between BWS and DCE preference rankings are highlighted in Figure 1 (cost is not reported for DCE because it is used to calculate the WTP).
Table 5. Mean Willingness-to-Pay Estimates for Changes in Attributes’ Levels
CI, confidence interval; HTA, health technology assessment; RCT, randomized controlled trial; SE, standard error.

Figure 1. Differences between best-worst scaling and discrete choice experiment item rankings.
Marginal rates of substitution between attributes are reported in Appendix Table A2. They revealed that respondents would be willing to accept a larger sacrifice in product quality or in the evidence level (negative sign) to obtain a gain in the relationship with the supplier’s sales representative (all coefficients in absolute values are greater than 1) than to obtain a gain in experience of use (almost all coefficients in absolute values are lower than 1).
Discussion
Synthesis of Results
The process of uptake and diffusion of technological innovations in health care, starting with marketing authorization governed by regulatory systems in different jurisdictions and ending with purchasing decisions at the local level, encompasses a broad range of stakeholders including HTA agencies, physicians, purchasers, providers, and patients’ associations. The new EU Medical Device and HTA Regulations place the provision of robust clinical evidence at the heart of the approval procedure to make the whole market-access process more evidence based, less fragmented, and, therefore, less influenced by local and/or specific stakeholders’ expectations.55,56 Nevertheless, physicians are the end users of medical technologies and, especially for PPIs, undoubtedly continue to play a pivotal role at the time of purchase, which is also the most decisive step for diffusion.
So far, studies dealing with physicians’ preferences have mainly used traditional rating scales and implicitly assumed that no distortions affected the survey responses, so that true preferences coincided with the ones declared in the survey. This study inferred that simply asking physicians to rank the importance of choice dimensions bears the risk of collecting preferences affected by acquiescent responding and social desirability, while exposing them to repeated tradeoffs in the choice of multifactorial scenarios makes it possible to capture the true perceived relative importance of several dimensions. Therefore, we collected preferences from Italian orthopedists within the same choice context (i.e., the adoption of a new hip or knee prosthesis) but using 2 distinct stated preference methods, of which the BWS might expose them more to potential response biases. 15 The sample size (N = 108) was in line with most DCE studies, which enroll between 100 and 300 participants. 57 First, BWS asked respondents to simply rank the importance of attributes in the choice to adopt a new prosthesis in orthopedics. The BWS object case is less cognitively demanding (compared with DCE and other more complex stated preference techniques) and increasingly adopted in health care surveys, together with less sophisticated approaches to analyze data (e.g., best-worst count analysis).18,58 We retrieved all attributes from the literature on PPI except for “HTA recommendations,” which was deemed crucial for our study as it conveys a different concept with respect to “clinical evidence” and “cost.” In fact, despite HTA recommendations being based on the assessment of both clinical and economic evidence, each of these elements influences decisions in different directions.
Indeed, people generally have a negative preference for costs and a positive preference for clinical evidence, while preference direction for HTA is less clear and depends, for example, on the trust in the HTA authority that issued recommendations. In detail, “clinical evidence” and “HTA” differ under several dimensions, which are likely to impact differently on the overall judgment about a medical technology. For example, they use different value domains and related measures (i.e., efficacy and safety for the former, a broader range for the latter—including economic, social, ethical, and organizational implications, as outlined in the HTA core model by EUnetHTA 59 ), evidence standards (i.e., RCTs for the former, real-world studies for the latter), and time horizon (i.e., shorter for the former, longer for the latter). In addition, it should also be noticed that cost and HTA are not correlated in principle. The cost attribute, indeed, refers to the individual product’s cost, while HTA considers incremental costs over a long time (typically lifetime) horizon (and, occasionally, net costs in the short-term for a budget impact analysis) but also broader cost categories including family’s costs and productivity losses. Thus, a credible scenario can include low-cost, good evidence, and a negative HTA recommendation if an alternative product has even lower costs and at least noninferior clinical evidence. For example, in the United Kingdom, a technology appraisal guidance from the National Institute for Health and Care Excellence (NICE) comparing 8 different biological drugs for rheumatoid arthritis, all presenting a comparable cost-effectiveness profile, recommended starting treatment with the least expensive product considering a variety of costs (i.e., price per dose needed and administrative costs). 60
The analysis of BWS responses revealed that clinicians assigned the highest value to “clinical evidence” and the lowest to “cost.” Moreover, participants with a higher volume of activity (i.e., ≥50 prostheses implanted over the last year) had significantly different preferences compared with clinicians performing fewer implants, while “HTA recommendations” were particularly considered by first-level medical managers who, likely due to their younger age, are more sensitive to pharmacoeconomic and HTA topics.
Second, DCE forced physicians to simultaneously evaluate different attribute/level combinations in randomly assigned hypothetical scenarios, with questions in a random order, thus making the choice more complex and less directly exposed to response biases. In DCE, clinicians’ choices revealed different preferences than those assessed through the BWS. In fact, “clinical evidence” was not the most important factor, and device “cost” had a small but significant influence on the choice of adopting a new prosthesis. Moreover, orthopedists would be willing to pay more for a good relationship with the supplier’s sales representative than for a high-quality product or for a product with robust clinical evidence, and HTA recommendations play a major role in driving their decisions. Overall, a low degree of heterogeneity was observed in physicians’ responses collected with both techniques, as revealed by the small number of significant interactions.
Policy and Research Implications
This study has several managerial and policy implications, as well as implications for future research. First, it showed that collecting preferences through ranking exercises, as done so far in research on PPIs, might produce results that are more aligned with the socially accepted opinions of different stakeholders (e.g., scientific community, hospital managers, and policy makers)25–28 but, at the same time, might not necessarily reflect individuals’ true preferences. 23 Previous studies comparing DCE and BWS reported different preference estimates regardless of the health context, thus suggesting that the 2 methods may be measuring different constructs. However, no comparison was made with BWS of type 1, 61 which is even more different from the DCE since attributes are not articulated into levels. This study attempted to fill this literature gap. In our study, the differences between BWS and DCE results can be explained in 2 different ways, depending on whether the discrepancy was intentional. If physicians intentionally completed the BWS task in a way that did not reflect their real views, this means that they are aware of what they prefer but choose not to disclose it in order to align with hospital managers’ and policy makers’ expectations. Otherwise, physicians might not be fully aware of what they really prefer. If simply asked about the importance of choice factors, they honestly believe they are driven by clinical evidence and quality of products whereas, when exposed to a more complex choice (such as the real multidimensional one), they show themselves to be more sensitive to HTA and the relationship with the sales representative. Indeed, the DCE, compared with the BWS of type 1, provides participants with more information (for example, asking to consider a range of realistic costs instead of “cost” as an abstract concept), which inevitably influences their individual preferences.
Further research is needed to understand to what extent the difference between BWS and DCE choices is intentional and the main drivers underlying this difference.
Second, DCE results highlighted the importance of the sales representative’s reliability. A potential interpretation of this finding is that physicians could use the relationship as a “proxy” for product quality, although in principle, the 2 dimensions are not necessarily correlated. This is one way of acknowledging the value of the service component in the medical technology supply, which can make a difference in manufacturers’ competitive advantage. 62 The “servitization” process (i.e., a company shifting from a product-centric to a service-centric business model and logic) in the medical technology industry is usually slower compared with other industries, and the regulatory framework can both favor and hinder this transition. 63 In Italy, public tenders for the purchase of medical devices mostly focus on purchasing goods rather than integrated bundles of goods and services, with the result that service is considered a “nice to have” or “taken for granted” option, leading physicians to conceal (or not be aware of) their sensitivity to this component. This is mainly due to complexities in managing tenders for public health authorities rather than to legal constraints, as the guidelines of the Italian Ministry of Health on public procurement recommend splitting the product price from the service price, within the same offer. 64 This study suggests using new contractual models that integrate service as an explicit component of products’ quality in public procurement, in line with existing guidelines.
Third, HTA provides much-needed guidance for clinical decision makers, although very few examples of a direct link with public procurement have been reported so far, mainly in gray literature and conference proceedings. HTA represents a crucial component of the rising value-based paradigm, whose final aim is to improve decision-making processes in health care by identifying the alternative that brings the highest value for the system as a whole. This trend is particularly evident and tangible for the MedTech ecosystem, as witnessed by the recent approval of the new EU Medical Device and HTA Regulations55,56 and the ongoing debate on HTA in the United States. 65 The results of this study show that a positive (full) HTA recommendation would help clinicians to select the best option for the patient and, at the same time, deal with cost-containment pressures from hospital managers and policy makers without directly negotiating the tradeoff between additional costs and benefits. From this perspective, the Italian National Program for HTA of Medical Devices is a promising decision-making framework whose importance is implicitly recognized by physicians too, with the potential to foster the integration of HTA into medical device procurement and turn the current purchasing system into a value-based procurement approach. 66
Study Limitations
This study presents several limitations. First, the sample was based on voluntary participation, and the individual response rate was only 5% (the hospital response rate was 15%). Therefore, the participants might not be fully representative of Italian orthopedists, despite being of sufficiently varied ages and career stages and coming from all 3 geographical areas (i.e., north, center, south) and 18 different regions. The male-female ratio was comparable to national data (female SIOT members were 11.4% in 2021). Second, the choice of attributes was limited to 6, with 3 levels each, to avoid an excessive cognitive burden on respondents, 22 although these may not entirely capture the complexity of the decision-making process. Third, clinicians are usually not familiar with BWS and DCE question formats, and this might have led some potential participants to opt out of the survey (i.e., nonresponse bias). 58 Fourth, since respondents always completed the BWS before the DCE, we cannot exclude that responses were somehow affected by ordering effects. However, this order of tasks was chosen to allow participants to become familiar with the attributes before performing the much more complex DCE involving attribute-level combinations. Moreover, only a few studies in the literature randomized which task (BWS or DCE) was presented first. 61 Fifth, experiments relying on stated preference techniques investigate hypothetical behaviors instead of actual ones and therefore may lack external validity. 67 Thus, this study’s results could gain more credibility if compared with real-world data (e.g., about in-hospital purchasing procedures) or evidence collected in broader contexts. 68 Lastly, the tradeoffs that are intrinsic to stated preference techniques (and particularly to DCEs, where choices require weighing multiple attributes at a time) can help mitigate response biases but not completely avoid them. Indeed, respondents might still choose the more socially accepted options, especially in the case of sensitive choice tasks including health-related attributes.
Conclusions
This study suggested that collecting physicians’ preferences with different methods can lead to considerably divergent conclusions. In BWS of type 1 (object case), which is similar to a ranking exercise, clinicians might be influenced by acquiescent responding and social desirability. The DCE, instead, by introducing tradeoffs in the choice task, is likely to reveal more about true preferences (and indeed is generally preferred by economists). Therefore, the use of DCE is encouraged, although more research is needed to identify the most appropriate methods to collect undistorted preferences for medical devices that can ultimately facilitate the HTA process and the diffusion of value-based health care.
Supplemental Material
Supplemental material, sj-docx-1-mdm-10.1177_0272989X231201805 for Collecting Physicians’ Preferences on Medical Devices: Are We Doing It Right? Evidence from Italian Orthopaedists Using 2 Different Stated Preference Methods by Patrizio Armeni, Michela Meregaglia, Ludovica Borsoi, Giuditta Callea, Aleksandra Torbica, Francesco Benazzo and Rosanna Tarricone in Medical Decision Making
Footnotes
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article. The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The financial support for this study was provided by an unrestricted grant from the Directorate General of Medical Devices and Pharmaceutical Service, Ministry of Health, Italy. The funding agreement ensured the authors’ independence in designing the study, interpreting the data, writing, and publishing the report. This work was presented at the following conferences: International Society for Pharmacoeconomics and Outcomes Research (ISPOR), May 17–20, 2021 (virtual event); International Health Economics Association (iHEA), July 12–15, 2021 (virtual event); and Italian Health Economics Association (AIES), December 2–3, 2021 (Milan).
References
