Abstract
Highlights
A previous confirmatory factor analysis established 10 dimensions of the European Organisation for Research and Treatment of Cancer (EORTC) core quality of life questionnaire (QLQ-C30) and its breast module (BR45).
In this study, we selected 1 item per dimension based on fit to the Rasch model, patient- and clinician-rated item importance, breadth of item thresholds, and clinical relevance.
These items form the core of the future Breast Utility Instrument (BUI).
The future BUI will be a novel breast cancer–specific preference-based instrument that potentially will better reflect women’s preferences in clinical decision making and cost utility analyses.
The EORTC QLQ-C30 (QLQ-C30) is an established general cancer health-related quality-of-life (HRQoL) questionnaire. 1 In patients with breast cancer (BrC), the QLQ-C30 and EORTC breast module BR45 2 are used together to describe general cancer and BrC-specific functional and symptomatic impairments resulting from the disease and/or its treatments in clinical trials. There is currently no satisfactory method of using EORTC QLQ-C30 and BR45 questionnaire responses to inform drug reimbursement decisions.
Decisions regarding the public reimbursement of drugs rely largely on the incremental cost-effectiveness ratio of an economic evaluation, the additional cost of a new treatment relative to the quality-adjusted life-years (QALYs) gained. QALYs are health utilities multiplied by time, a key measure of effectiveness in cost-utility analyses. One definition of utility is a preference-based measure of HRQoL, anchored at 0 and 1 (dead and full health). Utility instruments include valuations for health outcomes. The validity and outcomes of cost-utility analyses are sensitive to the utility value,3,4 particularly for preference-sensitive decisions, when evidence for alternative outcomes is conflicting or weak.5–7
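As a worked illustration of these definitions, the sketch below computes QALYs as utility multiplied by time and an incremental cost-effectiveness ratio as incremental cost divided by QALYs gained. The function names, utilities, durations, and costs are hypothetical and are not drawn from this study.

```python
def qalys(utility_by_period):
    """Sum utility x duration (years) over consecutive health-state periods."""
    return sum(utility * years for utility, years in utility_by_period)


def icer(cost_new, cost_old, qaly_new, qaly_old):
    """Incremental cost per QALY gained for a new treatment versus a comparator."""
    delta_qaly = qaly_new - qaly_old
    if delta_qaly == 0:
        raise ValueError("ICER is undefined when no QALYs are gained or lost")
    return (cost_new - cost_old) / delta_qaly


# Hypothetical health profiles as (utility, years) pairs.
standard = [(0.70, 2.0), (0.60, 1.0)]
new = [(0.80, 2.0), (0.70, 2.0)]

qaly_standard = qalys(standard)
qaly_new = qalys(new)
# Hypothetical total costs of each strategy.
cost_per_qaly = icer(50_000, 20_000, qaly_new, qaly_standard)
```

Because the utility enters both QALY totals directly, a small change in the utility value assigned to a health state can shift the ratio noticeably, which is the sensitivity described above.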
Generic preference-based instruments such as the EuroQol-5-dimension (EQ-5D)8,9 are widely used yet may be less sensitive at detecting condition-specific changes in milder health states10,11 and also lack condition-specific content. To address this limitation, condition-specific preference-based instruments (CSPBIs) have been developed, commonly derived from existing psychometric instruments. 12
To create a CSPBI, a reduced health state classification system composed of representative items amenable to valuation is often a precursor step. 13 Reduced-form instruments, whether preference based or psychometric, are also in high demand because they improve feasibility, 14 whether completed routinely in clinical settings or as part of clinical trials. At least 3 preference-based instruments have been derived from the general cancer QLQ-C30,15,16 and are used to measure utilities in cost-utility analyses. However, these instruments are limited because they lack content on symptoms specific to patients with BrC, as reflected in the BR45 module. 2
A CSPBI is important in BrC because its treatments are preference sensitive. These include 1) the choice between mastectomy and breast-conserving surgery, which is related to how a woman values the preservation of the breast 17; 2) the timing of reconstruction, where immediate reconstruction postmastectomy shows better psychological outcomes but at a higher complication rate18,19; and 3) adjuvant therapy, where the benefit of modestly reducing the recurrence rate needs to be weighed against the hormonal and fertility effects of treatment. 20
There is also strong evidence that decision and cost-effectiveness analyses are often sensitive to utilities for health outcomes in BrC. Breast reconstruction decisions are sensitive to utilities for reconstruction outcomes, 21 and treatments for early and advanced disease are sensitive to utilities for health outcomes.22,23 As such, decisions about BrC treatments are preference sensitive. This applies across the spectrum of BrC and at individual and health care system levels.
We previously expanded the conceptual framework of Brazier et al. 13 to derive a preference-based instrument. Our framework has 4 phases of instrument development: I) develop initial questionnaire items, II) establish dimension structure, III) reduce items per dimension, and IV) value and model health state utilities. 24 Following this framework, Rasch analysis and other psychometric criteria are applied in phase III to reduce items per dimension after establishing a dimension structure.
Our overall objective is to develop the Breast Utility Instrument (BUI), a BrC-specific preference-based instrument. This will be the first BrC-specific preference-based instrument derived from the EORTC QLQ-C30 and BR45. In the future, it would allow responses to the EORTC QLQ-C30 and BR45 to be converted to a utility score. In this study, we describe selecting 1 item to represent each dimension using the same sample as in our previous confirmatory factor analysis of the QLQ-C30 and BR45. 25
Methods
Overview
We evaluated the fit of item responses from the QLQ-C30 and BR45 to the Rasch model. For items that fit the Rasch model, we then selected 1 item per dimension based on patient- and clinician-rated item importance, 26 the range of item thresholds, and clinical relevance.
Parent Instruments
We started with patient responses to the QLQ-C30 version 3, 1 a 30-item general cancer HRQoL patient-reported instrument with functioning and symptom subscales, and global health items. The BR45 is a BrC-specific module updated 2 from the BR23 27 to reflect current treatments.
Data
A cross-sectional convenience sample of women 18 y and older with invasive BrC attending medical oncologists’ outpatient clinics at the Sunnybrook Odette Cancer Centre was enrolled to complete paper questionnaires of the QLQ-C30 and the BR45. Patients were stratified into 1 of 5 mutually exclusive health states: 1) first year after primary BrC diagnosis treated with curative intent (I), 2) second to fifth year after primary BrC diagnosis (II–V), 3) sixth year onward after primary BrC diagnosis (VI), 4) metastatic BrC (M), and 5) first year after recurrence of BrC or new BrC (R).28,29 Patients had no other primary cancer within the past 5 y and understood English or had a translator.
A random subset of patients with BrC (n = 81) and clinicians working with patients with BrC (n = 13) rated the importance of all items in EORTC QLQ-C30 and BR45 on a 5-point scale (0 = not applicable, 5 = very important; Appendix A). Clinicians were asked to rate the importance of items as relevant to patients’ experiences. Clinicians completed item importance ratings on a secure web form. No demographic characteristics of clinicians were collected to protect their anonymity.
Mean item importance ratings from patients and clinicians were converted to rankings (1 = most important) to improve interpretability.26,30
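The conversion from mean ratings to rankings can be sketched as follows. This is a minimal illustration, assuming that tied means share the same rank and that the next rank is skipped after a tie; the tie-handling rule, the item labels, and the mean values are assumptions made for the example.

```python
def rank_by_importance(mean_ratings):
    """Map each item to a rank (1 = most important); tied means share a rank."""
    ordered = sorted(mean_ratings.items(), key=lambda pair: pair[1], reverse=True)
    ranks = {}
    previous_mean, previous_rank = None, 0
    for position, (item, mean) in enumerate(ordered, start=1):
        if mean != previous_mean:
            # A new, lower mean takes its position in the ordering as its rank.
            previous_rank = position
            previous_mean = mean
        ranks[item] = previous_rank
    return ranks


# Hypothetical mean importance ratings for 4 items in one dimension.
example_means = {"item_a": 4.5, "item_b": 4.2, "item_c": 4.2, "item_d": 3.9}
example_ranks = rank_by_importance(example_means)
```

In this example, item_b and item_c are tied second, and item_d is ranked fourth.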
This study was approved by research ethics boards at the University Health Network, Sunnybrook Health Sciences Centre, and the University of Toronto. All participants provided informed consent.
Ten dimensions of the BUI
The 10 dimensions of the BUI derived from the QLQ-C30 and BR45 include functioning dimensions (physical and role, emotional, social, body image, sexual functioning and enjoyment) and symptom dimensions (fatigue, pain, systemic therapy side effects, arm and breast, endocrine therapy; Table 1). The response options were maintained from the original QLQ-C30 and BR45, scored 1 = not at all to 4 = very much. High scores for a functioning scale represent high functioning; high scores for a symptom scale represent a high level of symptoms or problems.2,27
Patients rated the importance of items on a 0 to 5 scale (0 = not applicable, 5 = very important).
Clinicians rated item importance on the same 0 to 5 scale as relevant to patients’ experiences.
Analyses
We used a priori criteria to select items. We first conducted Rasch analysis as recommended by Tennant and Conaghan 31 and applied by developers of CSPBIs.15,16,32–62 Data that fit the Rasch model satisfy the condition for interval-level data. 31 In addition, item and person parameters are independent of the sample. 63 We used the partial credit model for multiple response options. 64
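For readers unfamiliar with the partial credit model, the sketch below computes the probability of each response category for a single item, given a person location and the item thresholds. It is an illustration only: the analysis in this study was run in R, the function name and threshold values are hypothetical, and categories are indexed from 0 here rather than scored 1 to 4.

```python
import math


def pcm_probabilities(theta, thresholds):
    """Category probabilities for one item under the partial credit model."""
    # Running sums of (theta - threshold); category 0 has an empty sum of 0.
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(value - peak) for value in cumulative]
    total = sum(weights)
    return [weight / total for weight in weights]


# Hypothetical ordered thresholds (in logits) for a 4-category item.
example_thresholds = [-1.0, 0.0, 1.5]
probs_at_first_threshold = pcm_probabilities(-1.0, example_thresholds)
probs_at_second_threshold = pcm_probabilities(0.0, example_thresholds)
```

At a person location equal to a threshold, the 2 adjacent categories are equally probable, which is the property used when checking whether thresholds are ordered.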
For each of the 10 dimensions, we iteratively examined overall model and item fit as follows:
Ordering of response thresholds. We examined item characteristic curves and item threshold maps to visualize the difficulty and hierarchy of items. Item response categories were examined to assess whether they produced sequentially ordered item thresholds (i.e., the points on the latent scale at which 2 adjacent response categories are equally probable). Disordered response categories were collapsed and recoded. These items were retained and considered for selection.
Rasch model fit. We evaluated the χ² goodness-of-fit statistic with 100 simulated values and Bonferroni-adjusted significance levels. The critical P value was 0.05/(number of items per dimension). Given the relatively large sample size, the χ² statistic can overestimate lack of model fit 65; therefore, we also considered the person separation index (PSI), where >0.7 was considered acceptable fit. 31
Item fit statistics. We examined item residuals and infit and outfit statistics, where infit is the weighted mean of the standardized squared residuals and outfit is the mean of the standardized squared residuals. 66 Misfitted items had large residuals >2.5 or <−2.5 or infit/outfit statistics beyond acceptable values, that is, <0.7 or >1.3. 31
Differential item functioning (DIF). Item bias or DIF occurs when individuals with the same level of HRQoL systematically respond differently based on specific characteristics. 31 DIF dependent on age (≤ 50 y and >50 y) was evaluated by specifying logistic regression equations to predict item responses from person parameters and the age group variable. Proportional odds models were compared using likelihood ratios. 67
Unidimensionality. We performed principal components analysis (PCA) of the item residuals, expecting that no meaningful factors would remain in the residuals after fitting the data to the Rasch model. The t statistics of the person scores of positive (>0.30) and negative (<−0.30) factor loadings were also compared to determine the percentage significant at the 0.05 level. Finding that fewer than 5% of t statistics are >1.96 or <−1.96 indicates strict unidimensionality.31,68
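Two of the checks in the list above, the Bonferroni-adjusted critical P value and the infit and outfit statistics, can be sketched as follows. This is a simplified illustration rather than the code used in this study: it assumes that the model-expected score and model variance for each person have already been obtained from a fitted Rasch model, and the function names and data are hypothetical.

```python
def bonferroni_alpha(n_items, alpha=0.05):
    """Critical P value: 0.05 divided by the number of items in the dimension."""
    return alpha / n_items


def infit_outfit(observed, expected, variance):
    """Infit and outfit mean squares for one item across persons."""
    squared_residuals = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: unweighted mean of the squared standardized residuals.
    outfit = sum(r / w for r, w in zip(squared_residuals, variance)) / len(observed)
    # Infit: squared standardized residuals weighted by model variance,
    # which simplifies to sum of squared residuals / sum of variances.
    infit = sum(squared_residuals) / sum(variance)
    return infit, outfit


def is_misfit(infit, outfit, low=0.7, high=1.3):
    """Flag an item whose infit or outfit falls outside the 0.7 to 1.3 range."""
    return not (low <= infit <= high and low <= outfit <= high)


# Hypothetical dichotomous responses with model expectations of 0.5.
example_infit, example_outfit = infit_outfit(
    observed=[1, 0, 1, 0],
    expected=[0.5, 0.5, 0.5, 0.5],
    variance=[0.25, 0.25, 0.25, 0.25],
)
```

Values near 1 indicate that responses vary about as much as the model predicts; values below 0.7 suggest responses that are more predictable than expected, and values above 1.3 suggest more noise than expected.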
Items considered for deletion (misfitted items, DIF) were iteratively removed one at a time, and overall model fit and item fit were reevaluated. We retained both items for pain and social functioning since 2 items is the minimum number for a dimension. 69
We used R version 4.1.0 for the analysis, 70 with packages psych, 71 eRm, 72 ltm, 73 and lordif. 67
Item selection criteria included the following:
range of item thresholds, where a wider range was a better representation of construct severity;
item goodness of fit, particularly infit 0.7 to 1.3;
patient-rated item importance and clinician-rated item importance (items ranked most important by patients were prioritized over clinician rankings); and
psychometric criteria: absence of floor and ceiling effects, and correlation of item to dimension (>0.70). Floor effects occur when a large proportion of respondents select the worst possible score (e.g., “very much” difficulty in performing physical activity). Ceiling effects occur when a large proportion of respondents select the best possible score (e.g., “not at all” for pain).
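The psychometric criteria in the list above can be illustrated with a short sketch. The responses and dimension totals below are hypothetical, the example treats a symptom item where 1 = not at all is the best score and 4 = very much is the worst, and the use of a Pearson correlation against the dimension total is an assumption made for the example.

```python
import math


def proportion_at(responses, score):
    """Proportion of respondents who selected a given response option."""
    return sum(1 for r in responses if r == score) / len(responses)


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical responses to one symptom item and the matching dimension totals.
item = [1, 1, 2, 1, 4, 3, 1, 2]
dimension_total = [3, 2, 5, 3, 11, 8, 2, 6]

ceiling_share = proportion_at(item, 1)  # best possible score for a symptom item
floor_share = proportion_at(item, 4)    # worst possible score for a symptom item
item_dimension_r = pearson(item, dimension_total)
meets_correlation_criterion = item_dimension_r > 0.70
```

For functioning items, the best and worst scores passed to `proportion_at` would be chosen according to the direction of that scale.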
Items meeting the a priori criteria were reviewed by the multidisciplinary research team with expertise in patient outcome measurement, biostatistics, health economics, general internal medicine, and breast medical oncology. The clinical expert (MT) provided input on clinical relevance, particularly when there was no clear single item to represent a dimension.
Sample size
For the Rasch analysis, the sample size of more than 400 participants allows for very stable item calibration, where the width of the 99% confidence interval is smaller than 1 logit. 74 For patient and clinician item importance, we aimed to recruit 50 patients and 10 experts30,75,76 to provide a spectrum of importance ratings.
Results
Sample Characteristics
Table 2 outlines the details of all patients (N = 408) and the subset of patients who completed item importance ratings (n = 81). Overall, they represent women with BrC who were well educated and predominantly on adjuvant endocrine therapy. The mean (SD) age was 59.1 (11.6) y, with 80% completing at least college education. Most patients were diagnosed in pathological stage IA (37.0%) or IIA (25.5%), which is comparable to the incidence of BrC stages in Ontario, Canada,77,78 with 64.2% on a form of adjuvant therapy and 57% on endocrine therapy. The largest group of patients was in the second to fifth year postdiagnosis (31.1%). A higher proportion of patients in the sample had metastatic disease than the patients in a development study of the BrC health states (25.2% v. 19.4%). 28 Appendix B lists the specific therapies that patients were taking.
Participant Characteristics and Comparator Population-Level Characteristics
Population comparators were mostly from women with breast cancer, except that marital status and highest level of education comparators were drawn from the 2016 Canadian Census.
From 2016 Canadian census data: never married.
Referral from another center. Date and month were approximate.
Mutually exclusive health states: 1) first year after primary BrC diagnosis treated with curative intent (I), 2) second to fifth year after primary BrC diagnosis (II–V), 3) sixth year onward (VI), 4) metastatic disease (M), 5) recurrence of BrC (R).
Five breast tumors were HER-2 equivocal.
Four hundred five people had a combined 522 surgeries. Three people did not receive surgery.
One hundred fifty postmenopausal and 28 treatment-related menopause, for a total of 178.
One hundred four breast-conserving surgeries and 13 oncoplastic breast-conserving surgeries, for a total of 117.
Forty-nine simple mastectomies and 47 mastectomies and reconstruction surgeries, for a total of 96.
The subset of patients (n = 81) who rated item importance was comparable to all participants in age, biomarker status, and comorbidity status (Table 2). Compared with the full sample, the item-importance sample had a smaller percentage with a graduate or professional degree (29.6% v. 37.3%), fewer patients in the metastatic health state (16.0% v. 25.2%), and a larger percentage diagnosed with BrC 5 to 9 y earlier (34.6% v. 22.3%). Appendix C shows the mean item importance and rank importance for patients and clinicians.
The 13 clinicians who completed importance ratings were 5 medical oncologists, 1 radiation oncologist, 1 surgical oncologist, 2 medical oncology fellows, 2 nurses, 1 physician assistant, and 1 social worker, predominantly representing the medical oncology clinical staff.
Patient and Clinician Item Importance Ratings
Overall, within each dimension, patients and clinicians had comparable item importance ratings based on Welch t tests. When ratings differed (e.g., pain dimension), clinicians rated item importance as significantly higher than patients did (Table 1). The highest rated items for patients were on sexual functioning and enjoyment and hair loss. For clinicians, the highest rated items were needing help with eating, dressing, etc.; feeling depressed; and pain interfering with daily activities. Across all items, 30% of patients rated items as 0 (not applicable), 1 (slight), or 2 (mild); therefore, these scores were removed prior to calculating the mean item importance of responses rated 3 (moderate), 4 (strong), and 5 (severe) of all 81 patients to place more emphasis on items with greater impairment on HRQoL. 30
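The Welch t test used to compare patient and clinician ratings can be sketched as follows. This is a minimal illustration that computes only the test statistic and its approximate degrees of freedom; the ratings are hypothetical, and obtaining a P value would additionally require the t distribution, which is not included here.

```python
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(sample_a) / len(sample_a)
    se2_b = variance(sample_b) / len(sample_b)
    t_stat = (mean(sample_a) - mean(sample_b)) / (se2_a + se2_b) ** 0.5
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t_stat, df


# Hypothetical importance ratings for one item.
patient_ratings = [3, 4, 5, 4, 4]
clinician_ratings = [5, 5, 4, 5]
t_stat, df = welch_t(patient_ratings, clinician_ratings)
```

Unlike the pooled-variance t test, this form does not assume equal variances, which suits groups of very different sizes such as 81 patients and 13 clinicians.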
Overview of Fit to the Rasch Model
Appendix D and Table 3 show the initial and final Rasch models and fit statistics for each dimension, respectively. Appendix E shows the person-item map and histogram of person location estimates by dimension. Appendix F shows the item characteristic curves of items in each dimension. Overall, the global Rasch model fit was good in 7 and borderline in 3 dimensions. Person separation reliability was acceptable in 4 dimensions after item removal and response-level collapsing. DIF by age (≤50 y, >50 y) was present in 8 items, described within their dimensions below and in Appendix G. The magnitude of DIF was negligible for all of these items, with McFadden’s R² < 0.13. 80
Summary of Final Fit of the Data to the Rasch Model
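The DIF statistics reported here can be illustrated with a short sketch that compares nested proportional odds models through their log-likelihoods. The log-likelihood values and function names are hypothetical, and the sketch assumes that the reduced model contains the person parameter only, whereas the full model adds the age group variable.

```python
def likelihood_ratio_statistic(loglik_reduced, loglik_full):
    """Likelihood ratio chi-square statistic for 2 nested models."""
    return 2.0 * (loglik_full - loglik_reduced)


def mcfadden_r2(loglik_model, loglik_null):
    """McFadden pseudo R-squared relative to an intercept-only model."""
    return 1.0 - loglik_model / loglik_null


def dif_r2_change(loglik_reduced, loglik_full, loglik_null):
    """Change in McFadden pseudo R-squared when the age term is added."""
    return mcfadden_r2(loglik_full, loglik_null) - mcfadden_r2(
        loglik_reduced, loglik_null
    )


# Hypothetical log-likelihoods for one item.
ll_null, ll_reduced, ll_full = -500.0, -400.0, -395.0
lr_stat = likelihood_ratio_statistic(ll_reduced, ll_full)
r2_change = dif_r2_change(ll_reduced, ll_full, ll_null)
negligible = r2_change < 0.13
```

A result of this kind, a statistically detectable likelihood ratio with a very small change in pseudo R², corresponds to the pattern described above of DIF that is present but negligible in magnitude.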
Global model fit was borderline in 3 dimensions.
PSI <0.70 in 6 dimensions.
More than 5% of t statistics were significant in the final ET model.
Item Selection for Each Dimension
Table 4 summarizes the results of mean item importance rankings and psychometric and Rasch criteria used to select 1 item per dimension. Subsequently, we describe the results for each dimension.
Summary of Item Importance, Psychometric Criteria, and Rasch Criteria from Final Models to Select 1 Item per Dimension
X is item with differential item functioning.
tied rank importance
Physical and role functioning dimension
The initial 8-item model had good global fit to the Rasch model (χ² P = 0.04) and reliability (PSI = 0.811). All items had small mean standardized fit residuals (−0.019) and good infit. PF3 (Trouble taking a short walk) had low outfit (0.497) and ceiling effects (84.8%). PF5 (Need help eating, dressing, etc.) had low outfit (0.275) and ceiling effects (96.6%). Two items had uniform DIF by age: PF2 (Trouble taking a long walk) and PF7 (Limited in pursuing hobbies), although the magnitude of DIF was negligible in both items (likelihood ratio χ² P = 0.007, McFadden’s pseudo R² = 0.006−0.020). Removal of PF7 (Hobbies) improved the global model fit (χ² P = 0.29) without degrading reliability significantly (new PSI = 0.776). PF2 (Long walk) had negligible DIF by age, with a significant likelihood ratio χ² P < 0.001 comparing proportional odds models with and without the age parameter. For significantly different response categories, McFadden’s pseudo R² was small (0.006–0.020).
PF2 (Long walk) was chosen to represent this dimension as it was rated most important by patients and tied fourth by clinicians, had the highest item-to-dimension correlation (0.779), had a moderately wide threshold range (3.855), and is clinically relevant based on clinician input.
Emotional functioning
The initial 5-item model from Rasch analysis had good global model fit (χ² P = 0.01) and reliability (PSI = 0.774). All items had small mean standardized residuals (−0.006) and good infit and outfit. Two items had negligible DIF by age. EF24 (Depressed) had nonuniform DIF (likelihood ratio χ² P = 0.006, McFadden’s pseudo R² = 0.004−0.013), and ET55 (Mood swings) had uniform DIF (likelihood ratio χ² P < 0.001, McFadden’s pseudo R² = 0.004−0.018). Each item was iteratively removed and the Rasch model refitted. We chose the model without ET55 (Mood swings) because removing it had no effect on global model fit (P = 0.01) and only slightly reduced reliability (new PSI = 0.736). In contrast, removing EF24 (Depressed) further reduced model reliability (PSI = 0.719). All items had a high item-to-dimension correlation (>0.74), without floor or ceiling effects.
We chose EF22 (Worry) to represent emotional functioning because it was rated tied third most important by patients and second most important by clinicians, had the widest threshold range (6.705), and is clinically relevant.
Social functioning
The initial 2-item model had borderline global model fit (χ² P = 0.01) and reliability (PSI = 0.603). Mean standardized item fit residuals were negligible (0.00). Both items, SF26 (Condition or treatment interfered with family life) and SF27 (Condition or treatment interfered with social activities), had low infit (both 0.499) and outfit (0.478 and 0.466, respectively), contributing to low global model fit. However, both items had a high item-to-dimension correlation (>0.86), without floor or ceiling effects. We chose SF27 (Interfering with social activities) because it was rated most important by patients and second most important by clinicians, had the higher item threshold range (7.433), and is clinically relevant.
Fatigue
The initial 3-item dimension had good global model fit (χ² P = 0.02) and suboptimal reliability (PSI = 0.526). Mean standardized item fit residuals were low (−0.015). Two items, FA12 (Felt weak) and FA18 (Tired), had low infit (0.524 and 0.563, respectively). Item ET56 (Dizzy) had ceiling effects (70.4%) and a low item-to-dimension correlation (0.497); therefore, this item was not chosen. We chose FA18 (Tired) to represent the Fatigue dimension because it was rated second most important by patients and clinicians alike, had the highest item threshold range (7.015), and is clinically relevant.
Pain
The initial 2-item dimension had suboptimal global model fit (χ² P = 0.01) and reliability (PSI = 0.551). Mean standardized item fit residuals were low (−0.010), although infit and outfit of both items, PA9 (Had pain) and PA19 (Pain interfered with daily activities), were suboptimal, with infit 0.688 and 1.55, respectively, and outfit 0.523 and 0.495, respectively. Both items had a high item-to-dimension correlation (0.792), and neither item had floor or ceiling effects. We chose PA9 (Had pain) because it was rated most important by patients and second most important by clinicians, had the largest item threshold range (7.571), and is clinically relevant.
Body image
This initial 4-item dimension had good global model fit (χ² P = 0.01) and reliability (PSI = 0.767). All items had small mean standardized item fit residuals (−0.007) and high item-to-dimension correlation (>0.81). All except for 1 item had good infit and outfit: BI40 (Less feminine) had borderline infit (0.691) and outfit (0.662). BI40 (Less feminine) also had uniform DIF by age, although the magnitude was negligible (likelihood ratio χ² P < 0.001, McFadden’s pseudo R² = 0.0049−0.020). BI41 (Problems looking at yourself naked) had ceiling effects (50.9%). We chose BI42 (Dissatisfied with your body) because it was rated tied most important by patients and third most important by clinicians, had the largest item threshold range (5.218), and is clinically relevant.
Sexual functioning and enjoyment
This initial 3-item dimension had borderline global model fit (χ² P = 0.01) and suboptimal reliability (PSI = 0.543). Mean standardized item fit residuals were low (0.001), although SE46 (Has sex been enjoyable?) had borderline low infit (0.628) and outfit (0.632). All items correlated highly with the dimension (>0.74), and none had floor or ceiling effects. We chose SX44 (Interest in sex) because it was ranked most important by patients and clinicians and had the second largest range of item thresholds (5.016). This item is the most inclusive and clinically relevant in this dimension, since interest in sex does not assume sexual intercourse.
Systemic therapy side effects
This 6-item dimension had good global model fit (χ² P = 0.01) but poor reliability (PSI = 0.355). Reliability improved slightly, although it remained poor (new PSI = 0.392), after recoding response options of 4 items: SYS32 (Food and drink tasted different), SYS34 (Lost hair), SYS36 (Felt ill or unwell), and SYS38 (Headaches). Two dichotomous items had ceiling effects: SYS32 (Food and drink; 90.4%) and SYS34 (Lost hair; 80.3%). Mean item standardized fit residuals were low (−0.012), and item fit was good, except that SYS32 (Food and drink) had low outfit (0.565). Three items had DIF of low magnitude: SYS31 (Dry mouth) had nonuniform DIF (likelihood ratio χ² P < 0.001, McFadden’s pseudo R² = 0.008–0.018), and both SYS33 (Eyes painful, irritated, or watery) and SYS38 (Headaches) had uniform DIF (likelihood ratio χ² P < 0.001 and P < 0.001, respectively; McFadden’s pseudo R² = 0.003−0.025 and 0.011−0.029, respectively). All items had overall low item-to-dimension correlations (0.336–0.621). For this dimension, we chose SYS34 (Lost hair), which was recoded as dichotomous based on overlapping response options, with a point threshold at 0.499. SYS34 (Lost hair) was ranked highest among patients, ranked second among clinicians, and is clinically relevant.
Arm and breast symptoms
This 7-item dimension had good global model fit (χ² P = 0.06), and its reliability was borderline (PSI = 0.631). Items had low mean standardized fit residuals (−0.003) and good infit and outfit, except that ARM49 (Problems raising your arm or moving it sideways) had borderline outfit (0.677). Four items had ceiling effects: ARM48 (Swollen arm or hand; 69.3%), ARM49 (Problems raising arm; 62.4%), BR51 (Area of your affected breast was swollen; 74.1%), and BR53 (Skin problems on affected breast; 68.4%). After recoding ordered, overlapping response options of ARM48 (Swollen arm or hand), BR50 (Pain of your affected breast), BR51 (Affected breast oversensitive), and BR53 (Skin problems on the affected breast), discrimination worsened (new PSI = 0.594); therefore, we kept the original response options. We chose BR52 (Affected breast oversensitive) because it had the second highest ranked importance by patients, was ranked third by clinicians, had an acceptable item-to-dimension correlation (0.68), had the second largest threshold range (3.078), and is clinically relevant.
Endocrine therapy symptoms
This 9-item dimension initially had good global fit (χ² P = 0.05) and reliability (PSI = 0.760). Mean item standardized fit residuals were low (0.30). Three items were iteratively removed because they had high infit (>1.3): ET54 (Sweated excessively), ET68 (Gained weight), and ET69 (Weight gain a problem). This resulted in improved reliability (PSI = 0.784). SYS37 (Hot flushes), assigned to endocrine therapy from the confirmatory factor analysis, 25 was retained because it was rated most important by patients despite its high infit (1.81) and outfit (3.12) and uniform DIF (likelihood ratio χ² P = 0–0.419, McFadden’s pseudo R² = 0.001–0.023). Three items with low infit and outfit (<0.70) were retained, namely ET63 (Problems with your joints), ET64 (Stiffness in your joints), and ET65 (Pain in your joints), since iterative removal lowered model reliability (PSI < 0.7). No items had floor or ceiling effects. Although a PCA of item residuals revealed no factors, 15.5% of t statistics were significant. We chose ET63 (Problems with your joints) because it had the second highest range of item thresholds (4.533), with a high item-to-dimension correlation (0.826), was ranked third most important by patients and clinicians, and is clinically relevant.
Table 4 presents a summary of the item selection criteria for each dimension. Table 5 presents the item selected to represent each dimension.
Breast Utility Instrument Items Selected from the EORTC QLQ-C30 and BR45 Dimensions Using a Priori Criteria
Discussion
The contribution of this study is the identification of 1 representative item per dimension for the future BUI. The BUI will be a novel BrC-specific preference-based instrument derived from the EORTC QLQ-C30 and BR45. The representative items were chosen based on a priori psychometric and Rasch criteria and investigator judgment. We chose the following 10 items: 1) Physical functioning (Trouble taking a long walk), 2) Emotional functioning (Worry), 3) Social functioning (Interfering with social activities), 4) Pain (Having pain), 5) Fatigue (Tired), 6) Body image (Dissatisfied with your body), 7) Systemic therapy side effects (Hair loss), 8) Sexual functioning (Interest in sex), 9) Breast symptoms (Oversensitive breast), and 10) Endocrine therapy symptoms (Problems with your joints).
Comparing our selected items with the general cancer preference-based instruments EORTC-8D 15 and QLU-C10D, 16 4 items were the same and 6 were different. Four items that we selected were included in the QLU-C10D: PF2 (Long walk), SF27 (Interfered with your social activities), PA9 (Pain), and FA18 (Tired). The EORTC-8D selected PA19 (Pain interfered with daily activities). Both QLU-C10D and EORTC-8D selected EF24 (Depressed), while we chose EF22 (Worry) for the emotional functioning dimension. These generic cancer preference-based instruments covered 8 and 10 dimensions, respectively, so their items spanned a wider range of cancer-specific symptoms, including nausea, and constipation or diarrhea. The QLU-C10D included an item for trouble sleeping. Unlike previous studies, this study selected representative items from BrC-specific dimensions from the BrC-specific QLQ-BR45.
Our approaches differed based on our derivation samples and study design. The EORTC-8D was derived from patients with multiple myeloma, 15 and the QLU-C10D was derived from diverse cancer sites from European countries. 16 Both instruments included item responsiveness from longitudinal clinical trial data as item selection criteria, which was inherently not possible with our cross-sectional sample.
While some researchers prefer generic preference-based instruments over CSPBIs, there is a compelling reason for a CSPBI in BrC. Arguably, CSPBIs can create classification systems that are difficult to compare because of excluded side effects, focusing effects, and naming effects, and comorbidities may influence valuation of the health state condition. 13 However, the BUI, a future CSPBI derived from the EORTC QLQ-C30 and BR45, can potentially better reflect preference-sensitive outcomes affecting body image, hair loss, loss of interest in sex, breast sensitivity, or problems with the joints, content not directly captured in generic preference-based instruments.
Our procedure for selecting items for the BUI has several strengths. We applied clinimetrics, an approach from clinical medicine, 81 by drawing on the judgment of patients and clinicians using their item importance ratings as criteria for item selection. We also applied psychometrics, mathematical techniques that originated in psychology and education, 82 with decades of applications in health instrument development. After fitting the Rasch model, we considered the range of item thresholds across the latent construct, thus improving precision. 83 These approaches were also adopted by developers of other CSPBIs: FACT-8D for general cancer, 84 CORE-6D for mental health, 85 CP-6D for cerebral palsy, 86 and CARIES-QC for dental caries. 33 Our approach could also be applied to the development of other generic preference-based instruments.
There are several limitations to this study. All participants attended medical oncology clinics from 1 tertiary cancer center, limiting generalizability. Future validation samples should accrue patients undergoing a wider range of treatments, across all 5 health states, and from a variety of locations. There are limitations to the item importance approach, which relies on the judgment of a subset of patients and clinicians with respect to the importance associated with each item, either from their individual perspective (patients) or according to patients (clinicians). The sequence of applying the item importance approach could have influenced the final selection of items. Patients and clinicians were asked to complete item importance ratings prior to our confirmatory factor analysis, 25 and the rankings of mean item importance ratings were considered in our final selection of each item per dimension.
Conclusions
The proposed 10 items for the BUI are parsimonious dimensions of a novel BrC-specific preference-based instrument. Reducing QLQ-C30 and BR45 items to 10 items will greatly reduce respondent burden during clinical practice, clinical trials, and future studies that administer multiple concurrent questionnaires. The 10 items of the BUI will be amenable to valuation in future studies towards developing the BUI.
Future development of the BUI will include assessing measurement properties (reliability, criterion and construct validity, and responsiveness) prior to eliciting direct utility weights (e.g., time trade off) from patients and community members and modeling utilities of health states as a function of health state dimensions (attributes). 87 The future BUI potentially will better allow women’s preferences to be reflected in clinical decision making and cost utility analyses.
Supplemental Material
sj-docx-1-mpp-10.1177_23814683221142267 – Supplemental material for Developing the Breast Utility Instrument to Measure Health-Related Quality-of-Life Preferences in Patients with Breast Cancer: Selecting the Item for Each Dimension
Supplemental material, sj-docx-1-mpp-10.1177_23814683221142267 for Developing the Breast Utility Instrument to Measure Health-Related Quality-of-Life Preferences in Patients with Breast Cancer: Selecting the Item for Each Dimension by Teresa C. O. Tsui, Maureen E. Trudeau, Nicholas Mitsakakis, Murray D. Krahn and Aileen M. Davis in MDM Policy & Practice
Footnotes
Acknowledgements
We are grateful for the contributions of the following individuals who made this research possible: Dr. Sofia Torres laid the groundwork for developing a breast cancer preference-based instrument. Dr. Andrea Eisen, Dr. Kataryna Jerzak, Dr. Rosanna Pezo, Dr. Ellen Warner, and Dr. Sonal Ghandi kindly allowed us to recruit their patients during their busy clinics. Dr. Donilo Giffoni, Elizabeth Matheson, Kim Nguyen, Dr. Neda Stepanovic, Lisa Verity, and the M6 nursing staff took time out of their busy schedules to help create opportunities for patient interactions. Doyoung (Kelly) Kim assisted with patient recruitment and data entry. Arcturus Phoon assisted with chart abstraction. We also thank the Biomatrix team, Nim Li, Cordelia He, and Dr. Martin Yaffe, for ensuring that our data was well captured and secure.
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Financial support for this study was provided in part by Dr. Kathleen Pritchard, CM, MD, FRCPC, the Toronto Health Economics and Technology Assessment Collaborative (THETA) Fund for Excellence (No. 5790 6839 0706), Toronto General Hospital Research Institute, and support from the Sunnybrook Research Institute. The funding agreement ensured the authors’ independence in designing the study, interpreting the data, writing, and publishing the report. AMD and MDK are joint senior authors.
