Assessing the quality of health-related quality of life measures

Abstract

In the current issue of Cephalalgia, Manhalter and colleagues present their validation of a new headache-specific health-related quality of life (HRQoL) instrument, the Comprehensive Headache-related Quality of life Questionnaire (CHQQ) (1). Primary headache disorders can have a significant impact on all aspects of life, including daily functioning, productivity and HRQoL (2,3). A range of commonly used and psychometrically valid instruments has been developed to measure HRQoL in headache clinical care and research (4). Each of these instruments has strengths and limitations (4). Manhalter et al. reviewed existing generic and migraine-specific HRQoL instruments and concluded that there was a need for an instrument validated for both migraine and tension-type headache that better captured the reported experience of patients.

As comparative effectiveness research increases in importance, patient-reported outcomes, including measures of HRQoL, will play an expanding role in assessing the burden of illness and measuring the benefits of treatment (5,6). This is particularly true for conditions that influence quality of life more than quantity of life (i.e. mortality), including the primary headache disorders. Cost effectiveness metrics often place HRQoL measures in the denominator and cost in the numerator to compute the cost per unit improvement in health outcomes. These economic indices will increasingly inform policy decisions regarding regulatory approval and reimbursement for therapeutics (7).
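As a rough illustration of such a ratio, the sketch below divides a difference in cost by a difference in an HRQoL-based outcome. The incremental form (new treatment versus comparator), the function name and all figures are assumptions made for illustration; they are not drawn from the studies cited here.

```python
def cost_per_unit_gain(cost_new, cost_old, effect_new, effect_old):
    """Incremental cost divided by incremental health effect.

    Cost sits in the numerator and the HRQoL-based outcome in the
    denominator, giving cost per unit improvement in health outcome.
    """
    delta_cost = cost_new - cost_old
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        raise ValueError("No difference in effect; the ratio is undefined.")
    return delta_cost / delta_effect


# Hypothetical annual costs and HRQoL-based outcome scores, illustration only.
ratio = cost_per_unit_gain(cost_new=1200.0, cost_old=800.0,
                           effect_new=0.70, effect_old=0.65)
print(round(ratio, 2))
```

Because the outcome is in the denominator, the choice of HRQoL instrument directly changes the ratio, which is one reason the measurement properties discussed below matter for policy.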

Manhalter et al. point out that there are two broad categories of HRQoL measures: generic and disease specific. Generic measures are designed to assess the impact of a wide range of medical conditions, while disease-specific measures capture the burdens of particular conditions. Generic measures facilitate comparisons across disease categories by providing a common yardstick. They can be used to contrast the burden of a variety of disorders and to determine which treatments, across therapeutic areas, deliver the most HRQoL bang for the buck. Using these measures may help stakeholders, including healthcare professionals (HCPs), third-party payers, and funders of research, to assess the relative benefits of investing in migraine care as opposed to other chronic illnesses (e.g. asthma or multiple sclerosis). Disease-specific instruments more accurately reflect the burden of specific conditions. They can be used to compare the benefits of different treatments for a condition but not to compare migraine with other disorders.

In the present report, Manhalter et al. engaged in an orderly process of instrument development to create a novel headache-specific HRQoL instrument. Below, we review commonly employed procedures for developing HRQoL measures, highlighting the work done by Manhalter and colleagues in the development of the CHQQ.

Begin with the end in mind. The optimal tool for any job is determined by the nature and the goals of the task at hand. For condition-specific HRQoL measures in headache, potential goals include measuring the overall burden of headache, assessing the effects of disease characteristics or comorbidities on HRQoL, characterizing the natural history of HRQoL changes or measuring the benefits of therapy. Depending on the goal, developers make choices regarding the targeted psychometric properties, the domains of measurement, the nature of the items both in terms of content and question structure, the recall period, the length of the scale (number of items overall and for each domain) and the external validators. Manhalter and colleagues (1) aimed to develop an instrument for broad use in primary headache and pilot tested the CHQQ on 168 outpatients with episodic migraine and 34 with tension-type headache at the headache center of a neurology department.

Develop an item pool or list of candidate questions for possible inclusion in the measure. Items should sample the domains of interest. Manhalter and colleagues developed their item pool by conducting a literature review and interviewing clinicians and patients. An item was considered for inclusion if at least two clinicians and two patients from each headache subgroup considered the item important. Eleven individuals with migraine were asked to complete the resulting questionnaire and were interviewed about their perceptions.

Assess item properties. Data collected on the item pool are used to define the relative performance of items and to identify subscales. In this paper Manhalter and colleagues report long-used assessments of item and scale properties based on classical test theory measures of internal consistency (like Cronbach’s alpha). Item-response theory (IRT) models (8) provide a powerful alternative to these traditional assessments. While Cronbach’s alpha assumes that items are normally distributed replicate measures of an underlying construct, IRT recognizes that some items are stronger than others at different points along an underlying scale. Many software packages are available for fitting IRT models. We note with pleasure that in the development of a new headache-specific HRQoL measure, this group used IRT methods to assess item and scale properties (9). We would have liked very much for this information to have been presented in the manuscript, as the abstract referenced by the authors does not convey sufficient information on the results of their IRT analysis of the scale.
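For readers less familiar with the classical approach, the following is a minimal sketch of how Cronbach’s alpha can be computed from a respondents-by-items table of scores, using the standard formula alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). The function name and the response data are hypothetical; this is not the authors’ analysis code.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Uses sample variances (n - 1 denominator) for items and for total scores.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("Need at least two items and two respondents.")
    # Transpose rows (respondents) into columns (items).
    item_columns = list(zip(*responses))
    sum_item_variances = sum(variance(col) for col in item_columns)
    total_scores = [sum(row) for row in responses]
    total_variance = variance(total_scores)
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical responses: five respondents, four items scored 1 to 5.
hypothetical_responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
print(round(cronbach_alpha(hypothetical_responses), 3))
```

Alpha yields a single summary for the whole scale, which is why it cannot show that a particular item is informative only at the mild or the severe end of the underlying continuum.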

Engage in an item-reduction process through pilot testing (administering the instrument to focus groups and/or patients) and assessing measures of item quality as obtained from IRT models. Commonly used assessments evaluate model parameter strength, preliminary plots of item characteristic curves, and item and scale information curves (10). Items exhibiting poor properties are eligible for deletion. Accurately assessing the impact of item removal may require refitting the IRT models to the reduced item pool. Items that sample each target domain should be retained. Manhalter and colleagues administered the resulting 25-item questionnaire to a group of 117 individuals with migraine, removing two items.
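To make item characteristic and information curves concrete, here is a small sketch based on the two-parameter logistic (2PL) model for dichotomous items, where a is the discrimination parameter and b the location parameter. This is a simplifying assumption: HRQoL items usually have several ordered response categories and would more often be fitted with a polytomous model. The item parameters below are hypothetical.

```python
import math


def icc_2pl(theta, a, b):
    """Item characteristic curve: probability of endorsing the item at trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information_2pl(theta, a, b):
    """Item information under the 2PL model: a^2 * P * (1 - P)."""
    p = icc_2pl(theta, a, b)
    return a * a * p * (1.0 - p)


def scale_information(theta, items):
    """Scale information is the sum of the item information values."""
    return sum(item_information_2pl(theta, a, b) for a, b in items)


# Hypothetical (a, b) pairs: a strong item, a weak item, and an item
# located toward the more severe end of the scale.
hypothetical_items = [(1.8, -0.5), (0.4, 0.0), (1.2, 1.0)]
for theta in (-2.0, 0.0, 2.0):
    per_item = [round(item_information_2pl(theta, a, b), 3)
                for a, b in hypothetical_items]
    print(theta, per_item, round(scale_information(theta, hypothetical_items), 3))
```

An item with a low discrimination parameter contributes little information anywhere along the scale, which is the kind of evidence that would make it a candidate for deletion.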

Conduct validation studies of the revised instrument in the population of intended use. Depending on objectives, target populations for HRQoL research in headache could include clinical trial participants, patients from specialty-care settings, primary-care settings or individuals in the general population. In addition, within each target population, the sociodemographic characteristics of the sample may influence item properties. For example, items related to menstrual migraine burden matter more in women of child-bearing potential. Manhalter and colleagues derived their validation sample from a subspecialty clinic in Hungary. Results are likely applicable to headache clinics, but generalizability to primary-care settings, to the general population and to languages other than Hungarian remains to be assessed.

Assess test-retest reliability and optimize recall interval. Test-retest reliability for an instrument is determined by administering the instrument on two separate occasions and examining the agreement across administrations (correlation). Test-retest reliability is influenced by at least two distinct issues of time: the recall interval of the questions (e.g. “Over the last four weeks, how much have headaches interfered with…”) and the time between test administrations (test-retest interval). The four-week recall interval is an attribute of the scale itself. Many of the currently existing migraine-specific instruments have a four-week recall interval but some use a three-month interval (e.g. the Migraine Disability Assessment Scale (MIDAS)). The CHQQ uses a two-week recall period. Short recall intervals provide more accurate recall for the period of measurement but are prone to temporal sampling error. A two-week period may not be long enough to capture a representative sample of headache experience. In a person with low-frequency episodic migraine with two attacks per month on average, any two-week interval could include no headaches at all or many headaches, by chance alone. Estimates of HRQoL are likely to differ from one sampling period to the next. In a woman with pure menstrual migraine, measures of HRQoL might be influenced by the timing of the assessment relative to menses. Longer recall intervals are likely to capture a more representative sample of a patient’s headache experience but are prone to errors in recall. In addition to the recall interval, a second temporal factor is the test-retest interval (the length of time between test administrations for determination of test-retest reliability). This is an attribute, not of the test, but of the design of the reliability study. If the test-retest interval is short, agreement may be inflated if respondents remember and report their previous answers. If it is too long, agreement may be under-estimated because of real change. Measuring test-retest reliability is important because high reliability is required for measures used to assess change over time. Herein, data on test-retest reliability are not reported. We are concerned that the two-week recall interval may lead to unreliability in settings where headache attack frequency is modest.
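The temporal sampling concern can be illustrated with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that attacks occur independently at a constant average rate (a Poisson process) and that a month is four weeks long; real attack patterns, such as menstrually related clustering, will depart from this. The function name is hypothetical.

```python
import math


def prob_no_attacks(attacks_per_month, recall_weeks, weeks_per_month=4.0):
    """Probability that a recall window contains no attacks at all.

    Assumes attacks follow a Poisson process with a constant rate,
    so P(0 attacks) = exp(-expected number of attacks in the window).
    """
    expected_attacks = attacks_per_month * recall_weeks / weeks_per_month
    return math.exp(-expected_attacks)


# A person averaging two attacks per month, under the assumptions above.
for weeks in (2, 4, 12):
    print(weeks, round(prob_no_attacks(2.0, weeks), 3))
```

Under these assumptions, a person averaging two attacks per month has roughly a 37% chance of reporting on a two-week window with no attacks, compared with about 14% for a four-week window, so two administrations of a two-week instrument could differ substantially by chance alone.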

Assess validity. If there is a gold standard, the validity of a test is defined by its agreement with the gold standard. For HRQoL measures there is not an obvious gold standard. Many forms of validity have been defined, including face validity, predictive validity, and criterion validity. Assessing validity often consists of demonstrating that the instrument behaves as expected under the assumption that it measures what it is hypothesized to measure. For example, Manhalter et al. show that the CHQQ scores show the expected correlation with subscale scores on the Short Form 36 Health Survey questionnaire (SF-36), a widely used generic measure of HRQoL. They show that, as expected, CHQQ scores are lower in migraine than tension-type headache, reflecting greater reductions in HRQoL. Finally, they show that CHQQ items show the expected correlations with headache characteristics such as attack frequency and duration.
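A simple way to examine this kind of expected relationship is to correlate scores on the new instrument with scores on an established one. The sketch below computes a Pearson correlation from paired scores; the choice of Pearson rather than a rank-based coefficient is an assumption, and the scores are invented for illustration and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("Need two lists of equal length with at least two values.")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("Correlation is undefined when one variable is constant.")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired scores for six patients: a headache-specific HRQoL
# total and a generic HRQoL subscale (higher means better on both).
specific_scores = [42.0, 55.0, 61.0, 70.0, 48.0, 83.0]
generic_scores = [50.0, 58.0, 60.0, 75.0, 45.0, 80.0]
print(round(pearson_r(specific_scores, generic_scores), 3))
```

A moderate to strong positive correlation would be consistent with both instruments tapping a related construct; a correlation close to 1 would raise the question of whether the disease-specific measure adds anything to the generic one.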

The authors are to be commended for providing an important new headache-specific measure of HRQoL for individuals with headache including migraine and tension-type headache. The item selection and reduction process and the validity studies were well done and the instrument performs well among outpatients in Hungarian headache centers. Next steps include assessing the test-retest reliability of the instrument, perhaps using both two- and four-week recall intervals. In addition, the generalizability of these findings to primary care and population-based samples and in languages other than Hungarian should be assessed.

Footnotes

Conflict of interest

Dawn C. Buse, PhD, has received honoraria and/or research funding from Allergan Pharmaceuticals, Merck, Inc., MAP Pharmaceuticals, NuPathe, and Novartis.

Daniel Serrano, PhD, has received grant support from the National Headache Foundation via funding from Allergan Pharmaceuticals, GlaxoSmithKline, ENDO Pharmaceuticals, MAP Pharmaceuticals, OrthoMcNeil, and Merck, Inc. for data collection and/or analysis of the AMPP data set. Daniel Serrano has received Investigator Initiated grant funding from GlaxoSmithKline, Colucid Pharmaceuticals, and Novartis.

Richard B. Lipton, MD, has received research support from the NIH [PO1 AG03949 (Program Director), PO1AG027734 (Project Leader), RO1AG025119 (Investigator), RO1AG022374-06A2 (Investigator), RO1AG034119 (Investigator), RO1AG12101 (Investigator), K23AG030857 (Mentor), K23NS05140901A1 (Mentor), and K23NS47256 (Mentor)], the National Headache Foundation, and the Migraine Research Fund; serves on the editorial boards of Neurology; has reviewed for the NIA and NINDS; holds stock options in eNeura Therapeutics (a company without commercial products in the US); serves as consultant, advisory board member, or has received honoraria from: Allergan, American Headache Society, Autonomic Technologies, Boston Scientific, Bristol Myers Squibb, Cognimed, Colucid, Eli Lilly, Endo, eNeura Therapeutics, GlaxoSmithKline, MAP, Merck, Nautilus Neuroscience, Novartis, NuPathe, and Pfizer.

References

1. Manhalter, Bozsik, Palásti. The validation of a new comprehensive headache-specific quality of life questionnaire. Cephalalgia 2012; 32: 668–682.

2. Leonardi, Steiner, Scher. The global burden of migraine: Measuring disability in headache disorders with WHO’s Classification of Functioning, Disability and Health (ICF). J Headache Pain 2005; 6: 429–440.

3. Buse, Rupnow MFT, Lipton. Assessing and managing all aspects of migraine: Migraine attacks, migraine-related functional impairment, common comorbidities, and quality of life. Mayo Clin Proc 2009; 84: 422–435.

4. Buse, Sollars, Steiner. Why HURT? A review of clinical instruments for headache management. Curr Pain Headache Rep 2012; 16: 237–254.

5. Fayers, Machin. Quality of life. West Sussex, UK: Wiley, 2007.

6. Mann, Gilbody, Richards. Putting the Q in depression QALYs: A comparison of utility of measurement using EQ-5D and SF-6D health related quality of life measures. Soc Psychiatry Psychiatr Epidemiol 2009; 44: 569–578.

7. Weatherly, Drummond, Claxton. Methods for assessing the cost-effectiveness of public health interventions: Key challenges and recommendations. Health Policy 2009; 93: 85–92.

8. Birnbaum. Some latent trait models and their use in inferring an examinee’s ability. In: Lord, Novick (eds) Statistical theories of mental test scores. Reading, MA: Addison-Wesley, 1968, pp. 295–479.

9. Ertsey, Palasti, Bozsik. Item response modeling in the development of a new headache-specific quality of life instrument. Cephalalgia 2007; 27: 1193.

10. Thissen, Wainer. Test scoring. Mahwah, NJ: Lawrence Erlbaum and Associates, 2001.