Abstract
We developed a 103-item self-reporting questionnaire to assess the burden of primary headache disorders on those affected by them, including headache characteristics, associated disability, co-morbidities, disease-management and quality of life. We validated the questionnaire in five languages with 426 participants (131 in UK, 60 in Italy, 107 in Spain, 83 in Germany/Austria, and 45 in France). After a linguistic and a face-content validation, we tested the questionnaire for comprehensibility, internal consistency and test–retest reliability at an interval of one month. In the different countries, response rates were between 73% and 100%. Test–retest reliability varied between –0.27 and 1.0 depending on the nature of the expected agreement. The internal consistency was between 0.69 and 0.91. The EUROLIGHT questionnaire is suitable for evaluating the burden of primary headache disorders, and can be used in English, German, French, Italian and Spanish.
Introduction
Headache disorders, including migraine, are common and disabling (1) but under-recognised and under-treated (2,3). Consequently, they impose a substantial population burden of ill-health. It is well documented that migraine impairs work and social activities (4,5). The World Health Report 2001 (3) ranks migraine twelfth in women and nineteenth overall amongst all causes of disability in the world. Less is known about other primary headache disorders, but tension-type headache (TTH), being more prevalent, may impose an even higher population disability burden than migraine (6). Yet this is poorly acknowledged, along with the physical and emotional impact of headache on those directly affected, their carers, family and colleagues, and the socio-economic burden of headache. For example, fewer than half of people with migraine are correctly diagnosed, a prerequisite for receiving adequate treatment (7–11). In comparison with other, less prevalent, neurological disorders, headache attracts little attention and is generally accorded low priority (10,12–14).
The EUROLIGHT project (<www.eurolight-online.eu>) is an initiative supported by the EC Public Health Executive Agency and a partnership activity within Lifting The Burden: The Global Campaign to Reduce the Burden of Headache Worldwide. One of its main objectives is to gather up-to-date and reliable knowledge of the prevalence and impact of migraine, TTH and chronic daily headache across Europe. There is no validated instrument for collecting the data that will achieve this. Therefore, the EUROLIGHT questionnaire has been developed.
This instrument is based largely on the BURMIG questionnaire, and has additions from instruments developed by Lifting The Burden (15). The BURMIG questionnaire was developed in 2004 for a population-based survey of the burden of migraine in the Grand Duchy of Luxembourg. It incorporated previously validated tools for diagnosis, disability assessment and recognition of depression, and added questions on disease management and impact on quality of life (16). It proved to be consistent and reliable for the Luxembourg population. In order to develop the EUROLIGHT questionnaire for use in different European countries, and also to encompass other headache disorders, the BURMIG questionnaire was revised. We integrated sections to assess disability burden, measure general and disease-specific quality of life (QoL), detect anxiety and depression, and enquire into disease management.
The aim of the present study was to assess the test–retest reliability and validity of the EUROLIGHT questionnaire for use throughout Europe. A pilot validation study in the UK was followed by a multicountry study in France, Luxembourg, Germany, Austria, Italy and Spain.
Materials and methods
Questionnaire development
The content of the BURMIG questionnaire was reviewed and thoroughly revised by the steering committee of the EUROLIGHT project. Priority areas for revision had been defined in a pilot study (16), with support from several patient organisations (Migraine Action Association UK, Switzerland and Luxembourg), international headache experts (see Acknowledgements) and the Luxembourg Ministry of Health. The additional or amended items were incorporated into the EUROLIGHT questionnaire after a full literature review of studies on headache burden (17).
The final EUROLIGHT questionnaire (see Appendix) contains 103 items, 7% of which are open questions, 15% numerical questions (i.e. requesting a number for the answer) and 78% categorical (requesting the respondent to place a tick in a box). The first section is biographical (age, gender, language and employment). Next are screening questions for headache (life-time and 1-year prevalence), followed by a section on chronic daily headache. The following questions diagnose the headache that the patient considers to be the most bothersome (if more than one headache type is identified). This approach recognises the virtual impossibility of accurately diagnosing, by self-administered questionnaire, more than one headache type in the same individual. The diagnostic questions, for migraine and TTH, are based on the criteria of the International Classification of Headache Disorders, 2nd edition (ICHD-II) (18). Further questions relate to age at onset and frequency of headache during the previous 3 months. This section is followed by questions about headache yesterday (point prevalence), and then by sections on the use of healthcare resources (medicines, investigations, consultations, etc.) and the impact of headache on work, family life and social activities (including the Headache-Attributed Lost Time [HALT] Index (19)), both for those with headache and for their household partners. A set of questions determines body mass index (BMI), which, if high, is a risk factor for frequent headache. Finally, there are questions on general health derived from the World Health Organization Quality of Life BREF (WHO QoL-BREF) (20) and the Hospital Anxiety and Depression Scale (HADS) (21).
Evaluation of the questionnaire
The EUROLIGHT questionnaire was assessed for: (i) face, content and language validity; (ii) test–retest reliability over one month, a period of time during which little or no change in the respondent’s headache is expected; (iii) the extent to which it could discriminate between respondents with more or less severe disease (construct validity); and (iv) the extent to which individual items correlated with other items relating to the particular area of enquiry (internal consistency). The respective methods are detailed below.
All parts of the study conformed to the ethical standards described in the Declaration of Helsinki. Ethics committee approval was obtained from the National Ethics and Research Board of Luxembourg.
Study population
People with headache were recruited by different means in seven countries. In England, they were recruited from the members of Migraine Action UK. In France, consecutive patients were recruited in the Department of Evaluation and Treatment of Pain within the Neurosciences Clinic, University Hospital, Nice. In Luxembourg, people with headache were recruited from the French-speaking employees of CRP-Santé by email. The subjects from Luxembourg and France participated in the evaluation of construct validity. The sample from Germany was derived from an existing data bank of the German Headache Consortium, University Hospital of Essen, a population-based cohort including people with and without headache. In Austria, consecutive patients were recruited in the Department of Neurology and Pain Medicine, Konventhospital Barmherzige Brüder, Linz; healthy subjects were enrolled from the personnel working at the hospital and their families. In Italy, 50% of subjects with headache came from the waiting list of the Applied Neurological Research Centre of the C Mondino Foundation and 50% were members of the headache patient organization, AI.Ce. Healthy subjects were enrolled from the staff of the research centre. In Spain, respondents with or without headache were recruited from people attending general practitioners for reasons other than headache.
Face, content and language validity
Initial content validity was explored through systematic review by experts, and face validity was tested by pre-piloting with 23 volunteers. All questions not used previously in validated questionnaires in a particular language were forward-and-back translated by two native translators, with reconciliation by a bilingual headache expert. Comprehensibility was tested by native language-speaking volunteers.
Test–retest reliability
Questions were categorized by the amount of change expected within the relevant time frame, as described previously for the development of a comparable questionnaire (22), as follows: ‘no change expected’; ‘change unlikely’; ‘up to 1 unit change expected’; ‘up to 2 units change expected’; and ‘up to 3 units change expected’. Respondents in this study completed the questionnaire twice, the second time after an interval of 1 month. At retest, they were blinded (beyond what they might have recalled) to their responses on the first occasion.
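One way to operationalise these categories, offered as a minimal sketch rather than the procedure used in the study, is to count a test–retest pair as agreeing when the two answers differ by no more than the expected number of units. The function name, the category-to-tolerance mapping (in particular, treating 'change unlikely' as zero tolerance) and the sample answers are assumptions made for illustration.

```python
# Hypothetical mapping from the expected-change category to an allowed difference.
# Treating "change unlikely" as zero tolerance is an assumption, not taken from the study.
TOLERANCE = {
    "no change expected": 0,
    "change unlikely": 0,
    "up to 1 unit change expected": 1,
    "up to 2 units change expected": 2,
    "up to 3 units change expected": 3,
}


def agreement_within_tolerance(test, retest, tolerance):
    """Return the share of complete pairs whose answers differ by at most `tolerance`."""
    # Skip respondents who left the question blank on either occasion.
    pairs = [(a, b) for a, b in zip(test, retest) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no complete test-retest pairs")
    within = sum(1 for a, b in pairs if abs(a - b) <= tolerance)
    return within / len(pairs)


# Invented ordinal answers (e.g. a 1-5 scale) from five respondents.
test_answers = [3, 2, 5, 1, 4]
retest_answers = [3, 3, 2, 1, None]

for category, tol in TOLERANCE.items():
    rate = agreement_within_tolerance(test_answers, retest_answers, tol)
    print(f"{category}: {rate:.0%}")
```

With a scheme of this kind, the same pair of answers can count as a disagreement for a 'no change expected' question and as an agreement for a question where a shift of one unit is anticipated.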
To assess test–retest reliability, the two sets of answers were compared. For categorical data, agreement measures were the percentage agreement rate, Kappa values, McNemar’s S-test and Bowker’s S-test. Percentage agreement measures absolute within-patient agreement. The Kappa coefficient indicates whether this agreement exceeds what might be expected by chance: a value >0.6 is generally considered acceptable. For the questions with discrete integer data, the intraclass correlation coefficient (ICC) was calculated using a 2-way random effects model for agreement.
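The relation between percentage agreement and the Kappa coefficient can be illustrated with a short sketch. It uses the standard definition of Cohen's Kappa, (observed agreement minus chance agreement) divided by (1 minus chance agreement); the function names and the yes/no answers are invented for illustration and are not study data.

```python
from collections import Counter


def percent_agreement(test, retest):
    """Share of respondents giving the identical answer on both occasions."""
    return sum(a == b for a, b in zip(test, retest)) / len(test)


def cohens_kappa(test, retest):
    """Cohen's Kappa: agreement corrected for the agreement expected by chance."""
    n = len(test)
    p_observed = percent_agreement(test, retest)
    test_counts, retest_counts = Counter(test), Counter(retest)
    categories = set(test_counts) | set(retest_counts)
    # Chance agreement: product of the marginal proportions, summed over categories.
    p_chance = sum(test_counts[c] * retest_counts[c] for c in categories) / (n * n)
    if p_chance == 1.0:
        return float("nan")  # Kappa is undefined when everyone gives the same single answer
    return (p_observed - p_chance) / (1.0 - p_chance)


# Invented answers from ten respondents to a yes/no question.
test_answers = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "no"]
retest_answers = ["yes", "yes", "no", "yes", "yes", "no", "no", "no", "yes", "no"]

print(f"agreement = {percent_agreement(test_answers, retest_answers):.2f}")
print(f"kappa     = {cohens_kappa(test_answers, retest_answers):.2f}")
```

Because Kappa discounts chance agreement, a question that almost everyone answers the same way can show a high percentage agreement together with a low Kappa value.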
Construct validity and internal consistency
Construct validity was intended to be assessed partly by comparing headache-free participants with headache sufferers and partly by measuring the internal consistency of answers to related questions. In the course of this part of the study, it transpired that some participants recruited as ‘healthy’ were, in fact, reporting occasional headaches. Construct validity assessment was, therefore, based on headache frequency rather than presence or absence (low frequency = 0–3 and high frequency >3 headache-days per month). Comparisons between low-frequency and high-frequency headache sufferers were made for the total scores of the WHO QoL, HALT index and HADS. Comparisons between categorical scores of those diagnosed with migraine, other episodic headache and chronic daily headache were performed by chi-squared test. Continuous scores were compared by one-way ANOVA, with the score as dependent variable. Normality was assessed by Kolmogorov–Smirnov test; if this was significant, data were log-transformed and re-analysed if normally distributed; otherwise the Kruskal–Wallis test was used.
Where appropriate, cross-tabulations were used to check for internal consistency. Blocks of questions corresponding to the ICHD-II criteria, WHO QoL, HALT index and HADS were explored for consistency using Cronbach’s alpha coefficient: the larger this coefficient, the more likely it was that items contributed consistently to a scale, with a value of >0.70 suggesting acceptable consistency. Recalculating the alpha coefficient after deleting each question within a set determined how each contributed to the reliability of the scale: when the coefficient increased after a question was deleted, its responses were not highly correlated with those to other questions in the set; conversely, if the coefficient decreased, they were highly correlated.
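The item-deletion check described above can be sketched as follows. This is a minimal illustration that computes the raw (unstandardized) Cronbach's alpha from item and total-score variances, which may differ slightly from the standardized values reported for this study; the function names and the scores are invented for illustration.

```python
from statistics import variance


def cronbach_alpha(items):
    """Raw Cronbach's alpha; `items` holds one list of scores per question,
    with respondents in the same order in every list."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def alpha_if_item_deleted(items):
    """Alpha recomputed with each question left out in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]


# Invented scores for three questions answered by five respondents.
# The third question tracks the other two only loosely.
items = [
    [1, 2, 3, 4, 5],
    [1, 2, 3, 4, 5],
    [3, 1, 4, 2, 5],
]

print(f"alpha (all items) = {cronbach_alpha(items):.3f}")
for index, alpha in enumerate(alpha_if_item_deleted(items), start=1):
    print(f"alpha without question {index} = {alpha:.3f}")
```

In this invented example, removing the third question raises alpha, which is the pattern the text associates with an item whose responses are not highly correlated with the rest of the set.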
Sample size calculation
To our knowledge, there is no method to calculate the sample size needed to assess face content, language validity, construct validity and internal consistency in a questionnaire validation study. Therefore, the sample size calculation was based on the test–retest reliability. Assuming an absolute Kappa precision of 0.18 (based on parts of the BURMIG questionnaire that had been validated previously), we estimated that 73 responses to the main questions in the second test would enable a Kappa value of ≥ 0.5 to be detected with a power of 0.95 (two-tailed α = 0.05). Thus allowing for a 60% response rate, 135 subjects were considered necessary.
Results
UK pilot study
Table 1. Sociodemographic and headache variables for the validation samples in different countries. Not all subjects answered the question about gender.
Completion rates were ≥90% for 86% of single questions at both test and retest. Questions with < 90% completion rate were those related to income, questions from the HALT Index and those related to impact on children. One question about the ‘level of control’ over headaches seemed especially difficult to answer, with completion rates of 49% and 55% at test and retest. A question on preventative medications had three response fields (name of medication and how long it had been taken in weeks or months); the first field had completion rates of 45% and 40% for test and retest, respectively, while the two other fields fell below 10%. Questions on investigations such as magnetic resonance imaging (MRI) and computed tomography (CT) also showed completion rates below 10%.
Table 2. Test–retest reliability of questionnaire (percentage agreement; Kappa values and intraclass correlation coefficient [ICC] values for variables of two modalities; McNemar’s coefficient for 2 × 2 tables and Bowker’s coefficient for variables with more than two response options)
Among the questions categorized as ‘no change expected’, two of those analyzed by Kappa coefficient were responsible for lowering the rate of agreement (< 30%) while all others analyzed in this way showed test–retest agreements of 40–100% (Table 2). The Kappa coefficient varied from 0.26 to 1, with questions from the HADS contributing most (from 0.36 to 0.55) to a low value. For questions with quantitative responses, analyzed by ICC, the rate of agreement varied from 1% to 74%, with the extreme low value due to a diagnostic question asking the number of days with headache (Appendix 1, Question 18). Most of these questions were in the range 20–25%. The ICC was good for these questions.
For the questions categorized as ‘up to 1 unit change expected’, only a third had agreement rates of <60%. Age had the highest value (98%). These questions were also associated with low Kappa coefficients: only one quarter of them had coefficients >0.5.
Only two questions were categorized ‘up to 2 units change expected’; these had 12% and 100% agreement rates with Kappa coefficients of 0.16 and 0.66. Six questions were categorized ‘up to 3 units change expected’: one had an agreement rate of 36%, with a Kappa coefficient of 0.21, which is not a good result, and 5 HALT Index questions showed agreement rates of 25–52%, with an ICC varying from 0.83 to 0.92, which is a good result.
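A low exact-agreement rate can coexist with a high ICC when answers are numeric and vary widely between respondents, as with the HALT Index questions. The sketch below illustrates this with the single-measure ICC for absolute agreement under a two-way random effects model, computed from the usual ANOVA mean squares. It is an illustration under stated assumptions, not the study's analysis code; the function name and the day counts are invented.

```python
def icc_agreement(test, retest):
    """Single-measure ICC for absolute agreement, two-way random effects model."""
    rows = list(zip(test, retest))
    n, k = len(rows), 2  # n respondents, k = 2 occasions
    grand_mean = sum(sum(row) for row in rows) / (n * k)
    row_means = [sum(row) / k for row in rows]
    col_means = [sum(row[j] for row in rows) / n for j in range(k)]

    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)   # between respondents
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)   # between occasions
    ss_total = sum((x - grand_mean) ** 2 for row in rows for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Invented numbers of days lost, reported by six respondents at test and retest.
test_days = [0, 5, 10, 20, 40, 60]
retest_days = [1, 4, 12, 18, 43, 58]

exact = sum(a == b for a, b in zip(test_days, retest_days)) / len(test_days)
print(f"exact agreement = {exact:.0%}")
print(f"ICC             = {icc_agreement(test_days, retest_days):.3f}")
```

No respondent in this invented example repeats exactly the same number, yet the ICC is close to 1 because the differences between occasions are small relative to the differences between respondents.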
For questions with two response options, McNemar’s S-test showed a significant difference for one question, which asked whether the respondent had a headache yesterday (Appendix 1, Question 32). A change of response to all questions about headache yesterday is expected between test and retest. Only three items were significant (P < 0.05) on Bowker’s S-test: no agreement was observed for questions attempting to measure lost work due to headache (Appendix 1, Questions 36 and 37) and the question about how headache was accepted at work (Appendix 1, Question 53).
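For a question with two response options, McNemar's test looks only at the respondents who changed their answer and asks whether changes in one direction outnumber changes in the other. The sketch below uses the uncorrected chi-squared form of the statistic with one degree of freedom; whether the study applied a continuity correction is not stated, so this choice is an assumption, as are the function name and the invented answers. Bowker's test extends the same idea to questions with more than two response options and is not shown.

```python
import math


def mcnemar_test(test, retest, positive="yes"):
    """Return (statistic, p_value) for McNemar's test on paired yes/no answers."""
    # Discordant pairs: answered `positive` at test only (b) or at retest only (c).
    b = sum(1 for a, r in zip(test, retest) if a == positive and r != positive)
    c = sum(1 for a, r in zip(test, retest) if a != positive and r == positive)
    if b + c == 0:
        return 0.0, 1.0  # nobody changed their answer
    statistic = (b - c) ** 2 / (b + c)
    # Upper tail of the chi-squared distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Invented answers to a "headache yesterday" question:
# 12 respondents changed from yes to no, 2 from no to yes, 40 did not change.
test_answers = ["yes"] * 12 + ["no"] * 2 + ["yes"] * 10 + ["no"] * 30
retest_answers = ["no"] * 12 + ["yes"] * 2 + ["yes"] * 10 + ["no"] * 30

statistic, p_value = mcnemar_test(test_answers, retest_answers)
print(f"S = {statistic:.2f}, p = {p_value:.4f}")
```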
Internal consistency was evaluated independently for the blocks of questions derived from WHO QoL, the HALT index and HADS. The standardized values of Cronbach’s alpha were, respectively, 0.93, 0.88 and 0.90.
Following this pilot study, the phrasing and the response options of some questions were modified. In general, however, the pilot study showed that the questionnaire was well understood and yielded satisfactory completion rates; therefore, no questions were deleted or added.
Validation study in other countries
The slightly amended questionnaire was translated for validation in the other countries.
Populations
The numbers of subjects participating in each country are given in Table 1. There was a female preponderance in all countries. Most respondents were full- or part-time employed or self-employed, while students, unemployed and retired people accounted for 10–20%. Average age was 40 years except in France where it was 50 years.
Response rates
Numbers of responders in each country are given in Table 1, varying between 66% and 100%. In Spain, one questionnaire was deleted from the database as it was incomplete.
Completion rates
Completion rates for each question were adjusted according to expectation. A rate > 100% meant that the participant was not expected to answer a particular question but nevertheless did: for example, some respondents answered that they had not had headache yesterday, but still had taken medicines to relieve headaches on that day. Per country, the percentages of respondents with completion rates over 90% were: Germany-Austria, 69%; Spain, 75%; Italy, 65%; and France, 82%.
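One plausible reading of this adjustment, sketched below, is that the completion rate of a question is the number of respondents who answered it divided by the number who were expected to answer it given their earlier responses. The exact formula is not stated in the text, so the function and the counts are assumptions for illustration.

```python
def adjusted_completion_rate(n_answered, n_expected):
    """Completion rate in percent, relative to the respondents expected to answer."""
    if n_expected <= 0:
        raise ValueError("at least one respondent must be expected to answer")
    return 100.0 * n_answered / n_expected


# Invented counts: 40 respondents reported headache yesterday and were expected to
# answer the follow-up on medicines, but 43 respondents filled it in.
rate = adjusted_completion_rate(n_answered=43, n_expected=40)
print(f"completion rate = {rate:.1f}%")
```

Under this reading, a rate above 100% signals answers from respondents who should have skipped the question.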
Certain questions had low completion rates. For the question on duration of use of preventative drugs (Appendix 1, Question 45), the rate was < 30% in Italy and < 10% in the other countries. The questions concerning MRI and CT scans had completion rates of < 10% in Italy, < 30% in Spain and < 20% in Germany/Austria. The HALT-Index questions had low completion rates in France, ranging from 52% to 61%.
Test–retest reliability
In Italy, 141 questions (including some sub-questions) were used to assess reliability (open questions were excluded, as they could not be quantified). Of these, 42% (n = 59) showed > 80% agreement, 10% (n = 14) ranged from 40% to 80% and 48% (n = 68) had < 40% agreement (Table 2). In Spain, 149 questions were used (again including sub-questions and excluding the open questions). Of these, 46% (n = 69) had > 80% agreement, 16% (n = 23) ranged from 40% to 80% and 38% (n = 57) had < 40% agreement. In Germany/Austria and in France, 116 questions were used (including sub-questions and excluding open questions), of which 36% (n = 42) showed > 80% agreement, 21% (n = 24) ranged from 40% to 80% and 43% (n = 50) had < 40% agreement.
Two ‘no change expected’ questions were identified as largely responsible for lowering the rate of agreement below 40%. The first (Appendix 1, Question 15) asked for the medication usually taken to treat chronic daily headache; some participants may not have understood the text accompanying the question well enough. The second (Appendix 1, Question 56) asked how well subjects were able to control their headache. In this category, two other questions had low reliability scores. The first asked for the number of days with headache (Appendix 1, Question 18), giving respondents the reply options of ‘every day’ or stating the number of ‘days/month’ or ‘days/year’. The second asked the duration of headache in minutes, hours or days (Appendix 1, Question 20).
Of ‘up to 1 unit change expected’ questions, 26 out of 48 in Italy had agreement rates of < 60% (only 11 having Kappa coefficients of > 0.5). Corresponding numbers were 29 of 48 in Spain, 24 of 46 in Germany/Austria and 22 of 48 in France; in all countries these questions accounted for the low Kappa coefficients. The question about investigations (MRI, CT, etc.; Appendix 1, Question 48) unsurprisingly also had low agreement rates. Questions on the effects of headache on education, career and family planning (Appendix 1, Questions 50–76), with 4–6 possible response options, had agreement rates of < 10% in Italy. As multiple responses could be chosen, completion rate was calculated for each possibility. As a consequence, percentage changes were very low for all responses other than ‘no’. Three questions of the WHO QoL and one from the HADS showed significant Bowker S-tests in Germany/Austria, Spain and France, meaning that there was lack of reliability over time.
There was one question with ‘up to 2 units change expected’, and this had very low agreement rates: Italy 13%; Germany/Austria 10% with Kappa = 0.47; France 33% with Kappa = 0.27; and Spain 13% with Kappa = 0.35.
Of questions in the category ‘up to 3 units change expected’, only one had low agreement rates: in Italy (3%, Kappa = 0.46), Spain (12%, Kappa = 0.28) and France (30%, Kappa = 0.17). However, the poorest agreement was for the HALT Index, the reliability of which was measured by the ICC associated with the percentage agreement rate: in Italy, 58–65% with ICC = 0.60–0.97; in Spain 90–100% with ICC = 0.88–0.95; in Germany/Austria 28–38% with ICC = 0.55–0.94 and in France 23–36% with ICC = 0.58–0.94.
Construct validity and internal consistency
Internal consistency of question blocks (WHO QoL, HALT, HADS)
Standardized values of Cronbach’s alpha.
Construct validity for WHO QoL, HADS and HALT index in relation to headache status
One subject was excluded due to a high score of 261.
It is indicative of good construct validity that the mean scores for WHO QoL, HADS overall, HADS anxiety (HADS-A) and HADS-depression (HADS-D) were significantly different between those with and those without headache in each country. In addition, the HALT index, used to compare groups with low and high headache frequencies in France/Luxembourg, showed significantly higher scores in the latter.
Construct validity for WHO QoL, HADS and HALT in relation to headache diagnoses
HADS, Hospital Anxiety and Depression scale; HADS-A, HADS-anxiety; HADS-D, HADS-depression.
Discussion
This paper describes the development and testing of the EUROLIGHT questionnaire to evaluate the burden of headache disorders in different European populations. The questionnaire originated in the BURMIG questionnaire, and was revised after a systematic literature review and discussions among headache experts and lay persons in the EUROLIGHT steering committee. The English version was tested in a UK pilot study and, after some minor amendments, the resulting questionnaire was translated and tested in a German version in Austria and Germany, a French version in France and Luxembourg, a Spanish version in Spain and an Italian version in Italy.
As to test–retest reliability, good response rates were achieved, and completion rates for each question were generally good with the majority (65–80%) above 90%. A small number of questions required modification in the light of likely causes for low completion rates. Sub-questions asking for the total number of days or occasions were deleted as they were not completed by respondents. Questions with free-text fields for respondents to fill in also had to be avoided. Questions from WHO QoL and HADS showed good completion rates, and good reliability. This was not the case for the HALT Index, especially in France.
For methodological purposes, we had defined the amount of change expected for each question before administering the questionnaire. Questions where a change had been expected did show higher amounts of change, indicating that these items were understood correctly and, therefore, can be used as part of the EUROLIGHT questionnaire.
The reliability coefficients also showed convincing results. Kappa and ICC showed values above the defined significance threshold (see Materials and methods). However, a small number of questions needed to be modified to increase the reliability of the questionnaire.
Internal consistency was found to be excellent for WHO QoL, HADS, HADS-A, HADS-D and the HALT Index.
Construct validity was found to be acceptable in different countries as the relevant questions were able to discriminate between groups of respondents with different headache frequencies and diagnoses. The tools WHO QoL, HADS and HALT Index used within the questionnaire discriminated well between those with and those without headache. In headache sufferers alone, questions from the HADS showed a low discrimination between headache types, which is unsurprising, as co-morbidity is known to differ little between headache types but more depending on headache frequency (23–25). The headache-specific tool HALT showed good discriminative power in most countries, although not in France and Luxembourg.
For questions on disease management, test–retest agreements ranged from 77% to 98% (except for questions with multiple response options). Kappa coefficients ranged from 0.68 (0.62 for questions with multiple response options) to 1.00, which indicates good agreement.
The majority of questions about private and social impact were of the type with several response options, and these scored poorly in terms of agreement rate (10–30%) but had a good test–retest reliability (Kappa coefficients ranging from 0.52 to 0.97). As the responses to these questions were stable over time, we believe that they truly reflected the headache impact on patients’ lives over a certain period and not only how they perceived it on that day.
It is a weakness in the development of the questionnaire that the diagnostic questions have not yet been validated against a gold standard method for diagnosing headaches (interview and examination by a headache expert), which is mandatory when diagnostic accuracy is of paramount importance (26). Diagnostic validation should be done in the population to be studied and, since the present study was mostly performed among headache patients who had already been diagnosed and treated, this was not done. When the population-based studies with the EUROLIGHT questionnaire are performed, some sort of validation in the different countries is planned in order to assess the diagnostic precision of the questionnaire.
Conclusions
The EUROLIGHT questionnaire was developed in order to estimate the burden of headache disorders in Europe. Established and recently validated tools for diagnosis, disability and co-morbidity were supplemented with more detailed questions on disease management and impact on school, work, family, social life and quality of life. The resulting questionnaire was tested in UK, Italy, Spain, Germany, Austria, France and partly in the Grand Duchy of Luxembourg. Reliability and consistency were found to be comparable to those of previously published questionnaires (16,22). The validation process resulted in relatively minor changes. We believe the final EUROLIGHT questionnaire, at least in the five languages that have been tested, will give a reliable and valid picture of the impact and burden of primary headache disorders in European populations and offer additional valuable information to the results of the American Migraine Prevalence and Prevention (AMPP) questionnaire (27–29).
Since headache is a considerable burden for people everywhere, we hope that the questionnaire can be adapted for use in many other countries and cultures.
Footnotes
Acknowledgements
The authors are indebted to patient organizations within the World Headache Alliance for their contributions to this study, and are especially grateful for the help offered by S Chatterji, R Lipton, J Schoenen and A MacGregor.
EUROLIGHT is a European initiative supported by a grant from the EC, Executive Agency for Health and Consumers (EAHC) and promoted by the Centre of Public Research, Luxembourg.
JB, MLL and MV are staff members of the CRP-Santé; the authors alone are responsible for the views expressed in this publication and they do not necessarily represent the decisions, policy or views of the CRP-Santé.
