Abstract
The North Staffordshire Headache Survey aims to measure the impact of headaches, medicine use and healthcare utilization in a general population sample. A self-reporting questionnaire was piloted in a general population sample. Reliability was tested in a sample of pilot responders after one month, and validity by comparing pilot responders with primary and secondary care headache consulters. One hundred and twenty-two people (61%) responded to the pilot survey, 56% of items had completion rates of 90% or more, and checks showed good internal consistency (90–97%). One-month test-retest data showed good agreement, though questions relating to specific time periods (with partial or no overlap between survey periods) showed the expected lower agreement. The headache consulters reported greater frequency, duration and severity of headaches than the population sample, suggesting good construct validity. Results from these studies indicate that the questionnaire is a reliable and valid instrument to collect data about headaches in the general population.
Introduction
Headaches are common, with more than 90% of men and women in one Danish study (1) reporting having suffered the problem at some time in their lives. There have been no recent population-based studies of the frequency and impact of headaches in the UK general population. The objectives of the North Staffordshire (UK) Headache Study were to measure the prevalence of headache in an unselected adult population in one area of the UK and to investigate its effects on sufferers and on the use of healthcare. In this paper we describe the development and testing of the survey instrument used in this study.
Method
A questionnaire was designed for the self-reporting of headache, disability and healthcare utilization in a general population. This was done with reference to the literature, and with advice from physicians and researchers interested in headache. The questionnaire contained a maximum of 124 items, the majority of which were closed questions with categorical answers, and took an estimated 15–20 min to complete. It consisted of five sections: headache experiences, effects of headaches and medications, advice about headaches, general health and demographic information. Respondents were asked to include headaches of all types in their responses. They were first asked if they had ever experienced a headache and then if they had experienced a headache in the previous 3 months. Those reporting headache in the previous 3 months were asked about the characteristics of their headaches in that period in terms of frequency, duration and pain, together with associated symptoms. The effects of those headaches were then measured in terms of actions taken and medication or other therapies used. Disability was assessed using the Migraine Disability Assessment (MIDAS), with respondents categorized by their MIDAS score as having mild (0–5), moderate (6–10), severe (11–20) or very severe (≥21) headache-related disability (2, 3). Respondents who had ever experienced a headache were asked if, and from whom, they had sought advice about their headaches, and about the reasons for and expectations of those consultations. All respondents were asked questions relating to their general health using the SF-12 (4) and the Hospital Anxiety and Depression Scale (HADS) (5), and also questions about the quality of their sleep and their alcohol and caffeine consumption. Finally, they were asked their gender, date of birth, ethnic group, employment status and occupation. Ethical approval for the study was obtained from the Local Research Ethics Committee.
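The MIDAS banding described above can be sketched as a simple lookup. This is an illustration only: the function name `midas_band` is hypothetical, and the cut-points and labels are those stated in the text rather than taken from any scoring software.

```python
def midas_band(score: int) -> str:
    """Map a MIDAS total score to the disability band stated in the text.

    Bands (as given in the Method section): mild 0-5, moderate 6-10,
    severe 11-20, very severe 21 or more.
    """
    if score < 0:
        raise ValueError("MIDAS score cannot be negative")
    if score <= 5:
        return "mild"
    if score <= 10:
        return "moderate"
    if score <= 20:
        return "severe"
    return "very severe"


# Hypothetical scores, chosen to sit on each side of a cut-point.
for example_score in (5, 6, 20, 21):
    print(example_score, midas_band(example_score))
```

A band of "moderate" or above corresponds to the "at least moderate disability" grouping used when the samples are compared.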
The testing of this instrument addressed the following issues:
Face and content validity (the extent to which the questionnaire measures the characteristic it is intended to measure);
Test-retest reliability (the stability of a questionnaire over a period of time during which little or no change is expected);
Internal consistency (the extent to which individual items in a questionnaire correlate with other items relating to the particular area of investigation);
Construct validity (the extent to which the questionnaire is able to discriminate between respondents with different disease levels);
Completion and response rates.
Initial content validity was explored through systematic review by experts, and face validity was tested by prepiloting with colleagues and friend volunteers.
The questionnaire contained some items which had previously been tested for reliability and validity, namely MIDAS, SF-12 and HADS. However, the new questions and the questionnaire as a whole instrument needed to be tested for validity and reliability. The methods used are described below.
Pilot study
A random sample of 200 patients aged 18 and over was selected from the practice register at the pilot general practice. In the UK about 98% of the population are registered with a general practitioner (GP) regardless of health status and use, and so such registers provide a useful representative sample of a local population (6). The practice patients had a mix of socio-economic backgrounds, but North Staffordshire has a low number of residents from non-white ethnic groups. Patients were excluded if they had a current serious illness or poor mental state as determined by the GP. The questionnaire was mailed with an accompanying letter from one of the general practitioners explaining the study and requesting participation. A repeat mailing was sent to non-responders after two weeks. Completion rates were calculated for each question or group of questions to identify areas of the questionnaire where comprehension might have been difficult. Where appropriate, cross-tabulations were used to check for internal consistency.
Reliability study
Ninety responders to the pilot general population study were randomly selected and mailed a second questionnaire four weeks after the first was mailed. We estimated that 52 responses to the main questions in the second questionnaire would enable a Kappa value of ≥0.5 to be detected with a power of 0.95 (two-tailed α = 0.05) (7). Thus, allowing for a 60% response rate, 90 subjects were considered necessary. Blinded to the results, we categorized questions by the amount of change expected, based primarily on the time frame of the question, as follows:
‘no change expected’ – unchanging information such as gender;
‘change unlikely’ – little change expected in one month, for example, lifetime headache;
‘some change expected’ – questions that ask about the previous 3 months will have two months of overlap in the repeatability period, for example, headache characteristics in the previous 3 months;
‘change likely’ – time period of questionnaires does not overlap, that is, one month or shorter.
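The allowance for non-response in the sample size reasoning earlier in this section (52 required responses, 60% assumed response rate) can be checked with a short calculation. This is a sketch of the inflation step only, not of the power calculation itself; the function name `questionnaires_to_mail` is hypothetical.

```python
import math


def questionnaires_to_mail(required_responses: int, response_rate: float) -> int:
    """Smallest number of mailings expected to yield the required responses."""
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    return math.ceil(required_responses / response_rate)


# Figures from the text: 52 responses needed, 60% response rate assumed.
minimum = questionnaires_to_mail(52, 0.60)
print(minimum)  # 87
```

The minimum of 87 is consistent with the 90 questionnaires mailed, which presumably reflects rounding up.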
Data from the two questionnaires were compared to assess test-retest reliability. For categorical data, this was estimated in two ways: percentage agreement and Kappa values. These were calculated for all questions with categorical answers where at least 20 respondents answered the question at both time points and where the distribution of the data allowed. Percentage agreement gives an estimate of within-patient agreement. However, unlike the Kappa coefficient (8), it does not take into account that some of the agreement can be expected to occur purely by chance. For the one question with discrete integer data, the intraclass correlation coefficient (ICC) was calculated using a two-way random effects model for agreement. Statistical analysis was performed in SPSS 10.0 (9).
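The two agreement measures for categorical answers can be illustrated as follows. This is a minimal sketch of percentage agreement and the unweighted Cohen's kappa, not the SPSS procedure used in the study; the function names and the ten paired answers are hypothetical.

```python
from collections import Counter


def percent_agreement(first, second):
    """Proportion of respondents giving the same answer at both time points."""
    if len(first) != len(second) or len(first) == 0:
        raise ValueError("need two non-empty answer lists of equal length")
    return sum(a == b for a, b in zip(first, second)) / len(first)


def cohen_kappa(first, second):
    """Unweighted Cohen's kappa: agreement corrected for chance agreement."""
    n = len(first)
    p_observed = percent_agreement(first, second)
    counts_1, counts_2 = Counter(first), Counter(second)
    categories = set(counts_1) | set(counts_2)
    # Chance agreement from the marginal distributions of the two occasions.
    p_chance = sum(counts_1[c] * counts_2[c] for c in categories) / (n * n)
    if p_chance == 1.0:
        # Everyone chose the same single category: kappa is undefined.
        raise ValueError("kappa undefined for this distribution of answers")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical answers from ten respondents at the two time points.
time_1 = ["yes"] * 6 + ["no"] * 4
time_2 = ["yes"] * 5 + ["no"] * 4 + ["yes"]
print(percent_agreement(time_1, time_2))      # 0.8
print(round(cohen_kappa(time_1, time_2), 2))  # 0.58
```

The undefined case in `cohen_kappa` is one example of a distribution that does not allow kappa to be calculated.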
Validation study
Two populations were used for comparison with the pilot study. Firstly, primary care patients who had consulted with migraine were selected from the practice register of the pilot practice using the computerized morbidity code for migraine, and were mailed a questionnaire and reminder in the same way as above. Secondly, secondary care patients attending a specialist headache clinic were asked to complete the questionnaire during their visit to the clinic; their headache diagnosis was recorded by the clinician after the consultation. Ten patients from each of these populations were approached. The study was designed to identify any major differences between the groups, and not for the purpose of statistical testing of those differences. The data from these validation samples were compared with the pilot sample, with the hypothesis that patients consulting for headache would experience more severe, frequent and disabling headaches than the general population sample. This tested construct validity.
Results
Pilot study
One hundred and twenty-two completed questionnaires were received after one reminder, a response rate of 61%; 78 (64%) respondents were women and 44 (36%) were men. The median age of respondents was 51 years (range 20–84).
Response rates were lower in men, particularly younger men, with 19% of men aged 18–35 years responding compared with 60% aged 36–55 and 57% aged over 55. Younger women were also less likely to respond compared with older women but the difference was not as marked, with 62% of women aged 18–35 years responding compared with 78% aged 36–55 and 76% aged over 55.
Completion rates for the items in the questionnaire varied from 37% to 100%; 56% of items had completion rates of 90% or more. For some sections of the questionnaire all, or most, of the items within the section must be completed in order to obtain a total score: high completion rates were obtained for HADS (93%) and sleep questions (92%), but rather lower for SF-12 (83%).
Questions with lower completion rates were reviewed for their inclusion in the main study. These questions were:
Pain scale – 88% completion;
Possible reasons for medicine not taken – 67% completion;
Reasons for their last consultation – 77% completion;
Expectations of consultation with GP and pharmacists – both 75% completion.
Four areas were checked where consistency between responses was expected, and consistency ranged from 90% to 97% (Table 1). Consistency was lowest between pain reported to interfere with work and pain lasting more than 24 h shaded on the pain manikin.
Table 1. Tests of internal consistency of questionnaire responses
Reliability
Forty-eight of the 90 who were sent a repeat questionnaire returned it (53% response). Median time between receipt of completed questionnaires was 28 days (minimum 12 and maximum 66 days) and 58% were received between 26 and 33 days apart. Question completion rates varied from 33% to 100% and were similar to the initial questionnaire. Percentage agreement was calculated for 69 questions and ranged from 0 to 100% (Table 2); 39 items (57%) had over 80% agreement, 18 (26%) had 60–80% agreement and 12 (17%) had less than 60% agreement. Kappa values were calculated for the 53 questions where this was possible. Twenty-eight of the 53 questions (53%) achieved a Kappa coefficient of at least 0.6, indicating good agreement. The pain scale showed good agreement, ICC = 0.79.
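For readers unfamiliar with the ICC reported for the pain scale, the following is a minimal sketch of a two-way random effects, absolute-agreement ICC. It assumes the single-measure form (often written ICC(2,1) or ICC(A,1)); the text does not state whether single or average measures were used, and this is not the SPSS routine itself. The function name and the example scores are hypothetical.

```python
def icc_agreement_single(scores):
    """Two-way random effects ICC, absolute agreement, single measure.

    scores: one tuple per respondent, holding that respondent's rating
    on each occasion (here, two questionnaires).
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two respondents")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("each respondent needs the same number (>= 2) of ratings")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for respondents (rows), occasions (columns) and error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC undefined when all scores are identical")
    return (ms_rows - ms_error) / denominator


# Hypothetical 0-10 pain ratings: every respondent scores one point higher
# on the second questionnaire, so agreement is high but not perfect.
ratings = [(1, 2), (5, 6), (9, 10)]
print(round(icc_agreement_single(ratings), 3))  # 0.97
```

Because the agreement form penalizes systematic shifts between occasions, a uniform one-point increase lowers the coefficient below 1 even though the ranking of respondents is unchanged.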
Table 2. Test-retest reliability – percentage agreement and Kappa values (n = 48)
∗Options for discrete integers from 0 to 10, reliability tested using ICC.
∗∗These figures describe ranges of n, agreement and kappa for a set of items, and therefore 95% confidence intervals not shown.
All questions categorized as ‘no change expected’ or ‘change unlikely’ had over 80% agreement and a Kappa value of at least 0.6. For questions categorized as ‘some change expected’ or ‘change likely’, percentage agreement ranged from 0% to 100% and Kappa values ranged from 0.17 to 1.00.
There was 100% agreement as to whether the respondents had ever suffered from a headache and 93% agreement for headache in the last three months. The headache characteristics questions had reasonable agreement overall, ranging from 53% to 100%. There were two exceptions: 44% agreement for the number of headache days and 40% for pain rating (on a 0–10 scale).
The consultation questions had good agreement. All those who reported consultation in the first questionnaire also reported consultation in the second questionnaire. However, there were a number of responses (8 of 35, 23%) where consultation with a particular person was reported in the first questionnaire but ‘never consulted’ that person was reported in the second.
Validation
In the primary care migraine group, six of the 10 questionnaires were returned after one reminder. All 10 were returned from the secondary care group. The secondary care patients had specific diagnoses: 2 migraine with aura, 5 migraine without aura (1 of whom also had aura alone), 1 migraine without aura and tension-type headache, 1 migraine without aura and daily headache (neck problems) and 1 episodic cluster headache.
The two validation samples combined were more likely to experience a headache once a week or more (38%) than the pilot sample (28%). They were also more likely to report more than 6 days with headache in the last 3 months (50%) than the pilot sample (28%). The validation sample was more likely to report headaches lasting more than 24 h (53% without medication and 36% with) than the pilot sample (5% without medication and 6% with). The headache pain severity scores of the validation sample were higher than those of the pilot sample: median scores were 7 (range 5–10) and 5 (range 1–10), respectively. All the validation respondents and 93% of the pilot respondents reported associated symptoms with their headache. However, 88% of the validation sample reported 5 or more associated symptoms compared with 25% of the pilot sample.
Disability due to headaches using MIDAS, a previously validated disability instrument, was greater in the validation sample with 81% reporting at least moderate disability, compared with 19% of the pilot sample. Headaches in the last 3 months that resulted in lying down and resting for more than one hour were reported by 88% of the validation sample compared with 35% of the pilot sample.
All validation sample respondents stated that they had taken headache medication in the past 3 months. The validation sample was more likely to take both acute (94%) and prophylactic (51%) headache medication than the pilot sample headache sufferers (82% and 1%, respectively).
Additionally, within the pilot sample there was broad evidence of internal construct validity for frequency, pain severity and number of associated symptoms: pain scores and the percentage of respondents reporting more severe characteristics were higher in those with a MIDAS grade of at least moderate than in those with a MIDAS grade of mild.
Discussion
The pilot study response rate was good overall and better in women than men. The lower response in younger males is a common finding in postal health surveys (10, 11). Completion rates for questions were generally good, with the majority being over 90%. We can only speculate on the reasons for low completion of individual questions. In the pain scale, varying pain from different headaches may have made it difficult to select one value. Respondents who had not taken medication may have thought, when they answered ‘no’, that they had finished the question, or the listed reasons for not using medication may not have been appropriate. Respondents' most recent consultation may have occurred some time ago and may have been difficult to recall in detail, or the suggested answers may not have been appropriate. An important explanation for the low completion rates might be the small number of respondents eligible to answer the questions about medication not taken and about reasons for and expectations of consultations, which gives low precision for the estimated rates.
Internal consistency checks found very few respondents reporting inconsistent answers. The largest discrepancy was between pain reported as interfering with work and the absence of corresponding shading on the pain manikin. A possible reason for the discrepancy is that, in some subjects complaining of pain that interferes with work, the pain had not lasted for one day or more and therefore was not recorded on the manikin.
As anticipated, the validation sample reported greater frequency, duration and severity of headaches than the pilot population sample. They also reported more accompanying symptoms and higher disability. Since all but one of the validation sample were migraineurs, this would be expected: in the pilot general population sample the most common type of headache was likely to be episodic tension-type headache (1), which has been shown to be of shorter duration and less disabling than migraine (12). People with more frequent and severe headaches are also more likely to seek help for those headaches. As predicted, a group of patients attending health care for their headaches had more frequent, more severe and more disabling headaches, as judged by the questionnaire. This provides limited but important construct validation of the questionnaire.
We had expected that all of the validation sample would report having consulted their GP about headache, since the secondary care respondents would have been referred by their GP and the primary care respondents had migraine recorded in their GP records. However, one primary care respondent did not report consulting their GP about headaches. This is consistent with other work which suggests a mismatch between recorded consultation and patient recall of a practice visit (13). Possible explanations are that the respondent did not remember consulting about headaches or that the respondent had mentioned migraines but no discussion had taken place.
Headache is a symptom that people experience, and we are reliant on self-reports of its frequency, duration and pain. With no gold standard against which to measure people's experience of headache, we must rely on indirect tests of validity such as the comparison with primary and secondary care consulters. Previously validated items were confirmed as being reliable in the context of this whole headache instrument.
Almost half of the questions did not achieve good reliability. However, an individual's symptoms may vary over time, and so in two questionnaires completed one month apart some change would be expected, and poor reliability does not necessarily mean low validity. We attempted to take this into account by categorizing the questions into those where we expected stability over time and those where we envisaged actual change could have taken place. All of the questions where good reliability was not achieved were those where we expected some actual changes would have occurred after one month. For some of the items which only a proportion of responders were eligible to answer, their low frequency in the reliability study makes their kappa values less precise – this is illustrated by the 95% confidence intervals – and this would be another explanation of poor reliability.
When the prevalence of one answer is high, the kappa coefficient can be reduced even though observed agreement is high, because most of that agreement is then expected by chance. Thus the distribution of answers needs to be taken into account when interpreting results. This would apply to a number of questions from our questionnaire, such as headache in the last three months and whether medication had been used for headache.
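The effect of prevalence on kappa can be shown with two hypothetical yes/no tables that have identical observed agreement. This is an illustrative sketch only; the counts are invented for the example and are not data from the study, and the function name `agreement_and_kappa` is hypothetical.

```python
def agreement_and_kappa(both_yes, yes_then_no, no_then_yes, both_no):
    """Observed agreement and Cohen's kappa for a 2x2 test-retest table."""
    n = both_yes + yes_then_no + no_then_yes + both_no
    p_observed = (both_yes + both_no) / n
    p_yes_first = (both_yes + yes_then_no) / n
    p_yes_second = (both_yes + no_then_yes) / n
    # Agreement expected by chance from the two marginal 'yes' proportions.
    p_chance = (p_yes_first * p_yes_second
                + (1 - p_yes_first) * (1 - p_yes_second))
    return p_observed, (p_observed - p_chance) / (1 - p_chance)


# Hypothetical counts for 44 respondents; 40 of 44 agree in both tables.
balanced = agreement_and_kappa(20, 2, 2, 20)  # about half answer 'yes'
skewed = agreement_and_kappa(38, 2, 2, 2)     # about 91% answer 'yes'

print(round(balanced[0], 2), round(balanced[1], 2))  # 0.91 0.82
print(round(skewed[0], 2), round(skewed[1], 2))      # 0.91 0.45
```

With the same 91% observed agreement, kappa falls from 0.82 to 0.45 when nearly all respondents give the same answer, which is why percentage agreement and kappa are best read together.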
There were some minor inconsistencies within individual questions which could not have been expected – notably, that whilst respondents were consistent in reporting consultation, they were not consistent in the reported person consulted.
The results from all the testing led to the following alterations to the instrument:
Pain Scale was amended to categorical rating for pain (none, mild, moderate, severe and very severe);
Strategy for headache at work and at home questions had an extra instruction added for clarity: ‘please tick the one box that you do most often’;
Questions about expectations of last consultation with GP and pharmacist had extra instructions added: ‘please think about each item in turn and circle one number for each item’;
Additionally, the subparts of multi-part questions were numbered.
In summary, a questionnaire has been developed using a mixture of new and established items to measure the prevalence and impact of headache in the general population. We have considered content validity, response and completion rates, internal consistency, reliability and construct validity. Our overall conclusion is that, after minor amendments, this questionnaire will be reliable and valid for use in a large population survey, which is currently underway. The amendments arising from the testing were incorporated in the main survey.
Acknowledgements
We would like to thank the North Staffordshire GP Research Network and the City of London Migraine Clinic for their assistance with this study. The study was funded by the Department of Medicines Management and the Primary Care Sciences Research Centre, Keele University.
