Abstract
Aim
To systematically review the available literature on the diagnostic accuracy of questionnaires and measurement instruments for headaches associated with musculoskeletal symptoms.
Design
Articles were eligible for inclusion when the diagnostic accuracy (sensitivity/specificity) was established for measurement instruments for headaches associated with musculoskeletal symptoms in an adult population. The databases searched were PubMed (1966–2018), Cochrane (1898–2018) and Cinahl (1988–2018). Methodological quality was assessed with the Quality Assessment of Diagnostic Accuracy Studies tool (QUADAS-2) and COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist for criterion validity. When possible, a meta-analysis was performed. The Grading of Recommendations Assessment, Development and Evaluation (GRADE) recommendations were applied to establish the level of evidence per measurement instrument.
Results
From 3450 articles identified, 31 articles were included in this review. Eleven measurement instruments for migraine were identified, of which the ID-Migraine is recommended with a moderate level of evidence and a pooled sensitivity of 0.87 (95% CI: 0.85–0.89) and specificity of 0.75 (95% CI: 0.72–0.78). Six measurement instruments examined both migraine and tension-type headache and only the Headache Screening Questionnaire – Dutch version has a moderate level of evidence with a sensitivity of 0.69 (95% CI 0.55–0.80) and specificity of 0.90 (95% CI 0.77–0.96) for migraine, and a sensitivity of 0.36 (95% CI 0.21–0.54) and specificity of 0.86 (95% CI 0.74–0.92) for tension-type headache. For cervicogenic headache, only the cervical flexion rotation test was identified and had a very low level of evidence with a pooled sensitivity of 0.83 (95% CI 0.72–0.94) and specificity of 0.82 (95% CI 0.73–0.91).
Discussion
The current review is the first to establish an overview of the diagnostic accuracy of measurement instruments for headaches associated with musculoskeletal factors. However, as most measurement instruments were validated in one study, pooling was not always possible. Risk of bias was a serious problem for most studies, decreasing the level of evidence. More research is needed to enhance the level of evidence for existing measurement instruments for multiple headaches.
Introduction
Primary headaches like tension-type headache (TTH) and migraine are associated with various musculoskeletal factors. TTH is, for example, associated with pericranial tenderness, myofascial trigger points and lower muscle coordination of the upper neck flexors (1–4). Furthermore, migraine may be triggered by myofascial trigger points or bruxism (1,5–7). These primary headaches are not caused by musculoskeletal dysfunction but are associated with different musculoskeletal symptoms (8). There are several secondary headaches that are actually caused by musculoskeletal problems, such as cervicogenic headache (CGH), headache after whiplash trauma and secondary headache attributed to temporomandibular dysfunction (TMD) (8). The physiotherapist (PT) is a specialist in the musculoskeletal field, and often treats patients with headaches associated with musculoskeletal symptoms. The type of headache must be diagnosed within the physiotherapeutic diagnostic process to choose the proper treatment options and collaborate with medical specialists when needed (9).
The International Headache Society (IHS) published the International Classification of Headache Disorders – 3rd edition (ICHD-3), which contains clear diagnostic criteria for all types of headache (8). Several headache measurement instruments have been developed for PTs and other health care professionals to classify different headache types (10–14). The ability of a test to discriminate between people with and without the target condition is called the diagnostic accuracy of the test (15). The diagnostic accuracy is often quantified through measures of sensitivity and specificity (15). Insight into the diagnostic accuracy of these instruments for headaches associated with musculoskeletal symptoms is needed to determine the type of headache. Currently there is, to our knowledge, no overview of the diagnostic accuracy of the different headache measurement instruments related to the level of evidence. Therefore, the aim of this study was to systematically review the available literature on the diagnostic accuracy of questionnaires and measurement instruments for headaches associated with musculoskeletal symptoms.
Methods
Protocol and registration
This review was performed according to the PRISMA statement (17) and registered in PROSPERO (registration number: CRD42017062472). Because of the large number of articles retrieved by the original search strategy, two review questions were formulated. The focus of the current review is the diagnostic accuracy of measurement instruments for headaches associated with musculoskeletal symptoms. A second review (in preparation) will focus on the clinimetric properties of the instruments that measure other outcomes, based on the International Classification of Functioning, Disability and Health (16); for example, measurement instruments for pain, range of motion, limitations in activity, and quality of life.
Eligibility criteria
Only full-text original articles were included concerning the diagnostic accuracy, expressed in sensitivity and specificity, of diagnostic headache tests usable for PTs. Further inclusion criteria were: a) adult patients (≥18 years) and b) patients who experienced headaches associated with musculoskeletal symptoms. These include migraine, TTH, CGH, headache after whiplash and headache attributed to TMD (8,19,20). There was no minimum sample size for inclusion. No restrictions were put on the year of publication. Intervention studies, prediction models and measurement instruments not usable for PTs (e.g. imaging, nerve blocks) (21) were excluded. Only articles in English were included.
Information sources
The electronic databases PubMed (1966–2018), Cochrane (1898–2018) and Cinahl (1988–2018) were searched for literature. The last search was performed on 25 October 2018. If full texts could not be obtained, the corresponding author was contacted through email to request the full text.
Search
The search strategies included search terms for the construct (e.g. pain, diagnosis), the target population (e.g. migraine, TTH), the instrument (e.g. questionnaire, test) and the methodological PubMed search filter for measurement instruments (21). The search filters for the Cochrane and Cinahl databases were derivatives from the PubMed search filter. The full search strategies for each database can be found in Supplemental material 1. References of retrieved articles were screened for additional relevant studies.
Study selection
Two reviewers (HvdM, CMV) independently assessed titles, abstracts and reference lists of the studies, using the online program Covidence (22). In case of disagreement between the two reviewers, a third reviewer (CMS) made the decision regarding inclusion of the article. After initial screening of the titles and abstracts, HvdM and CMV read the full texts of included articles and screened these for eligibility. All reviewers are orofacial physiotherapists and researchers in this field.
Data collection process
Study characteristics of included articles stratified for the target populations of patients with migraine, migraine and tension-type headache and cervicogenic headache.
Not given in article, therefore calculated based on the published 2 × 2 table.
Articles included in the meta-analysis, as shown in Table 3.
MSMDQ: Michel's Standardized Migraine Diagnosis Questionnaire; –: missing data; F: female; SD: standard deviation; M: migraine; CM: chronic migraine; PM: probable migraine; TTH: tension-type headache; CTTH: chronic tension-type headache; PTTH: probable tension-type headache; n/a: not applicable.
Risk of bias in individual studies
The methodological quality of the included studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies tool (QUADAS-2) (23,24). This tool assesses the risk of bias within four domains: Patient selection, index test, reference standard, and flow and timing (24). Concerns regarding applicability were also determined for the first three domains (24). Methodological quality of studies regarding the criterion validity was assessed using the COSMIN checklist (25). Criterion validity is defined as the degree to which the scores of an instrument are an adequate reflection of a gold standard (26). Within diagnostic accuracy, criterion validity is an essential measurement property. For criterion validity, box H of the COSMIN was used (25).
Data extraction and assessment of methodological quality were performed by two reviewers independently (HvdM, CMS). HvdM was trained to use the QUADAS-2 tool and CMS was trained by the COSMIN team on quality appraisal and data extraction. The protocol for methodological assessment using the QUADAS-2 tool for this review was made available for the review authors (Supplemental material 2). The protocol for the COSMIN checklist is published elsewhere (25).
Summary measures
Sensitivity and specificity were used as measures of diagnostic accuracy.
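As an illustration of these two summary measures, the following minimal sketch derives sensitivity and specificity from a 2 × 2 table of index test against reference standard. The class name, function names and the counts are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a 2 x 2 table of index test against reference standard."""
    tp: int  # index test positive, reference standard positive
    fp: int  # index test positive, reference standard negative
    fn: int  # index test negative, reference standard positive
    tn: int  # index test negative, reference standard negative


def sensitivity(t: TwoByTwo) -> float:
    """Proportion of people with the target condition who test positive."""
    return t.tp / (t.tp + t.fn)


def specificity(t: TwoByTwo) -> float:
    """Proportion of people without the target condition who test negative."""
    return t.tn / (t.tn + t.fp)


# Hypothetical counts for illustration only.
example = TwoByTwo(tp=45, fp=10, fn=5, tn=40)
print(f"sensitivity = {sensitivity(example):.2f}")
print(f"specificity = {specificity(example):.2f}")
```

The same calculation applies when a study reports only the 2 × 2 table and not the accuracy measures themselves.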
Synthesis of results
A best evidence synthesis was performed using the GRADE recommendations for diagnostic accuracy studies with the GRADEpro online software (27). These recommendations provide a step-by-step assessment to determine the certainty of evidence of a diagnostic test, which results in a comprehensive and transparent approach for developing the recommendations for these tests. To determine the impact of the test, both the sensitivity and specificity of the test must be known as well as the prevalence of the target condition (27). Based on the prevalence of the target condition, the pre-test probability of the presence of the headache was determined for a population of 1000 people (27).
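The step from prevalence, sensitivity and specificity to expected test outcomes in 1000 people can be sketched as follows. This is a minimal illustration of the arithmetic, not the GRADEpro implementation; the function name `per_1000` is hypothetical. The input values are the pooled ID-Migraine estimates and the general-population migraine prevalence reported in this review.

```python
def per_1000(sens: float, spec: float, prevalence: float, n: int = 1000) -> dict:
    """Expected test outcomes in a population of n people.

    Counts are returned unrounded; round them only for display.
    """
    with_condition = prevalence * n
    without_condition = n - with_condition
    true_pos = sens * with_condition
    false_neg = with_condition - true_pos
    true_neg = spec * without_condition
    false_pos = without_condition - true_neg
    return {
        "true_positives": true_pos,
        "false_negatives": false_neg,
        "true_negatives": true_neg,
        "false_positives": false_pos,
    }


# Pooled ID-Migraine estimates (0.87, 0.75) and migraine prevalence of 14.7%.
outcomes = per_1000(sens=0.87, spec=0.75, prevalence=0.147)
for name, count in outcomes.items():
    print(f"{name}: {count:.0f} per 1000")
```

Presenting outcomes as natural frequencies makes the trade-off between missed cases and false alarms visible at a given prevalence.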
A pooled sensitivity and specificity was used for each measurement instrument when there were multiple studies for one measurement tool. The pooled measurements were calculated using the ‘rmeta’ package for the R statistical software (28). A bivariate model resulting in a summary estimate for sensitivity and specificity together was used, as recommended by the Cochrane Collaboration (29,30). This model takes potential threshold effects and the correlation between sensitivity and specificity into account (29,30). The pooled sensitivity and specificity were used for the GRADE recommendations. When there was only one study for a measurement instrument, the published sensitivity and specificity of that measurement instrument were used. Finally, a summary receiver operating characteristics (S-ROC) curve was created using the ‘mada’ package for the R statistical software (29,31,32).
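For orientation, a bivariate random-effects model of this kind is commonly written on the logit scale as shown below. The notation is a standard textbook form chosen for illustration and is an assumption on our part; it is not necessarily the exact parameterisation fitted by the software used in this review.

```latex
% Within study i: observed logit-transformed sensitivity and specificity
% vary around the study-specific true values with known sampling variances.
\begin{pmatrix} \operatorname{logit}(\widehat{Se}_i) \\ \operatorname{logit}(\widehat{Sp}_i) \end{pmatrix}
\sim N\!\left(
\begin{pmatrix} \theta_{A,i} \\ \theta_{B,i} \end{pmatrix},
\begin{pmatrix} s_{A,i}^{2} & 0 \\ 0 & s_{B,i}^{2} \end{pmatrix}
\right)

% Between studies: the true values follow a bivariate normal distribution,
% whose covariance captures the correlation between sensitivity and specificity.
\begin{pmatrix} \theta_{A,i} \\ \theta_{B,i} \end{pmatrix}
\sim N\!\left(
\begin{pmatrix} \mu_A \\ \mu_B \end{pmatrix},
\begin{pmatrix} \sigma_A^{2} & \sigma_{AB} \\ \sigma_{AB} & \sigma_B^{2} \end{pmatrix}
\right)

% Summary estimates are obtained by back-transformation.
Se_{\text{pooled}} = \operatorname{logit}^{-1}(\mu_A), \qquad
Sp_{\text{pooled}} = \operatorname{logit}^{-1}(\mu_B)
```

A negative covariance \(\sigma_{AB}\) is what would be expected under a threshold effect, where studies using a more lenient cut-off gain sensitivity at the cost of specificity.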
Factors determining the quality of evidence according to the GRADE approach are: a) Limitations in study design or execution (risk of bias); b) inconsistency of results; c) indirectness of evidence; d) imprecision; and e) publication bias (27). For limitations, the risk of bias assessment from the QUADAS-2 was used to determine if downgrading of the evidence was needed. When ≥50% of the assessed domains scored a “high” or “unclear” risk of bias, this was considered “serious” and the level of evidence was downgraded by one. When ≥75% of the assessed domains scored a “high” or “unclear” risk of bias, this was considered “very serious” and the level of evidence was downgraded by two. Inconsistency refers to unexplained heterogeneity of the results between multiple studies, after which the level of evidence may be downgraded. The indirectness of evidence was determined by the applicability assessment of the QUADAS-2 tool with the same rules as the risk of bias assessment. In the case where there was only one article studying a measurement tool, the evidence was downgraded for imprecision. All steps of the synthesis of results are depicted in Figure 1.
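The downgrading rules described above can be sketched as a small function. This is a simplified illustration under stated assumptions: the certainty is assumed to start at "high", the downgrade for imprecision in the single-study case is assumed to be one level, and inconsistency is passed in as a reviewer judgement. All function and parameter names are hypothetical.

```python
from typing import Sequence

GRADE_LEVELS = ["high", "moderate", "low", "very low"]


def downgrade_steps(domain_ratings: Sequence[str]) -> int:
    """Levels to downgrade based on the share of 'high' or 'unclear' domains."""
    flagged = sum(rating in ("high", "unclear") for rating in domain_ratings)
    share = flagged / len(domain_ratings)
    if share >= 0.75:
        return 2  # "very serious"
    if share >= 0.50:
        return 1  # "serious"
    return 0


def certainty_of_evidence(
    risk_of_bias: Sequence[str],
    applicability: Sequence[str],
    n_studies: int,
    inconsistency_steps: int = 0,
) -> str:
    """Combine the downgrading steps into a GRADE certainty level."""
    steps = downgrade_steps(risk_of_bias)       # limitations (risk of bias)
    steps += downgrade_steps(applicability)     # indirectness, same rules
    steps += inconsistency_steps                # reviewer judgement
    steps += 1 if n_studies == 1 else 0         # imprecision (assumed one level)
    index = min(steps, len(GRADE_LEVELS) - 1)   # cannot go below "very low"
    return GRADE_LEVELS[index]


# Hypothetical ratings: two of four risk-of-bias domains flagged, no
# applicability concerns, evidence from four studies.
level = certainty_of_evidence(
    risk_of_bias=["high", "unclear", "low", "low"],
    applicability=["low", "low", "low"],
    n_studies=4,
)
print(level)
```

Publication bias is left out of the sketch because it was not assessed in this review.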
Flow of steps after article inclusion.
Risk of bias across studies
Methods to detect publication bias are not very reliable in diagnostic accuracy studies (30). As diagnostic accuracy studies have sensitivity and specificity values as outcome measures rather than a stated null hypothesis with a p-value, it is unlikely for publication bias to be associated with statistical nonsignificance (33). Therefore, no publication bias assessment was applied in this review.
Results
Study selection
The search in all three databases resulted in 4129 articles, which were imported into Covidence (22). After removal of duplicates and assessment of eligibility based on title and abstract, 150 articles remained for full-text assessment. Of these, 52 articles were excluded based on the inclusion and exclusion criteria (Supplemental material 3) and 67 articles assessed clinimetric outcome measures other than diagnostic accuracy. These 67 articles will be included in the second review regarding clinimetric outcome measures based on the ICF. This resulted in 31 articles to be included in the current review. The complete flowchart of the study selection can be found in Figure 2. No authors were contacted to obtain the full texts of any study.
Study flow diagram.
Study characteristics
The headaches associated with musculoskeletal symptoms included in this review are migraine, TTH and CGH. No studies were found on the diagnostic accuracy of instruments for secondary headache attributed to TMD or headache attributed to whiplash injury. Table 1 shows the study characteristics of the 31 included studies, stratified by target population of the measurement instrument. Of the 31 studies, 22 articles had migraine as the target population (10–12,34–51). Seven articles had both migraine and TTH as target population (13,14,52–56), and two articles examined patients with CGH (57,58). In total, 28,246 people were included in the 31 studies. Of the included population, 64% were female, though three articles did not describe the gender distribution (38,54,55). Mean age varied from 19 (42) to 52 years (53).
For migraine, 11 different measurement instruments were studied (10–12,34–37,40–43,44–51,59). ID-Migraine was the most studied measurement instrument, with nine studies in five languages (12,34,40,44–47,49,50). Eight of these instruments were screening instruments, one was a replacement test for the diagnostic process, and for two instruments the aim of the test was unclear. Of the seven studies for both migraine and TTH, only two articles looked at the same questionnaire (13,56). Of the six instruments, one was a screening test, three were replacement tests, and the aim of two was unclear. Both studies on CGH researched the cervical flexion-rotation test (CFRT) (57,58). The aim of the CFRT compared to the ICHD-3 criteria for cervicogenic headache is unclear.
Risk of bias within studies
Methodological quality assessment with QUADAS-2 and clinimetric evaluation of the criterion validity with the COSMIN checklist Box H
MSMDQ: Michel's Standardized Migraine Diagnosis Questionnaire; MAT: Migraine Assessment Tool; MSQ: Migraine Screen Questionnaire; MA-IHS-M: Modified Algorithm for IHS Migraine; SMIQ: Structured Migraine Interview Questionnaire; CHAT: Computerized Headache Assessment Test; HSQ-DV: Headache Screening Questionnaire – Dutch Version; SAHQ: Self-Administered Headache Questionnaire; SHQ: Structured Headache Questionnaire. An extended version of this table including explanation of judgement can be found in Appendix 4.
The clinimetric evaluation of the criterion validity was established with the COSMIN Box H. One study scored excellent (14), one good (35), 21 fair (11,12,34,36–48,50–53,57) and the remaining eight scored poor (10,13,50,55–57,59). Of the studies scoring poor, all but two (54,55) also scored a high risk of bias on ≥2 domains (10,12,13,50,55,57,59).
Migraine measurement instruments
Results of individual studies
The sensitivity of the measurement instruments for migraine ranged from 0.38 (38) to 0.99 (48) (see Table 1). Only three studies had a sensitivity below 0.70 (38,41,50) and eight studies found a sensitivity of 0.90 or higher (11,39,42,44, 45,47–49). Half of these studies with a high sensitivity were researching the ID-Migraine (44,45,47,49). Specificity ranged from 0.27 (10) to 0.99 (37). Six studies found a specificity of 0.70 or lower (10,39,43,45, 47,49), and a specificity above 0.90 was found in six other studies (38,41,42,48,50,51). Eleven studies had both sensitivity and specificity above 0.70 (11,12,34, 35,40,42,44,46,48,51,59), of which two studies had both above 0.90 (42,48).
Synthesis of results
For two measurement instruments, the sensitivity and specificity could be pooled. For the 3-question Screen the pooled sensitivity was 0.73 and specificity was 0.93 (Table 3) based on two (10,41) out of three studies, due to missing data in one article (59). The pooled sensitivity for the ID-Migraine was 0.87 and specificity was 0.75 (Table 3, Figures 3(a) and 3(b)). The results were based on four studies (34,40,47,49) as the other five studies (12,44–46,50) did not have sufficient data available to perform the analyses.
(a) Summary Receiver Operating Characteristics (S-ROC) curves for pooled sensitivity and specificity of the 3-question screen; (b) S-ROC curves for pooled sensitivity and specificity of the ID-migraine; (c) S-ROC curves for pooled sensitivity and specificity of the German questionnaire for migraine; (d) S-ROC curves for pooled sensitivity and specificity of the German questionnaire for tension-type headache; (e) S-ROC curves for pooled sensitivity and specificity of the cervical flexion rotation test.
Pooled sensitivity and specificity of the 3-Question screen, ID-Migraine, German language questionnaire and Cervical Flexion-Rotation Test. N: number; CI: confidence interval; TTH: tension-type headache.
GRADE recommendations for measurement instruments for target population Migraine, stratified per measurement instrument.
Prevalence in the general population of 14.7% is used (65). CoE: certainty of evidence.
±“Unclear” or “high” risk of bias on ≥50% and <75% of the domains on QUADAS-2.
×Results based on the outcome of one single study.
95% confidence interval (CI) calculated by reviewers.
Combined migraine and TTH measurement instruments
Results of individual studies
The aim of the index tests differed between the included seven articles, where four were ‘replacement’ tests (13,54–56), one a ‘triage’ test (14) and two aims were unclear (52,53). Three articles established the diagnostic accuracy for several migraine and TTH ICHD diagnoses aside from the “standard” diagnoses, including chronic migraine, chronic TTH, probable migraine, and probable TTH (14,52,53). For migraine, the sensitivity ranged from 0.49 (53) to 1.00 (54) and the specificity ranged from 0.85 (56) to 0.96 (13). For chronic migraine, the sensitivity and specificity were 0.71 and 0.98 respectively (52). Probable migraine had a sensitivity of 0.89 and a specificity of 0.54 (14). The sensitivity for TTH ranged from 0.36 (14) to 1.00 (54) and the specificity range was 0.69 (53) to 0.98 (13). One study did not establish the specificity results from their test (54). Chronic TTH was tested in two studies, for which the sensitivity was 0.64 (53) to 0.70 (52) and the specificity 0.96 (52) to 1.00 (53). The test for probable TTH had a sensitivity of 0.92 and a specificity of 0.48 (14).
Five studies reported a sensitivity above 0.70 for migraine, chronic migraine, or probable migraine (13,14,52,54,56), and five studies did so for TTH, chronic TTH, or probable TTH (see Table 1) (13,14,52–54). All six studies that reported specificity had a specificity of 0.70 or higher for migraine, chronic migraine, and probable migraine and for TTH, chronic TTH, and probable TTH (13,14,52,53,55,56).
Synthesis of results
GRADE recommendations for measurement instruments for target populations Migraine and Tension-Type Headache, stratified per measurement instrument.
*Prevalence in the general population of 14.7% is used for migraine.
**Prevalence in the general population of 62.6% is used for TTH (65).
CoE: certainty of evidence.
±“Unclear” or “high” risk of bias on ≥50% and <75% of the domains on QUADAS-2.
×Results based on the outcome of one single study.
95% confidence interval (CI) calculated by reviewers.
Not possible to calculate 95% CI.
There was a very low level of evidence for the Computerized Headache Assessment Test (CHAT) (54), the use of Headache Questions (53) and the Structured Headache Questionnaire (52). The German Language Questionnaire (13,56) and the Self-Administered Headache Questionnaire (55) are both supported by a low level of evidence. Only the Headache Screening Questionnaire (HSQ) – Dutch Version was found to have a moderate level of evidence (14).
Cervicogenic headache measurement instruments
Results of individual studies
The two included studies for CGH established the diagnostic accuracy of the Cervical Flexion-Rotation Test (CFRT) (57,58). Both sensitivity and specificity ranged from 0.70 (57) to 0.91 (58).
Synthesis of results
GRADE recommendations for measurement instruments for target population Cervicogenic Headache.
Prevalence in the general population of 4.1% is used (76).
CoE: certainty of evidence.
¥“Unclear” or “high” risk of bias on ≥75% of the domains on QUADAS-2.
95% confidence interval (CI) calculated by reviewers.
Discussion
Within this review, for migraine alone 11 tools were identified (10–12,34–37,40–51,59), for the combination of migraine and TTH six (13,14,52–56), and for CGH one tool (57,58). The sensitivity and specificity of the measurement instruments for migraine ranged from 0.38 (38) to 0.99 (48) and 0.27 (10) to 0.99 (37) respectively. The sensitivity and specificity for migraine based on the combined measurement instruments ranged from 0.49 (53) to 1.00 (54) and 0.85 (56) to 0.96 (13) respectively. For TTH, the sensitivity and specificity ranged from 0.36 (14) to 1.00 (54) and 0.59 (53) to 0.98 (13) respectively. For the CFRT, the only measurement instrument for cervicogenic headache, both the sensitivity and specificity ranged from 0.70 (57) to 0.91 (58). All measurement tools for migraine and TTH were questionnaires. The measurement tool for CGH was a physical examination test. The diagnoses of migraine and TTH are based solely on information from the history of the patient (15), allowing the diagnosis to be derived from a questionnaire. However, the choice of gold standard within headache research is inconsistent. Some studies used the International Classification of Headache Disorders (ICHD) first, second or third edition (15,60,61), others used the diagnosis of a neurologist or a headache nurse, and for CGH the Sjaastad criteria were used (62). As the ICHD is based on the most recent scientific findings and clinical expertise from experts worldwide, the newest version of the ICHD is recommended as the gold standard (15,63).
The aim of each measurement instrument is described in Table 1. This was unclear for five measurement instruments. Nine measurement instruments are meant to be used as a screening tool in a broader population before seeing a medical specialist for a definitive diagnosis. These screening instruments are recommended for health care providers like PTs, as they are not trained for medical diagnoses but do see these patients often and can refer them to the medical specialist (64). Three measurement instruments studied were meant as a replacement test for the gold standard. This may be efficient for research purposes, as it allows researchers to diagnose patients without an extensive visit to a specialist. However, the included articles drew no conclusion as to whether the measurement instruments were better than the gold standard (the medical specialist); the involvement of a medical specialist is therefore still recommended in clinical practice.
For each measurement tool, the cut-off criteria to recognize headache should be described to allow for comparison of outcomes between studies. In reality, cut-off criteria differed between studies, which resulted in highly variable sensitivity and specificity. The lack of established cut-off points was taken into account within the ‘Index Test’ domain when assessing both methodological qualities and risk of bias.
Migraine measurement instruments
Of the 11 measurement instruments found for migraine, only three were supported by evidence from two or more articles: the 3-question screen (10,41,59), the ID-migraine (12,34,40,44–47,49,50) and the Migraine Screen Questionnaire (11,51). Several studies introduced serious patient selection bias by recruiting only patients with the headache they were interested in studying (10). As a result, no false positives or true negatives could occur, which led to more favourable diagnostic accuracy outcomes. Other studies excluded participants who had a secondary headache (45), or who did not screen positive on a preliminary screening for migraine (45,46,49). One study selected its participants so that 50% had a confirmed migraine diagnosis prior to the index test and 50% did not have migraine (11). This also introduced selection bias in favour of the outcomes, as the prevalence of the studied disorder (50% in the tested group versus 14.7% in the general population) determines the pre-test probability and thus the chance of correct diagnosis (65,66).
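The effect of prevalence on the chance that a positive test result is correct can be illustrated with Bayes' rule. The sketch below is a minimal example; the function name is hypothetical, and the sensitivity and specificity are the pooled ID-Migraine estimates from this review, used here purely for illustration and not as the values of the study discussed above.

```python
def positive_predictive_value(sens: float, spec: float, prevalence: float) -> float:
    """Probability that a person with a positive test has the condition."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Same test characteristics, two different pre-test probabilities.
for prevalence in (0.50, 0.147):
    ppv = positive_predictive_value(sens=0.87, spec=0.75, prevalence=prevalence)
    print(f"prevalence {prevalence:.1%}: post-test probability {ppv:.2f}")
```

With identical sensitivity and specificity, a sample enriched to 50% prevalence yields a much higher post-test probability than the general population would.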
Furthermore, serious bias was introduced in the “flow and timing” section of the articles, as some articles did not properly describe the order of receiving the index test and the reference standard diagnosis. Other studies did not include all participants in the analysis (11,12,34,37,38,40,42,43,48,49,59). The introduced biases on both domains resulted in a downgrade of the certainty of evidence on all measurement instruments except for the Migraine Assessment Tool (35). However, as this tool is only studied in one article, the level of evidence was also downgraded for imprecision. Therefore, there are no measurement instruments for migraine with a high level of evidence.
Combined migraine and TTH measurement instruments
Of the six measurement instruments that looked at both migraine and TTH, only the German language questionnaire is supported by two articles (13,56). However, due to a serious risk of bias and indirectness, there is only a low level of evidence for this questionnaire. In both studies, only patients with headaches that were also studied in the questionnaire were included, which introduced a serious selection bias (13,56). Similarly, the Computerized Headache Assessment Tool (CHAT) presented a sensitivity of 1.00 for both migraine and TTH, but no true negatives or false positives were available, and no specificity was presented (54). In this study, the gold standard was the diagnosis established by a headache nurse (54). As stated before, this is an unreliable gold standard for a headache diagnosis (63).
The seven articles differed in population. Some study samples were retrieved from the general population (53,55,56), others from urgent care or family practice (54), and others from a headache clinic (13,14). In one study, the sample origin was unclear (52). The prevalence used in the GRADE recommendations was for the general population, but in health care settings the prevalence is higher. This increases the pre-test probability of a positive headache diagnosis. This must be taken into consideration when interpreting the results of those studies (14,54,56).
Regarding the flow and timing of these studies, not all participants received both the index test and reference standard (52–54,56). Other studies did not include all participants in the final analyses (13,14,53,55). By excluding participants in these ways, the generalization of results is compromised. All these components resulted in very low to moderate level of evidence for the six combined migraine and TTH measurement instruments.
Cervicogenic headache measurement instruments
Both articles studying the diagnostic accuracy of the cervical flexion rotation test (CFRT) for CGH showed selection bias, as participants were selected based on headache type (57,58). In one study, the sensitivity and specificity were both 0.70 (57), whereas in the other study the sensitivity was 0.91 and the specificity 0.90 (58). In the study with lower diagnostic accuracy, the control group consisted of other headache forms (migraine or multiple headache forms) (57). This makes differentiating between headache types more difficult as other headaches are related to neck problems (5,67,68). The study with higher diagnostic accuracy compared patients with CGH with asymptomatic participants and several patients with migraine (58), which made it easier to recognize the CGH. When this test is applied in the clinic, patients will have a headache complaint and will not be asymptomatic, so the sensitivity and specificity of 0.70 will likely be more accurate.
In line with the current review, another recent systematic review of physical examination tests for screening and diagnosis of CGH determined the CFRT to be the most useful test, with the highest reliability and strongest diagnostic accuracy (69). There is, however, a debate in the literature on the reliability of manual ROM tests of the spine (70). Inter-examiner reliability for the cervical spine passive ROM ranged from poor to substantial. The manual tests of the upper cervical spine (C1/2, C2/3) have a fair to substantial level of reliability (70). The reliability of the CFRT has been established to be good to excellent (71). However, CFRT reliability was established by comparing a manual diagnosis of C1/2 dysfunction with the outcome of the CFRT (71). If the reliability of the manual diagnosis of dysfunction is only fair, then the reliability of the CFRT is questionable. However, in another study where the cervical ROM was measured with a device (CROM), a significant difference was found between the ROM in patients with CGH compared to patients with migraine and healthy subjects, which confirms the findings of the included papers of this review (57,58,72). In conclusion, the CFRT is a valid and reliable measure to recognize CGH, though the reliability is higher when using a CROM device rather than assessing the ROM manually.
Strengths and limitations of the study
The current review is, to the authors' knowledge, the first review establishing an overview of the diagnostic accuracy of measurement instruments for headaches associated with musculoskeletal symptoms. By using the QUADAS-2 and COSMIN tools, the methodological quality was assessed in a well-known and internationally accepted manner (24,25). By using the GRADE recommendations, the findings of this review are transparent and easy to translate to clinical practice (27).
There are, however, also a few limitations of this study. Comparison between index and reference test was not easy, as the validation of the index test was performed in a different population compared to the population in which the reference standard was developed. It is important to keep in mind that the diagnostic accuracy is dependent on the prevalence of the target condition in the population; the study sample needs to be taken into consideration when interpreting the results. The prevalence of the target condition is the pre-test probability of a person having that condition, and a good measurement instrument will increase the chance of recognizing the target condition correctly. However, if the study sample is biased by having a very high prevalence in the target condition whereas the measurement instrument would normally be used in a setting with a low prevalence of the target condition, the diagnostic accuracy is not valid for that specific population. Validation studies of measurement instruments should therefore always test the measurement instrument in the population and setting for which it is being validated.
Also, some measurement tools were used in different languages and cultures, which must also be considered when interpreting these results. In this review, great variability was found between the different studies, as illustrated in the S-ROC curves in Figure 3(a) and (c). These S-ROC curves show the uncertainty of the findings, so the pooled data should be used with caution. The clear gap between studies in the diagnostic accuracy of some measurement instruments shows the necessity of confirmation by multiple studies within the same population and against the same reference standard.
Implications for practice
The findings of the current review support the use of the ID-Migraine questionnaire to diagnose migraine with a moderate level of certainty (Table 4). However, patients with headaches often experience multiple headache forms (7,13,74). This warrants a measurement instrument that can diagnose more than one headache type. Of the questionnaires that examined both migraine and TTH, the HSQ has the highest level of evidence within this review (Table 5). This questionnaire is therefore recommended to establish whether migraine and/or TTH is present. As CGH needs to be confirmed by physical examination (15), the CFRT is recommended (Table 6). No other measurement instruments for secondary headache related to musculoskeletal complaints were found. Therefore, for headache types such as secondary headache attributed to temporomandibular disorders or headache attributed to whiplash injury, no recommendations can be made.
Implications for future research
Currently, there are many questionnaires for migraine and TTH, most of them validated by only one study. Future research should use the recommended measurement instruments and validate them in different samples of the same population to increase the level of certainty that the diagnostic accuracy is realistic. Researchers should use the QUADAS-2 and COSMIN tools when designing these studies to enhance their methodological quality.
Furthermore, additional clinimetric properties of measurement instruments for headache should be examined. Clinimetric properties such as reliability and responsiveness are important to enhance the care of headache complaints and monitor the course of these complaints. For that reason, the authors are conducting a complementary review to establish the clinimetric properties of measurement instruments for these symptoms and factors (Figure 2).
In conclusion, only a few measurement instruments reached a moderate level of evidence for diagnostic accuracy. For migraine, the ID-Migraine is recommended. For migraine and TTH, the HSQ is recommended, and the CFRT is advised for CGH. However, more studies are needed to validate these instruments further and to enhance the level of evidence.
Article highlights
The ID-Migraine is the measurement instrument for migraine whose diagnostic accuracy has been studied most, and it has a moderate level of certainty.
Six measurement instruments were identified for which the diagnostic accuracy was established for both migraine and tension-type headache.
The Headache Screening Questionnaire has the highest level of evidence to screen for both migraine and tension-type headache.
The Cervical Flexion Rotation Test is the only instrument for which the diagnostic accuracy for cervicogenic headache was studied, but the level of evidence is very low.
Supplemental Material
Supplemental Material1 to Supplemental Material4 - Supplemental material for The diagnostic accuracy of headache measurement instruments: A systematic review and meta-analysis focusing on headaches associated with musculoskeletal symptoms by Hedwig A van der Meer, Corine M Visscher, Tom Vredeveld, Maria WG Nijhuis van der Sanden, Raoul HH Engelbert and Caroline M Speksnijder in Cephalalgia
Footnotes
Acknowledgements
This study was funded by the Dutch Organisation for Scientific Research (Nederlandse Organisatie voor Wetenschappelijk Onderzoek – NWO) [grant number 023.006.004]. There is no conflict of interest within this study.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The Dutch Organisation for Scientific Research (Nederlandse Organisatie voor Wetenschappelijk Onderzoek – NWO) [grant number 023.006.004].
Registration
This review is registered on PROSPERO (CRD42017062472).
References