Abstract
Objective
To evaluate the quality of existing clinical practice guidelines for headache management and their main recommendations.
Background
Evidence-based clinical practice guidelines have been developed to support clinical decision-making. However, to achieve this goal, the quality of these guidelines must be ensured.
Methods
A systematic search for clinical practice guidelines for headache management was conducted in the PubMed database, in websites of known guideline developers and in websites of known headache associations. The quality appraisal was performed through the Appraisal of Guidelines for Research and Evaluation II method.
Results
Twelve guidelines were evaluated. The domains of rigor of development, applicability, and editorial independence, which most influence the overall quality of guidelines, had the lowest average scores and the highest standard deviations (61% ± 23; 37% ± 20; 53% ± 31). The main recommendations regarding medication use for acute treatment of episodic tension-type headache and migraine in adult patients consisted of paracetamol, acetylsalicylic acid, and other nonsteroidal anti-inflammatory drugs in all guidelines.
Conclusions
The statistical results indicate that the appraised guidelines have room for both individual and collective improvement. In addition, there is a well-established medication recommendation pattern among all guidelines evaluated.
Introduction
The Evidence-Based Medicine movement was introduced in the early 1990s and was later expanded to Evidence-Based Health Care, with the primary goal of ensuring that clinical decision-making is guided and supported by the best currently available evidence from scientific research studies (1–3). To achieve this goal, evidence-based clinical practice guidelines (CPGs) have been developed to provide recommendations based on a systematic search, selection and evaluation of the existing literature and to potentially improve health care decisions as well as their overall quality and outcomes (4,5). However, for CPGs to achieve their intended purpose, their general quality and, therefore, their validity and reliability must be ensured by appropriate methods and rigorous, transparent development strategies (6).
Headache disorders affect more than 3 billion people worldwide (7). Over 90% of patients who consult primary care practitioners with a headache complaint have a primary headache disorder, most commonly tension-type headache (TTH) and/or migraine (8,9). Both TTH and migraine show a peak in prevalence between ages 35 and 39 and are responsible for almost all of the burden related to headache disorders, whether through direct costs associated with health care services or indirect costs such as reduced productivity, among others (9–11). TTH and migraine treatments include acute therapy to relieve pain related to individual attacks and preventive therapy to minimize the attacks’ frequency, severity, and duration (12,13). However, most headache patients do not receive medical care and rely mainly on self-treatment with over-the-counter acute medications (14,15).
Taking into consideration the principles of Evidence-Based Health Care, the epidemiologic characteristics and relevant socioeconomic impact of primary headache disorders, and the medication use pattern among most headache patients, this article seeks to evaluate the quality of existing CPGs regarding the acute treatment of episodic TTH and migraine in adult patients and to identify the main recommendations given by those CPGs.
Methodology
Guidelines search and selection
To identify the existing CPGs, a systematic search was carried out in July 2020. The search was conducted in the PubMed database and in websites of known guideline developers, such as the National Institute for Health and Care Excellence (NICE) and the Scottish Intercollegiate Guidelines Network (SIGN). The search also included websites of known headache associations, such as the British Association for the Study of Headache (BASH) and the European Headache Federation (EHF), as well as the American (AHS), the Brazilian (BHS), the Canadian (CHS) and the International Headache Societies (IHS).
The strategy for this search combined MeSH terms with Boolean operators, which resulted in the following: “Headache”[Title] OR “Migraine”[Title] AND “Guideline”[Title/Abstract]. The inclusion criteria were guidelines published after July 2010, written in English or Portuguese and available as free full text. The exclusion criteria were guidelines that did not include acute treatment for episodic TTH or migraine in their scope, guidelines designed for pediatric care or other specific populations such as pregnant women, guidelines applied to the emergency setting or to inpatient management of headache, guidelines focused on only one kind of treatment and guidelines based on consensus instead of evidence.
Guidelines’ evaluation
To evaluate the quality of the CPGs selected, the Appraisal of Guidelines for Research and Evaluation II (AGREE II) instrument was chosen (16). This tool considers the entire CPG creation process, since it appraises 23 key items organized into six domains: scope and purpose, stakeholder involvement, rigor of development, clarity of presentation, applicability and editorial independence (17,18). Each item was scored by four independent appraisers (BMCSA, JMAV, LBPB and SRO), who had previous knowledge of the method, on a seven-point Likert scale according to how poorly or how well each feature of the CPG met the criteria established by the AGREE II user’s manual (17–19). A percentage of suitability between 0% and 100% was then obtained for each domain from the sum of the scores attributed by all appraisers and the maximum possible score (20,21).
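The domain percentage described above can be sketched as follows. This is a minimal illustration that assumes the scaled-score formula given in the AGREE II user's manual, which subtracts the minimum possible score from both the obtained and the maximum possible score; the article does not spell out the exact formula it used. The function name and the example ratings are hypothetical.

```python
def scaled_domain_score(ratings):
    """Return the scaled domain score (0-100%) for one domain of one guideline.

    ratings: one list per appraiser, each holding that appraiser's
    item scores (1 to 7) for the items of the domain.
    Assumes the AGREE II manual formula:
    (obtained - minimum possible) / (maximum possible - minimum possible).
    """
    n_appraisers = len(ratings)
    n_items = len(ratings[0])
    if any(len(r) != n_items for r in ratings):
        raise ValueError("every appraiser must score the same items")
    if any(not 1 <= s <= 7 for r in ratings for s in r):
        raise ValueError("item scores must lie between 1 and 7")

    obtained = sum(sum(r) for r in ratings)
    minimum = 1 * n_items * n_appraisers   # every item scored 1
    maximum = 7 * n_items * n_appraisers   # every item scored 7
    return 100 * (obtained - minimum) / (maximum - minimum)


# Hypothetical ratings: four appraisers scoring a three-item domain.
example = [[5, 6, 6], [6, 6, 7], [2, 4, 3], [3, 3, 2]]
print(round(scaled_domain_score(example)))  # 57
```

Because the minimum possible score is subtracted, a domain in which every appraiser gives every item a 1 scores 0% rather than about 14%.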
To determine the degree of agreement beyond chance between the appraisers, the quadratic weighted kappa statistic was calculated (22,23). Interrater reliability was analyzed with Light’s kappa, a variant of Cohen’s kappa commonly used for nominal variables. Therefore, the analysis was performed considering scores 1 and 2 as “low”, scores 3 to 5 as “intermediate” and scores 6 and 7 as “high” (23,24).
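One way to combine the two statistics named above is sketched below: item scores are first collapsed into the three categories, a quadratic weighted Cohen's kappa is computed for every pair of appraisers, and Light's kappa is taken as the mean of those pairwise values. This is an assumed reading of the procedure, not the authors' code; all function names are illustrative.

```python
from itertools import combinations


def to_category(score):
    """Map a 1-7 item score to 0 (low), 1 (intermediate) or 2 (high)."""
    if score <= 2:
        return 0
    if score <= 5:
        return 1
    return 2


def weighted_kappa(a, b, k=3):
    """Quadratic weighted Cohen's kappa for two raters using k ordered categories."""
    n = len(a)
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x][y] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    disagreement_obs = 0.0
    disagreement_exp = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2   # quadratic disagreement weight
            disagreement_obs += weight * observed[i][j]
            disagreement_exp += weight * row[i] * col[j]
    if disagreement_exp == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1 - disagreement_obs / disagreement_exp


def light_kappa(raters, k=3):
    """Light's kappa: mean of the pairwise kappas over all rater pairs."""
    pairs = list(combinations(raters, 2))
    return sum(weighted_kappa(a, b, k) for a, b in pairs) / len(pairs)


# Hypothetical item scores from three appraisers for the same five items.
raw = [[7, 5, 2, 6, 4], [6, 4, 1, 7, 3], [7, 6, 2, 5, 5]]
categorized = [[to_category(s) for s in rater] for rater in raw]
print(light_kappa(categorized))
```

With four appraisers, as in this study, the mean would be taken over six pairwise kappas.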
In addition, the arithmetic average and the median were calculated to measure the central tendency of the data (22). The standard deviation, the 95% confidence interval and the interquartile range were also calculated to establish how dispersed the data were and if the mean scores alone were misleading due to potential outliers (25).
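The measures listed above can be computed with Python's standard library, as in the sketch below. Several details are assumptions, since the article does not report them: the sample (n − 1) standard deviation, the "inclusive" quartile method, and a normal-approximation critical value of 1.96 for the confidence interval (a t critical value may be more appropriate for 12 guidelines). The input values are hypothetical.

```python
import math
import statistics


def describe(scores, critical_value=1.96):
    """Summarize one domain's scores across guidelines.

    critical_value: multiplier for the 95% confidence interval of the mean
    (1.96 is the normal approximation; pass a t value for small samples).
    """
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)                     # sample standard deviation
    half_width = critical_value * sd / math.sqrt(n)   # CI half-width of the mean
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "mean": mean,
        "ci95": (mean - half_width, mean + half_width),
        "sd": sd,
        "median": statistics.median(scores),
        "iqr": q3 - q1,
    }


# Hypothetical domain scores (%) for five guidelines.
summary = describe([10, 20, 30, 40, 50])
print(summary["mean"], summary["median"], summary["iqr"])
```

Reporting the median and interquartile range alongside the mean and standard deviation makes it easier to see when a few extreme guidelines are pulling the average.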
Results
Guidelines selected
After duplicates were accounted for, the search found 40 guidelines. Upon consideration of the exclusion criteria, as detailed in Figure 1, 12 guidelines (10,11,26–35) were selected for further evaluation, as shown in Table 1.

Figure 1. Flowchart of the search and selection of existing guidelines for the acute treatment of TTH and migraine in adult patients.
Table 1. Guidelines selected for further evaluation.
¹Year of publication or, where applicable, year of the latest update found.
Kappa coefficient
The quadratic weighted kappa statistic calculated after the first round of appraisal by AGREE II was 0.4546, which is considered moderate agreement between the appraisers (22,23). This level of agreement was deemed adequate for the further analysis and interpretation of the collected data (24).
Guidelines’ quality appraisal
The percentage of suitability calculated for each domain of each CPG, as well as their statistical measurements, is shown in Table 2.
Table 2. Percentage of suitability (%) obtained from the AGREE II quality appraisal domains.
D1: scope and purpose; D2: stakeholder involvement; D3: rigor of development; D4: clarity of presentation; D5: applicability; D6: editorial independence.
²Arithmetic average ± confidence interval.
³Standard deviation.
⁴Median.
⁵Interquartile range.
The domain with the highest average score was scope and purpose (D1), with 90% suitability. This domain had no CPGs with scores under 60%. It also had the second lowest standard deviation observed, 11%, and a median of 92%. The domain with the second highest average score was clarity of presentation (D4), with 85% suitability. All CPG scores in this domain were over 70%, and it had the lowest standard deviation observed, 9%, as well as a median of 83%. The domain with the third highest average score was stakeholder involvement (D2), with 68% suitability. This domain also had the third lowest standard deviation observed, 17%, and a median of 67%. In this domain, two CPGs received scores under 50% (G6 = 47%; G8 = 42%). These three domains had the three lowest interquartile ranges (D1 = 9.5%; D2 = 17.75%; D4 = 13.5%).
The three domains with the lowest average scores, as well as the highest standard deviations observed, were rigor of development (D3), applicability (D5) and editorial independence (D6). They had, respectively, a 61% average score with a standard deviation of 23%, a 37% average score with a standard deviation of 20% and a 53% average score with a standard deviation of 31%. These three domains also had the three highest interquartile ranges (D3 = 38%; D5 = 19.25%; D6 = 47.5%) and medians of 66% for D3, 32% for D5 and 59% for D6. Still regarding these domains, scores over 60% were achieved in D3 and D6 by seven (G2 = 84%, G3 = 79%, G6 = 67%, G7 = 92%, G9 = 77%, G10 = 83%, G12 = 65%) and six (G2 = 83%, G3 = 67%, G6 = 60%, G7 = 96%, G10 = 73%, G12 = 92%) guidelines, respectively, but just two guidelines in D5 achieved scores within that range (G7 = 77%; G10 = 65%).
The CPG with the highest scores was G7, with all scores but one (D5 = 77%) above 90%. The CPG with the lowest scores was G8, with all scores but two (D1 = 79%; D4 = 76%) under 50%. The maximum score possible (100%) was attributed to two guidelines (G7, G9) in D1 and to one guideline (G10) in D4. In addition, scores under 20% were attributed to guideline G2 (D5 = 13%) and scores under 10% were attributed to guidelines G5 (D6 = 4%) and G6 (D5 = 9%).
Guidelines’ grading systems
Grading systems provide an insight into the basis used by guidelines to offer management recommendations (36). The grading systems used by the CPGs selected are shown in Table 3.
Table 3. Rating systems used by the selected guidelines.
Grading systems were classified as unavailable when this information was intentionally omitted by the guidelines’ authors or unidentified in the document by the guidelines’ appraisers. The same approach was applied when CPGs provided an evaluation of the level of evidence associated with the recommendations but failed to provide a grading system for the recommendation itself, given that the strength of a recommendation is conceptually different from the quality of the evidence (35).
Grading systems were classified as unnamed when the existence of a grading system was acknowledged by the guideline but not adequately referenced in the document. In contrast, when a grading system was expressly mentioned and used, it was noted accordingly.
Grading systems were classified as specific when the guidelines’ authors designed their own grading system. In this case, further information about the grading criteria was added as an observation.
As a result, the grading system was classified as unavailable for four guidelines (G1, G4, G5, G12), as unnamed for one guideline (G11) and as specific for six guidelines (G2, G3, G6, G8, G9, G10). One guideline (G7) used a well-known grading system, the GRADE method.
Guidelines’ main recommendations
The guidelines’ recommendations ranged broadly from clinical assessment of headache patients and diagnostic procedures to headache management and follow-up instructions. Recommendations related to the acute treatment of episodic TTH and migraine in adult patients were addressed as main recommendations, taking into consideration this article’s focus on this topic. Recommendations for non-pharmacological treatment were not considered, because they concentrated mainly on preventive treatment. Eight main recommendations (R) were identified, as outlined in Table 4. The range of doses shown in Table 4 represents the lowest and the highest recommended doses identified across all guidelines.
Table 4. Main recommendations regarding the acute treatment of episodic TTH and migraine in adult patients.
Recommendations for medication prescription approaches (R1) were explicitly mentioned by seven guidelines (G1, G4, G6, G7, G10, G11, G12). Three guidelines (G1, G4, G12) recommended a stepped approach, in which medication is prescribed according to a treatment ladder that must be climbed from the start by all patients. Two guidelines (G6, G11) recommended a stratified approach, in which medication is prescribed according to the headache attack’s severity and associated disability experienced by each patient. Two guidelines (G7, G10) recommended that the choice of medication prescription approach should be based on patient preference.
Recommendations for patient education and advisement about medication use (R2) were made by all CPGs. All CPGs recommended patient education regarding the importance of acute medication use restriction (in general, to no more than 2 days per week) given the risk of medication overuse headache (MOH) development. Nine guidelines (G1, G4, G5, G6, G7, G9, G10, G11, G12) recommended patient education regarding early acute medication intake during the attack (in general, as soon as possible). Guidelines’ recommendations regarding patient advisement on ideal medication dose differed; however, two guidelines (G2, G6) defined the ideal medication dose as the lowest effective well-tolerated dose.
Recommendations for the acute treatment of TTH (R3) were provided by 10 guidelines (G1, G2, G3, G4, G5, G6, G8, G9, G11, G12). All of them recommended simple analgesics and nonsteroidal anti-inflammatory drugs (NSAIDs) as the first-line treatment. All CPGs except one (G8), which did not specify any NSAID, recommended acetylsalicylic acid, paracetamol and ibuprofen. Two guidelines (G3, G12) also recommended combined analgesics with caffeine as first-line treatment, while four others (G2, G5, G6, G9) recommended these medications as second-line treatment because of their higher associated risk of MOH development. Recommendations regarding metamizole conflicted, given that one CPG was in favor of its use (G6) while another (G12) was against it because of its risk of agranulocytosis. Eight guidelines (G1, G2, G4, G5, G6, G8, G9, G12) recommended against the routine use, if any, of certain medications (R4), such as triptans, opioid analgesics (including codeine), muscle relaxants, barbiturates, and botulinum toxin.
Recommendations for the acute treatment of migraine (R5) were made by 11 guidelines (G1, G3, G4, G5, G6, G7, G8, G9, G10, G11, G12). All of them considered simple analgesics, NSAIDs and triptans as first-line treatment. As with the TTH recommendations, all CPGs except one (G8) recommended acetylsalicylic acid, paracetamol and ibuprofen; however, paracetamol was deemed less effective than other medications for migraine. Therefore, it was advised only as a second-line treatment for mild to moderate headache attacks in patients with a contraindication to NSAID use. All CPGs except one (G8), which did not cite any triptan medication, recommended sumatriptan. Three guidelines (G3, G5, G6) recommended a combined medication of acetylsalicylic acid, paracetamol, and caffeine. Eight guidelines (G4, G6, G7, G8, G9, G10, G11, G12) recommended against the routine use, if any, of certain medications (R6), such as opioid analgesics (including codeine), barbiturates and ergot alkaloids. However, dihydroergotamine, in intranasal and subcutaneous formulations, was recommended by three guidelines (G5, G7, G9) as a last resort for patients who failed to achieve the intended therapeutic outcome with other medications.
Recommendations on management of the common migraine symptoms nausea and vomiting (R7) were provided by 10 guidelines (G1, G4, G5, G6, G7, G8, G9, G10, G11, G12). All recommended the use of antiemetics as adjunct therapy. Metoclopramide was mentioned by all CPGs, prochlorperazine was mentioned by seven guidelines (G1, G5, G6, G7, G8, G10, G11) and domperidone was mentioned by seven guidelines (G1, G4, G6, G7, G9, G11, G12). The most frequently recommended dose for these medications was 10 mg. Recommendations on the use of non-oral formulations, when possible, were provided by four guidelines (G5, G7, G8, G12).
Recommendations on medication prescription for pregnant women (R8) were given by eight guidelines (G1, G4, G5, G6, G7, G8, G9, G10). Overall, medication use should be avoided as much as possible. However, if needed, these CPGs recommended paracetamol, since it is deemed the safest medication available.
Discussion
Guidelines’ quality appraisal
As previously mentioned, CPGs can only suitably achieve their intended purpose if their quality is assured (6). It is unclear how different domain scores should be weighted to determine whether a guideline should be classified as of low or high quality, since the AGREE II instrument does not provide a distinct cut-off point (37,38). An online survey has shown that a third of AGREE II’s users apply a cut-off point ranging from 50% to 83%, although most did not clarify how their chosen parameter was generated (39).
Among the studies that established a cut-off point, the rigor of development domain (D3) was often chosen as one of the strongest indicators of guideline quality (if not the only one), since it evaluates whether guideline development minimized bias and was evidence based (39,40). Alongside D3, the applicability (D5) and editorial independence (D6) domains have been found to be the other two domains that strongly influence an overall assessment of guideline quality, since they appraise guidelines’ facilitators and barriers for proper implementation, as well as transparency about potential bias of guidelines’ authors through disclosure of conflicts of interest (40,41).
In this article, the three domains that most influence guideline quality were the ones with the lowest average scores and the highest standard deviations, as well as the highest interquartile ranges. This finding indicates that the most critical aspects of guideline quality have room for individual improvement, which would translate into higher average scores, and for collective improvement, which would translate into lower standard deviations and lower interquartile ranges, producing, consequently, a more cohesive result.
If a 50% cut-off point in D3 were the only criterion considered, seven guidelines (G2 = 84%; G3 = 79%; G6 = 67%; G7 = 92%; G9 = 77%; G10 = 83%; G12 = 65%) evaluated by this article would be deemed of high quality. If a 50% cut-off point in D3 as well as in any two other domains were the criteria considered, the same seven guidelines would still be considered of high quality.
However, if a 50% cut-off point in D3, D5 and D6 were the criteria considered, just two guidelines (G7 = 92%, 77%, 96%; G10 = 83%, 65%, 73%) evaluated by this article would be deemed of high quality, both of which are guidelines specific to the pharmacological treatment of migraine. The higher quality found in this specific field might be due to the fact that most headache studies focus on migraine and evidence-based health care focuses greatly on drugs and devices (42,43).
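The effect of the chosen criterion can be illustrated with the four guidelines for which the text reports D3, D5 and D6 scores together (G2, G6, G7 and G10). The sketch below is only an illustration: the function name is hypothetical, and treating a score equal to the cut-off as passing is an assumption, since the article does not say whether the threshold is inclusive.

```python
# D3, D5 and D6 scores (%) as reported in the Results section.
SCORES = {
    "G2": {"D3": 84, "D5": 13, "D6": 83},
    "G6": {"D3": 67, "D5": 9, "D6": 60},
    "G7": {"D3": 92, "D5": 77, "D6": 96},
    "G10": {"D3": 83, "D5": 65, "D6": 73},
}


def meets_cutoff(scores, domains, cutoff=50):
    """Return the guidelines whose score reaches the cut-off in every listed domain."""
    return [
        guideline
        for guideline, by_domain in scores.items()
        if all(by_domain[d] >= cutoff for d in domains)
    ]


print(meets_cutoff(SCORES, ["D3"]))              # ['G2', 'G6', 'G7', 'G10']
print(meets_cutoff(SCORES, ["D3", "D5", "D6"]))  # ['G7', 'G10']
```

G2 and G6 pass on rigor of development alone but fail once applicability is required, which is the pattern that separates the seven-guideline and two-guideline results.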
The guideline with the highest average score (G7) described its development process through a summary structured in line with the 23 key items of the AGREE II instrument, which may explain the result obtained. This finding aligns with the AGREE II instrument’s proposal not only to assess CPGs’ quality, but also to provide a methodologic strategy for guideline development and to recommend which information should be presented by guidelines to ensure that more transparent, valid and reliable CPGs are produced (44).
Guidelines’ grading systems and main recommendations
As previously mentioned, grading systems provide an insight into the basis used by guidelines to offer management recommendations (36). However, the grading exercise is undermined if there is an abundance of grading systems in use, since the employment of different grading systems by multiple organizations hinders effective communication and favors confusion (36,45). None of the guidelines appraised in this article used the same grading system. This finding suggests that the disagreement over which grading system should be adopted as the gold standard, which raises concern over the validity of any grading system, remains unresolved (46).
In contrast to the wide variation in the guidelines’ grading systems, the guidelines’ main recommendations presented a well-established pattern for the acute treatment of episodic TTH and migraine in adult patients, since each of the eight main recommendations was made by more than half of the selected CPGs (R1 = 58%, R2 = 100%, R3 = 83%, R4 = 67%, R5 = 92%, R6 = 67%, R7 = 83%, R8 = 67%). The only conflicting recommendation concerned metamizole, which is most likely due to the different national regulations on its use (47,48).
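The percentages above follow directly from the number of guidelines reported as making each recommendation in the Results section, out of the 12 guidelines appraised. A short check, with illustrative variable and function names:

```python
TOTAL_GUIDELINES = 12

# Number of guidelines making each main recommendation, as reported in the Results.
GUIDELINES_PER_RECOMMENDATION = {
    "R1": 7, "R2": 12, "R3": 10, "R4": 8,
    "R5": 11, "R6": 8, "R7": 10, "R8": 8,
}


def coverage(counts, total):
    """Percentage of guidelines making each recommendation, rounded to whole percent."""
    return {rec: round(100 * n / total) for rec, n in counts.items()}


percentages = coverage(GUIDELINES_PER_RECOMMENDATION, TOTAL_GUIDELINES)
print(percentages)
# Every recommendation was made by more than half of the guidelines.
print(all(p > 50 for p in percentages.values()))  # True
```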
Article limitations
This article has three main limitations. First, its inclusion criteria regarding language and text access and its choice of database for the CPG search meant that CPGs written in languages other than English and Portuguese, CPGs with restricted access, and CPGs not indexed in the selected databases were beyond this article’s reach. Second, the AGREE II user’s manual points out that some of the information required for CPGs’ quality appraisal might not be included in the CPG itself but recorded in a different document (16). To standardize the research, as well as to evaluate whether guidelines’ support documents, if they exist, are easily accessible, a methodological decision was made to include in the AGREE II quality appraisal only additional information from support documents that could be retrieved either through the online systematic search performed or from their corresponding guideline. Therefore, it is possible that CPGs’ development information, although existing, was not accessible to the appraisers, which directly affects the scores given to each domain of each CPG. Third, the lack of a clear cut-off point for CPGs’ quality appraisal (37,38) hampers the analysis and conclusion about which CPGs are deemed of low or high quality.
Conclusion
This article concludes that the lack of a clear cut-off point for CPGs’ quality appraisal by the AGREE II instrument hampers the analysis and conclusion about which CPGs are deemed of low or high quality. However, when taking into consideration the average scores achieved by the three domains that most influence the CPGs’ overall quality (rigor of development, applicability, and editorial independence), it is clear that the appraised CPGs have room for both individual and collective improvement. In addition, this article identified that the acute treatment of episodic TTH and migraine in adult patients has a well-established pattern across all CPGs evaluated.
Article highlights
Twelve guidelines regarding headache management from Canada, the USA, the UK and the European Union were evaluated through the AGREE II quality appraisal method. The domains of rigor of development, applicability, and editorial independence, which most influence the overall quality of guidelines, had the lowest average scores and the highest standard deviations as well as the highest interquartile ranges. The main recommendations regarding medication use for acute treatment of episodic tension-type headache and migraine in adult patients consisted of paracetamol, acetylsalicylic acid, and other non-steroidal anti-inflammatory drugs in all guidelines. Although there is a well-established medication recommendation pattern among all guidelines evaluated, the appraised guidelines have room for both individual and collective improvement.
Footnotes
Acknowledgments
Laís Bié Pinto Bandeira and Sara Rodrigues Oliveira contributed to the development of this research. Gabriela Maria de Albuquerque Vaz contributed to the revision of this article.
Data availability statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
