Abstract
Background
Depression is frequently described to occur in migraine, and depression screening questionnaires are commonly used to evaluate depressive symptoms in patients with migraine. The present study aimed to investigate how the most common depression screening tools are used in migraine studies to determine whether they are applied and interpreted correctly.
Methods
PubMed was systematically searched, and we included any study using the Beck Depression Inventory (BDI), Patient Health Questionnaire-9 (PHQ-9), Hospital Anxiety Depression Scale (HADS) or Hamilton Depression Rating Scale (HAM-D). We included studies of adults diagnosed with migraine based on the International Classification of Headache Disorders (ICHD-2 or ICHD-3).
Results
The literature search generated 78 studies. Thirty-five (45%) of the included studies used a depression screening tool as evidence of depression. This applied to 53, 46, 47 and 13% of studies using PHQ, BDI, HADS and HAM-D, respectively. Only one study out of 35 confirmed the diagnosis with a diagnostic interview. The data presentation and interpretation across the studies were highly heterogeneous.
Conclusions
Screening tools as evidence of depression in patients with migraine may lead to inaccurate estimates of depression among migraine patients. There is a need for guidelines on and validation of depression screening tools in patients with migraine.
Introduction
Migraine is a primary headache disorder that affects more than one billion people worldwide. The one-year prevalence rate is around 10% overall, in the range 4.5–6.0% in men and 14.5–18.0% in women (1–3). It is a disorder that affects quality of life and is strongly associated with several comorbidities, including psychiatric diseases such as anxiety and depression (3–6). The association between migraine and depression has previously been investigated, and it has been reported that patients with migraine are two to four times more likely to develop lifetime major depressive disorder (7,8).
Often, in studies describing the comorbidity between migraine and depression, depression screening questionnaires are used to assess the presence and severity of depressive symptoms in patients with migraine. However, in some cases, the questionnaires are used to establish the presence of depression, so that the percentage of patients scoring above a certain cut-off point is described as having genuine depression (9,10). This can lead to erroneous conclusions and an overestimation of the prevalence of depression among migraine patients, particularly when the questionnaires are used in patient groups for whom the screening tools have not been validated. Depression screening questionnaires are effective tools to roughly screen for depression; however, correct diagnosis of depression should be conducted as described in national and international guidelines by a psychiatrist or general practitioner, and not based on a single questionnaire (11,12). One meta-research review (9) found that the prevalence of depression was 31% based on screening or rating tools, 22% for combinations and 17% for diagnostic interviews. Thus, a two-stage estimation method that combines screening questionnaires and diagnostic interviews can reduce resource requirements and generate valid prevalence estimates for depression in migraine patients (10).
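To make the two-stage idea concrete, the following minimal Python sketch assumes one possible design: every participant is screened, and random samples of the screen-positive and screen-negative groups then receive a diagnostic interview. Prevalence is estimated by weighting the interview-confirmed rates by the screening fractions. The function name, its arguments and the counts in the usage lines are hypothetical and are not taken from reference (10).

```python
def two_stage_prevalence(n_screened, n_screen_pos,
                         n_pos_interviewed, n_pos_confirmed,
                         n_neg_interviewed, n_neg_confirmed):
    """Estimate prevalence from a screening stage plus diagnostic interviews.

    Assumption (illustrative only): interviewed participants are random
    samples of the screen-positive and screen-negative groups.
    """
    if n_screened <= 0 or not 0 <= n_screen_pos <= n_screened:
        raise ValueError("invalid screening counts")
    if n_pos_interviewed <= 0 or n_neg_interviewed <= 0:
        raise ValueError("both groups need at least one interview")

    frac_pos = n_screen_pos / n_screened            # share screening positive
    rate_pos = n_pos_confirmed / n_pos_interviewed  # confirmed among positives
    rate_neg = n_neg_confirmed / n_neg_interviewed  # missed cases among negatives
    return frac_pos * rate_pos + (1 - frac_pos) * rate_neg


# Hypothetical counts: 300 of 1000 screen positive, half are confirmed,
# and 5 of 100 interviewed screen-negatives are also diagnosed.
estimate = two_stage_prevalence(1000, 300, 300, 150, 100, 5)
print(f"screen-positive share: {300 / 1000:.1%}, two-stage estimate: {estimate:.1%}")
```

In this made-up example the share of participants above the cut-off is noticeably higher than the interview-weighted estimate, which is the kind of gap the two-stage approach is meant to correct.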
No research has been conducted to systematically investigate the misuse or misinterpretation of depression screening questionnaires, although related findings have been reported by Levis et al. (9), who found that studies using screening tools or rating scales instead of diagnostic interviews did not disclose this in abstracts and described the prevalence as being for “depression” or “depressive disorders” even though disorders were not assessed. In the present study, we aimed to investigate how the most common depression screening and assessment tools are used in migraine studies, to show whether they are applied and interpreted correctly. Specifically, we investigated whether the use of screening scores led to the conclusion of patients having depression or depression symptoms. We systematically reviewed the Beck Depression Inventory (BDI), Patient Health Questionnaire-9 (PHQ-9), Hospital Anxiety Depression Scale (HADS) and Hamilton Depression Rating Scale (HAM-D) (13). Furthermore, we compiled the studies that had validated the different screening tools in patients with migraine.
Methods
The screening tools
Patient Health Questionnaire-9 (PHQ-9)
The PHQ is a self-administered version of the PRIME-MD (Primary Care Evaluation of Mental Disorders) diagnostic instrument for common mental disorders. The PHQ-9 is the depression module, which scores each of the nine American Psychiatric Association's Diagnostic and Statistical Manual of Mental Disorders Fourth Edition (DSM-IV) criteria as “0” (not at all) to “3” (nearly every day). A PHQ-9 score ≥10 has a sensitivity of 75–88% and a specificity of 85–90% for major depression (14–16). However, one systematic review (15) of 42 studies found that the overall sensitivity of PHQ-9 ranged from 37 to 98%, specificity from 42 to 99%, positive predictive value (PPV) from 9 to 92% and negative predictive value (NPV) from 80 to 100% (17). PHQ-9 scores of 5, 10, 15 and 20 represent the lower cut-off points for mild, moderate, moderately severe and severe depression, respectively. Other than being a reliable and valid measure of depression severity, some studies claim that the PHQ-9 can be used to make criteria-based diagnoses of depressive disorders. However, one study (18) found that PHQ-9 ≥10 substantially overestimates depression prevalence.
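The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration, assuming item responses have already been coded 0 to 3; the function names and the label used for totals below 5 are our own and not part of the instrument. The bands describe symptom severity on the questionnaire and are not a diagnosis.

```python
def phq9_total(items):
    """Sum nine PHQ-9 item scores, each 0 (not at all) to 3 (nearly every day)."""
    if len(items) != 9 or any(score not in (0, 1, 2, 3) for score in items):
        raise ValueError("PHQ-9 needs nine item scores in the range 0-3")
    return sum(items)


def phq9_severity(total):
    """Map a PHQ-9 total (0-27) to the severity bands starting at 5, 10, 15 and 20."""
    if not 0 <= total <= 27:
        raise ValueError("PHQ-9 total must be between 0 and 27")
    if total >= 20:
        return "severe"
    if total >= 15:
        return "moderately severe"
    if total >= 10:
        return "moderate"
    if total >= 5:
        return "mild"
    return "below mild threshold"  # label chosen for this sketch


responses = [1, 1, 2, 0, 1, 2, 1, 1, 1]  # hypothetical answers
total = phq9_total(responses)
print(total, phq9_severity(total))
```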
One clinical-based study (19) of 132 patients with migraine has validated the PHQ-9 in migraine patients. At a cut-off score of seven, the PHQ-9 had a sensitivity of 79.5%, a specificity of 81.7%, a PPV of 64.6% and an NPV of 90.5%. The area under the curve (AUC) was 0.88. Another study (20) of 300 consecutive migraine patients assessing the performance of PHQ-9 and HADS in patients with migraine found that, at a cut-off score of 10, the PHQ-9 demonstrated 82.0% sensitivity and 79.9% specificity with a PPV of 56.9% and an NPV of 93.2%. The AUC was 0.89. Sensitivity was reduced with higher cut-off scores, whereas specificity was increased. It was found that in patients with migraine, a cut-off score of 14 for the PHQ-9 has the highest accuracy (sensitivity of 60.0% and a specificity of 93.5%).
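Because PPV and NPV depend on how common depression is in the sample, the same sensitivity and specificity can yield quite different predictive values. The sketch below applies Bayes' rule to the sensitivity and specificity reported above for a PHQ-9 cut-off of 10 in migraine patients; the prevalence values in the loop are hypothetical inputs chosen for illustration, not figures from the cited studies.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence via Bayes' rule."""
    for name, value in (("sensitivity", sensitivity), ("specificity", specificity)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1")
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must be strictly between 0 and 1")

    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    if true_pos + false_pos == 0 or true_neg + false_neg == 0:
        raise ValueError("predictive values are undefined for these inputs")

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity 82.0% and specificity 79.9% (PHQ-9 cut-off 10, from the text);
# the prevalence values are assumptions.
for assumed_prevalence in (0.10, 0.25, 0.40):
    ppv, npv = predictive_values(0.820, 0.799, assumed_prevalence)
    print(f"prevalence {assumed_prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

The lower the prevalence, the larger the share of positive screens that are false positives, which is one reason a score above the cut-off cannot be read as a diagnosis.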
Hospital Anxiety Depression Scale (HADS)
HADS is a self-assessment scale for detecting states of depression and anxiety, as well as measuring the severity of the emotional disorder (21,22). It consists of 14 items, seven of which measure anxiety (HADS-A) and seven of which measure depression (HADS-D) (23). Each item is scored on a scale of 0–3, resulting in an overall score of 0–21 for both HADS-A and HADS-D. The cut-off value of the anxiety and depression subscales on HADS varies from study to study; however, a cut-off point of eight on both scales is commonly used (10). None of the items include somatic symptoms common to both physical and mental disorders, such as insomnia, loss of appetite or fatigue, which is useful for migraine patients as a means of preventing overestimates of psychological distress as a result of mistaking migraine-related physical symptoms for somatic symptoms secondary to anxiety or depression. A systematic review and individual participant data meta-analysis(23) found that the sensitivity and specificity of HADS-D were 74% (68–79%) and 84% (81–87%) for a cut-off score of eight or higher. One clinical-based study(20) of 300 consecutive migraine patients assessing the performance of the PHQ-9 and HADS in patients with migraine found that at a cut-off point of eight, the HADS demonstrated 86.5% sensitivity and specificity, with a PPV of 68.2% and NPV of 95.0%. The AUC was 0.92. It was determined that for the HADS-D a cut-off score of 11 or greater was the most accurate for patient classification (sensitivity of 59.6% and specificity of 98.1%). HADS-D had a better performance than the PHQ-9. Another clinical study(24) of 62 headache patients found that a total HADS score of 10 or greater was the optimal cut-off point for depressive disorders (sensitivity 85.7% and specificity 33.3%).
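The subscale structure described above can be illustrated with a short Python sketch. It assumes the seven anxiety and seven depression item responses have already been converted to 0 to 3 scores and are passed in separately; the function names and the returned dictionary keys are hypothetical. The default cut-off of eight is the commonly used value mentioned in the text, and a flag means only that the subscale score is at or above the cut-off.

```python
def hads_subscale(items):
    """Sum seven HADS item scores (each 0-3) into a subscale score of 0-21."""
    if len(items) != 7 or any(score not in (0, 1, 2, 3) for score in items):
        raise ValueError("each HADS subscale needs seven item scores in the range 0-3")
    return sum(items)


def hads_screen(anxiety_items, depression_items, cutoff=8):
    """Score HADS-A and HADS-D and flag subscales at or above the cut-off."""
    anxiety = hads_subscale(anxiety_items)
    depression = hads_subscale(depression_items)
    return {
        "HADS-A": anxiety,
        "HADS-D": depression,
        "A_at_or_above_cutoff": anxiety >= cutoff,
        "D_at_or_above_cutoff": depression >= cutoff,
    }


# Hypothetical responses for one participant.
print(hads_screen([1, 1, 1, 1, 1, 1, 1], [2, 1, 1, 1, 1, 1, 1]))
```

Passing `cutoff=11` would apply the stricter HADS-D threshold that the migraine validation study found most accurate.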
Beck Depression Inventory (BDI)
The BDI is a self-report instrument intended to assess the existence and severity of symptoms of depression as listed in the DSM-IV (25). It consists of 21 questions, each scored on a scale of 0–3. Although both the first edition of BDI and BDI-II use a 21-item questionnaire, the cut-off values for depression are different. On BDI, a score of 0–9 indicates minimal depression, 10–18 indicates mild depression, 19–29 indicates moderate depression and ≥30 indicates severe depression, whereas, on BDI-II, a score of 0–13 indicates minimal depression, 14–19 indicates mild depression, 20–28 indicates moderate depression and ≥29 indicates severe depression (26). The BDI is not meant for establishing a diagnosis of major depressive disorder, but it has been validated for identifying the presence and severity of symptoms and screening for probable cases of major depression (27). There are as of yet no studies validating the BDI in patients with migraine.
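Because the two editions use different bands, the same total can fall into different categories depending on which edition's cut-offs are applied. The following Python sketch encodes the ranges listed above; the dictionary and function names are our own, and the upper bound of 63 follows from 21 items scored 0 to 3.

```python
# Score bands as listed in the text: (lowest score, highest score, label).
BDI_BANDS = {
    "BDI": [(0, 9, "minimal"), (10, 18, "mild"), (19, 29, "moderate"), (30, 63, "severe")],
    "BDI-II": [(0, 13, "minimal"), (14, 19, "mild"), (20, 28, "moderate"), (29, 63, "severe")],
}


def bdi_band(total, edition="BDI"):
    """Return the severity band for an integer total score under the given edition."""
    if edition not in BDI_BANDS:
        raise ValueError("edition must be 'BDI' or 'BDI-II'")
    for low, high, label in BDI_BANDS[edition]:
        if low <= total <= high:
            return label
    raise ValueError("total must be an integer between 0 and 63")


# The same totals classified under both editions.
for score in (12, 19, 29):
    print(score, bdi_band(score, "BDI"), bdi_band(score, "BDI-II"))
```

A study that reports only band proportions therefore needs to state which edition and which cut-offs were used before its figures can be compared with others.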
Hamilton Depression Rating Scale (HAM-D)
The HAM-D is used to quantify the severity of symptoms of depression in patients already diagnosed with a depressive disorder (28,29). It is designed to be administered by a trained clinician and contains 17 items rated on either a three- or five-point scale, with the sum of all items making up the total score. A score of 0–7 indicates no depression, 8–16 indicates mild depression, 17–23 indicates moderate depression, and ≥24 indicates severe depression (30). Concern about the reliability and validity of the HAM-D has been raised, as well as concern about the methodology of much research using the scale to assess the severity of psychiatric disorders (31–33). It continues to be used in clinical trials to assess clinical changes during treatment, and less in clinical practice (34). There are as of yet no studies validating the HAM-D in patients with migraine.
Search strategy
PubMed was systematically searched by one investigator (JA) and last updated on 9 April 2023. We searched for studies that used one of the four depression screening methods for assessing the presence and severity of depression or depressive symptoms. One investigator (JA) screened the titles and abstracts, and two investigators (JA and FA) read the full texts of the studies. The PRISMA flowchart was filled out (Figure 1).

Figure 1. PRISMA flow chart.
The main key words used were migraine disorders, Beck Depression Inventory, Patient Health Questionnaire, Hospital Anxiety Depression Scale and Hamilton Rating Scale for Depression. The complete search string can be seen in Figure 2.

Figure 2. Full search string.
Inclusion criteria were (i) patients diagnosed with migraine according to the International Classification of Headache Disorders (ICHD-2(35) and ICHD-3(36)) or the Silberstein–Lipton criteria(37); (ii) any study evaluating depression with the use of one of the four depression-screening tools; and (iii) patients ≥18 years old.
Episodic migraine was defined as headache occurring 0–14 days/month. Chronic migraine was defined as headache occurring on 15 or more days/month for more than three months, which, on at least eight days/month, has the features of migraine headache. Probable migraine was defined as migraine-like attacks missing one of the features required to fulfil all criteria for a type or subtype of migraine coded above, and not fulfilling criteria for another headache disorder.
Exclusion criteria were (i) other headache diagnoses such as tension type headache, vestibular migraine, post-traumatic headache and medication overuse headache; (ii) migraine patients with comorbid diseases such as traumatic brain injury and restless legs syndrome, except for obese/overweight patients; (iii) not including number or proportion of patients with a certain score, or the mean score from the screening tool; and (iv) study not available in English language.
We extracted information on the data collection method, headache diagnoses (episodic or chronic migraine), proportion of patients with a score above or below certain cut-off points if available, the mean scores if available and the authors’ interpretation of the score (whether the authors concluded that the patients had depression or simply depressive symptoms). Depressive symptoms are symptoms of depression that do not necessarily fulfill the criteria for major depression mentioned in the diagnostic systems. We considered the following three terms to be erroneous conclusions if the authors did not specify that the diagnosis was made using the ICD-10 or DSM criteria: “depression”, “level of depression” and “depressive disorders”.
If the scores were not used to conclude either, we noted it as “not discussed”. All information was added to four separate tables for each one of the screening methods, and totals were calculated if applicable. The results were summarised in a separate table. If a study compared the scores from the depression screening questionnaires with the results from a diagnostic interview for depression, this was noted in the tables, and we did not include them in the calculation of the number of studies that used the questionnaires as evidence of depression. The majority of the studies did not provide screening scores or prevalence rates for patients with migraine with and without aura. Therefore, we did not collect such data.
In studies where the depression screening scores were used to compare two or more different patient groups with each other, only the scores of migraine patients were included. If scores before and after a treatment were specified, only baseline scores were included.
The mean scores of migraine patients who were divided into subgroups were used to calculate the mean of means.
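The text does not state whether the subgroup means were weighted by subgroup size, so the sketch below shows both options. It is an illustration under that stated uncertainty rather than a description of the calculation actually used; the function name and the subgroup numbers are hypothetical.

```python
def mean_of_means(subgroups, weighted=True):
    """Combine subgroup means into a single mean.

    subgroups: list of (n_participants, mean_score) pairs.
    weighted=True weights each mean by its subgroup size (assumption);
    weighted=False takes the plain average of the subgroup means.
    """
    if not subgroups:
        raise ValueError("at least one subgroup is required")
    if weighted:
        total_n = sum(n for n, _ in subgroups)
        if total_n <= 0:
            raise ValueError("total number of participants must be positive")
        return sum(n * mean for n, mean in subgroups) / total_n
    return sum(mean for _, mean in subgroups) / len(subgroups)


# Hypothetical study with two subgroups of unequal size.
groups = [(40, 6.0), (10, 11.0)]
print(mean_of_means(groups, weighted=True), mean_of_means(groups, weighted=False))
```

When subgroup sizes differ, the two results diverge, so the choice between them affects the pooled score.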
To make it easier to summarize all four tools in the course of the text, we have chosen to mention them under the collective term “screening tools”, although we recognize that HAM-D is not a screening tool as described earlier.
Results
The literature search generated 279 articles of which 78 were included. All articles before 2017 used the ICHD-2 to classify migraine (36 articles). Articles written after 2018 used ICHD-3 to classify migraine (32 articles). Nine articles used the International Headache Society guidelines to classify migraine. One article (38) used the Silberstein–Lipton criteria.
The numbers of eligible studies using the PHQ, HADS, BDI and HAM-D were 15, 19, 37 and eight, respectively. One study (91) reported data on both the HAM-D and HADS and has therefore been included in both tables. The total number of participants was 34,005, of whom 78.5% were women. Fourteen percent and 45% of participants had chronic and episodic migraine, respectively. The remainder had unknown migraine type/probable migraine. See Table 5 for a summary of the results.
The studies used the following terms when interpreting the scores: depression, level of depression, suggestive of depression, depressive disorders, depressive/depression symptoms, symptom score of depression, depression scale scores, psychological distress and psychological rating scores. In total, 45% of the studies concluded that the participants had depression if they scored above a set cut-off point. The cut-off points in the studies using the PHQ-9 were <10, 10–14 and >15, indicating mild, moderate and severe depression. One study (39) used only one cut-off point of >10 to diagnose depression. The cut-off points in the studies using the BDI were 10–18, 19–29 and 30–63. Three studies (40–42) used other cut-off points, which were 14–19, 20–28 and 29–63. The cut-off point in the studies using the HADS was ≥8, except for one study (43) that used a cut-off point of ≥10. The cut-off points in the studies using the HAM-D were 8–17, 18–24 and ≥25.
For the studies using the PHQ-9 (Table 1), the total number of participants was 17,717 of whom 17% had chronic migraine, 70% had episodic migraine and 13% had unknown migraine type or probable migraine. Nine studies (39,44–51) reported the mean scores and one study (52) reported the median score. The total mean scores of the PHQ-9 were 4.9 ± 5.3, 7.6 ± 5.6 and 5.8 ± 5.5 for participants with episodic migraine, chronic migraine and unknown migraine type, respectively. Nine studies (39,44–46,53–57) (60%) reported the proportion of participants that scored within a certain range. Eight studies (39,46,51–56) (53%) using the PHQ-9 concluded that participants scoring above a certain cut-off point had depression.
Table 1. Study characteristics for the included studies using the Patient Health Questionnaire-9 (PHQ-9).
CM = chronic migraine; EM = episodic migraine; NA = not applicable.
†PHQ-9 score ≥10.
^PHQ-8.
*Chronic migraine.
**Episodic migraine.
“Probable migraine/migraine type not specified.
°Migraine diagnosed according to Silberstein–Lipton criteria.
Median score.
For the studies using the BDI (Table 2), the total number of participants was 5276 of whom 21% had chronic migraine, 39% had episodic migraine and 40% had unknown migraine type/probable migraine. Thirty-five (38,40–42,58–89) (95%) studies reported the mean scores, whereas one study (42) reported the median score. The total mean scores on the BDI were 10.4 ± 9.5, 16.9 ± 10.3 and 9.8 ± 8.6 for participants with episodic migraine, chronic migraine and unknown migraine type, respectively. Eight studies (40–42,58,72,74,89,90) (22%) reported the proportion of participants that scored within a certain range. Seventeen studies (42,58–60,63,64,68,70,72,74,75,77,78,81,86–88) (46%) using the BDI concluded that participants scoring above a certain cut-off point had depression.
Table 2. Study characteristics for the included studies using the Beck Depression Inventory (BDI).
CM = chronic migraine; EM = episodic migraine; NA = not applicable.
“Migraine type not specified/ probable migraine.
†1–4 migraine days.
‡5–7 migraine days.
^≥8 migraine days.
*BDI score 14–19.
**BDI score 20–28.
***BDI score 29–63.
For the studies using the HADS (Table 3), the total number of participants was 10,802 of whom 7% had chronic migraine, 7% had episodic migraine and 86% had unknown migraine type. Fourteen studies (43,47,91–102) reported the mean scores. Two studies (102,103) reported the median scores. The total mean scores on the HADS were 5.2 ± 3.9, 5.4 ± 4 and 7.1 ± 4.4 for participants with episodic migraine, chronic migraine and unknown migraine type, respectively. Seven studies (18,92,93,96,104–106) (37%) reported the proportion of participants that scored eight or above. Nine studies (43,93–97,105–107) (47%) using the HADS concluded that participants scoring eight or above had depression.
Table 3. Study characteristics for the included studies using the Hospital Anxiety Depression Scale (HADS).
CM = chronic migraine; EM = episodic migraine; NA = not applicable.
**Chronic migraine.
*Episodic migraine.
“Migraine type not specified/probable migraine.
†HADS-D score ≥10.
Median score.
For the studies using the HAM-D (Table 4), the total number of patients was 252 of whom 15% had chronic migraine, 13% had episodic migraine and 72% had unknown migraine type or probable migraine. Six studies (91,108–112) (75%) reported the mean scores. The total mean scores on the HAM-D were 9.4 ± 2.6, 10.6 ± 3.0 and 6.9 ± 5.7 for participants with episodic migraine, chronic migraine and unknown migraine type, respectively. Two studies (113,114) (25%) reported the proportion of participants that scored within a certain range. Only one study (113) (1/8) using the HAM-D concluded that participants scoring ten or above had depression.
Table 4. Study characteristics for the included studies using the Hamilton Depression Rating Scale (HAM-D).
CM = chronic migraine; EM = episodic migraine; NA = not applicable; PM = probable migraine.
“Migraine type not specified.
*HAMD score > 7.
Table 5. Summary table.
BDI = Beck Depression Inventory; CM = chronic migraine; EM = episodic migraine; HADS = Hospital Anxiety Depression Scale; HAM-D = Hamilton Depression Rating Scale; PHQ = Patient Health Questionnaire-9.
Discussion
In this review, we found that, out of all included studies using the depression screening tools in migraine patients, 35 (45%) of them used the tools to diagnose depression. For the studies using the PHQ-9, BDI or the HADS, almost half of them used the screening tools as evidence of depression. Only one of the eight studies using the HAM-D used the scores as evidence of depression.
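The headline percentages can be recomputed from the per-tool counts given in the Results. The short Python check below uses those counts; the helper name is our own, and it rounds halves up because Python's built-in `round` would turn 12.5 into 12 rather than 13. The denominator of 78 is the number of unique studies, since one study appears in both the HADS and HAM-D tables.

```python
import math


def percent_half_up(count, total):
    """Percentage rounded to the nearest integer, with halves rounded up."""
    return math.floor(100 * count / total + 0.5)


# (studies using the tool as evidence of depression, studies using the tool)
used_as_evidence = {"PHQ-9": (8, 15), "BDI": (17, 37), "HADS": (9, 19), "HAM-D": (1, 8)}

per_tool = {tool: percent_half_up(k, n) for tool, (k, n) in used_as_evidence.items()}
total_flagged = sum(k for k, _ in used_as_evidence.values())
overall = percent_half_up(total_flagged, 78)

print(per_tool, total_flagged, overall)
```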
Questionnaires as diagnostic tools
We found 35 studies that used the questionnaires as evidence of depression. This was especially the case in the studies using depression screening tools that have been validated in patients with migraine, such as the PHQ-9 and HADS (19,20,24). This may be a result of some studies suggesting that these screening tools may be used for diagnosing depression (14,15). Most of the included studies using the PHQ-9 refer to those studies. For example, Kroenke et al. (15) write that their analysis of the full range of PHQ-9 scores complements rather than supersedes the PHQ-9 algorithm for establishing categorical diagnoses. However, this may not be the case for patients in whom the screening tool has not been validated. Furthermore, Maurer et al. (115) write that, if the screening for possible depression is positive when using the PHQ-9, the diagnosis should be confirmed using the DSM-IV criteria, thus recommending that the screening tool does not stand alone in diagnosing depression. The same is concluded in the meta-analysis by Levis et al. (18) comparing the PHQ-9 ≥10 prevalence with the SCID (Structured Clinical Interview for DSM-IV) major depression prevalence in 44 primary studies that administered the PHQ-9 and SCID. They state that estimates of depression prevalence should be based on validated diagnostic interviews. The same principle can be applied to the rest of the depression screening tools.
Of the 78 included studies, only one (91) compared the scores from the depression screening questionnaires with the results from a diagnostic interview for depression, such as one based on the DSM-IV criteria. This method of determining depression is relevant if a study intends to measure the prevalence of depression because a screening instrument cannot stand on its own. Because 45% of the studies used the screening tools to conclude that the patients had depression, these studies need to compare the total screening scores and their subscale scores against a diagnostic clinical interview to determine the true prevalence of depression. Doing this may help reduce the rate of false negatives and positives, which can have important clinical implications for whether patients are correctly assessed and referred to the appropriate resources. Furthermore, it will prevent authors from determining the prevalence of depression from the screening scores alone. However, this can be more costly and time-consuming, and would require more resources, which may not be accessible.
Data presentation and interpretation
Presentation of data in the different studies was very heterogeneous, making it challenging to collect data systematically. Forty-two studies (38–41,43,45–47,51–53,55,56,60–62,68–70,72,73,75–78,82–84,86,89,92,94,95,99,101,103,104,107–109,113) reported the proportion of patients with chronic or episodic migraine, whereas the remainder did not specify migraine type. Fifty-nine studies reported the mean score of the screening and four studies reported the median score, whereas 26 studies reported the proportion of patients that scored within a certain subscale. Only 14 studies (39–45,50,58,62,74,92,93,96) reported both. Moreover, varying cut-off points were used in a few of the included studies, although most of them used the same cut-off points. As an example, Moon et al. (40) and Llop et al. (41) used score ranges corresponding to the BDI-II, which was different from the rest of the studies using the BDI. Thus, careful and circumspect interpretation of the screening scores, as well as clarity and thoroughness in reporting data, is important.
Furthermore, among the studies concluding that the patients had depression based on their screening score, not all of them explicitly wrote that the patients had depression. Rather this may have been implied or it may not have been emphasized that the score reflected symptoms of depression. For example, phrases such as “the percentage of patients meeting the criteria for anxiety and depression, were” (43) and “… in cases without depression and mild to severe depression” (58) were used in the included studies. This makes the results difficult to interpret for the reader, which can lead to erroneous conclusions, since reported prevalence may exceed actual prevalence. However, some authors who have diagnosed depression through a screening tool have acknowledged that this method is not comparable to a diagnosis according to the ICD or DSM criteria (59).
Selecting the appropriate screening instrument
The choice of screening instrument is crucial, because the psychometric properties of the scale used may affect the outcome and effectiveness of a screening program. If a screening tool has not been validated in the proper context, the tool may have limited utility and interpretability, which can lead to misleading estimates of the prevalence of depression in patients with migraine. Acknowledging and controlling for specific disorders will improve the ability of the screening tool to recognize true psychiatric illness, as opposed to reflecting symptoms of the comorbidities that overlap with symptoms of the psychiatric illness (116). Furthermore, assessing the validity and reliability of a screening instrument is important to reveal which patient groups it performs better in, as well as whether different cut-off points may be needed for subsets of patients. This can lead to more accurate screening results (20).
Although a depression screening tool has been validated for the general population, one must still critically evaluate the reliability of the questionnaires when used for patients with headache. One study (117) determining the concordance between the SCID and the PHQ in 50 women found that the PHQ failed to identify 35% of the women who met SCID criteria for at least one major depressive disorder. Moreover, the reported optimal cut-off points, as well as sensitivities and specificities of the different screening instruments, varied between the different studies (19,20,24). This means that the reported severity of depression is sometimes inconsistent depending on the chosen screening instrument, as well as the chosen cut-off score (13). A systematic review (118) investigating the accuracy of HADS in cancer patients found that varying thresholds/cut-off points were used across the included studies, yielding varying prevalence rates of depression and anxiety. Because the cut-off point affects the number of false positives and false negatives, the choice of a cut-off point has both clinical and economic consequences for healthcare. Therefore, when choosing a screening instrument for a study, it should be performed with careful consideration of the scope and limitations of the respective instrument.
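The trade-off between false positives and false negatives at different cut-off points can be made concrete with a small Python sketch. It uses the sensitivity and specificity reported earlier for PHQ-9 cut-offs of 10 and 14 in migraine patients; the prevalence and cohort size are hypothetical, and the function and variable names are our own.

```python
# (sensitivity, specificity) for PHQ-9 cut-offs in migraine patients, from the text.
PHQ9_MIGRAINE = {10: (0.820, 0.799), 14: (0.600, 0.935)}


def compare_cutoffs(prevalence, n=1000):
    """Expected screening outcomes per n patients at an assumed true prevalence."""
    if not 0 <= prevalence <= 1:
        raise ValueError("prevalence must be between 0 and 1")
    rows = {}
    for cutoff, (sens, spec) in PHQ9_MIGRAINE.items():
        true_pos = sens * prevalence * n
        false_pos = (1 - spec) * (1 - prevalence) * n
        false_neg = (1 - sens) * prevalence * n
        rows[cutoff] = {
            "screen_positive": true_pos + false_pos,
            "false_positives": false_pos,
            "false_negatives": false_neg,
        }
    return rows


# Assumed true prevalence of 20% in a cohort of 1000 patients.
for cutoff, row in compare_cutoffs(0.20).items():
    print(cutoff, {key: round(value, 1) for key, value in row.items()})
```

Under these assumptions the lower cut-off flags many more patients than truly have depression, whereas the higher cut-off misses more true cases, which is why the share of patients above a cut-off should not be reported as prevalence.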
The study population
Ninety-five percent of the studies included in this review are clinical-based, meaning that most of the patients being assessed are migraine patients attending tertiary headache clinics. These patients may have more complex medical histories than the general migraine population, and might consequently have a higher prevalence of depression. Studies comparing headache clinic patients and controls found that the headache patients who attended a clinic had a higher disease burden, scoring higher on hypochondriasis, depression, psychasthenia and social introversion scales than controls, as well as rating their “most severe headache” as more intense than that of the controls (119,120). Thus, more studies that investigate the prevalence of depression in the general migraine population are needed to give a more accurate reflection of the prevalence. Likewise, studies comparing patients with chronic and episodic migraine found that disability and disease burden, as well as depression, are higher amongst patients with higher headache frequency and intensity (53,121,122). For this reason, it is essential to divide patients into subgroups based on their headache frequency or intensity, to provide a more accurate prevalence of depression in these groups.
The scores among the studies using the same screening tool ranged widely. These differences may be a result of varying proportions of patients with chronic and episodic migraine, as well as some studies not further distinguishing between the migraine types or headache frequencies. The studies that have a higher proportion of patients with chronic migraine may report higher mean scores or a higher proportion of patients with scores in the moderate to severe ranges, and vice versa. For the studies that did not distinguish between these subgroups of participants, it is impossible to know what exact factors have influenced the scores. Instead, these studies may have divided participants into subgroups of migraine patients with and without aura, which we did not take into account in the present study.
Another factor that may affect the differences in scores across the studies is the proportion of females and males included in the study. Females have a higher headache-related disability than males (123), which can lead to higher mean scores on the depression screening tool in studies with a higher proportion of females, especially in the studies that had only included women. As an example, all participants were pregnant women in the studies by Orta et al. (44) and Friedman et al. (54).
Limitations
Only one investigator (JA) screened the titles and abstracts, and the only search engine used to conduct the systematic literature search was PubMed. Thus, it is likely that some eligible studies have not been included. Likewise, not all retrieved studies were accessible in full text and thus had to be excluded. Furthermore, many studies did not report mean scores or the proportion of participants that scored within each severity range, giving us a smaller data set to work with.
Moreover, not all the included studies disclosed the proportion of patients who had medication overuse and thus a likely medication overuse headache (MOH). As a result, although we excluded participants who had MOH/medication overuse, some might have still been included. De Silva et al. (124) reported that these patients have an almost three-fold higher Migraine Disability Assessment (MIDAS)-score than patients with episodic migraine, meaning that they have a higher headache related disability than headache patients without MOH. This may cause them to score higher on the depression screening tool, thus further contributing to inconsistencies in the results across the different studies.
Another limitation is that the validation process of the depression tools can be questioned because there is no clear “gold standard” or biomarker/test. However, it is a fundamental condition in psychiatric research that there is no ideal gold standard or biomarker to diagnose mental illness (125). The same problem is also seen in other clinical specialties, which have similar issues with inter-rater reliability. Even the assessment of liver size using ultrasound leads to discrepancies, which, however, does not prevent hepatologists from conducting valuable research. In depression research, ideally, headache studies investigating comorbid depression should apply the so-called LEAD (i.e. Longitudinal Expert Assessment of All Data) as an index of validity in these studies (126).
Lastly, we did not assess the quality or reliability of the included studies. The scope of this review was to investigate the basis for assigning depression as a comorbidity in migraine studies. Therefore, we did not specifically apply quality assessment tools such as the Newcastle–Ottawa Scale.
Conclusions
In this review, we found that, out of 78 eligible studies using a depression screening tool in migraine patients, 45% used the screening tool as evidence of depression. The screening instruments should not be used as a substitute for a validated diagnostic interview performed by a psychiatrist to diagnose depression. Instead, a two-stage estimation method that combines screening questionnaires and diagnostic interviews can reduce resource requirements and generate valid prevalence estimates for depression in migraine patients. Depression screening questionnaires should only be used to assess the presence and severity of symptoms of depression. Future studies should rely on assessment tools validated for the specific population, and authors should be critical when evaluating the scores. More studies are needed to further validate commonly used depression screening tools in patients with migraine, such as the PHQ-9, HADS, HAM-D and BDI. Furthermore, guidelines for the use of depression screening tools in patients with migraine are needed.
Clinical implications
Almost half of the eligible studies used the screening tool as evidence of depression, whereas only one study confirmed the diagnosis with a diagnostic interview. This may have resulted in erroneous estimates of the prevalence of depression among patients with migraine. Correct diagnosis of depression should be conducted as described in national and international guidelines. More studies are needed to further validate commonly used depression screening tools in patients with migraine, such as the PHQ-9, HADS, HAM-D and BDI. Guidelines for the use of depression screening tools in patients with migraine are needed.
Footnotes
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The authors received no financial support for the research, authorship and/or publication of this article.
