Abstract
Objectives
To critically appraise the quality of the studies underpinning the Global Burden of Disease (GBD) 2017 estimates for Major Depressive Disorder (MDD) with respect to i) the GBD 2017 inclusion criteria and ii) population coverage.
Design
Systematic critical appraisal.
Setting
Not applicable.
Participants
Not applicable.
Main outcome measures
Each study was critically appraised with respect to the four GBD 2017 inclusion criteria: representativeness, study method and sample, diagnostic criteria and publication from 1980 onwards. Population coverage was calculated.
Results
Fewer than half of studies (221/467, 47.3%) were nationally representative. Only 262/467 (56.1%) of studies reported specifically on MDD, and more than a third did not use DSM or ICD diagnostic criteria: 94/467 (20.1%) did not specify any diagnostic criteria and 68/467 (14.6%) relied on self-reported depression for diagnosis. Only 62/467 (13.3%) of studies were conducted during the period 2011-2017. Only 107/195 (54.9%) of countries had one or more prevalence studies.
Conclusions
GBD 2017 estimates for MDD are based on incomplete country and population coverage. The inclusion of studies with non-representative populations and of studies that do not use diagnostic criteria, together with the lack of specific data on MDD, reduces the reliability of the estimates and limits their value for policy making.
Keywords
Background
Global Burden of Disease (GBD) studies, undertaken by the Institute for Health Metrics and Evaluation (IHME) in collaboration with the World Health Organization (WHO), are intended to provide objective and comparable measures of population health, trends in the health status of populations and changes in disease burden over time. Such measures, the authors state, are critical for tracking progress towards Sustainable Development Goal 3 (“ensure healthy lives and promote well-being for all at all ages”) and for informing priority-setting and planning at national and international levels (1). The Gates Foundation, the largest funder of the IHME, uses the GBD data to inform its investment portfolio (2). Governments including China, Chile, Costa Rica, Ethiopia, Brazil, Indonesia, India, Japan, Nepal, Nigeria, Norway, New Zealand, Pakistan, Poland and the United Kingdom have used GBD estimates in national planning (3).
The GBD project uses the disability-adjusted life year (DALY) metric to quantify disease burden. The DALY is a composite measure combining years of life lost (YLL) from premature death and years lived with disability (YLD). Critiques of GBD methodology include: the biomedical focus of the DALY; the devaluing of aspects of disease and healthcare that cannot be quantified, including local contexts (4, 5); and the use of fixed disability weights in the DALY, which may underestimate the burden associated with morbidity in disadvantaged populations and overestimate the burden in advantaged populations (6). Another key concern is that the development and adoption of indicators are driven by donors and their programmatic priorities rather than by the need to strengthen and invest in local health systems and information systems (7).
Disease burden metrics, particularly for depression, have played a crucial role in raising governmental awareness of mental health and in securing its inclusion within the United Nations Sustainable Development Goals (8). However, the reliability and validity of GBD estimates of depression for 2000 have been called into question due to the lack of reliable data and nationally representative studies in many countries (9). GBD estimates for MDD published in 2017 expanded the number of data sources and changed the inclusion criteria (1). They also introduced methodological changes such as estimating incidence from other parameters (not stated) rather than relying on raw incidence data, and introducing upward or downward adjustments to address between-study variability in the raw prevalence data (1)(Supplementary appendix one, p493-494). As a result, depressive disorders (Major Depressive Disorder (MDD) and dysthymia) were identified as the third leading cause of YLD, estimated to be the cause of 5% (43 million/853 million) of all YLD globally.
This study aims to critically appraise the quality of the epidemiological studies underpinning the GBD 2017 estimates for MDD with respect to i) the four GBD 2017 inclusion criteria and ii) the population coverage of the studies by country and WHO region.
Methods
Data sources
Studies underpinning the GBD 2017 estimates for MDD were identified using the GBD 2017 data input sources tool (10) and retrieved via internet sources, libraries and personal communications with the authors of the studies. Of the 431 studies underpinning the 2017 GBD estimate for MDD, 400 (92.8%) were retrieved and analysed (Table 1). We could not access 20 studies and 11 studies were not available in English.
Table 1. Number of studies included in GBD 2017 and number retrieved by WHO region.
* Multi-country studies were disaggregated into country-level studies and assigned to WHO region.
AFRO: WHO African Region, EMRO: WHO Eastern Mediterranean Region, EURO: WHO European Region, PAHO: Pan American Health Organisation, SEARO: WHO South-East Asia Region, WPRO: WHO Western Pacific Region.
GBD 2017 includes “multi-country studies” covering more than one WHO region. Such studies used standardised methods and presented their results for all different countries in a single publication. Country level samples used in multi-country studies were disaggregated from the overall sample and analysed separately by country to give a total of 467 country-level studies.
Study type
Table 2 shows the source of studies underpinning the GBD 2017 estimates for MDD. The majority (341/431, 73.0%) were scientific literature.
Table 2. Source of studies underpinning the GBD 2017 estimates for MDD.
WHO World Health Survey: standardised surveys jointly owned by countries and the WHO. WHO World Mental Health Survey: standardised surveys led by the WHO International Consortium in Psychiatric Epidemiology. National Survey: surveys collecting data for a national population.
Analysis
We critically appraised the quality of the epidemiological studies underpinning the GBD 2017 estimates for MDD with respect to the four GBD 2017 inclusion criteria:
For representativeness, the GBD 2017 methods exclude inpatient or pharmacological treatment samples, case studies, and veterans or refugee samples. In the absence of further details on how the GBD authors determined whether a study was representative, we used generally accepted criteria for representativeness (11): representative of the population in terms of age, gender, geographic location and sociodemographic group; use of a probability sampling method, i.e. a sampling method that involves randomly selecting a sample from the population; and an adequate sample size.
A study had to include both males and females to be considered representative. Studies of a narrow age group (e.g. 10-20 years old) were deemed representative if they were representative of that population in other respects. Studies based on limited geographic areas within a country or on specific sociodemographic groups (e.g. “white collar workers”, “white adults” and “university students”) were not considered generally representative. An “adequate sample size” was defined as a sample size over 1000, as this was a criterion for inclusion in GBD studies prior to GBD 2015.
Reasons for individual studies not deemed representative were recorded, categorised and compared by region.
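The representativeness rules described above can be sketched as a simple rule-based check. This is an illustrative sketch only and not the appraisal procedure used in this study, which relied on reviewer judgement of each paper; the `Study` fields, the reason labels and the function `non_representative_reasons` are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import List

MIN_SAMPLE_SIZE = 1000  # "adequate sample size" means a sample size over 1000


@dataclass
class Study:
    # All field names are hypothetical, chosen for illustration only.
    includes_males: bool
    includes_females: bool
    national_geographic_scope: bool        # False if limited to part of a country
    specific_sociodemographic_group: bool  # e.g. "university students"
    probability_sampling: bool             # random selection from the population
    sample_size: int
    narrow_age_group: bool = False         # recorded, but not a reason for failure


def non_representative_reasons(study: Study) -> List[str]:
    """Return the reasons a study fails the criteria (empty list = representative)."""
    reasons = []
    if not (study.includes_males and study.includes_females):
        reasons.append("single gender")
    if not study.national_geographic_scope:
        reasons.append("limited geographic area")
    if study.specific_sociodemographic_group:
        reasons.append("specific sociodemographic group")
    if not study.probability_sampling:
        reasons.append("no probability sampling")
    if study.sample_size <= MIN_SAMPLE_SIZE:
        reasons.append("sample size not over 1000")
    # A narrow age group alone does not make a study non-representative,
    # provided the study is representative of that age group in other respects.
    return reasons
```

Returning a list of reasons rather than a single yes/no value mirrors the fact that reasons were recorded and categorised, and that one study can fail on several criteria at once.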
For study method and sample, the GBD 2017 methods do not specify exactly what information studies must provide to be deemed “sufficient”. For each study we extracted information on: age group; gender; geographic location; sampling method; sample size; response rate; and diagnostic process (i.e. format of data collection, diagnostic instrument and diagnostic criteria used).
For diagnostic criteria, the criteria used to establish “caseness” (i.e. MDD) and the reported outcome (i.e. diagnosis and severity where measured) were extracted from each study and compared by region.
For publication from 1980 onwards, we checked the date of publication and also extracted data on the time period of data collection of each study, which were compared by region.
In addition to the four GBD criteria, we calculated population coverage in the GBD 2017 estimates for MDD. Coverage per 100,000 population was calculated for each country, for each WHO region and in total as follows:
Coverage per 100,000 population = (total number of respondents used to calculate the GBD 2017 MDD estimate / total population in 2017) × 100,000.
For each country, the number of people in each study sample was extracted and combined to give the total number of respondents per country. Where a study was based on multiple samples drawn at different time points these samples were separated out and counted separately. For example, an Australian study by Goldney et al. used three separate samples from 1998, 2004 and 2008 (12). Where data from the same sample population were used in multiple studies, the sample was only counted once. For example, the same data from the Japanese World Mental Health Survey were published once as a single country study and again in a multi-country study.
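The counting rules and the coverage formula above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the analysis: the `Sample` record, its `sample_id` field (a hypothetical identifier for the underlying sample population) and both function names are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class Sample:
    country: str
    sample_id: str    # hypothetical identifier of the underlying sample population
    year: int         # time point at which the sample was drawn
    respondents: int


def respondents_by_country(samples: Iterable[Sample]) -> Dict[str, int]:
    """Sum respondents per country, counting each distinct sample only once.

    Samples drawn at different time points have different keys and are counted
    separately; the same sample reported in several publications shares a key
    and is counted once.
    """
    seen: Set[Tuple[str, str, int]] = set()
    totals: Dict[str, int] = {}
    for s in samples:
        key = (s.country, s.sample_id, s.year)
        if key in seen:
            continue
        seen.add(key)
        totals[s.country] = totals.get(s.country, 0) + s.respondents
    return totals


def coverage_per_100k(respondents: int, population: int) -> float:
    """Coverage per 100,000 population = respondents / population * 100,000."""
    if population <= 0:
        raise ValueError("population must be positive")
    return respondents / population * 100_000
```

With invented numbers, two samples of 3000 respondents drawn in different years from a country of 1,000,000 people give `coverage_per_100k(6000, 1_000_000)`, i.e. 600 per 100,000; a third record that repeats one of those samples would not change the total.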
Results
GBD inclusion criteria
The findings for each of the four inclusion criteria are summarised in Table 3 and described below.
Table 3. Regional analysis of studies meeting GBD inclusion criteria.
*Criteria for representativeness were: Representative of the population in terms of gender, geographic location and sociodemographic group; Usage of a statistical sampling method; Sample size >1000. **Yes: Number of studies including information. No: Number of studies not including information. Criteria where information was missing from studies are highlighted in bold.
Of the 467 studies analysed, 221 (47.3%) were nationally representative (Table 3). The number and percentage of representative studies per region ranged from 31/89 (34.8%) in WPRO region to 87/160 (54.4%) in EURO region. The most common reason for not being representative was that the study was conducted in a limited geographic area. However, most studies described as ‘not representative’ met multiple criteria for non-representativeness. The 29 studies that only included a narrow age group were all representative of this population in all other respects (not shown in table).
Of 467 studies, 10 did not report on the age group and 57 did not report response rates (Table 3). All other studies (400/467; 85.6%) provided sufficient information to assess the quality of the study according to the criteria described in the methods (Table 3).
a) Diagnostic criteria
Although use of Diagnostic and Statistical Manual (DSM) or International Classification of Diseases (ICD) criteria is a criterion for inclusion in GBD 2017, more than a third of studies did not meet this requirement: 94/467 (20.1%) studies did not specify any diagnostic criteria and 68/467 (14.6%) studies relied on self-reported depression for diagnosis. All WHO World Health Surveys used self-reported diagnoses of depression. One study in EURO region used alternative diagnostic criteria, the Research Diagnostic Criteria (13)(Table 3).
Table 6 (Appendix 1) shows the 56 different instruments used to assess for depression: these included questionnaire-based depression screening instruments, structured, semi-structured and unstructured interviews, symptom checklists, generic psychopathology assessment tools, and behavioural questionnaires. Many studies either gave vague descriptions (e.g. “clinical diagnosis based on present state examinations”), or stated “self-reported” or did not specify how a depression diagnosis was made.
b) Reported outcome
Table 3 shows the reported outcome of studies underpinning the GBD 2017 estimate for MDD by region. Of 467 assessed studies, just over half, 262/467 (56.1%), reported prevalence data for MDD or a major depressive episode. A third, 155/467 (33.2%), used ‘depression’ as their reported outcome, with no details on the severity of the disorder, and 22/467 (4.7%) used ‘depressive symptoms’. The remaining three studies used reported outcomes of psychosocial problems, depression and/or anxiety, and emotional disorder (Table 3).
All studies underpinning the GBD 2017 estimate for MDD were published in or after 1980. Just under two thirds of studies finished their data collection in the period 2001-2010 (284/467, 60.8%) and only 62/467 (13.3%) finished data collection in the period 2011-2017. Notably, two studies in the PAHO region finished their data collection in 1961-1970 (Table 3).
Population coverage in the GBD 2017 estimates for MDD and by WHO region
Of the 195 countries included in the GBD 2017 estimate for MDD (1), 107 (54.9%) had one or more epidemiological studies (Table 4). The total coverage per 100,000 population was 1099.3 overall.
Table 4. Number of countries with one or more studies and population coverage in the GBD 2017 estimate for MDD by region.
Population data were from the World Bank Open Data (14). *Population data for Eritrea were from 2011 as this was the most recent available. **Data for all countries except North Korea.
The number and percentage of countries covered by one or more studies varied by region, from 15/36 (41.7%) in PAHO and 11/26 (42.3%) in WPRO to 13/22 (59.1%) in EMRO and 36/52 (69.2%) in EURO. EURO and EMRO thus had the highest proportion of countries covered, and PAHO and WPRO the lowest.
Population coverage by region shows that SEARO and EMRO had the lowest coverage with 6.3 and 10.6 per 100,000 population, respectively, while EURO (795.2) and PAHO (7617.1) had the highest coverage.
Country studies by WHO region
The population coverage by country varied (Appendix 2) and coverage in some regions was primarily driven by a single country.
In the PAHO region, 51/101 (50.5%) studies were undertaken in the United States of America (USA) (Table 5). While the population of the USA makes up approximately a third (33.7%, 325.1 million/966.3 million) of the region’s population, respondents from the USA represented 90.6% (66.7 million/73.6 million) of PAHO region respondents.
Table 5. Population coverage in the GBD 2017 estimate for PAHO and WPRO regions comparing the USA and Vietnam to all other PAHO and WPRO region countries respectively.
In the WPRO region, the Vietnamese population makes up 5.0% (94.6 million/1902.6 million) of the region’s population. However, respondents from Vietnam accounted for 73.6% (2.6 million/3.6 million) of all respondents in the WPRO region. A single large national health survey study, the Vietnam Burden of Disease and Injury Study 2008, accounted for the majority (2.6 million) of these respondents (15).
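The imbalance described for the USA and Vietnam can be illustrated with a short calculation from the figures quoted in the text. This is a sketch only: the function name `share_and_coverage` and its dictionary keys are hypothetical, and because the inputs are the rounded values given above (in millions), the results may differ slightly from the reported percentages and from the values in Table 5.

```python
def share_and_coverage(country_pop_m, region_pop_m, country_resp_m, region_resp_m):
    """Compare one country with the rest of its region (all inputs in millions)."""
    rest_pop_m = region_pop_m - country_pop_m
    rest_resp_m = region_resp_m - country_resp_m
    return {
        # Country's share of the regional population and of regional respondents
        "population_share_pct": 100 * country_pop_m / region_pop_m,
        "respondent_share_pct": 100 * country_resp_m / region_resp_m,
        # Coverage per 100,000 population for the country and for all other countries
        "country_coverage_per_100k": 100_000 * country_resp_m / country_pop_m,
        "rest_coverage_per_100k": 100_000 * rest_resp_m / rest_pop_m,
    }


# Rounded figures taken from the text (millions of people / respondents)
usa = share_and_coverage(325.1, 966.3, 66.7, 73.6)
vietnam = share_and_coverage(94.6, 1902.6, 2.6, 3.6)

for name, result in (("USA", usa), ("Vietnam", vietnam)):
    print(name, {key: round(value, 1) for key, value in result.items()})
```

In both cases the coverage per 100,000 population in the named country is far higher than in the remaining countries of the region combined, which is the sense in which a single country drives the regional figure.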
Discussion
Since the publication of GBD 2000, there has been a significant improvement in country coverage of studies of MDD; whereas only 40/195 countries were included in GBD 2000, over half of all countries 107/195 (54.9%) were included in GBD 2017 (9).
Despite this improvement, major gaps in coverage persist: the SEARO region has population coverage of 6.3/100,000 compared with 7617.1/100,000 in the PAHO region. Moreover, regional estimates are distorted by overrepresentation of individual countries, with the USA and Vietnam driving regional estimates for PAHO and WPRO respectively. The estimates from the Vietnam Burden of Disease and Injury Study 2008, which contributed the majority of the Vietnamese respondents to GBD 2017, were mainly based on surveys of varying quality. The authors reported major inconsistencies and implausible age-specific estimates for mental disorders between surveys conducted in 2006 and 2008 (15).
Of concern is that while the number of studies included in GBD estimates increased 10-fold, from 42 in 2000 to 431 in 2017, fewer than half of individual country studies (221/467, 47.3%) included in GBD 2017 were representative for age, gender, geographic location and sociodemographic group and used an adequate sample size. Additionally, 57 studies did not report their response rates, which are important for assessing sampling bias.
Diagnostic criteria for MDD and reported outcomes
More than a third of studies did not use DSM or ICD diagnostic criteria for MDD. One fifth (94/467, 20.1%) did not specify any diagnostic criteria and 68 (14.6%) relied on self-reports of depression, which are known to inflate depression estimates.
Depression in the GBD studies was ascertained on the basis of 56 different instruments and a variety of methods (e.g. questionnaires, self-report). The inclusion of such a wide range of screening instruments appears to rest on the unsupported assumption that depression screening tools can be used interchangeably in different regions at different times and for different populations (16). More research is needed to analyse the screening tools.
GBD methods and modelling have previously been criticised for a lack of transparency, which makes it difficult to explain how outputs relate to country data (17). The GBD 2017 paper (1)(Supplementary appendix one, p493-494) states that estimates based on different diagnostic methods used by studies were adjusted; however, the level of detail is not sufficient for researchers to replicate the methods. Moreover, there are no standardised methods for determining the prevalence of mental disability, and comparisons across existing methods of data collection are highly unreliable (18). Relatedly, there is the issue of ‘current’ versus ‘ever’ depression, which will clearly provide very different estimates. The GBD 2017 paper (Supplementary appendix one) states that all data points derived from past-year prevalence were adjusted towards the level they would have been if the study had captured point/past-month prevalence. Again, no further detail is given by the authors on how this adjustment was made.
Just over half of the studies (262/467, 56.1%) reported prevalence data specifically on MDD or major depressive episode. A third (155/467, 33.2%) of studies used depression as their reported outcome but did not include severity measures; 22 (4.7%) used a reported outcome of depressive symptoms and 3 (0.6%) used other outcome measures (psychosocial problems, depression and/or anxiety and emotional disorder). It is not known whether these cases would meet diagnostic criteria for MDD. Also, different studies may have used the same reported outcome with different definitions. Brhlikova et al. found that the epidemiological studies underpinning the GBD 2000 estimates lacked a standard case definition and used different measures or different thresholds for reporting (9). GBD 2017 methods do not state how the differences in reported outcomes between studies were dealt with.
Finally, although the WHO World Health Surveys were representative and allow cross-country comparisons, they were designed to provide insight into how health systems are functioning for the purposes of policy analysis (19), not to provide estimates of MDD. They comprised 14.4% of the GBD data sources.
Timeliness
Only 13.3% of studies collected data in the period 2011-2017. The majority of studies undertook data collection between 2001 and 2010, and a fifth of studies between 1991 and 2000. The GBD 2017 paper provided updated estimates of incidence, prevalence and YLD numbers and rates for MDD in the years 1990, 1995, 2000, 2005, 2010, and 2017. It is not clear from the GBD 2017 methodology whether the older data were incorporated only into revised estimates for the time periods they relate to, or whether they were also incorporated into estimates for more recent time periods outside the period of study. The GBD authors did not reply to our request for clarification.
Strengths and limitations
We systematically appraised all available country studies included in the GBD 2017 estimates for MDD. However, there are several limitations. Thirty-one epidemiological studies that were included in GBD 2017 could not be retrieved despite requests to authors. In addition, the GBD 2017 methods do not state how the authors determined the representativeness of studies, and our criteria may differ from those used in GBD 2017.
Response rates could not be analysed as many of the studies included in GBD 2017 reported different response rates for individuals, households and across study samples.
Overlap between samples (e.g. the same respondent sampled twice in two different studies) and between populations sampled in different studies would lead to an overestimate of population coverage. GBD methodology does not explain how the authors resolved this.
The findings relate to the GBD 2017 estimates. More recent GBD estimates for MDD were published in 2019. Although a full analysis of GBD 2019 is beyond the scope of this paper, there were no changes to the inclusion criteria between GBD 2017 and GBD 2019. In January 2023 the GBD website listed 401 sources for the 2019 estimates of MDD compared to 431 in 2017; there were no additional sources in the 2019 list. Of the 30 studies that were excluded in 2019, 15 were data relating to the United States Medical Expenditure Panel Survey (2000-2014), which may reduce overrepresentation of the US in the PAHO region.
Conclusion
GBD estimates play a central role in shaping the priorities of global health organisations and national governments. The WHO’s ranking of depression as the single largest contributor to global disability (20) is being used to advocate for scaling up depression treatment, particularly in the global south where resources are limited. However, GBD 2017 estimates are based on incomplete country and population coverage and unclear methodologies. Extrapolation of single study estimates, the inclusion of studies with non-representative populations, and the lack of specific data on MDD undermine the reliability of many country and regional estimates. The critical flaws in the data underpinning the GBD 2017 estimates mean that policymakers should interpret these with caution.
Footnotes
Acknowledgements
We thank Evinia Listiania and Maisie Johnson for their help with data collection.
Competing interests
The authors declare that there is no conflict of interest.
Guarantor
Petra Brhlikova.
Contributorship
Conceptualisation (AMP, PB); data collection (RL, CB); first draft (RL, LC); all authors contributed to the design, analysis, interpretation of data, revisions, and approved the version to be published.
