Abstract
This study evaluated the psychometric properties of the Arabic-language Patient Health Questionnaire-9 (PHQ-9) among Saudi caregivers of patients with chronic diseases. In a cross-sectional design, 94 Saudi caregivers (37 male and 57 female) at a medical city participated. Four models proposed in the existing PHQ-9 literature were compared for fit using confirmatory factor analyses. Convergent validity was evaluated by correlating PHQ-9 scores with the Depression, Anxiety, and Stress Scale-21 (DASS-21). Of the models examined, the single-factor structure of the PHQ-9 fit the data best. Cronbach’s alpha for the PHQ-9 was .81, indicating a high level of internal consistency. Factor loadings ranged from .39 to .76. The convergent validity of the PHQ-9 with the DASS-21 was satisfactory. The findings indicate that the PHQ-9 is an effective tool for depression screening among caregivers in Saudi Arabia. Its strengths are its demonstrated validity, reliability, brevity, and ease of administration, which make it a valuable resource for preventive measures and performance assessment in mental health settings.
Keywords
The Patient Health Questionnaire-9 (PHQ-9) is a commonly used measurement for depression and depressive symptoms, and the PHQ-9 component structure is essential and controversial because it influences interpretations of findings.
The PHQ-9 correlated significantly with the criterion instrument, and the analyses supported a one-factor representation of the questionnaire.
The outcomes of this research demonstrate that the Patient Health Questionnaire-9 is a theoretically sound instrument that may be used for both research and clinical practice, and it can be employed as a useful screening and assessment tool in mental health care settings.
Introduction
Depression, sometimes referred to as major depressive disorder or clinical depression, is the most common mental illness in the general population. It is characterized by persistently depressed mood, a loss of interest or enjoyment, neurovegetative disturbances, and decreased energy. 1 In its most severe form, depression can raise the risk of suicide. Depression usually worsens over time and has a major impact on psychosocial functioning.2-4 According to the WHO, approximately 322 million individuals (4.4% of the global population) were estimated to be living with some form of depression as of 2017. 5 Although depression is widespread, it is frequently undiagnosed or untreated. Mental health issues are now a major national concern in Saudi Arabia because of the rising suicide rate observed in the Saudi population between 1990 and 2017 compared with other Middle Eastern and North African nations. 6 Consequently, Saudi Arabia’s General Authority for Statistics has placed significant emphasis on mental health, particularly depression.7,8 In 2022, Saudi Arabia had the second-highest suicide rate among the nations of the Arabian Gulf Cooperation Council, at 6 per 100 000 inhabitants. 9 There is a consensus that depression is a significant factor in many of these suicides. Between 14.9% and 47.8% of caregivers in Saudi Arabia who had dependents with chronic diseases reported having depression.10-12 Caregivers who are depressed may inadvertently cause similar negative effects in those under their care, whereas caregivers who are not depressed tend to provide more effective care.13-15 Furthermore, caregivers are primarily women, who devote 2 to 10 times more time to unpaid care work than men. 16
The PHQ-9’s internal consistency has been shown to be adequate (α = .77-.89).17-19 Studies have also reported moderate, positive correlations with related measures of anxiety and depression. Factor analyses of the PHQ-9, however, have produced inconsistent results. Studies of general adult samples in Germany and Hong Kong,20,21 young teenagers in Ghana, 22 college samples from the United States and Nigeria,19,23 and primary care patients 24 found that a one-factor model fit best, indicating that depression is a unidimensional construct. 22 In contrast, 2-factor models distinguishing somatic from non-somatic symptoms best represented the PHQ-9 factor structure in studies of psychiatric patients,25,26 coronary heart disease patients, 18 patients with spinal cord injuries,17,27,28 and elderly psychiatric patients in Taiwan. 29 These contradictory findings may be explained at least in part by differences across studies in culture (eg, the United States vs Asia), age group (eg, elderly adults vs young teenagers), participant characteristics (eg, people with back injuries vs community-based participants), and sample size (eg, one study used a college sample of 16 754, while 2 others had only 857 and 512 participants).19,23,30 Confirmatory factor analysis (CFA), which is commonly used to examine the factorial validity of psychometric instruments, is methodologically more appropriate in these situations because it can produce more conclusive findings about the PHQ-9 measurement model than exploratory factor analysis, which makes few a priori assumptions.
Although the PHQ-9 has been successfully validated in primary care settings,31,32 identifying practical measures of depression is particularly crucial for Saudi caregivers of dependents diagnosed with chronic diseases. However, the psychometric properties of the PHQ-9 in this population have not previously been investigated. The PHQ-9 may be a beneficial tool given the high rate of depression among caregivers and the need for brief assessments that accurately capture depressive symptoms in this population. The PHQ-9 was conceived for use in primary care contexts, and it is a practical tool for those settings thanks to its brevity, ease of scoring, and availability to the general public. 23 The researcher evaluated the psychometric properties of the PHQ-9 (specifically, convergent validity, factor structure, and internal consistency) in a Saudi sample to determine its appropriateness for caregivers of dependents diagnosed with chronic diseases in Saudi Arabia. The fit of the originally proposed one-factor model was compared with that of 2-factor models to analyze the PHQ-9’s factor structure and investigate the tool’s construct validity. The researcher also used the tool in conjunction with a comparable assessment of psychological dimensions.
Method
Study Participants
The study was approved by the Institutional Review Board of the King Saud University Medical City Ethical Review Committee (Ref. No. 21/01144/IRB). The principal investigator was present at each administration to provide guidance. The study participants were family caregivers of dependents who had been or were being treated for cardiovascular illness, diabetes, eye conditions, cancer, or kidney disease. The researcher identified participants after reviewing cases of patients with chronic disease being treated at the Medical City and invited the caregivers of these patients to participate using a tablet containing the Google Form link for the survey. Caregivers could include a partner or family member staying in the hospital with the patient. Participants were chosen at random to ensure that every member of the population had an equal chance of inclusion in the research sample. Each department has a list of the caregivers’ names, and caregivers at odd-numbered positions on the list were asked to participate. After the caregivers were selected from the list, the patients’ diagnoses were checked to confirm that they were chronically ill. Lastly, each caregiver was screened against the inclusion and exclusion criteria below. The researcher sampled 109 caregivers, of whom 94 responded, for a response rate of 86.2%.
The researcher defined a primary caregiver as a Saudi citizen who is at least 18 years old, resides with the patient while they are receiving care in the Medical City, and gives the patient supporting care without receiving payment for their services. 33 The exclusion criteria for caregivers were: (1) cognitive impairment that prevented them from reading the consent form and responding to surveys; (2) the existence of a serious psychiatric disorder (such as schizophrenia); and (3) having medical reasons why they should not participate.
All participants provided online informed consent after being informed of the study’s aims and purposes. Participants were made aware that taking part was entirely optional and were assured that their answers would be kept private and confidential. The questionnaire generally took 15 to 20 min to complete. The research was performed in line with the Declaration of Helsinki. Experts in instrument development have advised that CFA requires a minimum of 10 participants for each instrument item.34,35 Consequently, a sample of at least 90 Saudi caregivers was deemed adequate.
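The sample-size reasoning above is simple arithmetic and can be checked in a few lines. The sketch below is illustrative only: the function names are hypothetical, and the figures (9 items, 10 participants per item, 109 caregivers approached, 94 respondents) are taken from this Method section.

```python
def minimum_cfa_sample(n_items: int, per_item: int = 10) -> int:
    """Rule-of-thumb minimum sample size: participants per item times items."""
    if n_items <= 0 or per_item <= 0:
        raise ValueError("n_items and per_item must be positive")
    return n_items * per_item


def response_rate(responded: int, approached: int) -> float:
    """Percentage of approached caregivers who completed the survey."""
    if approached <= 0 or not 0 <= responded <= approached:
        raise ValueError("need approached > 0 and 0 <= responded <= approached")
    return 100.0 * responded / approached


required = minimum_cfa_sample(9)   # 9 PHQ-9 items x 10 participants = 90
rate = response_rate(94, 109)      # 94 of 109 caregivers responded
print(required, round(rate, 1), 94 >= required)  # prints: 90 86.2 True
```

The achieved sample of 94 therefore just exceeds the rule-of-thumb minimum of 90.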
Measures
The PHQ-9, a 9-item self-report questionnaire, is a tool for assessing the severity of depressive symptoms. It is based on the diagnostic criteria for major depressive disorder outlined in the DSM-IV. Participants rate on a 4-point Likert scale (0 = not at all, 1 = a few days, 2 = more than half the days, and 3 = almost every day) how frequently they have experienced each of 9 recognized depression symptoms over the preceding 2 weeks. Higher scores indicate more pronounced symptomatology. The item responses are summed to give a total score ranging from 0 to 27. Mild depression is defined as a score between 0 and 14, moderate depression as a score between 15 and 19, and severe depression as a score of 20 or higher. This study used the Arabic version of the PHQ-9, which has been validated and used in several investigations of university students and patients visiting primary care facilities.36,37
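The scoring rule described above can be sketched as follows. This is a minimal illustration, not the study's scoring script: the function names are hypothetical, the severity bands are the ones stated in this section, and treating a total of exactly 20 as severe is an assumption of the sketch.

```python
from typing import Sequence


def score_phq9(responses: Sequence[int]) -> int:
    """Sum nine item ratings (each 0-3) into a total score from 0 to 27."""
    if len(responses) != 9:
        raise ValueError("the PHQ-9 has exactly 9 items")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each rating must be 0, 1, 2, or 3")
    return sum(responses)


def severity_band(total: int) -> str:
    """Map a total score to the bands described in this section.

    Assumption: a total of exactly 20 is classified as severe.
    """
    if not 0 <= total <= 27:
        raise ValueError("total score must lie between 0 and 27")
    if total <= 14:
        return "mild"
    if total <= 19:
        return "moderate"
    return "severe"


answers = [1, 0, 2, 1, 0, 0, 1, 0, 0]  # made-up ratings for one respondent
total = score_phq9(answers)
print(total, severity_band(total))  # prints: 5 mild
```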
The Depression Anxiety and Stress Scale-21 (DASS-21) is a self-report instrument that evaluates stress, anxiety, and depression over the preceding week and has 3 subscales, each with 7 items. 38 Each item is rated on a 4-point Likert scale, with 0 being the least applicable to the respondent and 3 being the most applicable. Higher scores indicate more pronounced symptoms, and the score for each of the 3 subscales is determined by averaging the values assigned to the relevant items. The DASS-21 has been validated in many language versions and populations with promising results.39-42 This study used the Arabic-language DASS-21, which has been shown to have robust psychometric properties.43,44 Cronbach’s alpha values for stress, anxiety, depression, and the overall DASS-21 score in the current sample were .84, .77, .88, and .93, respectively.
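For readers unfamiliar with how a Cronbach's alpha value such as those above is obtained, the following sketch computes alpha from item-level ratings using the standard formula, alpha = k/(k - 1) x (1 - sum of item variances / variance of total scores). The function name and the toy ratings are hypothetical; the study's own values were obtained with the statistical software described in the Data Analysis section.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha, where items[i][j] is respondent j's rating on item i."""
    k = len(items)
    if k < 2:
        raise ValueError("at least 2 items are required")
    n = len(items[0])
    if n < 2 or any(len(col) != n for col in items):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    # Total score per respondent across all items.
    totals = [sum(col[j] for col in items) for j in range(n)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    item_var_sum = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Made-up ratings: 3 items (rows) answered by 5 respondents (columns).
toy_items = [
    [0, 1, 2, 3, 3],
    [0, 1, 1, 2, 3],
    [1, 1, 2, 2, 3],
]
print(round(cronbach_alpha(toy_items), 2))
```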
Data Analysis
The data were analyzed using Analysis of Moment Structures (AMOS 21.0) and IBM SPSS Statistics for Windows, Version 28.0 (IBM Corp., Armonk, NY, USA). 45 The structure of the PHQ-9 was tested by fitting a series of plausible alternative models using CFA with maximum-likelihood estimation. CFA can be used to test the validity of the hypothesized relationships between the observed variables and the related latent factors. 46 CFA was selected for this research because it makes it possible to test specific hypotheses and specifies a priori the instrument’s structure as it was conceptually intended.46,47 As a result, items associated with a specific factor contribute to that factor’s loadings, while items not associated with that factor are excluded from it. This approach is also very helpful for testing a measure’s internal consistency and validity when developing a scale. 47 Each model’s goodness-of-fit was evaluated using the following fit indices and criteria: chi-square (χ2) with its degrees of freedom (df), comparative fit index (CFI), goodness-of-fit index (GFI), root mean square error of approximation (RMSEA) with its 90% confidence interval (90% CI), and standardized root mean square residual (SRMR). Although earlier research used a cutoff of 0.90, 48 CFI values of 0.95 or higher indicate a satisfactory fit. 49 GFI values greater than 0.90 indicate excellent fit. 48 RMSEA values of 0.05 or below indicate a good fit, values up to 0.08 indicate fair or reasonable errors of approximation, and values between 0.08 and 0.10 indicate mediocre fit. 50 As with the RMSEA, lower SRMR values indicate better model fit: values of 0.05 or less indicate a close fit, while values up to 0.08 are acceptable. 51
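The cutoffs listed above can be expressed as a small set of rules. The sketch below is an illustration under stated assumptions: the function names and label wording are hypothetical, and the label "poor" for values beyond the stated ranges is an assumption, since the text does not name that category. The example applies the rules to the Model 1 indices reported in the Results section.

```python
def rate_cfi(cfi: float) -> str:
    """CFI: 0.95 or higher is satisfactory; 0.90 was the older cutoff."""
    if cfi >= 0.95:
        return "satisfactory"
    if cfi >= 0.90:
        return "meets older 0.90 cutoff only"
    return "below cutoff"


def rate_gfi(gfi: float) -> str:
    """GFI: values greater than 0.90 indicate excellent fit."""
    return "excellent" if gfi > 0.90 else "below cutoff"


def rate_rmsea(rmsea: float) -> str:
    """RMSEA: <= 0.05 good, <= 0.08 fair, <= 0.10 mediocre."""
    if rmsea <= 0.05:
        return "good"
    if rmsea <= 0.08:
        return "fair"
    if rmsea <= 0.10:
        return "mediocre"
    return "poor"  # assumption: label for values above 0.10


def rate_srmr(srmr: float) -> str:
    """SRMR: <= 0.05 close fit, <= 0.08 acceptable."""
    if srmr <= 0.05:
        return "close"
    if srmr <= 0.08:
        return "acceptable"
    return "poor"  # assumption: label for values above 0.08


# Model 1 indices as reported in the Results section.
model_1 = {"CFI": 0.90, "GFI": 0.91, "RMSEA": 0.094, "SRMR": 0.071}
ratings = {
    "CFI": rate_cfi(model_1["CFI"]),
    "GFI": rate_gfi(model_1["GFI"]),
    "RMSEA": rate_rmsea(model_1["RMSEA"]),
    "SRMR": rate_srmr(model_1["SRMR"]),
}
for name, label in ratings.items():
    print(f"{name} = {model_1[name]}: {label}")
```

Under these rules, Model 1 meets only the older CFI cutoff, has an excellent GFI, a mediocre RMSEA, and an acceptable SRMR.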
The researcher evaluated the extent to which each model fit the data by contrasting the fit indices of 4 competing models from the existing literature. Model 1, the original one-factor model proposed by Kroenke et al, 24 loaded all 9 PHQ-9 items onto a single factor. The other 3 competing models had 2-factor structures that differed slightly in which items loaded on the somatic and non-somatic factors. In Model 2a, a 2-factor model developed by Krause et al, 17 3 items (sleeping difficulties, exhaustion, and appetite change) load on a somatic factor, whereas the remaining 6 items load on a non-somatic factor. Model 2b, based on the work of Richardson and Richards, 28 resembles Model 2a except that the items “concentration difficulties” and “psychomotor agitation/retardation” also load on the somatic factor. Model 2c was derived from the work of Krause et al. 27 In this model, unlike in Model 2b, the item “anhedonia” loads on the somatic factor.
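The four measurement models can be written down as item-to-factor mappings, which also makes their degrees of freedom easy to check. The sketch below is illustrative: the short item labels are hypothetical shorthand for the 9 PHQ-9 symptoms, and Model 2c is read here as Model 2b plus anhedonia on the somatic factor, which is an assumption about the description above. The df count assumes each item loads on one factor and that the factors are free to correlate.

```python
from typing import Dict, List

# Hypothetical shorthand labels for the 9 PHQ-9 items, in questionnaire order.
ITEMS = [
    "anhedonia", "depressed_mood", "sleep", "fatigue", "appetite",
    "guilt", "concentration", "psychomotor", "suicidal_ideation",
]

# Items loading on the somatic factor in each 2-factor model.
SOMATIC = {
    "2a": {"sleep", "fatigue", "appetite"},
    "2b": {"sleep", "fatigue", "appetite", "concentration", "psychomotor"},
    # Assumption: Model 2c = Model 2b plus anhedonia.
    "2c": {"sleep", "fatigue", "appetite", "concentration", "psychomotor",
           "anhedonia"},
}


def factor_map(model: str) -> Dict[str, List[str]]:
    """Return the factor -> items mapping for Model 1, 2a, 2b, or 2c."""
    if model == "1":
        return {"depression": list(ITEMS)}
    somatic = SOMATIC[model]
    return {
        "somatic": [i for i in ITEMS if i in somatic],
        "non_somatic": [i for i in ITEMS if i not in somatic],
    }


def cfa_df(factors: Dict[str, List[str]]) -> int:
    """Degrees of freedom for a simple-structure CFA with correlated factors."""
    p = sum(len(items) for items in factors.values())  # observed items
    m = len(factors)                                   # latent factors
    moments = p * (p + 1) // 2                         # unique (co)variances
    # Free parameters: p loadings + p residual variances + factor covariances
    # (factor variances fixed to 1 for identification).
    free = 2 * p + m * (m - 1) // 2
    return moments - free


for name in ("1", "2a", "2b", "2c"):
    print(name, cfa_df(factor_map(name)))
```

For the one-factor model this gives 45 - 18 = 27 degrees of freedom, and each 2-factor model has 26.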
To evaluate whether the models differed significantly from one another, the researcher performed chi-square difference tests after comparing the goodness-of-fit of the competing PHQ-9 models. Convergent validity was assessed by computing Pearson’s r between the PHQ-9 and the criterion instrument, the DASS-21. Internal consistency was determined by calculating Cronbach’s alpha.
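A chi-square difference test between two nested models can be sketched as below. This is a simplified illustration, not the study's procedure: it handles only a difference of 1 degree of freedom, for which the chi-square tail probability equals erfc(sqrt(x / 2)), and it assumes the one-factor model is treated as nested within a 2-factor model. The function names and the chi-square values in the example are hypothetical and are not taken from the study.

```python
import math
from typing import Tuple


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 df."""
    if x < 0:
        raise ValueError("chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(x / 2.0))


def chi_square_difference(
    chi2_restricted: float, df_restricted: int,
    chi2_full: float, df_full: int,
) -> Tuple[float, float]:
    """Return (delta chi-square, p value) for nested models differing by 1 df."""
    delta_df = df_restricted - df_full
    if delta_df != 1:
        raise ValueError("this sketch only handles a 1-df difference")
    delta_chi2 = chi2_restricted - chi2_full
    if delta_chi2 < 0:
        raise ValueError("the restricted model cannot fit better than the full one")
    return delta_chi2, chi2_sf_1df(delta_chi2)


# Hypothetical values: one-factor model (df = 27) vs 2-factor model (df = 26).
delta, p_value = chi_square_difference(52.0, 27, 48.16, 26)
print(f"delta chi2 = {delta:.2f}, p = {p_value:.3f}")
```

A p value below .05 would indicate that the less restricted model fits significantly better; otherwise the more parsimonious model is usually preferred.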
Results
Sample Characteristics
During the COVID-19 epidemic in Saudi Arabia’s central region, data were gathered from 94 Saudi caregivers (37 male and 57 female). The participants’ mean age was 39.6 ± 9.45 years. Table 1 provides an overview of the participants’ sociodemographic information. PHQ-9 depression scores ranged from 0 to 20, with a mean of 4.50 ± 4.36. The sample as a whole therefore tended to show minimal depressive symptoms. In this sample, the PHQ-9’s internal consistency was good, with a Cronbach’s alpha of .81.
Sample Description (N = 94).
Note. M = mean; SD = standard deviation.
Factors Composition on the PHQ-9
The goodness-of-fit indices for the competing models are summarized in Table 2. The original one-factor model (Model 1) adequately explained the data across the sample, given its comparatively high GFI and CFI values and low SRMR and RMSEA values [χ2 = 49.1, df = 27; CFI = 0.90; GFI = 0.91; RMSEA = 0.094 (90% CI = 0.050-0.135); SRMR = 0.071] (Figure 1). The goodness-of-fit indices of all 3 of the 2-factor models fell below the necessary cutoff points. A slight difference in the CFI showed that Models 2a (Figure 2) and 2c suited the data slightly better than Model 2b (Figure 3), but the indices did not meet the established fit standards. Model 2a generated nearly identical fit indices to Model 2c (Figure 4).
Models’ Goodness-of-Fit Indexes for the PHQ-9.
Note. *P < .05; † marks the study’s final model. PHQ-9 = Patient Health Questionnaire-9; K = number of items; df = degrees of freedom; CFI = comparative fit index; GFI = goodness-of-fit index; SRMR = standardized root mean square residual; RMSEA = root mean square error of approximation.
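The RMSEA reported for Model 1 can be reproduced from the reported chi-square, degrees of freedom, and sample size. The sketch below uses one common form of the formula, RMSEA = sqrt(max(χ2 - df, 0) / (df x (N - 1))); that the software output follows this N - 1 convention is an assumption, although the result agrees with the reported value. The function name is hypothetical.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of RMSEA using the N - 1 convention."""
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Model 1 values reported in the Results: chi2 = 49.1, df = 27, N = 94.
estimate = rmsea(49.1, 27, 94)
print(round(estimate, 3))  # prints: 0.094
```

The computation is sqrt(22.1 / 2511), which rounds to 0.094 and falls in the range the Data Analysis section describes as mediocre fit.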

Factor loadings: Model 1.

Factor loadings: Model 2a.

Factor loadings: Model 2b.

Factor loadings: Model 2c.
The data from the entire sample appeared to be unidimensional rather than multidimensional, and the one-factor model (Model 1) was determined to provide the best fit. The one-factor model had good factor loadings, with all standardized loadings significant at P < .05 and ranging from .39 to .76. Table 3 displays the factor loadings for the one-factor model.
Factor Loadings for the PHQ-9 Modified Model.
Note. PHQ-9 = Patient Health Questionnaire-9.
Convergent Validity
The PHQ-9 score showed moderate, positive correlations with the 3 DASS-21 subscales: anxiety (r = .43), stress (r = .54), and depression (r = .59). Thus, in our sample, the PHQ-9 achieved the required level of convergent validity.
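As a reminder of how a Pearson correlation like those above is computed, the following sketch implements the standard formula, r = Sxy / sqrt(Sxx x Syy), on made-up scores. The function name and the toy data are hypothetical and do not come from the study sample.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return s_xy / math.sqrt(s_xx * s_yy)


# Made-up scores for 6 respondents: PHQ-9 totals and a second depression score.
phq9_totals = [2, 5, 9, 4, 12, 7]
other_scores = [0.3, 0.9, 1.4, 0.4, 2.0, 0.9]
print(round(pearson_r(phq9_totals, other_scores), 2))
```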
Discussion
No previous investigation has examined the PHQ-9’s psychometric properties among Saudi caregivers of dependents with formal diagnoses of cardiovascular illness, eye diseases, cancer, and kidney disease. A previous study using a large primary care sample showed that the PHQ-9 has strong psychometric properties, including excellent accuracy and good criterion, factorial, and convergent validity. 24 The researcher examined the factor structure, internal consistency, and convergent validity of the PHQ-9. Four different models were examined, and the CFA showed that the original one-factor model provided the best fit to the data for Saudi caregivers. This finding is consistent with those of prior studies on college samples from the United States and Nigeria,19,23 on general adults in Germany and Hong Kong,20,29 on young teenagers in Ghana, 22 and on patients in primary care. 24 Most studies that have examined the factor structure of the PHQ-9 have supported its unidimensionality. Other research, however, has identified both a somatic and an affective (non-somatic) dimension.17,18,26 The use of different subgroups of people may contribute to the disagreement across studies. Most participants in our study were in a medical setting and reported mild depression.
However, studies supporting a 2-factor model that distinguishes somatic from non-somatic (affective) symptoms have been conducted in a wide range of clinical populations, including people with major depression, coronary heart disease, and spinal cord injury. Potential interactions between physical and mental disorders may explain the somatic factor loading. 26 Furthermore, Petersen et al 26 found that a 2-factor model provided the best fit in a sample of primary care patients, in contrast to our 2-factor models, which fit the data poorly because one index was below the standard cutoff threshold. The participants in Petersen et al’s 26 study, however, had been diagnosed with major depression and were eligible for antidepressant therapy. This relatively homogeneous sample may have restricted the range of responses, attenuating the correlations between items, which is one possible explanation for the 2-factor solution found in their research. Because of this attenuation, factor loadings and any relationships between factors may be underestimated. 52 With more diverse samples, such as the general population, the greater variation in PHQ-9 responses increases the likelihood that items load onto a single factor, giving researchers a greater capacity to identify a one-factor solution. 26
In a study of non-Latina and Latina female college students, Granillo 30 found that a 2-factor solution best fit the data. Even though these students may appear to differ in many ways from people with severe clinical conditions (such as major depression or spinal cord injuries), the separation of somatic and non-somatic PHQ-9 symptoms into distinct depression subscales in these populations could be attributed to features shared across them, in that somatic symptoms are believed to be the exception as opposed to the norm. 30 According to Granillo’s findings, students scored higher on the somatic subscale than on the non-somatic subscale. The PHQ-9 therefore appeared suitable as a multidimensional measure of depression for female college students. Consistent with other research spanning clinical and non-clinical samples from the United States, Ghana, Taiwan, Germany, Hong Kong, and Nigeria, the PHQ-9 achieved excellent internal consistency. These findings suggest that the PHQ-9 has adequate internal consistency across different populations and languages. The scale’s convergent validity was also adequate, supporting its use as a gauge of depression severity, as PHQ-9 scores were moderately and positively correlated with scores on alternative depression and anxiety measures.
Our findings have significant implications for the screening and identification of depressive symptoms among Saudi caregivers of dependents. Owing to its brevity, ease of scoring, and adequate psychometric properties, the PHQ-9 offers practical benefits over other widely used measures of depression, as noted by Keum et al 53 in a clinical assessment of depression in university settings. These benefits may enable the PHQ-9 to be used successfully in health settings for preventive and performance evaluations. For instance, health care providers may consider incorporating the PHQ-9 into surveys given to new caregivers of dependents. 23 Caregivers found to be at risk of depression because they exhibit the associated symptoms could be offered free in-department testing and treatment options. Furthermore, because the PHQ-9 is a self-report measure that is simple to administer and interpret, caregivers are more likely to complete it.
This study has several limitations, including a small convenience sample, a single setting, and reliance on self-reports. The sample was therefore largely homogeneous, which may limit the generalizability of the findings because it consisted only of caregivers. The factorial structure of the PHQ-9 may vary by population, including teenagers, older adults, those with more serious clinical disorders, and people from different cultural backgrounds. Future research should replicate these analyses in such samples to investigate the PHQ-9’s utility across diverse populations.
Conclusion
Based on our findings, the PHQ-9 is a reliable, unidimensional tool for assessing depression among caregivers in Saudi Arabia. Our findings support the view that the PHQ-9 is a robust and consistent self-report measure for evaluating depression in individuals across clinical and non-clinical settings.
Supplemental Material
Supplemental material, sj-pdf-1-inq-10.1177_00469580231221287 for Psychometric Properties of the Patient Health Questionnaire-9 for Saudi Caregivers: A Cross-Sectional Study in Saudi Arabia by Salman M. Alreshidi in INQUIRY: The Journal of Health Care Organization, Provision, and Financing
Supplemental material, sj-pdf-2-inq-10.1177_00469580231221287 for Psychometric Properties of the Patient Health Questionnaire-9 for Saudi Caregivers: A Cross-Sectional Study in Saudi Arabia by Salman M. Alreshidi in INQUIRY: The Journal of Health Care Organization, Provision, and Financing
Footnotes
Acknowledgements
The author of this study extends his appreciation to the Researchers Supporting Project Number (RSPD2024R880), King Saud University, Riyadh, Saudi Arabia.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article.
Ethical Statements
Ethical approval was obtained from the King Saud University Medical City Ethical Review Committee (Ref. No. 21/01144/IRB). Written informed consent was obtained from all participants.
Supplemental Material
Supplemental material for this article is available online.
References