Sage Journals: Discover world-class research

Abstract

Background:

Universal screening of all perinatal women using the Edinburgh Postnatal Depression Scale is currently recommended in Australian National Guidelines, yet Australian validation studies of this measure are limited and have produced mixed findings. This study aims to address this gap by using the largest Australian sample to include both antenatal and postpartum periods to evaluate the performance of the scale in screening for Major Depression.

Method:

Data from 887 women are drawn from the Mercy Pregnancy and Emotional Wellbeing Study, a prospective cohort recruited in Melbourne, Perth and regional and rural Western Australia. Participants completed the Edinburgh Postnatal Depression Scale and a Structured Clinical Interview for DSM diagnostic interview between weeks 12 and 20 of pregnancy and again at 6 months postpartum. Data are compared to report internal consistency and receiver operating characteristic statistics, including area under the curve, sensitivity, specificity, positive predictive value, negative predictive value and optimal cutoffs for the Edinburgh Postnatal Depression Scale.

Results:

Internal consistency was good. With the recommended Edinburgh Postnatal Depression Scale cutoffs of 13 and above postpartum and 15 and above antenatally, the Edinburgh Postnatal Depression Scale had a positive predictive value of 52% and 58%, respectively. Overall, the receiver operating characteristic analysis suggests fair to poor performance of the Edinburgh Postnatal Depression Scale for detecting Major Depression in both the antenatal and postpartum periods.

Conclusions:

Clinicians and researchers using the recommended Edinburgh Postnatal Depression Scale cutoffs can expect about one in two women who screen positive to later receive a diagnosis of Major Depression, and about one in five women who screen negative to be missed cases. Clinical implications and recommendations are discussed.

Keywords

Edinburgh Postnatal Depression Scale (EPDS); perinatal depression; screening; validation

Introduction

Screening for depression during pregnancy and the first postnatal year can have substantial benefits for women and their children, if women can subsequently access treatment in a timely manner (Austin, 2014; El-Den et al., 2022). The Edinburgh Postnatal Depression Scale (EPDS) (Cox et al., 1987) is one of the most widely used depression screening tools in the perinatal period, for both clinical and research purposes. It was first validated for use in an Australian population in 1993 based on a sample of 103 postpartum women in Sydney (Boyce et al., 1993). The scale has been recommended for widespread use in screening for depression in both the antenatal and postpartum periods by Australian clinical guidelines (The Centre of Perinatal Excellence [COPE], 2023).

Despite more than a decade of recommending the EPDS for national screening in Australia, there are significant limitations and gaps in the available Australian validation studies. Previous Australian studies comparing EPDS scores with a clinical diagnostic interview in the postpartum period found sensitivities ranging from 38% to 100% and more consistent specificities of 81–96% at a cutoff of 13 and above (Barnett et al., 1999; Boyce et al., 1993; Matthey et al., 2001; Phillips et al., 2009). Furthermore, some Australian studies validated the EPDS against a diagnostic interview only in women who scored above the designated cutoff, rather than in the whole sample (Milgrom et al., 2005); such studies cannot identify false negatives or true negatives and are therefore unable to report sensitivity or specificity.

The authors are aware of only three studies that have reported the performance of the EPDS against a diagnostic interview in an antenatal sample in Australia: an unpublished PhD thesis (Lien, 2007); a large study from NSW that focused on developing an alternative screening tool but also reported the performance of the EPDS (Austin et al., 2005); and a small study drawn from an antenatal clinic in Sydney, NSW (Eapen et al., 2013). Using data from 200 women, Lien (2007) reports antenatal sensitivity and specificity in the same range as the postpartum validation studies above, with 67.7% sensitivity and 89.3% specificity at a cutoff of 13 and above. However, in a sample of 1296 women, Austin et al. (2005) report a much more modest sensitivity of 22.2% and specificity of 91.3% at a cutoff of 12 and above, and positive predictive values below 13% for all reported cutoffs (5.5, 9.5, 11.5), suggesting weak performance of the EPDS in a larger Australian antenatal sample. The third study drew on a sample of 131 antenatal women and found a sensitivity of 27% and specificity of 96% at a cutoff of 13 and above (Eapen et al., 2013; Levis et al., 2020). No Australian study the authors are aware of has examined repeated EPDS administration in the same women across both the antenatal and postnatal periods.

A recent international meta-analysis of validation studies of the EPDS for detecting Major Depression in pregnancy and the postpartum identified 58 studies, five of which were from Australia (Levis et al., 2020). These Australian studies comprised four postpartum samples in which a diagnostic measure had been used (in some, only in a subset of the overall study sample): a 1997 study of 72 women in NSW (Boyce and Hickey, 2005; Hickey et al., 1997), a 2008 study of 137 women in Victoria (Rowe et al., 2008), a 2009 study of 158 women in NSW (Phillips et al., 2009) and a 2010 study of 192 women in Victoria (Fisher et al., 2010), together with the above-mentioned antenatal study of 131 women in NSW (Eapen et al., 2013). These validation studies used a range of diagnostic measures for depression: two used a semi-structured interview administered by clinical professionals (in both cases the Structured Clinical Interview for DSM [SCID]), two used a fully structured interview that is fully scripted and can be administered by trained lay interviewers (in both cases the Composite International Diagnostic Interview [CIDI]), and one used the Mini International Neuropsychiatric Interview (MINI), which is designed to be overinclusive (Levis et al., 2019). None included women from regional or rural areas, and none included both antenatal and postnatal women. Of note, the studies by both Austin et al. (2005) and Boyce et al. (1993) were excluded from this meta-analysis for methodological reasons (Levis et al., 2020). Overall, this meta-analysis recommended an EPDS cutoff of 11 or higher as maximizing combined sensitivity and specificity, but it was unable to conduct country-specific analyses and concluded that more ‘well-conducted’ trials were required (Levis et al., 2020).

A further international meta-analysis from the same group identified 29 studies that investigated the performance of the EPDS in estimating depression prevalence compared with the SCID (Lyubenova et al., 2021). This meta-analysis included the same two Australian SCID-based studies found in the earlier meta-analysis (Hickey et al., 1997; Phillips et al., 2009). It found that the difference between EPDS-based and SCID-based prevalence varied widely: where both measures were used, the difference could range between −14% and 12%. The authors concluded that using the EPDS to measure prevalence would likely under- or overestimate Major Depression in the sample, even if a conservative cutoff of >14 was used, and that the EPDS cannot be used to accurately report the prevalence of depression or depressive symptoms in samples (Lyubenova et al., 2021).

As well as being used as a screening tool, the EPDS is frequently used in research studies to determine the prevalence of depression and as a proxy for a depressive diagnosis. Indeed, there have been attempts to subtype clinical depression using the EPDS (Putnam et al., 2017), although these have not been replicated, which is unsurprising as the EPDS does not capture many of the symptoms that distinguish subtypes of depression (Galbally et al., 2023; Maj et al., 2020). The EPDS has also been used as the only measure of depressive illness in Australian genetic studies, as well as in studies examining diverse outcomes for women and children following perinatal depression (Deave et al., 2008; Kiewa et al., 2022). This is despite studies demonstrating that findings differ when the EPDS is used in place of a clinical diagnostic instrument, and despite repeated calls by the author of the EPDS to restrict its use to its intended purpose of universal maternity screening (Cox, 2019; Pawlby et al., 2008). The use of EPDS scores in research as a proxy for depression or to determine the prevalence of depression in samples is supported neither by research findings, including a recent meta-analysis, nor by the original author of the measure in a recent editorial (Cox, 2019; Lyubenova et al., 2021). The EPDS is designed as a screening measure, and it is the recommended measure for national perinatal mental health screening in Australia (Highet and the Expert Working Group and Expert Subcommittees, 2023). Yet the validation research available in Australian samples is limited and gaps remain (Cox, 2019; Lyubenova et al., 2021). This supports further research in Australian samples on the EPDS as a screening measure for detecting Major Depression.

The overall aim of this study is to address these gaps by using a large, longitudinal Australian dataset to assess the performance of the EPDS in detecting Major Depression against a diagnostic measure in both antenatal and postpartum women. This study used sampling methods similar to those of the most widely cited Australian validation study (Boyce et al., 1993), but with eight times the number of participants, and more than twice as many participants as the two Australian studies (Hickey et al., 1997; Phillips et al., 2009) included in the recent meta-analysis reporting on EPDS validation against the SCID (Levis et al., 2020). The sample also includes women from regional and rural areas. The aims are to examine (1) the validity of the EPDS as a screening tool for Major Depression in pregnancy and the postpartum and (2) its positive predictive value.

Methods

We used data from an ongoing, longitudinal prospective cohort study, the Mercy Pregnancy and Emotional Wellbeing Study (MPEWS) (Galbally et al., 2017). Women were recruited in early pregnancy (12–20 weeks gestation) and have been followed for over 10 years. In this study, we use data from four timepoints (which we refer to as ‘waves’) of this broader longitudinal study, collected in the same cohort of women at weeks 12–20 of pregnancy (wave 1), approximately 28 weeks of pregnancy (wave 2), 6 months postpartum (wave 4), and 12 months postpartum (wave 5). Wave 3 at delivery captures only birth data from hospital records, and therefore was not considered for the current nested study.

Participants

Participants in this nested study comprise the first four cohorts, a total of 887 women, recruited through Mercy Hospital for Women in Melbourne, Victoria, Australia, between 2012 and 2017; from public hospitals in metropolitan Perth, Western Australia, between 2017 and 2018; and from regional and rural Western Australia, including the Southwest, Midwest and Goldfields, between 2018 and 2020. The mean maternal age at recruitment was 31.91 years (SD = 4.77, range = 18.97–48 years). Within this sample, 90.3% (n = 763) were born in Australia, 60.4% (n = 335) had a university education, 97.4% (n = 790) were married or in a de facto relationship, and 66% (n = 584) were enrolled in the study with their first pregnancy. All participants provided written informed consent for participation, and the study was conducted with ethics approval from the Human Research Ethics Committees of Mercy Health (R08/22), the South Metropolitan Health Service (REG Number: 2016-192), and the Western Australia Country Health Service (18/02).

Participants were recruited mostly from the general pregnant population through antenatal clinics at public hospitals. Targeted recruitment was also used to ensure a sufficient number of participants with suspected Major Depression and of participants taking antidepressants. Exclusion criteria for the longitudinal study were a lack of English fluency, psychiatric illness requiring acute inpatient admission, substance abuse disorder and/or child protection involvement.

Data from the clinical interview at 6 months postpartum were missing for 15.7% of participants. Importantly, those who did and did not undertake the postpartum interview did not differ significantly in whether they had been assessed as depressed at wave 1 (chi-square test with continuity correction: χ² = 2.58, p = 0.11; see Supplementary Material for other demographic comparisons between those who did and did not complete the SCID at the 6-month postpartum follow-up).
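The continuity-corrected chi-square test above compares a 2 × 2 table (depressed or not at wave 1, by completed or missed postpartum SCID). A minimal, stdlib-only Python sketch of that test follows; it is an illustration rather than the study's R code, the function name `yates_chi_square` is hypothetical, and the example counts are invented for illustration and are not the study's data.

```python
import math


def yates_chi_square(table):
    """Chi-square test of independence for a 2 x 2 table with Yates' continuity correction.

    table: [[a, b], [c, d]] of observed counts (hypothetical layout: rows = SCID
    depressed / not depressed at wave 1, columns = completed / missed the
    6-month SCID). Returns (statistic, p_value) on 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            # Continuity correction: shrink |O - E| by 0.5, never below zero
            diff = max(abs(observed - expected) - 0.5, 0.0)
            statistic += diff ** 2 / expected
    # Upper tail of a chi-square distribution with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Invented counts, for illustration only
stat, p = yates_chi_square([[10, 20], [20, 10]])
print(f"chi-square = {stat:.2f}, p = {p:.3f}")
```

The closed-form p-value works only because the statistic has a single degree of freedom; larger tables would need a general chi-square tail function.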

Measures

Edinburgh Postnatal Depression Scale

At all four timepoints in early and late pregnancy and at 6 and 12 months postpartum, participants completed questionnaires, either on paper (for the earlier waves and cohorts) or online. These questionnaires included the Edinburgh Postnatal Depression Scale (EPDS) (Cox et al., 1987). The continuous total score is used for all analyses here, and the performance of different cutoff values is compared.
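As a minimal sketch of how a continuous EPDS total is formed and then dichotomized at a cutoff, the Python below assumes the standard scoring key published with the scale (items 1, 2 and 4 scored 0–3 from the top response option down; items 3 and 5–10 reverse-scored). The study does not describe its own scoring code, and the function names `epds_total` and `screens_positive` are hypothetical.

```python
# Items scored 0-3 from the top response option down (standard EPDS key, assumed here)
FORWARD_ITEMS = {1, 2, 4}
# All other items (3 and 5-10) are reverse-scored: top option = 3, bottom option = 0


def epds_total(positions):
    """Total EPDS score (0-30) from the ticked option for each of the 10 items.

    positions: sequence of 10 integers, 0 = top response option, 3 = bottom.
    """
    if len(positions) != 10:
        raise ValueError("the EPDS has exactly 10 items")
    total = 0
    for item, pos in enumerate(positions, start=1):
        if pos not in (0, 1, 2, 3):
            raise ValueError(f"item {item}: option position must be 0-3")
        total += pos if item in FORWARD_ITEMS else 3 - pos
    return total


def screens_positive(total, cutpoint):
    """Half-integer cutpoints as in the text: 12.5 means '13 and above'."""
    return total > cutpoint


total = epds_total([1, 1, 2, 1, 2, 1, 2, 1, 2, 2])
print(total, screens_positive(total, 12.5), screens_positive(total, 5.5))
```

These example responses total 12, so they would screen negative at a cutpoint of 12.5 (13 and above) but positive at a cutpoint of 5.5 (6 and above).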

Structured Clinical Interview for DSM

At recruitment in early pregnancy (<20 weeks gestation, wave 1) and at 6 months postpartum (wave 4), women were interviewed by a trained mental health professional using the Structured Clinical Interview for DSM-IV Axis I Disorders (cohorts 1 and 2) (First et al., 1997) or the Structured Clinical Interview for DSM-5 (cohorts 3 and 4) (First et al., 2015) (SCID). This semi-structured interview assesses Major Depression, among other Diagnostic and Statistical Manual of Mental Disorders (DSM) diagnoses, and is used as the reference standard for diagnosis in this study (Trevethan, 2017). The SCID conducted at wave 4 was slightly adjusted for the study to capture episodes of depression at the time of the interview or since the previous SCID at wave 1 (rather than over the previous 2 years).

While every effort was made to complete the elements of each wave in close succession, in practice the EPDS and clinical interview were conducted a median of 14 days apart at wave 1 (SD = 36.53 days) and 18 days apart at wave 4 (SD = 56.59 days). Sensitivity analyses restricted to EPDS and SCID pairs completed within 14 days of each other were performed to test the robustness of our analyses to this variability in time between measures.

Statistical analysis

All data cleaning and analysis were performed in R version 4.3.1 (R Core Team, 2023), using the pROC package (Robin et al., 2011) for receiver operating characteristic (ROC) curve analysis.

First, we report descriptive statistics for the SCID as the proportion of the sample meeting criteria for Major Depression at waves 1 and 4. For EPDS scores, we report the following descriptive statistics by SCID diagnosis: mean, standard deviation (SD), 95% confidence interval (CI) for the mean, median, interquartile range (IQR), minimum and maximum. We report internal consistency for the EPDS at each wave as Cronbach's α with Feldt confidence intervals.
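Cronbach's α compares the sum of the item variances with the variance of the total score. The stdlib-only Python sketch below is an illustration, not the authors' R code. The names `cronbach_alpha` and `alpha_if_item_deleted` are hypothetical; the second mirrors the item-removal check reported in the Results. Feldt's interval is omitted because it needs F-distribution quantiles (for example R's `qf`). For n respondents, k items and a 1 − γ interval, the bounds are 1 − (1 − α)·F(1 − γ/2; n − 1, (n − 1)(k − 1)) and 1 − (1 − α)·F(γ/2; n − 1, (n − 1)(k − 1)).

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    items = list(zip(*responses))  # transpose: one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(person) for person in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def alpha_if_item_deleted(responses):
    """Alpha recomputed with each item dropped in turn (needs at least 3 items)."""
    k = len(responses[0])
    return [
        cronbach_alpha([[s for j, s in enumerate(person) if j != drop] for person in responses])
        for drop in range(k)
    ]
```

Sample variances are used for both the items and the total; the ratio would be unchanged with population variances as long as the choice is consistent.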

To assess the validity of the EPDS, against the SCID diagnostic measure, as a screening tool for Major Depression in pregnancy and the postpartum, a receiver operating characteristic (ROC) curve analysis was performed separately at wave 1 and wave 4 (Weinstein et al., 1989). The area under the ROC curve (AUC) summarizes sensitivity and specificity across all possible cutoffs as a single value between 0 and 1. It equals the probability that a randomly selected woman with Major Depression scores higher on the EPDS than a randomly selected woman without it, so a value of 0.5 indicates chance-level discrimination. AUC values between 0.9 and 0.99 indicate an excellent test, 0.8–0.89 a good test, 0.7–0.79 a fair test and 0.51–0.69 a poor test (Carter et al., 2016).
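The probabilistic reading of the AUC can be computed directly. The sketch below is a from-scratch Python illustration rather than pROC's implementation. Over all case and non-case pairs, it counts how often the case has the higher EPDS total, with ties counting one half; this equals the empirical (trapezoidal) AUC. The function names are hypothetical. How `auc_label` treats the gaps between the quoted bands (for example between 0.69 and 0.70) is an assumption.

```python
def empirical_auc(case_scores, noncase_scores):
    """Empirical (trapezoidal) ROC AUC via the Mann-Whitney statistic.

    Counts the share of (case, non-case) pairs in which the case has the higher
    score; tied pairs count as one half.
    """
    if not case_scores or not noncase_scores:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for x in case_scores:
        for y in noncase_scores:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))


def auc_label(auc):
    """Descriptive bands from the text (Carter et al., 2016); band edges are an assumption."""
    if auc >= 0.9:
        return "excellent"
    if auc >= 0.8:
        return "good"
    if auc >= 0.7:
        return "fair"
    if auc > 0.5:
        return "poor"
    return "no better than chance"
```

Under these bands, the wave 1 AUC of 0.74 reads as "fair" and the wave 4 AUC of 0.68 as "poor".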

Based on cutoffs recommended in the literature for the antenatal and postpartum periods, as well as the ‘best’ cutoff given our sample's statistics, we report the number of true and false negatives and positives, sensitivity and specificity, positive predictive value (PPV) and negative predictive value (NPV), separately for wave 1 and wave 4. In each case, the ‘best’ cutoff was the one with the maximum Youden index (sensitivity + specificity − 1; Youden, 1950), that is, the cutoff at which the sum of sensitivity and specificity is greatest.
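The quantities in Tables 2 and 3 follow from four counts at each cutpoint. This hypothetical Python sketch uses illustrative names such as `CutpointStats` and `best_youden_cutpoint`, which are not from pROC. It treats a score above a half-integer cutpoint as a positive screen, so 6.5 means "7 and above". It then searches midpoints between adjacent observed scores for the maximum Youden index. The usage line recomputes the wave 1 literature-recommended row of Table 2 from its published counts.

```python
from dataclasses import dataclass


@dataclass
class CutpointStats:
    cutpoint: float
    tp: int
    fp: int
    tn: int
    fn: int

    @property
    def sensitivity(self):
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self):
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self):
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self):
        return self.tn / (self.tn + self.fn)

    @property
    def youden(self):
        return self.sensitivity + self.specificity - 1


def cutpoint_stats(case_scores, noncase_scores, cutpoint):
    """Screen positive = EPDS total above the (half-integer) cutpoint."""
    tp = sum(score > cutpoint for score in case_scores)
    fp = sum(score > cutpoint for score in noncase_scores)
    return CutpointStats(cutpoint, tp, fp, len(noncase_scores) - fp, len(case_scores) - tp)


def best_youden_cutpoint(case_scores, noncase_scores):
    """Cutpoint, halfway between adjacent observed scores, that maximizes Youden's J."""
    values = sorted(set(case_scores) | set(noncase_scores))
    candidates = [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]
    if not candidates:
        raise ValueError("need at least two distinct scores")
    return max((cutpoint_stats(case_scores, noncase_scores, c) for c in candidates),
               key=lambda stats: stats.youden)


# Recompute the wave 1 'Literature Recommended' row of Table 2 from its counts
row = CutpointStats(cutpoint=14.5, tp=29, fp=21, tn=666, fn=124)
print(f"sens {row.sensitivity:.2%}, spec {row.specificity:.2%}, "
      f"PPV {row.ppv:.2%}, NPV {row.npv:.2%}, J {row.youden:.2f}")
```

This reproduces the tabulated 18.95% sensitivity, 96.94% specificity, 58.00% PPV, 84.30% NPV and Youden index of 0.16.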

Sensitivity analyses were conducted to investigate the effects on the AUC of (a) the cohort to which participants belonged, (b) whether participants were taking antidepressant medication at wave 1, and (c) the timing of the EPDS and SCID assessments (data from all participants versus participants who completed the EPDS and SCID within 2 weeks of each other). Finally, as the SCID at wave 4 captures participant experiences since the previous SCID at wave 1, we also tested whether using the maximum of the wave 2 and wave 4 EPDS scores, rather than the wave 4 EPDS alone, makes a significant difference to the AUC. For these sensitivity analyses, we used the roc.test function from the pROC package (Robin et al., 2011) to compare AUCs with DeLong's test for two ROC curves, paired or unpaired as appropriate (DeLong et al., 1988; Sun and Xu, 2014).
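For the paired comparison (wave 4 EPDS versus the maximum of the wave 2 and wave 4 EPDS in the same women), DeLong's test estimates the covariance of the two AUCs from per-participant "placement" values. The Python below is a from-scratch, stdlib-only sketch of the paired test as described by DeLong et al. (1988), not pROC's `roc.test` code. The unpaired variant used for the cohort and antidepressant comparisons (reported as D with degrees of freedom) is not shown, and all function names are hypothetical.

```python
import math
from statistics import NormalDist


def _psi(x, y):
    """Kernel: 1 if the case outscores the non-case, 1/2 for a tie, else 0."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _placements(cases, noncases):
    """DeLong structural components V10 (one per case) and V01 (one per non-case)."""
    v10 = [sum(_psi(x, y) for y in noncases) / len(noncases) for x in cases]
    v01 = [sum(_psi(x, y) for x in cases) / len(cases) for y in noncases]
    return v10, v01


def _cov(a, b):
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def delong_paired(cases_1, noncases_1, cases_2, noncases_2):
    """Two-sided DeLong test for two correlated AUCs measured on the same people.

    cases_1[i] and cases_2[i] must be the two scores of the same case participant
    (likewise for non-cases). Returns (auc_1, auc_2, z, p_value).
    """
    if len(cases_1) != len(cases_2) or len(noncases_1) != len(noncases_2):
        raise ValueError("paired test needs both scores for every participant")
    v10_1, v01_1 = _placements(cases_1, noncases_1)
    v10_2, v01_2 = _placements(cases_2, noncases_2)
    auc_1, auc_2 = sum(v10_1) / len(v10_1), sum(v10_2) / len(v10_2)
    m, n = len(cases_1), len(noncases_1)
    # Var(AUC_1 - AUC_2) = [S10_11 + S10_22 - 2 S10_12] / m + [S01_11 + S01_22 - 2 S01_12] / n
    var_diff = (
        (_cov(v10_1, v10_1) + _cov(v10_2, v10_2) - 2 * _cov(v10_1, v10_2)) / m
        + (_cov(v01_1, v01_1) + _cov(v01_2, v01_2) - 2 * _cov(v01_1, v01_2)) / n
    )
    z = (auc_1 - auc_2) / math.sqrt(var_diff)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return auc_1, auc_2, z, p_value
```

The sketch assumes at least two cases and two non-cases, and that the two markers are not identical (otherwise the variance of the difference is zero).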

Results

Descriptive statistics

At recruitment in early pregnancy (wave 1), 164 participants (18.49%) met the diagnostic criteria for Major Depression according to the SCID and 723 participants did not. At 6 months postpartum (wave 4), 173 participants (23.13%) met criteria for Major Depression on the SCID assessing depressive episodes since wave 1 and 575 participants did not. Descriptive statistics for EPDS scores at each wave are reported in Table 1 separated by SCID results at waves 1 and 4.

Table 1.

EPDS descriptive statistics split by W1 and W4 SCID depression results.

EPDS wave	W1 SCID	W4 SCID	N	Mean	SD	95% CI lower	95% CI upper	Median	IQR	Minimum	Maximum
EPDS W1	0	0	490	5.06	4.09	4.70	5.42	4	5.00	0	28
	0	1	105	7.07	4.61	6.18	7.96	7	7.00	0	21
	1	0	63	8.29	5.29	6.95	9.62	7	5.50	0	21
	1	1	54	11.46	5.46	9.97	12.95	12	6.00	0	27
EPDS W2	0	0	482	4.90	3.97	4.55	5.26	4	5.00	0	20
	0	1	99	7.33	5.48	6.24	8.43	7	8.00	0	22
	1	0	62	8.13	5.60	6.71	9.55	7	7.75	0	24
	1	1	52	10.65	5.98	8.99	12.32	10	7.00	0	30
EPDS W4	0	0	435	4.92	3.94	4.54	5.29	4	6.00	0	20
	0	1	81	7.72	5.53	6.49	8.94	7	6.00	0	27
	1	0	57	6.74	4.26	5.61	7.87	6	5.00	0	18
	1	1	49	10.51	6.44	8.66	12.36	11	9.00	0	23
EPDS W5	0	0	378	5.00	3.87	4.60	5.39	4	5.00	0	25
	0	1	70	7.67	5.22	6.43	8.92	7	7.00	0	21
	1	0	48	8.08	5.43	6.51	9.66	6	6.00	0	22
	1	1	32	10.09	5.15	8.24	11.95	10.5	8.00	1	23

The CI of the mean assumes sample means follow a t-distribution with N – 1 degrees of freedom. SCID columns: 0 = did not meet criteria for Major Depression, 1 = met criteria. EPDS = Edinburgh Postnatal Depression Scale; SCID = Structured Clinical Interview for the Diagnostic and Statistical Manual of Mental Disorders; W1 = wave 1, ~12–20 weeks gestation; W2 = wave 2, ~28 weeks gestation; W4 = wave 4, ~6 months postpartum; W5 = wave 5, ~1-year postpartum; CI = confidence interval; SD = standard deviation; IQR = interquartile range.

Comparing SCID diagnoses at wave 1 and wave 4 within participants, 81.90% of participants assessed as not depressed at wave 1 remained in the not depressed group for wave 4 (n = 511), whereas 48.39% of participants classed as depressed using the SCID at wave 1 maintained their diagnosis at wave 4.

Reliability: internal consistency of the EPDS in pregnancy and the postpartum

Cronbach’s alpha for the EPDS was good at all waves in our sample (wave 1 α = 0.86, 95%CI_Feldt = [0.84, 0.87]; wave 2 α = 0.88, 95%CI_Feldt = [0.86, 0.89]; wave 4 α = 0.87, 95%CI_Feldt = [0.85, 0.88]; wave 5 α = 0.86, 95%CI_Feldt = [0.85, 0.87]). No sizable deviations were observed in Cronbach’s alpha with any single item removed from the scale at any wave.

Validity: ROC curve analyses

Early pregnancy < 20 weeks gestation (wave 1)

Area under the curve (AUC) for early pregnancy (wave 1) EPDS against the SCID was 0.74 (95% CI = [0.70, 0.79]), with a best cutoff of 6.5 on the EPDS. Table 2 reports the counts of true negatives, true positives, false negatives and false positives, together with sensitivity, specificity, positive predictive value, negative predictive value and Youden’s index, for the best cutpoint determined for this sample and for the cutoff commonly suggested by previous literature for the antenatal period (Matthey et al., 2006; Murray and Carothers, 1990); full cutoff statistics are available in the Supplementary Materials. The AUC did not differ significantly between participants who were and were not taking antidepressants during pregnancy (D(284.02) = −1.32, p = 0.19, ∆AUC_AD-NoAD = −0.07), nor among the four cohorts recruited into the study (AUC range 0.71–0.78; p range 0.33–0.88). Restricting the sample to EPDS and SCID assessments completed within 14 days of each other also made no significant difference to the AUC (D(625.1) = −0.57, p = 0.57, ∆AUC_{Standard-TimeRestricted} = 0.03). See Supplementary Materials for full statistics for the time-limited ROC.

Table 2.

ROC analysis results for wave 1 EPDS vs SCID.

EPDS cutpoint		TN (n)	TP (n)	FN (n)	FP (n)	Sensitivity (%) (95%CI)	Specificity (%) (95%CI)	PPV (%)	NPV (%)	Youden’s index
6.5	Best	450	106	47	237	69.28 [62.09, 75.82]	65.50 [62.01, 69.14]	30.90	90.54	0.35
14.5	Literature Recommended^a	666	29	124	21	18.95 [13.07, 25.49]	96.94 [95.63, 98.25]	58.00	84.30	0.16

EPDS = Edinburgh Postnatal Depression Scale; TN = true negatives; TP = true positives; FN = false negatives; FP = false positives; PPV = positive predictive value; NPV = negative predictive value; Youden’s Index = sensitivity + specificity – 1.

^a Levis et al. (2020).

Table 3.

ROC analysis results for wave 4 EPDS vs SCID.

EPDS cutpoint		TN (n)	TP (n)	FN (n)	FP (n)	Sensitivity (%) (95% CI)	Specificity (%) (95% CI)	PPV (%)	NPV (%)	Youden’s index
5.5	Best	298	91	39	194	70.00 [62.31, 77.69]	60.57 [56.30, 65.04]	31.93	88.42	0.31
9.5	Literature Recommended^a	422	54	76	70	41.54 [33.06, 50.00]	85.77 [82.52, 88.82]	43.55	84.74	0.27
12.5	Literature Recommended^a	463	32	98	29	24.62 [17.69, 32.31]	94.11 [91.87, 95.93]	52.46	82.53	0.19

^a Levis et al. (2020).

Six months postpartum (wave 4)

Area under the curve (AUC) for 6 months postpartum (wave 4) EPDS against the SCID was 0.68 (95% CI = [0.63, 0.74]), with a best cutoff of 5.5 on the EPDS (see Table 3; full cutoff statistics are available in the Supplementary Materials). The AUC was not significantly different from the AUC for wave 1 (D(1510.1) = 1.55, p = 0.12, ∆AUC_W1-W4 = 0.05), nor when the maximum of the wave 2 and wave 4 EPDS scores was used to account for the longer temporal envelope of the SCID compared with the EPDS (Z = −0.14, p = 0.89, ∆AUC_Standard-Max = −0.002). Restricting the sample to EPDS and SCID assessments completed within 14 days of each other also made no significant difference to the AUC (D(511.7) = 1.25, p = 0.21, ∆AUC_{Standard-TimeRestricted} = −0.06) (see Supplementary Materials).

Discussion

In this study, internal consistency for the EPDS was good. Furthermore, there was no significant difference in the performance of the EPDS for detecting Major Depression in pregnancy and in the postpartum period, tested in the same participants, suggesting that it performs comparably in these two perinatal phases. However, our AUC results suggest that the EPDS is a poor to fair indicator of Major Depression in pregnancy and the postpartum. At the most commonly recommended cutoffs, the positive predictive value of the EPDS was just over 50%. This suggests that for every woman who screens positive on the EPDS in a clinical setting and is appropriately diagnosed with Major Depression, another will screen positive but not receive a diagnosis; that is, about half of all positive screens are false positives. Conversely, the sensitivity and NPVs at these recommended cutoffs mean that some women who should receive a diagnosis of Major Depression will be missed. Our results suggest that if the EPDS is relied upon as the primary way of identifying cases for further referral in a clinical setting, Major Depression will be missed in approximately one in five people who do not trigger a positive screen on the EPDS. This has consequences for individuals who may be misclassified, and for their families (Krantz et al., 2008). In addition, clinical services face a substantial resourcing burden, because the number of women who screen positive and require further assessment is roughly double the number of true cases among them.

The ‘best’ cutoffs for our sample in the antenatal (6.5, i.e. 7 and above) and postpartum (5.5, i.e. 6 and above) periods are substantially lower than the cutoffs typically recommended in the literature: 14.5 (15 and above) for the antenatal period and 12.5 (13 and above) for the postpartum period (Matthey, 2010). They are also below the optimal cutoffs from two recent meta-analyses, 11 and above for both pregnancy and the postpartum (Levis et al., 2020) and 10 and above in pregnancy (Rondung et al., 2024), which are themselves notably below the traditionally recommended cutoffs. It is important to keep in mind that these optimal cutoffs are determined by sensitivity and specificity, which may not be the most important criteria for clinical application (Trevethan, 2017). Regardless, if sensitivity and specificity (and therefore AUC) are a measure of the foundational validity of a screening tool (Trevethan, 2017), then our results suggest that there is much room for improvement in the EPDS as a screening tool for perinatal depression in Australia. We acknowledge, however, that even if administration of a gold-standard diagnostic interview for Major Depression were feasible at a universal level during pregnancy and the first postpartum year, this would not necessarily ensure the best outcomes for women and their families unless diagnosis is followed by timely access to evidence-based treatment. In our study, almost half of the women diagnosed with Major Depression at wave 1 were diagnosed with Major Depression again at wave 4, suggesting that these women had either not accessed appropriate support and/or treatment or had not responded to treatment.

At a cutoff of 12.5 in pregnancy, Rondung et al. (2024) found a sensitivity of .61 (substantially higher than in our sample), specificity roughly equivalent to ours, a PPV of .39 (lower than in our sample) and an NPV of .98 (higher than in our sample). Levis et al. (2019), as in our study, found no notable difference in ROC performance between pregnancy and the postpartum, and found sensitivity and specificity comparable with Rondung et al. (2024), but did not report PPV or NPV. Despite some differences from these recent large meta-analyses, our values are not very different from the range reported in some other Australian validation studies, such as a study of postpartum women (n = 238) by Matthey et al. (2001), although other studies found sensitivity slightly higher than our upper 95% confidence limit. Barnett et al. (1999) also found poor PPV for detecting Major Depression from the EPDS in Anglo-Celtic (n = 105, PPV_{12.5 cutoff} = 40%), Arabic (n = 98, PPV_{12.5 cutoff} = 38.5%) and Vietnamese (n = 113, PPV_{12.5 cutoff} = 50%) postpartum women in Australia. At a cutoff similar to our optimal cutoff (5.5), Austin et al. (2005) reported an antenatal PPV of 12.4% (n = 1296).

Overall, despite the low sensitivity seen in our sample when using the usual cutoff of 13 and above, we suggest that the higher PPV at this value is more important when relying on the EPDS for clinical purposes, and that the recommended cutoff for use in Australia should not be changed as a result of our findings. Based on our findings, however, the Centre of Perinatal Excellence (COPE) (2023) recommendation to arrange further assessment of perinatal women with an EPDS score of 13 or more would generate roughly twice as many further assessments as cases identified, requiring considerable mental health service resources if these referrals resulted in further mental healthcare consultations. In reality, not all women who are referred for further investigation or treatment following an EPDS administration go on to access professional mental health assessment and care (Lee-Carbon et al., 2022). Nevertheless, this highlights that clinicians should not rely on a screening measure such as the EPDS alone to inform referral for further mental health assessment and care; further enquiry and assessment are needed to clarify need and ensure appropriate referral. Based on our findings, about one in five individuals scoring less than 13 on the EPDS may need support and would likely receive a diagnosis, yet may be missed under this cutoff rule. These women may not access specialized mental healthcare, from which they would almost certainly benefit.

Early child and family health service nurses, as well as general practitioners, midwives and obstetricians, who all provide care for women in the perinatal period, would benefit from further training in the identification and management of anxiety and depression among women; this may help ensure that only women with significant clinical needs are referred and that women with significant symptoms and impairment are not overlooked. In addition, to increase the accuracy of women’s responses, health professionals should not rely on screening measures alone but should ensure further enquiry, assessment and a trusted environment in which mental health symptoms can be clarified and support provided (Kingston et al., 2015).

In his 30-year reflection on the use of the EPDS, the original author, Cox (2019), reiterates that the use of the EPDS should be limited to its original intended purpose as a screening measure for maternal depression in maternity services. He also recommends that the scale’s validity and appropriate cutoffs be confirmed for each population and, importantly, that screening be combined with follow-up consultation with the mother about her mental health before any mental health referral (Cox, 2019). Given the relatively poor PPV in our sample, we echo these suggestions: the EPDS should not be used as a one-step screening tool that alone determines the next course of action, and we encourage the use of a range of tools and consultation before progressing to a full mental health evaluation. As per the current guidelines for the provision of mental health care in the perinatal period in Australia (COPE, 2023) and the United Kingdom (The National Institute for Health and Care Excellence [NICE], 2014), screening should always take place as part of an integrated psychosocial health assessment.

We acknowledge that, despite its pitfalls, the EPDS may be the best available tool for clinical screening, and we recommend ensuring culturally appropriate translations where relevant, or alternative screening for Aboriginal and Torres Strait Islander women and families (Barnett et al., 1999; Blackmore et al., 2022; Campbell et al., 2008; Marley et al., 2017). Formalized screening protocols using short self-report measures such as the EPDS, within early child and family health services where in-depth mental health knowledge is not required of practitioners, do increase the likelihood that women with mental health difficulties are identified, despite the reservations and concerns we outline above.

Limitations

Limitations to consider are the differing reference periods of the EPDS (the past 7 days) and the SCID (months to years), and the time between assessments, as approximately half of our data were collected with more than 2 weeks between the EPDS and SCID assessments. However, our results suggest that these temporal factors matter little to the validity of the EPDS in terms of sensitivity and specificity: there were no significant differences in AUC between the pregnancy and postpartum timepoints (which have very different diagnostic timeframes), when considering the maximum EPDS over the period covered by the postpartum interview, or when limiting analyses to assessments completed within a fortnight. Similarly, because of the high bar set for meeting diagnostic criteria on the SCID (and therefore its reference standard status), there are likely some depressive episodes between the two interviews that could not be retrospectively confirmed as meeting this threshold at the later interview. This means that our wave 4 ‘depressed’ reference classification in particular may undercount true episodes, with some missed episodes classed as ‘not depressed’. However, given that our PPV is very much in line with previous literature using a range of diagnostic interviews covering a range of assessment periods, and that sensitivity is low (reference cases are not being reliably detected), this imprecision in the SCID is unlikely to change our conclusions.

Different reference standards may affect the outcome of a validity analysis (Levis et al., 2019). However, both Levis et al. (2020) and Rondung et al. (2024) showed that the EPDS performed best against semi-structured interviews such as the SCID.

Our ROC analysis considers the validity of the EPDS in detecting only Major Depression, but the EPDS may also be useful for detecting other mental health concerns, given its identification of general distress (Galbally et al., 2023; Tendais et al., 2014), anxiety (Matthey, 2008; Rondung et al., 2024; Rowe et al., 2008) and personality disorders (Judd et al., 2019). The lack of research on the discriminant validity of the EPDS highlights this gap (McBride et al., 2014). Future research should develop a deeper understanding of the EPDS for detecting any mental health condition, as opposed to depression alone.

Because it is drawn from the MPEWS cohorts, our sample is representative of neither the general population nor a strictly clinical one; rather, it oversamples women taking antidepressants and those with suspected or confirmed depression at recruitment. Our results should therefore not be used to derive prevalence estimates of Major Depression in Australia. This approach is similar to sampling methods used previously (Boyce et al., 1993) and provides a sufficient base rate of Major Depression to adequately assess the EPDS as a screening measure. Our sample was recruited as English-speaking and, while it included women for whom English was a second language, we did not assess any translated versions of the EPDS. Only 10% of our sample were born outside Australia and spoke a language other than English at home; this is considerably lower than the proportion of people in Australia who spoke a language other than English in 2021 (23%) (Australian Institute of Health and Welfare [AIHW], 2022). This study also did not include fathers in the validation of the EPDS.
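The base rate matters because predictive values, unlike sensitivity and specificity, depend directly on prevalence. Under the usual assumption that sensitivity and specificity stay fixed as prevalence changes, PPV and NPV follow from Bayes' theorem. The Python sketch below uses hypothetical sensitivity and specificity values (not this study's estimates) to show PPV rising and NPV falling as the base rate increases; the function name is illustrative.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a test with fixed sensitivity/specificity at a given base rate."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    tp = sensitivity * prevalence               # true positives, as population proportions
    fp = (1 - specificity) * (1 - prevalence)   # false positives
    fn = (1 - sensitivity) * prevalence         # false negatives (missed cases)
    tn = specificity * (1 - prevalence)         # true negatives

    def safe_div(num: float, den: float) -> float:
        # Undefined when nobody screens positive (or negative); report NaN.
        return num / den if den > 0 else float("nan")

    return safe_div(tp, tp + fp), safe_div(tn, tn + fn)


# Hypothetical accuracy values chosen only to show the base-rate effect;
# they are NOT this study's estimates.
SENS, SPEC = 0.70, 0.90
for prev in (0.05, 0.10, 0.20, 0.30):
    ppv, npv = predictive_values(SENS, SPEC, prev)
    print(f"prevalence {prev:.0%}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

With these illustrative values, PPV at 10% prevalence is 0.4375, and it climbs as prevalence rises. An enriched sample therefore tends to show a higher PPV than the same test would achieve in an unselected perinatal population.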

Conclusion

Our study is, to date, the largest dedicated validation study in Australia to test the EPDS against a semi-structured diagnostic interview, and it considers the performance of the EPDS across two states, including rural areas, and in both pregnancy and the postpartum. It found poor to fair performance for detecting Major Depression against a semi-structured diagnostic clinical interview reference standard (the SCID). Clinical services that rely on the EPDS alone to determine referral for mental health care will expend considerable resources for a less than desirable detection rate and may miss people who need help with their mental health in the perinatal period. Although the EPDS has been used in some research to report the prevalence of depression, depression diagnoses or depressive symptoms, we recommend, in accordance with the intention of its developers and the national guidelines, that the EPDS be used as an initial screening tool within any universal approach in maternity and early child and family health services, and always alongside an additional comprehensive psychosocial assessment. Finally, it is critical to remember that screening is only likely to be beneficial if, when indicated, mental health assessment and treatment follow.
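The resource burden can be made concrete with simple arithmetic on the PPVs reported for the recommended cutoffs (52% postpartum at 13 or above, and 58% antenatally at 15 or above). The Python sketch below converts each PPV into the expected numbers of women with and without Major Depression per 100 who screen positive. The function name is hypothetical, and because PPV depends on base rate, the counts assume a service population with a prevalence similar to this enriched sample.

```python
from typing import Tuple

# PPVs at the recommended EPDS cutoffs, as reported in this article's abstract.
REPORTED_PPV = {
    "postpartum, EPDS 13 or above": 0.52,
    "antenatal, EPDS 15 or above": 0.58,
}


def split_screen_positives(ppv: float, n_positive: int = 100) -> Tuple[int, int]:
    """Expected (with Major Depression, without) among n_positive screen-positive women."""
    if not 0.0 <= ppv <= 1.0:
        raise ValueError("PPV must lie in [0, 1]")
    cases = round(ppv * n_positive)
    return cases, n_positive - cases


for setting, ppv in REPORTED_PPV.items():
    cases, non_cases = split_screen_positives(ppv)
    # Women without Major Depression who proceed to assessment for each case confirmed.
    per_case = non_cases / cases if cases else float("inf")
    print(f"{setting}: {cases} with and {non_cases} without Major Depression "
          f"per 100 screen-positive women (~{per_case:.2f} non-cases per case)")
```

On these figures, roughly 0.9 (postpartum) and 0.7 (antenatal) women without Major Depression would proceed to further assessment for each case confirmed. Missed cases depend on sensitivity and are not captured by this calculation.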

Supplemental Material

sj-docx-1-anp-10.1177_00048674251361756 – Supplemental material for Screening for perinatal depression in Australia: Validation of the Edinburgh Postnatal Depression Scale in pregnancy and the postpartum

Supplemental material, sj-docx-1-anp-10.1177_00048674251361756 for Screening for perinatal depression in Australia: Validation of the Edinburgh Postnatal Depression Scale in pregnancy and the postpartum by Kelsey Perrykkad, Isobel Nicholls, Andrew Lewis, Philip Boyce, Karen Wynter, Irene Bobevski and Megan Galbally in Australian & New Zealand Journal of Psychiatry

Footnotes

Acknowledgements

The authors would like to thank those who have both supported and given advice in the development of MPEWS, including Michael Permezel and Marinus van IJzendoorn. The authors also thank the staff and students on the study and the research coordinators for their contribution to MPEWS. We are also sincerely grateful to the study participants who have contributed a substantial amount of time to this study to date, many of whom continue to dedicate time and effort many years later.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship and/or publication of this article: This study is supported through the 2012 National Priority Funding Round of Beyond Blue in a 3-year research grant (ID 519240) and a 2015 National Health and Medical Research Council (NHMRC) project grant for 5 years (APP1106823).

Ethics Approval

The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008. This project has ethics approval with Mercy Health Human Research Ethics Committee Reference R08/22 and with WA Health South Metropolitan Human Research Ethics Committee Reference 2016-192. All participants provided written informed consent to participate in this study.

Data Availability

The data that support the findings of this study are available on request from the corresponding author, MG. The data are not publicly available due to ethics restrictions.

References

Austin (2014) Marcé International Society position statement on psychosocial assessment and depression screening in perinatal women. Best Practice & Research Clinical Obstetrics & Gynaecology 28: 179–187.

Austin, Hadzi-Pavlovic, Saint, et al. (2005) Antenatal screening for the prediction of postnatal depression: Validation of a psychosocial Pregnancy Risk Questionnaire. Acta Psychiatrica Scandinavica 112: 310–317.

Australian Institute of Health and Welfare (AIHW) (2022) Culturally and Linguistically Diverse Australians. Bruce, ACT, Australia: AIHW.

Barnett, Matthey and Gyaneshwar (1999) Screening for postnatal depression in women of non-English speaking background. Archives of Women’s Mental Health 2: 67–74.

Blackmore, Gibson-Helm, Melvin, et al. (2022) Validation of a Dari translation of the Edinburgh Postnatal Depression Scale among women of refugee background at a public antenatal clinic. Australian and New Zealand Journal of Psychiatry 56: 525–534.

Boyce and Hickey (2005) Psychosocial risk factors to major depression after childbirth. Social Psychiatry and Psychiatric Epidemiology 40: 605–612.

Boyce, Stubbs and Todd (1993) The Edinburgh Postnatal Depression Scale: Validation for an Australian sample. Australian and New Zealand Journal of Psychiatry 27: 472–476.

Campbell, Hayes and Buckby (2008) Aboriginal and Torres Strait Islander women’s experience when interacting with the Edinburgh Postnatal Depression Scale: A brief note. The Australian Journal of Rural Health 16: 124–131.

Carter, Pan, Rai, et al. (2016) ROC-ing along: Evaluation and interpretation of receiver operating characteristic curves. Surgery 159: 1638–1645.

Cox (2019) Thirty years with the Edinburgh Postnatal Depression Scale: Voices from the past and recommendations for the future. The British Journal of Psychiatry 214: 127–129.

Cox, Holden and Sagovsky (1987) Detection of postnatal depression. Development of the 10-item Edinburgh Postnatal Depression Scale. The British Journal of Psychiatry 150: 782–786.

Deave, Heron, Evans, et al. (2008) The impact of maternal depression in pregnancy on early child development. BJOG 115: 1043–1051.

DeLong, DeLong and Clarke-Pearson (1988) Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach. Biometrics 44: 837–845.

Eapen, Johnston, Apler, et al. (2013) Adult separation anxiety during pregnancy and its relationship to depression and anxiety. Journal of Perinatal Medicine 41: 159–163.

El-Den, Pham, Anderson, et al. (2022) Perinatal depression screening: A systematic review of recommendations from member countries of the Organisation for Economic Co-operation and Development (OECD). Archives of Women’s Mental Health 25: 871–893.

First, Spitzer, Gibbon, et al. (1997) Structured Clinical Interview for DSM-IV Axis I Disorders (SCID-I), Research Version (Administration Booklet). Arlington, VA: American Psychiatric Publishing.

First, Williams, Karg, et al. (2015) Structured Clinical Interview for DSM-5, Research Version (SCID-5 for DSM-5, Research Version; SCID-5-RV). Arlington, VA: American Psychiatric Association.

Fisher, Wynter and Rowe (2010) Innovative psycho-educational program to prevent common postpartum mental disorders in primiparous women: A before and after controlled study. BMC Public Health 10: 1–15.

Galbally, Van IJzendoorn, Permezel, et al. (2017) Mercy Pregnancy and Emotional Well-being Study (MPEWS): Understanding maternal mental health, fetal programming and child development. Study design and cohort profile. International Journal of Methods in Psychiatric Research 26: e1558.

Galbally, Watson, Boyce, et al. (2023) Perinatal depression: The use of the Edinburgh Postnatal Depression Scale to derive clinical subtypes. Australian and New Zealand Journal of Psychiatry 58: 37–48.

Hickey, Boyce, Ellwood, et al. (1997) Early discharge and risk for postnatal depression. Medical Journal of Australia 167: 244–247.

Highet and the Expert Working Group and Expert Subcommittees (2023) Mental Health Care in the Perinatal Period: Australian Clinical Practice Guideline. Melbourne, VIC, Australia: Centre of Perinatal Excellence (COPE).

Judd, Lorimer, Thomson, et al. (2019) Screening for depression with the Edinburgh Postnatal Depression Scale and finding borderline personality disorder. Australian and New Zealand Journal of Psychiatry 53: 424–432.

Kiewa, Meltzer-Brody, Milgrom, et al. (2022) Perinatal depression is associated with a higher polygenic risk for major depressive disorder than non-perinatal depression. Depression and Anxiety 39: 182–191.

Kingston, Austin, Heaman, et al. (2015) Barriers and facilitators of mental health screening in pregnancy. Journal of Affective Disorders 186: 350–357.

Krantz, Eriksson, Lundquist-Persson, et al. (2008) Screening for postpartum depression with the Edinburgh Postnatal Depression Scale (EPDS): An ethical analysis. Scandinavian Journal of Public Health 36: 211–216.

Lee-Carbon, Nath, Trevillion, et al. (2022) Mental health service use among pregnant and early postpartum women. Social Psychiatry and Psychiatric Epidemiology 57: 2229–2240.

Levis, McMillan, Sun, et al. (2019) Comparison of major depression diagnostic classification probability using the SCID, CIDI, and MINI diagnostic interviews among women in pregnancy or postpartum: An individual participant data meta-analysis. International Journal of Methods in Psychiatric Research 28: e1803.

Levis, Negeri, Sun, et al. (2020) Accuracy of the Edinburgh Postnatal Depression Scale (EPDS) for screening to detect major depression among pregnant and postpartum women: Systematic review and meta-analysis of individual participant data. BMJ 371: m4022.

Lien (2007) The Prediction of Antenatal and Postnatal Depression in a Sample of Western Australian Women. Joondalup, WA, Australia: Edith Cowan University.

Lyubenova, Neupane, Levis, et al. (2021) Depression prevalence based on the Edinburgh Postnatal Depression Scale compared to structured clinical interview for DSM disorders classification: Systematic review and individual participant data meta-analysis. International Journal of Methods in Psychiatric Research 30: e1860.

Maj, Stein, Parker, et al. (2020) The clinical characterization of the adult patient with depression aimed at personalization of management. World Psychiatry 19: 269–293.

Marley, Kotz, Engelke, et al. (2017) Validity and acceptability of Kimberley Mum’s Mood Scale to screen for perinatal anxiety and depression in remote aboriginal health care settings. PLoS ONE 12: e0168969.

Matthey (2008) Using the Edinburgh Postnatal Depression Scale to screen for anxiety disorders. Depression and Anxiety 25: 926–931.

Matthey (2010) Are we overpathologising motherhood? Journal of Affective Disorders 120: 263–266.

Matthey, Barnett, Kavanagh, et al. (2001) Validation of the Edinburgh Postnatal Depression Scale for men, and comparison of item endorsement with their partners. Journal of Affective Disorders 64: 175–184.

Matthey, Henshaw, Elliott, et al. (2006) Variability in use of cut-off scores and formats on the Edinburgh Postnatal Depression Scale: Implications for clinical and research practice. Archives of Women’s Mental Health 9: 309–315.

McBride, Wiens, McDonald, et al. (2014) The Edinburgh Postnatal Depression Scale (EPDS): A review of the reported validity evidence. In: Zumbo and Chan EKH (eds) Validity and Validation in Social, Behavioral, and Health Sciences. Cham: Springer, pp. 157–174.

Milgrom, Ericksen, Negri, et al. (2005) Screening for postnatal depression in routine primary care: Properties of the Edinburgh Postnatal Depression Scale in an Australian sample. Australian and New Zealand Journal of Psychiatry 39: 833–839.

Murray and Carothers (1990) The validation of the Edinburgh Post-natal Depression Scale on a community sample. The British Journal of Psychiatry 157: 288–290.

Pawlby, Sharp, Hay, et al. (2008) Postnatal depression and child outcome at 11 years: The importance of accurate diagnosis. Journal of Affective Disorders 107: 241–245.

Phillips, Charles, Sharpe, et al. (2009) Validation of the subscales of the Edinburgh Postnatal Depression Scale in a sample of women with unsettled infants. Journal of Affective Disorders 118: 101–112.

Putnam, Wilcox, Robertson-Blackmore, et al. (2017) Clinical phenotypes of perinatal depression and time of symptom onset: Analysis of data from an international consortium. The Lancet Psychiatry 4: 477–485.

R Core Team (2023) R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.

Robin, Turck, Hainard, et al. (2011) pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 12: 77.

Rondung, Massoudi, Nieminen, et al. (2024) Identification of depression and anxiety during pregnancy: A systematic review and meta-analysis of test accuracy. Acta Obstetricia et Gynecologica Scandinavica 103: 423–436.

Rowe, Fisher JRW and Loh (2008) The Edinburgh Postnatal Depression Scale detects but does not distinguish anxiety disorders from depression in mothers of infants. Archives of Women’s Mental Health 11: 103–108.

Sun and Xu (2014) Fast implementation of DeLong’s algorithm for comparing the areas under correlated receiver operating characteristic curves. IEEE Signal Processing Letters 21: 1389–1393.

Tendais, Costa, Conde, et al. (2014) Screening for depression and anxiety disorders from pregnancy to postpartum with the EPDS and STAI. The Spanish Journal of Psychology 17: E7.

The Centre of Perinatal Excellence (COPE) (2023) Effective Mental Health Care in the Perinatal Period: Australian Clinical Practice Guideline (2023 Revision). Melbourne, VIC, Australia: The Centre of Perinatal Excellence.

The National Institute for Health and Care Excellence (NICE) (2014) Antenatal and Postnatal Mental Health: Clinical Management and Service Guidance (NICE Guideline no. 192). London: NICE.

Trevethan (2017) Sensitivity, specificity, and predictive values: Foundations, pliabilities, and pitfalls in research and practice. Frontiers in Public Health 5: 307.

Weinstein, Berwick, Goldman, et al. (1989) A comparison of three psychiatric screening tests using Receiver Operating Characteristic (ROC) Analysis. Medical Care 27: 593–607.

Youden (1950) Index for rating diagnostic tests. Cancer 3: 32–35.
