Abstract
The majority of women with postnatal depression (PND) do not seek help [1]. Even when women do consult a health care professional, half still remain untreated [2]. Postnatal depression therefore goes untreated in most cases, yet the distress it causes is often acute. The adverse long-term consequences of PND are serious and have been reported elsewhere [3]. Pharmacological and psychotherapeutic treatments for PND are available, and there is accumulating evidence of their effectiveness [4], [5]. However, efficient identification is first necessary to enable appropriate intervention. Although the Edinburgh Postnatal Depression Scale (EPDS) is one of the most widely advocated screening instruments [6], debate continues about screening programs for PND in general and about the EPDS as an adequate instrument for use in such programs [7], [8]. Essential to progress in this debate is clarity about:
how to strike the optimal balance of sensitivity (true-positive rate) and specificity (true-negative rate) of the EPDS in routine primary care settings;
whether detection of depression can be improved by using the EPDS in concert with other validated instruments;
whether administration of the EPDS is feasible in routine care and acceptable to its target populations.
Although there are numerous EPDS validation studies, they tend to focus on specific target populations [9–11]. The diagnostic utility of the EPDS has rarely been explored in routine primary care and in sufficiently large, community-wide samples representing a cross-section of demographic groups. Although some very large population screening studies have used the EPDS, they have usually not been concerned with formal validation against clinical diagnostic criteria [12]. Consequently, explicit EPDS validation studies in large samples (n>1000) remain relatively rare [13], [14]. If universal screening with the EPDS is to be advocated, then it should be on the basis of such large, demographically representative studies.
This study reports on the prevalence of PND assessed by the EPDS in a large Australian sample in routine primary care. Using the subset of those who screened positive, we aim to improve our understanding of the utility of the EPDS by evaluating its predictive value in relation to psychiatric diagnostic profile and by comparing its accuracy with that of a widely used diagnostic instrument, the Beck Depression Inventory (BDI). The advantages of a two-step screening process are discussed, together with issues surrounding successful community-wide screening.
Method
Procedure
The screened population consisted of 4148 newly delivered mothers attending 47 Maternal and Child Health Centres (M&CHCs) in northern metropolitan Melbourne and in rural eastern Victoria, Australia, over a 3-year period. Nurses at the M&CHCs had contact with all newly delivered mothers, were well placed to screen a wide demographic range and used the EPDS at 4-month post-partum consultations. Nurses did not themselves elicit the responses or administer the EPDS, which was self-administered by women, thus requiring no interpretation except summing of scores. Nurses remained blind to subsequent clinical assessment procedures. We screened at 4 months to allow transient mood disturbances (including ‘maternity blues’) to resolve and because M&CHCs schedule longer consultations at this time, attendance at which is generally high. Informed, written consent was obtained from all participants. Those unable to give informed consent or who had difficulty with English were excluded. The EPDS has been validated using various cut-offs [16]. In general, a lower cut-off yields both higher true-positive and higher false-positive rates [25]. At both initial screening and with the repeat EPDS given at clinical assessment, we used the cut-off score of 11.5 used by Gerrard et al. [15], to yield a lower false-negative rate than the more widely used 12.5 cut-off [16]. Those who had EPDS scores ≥12 were offered clinical assessment with a psychologist involving a structured interview and diagnosis followed by completion of a second EPDS and the BDI. Those diagnosed as depressed were offered a range of treatment options. Time between screening and diagnosis averaged 1–2 weeks.
Measures
The Edinburgh Postnatal Depression Scale
Designed by Cox et al. [6], this self-rated 10-item instrument requires women to read 10 statements relating to emotional health and for each statement choose one of four possible responses. This takes approximately 5 minutes. The responses are rated from 0 to 3 and summed to yield the score. The EPDS was developed specifically to screen for PND and has been used extensively with antenatal and postnatal women [16]. It is particularly useful in PND because it is short and does not rely on somatic symptoms, which are common in postnatal women irrespective of depression. In an Australian sample the EPDS was found to have 100% sensitivity and 89% specificity at a cut-off score of 12.5 [9].
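The scoring rule just described can be sketched in a few lines. This is a minimal illustration only, assuming that each response has already been converted to its 0 to 3 rating; the function names are hypothetical, and the default threshold reflects the 11.5 cut-off stated in the Procedure (integer totals of 12 or more), not the 12.5 cut-off of the Australian validation sample.

```python
from typing import Sequence

EPDS_ITEMS = 10        # number of statements in the scale
EPDS_SCREEN_MIN = 12   # integer totals >= 12 lie above the 11.5 cut-off


def epds_total(responses: Sequence[int]) -> int:
    """Sum ten item ratings, each of which must be an integer from 0 to 3."""
    if len(responses) != EPDS_ITEMS:
        raise ValueError(f"expected {EPDS_ITEMS} responses, got {len(responses)}")
    if any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("each response must be rated 0, 1, 2 or 3")
    return sum(responses)


def screens_positive(total: int, minimum: int = EPDS_SCREEN_MIN) -> bool:
    """Return True when the total reaches the screening threshold."""
    return total >= minimum
```

Because totals are whole numbers, a cut-off of 11.5 and a rule of "12 or more" select the same women; passing `minimum=13` would correspond to the 12.5 cut-off.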
The Composite International Diagnostic Interview
The Composite International Diagnostic Interview (CIDI) is a validated, standardized, structured interview that yields psychiatric diagnoses according to DSM criteria [17], [18]. We applied Section E of the CIDI, which deals with depression, to classify women according to current diagnosis on seven DSM-IV categories that include some component of depression, namely: major depressive disorder; depressive disorder not otherwise specified; adjustment disorder with depressed mood; mixed anxiety-depressive disorder; bipolar disorder; psychotic disorder and dysthymic disorder. All women exhibiting transient distress not fitting criteria for clinical diagnosis (i.e. psychiatric non-cases) were included in a category that we termed ‘non-pathological mood fluctuation’.
The Beck Depression Inventory
The BDI is a widely used, well-validated, 21-item clinical instrument that measures cognitive, affective and physiological factors to assess severity of depression. The 21 items rate, from 0 to 3, different symptoms and attitudes found in clinical practice to be associated with depression, for example, pessimism, guilt, suicidal ideas, social withdrawal, insomnia, crying and irritability. In the last four decades the BDI has been validated on many psychiatric and normative populations [19]. The original form of the instrument [20] was revised with the aim of increasing agreement with DSM criteria, simplifying the response options and removing double-negative statements [21]. The revised BDI used here has been applied extensively in clinical settings and its psychometric properties have been well characterized. Internal consistency estimates (Cronbach's α) range from 0.79 to 0.9. Good content, construct and criterion validity are also reported [19]. More recently, the BDI has been revised once more and is published as the BDI-II [22].
Demographics
At clinical assessment, participants also completed a questionnaire covering depressive history and demographic information.
Statistical analysis
Frequency data were assessed by χ2 tests. Parameter estimates are given with 95% confidence intervals (CIs). Logistic regression models were fitted to data and regression coefficients (i.e. β=log[OR]) expressed in terms of ORs (i.e. exp[β]). Receiver operating characteristic (ROC) curves were generated with BDI and EPDS data, and areas under the curves (AUCs) were compared statistically using the U-test method of De Long et al. [23]. Calculations were performed in SAS, Analyse-it and SPSS.
Results
Screening
Despite resource constraints on Maternal and Child Health Nurses (MCHNs), compliance with screening was generally high, with MCHNs being able to implement the screening procedure with more than 80% of all new mothers (M. Van Gemert, personal communication). In total, 4148 women were screened. No complaints were received and a short telephone survey confirmed the acceptability of the screening process. Of those screened, 533 had EPDS scores ≥12. The mean EPDS score of these 533 women was 16.6 (CI=16.3–16.9). Thus, as defined by the EPDS score, the overall prevalence of PND at screening was 12.85% (CI=11.8–13.9).
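The prevalence estimate can be checked directly from the two counts given above (533 of 4148). The sketch below assumes a normal-approximation (Wald) interval for a proportion; the text does not state which interval method was used, so this is an assumption, and the function name is illustrative.

```python
import math


def proportion_ci(successes: int, n: int, z: float = 1.96):
    """Return (estimate, lower, upper) using a Wald interval for a proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


# Counts taken from the screening results: 533 of 4148 women scored >= 12.
p, lower, upper = proportion_ci(533, 4148)
print(f"{100 * p:.2f}% (CI={100 * lower:.1f}-{100 * upper:.1f})")
# prints: 12.85% (CI=11.8-13.9)
```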
Participant characteristics
Of the 533 participants who scored ≥12 on the first EPDS at screening, 344 had a structured clinical assessment using DSM-IV criteria, then completed both a second EPDS and the BDI. This group had an average EPDS score at screening of 16.97 (CI=16.6–17.3), while the 189 women who declined or were unable to attend clinical assessment had a mean score of 16.1 (CI=15.6–16.7). Although a significance test seems unnecessary, it seems logical that this difference of less than one point (significant or not) is due to lower uptake of clinical assessment among lower-scoring women. Demographic data revealed the following: of those assessed 41.5% were first-time mothers; English was the first language of 97.4%; most (59%) had completed 6 years of secondary schooling; 30% had higher education; 51% of infants were male; mean age of infants was 17.9 weeks (CI=16.8–19); mean age of mothers was 30.1 years (CI=29.5–30.7); 88% were married or in de facto relationships, 12% solitary and 86% of partners had full-time work. Average income (in thousands of Australian dollars) was 41.4 (CI=39.4–43.1). Self-reported history of depression during pregnancy was strongly predictive of current depression (OR=4.52, CI=2.0–10.2).
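The odds ratio and interval reported for depressive history can be read back onto the log-odds scale used by logistic regression (β=log[OR]). The sketch below assumes the interval is a Wald interval that is symmetric in log(OR), which is the usual construction but is not stated in the text; the function name is hypothetical.

```python
import math


def log_scale_summary(or_est: float, ci_low: float, ci_high: float, z: float = 1.96):
    """Return (beta, standard error, geometric midpoint of the CI limits)."""
    beta = math.log(or_est)
    # On the log scale the limits are beta -/+ z * SE, so SE is the width over 2z.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    # If the interval is symmetric in log(OR), this midpoint equals the OR.
    midpoint_or = math.sqrt(ci_low * ci_high)
    return beta, se, midpoint_or


beta, se, midpoint = log_scale_summary(4.52, 2.0, 10.2)
print(f"beta={beta:.2f}, SE={se:.2f}, midpoint OR={midpoint:.2f}")
```

The geometric midpoint of the limits lands very close to the reported OR of 4.52, which is what a log-symmetric interval would produce.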
Positive predictive value of the Edinburgh Postnatal Depression Scale as a community-wide screening tool
We calculated the positive predictive value (PPV) of the initial EPDS, administered by MCHNs, according to the standard formula: PPV = true positives / (true positives + false positives).
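As a worked instance, PPV is the number of true positives divided by all screen positives (true positives plus false positives). The sketch below applies this ratio to the only complete pair of counts available in the text, namely the Discussion's statement that 39 of the 344 assessed women received no DSM-IV diagnosis of any kind; it therefore illustrates PPV for psychiatric caseness, not for depression specifically. The function name is illustrative.

```python
def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """PPV = true positives / (true positives + false positives)."""
    total = true_pos + false_pos
    if total == 0:
        raise ValueError("no screen-positive cases")
    return true_pos / total


assessed = 344      # women scoring >= 12 who were clinically assessed
no_disorder = 39    # of these, women with no DSM-IV diagnosis (false positives)
caseness_ppv = positive_predictive_value(assessed - no_disorder, no_disorder)
print(f"PPV for any DSM-IV disorder: {caseness_ppv:.1%}")
```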
Properties of the Edinburgh Postnatal Depression Scale and the Beck Depression Inventory at clinical assessment (post-screening)
Clinical profile
Table 1. Diagnostic profile of women scoring ≥12 on the Edinburgh Postnatal Depression Scale (EPDS)
† Coded as ‘all depressions’ for purposes of binary logistic regression, see text. Tabulated values are mean Edinburgh Postnatal Depression Scale and mean Beck Depression Inventory (BDI) scores, cross-tabulated by DSM-IV diagnostic category. NOS, not otherwise specified.
Diagnostic accuracy
To formally compare the diagnostic accuracies of the EPDS and BDI in the prescreened population of 344 women, we collapsed all the categories into a binary variable coded either as ‘all depressions’ combined or as ‘all other cases’ combined. Separate logistic regression models using EPDS scores and BDI scores as the independent variables were fitted to these data. All regression coefficients were significant (p<0.0001 in all cases) and were transformed to ORs (exp[β]). In terms of predicting depression per se, the second EPDS and BDI administered at clinical assessment performed similarly (EPDS: exp[β]=1.41, CI=1.30–1.53; BDI: exp[β]=1.42, CI=1.31–1.54). Thus, for a one-point increase in either independent variable, the odds in favour of a clinical diagnosis that included depression rose by a factor of approximately 1.4.
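Because the odds ratio applies per point, the effect of a larger score difference compounds multiplicatively. The following sketch shows the arithmetic for the reported EPDS odds ratio of 1.41; the five-point difference is an arbitrary example, not a figure from the study, and the function name is illustrative.

```python
import math


def odds_multiplier(or_per_point: float, points: float) -> float:
    """Factor by which the odds change for a given score difference.

    Equivalent to exp(beta * points) with beta = log(OR).
    """
    beta = math.log(or_per_point)
    return math.exp(beta * points)


# Example: a five-point higher EPDS score at the reported OR of 1.41 per point.
print(f"{odds_multiplier(1.41, 5):.2f}")
```

Under this model, a five-point difference multiplies the odds of a depressive diagnosis by roughly 5.6, assuming the log-odds are linear in the score across that range.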
Receiver operating characteristic curves were generated for both instruments administered at psychiatric assessment. De Long et al. [23] provide a method for deriving a Mann–Whitney U-statistic to test the hypothesis that the AUCs of two ROC curves are non-identical. Applying this method we found that the difference in AUCs of 6.5% was statistically significant (p=0.0015). Since realistic numerical values for the relative costs of false-positive and false-negative results are unavailable (and complex to estimate), formal mathematical solutions to the problem of locating the optimal operating point for each instrument are not possible [24], [25]. The optimal ranges of cut-off values for both instruments were therefore found empirically by visual examination of ROC curve coordinates (Table 2). Rather than applying overall diagnostic accuracy (proportion of cases classified correctly) as the optimality criterion, we looked for the best balances of sensitivity and specificity for each tool. In this population, optimal performance (on the assumption that both high sensitivity and high specificity are desirable) appears to occur coincidentally at values of approximately 12.5 points for both the EPDS and the BDI.
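The link between the AUC and the Mann–Whitney U-statistic, and the sensitivity and specificity at a candidate cut-off, can be sketched as follows. The scores are hypothetical values invented for illustration and are not data from this study; the sketch estimates a single AUC only and does not implement the variance calculation needed to compare two correlated AUCs. Function names are illustrative.

```python
from typing import Sequence, Tuple


def auc_mann_whitney(pos: Sequence[float], neg: Sequence[float]) -> float:
    """AUC as the share of (case, non-case) pairs in which the case scores
    higher; tied pairs count one half. Equals U / (n_pos * n_neg)."""
    if not pos or not neg:
        raise ValueError("both groups must be non-empty")
    u = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return u / (len(pos) * len(neg))


def sens_spec(pos: Sequence[float], neg: Sequence[float], cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when scores above the cut-off are positive."""
    sensitivity = sum(s > cutoff for s in pos) / len(pos)
    specificity = sum(s <= cutoff for s in neg) / len(neg)
    return sensitivity, specificity


# Hypothetical scores for illustration only; NOT data from the study.
depressed = [18, 15, 13, 12, 20]
not_depressed = [8, 11, 13, 9, 14]

print(auc_mann_whitney(depressed, not_depressed))
for cutoff in (10.5, 11.5, 12.5, 13.5):
    print(cutoff, sens_spec(depressed, not_depressed, cutoff))
```

Tabulating sensitivity and specificity across cut-offs in this way gives the ROC coordinates from which a balanced operating point can be chosen by inspection.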
Receiver operating characteristic curves for the Edinburgh Postnatal Depression Scale (EPDS) and the BDI. Numerals indicate positions of optimal cut-off values on the BDI (solid line) and EPDS (broken line).
Table 2. Range of ‘optimal’ cut-off values for the Beck Depression Inventory (BDI) and the Edinburgh Postnatal Depression Scale (EPDS)
Tabulated values are based on scores, recorded at formal clinical assessment, of the 344 women who scored ≥12 on the EPDS at initial screening.
Maximizing diagnostic efficiency
Given that, as a single diagnostic tool, the BDI performed better than the EPDS, we next asked whether the EPDS might be used to complement the BDI. Sequential logistic regression models were fitted, first with the BDI as the sole predictor variable yielding a deviance statistic (−2log L1) of 213.341. The introduction of the EPDS into the model did not result in a significant change in deviance (−2[log L1−log L2]=−1.333, Wald χ2, df=1, NS). Thus, the logistic regression models indicate that the EPDS adds no non-redundant information over and above that provided by the BDI.
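A change in deviance between nested models that differ by one parameter is conventionally referred to a χ2 distribution with 1 degree of freedom. The sketch below computes the corresponding tail probability using only the standard library. It takes the magnitude (1.333) of the reported change as its input, which is an assumption about how the reported value should be read, and the function name is illustrative.

```python
import math


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability of a chi-square variate with 1 degree of freedom.

    For df=1, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if x < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.erfc(math.sqrt(x / 2.0))


change_in_deviance = 1.333  # magnitude of the reported change (assumption)
p_value = chi2_sf_df1(change_in_deviance)
print(f"p = {p_value:.3f}")
```

A change of this size corresponds to a p-value of roughly 0.25, well above the conventional 0.05 level and consistent with the non-significant result reported.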
To confirm and better characterize this finding we used methods for combining multiple screening instruments to maximize screening efficiency, which have been used to find optimal rules for the combination of other mental health screening instruments [26–28]. The nature of these combinations may be logical compensatory (the ‘Or’ rule), logical conjunctive (the ‘And’ rule) or they may represent a weighted sum of the various instruments, derived by solution of a logistic regression equation; in our case, this weighted sum regression equation is:
For our own data, the areas under the BDI ROC curve and under the weighted sum ROC curve (AUC=0.931) are not significantly different (p=0.745) and so a weighted sum rule would have performed no better than a simple 12.5 BDI cut-off in this population. Similarly, when we apply either compensatory or conjunctive rules to the EPDS and BDI, by holding the spectrum of possible cut-offs constant on one instrument while varying the cut-off value of the other and vice versa, we find no combinatory rule that clearly outperforms a simple 12.5 BDI cut-off value in predicting DSM-IV diagnosis in this population.
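The compensatory (‘Or’) and conjunctive (‘And’) rules can be expressed as simple predicates and evaluated over a grid of cut-offs. The records below are hypothetical and serve only to show the mechanics; they are not study data, and the function and type names are illustrative.

```python
from typing import Callable, Iterable, Tuple

Record = Tuple[float, float, bool]  # (EPDS score, BDI score, depressed?)
Rule = Callable[[float, float, float, float], bool]


def or_rule(epds: float, bdi: float, epds_cut: float, bdi_cut: float) -> bool:
    """Compensatory rule: positive if either instrument exceeds its cut-off."""
    return epds > epds_cut or bdi > bdi_cut


def and_rule(epds: float, bdi: float, epds_cut: float, bdi_cut: float) -> bool:
    """Conjunctive rule: positive only if both instruments exceed their cut-offs."""
    return epds > epds_cut and bdi > bdi_cut


def sens_spec(rule: Rule, records: Iterable[Record],
              epds_cut: float, bdi_cut: float) -> Tuple[float, float]:
    """Sensitivity and specificity of a rule; needs at least one case and one non-case."""
    tp = fn = tn = fp = 0
    for epds, bdi, depressed in records:
        flagged = rule(epds, bdi, epds_cut, bdi_cut)
        if depressed and flagged:
            tp += 1
        elif depressed:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical records for illustration only; NOT data from the study.
toy = [(14, 20, True), (10, 15, True), (13, 9, False), (9, 8, False)]
print(sens_spec(or_rule, toy, 12.5, 12.5))
print(sens_spec(and_rule, toy, 12.5, 12.5))
```

The ‘Or’ rule can only raise sensitivity at the expense of specificity relative to either instrument alone, and the ‘And’ rule does the reverse, so a combination helps only when the second instrument contributes information the first lacks.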
Therefore, as a diagnostic tool, the BDI has a higher diagnostic efficiency than the EPDS in the prescreened population and the addition of the EPDS does not provide significantly better diagnostic power.
Are two Edinburgh Postnatal Depression Scale administrations better than one?
Finally, we asked whether repeat screening is of clinical value. Although our design does not allow a definitive hypothesis test, we can begin to gauge the utility of repeated EPDS administrations by a χ2 analysis among those 344 women who scored ≥12 at screening. Of these, the 270 women who subsequently scored above cut-off for a second time at clinical assessment were significantly more likely to receive a DSM-IV diagnosis of depression than those (74) who scored below cut-off (χ2=95.1, df=1, p<0.001). Thus, a repeat screen does add new information regarding the current likelihood of depression.
The percentage of major depressions (67.8%) was higher in this group. These results must be interpreted cautiously with regard to the following points. The first EPDS was taken at a routine visit to a MCHN and the second in a hospital where women knew a psychologist was actively assessing them for depression. Also, a second score ≥12 may be expected to be more closely associated with a DSM-IV diagnosis of depression if that score is available to the diagnosing clinician. In any case, it seems intuitively obvious that an EPDS score that is taken on the same day the diagnosis is made should be more closely associated with that diagnosis than an EPDS taken some weeks previously.
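For readers wishing to reproduce this kind of analysis, a Pearson χ2 test on a 2×2 table (repeat EPDS above or below cut-off, by depression diagnosis or not) can be sketched as below. The individual cell counts are not given in the text, so the function is shown in general form; whether a continuity correction was applied in the original analysis is not stated, and none is used here. Function names are illustrative.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column total must be positive")
    return n * (a * d - b * c) ** 2 / denom


def p_value_df1(chi2: float) -> float:
    """Upper-tail probability for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# The reported statistic of 95.1 on 1 degree of freedom lies far below p=0.001.
print(p_value_df1(95.1) < 0.001)
```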
Discussion
This study did not aim to validate the EPDS, but quantified its utility in facilitating the identification of PND in a routine primary care setting using a large, demographically representative Australian sample. Under these real-life conditions, the conditions in which any universal screening program must operate, our results confirm previous findings that the EPDS is a simple, safe and precise screening test. At a cut-off of 11.5, administered in routine primary care, it correctly identifies approximately 76% of women scoring above cut-off as having depressive disorders. However, the study has some limitations. For example, we gathered no information pertaining to the false-negative rate of the EPDS at the first step of our screening process. In terms of absolute numbers, the false-negative rate is far less worrisome than the situation that currently pertains in many places (i.e. at present, much larger numbers of women go undiagnosed due to the complete absence of screening), which surely represents a worse scenario. As long as the EPDS is regarded as a screening instrument, a negative result should not be taken to indicate that a woman is not depressed, but rather a positive result should indicate that definite follow-up assessment is required. Also, we have little data on those 189 women who declined psychiatric assessment. Although our results indicate that non-uptake was not due to more severe depression, it remains possible that those not assessed differed in some important way from those who were assessed. Almost two decades after the development of the EPDS, these quite basic issues remain to be decisively dealt with by a sufficiently large validation study based on the general postnatal population.
We draw the following conclusions. First, universal screening with the EPDS as part of standard care is feasible. Second, the diagnostic value of the EPDS as a screening tool is adequate, such that of those women who continued to have a clinical assessment, three out of four were diagnosed with depression. This provides a useful (and broadly positive) perspective regarding some concerns over the adequacy of the EPDS [7]. It is also worthy of note that in terms of identifying psychiatric caseness per se the EPDS at a cut-off of 11.5 had a PPV of approximately 89% – only 39 of 344 women were not diagnosed with a DSM-IV disorder of some kind. Third, in a representative cross-section of the population the prevalence of PND as measured by EPDS scores ≥12 is 12.8%. This is virtually identical to an average of 13% reported from a meta-analysis of 59 original studies [29]. Fourth, in a prescreened population, the BDI is actually a better instrument than the EPDS for identifying PND. At face value this finding stands in apparent contradiction to the results of some previous studies [30], [31] and may in part reflect the fact that a diagnostic test's PPV, which is itself a function of population prevalence, is expected to be higher in prescreened populations. The purpose of screening is, after all, to better define a target population in which diagnostic tests can perform with increased power. However, approaches that search for points of optimal performance across the whole ROC curve are likely to be more informative than those that compare different instruments at a fixed number of ‘recommended’ cut-off values [32], [33]. This empirical, evidence-based approach to diagnostic test comparison complements the ongoing need for regular recalibration of optimal operating points to ensure the best possible performance by individual instruments in different populations and under different sets of circumstances.
It is unrealistic to expect that one particular cut-off value on the EPDS (or any other screening test) can provide a universal diagnostic decision threshold that will remain valid in all populations for all time.
Finally, the EPDS appears to be acceptable to Australian women and to MCHNs as a screening instrument in routine primary care – out of 4148 administrations over 3 years no complaints were reported and screening was integrated into existing practice with relative ease. Although Shakespeare et al. [8] report a less positive experience in a small, purposive sample of British women (n=39) this may be more a reflection of differences in practices surrounding the wider screening process rather than of the EPDS itself. Obviously, this is an important issue and one that can only be resolved by further research.
Facilitating correct diagnosis in a quick and simple manner would be especially useful in primary care settings. As pointed out by Cox and Holden [16], repeated high scores on the EPDS at successive health visits should alert professionals to an increased likelihood of an enduring depressive disorder. Our own results are consistent with this observation and as a rule of thumb it has the dual virtues of being both simple to apply and of making intuitive sense. However, if the aim is to correctly identify the highest proportion of cases as quickly and as simply as possible, then a simple, two-step screening strategy suggested by our own results is:
1. Administer the EPDS as a screening instrument on a community-wide basis, using a cut-off pre-optimized with reference to the population being screened (in our case a value of 12.5 may have performed better than the 11.5 cut-off we actually used).
2. Administer the BDI (again using a pre-optimized cut-off, in retrospect approximately 12.5 in our case) to those scoring above EPDS cut-off.
Following identification of positive cases by these procedures, further discussion/support and outlining of possible options for a management plan and/or formal diagnosis by a suitable health professional are now appropriate. In our own data, a BDI score ≥13 at this point would indicate more than a 9 in 10 chance of a DSM-IV depressive disorder of some kind (from Table 1, PPV=92%). It should also be noted that while the EPDS is freely available, the BDI is commercially produced, so that cost is an issue when choosing between the two instruments in practice.
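The two-step strategy can be summarized as a short decision function. This is a sketch under stated assumptions: the default cut-offs of 12.5 are the retrospective values suggested above and would need to be pre-optimized for any other population, the returned labels are hypothetical, and the text does not specify how women who score above the EPDS cut-off but below the BDI cut-off should be managed.

```python
from typing import Optional


def two_step_screen(epds: int, bdi: Optional[int] = None,
                    epds_cut: float = 12.5, bdi_cut: float = 12.5) -> str:
    """Return the next action under a two-step EPDS-then-BDI strategy."""
    # Step 1: community-wide EPDS screen.
    if epds <= epds_cut:
        return "screen negative"
    # Step 2: the BDI is given only to those above the EPDS cut-off.
    if bdi is None:
        return "administer BDI"
    if bdi > bdi_cut:
        return "refer for follow-up"
    # Management of this group is not specified in the text.
    return "below BDI cut-off"


print(two_step_screen(10))           # screen negative
print(two_step_screen(14))           # administer BDI
print(two_step_screen(14, bdi=18))   # refer for follow-up
```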
In conclusion, our results suggest that two-step screening can improve efficiency and that, given the right combination of resources, tools, training and organization, the integration of community-wide screening into routine primary care through M&CHCs is a viable proposition.
Acknowledgements
This work was funded by Research & Development Grants Advisory Committee/NHMRC Public Health and Austin Hospital Medical Research Foundation. We thank the women and nurses of Melbourne, in particular Marita van Gemert, who made the study possible. Thanks also to the staff of the Department of Clinical and Health Psychology, Austin Health. Melina Ramp, Caroline De Paola and Kate Neilson assisted with statistical analysis and with literature reviews.
