Abstract

Objective: To further examine the screening capabilities (sensitivity, specificity, etc.) of the 12-item Somatic and Psychological Health Report (SPHERE) questionnaire.

Method: A more extensive examination was carried out on data presented in a recent report.

Results: The sensitivities of the questionnaire, scored in three different ways, ranged from 100% to 47%. Specificities ranged from 18% to 72%. As a global screen for mental disorder it had a very high false-positive rate, with, in one sample, 83% of patients screening positive while only 27% had a current psychiatric diagnosis, and in the other sample 55% screened positive with only 13% having a current psychiatric diagnosis. The PSYCH-6 scale used by itself had similar properties to the 30-item General Health Questionnaire from which it was derived. The addition of the scale measuring fatigue (SOMA-6) increased or decreased specificity depending on how it was combined with the PSYCH-6 scale.

Conclusion: The evidence is insufficient to recommend the 12-item SPHERE, in its current form, as a screening instrument for DSM-IV mental disorders in general practice, as the specificity is inadequate. Ways of raising the threshold for caseness need to be explored. The argument for adding a measure of fatigue to a general screening measure is not supported.

Keywords

depression general practice mental disorders primary care screening

Depression is now rightly being recognized as a significant public health problem, with studies in Australia [1], and elsewhere, drawing attention to its under-recognition and under-treatment. Mechanisms for screening for latent depression and other common mental disorders are potentially useful in association with appropriate treatment and monitoring programmes [2]. It is therefore of interest to read of the development of the 12-item SPHERE screening instrument [3], with a claimed sensitivity of 93% and better performance than the 30-item General Health Questionnaire (GHQ) [4].

However, nowhere in the original report is the specificity recorded, even though a large percentage of patients screened positive – more than the expected prevalence rate of the common mental disorders. We therefore sought to re-examine the data.

Method

Using data presented in Box 5 of the original publication [3], 2 × 2 tables (screen ±, diagnosis yes/no) were constructed for a variety of comparisons. The DAG_Stat computer program [5] was used to calculate sensitivity, specificity, positive predictive power, negative predictive power and overall efficiency. These and other relevant terms are defined by Baldessarini et al. [6], and Clarke and McKenzie [7]. Calculations of exact confidence intervals for sensitivity and specificity are as described by McKenzie et al. [8].
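As a minimal sketch of how the five quantities named above follow from a single 2 × 2 table, the standard definitions can be written out in Python. The class name `TwoByTwo` and the counts used are hypothetical and are not the Box 5 data; the sketch does not reproduce DAG_Stat itself, which reports further statistics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a screen-by-diagnosis 2 x 2 table (hypothetical helper)."""

    tp: int  # screen positive, diagnosis yes
    fp: int  # screen positive, diagnosis no
    fn: int  # screen negative, diagnosis yes
    tn: int  # screen negative, diagnosis no

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def sensitivity(self) -> float:
        # Proportion of true cases that screen positive.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of non-cases that screen negative.
        return self.tn / (self.tn + self.fp)

    def positive_predictive_power(self) -> float:
        # Proportion of screen positives that are true cases.
        return self.tp / (self.tp + self.fp)

    def negative_predictive_power(self) -> float:
        # Proportion of screen negatives that are non-cases.
        return self.tn / (self.tn + self.fn)

    def overall_efficiency(self) -> float:
        # Proportion of all patients classified correctly.
        return (self.tp + self.tn) / self.n

    def screen_positive_rate(self) -> float:
        return (self.tp + self.fp) / self.n


# Hypothetical counts for illustration only; NOT taken from Box 5.
example = TwoByTwo(tp=40, fp=30, fn=10, tn=120)
print(f"sensitivity          {example.sensitivity():.2f}")
print(f"specificity          {example.specificity():.2f}")
print(f"positive pred. power {example.positive_predictive_power():.2f}")
print(f"negative pred. power {example.negative_predictive_power():.2f}")
print(f"overall efficiency   {example.overall_efficiency():.2f}")
```

Keeping the four cell counts together makes it harder to mix up the denominators: sensitivity and specificity are column proportions, while the predictive powers are row proportions.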

The SPHERE instrument has two components: a 6-item scale derived from the GHQ that measures aspects of depression and anxiety (PSYCH-6), and a 6-item scale that measures fatigue (SOMA-6). As a screening instrument, these components can be combined to require a positive result on both (PSYCH and SOMA) or on either (PSYCH or SOMA). The former is a narrower criterion (termed Level 1), the latter a broader one (termed Level 2). We also examined the performance of PSYCH-6 alone. Psychiatric diagnoses were made using the CIDI-Auto [9], a computer-operated enquiry based on the Composite International Diagnostic Interview (CIDI) [10].
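The two combination rules can be sketched as a small Python function. The subscale cut-off scores are not given here, so the sketch takes each subscale result as an already-determined boolean; the names `Level` and `screens_positive` are hypothetical.

```python
from enum import Enum
from itertools import product


class Level(Enum):
    LEVEL_1 = "PSYCH and SOMA"  # narrower criterion
    LEVEL_2 = "PSYCH or SOMA"   # broader criterion


def screens_positive(psych_positive: bool, soma_positive: bool, level: Level) -> bool:
    """Combine the two subscale results under the chosen criterion."""
    if level is Level.LEVEL_1:
        return psych_positive and soma_positive
    return psych_positive or soma_positive


# Enumerate every combination of subscale results.
for psych, soma in product([False, True], repeat=2):
    l1 = screens_positive(psych, soma, Level.LEVEL_1)
    l2 = screens_positive(psych, soma, Level.LEVEL_2)
    print(f"PSYCH={psych!s:5} SOMA={soma!s:5} Level 1={l1!s:5} Level 2={l2!s:5}")
```

Every patient who is positive at Level 1 is also positive at Level 2, so moving from Level 2 to Level 1 can only lower the screen positive rate and the sensitivity, and can only raise the specificity.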

Results

In the first analysis, using the ‘total current’ diagnosis in the n = 164 sample, the broadest screen (PSYCH or SOMA) was used. The results appear in Table 1.

Table 1. 2 × 2 table and derived screening characteristics for SPHERE (PSYCH or SOMA)

Eighty-three per cent of patients screened positive. Twenty-seven per cent of patients received a psychiatric diagnosis, most of these screening positive (sensitivity 93%). However, the likelihood of a person screening positive being a true case was only 30%. The majority of people screening positive were non-cases. This procedure was repeated using the narrower screen (PSYCH and SOMA; screen positive rate 32%), and also using PSYCH alone (screen positive rate 67%), against ‘any current diagnosis’ and against major depression.
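The 30% figure can be checked from the three rounded percentages quoted above. Writing Se for sensitivity, P for the proportion of patients with a diagnosis and S for the screen positive rate (symbols introduced here for illustration), the true positives make up a fraction Se × P of the sample, so:

```latex
\mathrm{PPV} \;=\; \frac{\mathrm{Se}\times P}{S}
\;\approx\; \frac{0.93 \times 0.27}{0.83}
\;=\; \frac{0.2511}{0.83}
\;\approx\; 0.30
```

Because the inputs are rounded, this is a consistency check rather than a recalculation from the cell counts.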

For ‘any diagnosis’ (see Table 2), specificity rose to 37% using the less broad PSYCH alone, and to 72% for the narrowest screen of PSYCH and SOMA; sensitivity dropped reciprocally to 78% and 47%, respectively. The overall efficiencies for the three levels of screening were therefore 40%, 48% and 65%. The parameters for the GHQ-30 are also shown in Table 2, being very similar to the PSYCH alone screen.
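Overall efficiency is the prevalence-weighted average of sensitivity and specificity, which is why it rises as the screen narrows in a sample where most patients are non-cases. A minimal Python sketch, using the rounded percentages quoted above (27% with a diagnosis) rather than the original cell counts, gives values close to the reported 48% and 65%; the function name is hypothetical.

```python
def overall_efficiency(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Proportion correctly classified = Se * P + Sp * (1 - P)."""
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


# Rounded figures from the text for the n = 164 sample, 'any current diagnosis'.
PREVALENCE = 0.27
screens = {
    "PSYCH alone": (0.78, 0.37),
    "PSYCH and SOMA": (0.47, 0.72),
}

efficiencies = {
    name: overall_efficiency(se, sp, PREVALENCE) for name, (se, sp) in screens.items()
}
for name, eff in efficiencies.items():
    print(f"{name}: overall efficiency about {eff:.0%}")
```

With 73% of the sample being non-cases, specificity carries almost three times the weight of sensitivity in this measure.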

Table 2. Comparison of Somatic and Psychological Health Report screening for ‘any current diagnosis’ (termed ‘total current’ in the original article)

For major depression (Table 3), the CIDI diagnosis rate was 9.8% – much lower than for ‘any diagnosis’, as expected. The screen positive rates were as before (83%, 67%, and 32%, respectively). Consequently, the positive predictive power and the specificity were also much reduced. The narrower screen (PSYCH and SOMA) had the greatest overall efficiency.
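The fall in positive predictive power follows directly from the figures quoted above. Since the true positives cannot exceed the number of cases, the positive predictive power can be no higher than the diagnosis rate divided by the screen positive rate, whatever the sensitivity. The sketch below computes that ceiling for each screen; it is an upper bound derived from the rounded percentages in the text, not the values reported in Table 3, and the function name is hypothetical.

```python
def max_positive_predictive_power(prevalence: float, screen_positive_rate: float) -> float:
    """Upper bound on PPV, reached only if sensitivity is 100%."""
    return min(1.0, prevalence / screen_positive_rate)


PREVALENCE_MAJOR_DEPRESSION = 0.098  # CIDI diagnosis rate from the text
screen_positive_rates = {
    "PSYCH or SOMA": 0.83,
    "PSYCH alone": 0.67,
    "PSYCH and SOMA": 0.32,
}

ceilings = {
    name: max_positive_predictive_power(PREVALENCE_MAJOR_DEPRESSION, rate)
    for name, rate in screen_positive_rates.items()
}
for name, ceiling in ceilings.items():
    print(f"{name}: PPV can be at most {ceiling:.1%}")
```

Even a perfectly sensitive broad screen would therefore label roughly eight or nine non-depressed patients for every true case of major depression.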

Table 3. Comparison of Somatic and Psychological Health Report screening for current major depression

Similar analyses were also carried out on the data from the n = 364 sample, also presented in Box 5 of the original article, where the prevalence of any mental disorder was 12.6%. Without going into full detail, we report that, for the different levels of screening, the sensitivity ranged from 84% to 54%, specificity from 48% to 78%, overall efficiency from 53% to 75%, positive predictive power from 19% to 27%, and the screen positive rate from 55% to 25%. Using the overall efficiency as a guide, these results are better than in the previous sample.
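These ranges can be cross-checked against one another, because sensitivity, specificity and prevalence together determine the screen positive rate and the positive predictive power. The sketch below assumes that the first figure in each quoted range belongs to the broadest screen and the second to the narrowest; that pairing, and the function names, are assumptions. With rounded inputs the derived values agree with the reported ones to within a percentage point or two.

```python
def derived_screen_positive_rate(se: float, sp: float, prev: float) -> float:
    """True positives plus false positives, as a fraction of the sample."""
    return se * prev + (1.0 - sp) * (1.0 - prev)


def derived_ppv(se: float, sp: float, prev: float) -> float:
    """Bayes' rule: true positives divided by all screen positives."""
    return se * prev / derived_screen_positive_rate(se, sp, prev)


PREVALENCE = 0.126  # any mental disorder, n = 364 sample
# (sensitivity, specificity), pairing of range end-points assumed.
levels = {
    "broadest": (0.84, 0.48),
    "narrowest": (0.54, 0.78),
}

derived = {
    name: (
        derived_screen_positive_rate(se, sp, PREVALENCE),
        derived_ppv(se, sp, PREVALENCE),
    )
    for name, (se, sp) in levels.items()
}
for name, (rate, ppv) in derived.items():
    print(f"{name}: screen positive rate about {rate:.0%}, PPV about {ppv:.0%}")
```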

Discussion

Screening instruments are potentially useful, although their usefulness in psychiatry is not fully proven, and they can be imprecise, if not clumsy, instruments [11]. Generalizing from the above data, it would appear that such instruments are better at screening for broader concepts than for narrower ones. However, this may not be so. Using various versions of the GHQ, we have previously shown that the overall efficiency of a screening instrument increases as the definition of the disorder narrows (for instance, major depression vs any form of depression), as long as the threshold for caseness is raised [12].

With respect to the SPHERE, a number of issues arise from the data. The first is the usefulness of a screening instrument on which 80% of the population screen positive, of whom only 30% have the disorder. Of course, this may have been an unusual sample, and in the larger samples reported in the original article [3] the screen positive rates are lower, between 45% and 50%. It may also be that the most appropriate threshold for this population has not been used and that different populations require different thresholds. These possibilities were not examined. Second, the results suggest that there is no advantage in including the fatigue component, at least when it is joined with an ‘or’, in which case it makes for a very broad screen. The inclusion of the fatigue component may have an advantage when combined with an ‘and’, in making the screen more specific, although this effect could probably be gained equally by raising the threshold of the PSYCH scale.

Finally, the data here do not support the proposition that the SPHERE is more efficient than the GHQ-30. The parameters for the GHQ (used on the n = 164 sample only) were almost identical to those obtained when its derivative, the PSYCH scale, was used alone. This is not surprising. The 12-item version of the GHQ was also used in the Australian National Survey of Mental Health and Wellbeing [13], where it had a somewhat similar sensitivity of 75%, though a greater specificity of 70%, and a higher sensitivity when scored to take into account chronicity of symptoms [14]. The SPHERE might have the advantage of flexibility whereby, with the fatigue scale added, it becomes more sensitive when the scales are combined with an ‘or’ and more specific when they are combined with an ‘and’. However, this effect can be achieved by simply moving the threshold score and does not enhance the overall efficiency of the instrument.
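The point about moving the threshold can be illustrated with a small Python sketch. The scores and case labels below are invented purely for illustration and are not SPHERE or GHQ data; a patient screens positive when the score is at or above the threshold.

```python
# Hypothetical (score, is_case) pairs on a hypothetical 0-6 scale.
PATIENTS = [
    (0, False), (1, False), (1, False), (2, False), (2, True),
    (3, False), (3, True), (4, False), (4, True), (5, True), (6, True),
]


def sensitivity_specificity(patients, threshold):
    """Return (sensitivity, specificity) when score >= threshold is a positive screen."""
    tp = sum(1 for score, case in patients if case and score >= threshold)
    fn = sum(1 for score, case in patients if case and score < threshold)
    tn = sum(1 for score, case in patients if not case and score < threshold)
    fp = sum(1 for score, case in patients if not case and score >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


thresholds = [2, 3, 4, 5]
curve = [sensitivity_specificity(PATIENTS, t) for t in thresholds]
for t, (se, sp) in zip(thresholds, curve):
    print(f"threshold {t}: sensitivity {se:.2f}, specificity {sp:.2f}")
```

Raising the threshold on a single scale trades sensitivity for specificity in the same direction as switching from an ‘or’ to an ‘and’ combination, without requiring a second scale.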

We believe that screening instruments have their place. At present it is debatable whether the SPHERE is a significant advance on other screening instruments, and in its present form, with so many people screening positive, it would appear to be quite the opposite. The principles of screening are that it should be low cost, low risk and of likely benefit [2]. Labelling a significant number of people who are not depressed as ‘probably depressed’ might reasonably be considered a potential harm. We do not want to replace a situation of under-recognition with one of over-recognition, neither being of benefit to the patient.

References

1. McLennan. Mental health and wellbeing: profile of adults, Australia, 1997. Australian Bureau of Statistics, Canberra 1998.

2. Mitchell, Irwig. Principles behind practice: screening as a strategy for disease control. Medical Journal of Australia 1991 155: 237–242.

3. Hickie I B, Davenport T A, Hadzi-Pavlovic. Development of a simple screening tool for common mental disorders in general practice. Medical Journal of Australia 2001 175 (Suppl.) S10–S17.

4. Goldberg D P. The detection of psychiatric illness by questionnaire. Oxford University Press, London 1972, Maudsley Monograph 21.

5. Mackinnon A J. A spreadsheet for the calculation of comprehensive statistics for the assessment of diagnostic tests and inter-rater agreement. Computers in Biology and Medicine 2000 30: 127–134, DAG_Stat available from URL: http://www.mhri.edu.au/biostats/DAG_Stat .

6. Baldessarini R J, Finklestein, Arana G W. The predictive power of diagnostic tests and the effect of prevalence of illness. Archives of General Psychiatry 1983 40: 569–573.

7. Clarke D M, McKenzie D P. Screening for psychiatric morbidity in the general hospital: methods for comparing the validity of different screening instruments. International Journal of Methods in Psychiatric Research 1991 1: 79–87, (Erratum published 1996: 6: 175.).

8. McKenzie D P, Vida, Mackinnon A J, Onghena, Clarke D M. Accurate confidence intervals for measures of test performance. Psychiatry Research 1997 69: 207–209.

9. World Health Organization Collaborating Centre for Mental Health and Substance Abuse. Composite International Diagnostic Interview: CIDI-Auto. World Health Organization, Geneva 1997, Version 2.1. [Computer Program].

10. Wittchen H U, Robins L N, Cottler L B, Sartorius, Burke J D, Regier. Cross-cultural feasibility, reliability and sources of variance of the Composite International Diagnostic Interview (CIDI). British Journal of Psychiatry 1991 159: 645–653.

11. Clarke D M, McKenzie D P. A caution on the use of cut-points applied to screening instruments or diagnostic criteria. Journal of Psychiatric Research 1994 28: 185–188.

12. Clarke D M, Smith G C, Herrman H E. A comparative study of screening instruments for mental disorders in general hospital patients. International Journal of Psychiatry in Medicine 1993 23: 323–337.

13. Australian Bureau of Statistics. Mental health and wellbeing profile of adults, Australia 1997. Australian Government Publishing Service, Canberra 1998, ABS Cat. no. 4326.0.

14. Donath. The validity of the 12-item General Health Questionnaire in Australia: a comparison between three scoring methods. Australian and New Zealand Journal of Psychiatry 2001 35: 231–235.

An Examination of the Efficiency of the 12-Item SPHERE Questionnaire as a Screening Instrument for Common Mental Disorders in Primary Care