
Abstract

Objective: To investigate the specificity and sensitivity of three different scoring methods of the 12-item General Health Questionnaire (GHQ-12) and hence to determine the best GHQ-12 threshold score for the detection of mental illness in community settings in Australia.

Method: Secondary data analysis of the 1997 Australian National Survey of Health and Wellbeing (n = 10 641), using the Composite International Diagnostic Interview as the gold standard for diagnosis of mental illness.

Results: The area under the Receiver Operating Characteristic (ROC) curve for the C-GHQ scoring method was 0.84 (95% CI = 0.83–0.86) compared with the area for the standard scoring method of 0.78 (95% CI = 0.76–0.80). The best threshold with C-GHQ was 3/4, with sensitivity 82.9% (95% CI = 80.2–85.5%) and specificity 69.0% (95% CI = 68.6–69.4%). The best threshold score with the standard scoring method was 0/1, with sensitivity 75.4% (95% CI = 72.5–78.4%) and specificity 69.9% (95% CI = 69.5–70.3%). These were also the best thresholds for a subsample of the population who had consulted a health practitioner in the previous 4 weeks.

Conclusion: In the Australian setting, the C-GHQ scoring method is preferable to the standard method of scoring the GHQ-12. In Australia the GHQ-12 appears to be a less useful instrument for detecting mental illness than in many other countries.

Keywords

mental disorders, psychiatric states rating scales, ROC curve, Australia, psychometrics

The General Health Questionnaire (GHQ) was developed to detect minor psychiatric illness in the community; it is designed to ‘differentiate psychiatric patients as a class from non-cases as a class’ [1, p.5]. Since its introduction in the 1970s, the GHQ has become one of the principal self-report questionnaires used to measure non-psychotic mental illness in the community and in general practice.

Originally a 60-item questionnaire, the GHQ is now also available in 30-, 28-, 20- and 12-item versions [1]. The shorter versions are often preferred as they are quicker to administer. Respondents to the GHQ rate themselves according to the degree to which they have experienced each symptom over the past few weeks. Each item has four response categories. The standard method of scoring the GHQ is a binary method; symptomatic responses to each item are scored ‘1’ and summed over the items [1]. This can be characterized as the (0-0-1-1) method, as adjacent response categories are collapsed. For the GHQ-12, this method results in a score ranging from 0 to 12.

In addition to the standard scoring method, the GHQ can also be scored using a four-point Likert scoring method where scores of 0–3 are assigned to each item response and then summed across items, giving a score ranging from 0 to 36 for the GHQ-12 (0-1-2-3 method) [1]. Another scoring method, devised by Goodchild and Duncan-Jones [2], attempts to overcome the assumed low sensitivity to chronic disorders of the standard scoring method. The positive items are scored in the conventional binary method, but the negative items are scored 0-1-1-1, thus assuming that the ‘no more than usual’ answer to negative questions indicates the presence of a chronic problem rather than good health. This is generally referred to as the C-GHQ scoring method. In all three scoring methods, higher scores indicate an increased likelihood of psychological distress.
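The three scoring rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: it assumes each response is coded 0 to 3 in order of increasing symptom severity, and the set `NEGATIVE_ITEMS` (0-based positions of the negatively worded items) is a hypothetical placeholder that should be checked against the questionnaire itself.

```python
from typing import Sequence, Set

# Hypothetical 0-based positions of the negatively worded items. This set is an
# assumption made for illustration and is not taken from the article.
NEGATIVE_ITEMS: Set[int] = {1, 4, 5, 8, 9, 10}


def score_standard(responses: Sequence[int]) -> int:
    """Standard binary (0-0-1-1) scoring: the two symptomatic categories score 1."""
    return sum(1 for r in responses if r >= 2)


def score_likert(responses: Sequence[int]) -> int:
    """Likert (0-1-2-3) scoring: sum the raw category codes."""
    return sum(responses)


def score_cghq(responses: Sequence[int],
               negative_items: Set[int] = NEGATIVE_ITEMS) -> int:
    """C-GHQ scoring: 0-1-1-1 on negative items, 0-0-1-1 on positive items."""
    total = 0
    for index, r in enumerate(responses):
        # Negative items count from the second category upwards,
        # positive items only from the third category upwards.
        cutoff = 1 if index in negative_items else 2
        total += 1 if r >= cutoff else 0
    return total


# One made-up respondent with 12 item responses coded 0..3.
example = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]
print(score_standard(example), score_likert(example), score_cghq(example))
```

For this made-up respondent the three methods give 6, 18 and 9 respectively; the C-GHQ score can never be lower than the standard score, because it only adds the ‘no more than usual’ answers on negative items.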

For each version of the GHQ, an empirically determined threshold score indicates the likelihood of psychiatric illness. There is a trade-off between sensitivity and specificity, with higher thresholds giving higher specificity, but lower sensitivity. The optimal threshold is that which gives the best combination of sensitivity and specificity.
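The trade-off can be made concrete with a short sketch. The text does not state which criterion defines the ‘best combination’, so the example below assumes Youden's index (sensitivity + specificity − 1) purely for illustration; the function names and the toy data are hypothetical. A threshold written as ‘3/4’ corresponds to `cutoff=4`, meaning that scores of 4 or more are treated as screen-positive.

```python
from typing import Iterable, Sequence, Tuple


def sens_spec(scores: Sequence[int], is_case: Sequence[bool],
              cutoff: int) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= cutoff count as screen-positive.

    Assumes the data contain at least one case and one non-case.
    """
    pairs = list(zip(scores, is_case))
    tp = sum(1 for s, c in pairs if c and s >= cutoff)
    fn = sum(1 for s, c in pairs if c and s < cutoff)
    tn = sum(1 for s, c in pairs if not c and s < cutoff)
    fp = sum(1 for s, c in pairs if not c and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores: Sequence[int], is_case: Sequence[bool],
                cutoffs: Iterable[int]) -> int:
    """Return the cutoff maximising Youden's index (an assumed criterion)."""
    def youden(cutoff: int) -> float:
        sens, spec = sens_spec(scores, is_case, cutoff)
        return sens + spec - 1
    return max(cutoffs, key=youden)


# Toy data: three cases and seven non-cases, not taken from the survey.
toy_scores = [0, 0, 1, 2, 5, 1, 4, 6, 3, 0]
toy_cases = [False, False, False, False, False, False, True, True, True, False]

for cutoff in range(1, 5):
    sens, spec = sens_spec(toy_scores, toy_cases, cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
print("best cutoff by Youden's index:", best_cutoff(toy_scores, toy_cases, range(1, 7)))
```

Raising the cutoff in the toy data never lowers specificity and never raises sensitivity, which is the trade-off described above.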

Goldberg et al.'s study of the GHQ-12 in 15 cities around the world found that, for a given threshold value, there were considerable variations in sensitivity and specificity. In some cities, the best GHQ-12 threshold was 1/2, in others it was 2/3, in still others, 3/4, and, in one centre, 6/7 [3]. Previous studies have found similar variation [3]. The reasons for these differences are not clear.

As there is such variation in the optimal threshold, it is important to determine the most appropriate threshold for use in Australia. An Index Medicus/Medline and PsycLIT search revealed only one Australian study which has investigated the threshold value of the GHQ-12 [4]. This was a small study (n = 120) conducted in two general practices in Sydney. There is evidence that the sensitivity (but not the specificity) of the GHQ is on average considerably higher in primary care settings than in community settings [1]. Although several studies have used the GHQ-12 in community settings in Australia [5–11], there appears to be no Australian evidence on the best threshold to use in this situation.

This study uses unpublished confidentialized unit record file (CURF) data from the Australian Bureau of Statistics (ABS) 1997 National Mental Health Survey [12, 13] to investigate the sensitivity and specificity, and hence the optimal threshold values, of the GHQ-12 for the three different scoring methods.

Method

Data source

The 1997 National Mental Health Survey was conducted on a representative sample of residents of private dwellings in all States and Territories. The relevant ABS publications provide a detailed description of the sampling method [12, 13]. The sample excluded special dwellings such as hospitals, institutions, nursing homes and hostels, and dwellings in remote and sparsely settled parts of Australia. The response rate was 78%, yielding a sample size of 10 641 persons (4705 men, 5936 women) aged 18 and over [14]. The survey was conducted using face-to-face interviews.

Instruments

The survey included the GHQ-12 as a standalone questionnaire, and the Composite International Diagnostic Interview (CIDI), a comprehensive interview which can be used to assess current and lifetime prevalence of mental disorders in adults [13]. The CIDI enables the diagnosis of mental disorders based on either the International Classification of Diseases, 10th revision (ICD-10) [15], or the Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV) [16]. To facilitate comparison with Goldberg et al. [3], this study presents results using the diagnoses according to ICD-10. The conditions included were affective disorders (mania; hypomania; mild, moderate and severe depression; bipolar affective disorder; dysthymia), anxiety disorders (panic disorder, agoraphobia, social phobia, generalized anxiety disorder, obsessive–compulsive disorder, posttraumatic stress disorder) and neurasthenia, but not alcohol/drug dependence or harmful use. Respondents were classified as having a mental disorder if they were diagnosed as having one or more conditions during the previous 4 weeks.

Analysis

Data were analysed using SPSS (SPSS, Chicago, IL, USA) and Microsoft Excel.

Using the diagnosis from the CIDI as a gold standard, the sensitivity and specificity of a range of thresholds for each of the scoring methods were calculated. Receiver operating characteristic (ROC) curves were derived for each scoring method. Receiver operating characteristic analysis is a technique which enables comparison of the performance of two or more screening tests or scoring methods. A ROC curve is obtained by plotting sensitivity against the false positive rate for all possible cut-off points of the screening instrument. The area under the curve provides a summary measure of the ability of the instrument or scoring method to discriminate between cases and noncases. A ROC area equal to 0.5 is obtained when the discriminatory ability of the screening instrument is no better than chance; a value of 1.0 stands for perfect discriminatory ability [1].
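The construction of a ROC curve and its area can be sketched as follows. This is an illustrative sketch under stated assumptions, not the SPSS procedure used in the study: scores are taken to be integers, a respondent is screen-positive when the score is at or above the cutoff, and the area is approximated with the trapezoidal rule. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[int],
               is_case: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (false positive rate, sensitivity) pairs for every possible cutoff.

    Assumes integer scores and at least one case and one non-case.
    """
    n_cases = sum(1 for c in is_case if c)
    n_noncases = len(is_case) - n_cases
    points = []
    # Start above the maximum score (nobody screen-positive) and move down to
    # the minimum score (everybody screen-positive), so points run from (0, 0) to (1, 1).
    for cutoff in range(max(scores) + 1, min(scores) - 1, -1):
        tp = sum(1 for s, c in zip(scores, is_case) if c and s >= cutoff)
        fp = sum(1 for s, c in zip(scores, is_case) if not c and s >= cutoff)
        points.append((fp / n_noncases, tp / n_cases))
    return points


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Two toy examples: perfectly separated scores, and scores carrying no information.
perfect = auc_trapezoid(roc_points([0, 1, 5, 6], [False, False, True, True]))
chance = auc_trapezoid(roc_points([1, 1, 1, 1], [True, False, True, False]))
print(perfect, chance)
```

The two toy cases reproduce the reference values mentioned above: an area of 1.0 for perfect discrimination and 0.5 when the instrument does no better than chance.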

Results were obtained both for the entire sample, and also for a ‘clinical’ subsample consisting of persons who had consulted a doctor or other health practitioner for any reason in the previous 4 weeks.

Using the weighting factors and method described by the ABS [13], results were adjusted to ensure that they represented as far as possible the adult Australian population. Confidence intervals for proportions and percentages were estimated using the relative standard errors provided by ABS [13, pp. 74–77]. Since the ABS does not provide estimates of standard errors for means, confidence intervals for means were estimated using standard formulae [17] with the weighting factors scaled so that they summed to the sample size.
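One way the weighted confidence interval for a mean might be computed is sketched below. The text says only that ‘standard formulae’ were used, so the particular variance formula here is an assumption: weights are rescaled to sum to the sample size, a weighted mean and weighted variance are formed, and a normal-approximation 95% interval is taken. The function name `weighted_mean_ci` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def weighted_mean_ci(values: Sequence[float], weights: Sequence[float],
                     z: float = 1.96) -> Tuple[float, float, float]:
    """Weighted mean with a normal-approximation confidence interval.

    Assumed approach: weights are rescaled so that they sum to the sample
    size n, and the standard error is sqrt(weighted variance / n).
    """
    n = len(values)
    scale = n / sum(weights)
    scaled = [w * scale for w in weights]  # scaled weights now sum to n
    mean = sum(w * x for w, x in zip(scaled, values)) / n
    variance = sum(w * (x - mean) ** 2 for w, x in zip(scaled, values)) / (n - 1)
    se = math.sqrt(variance / n)
    return mean, mean - z * se, mean + z * se


# With equal weights the result reduces to the ordinary unweighted interval.
mean, low, high = weighted_mean_ci([1, 2, 3, 4], [5, 5, 5, 5])
print(f"mean {mean:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

Scaling the weights to the sample size keeps the standard error on the scale of the number of people actually interviewed, rather than the much larger population they represent.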

Results

Based on the CIDI, 7.3% (95% CI = 6.9–7.6%) of the population were diagnosed with a mental illness; 8.9% (95% CI = 8.2–9.5%) of women were diagnosed with a mental illness compared with 7.3% (95% CI = 6.9–7.6%) of men. Of those people who had consulted a health practitioner in the previous 4 weeks, 11.0% (95% CI = 10.3–11.7%) were diagnosed with a mental illness. In this ‘clinical’ population, 12.5% (95% CI = 11.4–13.5%) of women and 9.2% (95% CI = 8.1–10.3%) of men were diagnosed with a mental illness.

On average, women had higher GHQ-12 scores than men, and the average scores of those in the ‘clinical’ subsample were higher than the average scores for the total sample (Table 1). Using the standard scoring method, 66.6% of the total population (69.2% of men, 64.0% of women) and 58.7% of the ‘clinical’ population (61.7% of men, 56.3% of women) scored zero.

Table 1. Mean scores for standard, Likert and C-GHQ methods of scoring the GHQ-12, Australia, 1978 and 1997

The results in Table 2 and Fig. 1 indicate the trade-off between sensitivity and specificity using different threshold values of the GHQ-12. For a given specificity, the C-GHQ scoring method generally produces the highest sensitivity, followed by the Likert and then the standard scoring method.

Figure 1. Receiver operating characteristic (ROC) curves for the three GHQ-12 scoring methods. □, standard GHQ scoring; ▴, Likert GHQ scoring; ○, C-GHQ scoring.

Table 2. Sensitivity and specificity for selected threshold scores for standard, Likert and C-GHQ scoring methods, Australia, 1997

The analyses were repeated for males and females separately and for the ‘clinical’ subsample. For a given threshold score, sensitivity and specificity were higher for males than for females with all scoring methods, the differences averaging around 4%. In the ‘clinical’ subsample, for a given threshold score, sensitivity was higher and specificity was lower than in the total sample by 3–4%, for all scoring methods. (Details of sensitivity and specificity for selected threshold scores, standard, Likert and C-GHQ scoring methods, Australia 1997, for all populations are available from the corresponding author on request.)

As indicated in Table 3, for both the total sample and the ‘clinical’ subsample, the areas under the ROC curves were slightly higher for males than for females, but the differences were generally not statistically significant. Comparing the total sample and the ‘clinical’ subsample, there was no difference in the areas under the ROC curves. In all groups, the area under the ROC curve was greater for the C-GHQ scoring method than for the standard scoring method.

Table 3. Areas under ROC curve for different scoring methods

Discussion

Based on the ROC analysis, the C-GHQ scoring method provides better discrimination between those with and without a mental illness than either the Likert or the standard scoring methods. This is in contrast to other studies which have found little or no difference between scoring methods [3, 18]. With this scoring method, the best trade-off between specificity and sensitivity is given by a threshold of 3/4, both in the total sample and in the ‘clinical’ subsample.

Tennant's validity study of the GHQ [4] used a disembedded version of the GHQ-12 (that is, the GHQ-60 was the actual questionnaire used in the study, and the GHQ-12 questions were extracted from the longer questionnaire). There is evidence that disembedded versions of the GHQ give different optimal thresholds from those obtained using the corresponding standalone version of the GHQ [19]. In general practice patients in Sydney, Tennant found sensitivity of 0.87 and specificity of 0.91 for a threshold of 1/2 using the standard scoring method. Confidence intervals were not reported, but based on other reported information these can be estimated as being between 0.75 and 0.99 for sensitivity and 0.85 and 0.97 for specificity.
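An estimate of this kind can be illustrated with the normal-approximation interval for a proportion. The text does not say which reported information was used, so the split of the 120 patients into 30 cases and 90 non-cases below is a hypothetical assumption chosen only for illustration; it is not a figure reported here.

```python
import math
from typing import Tuple


def proportion_ci(p: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a proportion."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# Hypothetical numbers of cases and non-cases (assumed, not reported in the text).
N_CASES, N_NONCASES = 30, 90

sens_low, sens_high = proportion_ci(0.87, N_CASES)
spec_low, spec_high = proportion_ci(0.91, N_NONCASES)
print(f"sensitivity: {sens_low:.2f} to {sens_high:.2f}")
print(f"specificity: {spec_low:.2f} to {spec_high:.2f}")
```

Under these assumed group sizes the sketch prints 0.75 to 0.99 for sensitivity and 0.85 to 0.97 for specificity. The interval for sensitivity is wide because it rests only on the patients who actually had a disorder, which is a small group in a study of this size.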

In the present study, with the standard scoring method, the best trade-off between sensitivity and specificity is given by a threshold of 0/1. Australian studies have used thresholds of 1/2 [5–9,20], 2/3 [21] or 3/4 [10, 11, 22], but the results from this study suggest that the sensitivity of thresholds higher than 0/1 is unacceptably low. Even in the group with the highest sensitivity for a given threshold score (males in the ‘clinical’ subsample) sensitivity using a threshold 1/2 was only 66.4%.

Using the standard scoring method, most studies have found the optimal threshold to be 1/2 or 2/3 [3, 23], although 0/1 has been found in at least one other study [18]. It has been suggested that the mean GHQ score for the whole population of respondents provides a rough guide to the best threshold, so that populations with low average GHQ scores will generally have lower threshold scores [23].

Goldberg et al. found mean scores ranging from 1.09 to 3.66, with a majority of the 15 centres in the study reporting mean scores above 2 [3]. The Australian mean scores of 0.93 for the total sample and 1.28 in the ‘clinical’ subsample therefore appear to be low compared with mean scores found elsewhere. However, these low scores seem to be characteristic of the Australian population, as the mean GHQ-12 scores in the 1978 National Health Survey were very similar (Table 1).

In general, it appears that the higher the best threshold on the GHQ, the greater the area under the ROC curve, and hence, the greater the discriminatory power of the GHQ [23]. In this study, both the areas under the ROC curves and the sensitivity and specificity of the optimal threshold were lower than in most of the 15 centres studied by Goldberg et al. [3]. Thus, the evidence from this study suggests that in Australia the GHQ-12 is a less useful instrument for detecting mental illness than in many other countries.

References

1. Goldberg, Williams. A user's guide to the General Health Questionnaire. NFER-Nelson, Windsor 1991.

2. Goodchild M E, Duncan-Jones. Chronicity and the General Health Questionnaire. British Journal of Psychiatry 1985; 146: 55–61.

3. Goldberg D P, Gater, Sartorius. The validity of two versions of the GHQ in the WHO study of mental illness in general health care. Psychological Medicine 1997; 27: 191–197.

4. Tennant. The general health questionnaire: a valid index of psychological impairment in Australian populations. Medical Journal of Australia 1977; 2: 392–394.

5. McDonald, Vechi, Bowman, Sanson-Fisher. Mental health status of a Latin American community in New South Wales. Australian and New Zealand Journal of Psychiatry 1996; 30: 457–462.

6. Brown W J, Alexander, McDonald, Mills-Evers. The health of Filipinas in the Hunter region. Australian and New Zealand Journal of Public Health 1997; 21: 214–216.

7. McFarlane A C. Life events and psychiatric disorder: the role of a natural disaster. British Journal of Psychiatry 1987; 151: 362–367.

8. Rickwood, d'Espaignet E T. Psychological distress among older adolescents and young adults in Australia. Australian and New Zealand Journal of Public Health 1996; 20: 83–86.

9. Morrell, Taylor, Quine, Kerr, Western. A cohort study of unemployment as a cause of psychological disturbance in Australian youth. Social Science and Medicine 1994; 38: 1553–1564.

10. Carr V J, Lewin T J, Kenardy J A. Psychosocial sequelae of the 1989 Newcastle earthquake: III. Role of vulnerability factors in post-disaster morbidity. Psychological Medicine 1997; 27: 179–190.

11. Carr V J, Lewin T J, Webster R A, Kenardy J A, Hazell P L, Carter G L. Psychosocial sequelae of the 1989 Newcastle earthquake: II. Exposure and morbidity profiles during the first 2 years post-disaster. Psychological Medicine 1997; 27: 167–178.

12. Australian Bureau of Statistics. Mental health and wellbeing profile of adults, Australia 1997. Australian Government Publishing Service, Canberra 1998, ABS Cat. No. 4326.0.

13. Australian Bureau of Statistics. National survey of mental health and wellbeing of adults 1997, users’ guide. Australian Government Publishing Service, Canberra 1999, ABS Cat. No. 4327.0.

14. Australian Bureau of Statistics. Information paper, mental health and wellbeing of adults 1997, confidentialised unit record file. Australian Government Publishing Service, Canberra 1998, ABS Cat. No. 4329.0.

15. World Health Organization. The ICD-10 classification of mental and behavioural disorders: clinical descriptions and diagnostic guidelines. World Health Organization, Geneva 1992.

16. American Psychiatric Association. Diagnostic and statistical manual of mental disorders. American Psychiatric Association, Washington 1994.

17. Armitage, Berry. Statistical methods in medical research. Blackwell, Oxford 1994.

18. Gureje, Obikoya. The GHQ-12 as a screening tool in a primary care setting. Social Psychiatry and Psychiatric Epidemiology 1990; 25: 276–280.

19. van Hemert A M, den Heijer, Vorstenbosch, Bolk J H. Detecting psychiatric disorders in medical practice using the General Health Questionnaire. Why do cut-off scores vary? Psychological Medicine 1995; 25: 165–170.

20. Singh, Lewin, Raphael, Johnston, Walton. Minor psychiatric morbidity in a casualty population: identification, attempted intervention and six-month follow-up. Australian and New Zealand Journal of Psychiatry 1987; 21: 231–240.

21. Harris M F, Silove, Kehag. Anxiety and depression in general practice patients: prevalence and management. Medical Journal of Australia 1996; 164: 526–529.

22. Schattner P L, Coman G J. The stress of metropolitan general practice. Medical Journal of Australia 1998; 169: 133–137.

23. Goldberg D P, Oldehinkel, Ormel. Why GHQ threshold varies from one place to another. Psychological Medicine 1998; 28: 915–921.

The Validity of the 12-Item General Health Questionnaire in Australia: A Comparison Between Three Scoring Methods
