Abstract
The General Health Questionnaire (GHQ) was originally designed as a 60-item screening instrument to provide information on the current mental wellbeing of primary care patients by assessing normal ‘healthy’ functioning and the appearance of new, distressing symptoms [1, 2]. By focusing on breaks in normal functioning, it assesses state rather than trait conditions. Since its development in the 1970s, the GHQ has been used extensively and has even been described as ‘quite possibly the best instrument of its kind’ [3, p. 59]. It has been translated into a number of languages and has been cross-culturally validated with adults as part of a World Health Organization project on mental illness conducted in 15 centres worldwide, covering both developed and developing countries [4]. This latter study reported that the 12-item GHQ (GHQ-12), an abridged form of the GHQ, performed well, with an overall sensitivity of 83.4% and specificity of 76.3%. By 1997, the GHQ-12 had been validated in nine countries involving over 4000 second-stage (diagnostic) interviews [4]. However, a recent review of the GHQ-12 in Australian adults noted that it was less effective in Australia than elsewhere, with a sensitivity of 75.4% and specificity of 69.9% [5].
Although the GHQ was designed for use in adult populations, a recent review identified 82 papers in which the GHQ had been used with adolescents [6]. An early validation study from the UK indicated that the instrument was appropriate for adolescents: in a community sample of 17-year-olds, the GHQ-28 had a sensitivity of 100% and specificity of 84.5% at the optimum cut-off score compared with a structured clinical interview [7]. However, as with the adult data [5], the screening characteristics of the GHQ with older Australian adolescents suggested that the test was less effective in Australia than elsewhere. Winefield and colleagues used DSM-III diagnoses as the criterion and found that the GHQ misclassified 45% of males and 36.8% of females [8]. Despite these reservations about the validity of the GHQ as a marker of psychological distress among adolescents, the GHQ continues to be used as an assessment device and as a validation scale for new instruments [9].
Even if the threshold score on the GHQ does not provide clear evidence of psychopathology, the total score may still index psychological distress, with mean scores differing between groups. A large representative Australian sample of nearly 4000 adolescents aged 16–19 returned a mean GHQ-12 score of 9.98 [10]. Among a group of young Australians (aged 18–20), those satisfied with their employment had a mean GHQ-12 score of 9.77 (SD = 4.36), compared with 12.26 (SD = 5.17) in those dissatisfied with their employment and 12.21 (SD = 5.24) in the unemployed [11]. Similar values have been reported from UK data, where unemployed adolescents (aged 16–17) had a mean GHQ-12 score of 14.5 compared with 9.8 for those who were employed [12]. A similar pattern has been reported for those with evidence of existing psychological distress. Among a group of young Australian adults (mean age 23.6) who reported ever having had suicidal thoughts, mean GHQ-12 scores ranged from 11.21 (SD = 5.14) in those satisfied with their full-time employment to 16 (SD = 6.56) in those dissatisfied with it. Surprisingly, those who were unemployed had a mean of 13 (SD = 6.59), but the three groups contained fewer than 100 people in total [13]. In light of the findings that the GHQ may be less effective with adolescents in Australia than elsewhere [8], and its continuing use with young people, the primary objectives of this study were, first, to examine the psychometric properties of the GHQ-12 with young adolescents; second, to validate the instrument against other established instruments for this age group; and third, to suggest an appropriate threshold score on the GHQ-12 for Australian adolescents.
Method
Participants
The participants were pupils from 15 classes in six schools, drawn from single-sex and mixed schools in the state and private systems in Perth, Western Australia. Four of the schools were in new suburban areas and two were in affluent city areas. One of the fee-paying schools had a scholarship programme to give less well-off families access to the school. The response rate was approximately 75%. The classes covered school grades 7–10 (age range 11–15). As part of a larger ongoing project, data were also available from adolescents in hospital. These adolescents fell into two categories: those who had presented to hospital emergency departments with problems related to the use of alcohol or other drugs (AOD) and those with problems not related to alcohol or drug use. These respondents completed a questionnaire that included the GHQ as part of the pilot or intervention phase of the larger study. Most interviews were conducted in the emergency department, but some adolescents were seen after they had been admitted to the observation or other wards. These adolescents did not complete the other screening instruments included in the school survey.
Procedure
An information sheet and consent form were distributed in class approximately one week before the survey date. Both a parent/guardian and the adolescent had to sign the consent form before the student was eligible for the study, and test booklets were distributed only to those who had returned completed consent forms. Participants were instructed to work alone and told that they could ask the researchers for help if necessary. As the classroom teacher was generally present during data collection, the confidential nature of the study was emphasized to the adolescents (i.e. the questionnaire did not carry the respondent's name, so answers could not be traced to an individual). However, this meant that we could not intervene with those scoring at clinical levels; therefore, the information sheet provided to all participants included contact details for a range of local youth agencies.
At the end of data collection we re-emphasized the importance of seeking help for mental health or other issues. The adolescents interviewed in hospital could consent without a parent/guardian if it was not possible to contact an appropriate adult. After data collection, these adolescents were randomized to receive either standard hospital care or an enhanced care package. Both studies had approval from the relevant university ethics committee; the hospital-based study also obtained the relevant institutional approvals.
Instruments
The test booklet contained the following instruments:
1. The GHQ-12, which was scored as a Likert scale (0–3), giving a possible range of 0–36. The scale contains an equal number of positively worded (e.g. ‘been able to face up to your problems’, ‘felt capable of making decisions about things’) and negatively worded (e.g. ‘feeling unhappy and depressed’, ‘felt constantly under strain’) questions. The wording of the response scale is reversed for the two types of question so that the same end of the scale always represents worse outcomes (i.e. positive questions run from ‘more so than usual’ to ‘much less able’ and negative questions from ‘not at all’ to ‘much more than usual’). Notably, the exact wording of the response scale is not the same for all positive questions.
2. The Rosenberg Self-Esteem Scale (RSE), which was designed for use with adolescents and contains 10 items scored on a Likert scale, with low scores representing high self-esteem (range 10–40) [14]. The scale has been widely used and extensively assessed psychometrically [15]. A study of a school-based sample of 10–15-year-olds reported mean scores of 30 for boys and 29 for girls [16].
3. The Depression Anxiety Stress Scales (DASS-21) contain three 7-item scales, of which only the stress scale was used in this study. In the development of the scales the lower age limit was 17 years, but the test authors suggested that they may be appropriate down to age 12 [17]. Scores on the DASS-21 are multiplied by two to enable direct comparison with the full-length version of the test (DASS-42), so scores range from 0 to 42 on each scale. Data from a high school sample (mean age 17.6) gave a mean stress-scale score of 15.6 (SD = 9.3) [18].
4. The Centre for Epidemiological Studies Depression Inventory (CES-D) [19] was developed for use in adult community samples. It has subsequently been validated as a screening tool with young adolescents in America, where thresholds of 12 for males and 22 for females were recommended against a structured clinical interview [20]. Scores on the CES-D range from 0 to 60, with a mean of 16.6 (SD = 9.19) reported for a junior high school sample [21].
5. A short form [23] of the Marlowe-Crowne Social Desirability Scale (MCSD) [22] was used to measure socially desirable responding, or ‘faking good’. It contains 13 true–false items about culturally approved behaviours (e.g. ‘I sometimes feel resentful if I do not get my own way’). It has been validated among college students, with a mean of 5.67 (SD = 3.2); the scale has a range of 0–13 [23].
6. The State-Trait Anxiety Inventory (STAI) Form Y1 [24] provides state anxiety scores ranging from 20 to 80, with higher scores representing higher levels of anxiety. The STAI has been used in more than 2000 published studies, and normative values are available for high school students, with mean scores of 39.45 (SD = 9.74) for males and 40.54 (SD = 12.86) for females [25].
7. The Perceived Self-Efficacy Scale [26], which has been adapted for use with adolescents, provides an index of Generalized Self-Efficacy (GSE). It contains 22 items scored on a five-point scale, giving a range of 22–110. It was validated in Western Australia with 12–16-year-olds, with a mean score of 73.18 (SD = 14.42) [27].
8. The Negative Affectivity Scale (NA) measures the tendency of an individual to experience aversive emotional states. The 21-item scale produces scores ranging from 21 to 147. It was designed and validated primarily with undergraduate samples, where the mean scores were approximately 63 (SD = 16.5) [28].
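To make the scoring rules for the GHQ-12 and the DASS-21 concrete, a minimal Python sketch is given below. It assumes complete data and item responses already coded 0–3; for the GHQ-12 the code is taken to be the position of the chosen option on the printed response scale, so that 0 is always the healthiest option for both positively and negatively worded items and no reverse-coding step is needed. The function names are hypothetical and are not part of any published scoring software.

```python
# Minimal scoring sketch for two of the instruments described above.
# Assumptions: complete data; item responses already coded 0-3. For the GHQ-12
# the code is the position on the printed response scale, so 0 is always the
# healthiest option for positively and negatively worded items alike.


def _check(codes, n_items):
    """Reject forms with the wrong number of items or out-of-range codes."""
    if len(codes) != n_items:
        raise ValueError(f"expected {n_items} items, got {len(codes)}")
    if any(c not in (0, 1, 2, 3) for c in codes):
        raise ValueError("item codes must be 0-3")


def ghq12_likert_total(codes):
    """GHQ-12 Likert (0-1-2-3) total, possible range 0-36."""
    _check(codes, 12)
    return sum(codes)


def dass21_scale_score(codes):
    """DASS-21 subscale total (e.g. stress), doubled to the DASS-42 metric (0-42)."""
    _check(codes, 7)
    return 2 * sum(codes)  # raw 0-21 total multiplied by two


if __name__ == "__main__":
    print(ghq12_likert_total([1, 0, 2, 1, 1, 3, 0, 1, 2, 1, 0, 1]))  # 13
    print(dass21_scale_score([1, 0, 2, 1, 0, 1, 2]))  # 14
```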
The order of the instruments within the test booklets was randomized in blocks of 20. Spielberger [25] suggests that where fewer than three items are omitted on the STAI, an adjusted total score may be calculated as the mean of the completed items multiplied by the number of items on the scale, rounded up to the next whole number. Using this formula, an adjusted total score was calculated for the other instruments where 10% or less of the data were missing. (The DASS stress scale, which has fewer than 10 items, was allowed one missing value.)
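The prorating rule can be expressed as a short Python sketch. It follows the description above: the mean of the answered items is multiplied by the number of items on the scale and rounded up to a whole number. Two details are our own reading and should be treated as assumptions: a scale may have at most max(1, n // 10) missing items (one for the 12-item GHQ and the 7-item DASS stress scale, two for the 20-item STAI), and a product that is already a whole number is left unchanged. Exact fractions are used so that floating-point error cannot push a whole-number product up by one. The function names are hypothetical.

```python
import math
from fractions import Fraction


def allowed_missing(n_items):
    """Most omitted items tolerated on a scale (assumed reading of the rule)."""
    # 10% or fewer items missing; scales with fewer than 10 items may miss one.
    return max(1, n_items // 10)


def prorated_total(responses):
    """Adjusted total for one scale, or None if the form is to be excluded.

    `responses` has one integer item score per item on the scale, with None
    marking an omitted item.
    """
    n_items = len(responses)
    answered = [r for r in responses if r is not None]
    if not answered or n_items - len(answered) > allowed_missing(n_items):
        return None  # too much missing data: excluded from analysis
    item_mean = Fraction(sum(answered), len(answered))  # exact mean of answered items
    return math.ceil(item_mean * n_items)  # scale up and round up


if __name__ == "__main__":
    ghq = [1, 0, 2, 1, 1, 3, 0, 1, 2, 1, 0, None]  # one GHQ-12 item omitted
    print(prorated_total(ghq))  # 14, since 12/11 x 12 = 13.09 is rounded up
```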
Criterion score
Cut-off scores for the CES-D have been reported for American adolescents [20]. An alternative approach to determining an appropriate cut-off score on the CES-D, however, is to adjust the threshold score to reflect the prevalence of clinical depression in the relevant population [21]. A recent national survey in Australia estimated the prevalence of depression at 3% in those aged 6–17 years [29], and a study in Western Australia estimated the prevalence of anxiety and/or depression at 4% in those aged 4–16 [30].
Data analysis
Because multivariate outliers can distort regression analysis [31], outliers were identified before analysis, using residual scatter plots and Mahalanobis distance at a threshold of p < 0.001, and excluded.
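A self-contained sketch of the Mahalanobis screen is given below in plain Python. It computes each respondent's squared Mahalanobis distance from the sample centroid, using the sample covariance matrix, and flags respondents whose upper-tail chi-square probability falls below 0.001. Referring the distances to a chi-square distribution with degrees of freedom equal to the number of variables is the conventional choice rather than a detail reported above; the helper names and the toy data are hypothetical, and the residual scatter plots are not reproduced.

```python
import math


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def chi2_sf(x, df):
    """Upper-tail chi-square probability P(X >= x).

    Uses the power series for the regularized lower incomplete gamma function,
    which is adequate for the moderate distances met in outlier screening.
    """
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def mahalanobis_sq(rows):
    """Squared Mahalanobis distance of each row from the sample centroid."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    cov = [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(k)] for i in range(k)]
    d2 = []
    for r in rows:
        diff = [r[j] - means[j] for j in range(k)]
        y = _solve(cov, diff)  # y = S^-1 (x - mean), without forming the inverse
        d2.append(sum(d * v for d, v in zip(diff, y)))
    return d2


def flag_multivariate_outliers(rows, alpha=0.001):
    """Indices of rows whose D^2 is significant at `alpha` on chi-square(df = k)."""
    k = len(rows[0])
    return [i for i, d2 in enumerate(mahalanobis_sq(rows)) if chi2_sf(d2, k) < alpha]


if __name__ == "__main__":
    # Twenty hypothetical respondents on two scales plus one extreme response pattern.
    data = [[i, (7 * i) % 10] for i in range(20)] + [[100, 100]]
    print(flag_multivariate_outliers(data))  # [20]
```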
Results
Demographics
There were 336 participants, of whom 186 (55%) were female. The girls (mean age 13.0, SD = 1.3) were significantly older than the boys (mean age 12.7, SD = 1.1; t = 2.25, df = 333, p < 0.05), but there was no significant difference in the proportion of males and females by school grade (χ² = 2.56, df = 3, n/s). About 23% of the sample were recruited in primary schools, with a further 5% in the lowest grade of a middle school. Approximately 17% of boys and 16% of girls were recruited in single-sex schools.
There were 62 adolescents interviewed in hospital, of whom 31 had presented with AOD-related problems and 31 with problems not related to AOD use. The AOD group contained 9 boys and 22 girls (mean age 14, SD = 0.89), while the non-AOD group had 14 boys and 17 girls (mean age 14.3, SD = 0.86).
Internal consistency
The alpha reliability coefficients for the GHQ ranged from 0.90 in grade seven to 0.80 in grade nine, with an overall value of 0.88 in the school sample. In the hospital data the alpha coefficient was 0.91 (n = 59). For comparison, the values cited in the GHQ manual, based predominantly on adults, range from 0.82 to 0.93, with lower values found for the shorter versions of the GHQ [2].
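For readers reproducing internal-consistency figures such as these, Cronbach's alpha can be computed from item-level data as alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The Python sketch below implements this standard formula for complete data (for example, twelve GHQ item scores per respondent); the function name is hypothetical, and it is not claimed to be the software used in the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items list of complete item scores."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [variance(column) for column in zip(*rows)]  # one per item
    total_variance = variance([sum(row) for row in rows])  # variance of scale totals
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Four hypothetical respondents answering three items.
    demo = [[0, 1, 1], [1, 1, 2], [2, 3, 2], [3, 3, 3]]
    print(round(cronbach_alpha(demo), 3))
```

Perfectly consistent items give alpha = 1, and mutually uncorrelated items give alpha close to 0.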
GHQ-12 scores
There were missing data on 6.3% of GHQ forms from students in school grade seven, 5.4% in grade eight, 1.5% in grade nine and 0% in grade 10. Only one form had more than 10% missing data, and it was excluded. Differences in total scores by gender and grade were assessed with a two-way between-groups ANOVA. The mean scores on each instrument for males and females are given in Table 1: females had higher scores on the GHQ (F(1,326) = 15.0, p < 0.001), RSE, DASS, NA and STAI, while males had higher GSE scores. The effect of school grade was that scores on the GHQ (F(3,326) = 4.2, p < 0.01), RSE, STAI and CES-D all increased with grade, while the MCSD score fell with increasing grade (Table 1). Post-hoc Bonferroni tests showed that the grade 10 GHQ score was significantly higher than those for grades seven and nine. None of the scales had a significant sex by grade interaction. In addition, there was no significant main effect of age (F(4,324) = 1.8, n/s) or age by gender interaction (F(4,324) = 0.797, n/s).
Total scores for males and females and by school grade on all instruments
Construct validity
The pattern of Pearson's correlations between the GHQ and the other scales is shown in Table 2, subdivided by gender. The correlations were in the predicted directions, with positive correlations between the GHQ and measures of anxiety, stress, depression and negative affectivity. There was also a positive correlation with the RSE, because high RSE scores represent low self-esteem. The GHQ was negatively correlated with self-efficacy. Among male subjects, there was also a weak negative relationship between the GHQ and a socially desirable response set. The intercorrelations among the other instruments were of the same order of magnitude as those found for the GHQ, with the exception of the MCSD, which was either weakly or not significantly correlated with the other scales.
Intercorrelations between all the instruments. Values above the diagonal are for females, with males below the diagonal
Regression analysis
Eight multivariate outliers were identified and excluded from these analyses. A scatter plot of residuals and a plot of standardized regression residuals indicated a near-normal distribution. Variables were entered into the regression equation simultaneously in a standard analysis. The adjusted R² showed that the model accounted for 68% of the variance, with self-esteem, stress, depression and anxiety being independent predictors of the total GHQ-12 score (F(9,279) = 72.37, p < 0.001). School grade, sex, self-efficacy and social desirability were not significant predictors of GHQ scores. The main contributors of unique variance were the CES-D (3%) and the STAI (3%) (Table 3). The low values for unique variance reflect the high intercorrelations between the dependent variable and most of the independent variables, and among most of the independent variables themselves.
Standard multiple regression of trait measures and demographic features on GHQ-12 scores
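In a standard (simultaneous) regression, the unique variance attributed to a predictor is its squared semipartial correlation: the drop in R² when that predictor alone is removed from the full model. The dependency-free Python sketch below illustrates this definition using ordinary least squares solved through the normal equations; the helper names and toy data are hypothetical, and a statistics package would normally be used instead.

```python
def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(predictors, y):
    """R^2 of an OLS fit of y on the given predictor columns plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + [[float(v) for v in col] for col in predictors]
    xtx = [[sum(ci[t] * cj[t] for t in range(n)) for cj in cols] for ci in cols]
    xty = [sum(ci[t] * y[t] for t in range(n)) for ci in cols]
    beta = _solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(b * col[t] for b, col in zip(beta, cols)) for t in range(n)]
    mean_y = sum(y) / n
    ss_total = sum((v - mean_y) ** 2 for v in y)
    ss_resid = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return 1.0 - ss_resid / ss_total


def unique_variance(predictors, y):
    """Squared semipartial correlation of each predictor: R^2(full) - R^2(without it)."""
    full = r_squared(predictors, y)
    return [full - r_squared(predictors[:j] + predictors[j + 1:], y)
            for j in range(len(predictors))]


if __name__ == "__main__":
    x1 = [1, -1, 1, -1]
    x2 = [1, 1, -1, -1]
    y = [4, 0, -2, -2]  # x1 + 2*x2 plus a component unrelated to either
    print(round(r_squared([x1, x2], y), 4))  # 0.8333
    print([round(u, 4) for u in unique_variance([x1, x2], y)])  # [0.1667, 0.6667]
```

With uncorrelated predictors, as in this toy example, the unique contributions add up to the full R². With strongly intercorrelated predictors, most of the explained variance is shared, so the unique contributions can be small even when the full model explains a large share of the total.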
Discriminant validity
The GHQ scores from the school pupils were compared with those from the adolescents in hospital using a one-way ANOVA. The adolescents with AOD problems had significantly higher GHQ scores than both the non-AOD group and the school sample (means 15.5 (SD = 8.0) vs 9.48 (SD = 6.9) vs 11.39 (SD = 6.25); F(2,393) = 7.33, p < 0.001). Bonferroni post-hoc tests showed that the GHQ mean for the non-AOD group was not significantly lower than that of the school group. Because of the gender and grade differences in GHQ scores found in the school data, the analysis was repeated with sex and age as covariates (the hospital samples did not report school grade). The main effect of group remained significant (F(2,390) = 7.89, p < 0.001), and both covariates were also significant (sex, F(1,390) = 6.48, p < 0.05; age, F(1,390) = 7.29, p < 0.01).
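The one-way comparison above rests on the usual F ratio of between-group to within-group mean squares. A minimal Python sketch of that computation follows; it returns only the F statistic and its degrees of freedom (a p-value would come from the F distribution in a statistics package), and it does not cover the covariate-adjusted analysis or the Bonferroni post-hoc tests. The function name and example scores are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    n_total, n_groups = len(all_scores), len(groups)
    grand_mean = sum(all_scores) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means))
    df_between, df_within = n_groups - 1, n_total - n_groups
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


if __name__ == "__main__":
    print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))  # (13.5, 1, 4)
```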
Threshold scores
Since both the Western Australian data [30] and the multiple regression suggested that anxiety may be as important as depression in defining caseness, respondents with approximately the highest 3% of CES-D scores (males 36+, females 43+) were classed as ‘cases’, together with those with approximately the highest 4% of STAI scores (males 56+, females 62+). This gave a total of 22 (6.5%) ‘cases’, of whom five (1.5%) were ‘cases’ on both the STAI and the CES-D.
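The caseness rule above can be sketched in Python as follows. Cut-offs are set separately for each sex at the lowest score for which the share of respondents at or above it does not exceed the target fraction (3% for the CES-D, 4% for the STAI), and a respondent counts as a ‘case’ if either instrument reaches its cut-off. How ties at the margin were handled is not reported, so that rule, the record field names and the toy data are assumptions made for illustration.

```python
def top_fraction_cutoff(scores, fraction):
    """Lowest score c such that the share of scores >= c does not exceed `fraction`.

    Returns None if even the maximum score is shared by more than `fraction`
    of respondents (ties make the cut-off only approximate).
    """
    n = len(scores)
    cutoff = None
    for c in sorted(set(scores), reverse=True):
        if sum(s >= c for s in scores) / n <= fraction:
            cutoff = c
        else:
            break
    return cutoff


def flag_cases(records, cesd_fraction=0.03, stai_fraction=0.04):
    """Flag 'cases' in the top CES-D or STAI tail, with cut-offs set by sex.

    `records` is a list of dicts with hypothetical keys 'sex', 'cesd' and 'stai'.
    """
    flags = [False] * len(records)
    for sex in {r["sex"] for r in records}:
        idx = [i for i, r in enumerate(records) if r["sex"] == sex]
        cesd_cut = top_fraction_cutoff([records[i]["cesd"] for i in idx], cesd_fraction)
        stai_cut = top_fraction_cutoff([records[i]["stai"] for i in idx], stai_fraction)
        for i in idx:
            r = records[i]
            high_cesd = cesd_cut is not None and r["cesd"] >= cesd_cut
            high_stai = stai_cut is not None and r["stai"] >= stai_cut
            flags[i] = high_cesd or high_stai  # a case on either instrument
    return flags


if __name__ == "__main__":
    girls = [{"sex": "F", "cesd": i, "stai": 99 - i} for i in range(100)]
    print(top_fraction_cutoff([r["cesd"] for r in girls], 0.03))  # 97
    print(sum(flag_cases(girls)))  # 7
```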
Table 4 shows the screening characteristics for males and females.
Sensitivity and specificity of the GHQ-12 against a criterion score on the Centre for Epidemiological Studies Depression Inventory and/or the State-Trait Anxiety Inventory
At a threshold of 13/14 for males and 18/19 for females, the overall sensitivity was 88.8% and the specificity 87.3%, with 13% of participants misclassified. At these cut-off values, the positive predictive value (PPV) of the GHQ was 38% and the negative predictive value (NPV) was 98.8%. Using these cut-off scores, 18.9% of the school sample screened positive, compared with 16.1% of the non-AOD hospital group and 56.7% of the AOD hospital group (χ² = 23.8, df = 2, p < 0.001). To increase the yield, or PPV, of the instrument, higher threshold scores would be needed: a cut-off score of 15/16 for males and 20/21 for females increased the PPV to 53% and reduced the NPV to 98%.
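The screening indices reported above come from a 2 × 2 cross-tabulation of GHQ-12 screen results against the criterion ‘cases’. The Python sketch below shows the arithmetic. It reads a cut-off written as 13/14 in the usual GHQ manner (13 or below screens negative, 14 or above screens positive) and applies sex-specific thresholds; the function names, the record layout and the toy data are hypothetical.

```python
# Lowest screen-positive GHQ-12 score for each sex: 13/14 for males and
# 18/19 for females, read as "13 or below negative, 14 or above positive".
THRESHOLDS = {"M": 14, "F": 19}


def _ratio(num, den):
    """Proportion, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def screening_stats(records, thresholds=THRESHOLDS):
    """Screening indices from (sex, ghq_score, is_case) tuples (hypothetical layout)."""
    tp = fp = fn = tn = 0
    for sex, score, is_case in records:
        positive = score >= thresholds[sex]
        if positive and is_case:
            tp += 1
        elif positive:
            fp += 1
        elif is_case:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "misclassified": _ratio(fp + fn, tp + fp + fn + tn),
    }


if __name__ == "__main__":
    toy = [("M", 20, True), ("M", 18, True), ("M", 15, False), ("M", 10, True),
           ("M", 5, False), ("F", 18, False), ("F", 25, True), ("F", 3, False),
           ("F", 20, False)]
    print(screening_stats(toy))
    print(screening_stats(toy, thresholds={"M": 16, "F": 21}))  # stricter cut-offs
```

Running the same function with the stricter thresholds (16 for males, 21 for females) illustrates the trade-off described above: fewer false positives raise the PPV, at some cost to sensitivity and NPV.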
Conclusions
The data from this sample of Australian adolescents suggest that the GHQ-12 may be a valid measure of general distress in this age group (11–15 years). The brevity of the scale, and the fact that it is influenced by a number of traits that affect psychological wellbeing, imply that it will be useful across a range of different situations and stressors. However, the GHQ was not designed to provide a specific diagnosis [1], and the findings of this study re-emphasize this point: measures of anxiety, depression, stress and poor self-esteem were all significant independent predictors of GHQ-12 scores. The GHQ-12 was also highly correlated with measures of negative affectivity and (low) self-efficacy.
This may make the GHQ a particularly useful measure with adolescents where there are likely to be a number of different threats to psychological health, such as poor self-esteem, that may not necessarily constitute a formal psychiatric condition. However, if a specific diagnosis or the measurement of a specific trait were required, a more appropriate instrument or diagnostic schedule should be used.
The GHQ was designed as a ‘state’ measure, with the previous week as the period of reference. Accordingly, its results are expected to reflect overall mental health status and not to be unduly affected by any acute event within that week. Excessive use of AOD was taken as a marker of pervasive psychosocial distress, and it was therefore predicted that those presenting to hospital emergency departments with problems related to AOD use would be more likely to be suffering from psychological distress than the general population of their peers; this was the case. However, we were unsure how an acute medical event would affect GHQ scores. Most of the non-AOD adolescents were attending the emergency department for acute but non-critical conditions, such as minor accidents and injuries. In most cases the event that precipitated the hospital visit had occurred within the previous few hours, and of interest was whether this acute event would unduly affect GHQ scores. These adolescents appear, however, to have considered the full reference period of the previous week when answering the questions, and their GHQ scores were not significantly different from those of the school sample. Therefore, one advantage of the GHQ revealed in our data was that it does not appear to be susceptible to the impact of acute but non-psychological events.
The threshold scores presented in this study should be used only as a tentative guideline. The CES-D has been criticized as a screening tool with young people, with evidence that although it is good at detecting true positives, it is not effective in detecting true negatives (producing a high percentage of false positives) [32]. Furthermore, the finding that the measures of anxiety, stress and self-esteem were also significant independent predictors of GHQ-12 scores suggested that the CES-D alone was not a true criterion against which to compare the GHQ-12. Therefore, the combined ‘cases’ on the CES-D and STAI were used. Until the GHQ-12 is compared against the ‘gold standard’ of a diagnostic interview, this report provides researchers with suggested threshold scores of 18/19 for females and 13/14 for males.
At the suggested thresholds there was a 99% probability that a person with a negative test was truly not a ‘case’. The PPV obtained indicated that, given a positive test result, there was a 38% probability that the person was a true ‘case’. Thus, were the GHQ to be used as a screening tool to identify people who might benefit from an intervention, approximately two-thirds of those identified would not be true cases and would receive the intervention unnecessarily, but very few true ‘cases’ would be ‘screened out’ and miss the intervention. The final decision on whether these figures are adequate rests on the costs, both financial and clinical, of missing a case against the costs of delivering unnecessary services. The PPV of a test depends on the prevalence of the ‘disease’ being screened for as well as on the screening characteristics of the instrument [33]. Thus, the yield can be improved either by targeting a population in which cases are more prevalent or by increasing the specificity of the test. For example, raising both the male and female thresholds by two points increased the PPV to 53% in this sample.
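The dependence of the predictive values on prevalence follows from Bayes' theorem: PPV = Se·p / (Se·p + (1 - Sp)(1 - p)) and NPV = Sp(1 - p) / (Sp(1 - p) + (1 - Se)p), where Se is sensitivity, Sp is specificity and p is the prevalence of ‘cases’. The short Python sketch below evaluates these expressions for illustration only; the sensitivity and specificity values in the example are hypothetical, and this is not a re-analysis of the present data.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied where cases have the given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    # Hypothetical test with 90% sensitivity and 85% specificity.
    for prevalence in (0.05, 0.20, 0.50):
        ppv, npv = predictive_values(0.90, 0.85, prevalence)
        print(f"prevalence {prevalence:.2f}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

With sensitivity and specificity held fixed, the PPV rises and the NPV falls as prevalence increases, which is why targeting a higher-prevalence group improves the yield.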
The findings of this study are markedly different from the results reported by Winefield and colleagues for older Australian adolescents, who found a high level of misclassification compared with any Axis I DSM-III diagnosis [8]. Their diagnostic interviews rated 20% of the sample as cases, which is consistent with the 17.7% found in a recent Australian survey of the prevalence of anxiety, affective and substance use disorders [34]. However, it was considerably higher than the prevalence of anxiety and/or depression found in children and young adolescents [30]. Because all Axis I conditions were included, there may be some disorders that the GHQ does not detect well, which may explain the mediocre sensitivity they report; however, this would make little difference to the poor specificity. The Winefield study [8] used a GHQ-44, which contained the embedded 12-, 28- and 30-item versions of the GHQ. The threshold was 4/5 on the GHQ-28 scored as a binary (0,0,1,1) scale, as opposed to the Likert scoring used for the GHQ-12 in this study. Although the binary method is more frequently used in case assignment, threshold scores using the Likert method have been reported [35], and the Likert method gives a superior distribution of scores if psychological disorders are regarded as dimensions rather than categories [2]. However, in the current study the screening characteristics are likely to be inflated by the use of the same method (short screening tests) both to measure psychological distress and to define caseness.
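The difference between the two scoring methods can be seen in a few lines of Python. The sketch below takes the same 0–3 response codes used for Likert scoring and recodes them under the binary GHQ method (0, 0, 1, 1), so that only the two ‘worse than usual’ response options count towards the total; the function names and the example form are hypothetical.

```python
GHQ_BINARY = (0, 0, 1, 1)  # binary GHQ method: only the two "worse" options score 1


def likert_total(codes):
    """Likert (0-1-2-3) total of 0-3 item codes."""
    return sum(codes)


def binary_total(codes):
    """Binary GHQ (0-0-1-1) total: items answered in one of the two worst options."""
    return sum(GHQ_BINARY[c] for c in codes)


if __name__ == "__main__":
    codes = [0, 1, 2, 3] * 3  # a hypothetical 12-item form
    print(likert_total(codes), binary_total(codes))  # 18 6
```

Because several Likert totals collapse onto the same binary total, Likert scoring spreads respondents over a wider range (0–36 rather than 0–12 for the GHQ-12), which is the distributional advantage noted above.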
One limitation of the study should be emphasized: the sample was not representative of the adolescent population and, in particular, was likely to be biased towards children from higher socioeconomic status backgrounds. The amount of missing data was greater among younger students, which may reflect reading and/or conceptual problems in understanding the test. However, missing data were apparent for all the instruments (reported by grade only for the GHQ), with the GHQ having fewer missing data than the other instruments. If comprehension difficulties were the cause of the missing data in this (probably) academically able group, the GHQ-12 may be inappropriate for very young adolescents from more disadvantaged backgrounds. Where the problems relate solely to reading rather than comprehension, however, it may be appropriate for an interviewer to administer the test to the adolescent. The GHQ has been used in this manner with illiterate or blind adults [2], but this approach may require adjustment of the optimum threshold score.
Pending its evaluation against a diagnostic interview in this population, the GHQ-12 appears to be a valid instrument with young adolescents of this sociodemographic group, and it has a number of features that recommend its use. It is brief, easy to administer and score, and sensitive to a wide range of negative emotional states that affect psychological wellbeing. The instrument shows some ability to discriminate between criterion groups, assuming that young people with alcohol or other drug problems have a higher level of psychological distress than those in school. Given that the GHQ has already been translated into a wide range of languages and validated with adults, it should be straightforward to assemble cross-cultural normative data for young adolescents. Finally, the findings of this study lend support to earlier reports that have used the GHQ-12 as a measure with young adolescents.
Acknowledgements
The hospital data reported in this study were collected as part of a study funded by Healthway, the Western Australian Health Promotion Foundation. Thanks to the pupils and schools who participated in this project and Jennifer Frizell, Emily Hobson, and Danielle Monley for their work collecting much of the school data.
