Abstract
The General Health Questionnaire (GHQ) was developed to assess the extent of psychiatric illness in general practice [1]. It was designed as a screening instrument to provide information on the current mental wellbeing of a primary care patient by assessing normal ‘healthy’ functioning and the appearance of new, distressing symptoms rather than giving a specific psychiatric diagnosis [2]. Subsequently, the GHQ has been translated into numerous languages, extensively used in both research and practice and cross-culturally validated in adult populations [3]. However, a recent review of the GHQ-12 concluded that it was less effective in Australia than in other locations [4].
The GHQ manual notes that it is not appropriate for use with children but that it has been used with adolescents [2]. The GHQ has been used to assess psychological distress in adolescents in relation to a range of stressors including examinations [5], bullying [6], parental loss (death or separation) [7], and unemployment/job satisfaction [8]. It has also been employed in a range of English and non-English speaking countries.
However, in contrast to the situation with adults, there has been no extensive and systematic evaluation of the validity of the GHQ in adolescent populations, and no attempt to establish either the lower age limit for its use or its cross-cultural validity. This paper reviewed the body of evidence that has now accumulated on the validity of the GHQ with adolescents (aged 12–19) in terms of its age appropriateness, screening characteristics and cross-cultural utility. As there are several versions of the instrument and different methods of scoring, this information was also incorporated in the review.
The original 60-item GHQ has produced a number of progeny including the GHQ-30, GHQ-28 and the GHQ-12. The GHQ-30 and GHQ-12 were based on the questions that provided the best discrimination among the original criterion groups and were designed such that they contained equal numbers of questions where a positive answer showed health or illness [2]. The GHQ-28 was derived by factor analysis and was designed to give more detailed information than a single severity score, with four subscales interpreted as ‘somatic symptoms’, ‘anxiety/insomnia’, ‘social dysfunction’ and ‘severe depression’ [9].
There are also several ways of scoring the test. The traditional method of scoring the questions was a binary code (0, 0, 1, 1) but the GHQ can also be scored as a Likert scale (0, 1, 2, 3) or by assigning different weights to questions associated with illness (0, 1, 1, 1) or health (0, 0, 1, 1) [10]. The threshold or cut-scores for the GHQ family not only vary with the scoring method and length of questionnaire but also across populations. A multicentre trial reported cut-scores ranging from 1/2 to 6/7 on the GHQ-12 and 3/4 to 7/8 on the GHQ-28 [3]. The screening characteristics will vary with the ratio of high to low scores in the sample, so researchers or practitioners working with community samples may need to use a lower threshold score than those screening in primary care settings [11].
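The scoring rules described above can be illustrated with a minimal Python sketch. The function names, the `illness_items` argument and the example responses are hypothetical and are not taken from the GHQ manual. The sketch assumes that each item's four response options are coded 0 to 3 in printed order, and that a threshold written as 4/5 classifies scores of 5 or more as cases.

```python
# Weights applied to the four response options (coded 0..3) of each item.
BINARY = (0, 0, 1, 1)          # traditional binary scoring
LIKERT = (0, 1, 2, 3)          # Likert scoring
WEIGHT_ILLNESS = (0, 1, 1, 1)  # weighted scoring, illness-worded items
WEIGHT_HEALTH = (0, 0, 1, 1)   # weighted scoring, health-worded items


def score_ghq(responses, method="binary", illness_items=frozenset()):
    """Return the total score for one respondent.

    responses     -- option index (0-3) chosen for each item
    method        -- "binary", "likert" or "weighted"
    illness_items -- zero-based positions of illness-worded items
                     (hypothetical input, used only by "weighted")
    """
    total = 0
    for position, option in enumerate(responses):
        if option not in (0, 1, 2, 3):
            raise ValueError(f"item {position}: option must be 0-3")
        if method == "binary":
            total += BINARY[option]
        elif method == "likert":
            total += LIKERT[option]
        elif method == "weighted":
            illness = position in illness_items
            weights = WEIGHT_ILLNESS if illness else WEIGHT_HEALTH
            total += weights[option]
        else:
            raise ValueError(f"unknown scoring method: {method}")
    return total


def is_case(score, lower_cut):
    """For a threshold written 'a/b', pass a; scores above a are cases."""
    return score > lower_cut


# Invented responses to a 12-item version, for illustration only.
example = [0, 1, 2, 3, 1, 1, 0, 2, 3, 1, 0, 2]
binary_total = score_ghq(example, "binary")
likert_total = score_ghq(example, "likert")
weighted_total = score_ghq(example, "weighted", illness_items={1, 4, 5})
```

The same set of responses gives a different total under each method, which is one reason why cut-scores cannot be carried over from one scoring method to another.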
Method
Inclusion criteria
To be eligible for inclusion in the review, studies needed to meet the following criteria. They had to be published in an English language journal between 1970 and March 2000. Adolescence was defined as the age range 12–19 years (inclusive), as commonly used [12, 13]. Where there were no validation data using an exclusively adolescent sample, studies that contained both adolescents and young adults were included. All versions of the GHQ were eligible, including versions translated into languages other than English.
Search strategy
We searched for studies published between 1970 and 2000 in Medline, PsychInfo, Current Contents and Sociological Abstracts using the terms ‘GHQ and adolescent’ and ‘General Health Questionnaire and adolescent’ (both with wild cards). In addition, we inspected the reference lists of the eligible studies and of major works on the GHQ.
Results
We identified 82 papers reporting on the use of the GHQ with adolescents aged 12–19 years at baseline. Of these, 23 were published before 1988 and 59 between 1988 (when the User's Guide [2] was released) and March 2000. Twenty-two papers reported data from the UK and 60 reported data either partially or completely collected outside the UK. Of these, the major contributor was Australia with 19 papers. We found only four studies directly validating the GHQ solely with adolescent participants plus four more that included both adolescents and young adults. We also identified six scales that had used the GHQ as a criterion measure in their validation process.
Validity in adolescent populations
The validation data were examined first for English language versions of the GHQ and then for translated versions. The seminal paper [14] on the use of the GHQ in adolescent populations established its validity by comparison with a structured clinical interview, the Present State Examination (PSE) [15], in a sample of 200 17-year-old persons. The PSE identified 3.5% of the group as cases and, using a threshold of 5/6, the GHQ-28 had a sensitivity of 100%, a specificity of 84.5% and a misclassification ratio (the ratio of false positives plus false negatives to total sample size) of 15%. The subscales on the GHQ-28 were not separately validated. In addition to identifying ‘cases’, the GHQ can also be used as an index of the severity of psychological distress, as the GHQ total scores were highly correlated with the scores on the structured interview. The GHQ-30 and GHQ-12 were embedded in the same questionnaire but had inferior psychometric characteristics, with misclassification ratios of 23.5% and 17.5%, respectively. Table 1 summarizes the validity studies involving adolescents and young adults.
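The three screening indices quoted above follow directly from the cross-classification of GHQ result against interview result. The sketch below is illustrative only: the cell counts are not reported in the text and have been reconstructed from the published percentages (3.5% of 200 gives 7 cases; a specificity of 84.5% among the remaining 193 implies about 30 false positives), so they should be treated as an assumption.

```python
def screening_indices(true_pos, false_neg, true_neg, false_pos):
    """Sensitivity, specificity and misclassification ratio from a 2x2 table."""
    total = true_pos + false_neg + true_neg + false_pos
    return {
        # proportion of interview-defined cases that the questionnaire detects
        "sensitivity": true_pos / (true_pos + false_neg),
        # proportion of interview-defined non-cases that it correctly clears
        "specificity": true_neg / (true_neg + false_pos),
        # false positives plus false negatives over the whole sample
        "misclassification": (false_pos + false_neg) / total,
    }


# Counts reconstructed from the reported percentages (an assumption):
# 7 cases, all detected; 193 non-cases, of whom 30 score above threshold.
ghq28_at_5_6 = screening_indices(true_pos=7, false_neg=0,
                                 true_neg=163, false_pos=30)
```

With so few true cases in the sample, every misclassified person here is a false positive, which illustrates how strongly the misclassification ratio depends on prevalence.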
Table 1. Validation of the GHQ with adolescents and young adults
A study of abnormal eating attitudes [16] assessed the validity of the GHQ-28 against the criterion of the Clinical Interview Schedule (CIS) [17] in a sample of 15-year-old schoolgirls. Using a threshold of 5/6 the sensitivity was 0.61 and the specificity was 0.85. The CIS provides two methods of defining ‘caseness’, either by a total weighted score or by an overall severity rating. Out of the 78 girls interviewed, 29 or 18, were defined as cases by these different methods, respectively. The classification indices in Table 1 are in comparison with the overall severity rating, but the classification results were similar for the two methods.
An Australian study assessed the reliability and validity for the combined GHQ-12, 28 and 30 as part of a four-year follow-up [18]. Clinical psychiatrists interviewed a non-random selection of 118 GHQ-28 ‘cases’, together with 118 matched ‘non-cases’. The sample contained approximately equal numbers of male and female subjects from a narrow age range of older adolescents. (The mean age was 19.6, standard deviation 1.06, so this sample would probably have included people aged at least 20 years). Using the conventional threshold for binary scoring of 4/5, the GHQ-28 misclassified over 40% of people.
The sensitivity and specificity were 0.70 and 0.52 for males and 0.82 and 0.57 for females. Using a more stringent threshold of 7/8, the proportion misclassified fell to 33% for males and 25% for females, but at this threshold the sensitivity for males was 0.45. All three versions of the test had acceptable internal reliabilities and, when scored by the Likert method, were significantly correlated with other self-report psychological measures. The screening indices were not reported for the GHQ-12 and GHQ-30.
We located no validation studies of the Chinese version of the GHQ solely with adolescents but there were investigations with young adults that included people less than 20 years of age. The GHQ-30 was initially validated in English speaking young Chinese people (aged 17–25) against other clinical tests (Chinese Minnesota Multiphasic Personality Inventory (MMPI) and Self-Reporting Questionnaire (SRQ)). The convergent and discriminant validity was evaluated against similar and dissimilar subscales. The level of concordance between the GHQ and SRQ in identifying ‘cases’ and ‘non-cases’ was greater than 75% for all the reported threshold scores. Cases were defined on the MMPI via three subscales (Neurotic, Psychotic and Sociopathic) and an overall score [19]. Table 1 shows the alpha coefficient and screening characteristics compared with the overall score, where 17% were misclassified. There were more misclassified cases for each of the three subscales than with the overall score at a threshold of 5/6 on the GHQ.
A further investigation examined the underlying constructs of the GHQ by administering a Chinese translation and the English version of the GHQ-60 to bilingual Chinese students (aged 18–24) [20]. In addition, the concurrent validity of the Chinese GHQ-30 was gauged in school students (aged 11–20) by comparing their scores on the GHQ with a battery of other clinical scales [21]. However, the latter study did not calculate the optimum threshold value or provide classification indices. It did confirm the reliability (Cronbach's alpha 0.88), internal consistency (split half reliability 0.78) and stability of the five factor structure of this version of the GHQ [22].
Politi and coworkers [23] investigated 252 military recruits (all aged 18) who had passed a threshold on an army psychiatric screening instrument and 111 from below the threshold. The GHQ results were compared with the criterion of a psychiatric interview (ICD 9 criteria). Using a threshold of 8/9 on the GHQ-12 (Likert scoring) the sensitivity was 68%, specificity 59% and overall 40% were misclassified. The extent to which these findings can be generalized to other young persons is questionable, given that the sample was all male, aged 18 and there may have been an incentive to falsify answers and attempt to deceive the psychiatrist to avoid military service.
The GHQ has been validated for young persons in both Spain and Yugoslavia, but again, not solely with ‘adolescents’. In Spain, the GHQ-28 was validated for young persons (15–29) against a semistructured interview (CIS [17]) with 8.57% misclassified [24]. The Yugoslavian investigation [25] used third year medical students (age range not reported) and a criterion of the Standardized Psychiatric Interview [17], which gave a misclassification proportion of 8.6%.
An analysis of the component structure of the questionnaire in new populations does not provide direct evidence of validity, but if a structure similar to that originally reported and validated by Goldberg and Hillier [9] were found, there would be support for the construct validity of the instrument in that population. Five investigations have examined the component structure of the GHQ-28 in adolescent samples (see Table 2). Elton and coworkers [26] investigated, in groups of 15-year-old schoolgirls, whether the pattern of components was consistent across countries and in different ethnic groups within countries. By constraining the number of components to four, they obtained a pattern of item loadings similar to that found in the original study [9]. However, this only applied to the girls of British extraction. The other groups (Greek girls in Greece and Greek girls in Germany plus Turkish girls in Turkey) did not display the same pattern of loadings, but the test did identify a similar proportion of cases across the groups. A second study that investigated the pattern of components in Greek and Turkish adolescents also failed to replicate the original structure, finding that six components were readily interpretable [27]. When restricted to four components, the pattern was similar to the original data but there were inconsistencies that may have indicated either different cultural interpretation of some items or an age effect [27]. An earlier report of the same GHQ-28 data set also noted that the migrant sample did not replicate the original components [28].
Table 2. Studies assessing the component structure of the GHQ-28
Two Japanese studies have assessed the structure of the GHQ-28 with adolescents, both of which had approximately equal numbers of male and female participants. One study [29] compared its results with the combined data from Elton and coworkers [26]. Each of the Japanese subscales had its highest loading (coefficient of factor similarity) on the appropriate European subscale. The second study [30] found that Japanese adolescent boys and girls had different component structures even when restricted to four components. The study also included a group of university students (aged 18–21) where the pattern of loadings was similar for male and female students, but differed from the younger subjects. Overall, these findings question the validity of the scaled version of the GHQ in assessing adolescent psychological distress from a cross-cultural perspective, at least in these populations.
Discussion
As a measure of psychological distress, the GHQ has been described as ‘quite possibly the best instrument of its kind’ [10, p.59]. It has been extensively used in adult populations with the User's Guide documenting some 43 validation studies [2, 3]. Since the seminal paper reporting on the validity of the measure with adolescents [14], it has also become a popular instrument with younger populations. We identified 82 papers that reported on data from the age group 12–19 years at recruitment. However, the validity data to support the use of the GHQ with adolescents were limited. Including the Winefield study [18], there were only four investigations that used only adolescent subjects and there were no age-specific data on adolescents younger than 15 years. Furthermore, even though the User's Guide states that the GHQ is not suitable for use with children [2], we were unable to locate any studies that have systematically attempted to assess the minimum age or developmental stage at which the questionnaire becomes valid.
Age appropriateness
Before using self-report measures of psychopathology with young persons, it is necessary to ascertain whether they can accurately report their symptoms; from what age or developmental stage their self-reports are reliable; and whether their experiences of psychiatric symptoms are stable across the age range in which the instrument will be used. In consideration of the rapid developmental and cognitive changes in young people, psychometric tests need to be backed by appropriate age, sex and clinical norms versus non-clinical norms [31, 32]. These normative data are not available for use of the GHQ with adolescents, especially in relation to the transition from childhood to adolescence. Neither are the data establishing a unitary concept of ‘psychological distress’ across adolescence.
Only one validation study (of the Chinese translation of the GHQ-30) covered the full age range of adolescence (11–20) and found that it was highly correlated with other clinical scales [21]. One limitation of this study was that it did not report an analysis of the GHQ by age or establish a minimum age for its use. The youngest age-specific data were for 15-year-old girls [16] and by analogy it could be asserted that the GHQ would be valid for boys of this age, especially as sex is claimed not to be a confounding variable in adult samples [2, 3]. However, there are no age-specific data for males younger than 17 to support this assumption.
This limited evidence to support the use of the GHQ with young adolescent males also needs to be tempered by the apparent systematic differences between girls and boys on the GHQ. These were reported in a national cohort study [33] and in a number of school based studies in which girls had either significantly higher scores or a greater proportion of ‘at risk’ cases than boys [5, 34–37]. In addition, there was some evidence of a significant effect of age [27, 38], and practitioners need to be aware of a number of stressors that may be confounded with age in late adolescence, for example, school examinations [5], employment status [39], job satisfaction [8] and perceived financial strain associated with unemployment [40]. Therefore, there are dangers in making assumptions based on single sex data or in extrapolating from one age group to another.
The appropriateness of the CIS [17], which was used as a ‘gold standard’ in some of the GHQ validation studies, for identifying adolescent psychopathology has been criticised, as have the alternative methods of identifying cases with this interview [16]. However, with older subjects, the overall misclassification ratio for GHQ ‘cases’ was less than 9% against this structured interview [24]. Hence, in future investigations a specific adolescent assessment or diagnostic criteria should be used as part of the validation process.
Screening characteristics
As a guideline, a screening test with a sensitivity and specificity of at least 0.80 is considered acceptable [24]. Using these criteria, there is support for the use of the GHQ-28 with adolescents from the Banks investigation [14], but both the GHQ-12 and 30 fell below this level. In the study by Mann and coworkers, the sensitivity of the GHQ-28 was only 0.61 [16] and both an Australian [18] and an Italian [23] study failed to reach the benchmark level and had ‘unacceptable’ levels of sensitivity and specificity. In the studies that contained older subjects, the classification indices were more ‘acceptable’ with both the Yugoslavian and Spanish studies passing the 0.80 thresholds for sensitivity and specificity [24, 25] and in an English speaking Chinese sample, only the sensitivity was marginally below this guideline [19]. Thus, with the exception of the Banks study [14], some questions remain about the validity of the test with adolescents either as a function of its screening ability or because the validation data included older subjects.
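The comparison against the 0.80 guideline can be made explicit with a short Python sketch. It uses only the sensitivity and specificity values quoted in this review for the studies of adolescents, at the thresholds stated in the text; the labels, variable names and helper function are illustrative and hypothetical.

```python
GUIDELINE = 0.80  # minimum acceptable sensitivity and specificity [24]

# (label, sensitivity, specificity) as quoted in the text of this review
REPORTED = [
    ("Banks [14], GHQ-28, threshold 5/6", 1.00, 0.845),
    ("Mann [16], GHQ-28, threshold 5/6", 0.61, 0.85),
    ("Winefield [18], GHQ-28, threshold 4/5, males", 0.70, 0.52),
    ("Winefield [18], GHQ-28, threshold 4/5, females", 0.82, 0.57),
    ("Politi [23], GHQ-12 Likert, threshold 8/9", 0.68, 0.59),
]


def meets_guideline(sensitivity, specificity, guideline=GUIDELINE):
    """True only when both indices reach the guideline value."""
    return sensitivity >= guideline and specificity >= guideline


acceptable = [label for label, sens, spec in REPORTED
              if meets_guideline(sens, spec)]

for label, sens, spec in REPORTED:
    verdict = "meets" if meets_guideline(sens, spec) else "falls below"
    print(f"{label}: sensitivity {sens:.2f}, specificity {spec:.2f}, "
          f"{verdict} the guideline")
```

Requiring both indices to reach the guideline is what excludes the female Winefield subsample, where the sensitivity (0.82) passes but the specificity (0.57) does not.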
The component structure of the GHQ-28 was originally determined by pragmatic as well as theoretical considerations, with a six component solution rejected to ensure the separation of the anxiety and depression indicators [9]. Thus, it was not surprising that a replication in young persons reported an interpretable six component solution [27]. Of more concern in establishing the validity of the GHQ with adolescents, was the different component structures between age groups [30] and the possibility of age (or cultural) related differences in interpreting some questions [27]. Nevertheless, there was support for the validity of the GHQ from a study that replicated the original structure of the GHQ-28 in a sample of English 15-year-old girls [26].
Cross-cultural appropriateness?
A further issue arises when an instrument developed in one language and for one culture is translated and used in another culture, as even within the same language there may be different idioms necessitating the replacement of particular questions [2]. As a minimum standard, questionnaires should be independently translated from the original language into the new language and then back to the original language to check the equivalence of the two versions [41]. A more sophisticated approach to translation may involve a group of bilingual experts in the field examining the underlying concepts in addition to the exact vocabulary [42]. One of the main arguments for using the GHQ with adolescents from other cultural and language backgrounds is that it has been validated in a number of countries and languages with adults, so appropriate translations should be available if it is validated with adolescents.
In the four non-English validation studies reviewed, only the Yugoslavian study did not document the translation process or source [25]. The remaining studies provided either details of the process or had suitable versions available for use in their research. Prior to the investigation with young persons, the GHQ-60 and 28 had been translated into Spanish by two psychiatrists to retain the essential meaning of the questions and validated in adult samples [43, 44]. The Italian study [23] did not describe the translation process but an Italian version of the GHQ-30 had previously been validated in a general practice setting where independent bilingual experts checked the translation [45]. The Chinese version of the GHQ was validated in stages including administering the test to English speaking Chinese students [19] and to bilingual students, with the translation performed by the author [20].
An important limitation of this review was that although translated versions of the GHQ were eligible for inclusion, the literature search was restricted to English language publications. Hence, it is likely that there are further validation studies, particularly on translated versions of the GHQ, published in non-English language journals, which might provide further support for the use of translated versions of the test. The issue of cross-cultural validity is of importance as 60 of the 82 adolescent studies we identified were conducted either partially or completely outside the UK.
Implications
Before the GHQ can be used with adolescents, four lines of investigation are required. The first is to assess the validity of the GHQ by age, to establish an appropriate minimum age for its use and, in particular, to define the transition from childhood to adolescence. The second is to compare the appropriateness of adolescent versus adult ‘gold standard’ criterion interviews. The third is to assemble normative data for young persons, especially by age, sex and clinical status. The fourth is to validate the existing and future translated versions of the GHQ with adolescents.
The limitations in the validity data also have implications for instruments that have used the GHQ as part of their validation process. We identified six scales that have used the GHQ as a criterion measure with adolescents, age range 13–19. Of these, there were data to support the use of the GHQ in two instances (Newcastle Adolescent Behaviour Screening Questionnaire [46] and the Multidimensional Scale of Perceived Social Support [47]). In the remaining four, supporting data were weak (High School Stressor Scale [48], Hunter Opinions and Personal Expectations Scale [49] and the People In Your Life scale [50]), or not available (Family Environment Scale [51]), reducing the validity claimed for these instruments with respect to adolescents.
However, this review does indicate that the GHQ seems appropriate for adolescents at the older end of the spectrum (females aged 15–19 and males aged 17–19) in the UK and Hong Kong. The remaining reports had a high proportion of misclassified cases, a proportion of young adults in the sample, or both. Thus, the validity of the instrument for other adolescent populations remains to be demonstrated. Where valid, the GHQ can be used with binary scoring as a screening tool, and there was also support for using Likert scoring to index the severity of distress. While questions remain about the validity of all versions of the GHQ, the GHQ-28 had the best characteristics in comparison with the other shortened versions and it seems unlikely that any gains from the use of the GHQ-60 would justify the extra time needed to administer the full survey.
