Abstract
Epidemiological and community-based surveys have a critical role to play in promoting a better understanding of risk factors and aetiology for mental disorders, quantifying the prevalence and consequences of mental illness and providing a basis for policy development, planning and service delivery. Consequently, it is of the utmost importance that the measures used in survey research are well understood by researchers analysing survey data, and by clinicians, policy-makers and others whose work will be informed by survey findings.
This paper examines the characteristics of two indices of physical and mental health commonly used in community surveys: the Short Form-12 (SF-12) summary measures of physical health (SF-PH) and mental health (SF-MH) [1]. Developed as a brief alternative to the SF-36 [2], the SF-12 scales are widely used in health survey research, capturing 91% and 92% of the variance in the longer-form SF-36 physical and mental health summary scales, respectively [2]. In the Australian context, these measures have been used in surveys such as the PATH Through Life Project [3] and the National Survey of Mental Health and Wellbeing [4]. Despite their extensive use, there is some uncertainty regarding the interpretability of the SF summary scales [5].
Taft et al. [5] examined the extent to which the SF-36 summary scales adequately represent the eight dimensions of functional status and quality of life (physical functioning, role-physical, bodily pain, general health, vitality, social functioning, role-emotional and mental health) that form the original SF-36 subscales. Several counter-intuitive characteristics of the summary scales were identified as resulting from the process of applying negatively weighted factor coefficients to items representing mental health when scoring SF-36 physical health, and to items representing physical health when scoring SF-36 mental health. It was demonstrated that maximum scores on the physical health summary are only attainable for respondents who provide high ratings on the physical health subscales (good physical health) and very low ratings on the mental health subscales (poor mental health). Similarly, minimum scores on the physical health summary are only attainable for those who provide low physical health subscale ratings and high mental health subscale ratings. The same scoring artefacts also apply to the mental health summary scale. While the influence of the negatively weighted subscales is most striking at the extreme ranges of the summary scales, all scores throughout the possible range are subject to being weighted up or down by the effects of negatively weighted subscales [5]. The same scoring issues apply to the shorter-form SF-12 scales, where responding to items that represent one domain (e.g. physical health) in a way that indicates poor health in most cases results in a positive increment to health measured on the alternative summary scale (e.g. mental health).
These scoring issues arise from applying factor scores based on orthogonal rotation, which assumes that the underlying factors are uncorrelated, to constructs that are in fact associated [6, 7]. In the case of the SF-12, the constructs are physical and mental health, which have been shown to co-occur in clinical and population-based studies [8, 9]. The scoring method that underlies the SF-12 summary scales might be regarded as having some utility, in producing a measure of mental functioning apparently independent of physical functioning (and vice versa). However, the benefits of this approach may be offset by reduced interpretability. For example, it is not possible to determine the extent to which a high score on SF-MH reflects very good mental health, very poor physical health, or a combination of the two.
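The effect of a negative cross-domain weight can be made concrete with a small sketch. The weights below are hypothetical and chosen only to mirror the sign pattern described above; they are not the published SF-12 or SF-36 coefficients, and the two-subscale structure is a deliberate simplification.

```python
# Illustrative only: hypothetical weights, NOT the published SF-12 coefficients.
# Subscale inputs are z-scores (0 = population mean, positive = better health).
ORTHOGONAL_PH_WEIGHTS = {"physical": 0.45, "mental": -0.20}  # negative cross-domain weight
OBLIQUE_PH_WEIGHTS = {"physical": 0.45, "mental": 0.0}       # no cross-domain contribution


def summary_score(z_scores, weights):
    """Return a summary on a 50 +/- 10 style metric from weighted subscale z-scores."""
    aggregate = sum(weights[name] * z_scores[name] for name in weights)
    return 50 + 10 * aggregate


# Two hypothetical respondents with identical physical health.
healthy_both = {"physical": 1.0, "mental": 1.0}
healthy_body_poor_mind = {"physical": 1.0, "mental": -2.0}

orth_both = summary_score(healthy_both, ORTHOGONAL_PH_WEIGHTS)
orth_poor_mind = summary_score(healthy_body_poor_mind, ORTHOGONAL_PH_WEIGHTS)
obl_both = summary_score(healthy_both, OBLIQUE_PH_WEIGHTS)
obl_poor_mind = summary_score(healthy_body_poor_mind, OBLIQUE_PH_WEIGHTS)
```

Under the negatively weighted scheme, the respondent with poorer mental health receives the higher physical health summary even though physical health is identical. Under the scheme with no cross-domain weight, the two physical summaries coincide.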
We investigated circumstances where use of the SF-12 summary scales could introduce interpretational difficulties using data from the PATH Through Life Project. To investigate possible limitations associated with the SF-12 scoring, results obtained using the SF-12 were compared with results obtained using the RAND-12 summary scales [10]. The RAND scoring system is based on the same items that constitute the SF scales; however, the factor weights used in the RAND protocol were calculated using oblique rotation. Consequently, using the RAND scoring, physical health (RAND-PH) and mental health (RAND-MH) components are allowed to correlate, and the scoring artefacts associated with the negatively weighted items that contribute to the SF-12 scoring described above do not apply.
We aimed to examine the nature and strength of associations between summary measures of health calculated using the SF and RAND approaches to scoring, and alternative measures of physical and mental health obtained in the PATH interview. Given the positive association between physical and mental health [8, 9], a monotonic relationship between these variables would be expected. However, the effects of negatively weighted subscales in determining scores at the extreme ranges of the SF-12 summary scales could produce non-monotonic associations between SF-12 scale scores and the alternative cross-domain measures. As responses indicating poor mental health are weighted to produce higher physical health scores in the SF-12, we investigated whether this would result in a non-monotonic association between SF-Physical scores and an alternative independent measure of psychological distress.
Method
Participants and procedures
The PATH Through Life Project is a community survey of 7485 people aged 20-24, 40-44 and 60-64 years, living in Canberra and Queanbeyan, Australia. Details of the sampling procedure are reported in Jorm et al. [11]. Response rates were 58.6% for 20- to 24-year-olds, 64.6% for 40- to 44-year-olds and 58.3% for 60- to 64-year-olds.
Participants were required to complete a questionnaire using a hand-held computer, and a trained interviewer administered additional physical and cognitive tests. Only measures pertinent to the present study are reported here. The Human Research Ethics Committee of The Australian National University approved the study protocol.
Measures
Physical and mental health summary scores
The SF-12 physical health (SF-PH) and mental health (SF-MH) summary scales are each calculated using the same subset of 12 items selected from the SF-36. Either one or two SF-12 items represent each of the eight SF-36 subscales. Summary scales are standardized to produce means of 50 with standard deviations of 10 in the US population, with higher scores indicating better health [1].
Under the RAND-12 scoring, only the six items measuring aspects of physical health load on the physical health component score (RAND-PH), and the six items measuring mental health load on the mental health component score (RAND-MH) [10]. As with the SF-12 scales, the RAND-12 component scores are standardized to a mean of 50 with a standard deviation of 10 in the general US population, with higher scores indicating better health.
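The standardization shared by both scoring systems can be sketched as a simple linear rescaling. The function names and the reference mean and standard deviation arguments are hypothetical; the actual reference values come from the US norming samples and are not reproduced here.

```python
def norm_based_score(raw, ref_mean, ref_sd):
    """Rescale a raw aggregate so the reference population has mean 50 and SD 10."""
    if ref_sd <= 0:
        raise ValueError("ref_sd must be positive")
    return 50.0 + 10.0 * (raw - ref_mean) / ref_sd


def sd_units_from_reference(score):
    """Express a norm-based score in SD units above (+) or below (-) the reference mean."""
    return (score - 50.0) / 10.0
```

On this metric a score of 60 sits one reference standard deviation above the reference mean, and a score of 40 sits one below it.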
Psychological distress
Goldberg's depression and anxiety scales [12], which assess symptoms experienced in the past month, were used as an alternative measure of mental health or psychological distress. The scales consist of nine depression and nine anxiety items (symptoms). The scales were summed to form a composite measure of psychological distress, with possible scores ranging from 0 to 18, and higher values indicating more symptoms [13].
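As a minimal sketch of the composite, assuming each of the 18 symptom items is recorded as present or absent (an assumption suggested by the 0 to 18 range, not stated explicitly above), the score is a simple count. The function name is hypothetical.

```python
def distress_score(depression_items, anxiety_items):
    """Sum nine depression and nine anxiety symptom indicators into a 0-18 score.

    Each item is assumed to be truthy when the symptom is present.
    """
    if len(depression_items) != 9 or len(anxiety_items) != 9:
        raise ValueError("expected nine depression and nine anxiety items")
    return sum(bool(x) for x in depression_items) + sum(bool(x) for x in anxiety_items)
```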
Chronic physical conditions
Participants reported whether they experienced any chronic physical conditions (e.g. heart trouble, cancer, arthritis) from a list provided.
Reported conditions were summed to produce a total index of chronic physical conditions with a possible range of 0-9.
Statistical analysis
Age group differences in measures of physical and psychological health were assessed using analysis of variance. Pearson correlation was used to investigate bivariate associations between variables within and across the domains of physical and mental health, with Fisher's exact test used to assess differences between correlation coefficients. Cross-domain relationships between psychological distress and SF-PH and RAND-PH scales were further explored by fitting lowess lines using the Stata graphical interface (StataCorp, College Station, TX, US). Lowess lines are calculated using an iterative process based on weighted least squares regression, and provide graphical representations of any non-linear trends in the relationships between two variables [14]. SPSS version 12.0.1 (SPSS Inc., Chicago, IL, US) and Stata version 8.0 were used for data analysis.
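The following sketch illustrates two of the techniques named above in plain Python: the Pearson correlation and a simplified lowess-style smoother. It is not Stata's implementation. The smoother makes a single pass of locally weighted linear regression with tricube weights and omits the iterative robustness reweighting, and the `frac` bandwidth parameter and function names are assumptions made for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lowess_fit(x, y, frac=0.8):
    """Single-pass locally weighted linear regression with tricube weights.

    Returns the smoothed y value at each x. Robustness iterations are omitted.
    """
    n = len(x)
    k = min(n, max(2, int(math.ceil(frac * n))))  # neighbours used in each local fit
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted
```

On symmetric V-shaped data the Pearson correlation is close to zero while the smoothed curve still shows the change of direction, which is why a smoother is useful for detecting non-monotonic associations that a single coefficient conceals.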
Results
Descriptive statistics
Physical health (SF-PH, RAND-PH, chronic physical conditions) and mental health (SF-MH, RAND-MH, psychological distress) scores provided by the three age cohorts are shown in Table 1. Both the SF-PH and the RAND-PH scores showed a decline with increasing age (SF-PH: F(2, 7447) = 223.12, p < 0.001; RAND-PH: F(2, 7454) = 81.16, p < 0.001). The number of chronic conditions reported showed a corresponding increase with age (F(2, 7445) = 658.55, p < 0.001).
Table 1. Physical and mental health summary statistics by age group
SF-PH, SF-12 physical health; SF-MH, SF-12 mental health; RAND-PH, RAND-12 physical health; RAND-MH, RAND-12 mental health.
Scores on the mental health summary scales increased with age (SF-MH: F(2, 7447) = 371.02, p < 0.001; RAND-MH: F(2, 7450) = 232.47, p < 0.001), indicating that mental health was better in the older age groups. Psychological distress also showed a decline with increasing age (F(2, 7436) = 266.23, p < 0.001).
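For readers less familiar with the notation, the F statistics above can be read against a minimal one-way analysis of variance sketch. The function name is hypothetical and the code is not the SPSS or Stata routine used in the study. With three age groups the between-groups degrees of freedom equal 2, and the within-groups degrees of freedom equal the number of respondents with valid data minus 3.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    means = [sum(g) / len(g) for g in groups]
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```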
Intercorrelations among the summary health measures are displayed by age group in Table 2. SF-PH and RAND-PH scores were strongly associated in each age group, as were SF-MH and RAND-MH scores, indicating broad agreement across scales. The results for the SF summary measures show weak negative relationships between the SF-PH and SF-MH in the younger age groups, while a weak positive relationship between these measures emerged for 60- to 64-year-olds. In contrast, the correlations between physical and mental health as measured by the non-orthogonal RAND-PH and RAND-MH scales were much stronger and ranged from 0.48 to 0.59. The results using the RAND summary measures showed a tendency for the association between physical health and mental health to increase with increasing age.
Table 2. Intercorrelations among summary measures of physical and mental health by age group
∗∗p < 0.01; ∗p < 0.05. SF-PH, SF-12 physical health; SF-MH, SF-12 mental health; RAND-PH, RAND-12 physical health; RAND-MH, RAND-12 mental health.
Associations between measures of physical and mental health
Figure 1 provides a graphical representation of the associations between the SF and RAND measures of physical health and psychological distress for each age group, using lowess lines. Cross-domain correlation coefficients representing the relationships between the SF-PH, RAND-PH and psychological distress are shown in Table 3. For each age group, the relationship between RAND-PH scores and psychological distress is close to monotonic throughout the range of values on both scales. However, the relationship between SF-PH scores and psychological distress takes a marked departure from monotonicity, with a sharp upward turn at around a score of 57 on the SF-PH, indicating that very high physical functioning (positive health), as measured by the SF-PH, is associated with higher levels of mental health symptomatology (poorer mental health).
Figure 1. Lowess plots of psychological distress by physical functioning (solid line, SF-PH; dashed line, RAND-PH) for three age groups.
Table 3. Correlation coefficients representing within- and across-domain comparisons of relationships between SF-12 and RAND-12 summary scales, and measures of chronic physical conditions and psychological distress
∗∗p < 0.01; ∗∗∗p < 0.001. SF-PH, SF-12 physical health; SF-MH, SF-12 mental health; RAND-PH, RAND-12 physical health; RAND-MH, RAND-12 mental health.
The non-monotonic associations between the SF-PH and psychological distress are reflected in the weaker correlation coefficients representing this relationship for each age group, relative to correlations between RAND-PH scores and psychological distress (Table 3). Fisher's exact test revealed a significant difference between these correlation coefficients in each age group (20-24: Z = 30.08, p < 0.001; 40-44: Z = 29.32, p < 0.001; 60-64: Z = 24.25, p < 0.001). The nature of the relationships between SF-PH and psychological distress scores demonstrates the extent to which SF-PH scores are influenced by responses to the SF-12 items indicating poor mental health. The point at which the lowess curves take a marked departure from monotonicity in each age group is instructive, as it represents the point of the SF-PH scale that reflects the maximum possible score (56.58) based on positive responses to the items (endorsing the highest ratings for physical and mental health). In our sample, 32.0% of 20- to 24-year-olds, 27.1% of 40- to 44-year-olds and 14.5% of 60- to 64-year-olds provided SF-PH scores that were above the expected range based on positive responses. For the SF-MH, 0.8% of 20- to 24-year-olds, 1.7% of 40- to 44-year-olds and 9.6% of 60- to 64-year-olds provided scores that fell above the maximum of 60.76 based on mental health items.
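The percentages reported above amount to counting scores that exceed the ceiling attainable from uniformly positive responses. A minimal sketch follows; the ceiling of 56.58 is taken from the text, while the function name and the example scores are hypothetical and are not PATH data.

```python
SF_PH_EXPECTED_MAX = 56.58  # maximum SF-PH from uniformly positive responses, as reported in the text


def percent_above(scores, ceiling):
    """Percentage of scores strictly above a ceiling."""
    if not scores:
        raise ValueError("scores must not be empty")
    return 100.0 * sum(1 for s in scores if s > ceiling) / len(scores)


# Hypothetical SF-PH scores for illustration only; not PATH data.
example_scores = [48.2, 55.0, 56.58, 57.9, 61.3]
example_share = percent_above(example_scores, SF_PH_EXPECTED_MAX)
```

Any respondent counted by this check can only have reached that score through negatively weighted mental health items, which is what makes the threshold a useful diagnostic.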
The associations between the mental health summary scores and chronic medical conditions shown in Table 3 were stronger in analyses that included the RAND-MH relative to those that included the SF-MH (20-24: Z = 5.41, p < 0.001; 40-44: Z = 8.11, p < 0.001; 60-64: Z = 18.16, p < 0.001). However, the differences in the size of the associations were not as marked as those observed for the SF-PH and RAND-PH summaries and psychological distress.
Discussion
This study investigated the performance of the SF-12 physical and mental health summary scales in a population-based sample of young, middle-aged and older adults, with a view to informing the effective use and interpretation of measures of physical and mental health in epidemiological or community-based surveys. Examination of associations between SF-12 and RAND-12 summary scores and alternative measures of physical and mental health, within and across domains, indicated that the RAND scales generally had stronger within-domain associations with depression and anxiety symptoms and chronic health conditions than the corresponding SF-12 scales.
The results also indicated that RAND-12 mental health was more strongly associated with chronic physical conditions than SF-12 mental health. Similarly, RAND-12 physical health was more strongly associated with depression and anxiety symptoms than SF-12 physical health. The impact of negatively weighted SF-12 items reflecting aspects of mental health on the relationships between SF-PH scores and psychological distress is illustrated in Fig. 1, where lowess curves showed significant departures from monotonicity. The results illustrate potential difficulties associated with determining the extent to which high SF-PH scores reflect good physical health, or poor mental health. The effects are most clearly evident at scale extremes, where summary scores are pushed outside of the expected range, as a result of negatively weighted items. However, the negatively weighted items exert an influence on scores throughout the possible range. While the extent of these effects could be determined for any given individual by examining their responses to the SF-12 items, it is difficult to ascertain the overall impact of negatively weighted items on the non-extreme range of scores across an entire sample.
The developers of the SF-36 have highlighted the impressive number of validation studies, and referred to the multitude of published studies that have used the SF-36 summary scales, in defending the appropriateness of the existing method of scoring [15]. The SF-36 and SF-12 summaries may indeed be broadly effective in capturing the concepts of physical and mental functioning for use in many analytical contexts. However, our results indicate that the scoring procedures underlying the SF-12 could complicate interpretation of results. The scoring artefacts are of particular relevance to clinical research, where groups under study that are defined by the presence of symptoms in one health domain (e.g. physical) are likely to produce inflated scores in the other domain (e.g. mental) when using the SF summaries [16, 17]. Our results indicate that use of the SF-12 summaries could also produce erroneous conclusions in large-scale epidemiological research, particularly when cross-domain associations are under investigation.
Conclusions
Community surveys can provide vital information for future research and policy planning concerned with mental health. Consequently, it is critical that researchers are fully aware of the implications for interpretation that arise from applying different methods of scoring to commonly used measures of mental and physical health; in this case the SF-12 summary scales. Our results highlight potential difficulties in interpreting associations involving the SF-PH and SF-MH that arise from the application of negative factor loadings. The results also indicate that these difficulties can be overcome by using non-orthogonal summary measures of physical and mental health that are derived from the same items, using the RAND approach to scoring. It should be emphasized that the subscales comprising the full SF-36 (physical functioning, role-physical, bodily pain, general health, vitality, social functioning, role-emotional and mental health) are not subject to the same scoring artefacts as the SF-36 summary scales. Consequently, the use of subscales representing aspects of physical and mental health available in the SF-36 remains a valuable approach to the measurement of health and functional status.
Given the common use of the SF-12 items, it is likely that the measure will continue to be included in community surveys, including the second National Survey of Mental Health and Wellbeing planned for Australia in 2007. We recommend, however, that summary measures of physical and mental health be derived using the RAND-12 scoring in addition to the traditional SF-12 scoring procedures. We also recommend that researchers using summary measures of health status consider carefully the implications for interpretability before deciding upon the most appropriate form of the scale to use in their analyses.
Acknowledgements
Funding for data collection was provided by Unit Grant no. 973302 and Program Grant no. 179805 from the National Health and Medical Research Council and a grant from the Australian Rotary Health Research Fund. Bryan Rodgers is supported by NHMRC Research Fellowship no. 148948, Kaarin Anstey by NHMRC Research Fellowship no. 179839, Anthony Jorm by NHMRC Research Fellowship no. 148947 and Peter Butterworth by NHMRC Public Health (Australia) Fellowship no. 316970. Thanks to Helen Christensen, Trish Jacomb, Karen Maxwell and the team of PATH interviewers for their contribution to the research.
