Abstract
The importance of routine outcome measurement in mental health care is being increasingly recognized in several countries, most comprehensively in Australia [1] and New Zealand [2]. Significant literature has grown in the areas of instrument selection [3–5], development [6] and testing [7],[8]. Comparatively less is known about how well outcome instruments, and combinations of these, perform in routine practice.
Outcomes, services utilization and cost data were collected in an Australian Mental Health Classification and Service Costs study (MH-CASC [9]) in 1996 with the primary aim of developing a mental health service casemix classification [10]. This study provided the model for the New Zealand mental health Classification and Outcomes Study (CAOS) which provides the raw data for this paper [11],[12].
The main purpose of these studies was to develop casemix classifications. However, they also provide a great deal of data on the performance of the outcome measures that were used. This paper focuses on the adult instruments used in the CAOS study. In particular, we examine how well constituent items were completed, key psychometric properties, the relationships between them and the relationship with selected demographic and clinical characteristics of consumers.
Method
In 1999, the New Zealand Health Research Council commissioned the Mental Health Casemix Classification and Outcomes Study (CAOS) to: (i) develop a first version of a national casemix classification for specialist mental health services that built on the one developed in the Australian MH-CASC project; and (ii) trial the introduction of outcome measurement into routine clinical practice [9]. The scope included all specialist mental health services provided directly by District Health Boards (DHBs) except alcohol and drug services and residential services. The study was conducted in full or part in eight of the country's 21 DHBs, which collectively accounted for 22% of all New Zealand mental health services.
In addition to collecting service utilization and cost data, the study collected data on consumer characteristics, which are the focus of this paper. Consumer characteristics included demographics (date of birth, sex, ethnicity and area of usual residence), psychiatric diagnosis (principal and additional), episode information (dates and reasons for start and end), legal status and ratings on outcome measures. The key adult outcome measures were the Health of the Nation Outcome Scales (HoNOS) [13] and a short form of the Life Skills Profile (LSP) [9],[14]. The HoNOS is a 12-item clinician-completed instrument developed in the UK in the mid-1990s. It was designed specifically for use with people with a mental illness and is widely regarded as a measure of severity of mental health disorder. The LSP is a 39-item instrument developed in Australia in 1989 as a measure of disability and functioning in people with mental disorder. The 16-item version used here was developed in the mid-1990s for use in the MH-CASC project.
Data collection was governed by the key concept of an ‘episode of care’ [11],[15]. Consumers were rated on the instruments at the start and end of each episode and, for those episodes that were ongoing, at intervals of 90 days. In this study, episode start data were collected from all those consumers under care at the beginning of the study, and those who entered during the study period. Similarly, episode end data coincided with the end of the study or episode closure.
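The collection rule described above can be sketched in code. This is an illustration only: the function name `rating_dates` and the example dates are hypothetical, and the handling of a review that would fall on or after the episode end is an assumption, not something the study protocol is known to specify.

```python
from datetime import date, timedelta


def rating_dates(start, end, review_days=90):
    """Return the dates on which ratings fall due for one episode of care.

    Assumption: reviews fall every `review_days` days after the episode
    start and strictly before the episode end.
    """
    dates = [start]                      # rating at episode start
    due = start + timedelta(days=review_days)
    while due < end:                     # ongoing episode: periodic reviews
        dates.append(due)
        due += timedelta(days=review_days)
    dates.append(end)                    # rating at episode end
    return dates


# Hypothetical episode, used only to show the shape of the output.
schedule = rating_dates(date(2000, 1, 10), date(2000, 8, 1))
for d in schedule:
    print(d.isoformat())
```

An episode shorter than 90 days yields only the start and end ratings.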
Results
Data were obtained from a total of 12 576 consumers who collectively accounted for over 19 000 episodes of care. The male:female ratio was 53.5:46.5. The consumer sample covered all age groups, although just over half were in the range 20–50 years. The ethnic distribution across all ages included 20% Maori and 5% Pacific Island peoples, which differed slightly from the national census figures (15% and 4%, respectively). Before analysis, the dataset was prepared by the removal of a number of problematic ratings. These included problems with dates (invalid dates, or mismatch between collection dates and episode dates), duplicates and totally blank assessments. This process left 20 748 HoNOS assessments and 17 721 LSP-16 assessments.
Instrument compliance and item completion
In total, 95% of adult episodes had at least one HoNOS completed but only 58% had a HoNOS completed at both episode start and episode end, with episodes being more likely to have the end assessment than the start assessment missing. There was a consistent pattern for more items to be missing at the start of episodes than at the end and in inpatient settings compared to community settings. In inpatient settings, items 11 (accommodation problems) and 12 (occupation and leisure problems) were the most prone to attract missing ratings (approximately 13%), followed by item 8 (other behavioural and psychological problems), item 9 (relationship problems) (both 5%), item 10 (activities of daily living) and item 3 (drug and alcohol misuse) (both approximately 4%). All other items were missing in less than 2% of assessments. The most commonly missing item in assessments conducted in the community was item 8 (2%), followed by item 12 (1.5%). In inpatient settings 79% of HoNOS assessments had no missing items, compared to 95% in community settings. The lower rate of complete forms in inpatient settings was because of the much higher omission rate of items 11 and 12.
The study protocol called for the LSP-16 to be collected for both inpatient and community episodes, but this tended not to happen in inpatient episodes. Seventy-eight per cent of episodes had at least one LSP-16 completed and 53% had an LSP-16 completed at both episode start and episode end. As with the HoNOS, episodes were more likely to have the end assessment than the start assessment missing. In all contexts (inpatient and community, episode start and end), 98% or more of LSP-16 assessments were completed with no missing items. For each of the 16 items of the LSP-16, the missing rate was less than or equal to 0.5%, irrespective of setting or collection occasion.
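The two completion statistics reported here (the missing rate for each item and the proportion of assessments with no missing items) could be computed along the following lines. This is a sketch under stated assumptions: missing ratings are represented as `None`, the function name `completion_summary` is hypothetical, and the four toy assessments are invented for illustration and are not study data.

```python
def completion_summary(assessments, n_items):
    """Return (per-item missing rates, proportion of fully complete forms).

    Each assessment is a list of `n_items` ratings; a missing rating is None.
    """
    n = len(assessments)
    per_item = [
        sum(1 for a in assessments if a[i] is None) / n
        for i in range(n_items)
    ]
    complete = sum(1 for a in assessments if None not in a) / n
    return per_item, complete


# Invented 12-item assessments (not study data).
toy = [
    [1] * 12,
    [0] * 10 + [None, None],              # items 11 and 12 missing
    [2] * 12,
    [1] * 7 + [None] + [1] * 2 + [None, 1],  # items 8 and 11 missing
]
per_item, complete = completion_summary(toy, 12)
print(per_item[10], complete)  # item 11 missing rate, share of complete forms
```

Running the same summary separately by setting and by collection occasion would give the breakdown described in the text.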
Psychometric properties
The subscale structure of the HoNOS was explored using confirmatory factor analysis (CFA) [16]. One main application of CFA is to assess the goodness-of-fit between a set of data and a presumed underlying structure. The original work on the current version of the HoNOS was based on exploratory factor analysis, but no details were published [17]. For the present purposes, the only published study of the subscale structure of the HoNOS [18] was used. That study used 2035 HoNOS assessments with no missing ratings; the present data comprises 18 908 similar HoNOS records.
In CFA, assessment of the goodness-of-fit between the model and the data is achieved through a variety of fit indices [19]. Acceptable levels for these are shown in Table 1 which also shows the fit indices for the conventional four-factor structure from the earlier study and the present data. In both these analyses, the same technical adjustments (allowing certain error terms to be correlated) were made.
Table 1. Fit indices following confirmatory factor analyses (CFAs) of Health of the Nation Outcome Scales (HoNOS) subscale structure, by data source
†Root mean square residual; ‡root mean square error of approximation; normed fit index; Tucker–Lewis index. CAOS, Classification and Outcomes Study.
Trauer [18] proposed an alternative, five-factor, structure. This retains the conventional Impairment and Social subscales, but has a new Behaviour subscale comprising only items 1 and 3, a new Depression subscale comprising items 2, 7, 8 and 9 and a single item Hallucinations/delusions ‘subscale’ comprising only item 6. This is displayed diagrammatically in Fig. 1. The ovals show the subscales, the rectangles the items and the arrows the relationships between the items and the subscales. For the sake of clarity, the diagram omits: (i) error terms and their correlations; and (ii) indications that the subscales themselves are correlated.
Fig. 1. Five-subscale model of HoNOS.
The fit indices for the five-factor structure in the present data were within or around the recommended levels and similar to those obtained by Trauer [18], reinforcing that the five-factor structure provides a far better fit to the data than the conventional four-factor structure.
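For readers unfamiliar with the fit indices named in Table 1, three of them can be derived from the chi-square statistics of the fitted model and of the null (independence) model. The sketch below uses the standard textbook definitions; it is not the software used in the study, the function name `fit_indices` is hypothetical, and the input values are invented rather than taken from Table 1. Some programs use N rather than N − 1 in the RMSEA denominator. The root mean square residual is omitted because it requires the residual covariance matrix.

```python
import math


def fit_indices(chi2_model, df_model, chi2_null, df_null, n):
    """Compute RMSEA, NFI and TLI from model and null-model chi-squares."""
    # RMSEA: misfit per degree of freedom, adjusted for sample size.
    rmsea = math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))
    # NFI: proportional reduction in chi-square relative to the null model.
    nfi = (chi2_null - chi2_model) / chi2_null
    # TLI: like NFI, but based on chi-square/df ratios (penalises complexity).
    ratio_null = chi2_null / df_null
    ratio_model = chi2_model / df_model
    tli = (ratio_null - ratio_model) / (ratio_null - 1.0)
    return {"RMSEA": rmsea, "NFI": nfi, "TLI": tli}


# Invented inputs; df_null = 12 * 11 / 2 = 66 for a 12-item instrument.
indices = fit_indices(chi2_model=150.0, df_model=50,
                      chi2_null=3000.0, df_null=66, n=1001)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

Lower RMSEA and higher NFI and TLI indicate better fit.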
One desirable feature of a subscale is internal consistency (usually measured by Cronbach's alpha (α)), which suggests that the items are tapping aspects of the same attribute and not several different attributes. The Impairment and Social subscales are identical in both models, hence the α values are the same. Internal consistency is not applicable to the Hallucinations/delusions ‘subscale’, since it comprises a single item. The internal consistency of the Behaviour subscale was hardly affected by the reduction from three to two items (α of 0.57 and 0.53, respectively) and the Depression subscale was more internally consistent (0.62) than the Symptoms subscale (0.30). The Cronbach's α value for the total score (i.e. all 12 items) was 0.80. This is considerably higher than the values of 0.72 in the Victorian Round I study [20], 0.73 in the Stedman et al. study [21] and 0.75 in the Victorian HoNOS field trial [22].
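As a reminder of how Cronbach's α is obtained, the following sketch applies the usual formula, α = k/(k − 1) × (1 − Σ item variances / variance of the total score), to a small invented dataset. The function name `cronbach_alpha` and the toy ratings are illustrative assumptions; none of the values are study data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete cases.

    `rows` is a list of assessments, each a list of k item ratings.
    """
    k = len(rows[0])
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Two invented two-item 'subscales' (not study data).
identical = [[0, 0], [1, 1], [2, 2], [3, 3]]   # items agree exactly
noisy = [[0, 1], [1, 0], [2, 3], [3, 2]]       # items agree only roughly

print(cronbach_alpha(identical))
print(cronbach_alpha(noisy))
```

Items that rise and fall together push α towards 1; items that vary independently push it towards 0.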
One further matter of interest in the HoNOS is the performance of item 8. This item allows the clinician to select from among nine listed problems and rate this problem using the usual rating of 1–4. Just over half (10 526) of the HoNOS ratings had item 8 rated as zero, meaning that none of the list of problems applied. Of the 10 222 assessments in which item 8 was rated, anxiety and panic was rated in 34% of assessments, sleep in 18% and reactions to severely stressful events and traumas in 17%. Thus, of the nine problems listed, three accounted for approximately 74% of the ratings. This is similar to findings elsewhere: the same three choices jointly accounted for 46% of all item 8 assessments in the Victorian HoNOS field trial [22] and for 68% in the Victorian ‘Round One’ data [20].
A CFA similar to that performed with the HoNOS data was conducted on the over 17 000 LSP-16 records with no missing ratings. None of the goodness-of-fit indices achieved the recommended levels, indicating that there was a poor fit between the data and the four-subscale structure. Further analyses did not suggest a satisfactory alternative model. The Cronbach's α values for the Withdrawal, Self-care, Compliance and Antisocial subscales were 0.83, 0.77, 0.81 and 0.80, respectively, indicating good levels of internal consistency. The Cronbach's α value for the whole scale (i.e. all 16 items) was 0.89, which is identical to that found in the Victorian Round I data [20].
Relationships between the instruments
There were over 15 000 occasions when a HoNOS and an LSP-16 were obtained on the same consumer, on the same date, at the same point (start or end) of an episode and with no missing ratings on either instrument. The highest correlations of subscales between the two instruments were the Behaviour subscale of the HoNOS with the Antisocial subscale of the LSP-16 (0.59) and the Social subscale of the HoNOS with the Withdrawal subscale of the LSP-16 (0.58). There were also very substantial correlations between Social (HoNOS) and both Self-care and Antisocial (both 0.52) and between Impairment (HoNOS) and Self-care (0.48). The correlation of 0.68 between the two total scores was very high, representing nearly 50% of shared variance.
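The shared variance implied by each of these correlations is simply the square of the coefficient. The short sketch below applies this to the values reported in this paragraph; the dictionary labels are shorthand introduced here for illustration.

```python
def shared_variance(r):
    """Proportion of variance shared by two measures correlated at r."""
    return r * r


# Correlations as reported in the text above.
reported = {
    "Behaviour (HoNOS) vs Antisocial (LSP-16)": 0.59,
    "Social (HoNOS) vs Withdrawal (LSP-16)": 0.58,
    "Social (HoNOS) vs Self-care (LSP-16)": 0.52,
    "Impairment (HoNOS) vs Self-care (LSP-16)": 0.48,
    "Total vs total": 0.68,
}
shared = {label: shared_variance(r) for label, r in reported.items()}
for label, value in shared.items():
    print(f"{label}: {value:.2f}")
```

On this calculation the two total scores share roughly 46% of their variance, and each pair of subscales shares about a third or less.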
The present correlations are remarkably close to those obtained from the Victorian data [20], especially the correlation of the two total scores. It is also noteworthy that a correlation of −0.65 between the total scores of the HoNOS and the full form of the LSP was found by Stedman et al. [21] in their field trial of outcome measures (their correlation was negative because the full LSP is scored in the opposite direction to the LSP-16).
Relationships with consumer characteristics
This section examines the relationships between assessment scores and age, sex, principal psychiatric diagnosis and legal status. The relationship between ethnicity and consumer outcomes will be addressed elsewhere.
Age
Correlations were calculated between consumer age and HoNOS subscale and total scores. All valid assessments (approximately 20 000) were used for this purpose. Using the five-subscale scores described in an earlier section, the correlations of age with the Behaviour, Impairment, Depression, Hallucinations/delusions and Social subscales and the total score were −0.17, 0.32, −0.08, −0.06, 0.00 and −0.01, respectively. Because of the large numbers of observations, even small correlations achieved statistical significance. Leaving aside the smallest correlations, it is apparent that on average younger consumers were rated higher on the Behaviour subscale and older consumers were rated higher on the Impairment subscale. The correlation between age and the total HoNOS score was very close to zero despite some significant correlations with the subscales. It appears that the combined effect of positively and negatively correlated items was to produce a total score that is unrelated to age.
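The point that small correlations reach statistical significance in a sample of this size can be illustrated with the usual t statistic for a Pearson correlation, t = r√(n − 2)/√(1 − r²). The sketch below is a rough illustration: it takes n as 20 000 (the text gives only an approximate count), uses a two-sided 5% critical value of about 1.96, and assumes independent observations, which repeated assessments of the same consumer would not strictly satisfy. The function name `t_for_correlation` is hypothetical.

```python
import math


def t_for_correlation(r, n):
    """t statistic for testing a Pearson correlation against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


N_APPROX = 20000  # approximate number of valid assessments (assumption)
CRITICAL = 1.96   # approximate two-sided 5% critical value for large n

# Age correlations as reported in the text above.
for label, r in [("Behaviour", -0.17), ("Impairment", 0.32),
                 ("Depression", -0.08), ("Hallucinations/delusions", -0.06),
                 ("Social", 0.00), ("Total", -0.01)]:
    t = t_for_correlation(r, N_APPROX)
    print(f"{label}: t = {t:.1f}, beyond critical value: {abs(t) > CRITICAL}")
```

Under these assumptions a correlation as small as −0.06 gives a t value of about −8.5, whereas the near-zero correlations for the Social subscale and the total score do not exceed the critical value.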
Approximately 17 000 valid LSP-16 ratings were used in the correlations with age. The correlations of age with the Withdrawal, Self-care, Compliance and Antisocial subscales and the total score were 0.00, 0.15, −0.05, −0.08 and 0.03, respectively. Only the correlation with Self-care is of any real significance, indicating that on average older consumers were rated as having greater problems in this area. Correlations with individual item scores were all quite small (less than ±0.13) apart from item 16 (What sort of work is this person generally capable of?) where the correlation was 0.46, indicating that older consumers were rated as having greater difficulties in this respect.
Sex
For eight of the 12 HoNOS items, males scored significantly higher (worse) than females. On three items (5, 7 and 8) females scored significantly worse than males and on one (item 2) there was very little difference. The differences in scores between men and women were significant for all the subscales with the exception of Impairment. Men were rated significantly higher than women on the Behaviour, Hallucinations/delusions and Social subscales, while women were rated higher than men on the Depression subscale. The mean total score (12.01) for males was significantly higher than that of females (10.35) (t(20 746) = 15.2, p < 0.001). However, in real terms, it is doubtful whether the average difference of 1.66 points on a scale with a range of 48 points is clinically significant.
For 15 of the 16 LSP-16 items, males scored significantly higher (worse) than females. The exception was item 9 (Maintenance of an adequate diet), on which females scored significantly worse than males. Despite being statistically significant, the actual differences were very small for all 16 items. Consistent with the majority of the item scores, males scored higher on all the subscale scores and the total score. For both sexes, Withdrawal was the most highly rated subscale and Antisocial the lowest.
Principal psychiatric diagnosis
Principal psychiatric diagnoses were coded according to the ICD-10 Classification of Mental and Behavioural Disorders (ICD-10-AM) [23]. The nine largest broad diagnostic classes are presented in Table 2, which shows the mean total HoNOS scores in inpatient and community settings.
Table 2. Mean Health of the Nation Outcome Scales (HoNOS) and Life Skills Profile (LSP-16) total score by setting and diagnosis
For certain of the diagnostic groups (organic, schizophrenia and stress) there was little difference in the HoNOS total scores between inpatient and community settings. For others (mood, anxiety and eating), inpatient scores were higher (worse) than community scores and, for substance misuse and personality disorders, the community mean was higher than the inpatient mean. The highest HoNOS totals were associated with substance misuse, personality and organic disorders, and the lowest with anxiety, mood, eating, stress and obsessive–compulsive disorders.
Table 2 also shows the mean LSP-16 scores; in the inpatient setting, only the schizophrenia and mood disorders groups had sufficient numbers to report. The LSP-16 total scores formed two distinct groups. Organic, substance misuse, schizophrenia and personality disorders were associated with relatively high scores, while mood, anxiety, obsessive–compulsive, stress and eating disorders were associated with relatively low scores. However, there was almost no difference between the scores for schizophrenia and mood disorders in the inpatient setting.
Legal status
Of the over 4500 inpatient episodes with complete HoNOS and legal status data, 72% were involuntary, while of the over 14 500 community episodes, 19% were involuntary. In both inpatient and community settings, the mean scores on the Behaviour, Impairment, Hallucinations/delusions and Social subscales were significantly higher in involuntary than voluntary episodes. The reverse was true of the Depression subscale, which was significantly higher in voluntary than involuntary episodes in both inpatient and community settings. Although the total scores of involuntary episodes were significantly higher than those of voluntary episodes in the community, the difference was not significant in inpatient settings, primarily because of the opposite effect of the Depression subscale.
Since 97% of episodes with valid LSP-16 and legal status data were in the community, we only report the community findings. All differences between involuntary and voluntary were significant; for all subscales and the total score, the mean of the involuntary episodes was higher than the mean of the voluntary episodes.
Discussion
One crucial aspect of the performance of instruments in the field is whether and how they are completed. We draw a distinction between compliance (whether an instrument is administered in accordance with the data collection protocol) and completion (whether all of the constituent items are validly completed). Compliance rates for the HoNOS were higher than for the LSP-16, but the reverse was true for completion rates. Although over 90% of HoNOS and LSP-16 assessments were completed in full, certain items were problematic. HoNOS items 11 and 12 were the most often not completed, especially in inpatient settings, followed by item 8; this is similar to findings elsewhere [20]. These items have repeatedly been found to have low reliability [22],[24] and this has implications for collection protocols and for training. The problems with items 11 and 12 in inpatient settings may be related to a restricted focus on symptoms and have led some to use only the first 10 items [25]. However, clinicians appeared to have no particular problems completing the LSP-16 items. As outlined in the CAOS final report [11], possible reasons for low compliance and completion include: (i) few mechanisms to feed findings back to clinicians resulting in data collection fatigue over the course of the study; (ii) staff perceiving the additional data collection associated with the study as a burden and irrelevant to their work; and (iii) staff misperceptions of the reasons for the study, such as cost containment and service funding, resulting in lack of support for it.
The factor structure of the HoNOS is consistent with other work that indicated that the conventional four-subscale structure does not fit the obtained data well [18]. Better fit, and arguably better clinical meaning, with no loss of internal consistency, can be achieved through a five-subscale structure, albeit at the expense of some loss of parsimony (since five subscales are less parsimonious than four). Agreeing a subscale structure is important for a number of reasons, not least for the specification of the output of information systems. Although we believe the five-subscale structure is superior, there will be certain situations in which the 12 item scores are the most appropriate. Neither does a four-subscale structure fit the LSP-16 well, although there is no alternative currently available, nor did the data suggest an alternative. However, all four subscales show good internal consistency, as does the total scale. For many purposes, including clinical review and service planning, it is more useful to work with total and subscale results rather than with the individual scores.
The relationship between the HoNOS and the LSP-16 is strong, particularly in their total scores and this is consistent with findings elsewhere [20],[21]. The highest correlations are in those domains that one would expect to be closely related, that is, behavioural problems and antisocial behaviour, and social problems and withdrawal. However, the correlations are not strong enough to suggest that one instrument may substitute for the other – apart from the Social subscale of the HoNOS, the content and coverage of the two instruments are different.
Some HoNOS items were associated with younger age, while others were associated with older age, resulting in a total score that was unrelated to age. This is consistent with the findings of an earlier study [22]. It is doubtful if any differences between males and females were clinically significant. Three diagnostic clusters had higher HoNOS total scores – substance misuse, personality disorder and organic disorders; the same three conditions have previously been associated with high HoNOS scores [20]. In both inpatient and community settings, involuntary status was associated with higher scores, which was not unexpected.
Only one LSP-16 item was correlated with age. Males had higher (worse) ratings overall; however, the actual difference in scores was very small for all 16 items. Consumers with organic, substance misuse, schizophrenia and personality disorders were rated higher (worse) on the LSP-16 than consumers with mood, anxiety, obsessive–compulsive, stress and eating disorders. As with the HoNOS, involuntary legal status was consistently associated with higher scores.
New Zealand has now adopted the HoNOS for routine use but not the LSP-16. This decision was based on feedback that many participating clinicians and consumers found the LSP-16 to be inappropriate and unacceptable, notwithstanding its apparent ease of use. Given this decision, another measure of function is required, with both mainstream and mental health specific measures being possible contenders.
In conclusion, the findings support the utility and acceptability of the HoNOS in routine adult clinical care and provide further insight into its underlying structure. Future challenges include how to use outcome measure information to improve patient care, achieving agreement on an appropriate measure of disability and functioning in this consumer group and the introduction of complementary consumer and carer-rated measures.
Acknowledgements
The study on which this paper is based involved many people in many capacities. We are particularly grateful for the contributions of Phillipa Gaines, Alison Bower, Bill Buckingham, Philip Burgess and the many management and clinical staff in the participating District Health Boards.
