The importance of routine outcome assessment in mental health care has been increasingly recognized. It has been officially adopted in the UK [1], Australia and New Zealand [2,3] and is being implemented in many other countries [4], but there are still few services anywhere where it has been successfully achieved [5–9].
In recent years there has been a shift in emphasis from information collection towards the local and national use of mental health information, with the aim of achieving better outcomes for mental health service users and increasing the accountability of these services [10–12]. But exactly how outcome data should be used to improve the quality of treatment of people with mental disorders has still to be clarified [13]. In the late 1990s some authors lamented the lack of effect that the results of mental health research had on decision making, wondering if one possible explanation could be that researchers failed to communicate in clear and understandable terms [14–17]. In the last ten years outcome research has attempted to fill this gap, but many methodological pitfalls still limit its capability to adequately inform clinicians and managers [13,18,19].
Routine data collection generates large real-world datasets. Outcome data, however, refer to widely heterogeneous populations treated by mental health services which, across different countries, are often differently structured and financed. In addition, methodological problems limit efforts to understand individual clinical changes in large patient samples [20].
Since people assessed at the same time point could be at different stages in their illnesses, one of the first distinctions to be made is among patients with different lengths of contact with mental health services. The length of contact is, in fact, a proxy for duration of illness which is known to negatively predict clinical outcome [21]. Tansella and colleagues used case-register data to study episodes of care of 1423 first ever psychiatric patients, finding that the probability of new episodes of care and their duration gradually increased with the number of episodes in the same subject [22]. In another study by the same group, patients with unplanned first contacts were found to be at risk for placing higher demands on mental health services [23].
Study aims
The aim of the study was to assess the feasibility of routinely collecting outcome data in everyday clinical settings in nine Italian community mental health services (CMHS). A secondary objective was to compare clinically significant change at six and 12 months in three groups of patients with differing lengths of contact with services and to identify predictors of individual change.
Method
Study design
A prevalence cohort of people attending the participating CMHS was selected and a naturalistic two-wave observational design was utilized. Outcome data were collected as part of routine clinical care and for purposes of quality improvement, thus no specific informed consent was obtained.
Setting
In Italy, the shift from hospital-based services has been completed, and the psychiatric care model involves a community integrated network of inpatient and outpatient mental health facilities. CMHS are uniformly distributed across the country and are fairly comparable in terms of facilities, personnel resources, organizational features and provision of clinical interventions.
Nine CMHS, which are the main providers of mental health care in a total area of about 1 695 000 inhabitants, agreed to take part in the study. Five of them were in northern and central Italy: Busto Arsizio (catchment area 172 000 inhabitants), Saronno (134 000 inhabitants), ‘Niguarda 2’, Milan (70 000 inhabitants), Imola (130 000 inhabitants) and Arezzo (337 000 inhabitants); and four were in southern Italy: Campobasso (231 000 inhabitants), ‘Napoli 48’, Naples (100 000 inhabitants), ‘Napoli 50’, Naples (100 000 inhabitants) and ‘Caserta 2’ (421 000 inhabitants).
Study population
Each individual who had contact with any CMHS setting during an index period of two weeks in 2003 was included. Basic socio-demographic data and ICD-10 clinician diagnoses were collected. The cohort was then examined at three time points: at recruitment (T0), six months (T1) and 12 months (T2). Every effort was made to achieve the highest possible follow-up rate without major changes in routine practice.
For the purposes of the present analysis, patients were classified on the basis of their history of contacts with mental health services. Thus, they were divided into three groups: (i) first time users (FTUs), people who came into contact with the service for the first time in 2003; (ii) short-term users (STUs), whose first contact with the service was in 2001 or 2002; (iii) long-term users (LTUs), whose first contact with the service was before 2001.
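The grouping rule above can be written as a short function. This is only an illustrative sketch: the function name `contact_group` and the use of a single calendar year as input are assumptions made here, not part of the study's data handling.

```python
def contact_group(first_contact_year: int) -> str:
    """Classify a patient by year of first contact with the service.

    Hypothetical helper: assumes the index period falls in 2003,
    as stated in the text.
    """
    if first_contact_year > 2003:
        raise ValueError("first contact cannot be later than the 2003 index period")
    if first_contact_year == 2003:
        return "FTU"  # first time user
    if first_contact_year in (2001, 2002):
        return "STU"  # short-term user
    return "LTU"      # long-term user: first contact before 2001


if __name__ == "__main__":
    for year in (2003, 2002, 1995):
        print(year, contact_group(year))
```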
Measures
Health of the Nation Outcome Scales (HoNOS) were developed as a standardized assessment tool for routine use in mental health services [24]. Independent studies have evaluated its reliability [25], sub-scale structure [26], sensitivity to change [27] and appropriateness for routine clinical use in busy psychiatric services [28,29]. We adopted the Italian version of HoNOS as the only outcome measure [30]. All clinical staff (psychiatrists, clinical psychologists or nurses) were trained by the study team in the correct use of HoNOS through residential training workshops. Most HoNOS assessments were performed by nurses and clinical psychologists.
Statistical analysis
For the analyses of the present study we chose not to adopt a proper longitudinal approach [31]. Our aim, in fact, was to analyse clinical outcome in a way that could better inform professionals regarding the differential impact on patients of mental health programmes. Thus, to measure clinical change we opted for a two-wave design and the use of reliable change [17].
The Jacobson and Truax approach to measure reliable and clinically significant change (RCSC) has been widely used to identify meaningful individual clinical improvement in large groups of patients [20,32]. In a previous study we calculated RCSC parameters on the HoNOS score and applied them to two subsequent assessments of patients attending mental health services [33]. The identification of RCSC involves two parameters: reliable change (RC) and the cut-off of clinical significance (CS) [32]. We adopted internal consistency, Cronbach's coefficient alpha (α), as a parameter of the reliability of measures and calculated the RC index on the study cohort (n = 2059) taking into account HoNOS assessments at recruitment (T0). To calculate the CS cut-off, less disabled patients were chosen as the reference group and normative data were drawn directly from the study cohort [34]. For each subject initial severity was also assessed by a different rater (usually the treating psychiatrist) through the Clinical Global Impression Scale (CGI-S) [35]. Patients were divided into two groups and the cut-off separating mildly and moderately (CGI-S ratings of 1, 2, 3, 4) from severely ill (CGI-S ratings of 5, 6, 7) was calculated. Finally, the two criteria were combined, obtaining a five-level classification: reliable and clinically significant improvement, reliable improvement, stability, reliable deterioration and reliable and clinically significant deterioration.
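For readers unfamiliar with the two RCSC parameters, the sketch below uses the formulas commonly given for the Jacobson and Truax method: a reliable change threshold derived from the standard error of the difference, and a cut-off weighted by the standard deviations of the two reference groups. Whether the study applied exactly these variants is an assumption; the function names and any input values are hypothetical and are not taken from the study data.

```python
import math


def reliable_change_threshold(sd_baseline: float, reliability: float,
                              z: float = 1.96) -> float:
    """Smallest change unlikely (at the level set by z) to be measurement error.

    reliability is the reliability coefficient of the scale
    (the text uses Cronbach's alpha for this purpose).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    se_measurement = sd_baseline * math.sqrt(1.0 - reliability)
    se_difference = math.sqrt(2.0) * se_measurement
    return z * se_difference


def clinical_cutoff(mean_mild: float, sd_mild: float,
                    mean_severe: float, sd_severe: float) -> float:
    """Cut-off between two reference groups, weighted by their spread.

    Assumes the 'criterion c' form of the Jacobson and Truax cut-off.
    """
    return (sd_mild * mean_severe + sd_severe * mean_mild) / (sd_mild + sd_severe)
```

In practice the threshold is rounded to a whole number of scale points before being applied to individual scores.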
Categorical variables were analysed using the chi-square statistic (χ²). Continuous non-parametric variables were analysed using the Mann-Whitney test. The outcome was calculated within each group using a paired sample t-test analysis. Standardized effect sizes were computed using Hedges’ g. The magnitude of change (i.e. mean difference between two observations) in different groups was compared using the Mann-Whitney test. Given the rather large sample size, group differences were tested at the more stringent significance level of p < 0.01.
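As an illustration of the effect size named above, the following sketch computes Hedges’ g in its usual two-group form: a pooled-standard-deviation mean difference multiplied by the small-sample correction 1 − 3/(4df − 1). The text does not state which variant was applied to the pre/post comparisons, so treating baseline and follow-up scores as two groups is an assumption, and `hedges_g` is a hypothetical name.

```python
import math
from statistics import mean, variance


def hedges_g(group_a, group_b) -> float:
    """Standardized mean difference with small-sample bias correction."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    df = n_a + n_b - 2
    # Pooled variance from the two sample variances.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; effect size undefined")
    cohen_d = (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)
    correction = 1.0 - 3.0 / (4.0 * df - 1.0)
    return cohen_d * correction
```

With samples as large as those in this study the correction factor is very close to 1, so g and the uncorrected standardized difference are nearly identical.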
Multivariate logistic regression analysis was conducted to identify potential causal relationships between socio-demographic or clinical variables at first assessment and RCSC at 12 months. Patients with an initial HoNOS score <10 were excluded from multivariate analyses as for mathematical reasons they could not fulfil RCSC criteria for improvement at follow up.
Two separate analyses were done, the first taking into account only the FTU group, the second the whole sample. Two models were tested for the prediction of clinical remission. Thus, we selected two sets of independent variables. Both models included two continuous variables (age and HoNOS score), four dichotomous variables (sex, employment, service setting and geographical area), and four categorical variables (education, marital status, living conditions and diagnosis). In the model for the whole sample we added a dichotomous variable indicating the history of the service contact: STU and LTU groups were collapsed and the resulting variable was coded 0 for ‘long term contact’ and 1 for ‘first contact’. Both models were then re-analysed adding a variable indicating whether there was reliable improvement at six months (coded as 0 ‘unimproved’ and 1 ‘improved’).
All analyses were done with SAS version 9.1.
Results
Participants
During the index period a total of 2128 patients were seen in the participating mental health services; the great majority of them (89.4%) were enrolled at CMHS. Of the eligible patients, 3.2% were lost to follow up. The resulting data set included 2059 patients, who constituted the study cohort. Most of the sample were LTUs (68.6%), whereas FTUs (14.2%) and STUs (17.2%) constituted two smaller, better-balanced groups. As shown in Table 1, there were some statistically significant differences between FTUs and LTUs. LTUs were older, less educated, were more often unmarried, and more frequently enrolled in northern Italy. FTUs and STUs showed only one statistically significant difference in terms of area of residence: STUs were more frequently enrolled in northern Italy. Finally, there were more missing values in the FTU group, but this is probably due to systematic errors in data collection.
Socio-demographic characteristics of the three study groups (n = 2059).
*Statistically significant differences; Mann-Whitney test (p < 0.01); comparisons were performed adopting FTUs as reference group.
As shown in Table 2, FTUs were more frequently enrolled in general hospital psychiatric units (GHPUs) than the other two groups. Compared to LTUs, FTUs were more likely to have major depression or an anxiety disorder, and less likely to have a schizophrenic or bipolar disorder. Compared to STUs, FTUs were less frequently diagnosed with a personality disorder. FTUs’ HoNOS ratings at recruitment were higher than those of STUs but lower than those of LTUs. Similarly, the proportion of FTUs classified as severe on the CGI-S was midway between STUs and LTUs, and only the comparison of FTUs and LTUs showed a statistically significant difference.
Clinical characteristics of the three study groups at first assessment (n = 2059)
†CMHCs, community mental health centres; DCCs, day care centres; GHPUs, general hospital psychiatric units. ‡HoNOS, Health of the Nation Outcome Scales. *Statistically significant differences; Mann-Whitney test (p < 0.01); comparisons were performed adopting FTUs as reference group.
Outcomes
The clinical change in the three groups at six and 12 months was compared. Table 3 shows the mean HoNOS scores at enrolment (T0), at six months (T1) and at 12 months (T2), with the mean differences and effect sizes.
Change in the HoNOS scores of the three study groups (n = 2059)
*Statistically significant difference; paired sample t-test (p < 0.01). ▪Statistically significant difference of the magnitude of change; Mann-Whitney test (p < 0.01); comparisons were performed adopting FTUs as reference group.
The outcome was better for FTUs than for longer term users, with the greatest improvement during the first six months after enrolment.
Within the first six months of observation (T0–T1), there were statistically significant drops in mean HoNOS ratings, with differences in effect size (Table 3). However, improvement decreased during the second period (T1–T2) for all three groups. Only LTUs still showed a slight improvement, which resulted in a statistically significant difference between six and 12 month ratings. However, they did not show a statistically significant difference in terms of magnitude of change as compared to FTUs (Table 3).
At 12 months (T0–T2) there were statistically significant reductions in mean HoNOS ratings for all three groups. However, the size of the changes varied. FTUs showed the greatest improvement, with an effect size of 0.6. STUs and LTUs had smaller improvements: 0.3 and 0.2. Statistically significant differences were found in terms of magnitude of change, FTUs showing greater improvements at both six and 12 months (Table 3).
FTU clinical outcome was also analysed in terms of single HoNOS items. Statistically significant reductions in mean scores between T0 and T2 were found for:
‘other mental/behaviour problems’ (mean Δ = 0.89, ES = 0.84);
‘overactive, aggressive, disruptive or agitated behaviour’ (mean Δ = 0.55, ES = 0.57);
‘depressed mood’ (mean Δ = 0.45, ES = 0.44);
‘hallucinations/delusions’ (mean Δ = 0.44, ES = 0.42);
‘self-injury’ (mean Δ = 0.25, ES = 0.42);
‘problems with relationship’ (mean Δ = 0.39, ES = 0.34);
‘problems with daily living activities’ (mean Δ = 0.28, ES = 0.27);
‘problems with occupation and activities’ (mean Δ = 0.25, ES = 0.27).
No statistically significant changes were evident for ‘alcohol/drug related problems’, ‘cognitive problems’, ‘physical illness/disability problems’ and ‘problems with living conditions’.
Individual longitudinal change
Reliable change (RC) parameters were computed on the basis of HoNOS ratings at first assessment (n = 2059). The RC index was 8 (as in previous work [33]), whereas CS cut-off was 10. Depending on the fulfilment of RC criteria at follow up (T2) patients were classified as ‘stable’, ‘improved’ or ‘worsened’. Among the FTUs, 76% were stable, 21.6% improved and 2.4% worsened at 12 months. In the STU and LTU groups there were higher proportions of stable patients (88.4% and 89.5% respectively), lower proportions of improved patients (8.5% and 7.3% respectively) and similar proportions of worsened patients (3.1% and 3.2% respectively). All comparisons of the FTU group and the other two showed statistically significant differences (p < 0.01). There were no statistically significant differences between STUs and LTUs.
Figure 1 shows longitudinal changes in individual HoNOS scores for the FTU group. The two diagonal lines indicate the upper and lower limits of reliable change. For cases falling inside this area, a change of less than 8 points in total score might be attributable to measurement error or other random effects. Subjects falling below or above this area had either reliable improvement or worsening. The horizontal and vertical dashed lines indicate the cut-off for clinical significance: a HoNOS score of at least 10. Thus, when a subject had a change in HoNOS score of at least 8 points and moved across the clinical cut-off this was considered an RCSC [20,32].

Reliable and clinically significant individual changes (RCSC) at 12 months (T2) for first contact users (FTUs) (n = 292).
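The decision rule illustrated in Figure 1 can be sketched in a few lines, using the parameters reported above (RC index of 8 points, clinical cut-off at a HoNOS score of 10, higher scores indicating greater severity). The function name `classify_change` and the exact handling of scores on the boundary are assumptions made for illustration.

```python
RC_INDEX = 8     # reliable change threshold reported in the text
CS_CUTOFF = 10   # scores of at least 10 fall in the clinical range


def classify_change(score_t0: int, score_t2: int) -> str:
    """Assign one of the five RCSC levels to a pair of HoNOS total scores."""
    change = score_t0 - score_t2  # positive = improvement (lower score at T2)
    if change >= RC_INDEX:
        # Reliable improvement; clinically significant if the cut-off is crossed.
        if score_t0 >= CS_CUTOFF and score_t2 < CS_CUTOFF:
            return "reliable and clinically significant improvement"
        return "reliable improvement"
    if change <= -RC_INDEX:
        # Reliable deterioration; clinically significant if the cut-off is crossed.
        if score_t0 < CS_CUTOFF and score_t2 >= CS_CUTOFF:
            return "reliable and clinically significant deterioration"
        return "reliable deterioration"
    return "stability"


if __name__ == "__main__":
    for pair in [(18, 6), (25, 15), (12, 9), (12, 22), (4, 14)]:
        print(pair, classify_change(*pair))
```

The sketch also shows why a baseline score below 10 rules out clinically significant improvement: such a patient is already below the cut-off and cannot cross it downwards.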
Predictors of reliable and clinically significant improvement
For the present analysis, the sample comprised 1035 patients (50.3% of the total) with initial HoNOS scores ≥10, of whom 135 were FTUs (46.2% of the FTU subsample).
Odds ratios were not statistically significant in any of the four models tested with regard to the socio-demographic characteristics shown in Table 1.
In the FTU group model, odds ratios were statistically significant for service setting, diagnosis and initial HoNOS rating. Initial admission to a GHPU and a diagnosis of affective disorders or of other mental disorders were positively associated with clinical improvement at 12 months. Baseline HoNOS ratings showed a positive incremental association. When added to the model, the variable indicating reliable improvement at six months showed the highest association with 12 month clinical improvement. After this adjustment, all the previous statistically significant effects persisted (Table 4).
Predictors of reliable and clinically significant improvement at 12 months (T2) (n = 1035)
NS, not significant. *Multivariate logistic regression analysis.
Applying the predictive model to the whole sample, statistically significant odds ratios were found for sex, baseline HoNOS ratings, service setting and history of service contact. There was a statistically significant positive association with 12 month clinical improvement for higher baseline HoNOS ratings, for baseline admissions to a GHPU and for FTUs, whereas male sex showed a negative association (Table 4).
The adjustment for six-month improvement had a slight effect on odds ratios. Again, the variable indicating six-month improvement was the best predictor of 12-month clinical improvement, and the adjustment did not cancel the positive effect of higher baseline HoNOS ratings, baseline admission to a GHPU, and FTU status. The only exception was the negative effect of male sex, which disappeared (Table 4).
Discussion
Key results
The clinical outcome of a large prevalence sample of patients receiving ordinary, routine mental health care was evaluated. The cohort was representative of the population presenting for treatment in nine well-defined Italian catchment areas and most of the patients were followed up at six and 12 months with only 3% attrition. This study has, therefore, demonstrated that it is feasible within a group of Italian CMHS to routinely collect outcome data and to use them to assess individual clinical change.
As expected, comparisons of STUs and LTUs did not find statistically significant differences and only the FTU group clearly differed. First contacts were slightly younger and better educated, and more frequently had a non-psychotic disorder. On the whole, they had a better one year outcome than longer term service users. Statistically significant differences in the magnitude of change were seen at group level, through statistical significance tests, and individual level, from the calculation of RCSC.
Reliable improvement at T1 showed the highest association with clinical improvement at T2 for both first contacts and longer term users. The RCSC graph (Figure 1) offers a valid tool to provide real-time feedback to clinicians and service managers. It is a versatile diagram on which various dimensions of individual clinical change can be illustrated.
Limitations
Since the importance of involving patients in their own health care is increasingly being recognized, the lack of a patient-reported assessment is certainly a limitation of the present report [9].
Some methodological issues on the identification of clinically significant change could be raised. First, HoNOS inter-rater reliability was not assessed and we are aware that even small levels of unreliability may represent an important source of bias in the evaluation of clinical outcome. RC index, however, is a quite conservative method to identify individual change and its application should overcome this problem. Second, the adoption of a sub-group of less disabled patients as reference group could be questionable. Normative data, in fact, were drawn directly from the study cohort and not from a general population sample. However, we used a different scale (CGI-S) to distinguish between mildly ill and severely ill. This may represent a convergent validation of HoNOS cut-off of clinical significance and can be considered a step further from the methods previously proposed [33,34].
Interpretation
Since drop-out rates in Italian psychiatric practice are reported to vary between 17% and 46% [36,37], the very small attrition at follow up observed in our study needs to be interpreted. First of all, this result must be acknowledged as a success of the study. Follow-up assessments, in fact, have involved an active search of all recruited subjects in order to achieve the highest possible follow-up rate. On the other hand, a selection bias in the way patients were recruited should be taken into account. In fact, we cannot exclude that priority was given to the patients who were most likely to be easily followed up.
About 18% of first contacts showed marked improvement at follow up, and remission was predicted on the basis of initial severity and reliable change at six months (T1). These same variables, however, also predicted remission in longer term patients, whose illness was significantly more severe at first assessment. What lies then behind the first contacts’ greater improvement?
Apart from initial severity, statistically significant differences between first contacts and longer term service users were seen in two clinical variables: diagnosis and setting of first assessment. First contacts were more frequently non-psychotic, and more frequently had anxiety, depressive or personality disorders. They were also assessed more frequently while in an acute psychiatric ward.
Being admitted to an acute ward at first assessment was a strong predictor of remission in the total sample and in FTUs. In first contacts, however, the predictive value of setting was not independent of the effect of reliable six month improvement. This could confirm the theory that the FTU group may contain a larger proportion of episodic forms. Consistent with this, moderate and large effect sizes in the FTU group were found for HoNOS items 1 and 8, referring respectively to ‘overactive, aggressive, disruptive or agitated behaviour’ and to ‘other mental and behavioural problems’, a miscellaneous item covering mainly neurotic symptoms.
In conclusion, FTUs’ greater improvement might be related to the fact that they had less persistent forms of mental illness, and were recruited and assessed when their symptoms were at their worst and the patients were in hospital. Their margin for improvement was wider, and thus the well-known regression to the mean phenomenon could have contributed to their clinically significant change [20,38].
On the other hand, the fact that patients with a longer history of service contact were more severely ill at inclusion and had a less favourable outcome is consistent with previous research that found duration of service contact to be a valid criterion for identifying persistence of illness in patients with current severe dysfunction [39].
Craig et al. found six month clinical status was predictive of 24 month outcome in first admission patients with schizophrenia [40]. Our study confirmed this in a larger and more representative sample of first contacts. Improvement at six months predicted remission at follow up and this relationship was not affected by diagnosis or other clinical characteristics. Our results, however, showed that this was also true for longer term users, and since they were at different stages in their illnesses when the six month evaluation was conducted we must deduce that reliable improvement at T1 did not reflect only early improvement.
Conclusions
First time users had a better outcome at one year and this may be due to the fact they had less persistent forms of mental illness. The fact that only 22% of first time and about 7% of longer term users achieved reliable improvement at one year suggests the need for an improvement of outcome monitoring. The HoNOS, being a relatively brief instrument, may not be sensitive to small changes that may well be important and valuable at the individual level. Moreover, the fact that the great majority of the sample appeared stable across time highlights limitations in using only clinician rated scales to capture clinical change. On the other hand, maintaining stability and preventing deterioration might also be considered good outcomes.
In a recent paper by Burgess et al. (2009) exploring alternative methods of evaluating effectiveness, the reliable change index proved by far the most conservative approach compared with effect size and standard error of measurement statistics [41]. In light of the observed differences, the authors argued that in any routine outcome measurement exercise the degree of effectiveness demonstrated by services will depend on the specific statistical indicator used [41]. In the present report we propose the concurrent use of reliable change and effect size in order to judge outcome at both the individual and the group level.
Reliable improvement at T1 was the best independent predictor of clinically significant change at T2 for people at different stages of illness and could therefore be adopted as a useful clinical indicator for treatment planning.
The study results gave an expected and realistic picture of the one-year outcome of a representative sample of patients attending a group of Italian CMHS. RCSC proved to be a valuable tool to identify individual clinical change in a routine dataset and has potential utility in the programme or outcome evaluation domains as a means of communicating with policy makers, providers and the wider public.
Acknowledgements
We are especially grateful to the patients and health professionals who participated. Moreover, we would like to thank the mental health directors of Niguarda Cà Granda Hospital (Milan), Busto Arsizio Hospital, Imola Local Sanitary Agency (ASL), Arezzo ASL 8, ‘Centro Molise’ ASL 3, Napoli ASL 1, and Caserta ASL 2. We thank Miss Baggott, who helped with language editing of the manuscript.
