Abstract
Background:
Functioning is one of the key domains emphasised in the routine assessment of outcomes that has been occurring in specialised public sector mental health services across Australia since 2002, via the National Outcomes and Casemix Collection. For adult consumers (aged 18–64), the 16-item Life Skills Profile (LSP-16) has been the instrument of choice to measure functioning. However, review of the National Outcomes and Casemix Collection protocol has highlighted some limitations of the current approach to measuring functioning. A systematic review was conducted to identify, against a set of pre-determined criteria, the most suitable existing clinician-rated instruments for the routine measurement of functioning for adult consumers.
Method:
We used two existing reviews of functioning measures as our starting point and conducted a search of MEDLINE and PsycINFO to identify articles relating to additional clinician-rated instruments. We evaluated identified instruments using a hierarchical, criterion-based approach. The criteria were as follows: (1) is brief (<50 items) and simple to score, (2) is not made redundant by more recent instruments, (3) relevant version has been scientifically scrutinised, (4) considers functioning in a contemporary way and (5) demonstrates sound psychometric properties.
Results:
We identified 20 relevant instruments, 5 of which met our criteria: the LSP-16, the Health of the Nation Outcome Scales, the Illness Management and Recovery Scale–Clinician Version, the Multnomah Community Ability Scale and the Personal and Social Performance Scale.
Conclusion:
Further work is required to determine which, if any, of these instruments satisfy further criteria relating to their appropriateness for assessing functioning within relevant service contexts, acceptability to clinicians and consumers, and feasibility in routine practice. This should involve seeking stakeholders’ opinions (e.g. about the specific domains of functioning covered by each instrument and the language used in individual items) and testing completion rates in busy service settings.
Background
The International Classification of Functioning, Disability and Health (ICF) recognises functioning as an essential component of health and wellbeing (World Health Organization [WHO], 2001). The ICF emphasises functioning over disability, focusing on what people have the potential to do and actually do, irrespective of their mental (or physical) health conditions. The ICF stresses two key elements of functioning: ‘activity’ (the execution of tasks) and ‘participation’ (involvement in life situations) (WHO, 2001).
Functioning is one of the key domains that has been emphasised in the routine assessment of outcomes that has been occurring in specialised public sector mental health services across Australia since 2002, via the National Outcomes and Casemix Collection (NOCC) (Burgess et al., 2015). Under the NOCC protocol, various outcome measurement instruments are administered for all consumers at set points in their episode of care. For adults (aged 18–64) receiving care in non-admitted settings, the main instrument used to assess functioning to date has been the Life Skills Profile (LSP-16) (Buckingham et al., 1998a, 1998b). The Health of the Nation Outcome Scales (HoNOS) (Wing et al., 1998, 1999, 2000), which is primarily used to assess severity of symptoms, also contains a small number of items that relate to functioning. The LSP-16 and the HoNOS are both clinician-rated. More information about the full NOCC suite of instruments and the framework that guides their administration can be found elsewhere (Department of Health, 2015).
Measures of functioning are also important for casemix classification and funding purposes. With respect to the latter, in 2016, the Independent Hospital Pricing Authority (IHPA) released the Australian Mental Health Care Classification (AMHCC) Version 1.0, a national classification for mental health care (IHPA, 2016). It is based on available consumer-level clinical and treatment information, including information gathered from the instruments administered under the NOCC protocol. Of relevance, the AMHCC Version 1.0 classification uses LSP-16 scores as one indicator of case complexity for adult consumers in community settings.
In 2013, NOCC was reviewed by the National Mental Health Information Development Expert Advisory Panel (NMHIDEAP), which gathered information from a variety of sources, including multi-modality stakeholder consultations and analysis of NOCC data. Some specific issues were identified through those consultations regarding the use of the LSP-16 with adult consumers. These included that it is not strengths-based, it uses outdated language, the wording of some items is unclear and completion rates are lower than desired (NMHIDEAP, 2013). Notwithstanding these issues, experience with the LSP-16 over an almost 20-year period is invaluable in informing broader considerations in the measurement of functioning. For adults, the review recommended that the NOCC suite of instruments be rationalised and that a simple clinician-rated instrument be developed that assesses functioning and symptomatology and, potentially, other relevant domains. Such an instrument might take the form of a single existing instrument, or alternatively, it might be a composite of several instruments, but either way it should be brief.
Since the review, NMHIDEAP (2015) has proposed a ‘domain framework’ that should guide developments in the measurement of functioning. This emphasises personal recovery, social recovery and clinical recovery. NMHIDEAP has also suggested several options for how the new instrument should be developed: augmenting the HoNOS with a measure of functioning that replaces the LSP-16 and, if necessary, some additional clinically relevant items, or constructing a new instrument that is purpose-designed to cover all of the areas in the domain framework (again, this might have the HoNOS at its core).
We conducted the current systematic review to inform considerations about how functioning should be captured within the NOCC protocol. We did this as part of our role with the Australian Mental Health Outcomes and Classification Network (AMHOCN), which has been responsible for data management, training and service development, and analysis and reporting related to NOCC since 2003 (Burgess et al., 2012). Our starting point was two reviews of functioning measures that had been conducted for different purposes. One of these looked at instruments that might be used in community-managed organisations in Australia (AMHOCN and Community Mental Health Australia [CMHA], 2013), and the other considered instruments that might be used in clinical services in New Zealand (Lutchman et al., 2007; Waikato Evaluation Team, 2005). Once we had considered the instruments that were shortlisted in these reviews, we conducted our own systematic review of the academic literature. We sought to identify articles that had been published since the original reviews, as well as any articles that might have been missed by these reviews. Our review aimed to answer the following question: What are the most suitable existing clinician-rated instruments that might be used to routinely measure functioning for adult consumers in Australian specialised public sector mental health services?
Method
The current systematic review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Moher et al., 2009). We conducted an iterative search of MEDLINE and PsycINFO from their respective years of inception to April 2016 for journal articles that described relevant functioning instruments. In the first iteration, we searched titles and abstracts using the following search string: (‘mental’ OR ‘psychiatr*’) AND (‘social function*’ OR ‘personal function*’ OR ‘community function*’ OR ‘social abilit*’ OR ‘personal abilit*’ OR ‘community abilit*’ OR ‘social perform*’ OR ‘personal perform*’ OR ‘community perform*’ OR ‘occupation* function’ OR ‘occupation* perform*’ OR ‘community participat*’ OR ‘community involve*’ OR ‘work’ OR ‘leisure’ OR ‘educat*’ OR ‘personal relationship*’ OR ‘interpersonal relationship*’ OR ‘social inclusion’ OR ‘living skill*’ OR ‘life skill*’ OR ‘self-care’). In the second iteration, we searched titles only for the names of identified instruments, in order to ensure that we picked up as many relevant articles on each as possible. We also searched the reference lists of key review papers and articles on individual instruments. Our search was restricted to English-language articles.
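As an illustration of the two-part structure of the search string (condition terms AND functioning terms), the query could be assembled programmatically as follows. This is a minimal sketch only: the functioning term list is abridged from the full string above, and the `build_query` function is ours, not part of the review protocol.

```python
# Condition terms, exactly as in the search string; functioning terms abridged.
condition_terms = ["mental", "psychiatr*"]
functioning_terms = [
    "social function*", "personal function*", "community function*",
    "social abilit*", "living skill*", "life skill*", "self-care",
]

def build_query(groups):
    """Join each group's terms with OR, then join the groups with AND."""
    ored = ["(" + " OR ".join(f"'{t}'" for t in terms) + ")" for terms in groups]
    return " AND ".join(ored)

query = build_query([condition_terms, functioning_terms])
```

The same function would accept the full, unabridged term lists without modification.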
At the title and abstract screening and full-text screening stages, we excluded articles that made reference to instruments that could not be readily rated by a clinician without recourse to other information (e.g. consumer-rated instruments, instruments that required structured or semi-structured interviews with consumers or other informants, instruments that involved a systematic extraction of information from case notes). We also excluded articles on instruments that were designed for use with non-adult populations or clinically defined sub-populations (e.g. instruments designed for use with children and adolescents or older persons, instruments designed for use with people with intellectual disabilities, instruments designed for use in forensic mental health settings). In addition, we excluded articles on instruments that assessed only a limited aspect of functioning (e.g. instruments that were exclusively about activities of daily living, instruments that focused only on work performance).
Once we had identified our pool of relevant articles, we assessed whether each of the given instruments they described might be candidates for routinely assessing changes in functioning of consumers in Australian public sector mental health services. We did this using a hierarchical, criterion-based approach based on one that we used for a previous review of recovery instruments (Burgess et al., 2011). Under this approach, we progressively excluded instruments from further consideration if they did not meet a specific criterion. The criteria were as follows:
Is brief (<50 items) and simple to score;
Is not made redundant by more recent instruments;
Relevant version has been scientifically scrutinised;
Considers functioning in a contemporary way;
Demonstrates sound psychometric properties.
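The hierarchical elimination described above can be sketched as a sequential filter, in which an instrument failing one criterion is never assessed against the criteria below it. The instrument records and field names here are hypothetical illustrations, not data from the review.

```python
# Hypothetical instrument records; field names and values are illustrative only.
instruments = [
    {"name": "LSP-16", "items": 16, "simple_scoring": True, "superseded": False,
     "scrutinised": True, "contemporary": True, "sound_psychometrics": True},
    {"name": "MFNA", "items": 134, "simple_scoring": True, "superseded": False,
     "scrutinised": True, "contemporary": True, "sound_psychometrics": True},
]

# The five criteria, applied in order.
criteria = [
    ("brief (<50 items) and simple to score",
     lambda i: i["items"] < 50 and i["simple_scoring"]),
    ("not made redundant by more recent instruments",
     lambda i: not i["superseded"]),
    ("relevant version scientifically scrutinised",
     lambda i: i["scrutinised"]),
    ("considers functioning in a contemporary way",
     lambda i: i["contemporary"]),
    ("demonstrates sound psychometric properties",
     lambda i: i["sound_psychometrics"]),
]

def hierarchical_filter(instruments, criteria):
    """Progressively exclude instruments that fail each criterion in turn."""
    remaining = list(instruments)
    for label, test in criteria:
        remaining = [i for i in remaining if test(i)]
    return remaining
```

In this toy example, the 134-item MFNA falls at the first criterion and is never evaluated against the remaining four.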
For each instrument meeting the above criteria, we extracted and summarised information describing its purpose and structure and its psychometric properties. The psychometric properties considered were as follows:
Validity, or the extent to which the instrument measures what it intends to measure. Three types of validity were examined: construct validity, concurrent validity and predictive validity.
Reliability, or the extent to which the instrument gives stable, consistent results. Three aspects of reliability were considered: internal consistency, inter-rater reliability and test–retest reliability.
Sensitivity to change, or the extent to which, assuming the instrument is valid and reliable, it demonstrates the capacity to detect change over time.
Results
Overview of identified articles and instruments
An overview of the identified articles is provided in Figure 1. In total, our search identified 5907 journal articles. Removal of duplicates and screening titles and abstracts left 335 full-text articles, of which 81 were excluded when the full text was reviewed. The remaining 254 articles provided information about 20 clinician-rated instruments designed to measure functioning. Table 1 profiles the 20 instruments, describing them in terms of when and where they were developed, the domains they assess and their item structure.

Figure 1. Article selection.
Table 1. Profile of clinician-rated instruments designed to assess functioning.
Note: Refers to the date of published information on the original version of the instrument.
Hierarchical, criterion-based assessment of the instruments
Criterion 1: is brief (<50 items) and simple to score
Figure 2 shows that 13 of the 20 instruments meet the first criterion. The exceptions are the FACE Core Assessment, the Level of Functioning Scale (LFS), the Life Functioning Assessment Inventory (L-FAI), the Multi-Function Needs Assessment (MFNA), the Residential Competency Scale (RCS), the Social Adjustment Behavior Rating Scale (SABRS) and the Social Functioning Index (SFI). The L-FAI is complex to score because the domains it assesses are given status scores (reflecting general performance) and grade scores (reflecting more specific performance levels within the grade). The remaining exceptions range in length from 50 items (the FACE Core Assessment) to 134 items (the MFNA), making them unsuitable for use in routine outcome measurement. These instruments are excluded from further analysis.

Figure 2. Summary of instruments meeting criteria at each level of the hierarchy.
Criterion 2: is not made redundant by more recent instruments
Figure 2 shows that the majority of the remaining 13 instruments remain in contention when this criterion is examined. The two exceptions are the Global Assessment of Functioning (GAF) and the Social and Occupational Functioning Assessment Scale (SOFAS). The GAF was introduced in the revised third edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-III-R) as a means of assessing ‘adaptive functioning’ (American Psychiatric Association [APA], 1987). It was eliminated from subsequent versions of the DSM because it was regarded as inadequate for assessing a construct like functioning, which may be volatile and may not operate independently of symptomatology, and because of the training required for it to be used appropriately (Suzuki et al., 2015). The GAF was replaced by the SOFAS, on the grounds that the SOFAS assessed social and occupational functioning independently of symptom severity (Hendryx et al., 2001). In turn, the SOFAS has been superseded by the Personal and Social Performance Scale (PSP), which demonstrates stronger psychometric performance (Morosini et al., 2000). This sequence of instrument development and replacement led us to eliminate the GAF and the SOFAS from further consideration.
Criterion 3: relevant version has been scientifically scrutinised
We considered whether the relevant version of each of the remaining 11 instruments had been subjected to scientific scrutiny. To satisfy this criterion, the given instrument had to have been assessed by investigators who were independent of the original instrument developers, and the results of that assessment had to have been published in the peer-reviewed literature. It should be noted that the LSP-16 is included among these instruments. The LSP-16 is a short version of its parent instrument, the LSP-39 (Rosen et al., 1989). We focused on the LSP-16 as the ‘relevant’ instrument because this is the version in current use in specialised public sector mental health services in Australia; however, reference is made to studies scrutinising the LSP-39 where appropriate. Figure 2 indicates that six instruments satisfied this criterion. Those which have not been subjected to scientific scrutiny are the Disability Rating Form (DRF), the Mini-ICF-APP, the Need of Support and Service Questionnaire (NSSQ), the Profile of Community Psychiatry Clients (PCPC) and the Uniform Client Data Instrument (UCDI). These were excluded from further examination.
Criterion 4: considers functioning in a contemporary way
We evaluated whether the remaining six instruments consider functioning in a contemporary way. Figure 2 shows that we removed the Rehabilitation Evaluation Hall and Baker (REHAB) at this point. This instrument was developed in 1984, in the era of deinstitutionalisation, and was designed for use with residents of long-term psychiatric facilities who were being relocated to community residential support settings. It takes a limited view of functioning and not one that recognises the capacity of people with mental illness to lead contributing lives. It primarily deals with activities of daily living and includes relatively few items on other aspects of functioning. Most of these are framed negatively, falling into the ‘deviant behaviours’ subscale of the instrument.
Criterion 5: demonstrates sound psychometric properties
Table 2 summarises the psychometric properties of the five remaining instruments. All five have been subject to independent psychometric testing by investigators other than the original developers. Figure 2 shows that all five have relatively sound psychometric properties, although some caveats are worth noting here. For example, the HoNOS has been extensively examined in its entirety, but less attention has been paid to the social subscale, which contains the four functioning-related items (Items 9–12) of relevance here. When this subscale and its component items have been assessed, they have sometimes performed less well than other elements of the instrument (particularly Items 11 and 12, which relate to living conditions and occupation and activities, where functioning is not independent of opportunities). The LSP-16 has undergone more limited psychometric testing, and most of the information on its psychometric properties comes from assessments of its parent instrument, the LSP-39. The Illness Management and Recovery Scale–Clinician Version (IMRS-C) has also undergone limited testing; further information on its inter-rater reliability and sensitivity to change would be desirable. Across all instruments, some consistent gaps were evident. Notably, we found only one or two studies examining predictive validity for each of the IMRS-C, the Multnomah Community Ability Scale (MCAS), the PSP, the HoNOS social subscale and the LSP-16 (as opposed to the LSP-39). Moreover, the measures used to establish predictive validity varied from instrument to instrument, making it difficult to compare their relative performance. Information about sensitivity to change was also limited, being absent for the LSP-16 (as opposed to the LSP-39) and the MCAS, and available from only two studies for the IMRS-C, both conducted within a single programme context.
Table 2. Psychometric properties of instruments meeting Criteria 1–5.
More detail on the psychometric properties of the HoNOS can be found elsewhere (Pirkis et al., 2005). The information presented in this table relates primarily to the HoNOS items that are concerned with functioning (Items 9–12).
The level of reliability of an instrument is traditionally measured by a kappa value. Kappas of ≤0.20 are regarded as poor, 0.21–0.40 as fair, 0.41–0.60 as moderate, 0.61–0.80 as good and ≥0.81 as very good.
Some information on the LSP-16 is drawn from studies using the 20-item LSP-20 which includes all LSP-16 items.
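For concreteness, Cohen's kappa and the interpretive bands given in the note above can be computed as follows. This is a minimal sketch; the rating data are invented for illustration and do not come from any of the studies reviewed.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Expected agreement from each rater's marginal distribution.
    p_expected = sum(count_a[c] * count_b[c]
                     for c in set(count_a) | set(count_b)) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

def interpret(kappa):
    """Interpretive bands as given in the note above."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"

# Invented ratings from two clinicians scoring the same eight consumers.
a = [1, 1, 0, 1, 0, 0, 1, 1]
b = [1, 1, 0, 0, 0, 0, 1, 1]
```

With these invented ratings, observed agreement is 0.875, chance agreement is 0.5, and kappa is 0.75, which falls in the ‘good’ band.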
Discussion
We used a hierarchical, criterion-based approach to identify candidate instruments for measuring functioning among adult consumers of specialised public sector mental health services. By the end of the elimination process, we had reduced 20 potential instruments to 5: the HoNOS, the IMRS-C, the LSP-16, the MCAS and the PSP. The HoNOS, the MCAS and the PSP were all shortlisted in the two previous reviews that we drew upon, and the LSP-16 was shortlisted in the Australian review (AMHOCN and CMHA, 2013) but not the New Zealand one (Lutchman et al., 2007; Waikato Evaluation Team, 2005). The IMRS-C was not identified in either of these previous reviews, so it did not feature in their shortlists.
The current review is a first step in further developing the measurement of functioning. All five of the above instruments are recommended for consideration as clinician-rated instruments that might be used to routinely measure adult consumers’ functioning in Australian mental health services. However, further work is required to consider the appropriateness of the candidate instruments for assessing functioning in relevant service contexts, their acceptability to clinicians and consumers, and the feasibility of using them in routine practice. The consideration process should be systematic and structured. It should involve seeking stakeholders’ opinions about, for example, the specific domains of functioning covered by each instrument and the language used in individual items. Ideally, the process should also involve some real-world testing of clinicians’ completion of the instruments in specialised community mental health settings. For the two instruments that are already part of the NOCC suite, completion rates for the most recent available year (2014–2015) showed that the HoNOS was completed at 83% of review/discharge collection occasions and the LSP-16 at 71% (NMHIDEAP, 2013). Similar field testing of the other three instruments is desirable.
A key consideration in terms of appropriateness relates to the capacity of each of the instruments to measure outcomes meaningfully within relevant service contexts. Our review examined a range of psychometric properties, including sensitivity to change, and all of the instruments performed reasonably well on at least several of these. Further testing is needed, however, to address gaps regarding predictive validity and sensitivity to change, which presently limit the extent to which conclusions can be drawn about the way the instruments work across the domains of functioning they are intended to measure and the contexts in which they would be implemented, and to compare their relative performance. There has also been increasing discussion in the literature regarding the distinction between reflective and formative indicators and the selection of appropriate measurement models for each (Bollen and Bauldry, 2011). Future investigations could consider these issues in relation to the measurement of functioning.
Other factors need to be taken into account too, however. For instance, in specifying the time period covered by the instruments, it is necessary to ensure that no two rating periods overlap. The PSP asks about the consumer’s general functioning, without specifying time period, so this is not an issue for this instrument. The HoNOS covers the previous 2 weeks, which is sufficiently short that the issue of potentially overlapping assessment periods is minimised in most cases. The LSP-16 covers the last 3 months, as does the IMRS-C. The MCAS has rating periods of 3 months and 1 year, depending on the specific item. Consideration might be given to exploring whether these instruments can be modified to cover shorter time periods. Precedents exist for these sorts of modifications; an alternative version of the MCAS exists which has a rating period of 1 month (Dickerson et al., 2003). Any such modifications would need to be tested.
The five shortlisted instruments each have items that address ‘activities’ and ‘participation’, identified as core elements of functioning in the ICF. In part, this is because our initial exclusion criteria meant that instruments that only measured activities (or, more specifically, activities of daily living) were discarded before they reached the point of review. The various instruments placed differing emphasis on these two elements, however, and included divergent domains within each of them. The HoNOS assesses relationships, activities of daily living, living conditions and occupation and activities. The IMRS-C covers recovery, management and biology. The LSP-16 focuses on withdrawal, self-care, compliance and anti-social behaviour. The MCAS considers interference with functioning, adjustment to living, social competence and behavioural problems. The PSP provides a rating that is based on socially useful activities, personal and social relationships, self-care and disturbing and aggressive behaviours. When the appropriateness, acceptability and feasibility of the five instruments are explored, consideration should be given to stakeholders’ beliefs about the precise domains that should be assessed and the relative emphasis that should be placed on ‘activities’ and ‘participation’. Future work could also build upon the current review by evaluating the psychometric properties of the identified instruments separately in relation to the measurement of ‘activities’ and ‘participation’.
The scope of the current review was restricted to clinician-rated measures of functioning for adult consumers that could be used as part of the new instrument that was recommended by NMHIDEAP (2013). This meant that we excluded consumer-rated instruments and clinician-rated instruments that sought information via consumer interviews or case note reviews, doing so at the stage of screening the abstracts and full text of identified journal articles. More than 80 additional, but out-of-scope, instruments designed to measure functioning were eliminated at this pre-review stage. Some of these instruments undoubtedly have merit. For example, the Camberwell Assessment of Need Short Appraisal Schedule (CANSAS) (Phelan et al., 1995) is popular and has sound psychometric properties, but was excluded because it involves a structured interview in which clinician, consumer and carer views of need can be recorded separately. If the examination of appropriateness, acceptability and feasibility of the five instruments does not yield positive findings, then consideration might be given to broadening the search criteria and identifying additional instruments (albeit ones that might need to be modified to be fit for purpose).
Decisions about whether or not to use one of the five identified instruments – or to seek alternatives – should not be made in isolation. The clinician-rated instruments in the current NOCC suite are complemented by various consumer-rated instruments. At present, these primarily relate to levels of distress and other psychological symptoms, but there is an appetite for broadening these to include constructs like social inclusion and recovery. AMHOCN has reviewed existing recovery and social inclusion instruments (Burgess et al., 2011; Coombs et al., 2013) and has developed and trialled a new social inclusion instrument (the Living in the Community Questionnaire) (AMHOCN, 2015). There is an argument that these constructs are closely related to functioning, particularly the ‘participation’ element of functioning. There is also an argument that whereas a consumer’s level of functioning can be assessed by either a clinician or by the consumer himself or herself, social inclusion and recovery are more appropriately measured by the consumer because of their experiential nature. Consideration should be given to how the selected clinician-rated measure of functioning complements proposed consumer-rated social inclusion and recovery instruments.
Identifying an appropriate clinician-rated functioning instrument should not stop with adult consumers. The current review excluded instruments that were designed for specific populations, including children and adolescents and older people. Norms around functioning are clearly age-related to some extent, so it makes sense that functioning instruments that have utility for adult consumers may not do so for younger and older consumers. For younger consumers, levels of maturity will impact functioning. For older consumers, physical and cognitive abilities may play a role. Age-specific functioning instruments are required for these groups, and we would recommend a similar process for identifying them.
We acknowledge that our review had some limitations. Despite our best efforts, we may have missed some relevant and potentially useful instruments designed to assess functioning (e.g. if our search terms did not pick up articles related to them or if these articles were not indexed in the two academic databases we used). Also, we may have missed some articles relating to the instruments we did identify, so our examination of the psychometric properties of the final five may not have been exhaustive. In addition, the articles we did retrieve did not always provide optimal detail on the instruments they described (particularly with respect to the specific items on these instruments), so it is possible that we misinterpreted information about some of them. Finally, we cannot rule out possible publication bias. Studies showing that an instrument has good psychometric properties are more likely to be published than studies that do not. Having said that, our Criterion 3 required that included instruments had to have been scrutinised by investigators who were independent of the original instrument developers; this should have increased the extent to which the assembled evidence base included studies by investigators who did not have a vested interest in showing that a given instrument has sound psychometric properties.
These limitations aside, we believe that the current review can help to inform decisions about which clinician-rated instruments hold promise for assessing whether functioning improves, deteriorates or does not change for adult consumers of Australian mental health services. Further work is required to determine which, if any, of these instruments satisfy further criteria relating to appropriateness, acceptability and feasibility.
Footnotes
Acknowledgements
The authors would like to acknowledge feedback from members of the National Mental Health Information Development Expert Advisory Panel (NMHIDEAP) on a previous version of this systematic review.
Declaration of Conflicting Interests
The authors declare no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The Australian Mental Health Outcomes and Classification Network (AMHOCN) is funded by the Australian Government Department of Health.
