Abstract

The Fourth National Mental Health Plan [1] identified five priority areas for outcome and action, one of which is ‘Accountability – measuring and reporting progress’. The desired outcome is that: ‘The public is able to make informed judgements about the extent of mental health reform in Australia, including the progress of the Fourth Plan, and has confidence in the information available to make these judgements. Consumers and carers have access to information about the performance of services responsible for their care across the range of health quality domains and are able to compare these to national benchmarks.’ The corresponding indicator for this outcome is ‘The proportion of services publicly reporting performance data’ (p. 81).
On the same page it is noted that such reporting is not yet the norm, and there are no relevant data sets available at present. It is recognized that ‘Consideration will need to be given to systematic means of monitoring progress against this indicator.’ Given the relative underdevelopment of this area, and the fact that all mental health jurisdictions are now committed to the implementation of the Plan, it is timely to consider some of the issues concerning the public reporting of health performance data, especially as it relates to mental health.
Our coverage begins with a consideration of the rationale for public reporting, followed by a consideration of its effects on different groups of stakeholders. We then examine some of the measurement issues involved, and outline what is known about what to report, and how to report it. We then review the currently small literature on public reporting in mental health. We conclude with a broad summary of what the literature appears to be telling us, and some of its implications for how work on this priority area should proceed. Most of the relevant literature is from overseas, hence the strongly international flavour.
Rationale
Most rationales for public performance reporting reduce to an instance or variant of the two proposed by Berwick et al. [2]: selection, whereby a ‘user’ of healthcare (e.g. patient, purchaser) can compare what is available and make a value-based choice, and change, whereby providers (e.g. clinicians, local managers) use comparative institutional data to improve the quality of their services. Another potential use of public performance reporting is for regulators and politicians to demonstrate public accountability [3,4], and this is consistent with the wording in the Plan. In relation to selection of services by the general public, it has been suggested that public reporting makes more sense in a competitive, market environment than in a public system where consumers have limited or no capacity to ‘shop around’ [5].
In addition to these reasons, Pidd [6] has put forward the idea of public reporting in public services as a tool of external control. That is, the reported measures and indicators can be used as a basis for a system of incentives and disincentives, thus providing external interests a degree of control over services, such as health and education, which are historically delivered at the discretion of professional providers. Propper and Wilson also note ‘a general shift in the use of information on performance away from primarily being used for internal management control purposes towards use of these data for external accountability and control’ [5, pp. 254–255].
Most commentators agree that performance data should be collected and used; ‘The question…is not whether these data should be collected, but whether they should be made public’ [7, p. 267]. There is no doubt that non-public feedback to organizations can have a marked beneficial effect. Merle et al. [8] investigated the quality of care in three hospitals, using indicators selected by professionals. After sharing each other's information, there were significant improvements in most of the areas reported. Williams et al. [9] tracked the performance over a two-year period of over 3,000 accredited hospitals on 18 standard indicators of quality. All participating hospitals received quarterly feedback in the form of comparative reports throughout the study. They found significant improvements on 15 of 18 measures, and no measure showed a significant deterioration. A few studies have examined whether there is a difference between providing comparative performance results confidentially or publicly. Guru et al. [10] compared cardiac surgery mortality rates over three periods: no reporting, confidential reporting, and public reporting. Rates improved significantly between the first two periods but not between the second and the third. Such findings are compatible with the suggestion of Bird et al. that ‘“Naming” is not a pre-requisite for public accountability and may have dis-benefits besides its apparent attractiveness in promoting public choice’ [3, p. 23].
Effects of public performance reporting
It is convenient to consider the effects of public performance reporting through the potential impacts on the different parties that stand to gain (or maybe lose) from it.
Health managers and executives
Some studies have examined the perceptions and reactions of health managers and executives. Davies [11] conducted interviews with senior individuals from six hospitals in the USA. His interviewees were generally antipathetic towards publicly released comparative data, with concerns revolving around validity of the data and distortions of clinical priorities. By contrast, the executives interviewed by Barr et al. [12] were generally favourable, seeing the practice as supporting quality improvement initiatives. Goldman et al. [13] interviewed executives of Californian safety-net hospitals (SNHs, which serve Medicaid, uninsured or underinsured patients, and underserved areas) on their concerns about public reporting and pay-for-performance. Most said they used data gathered in performance reports to improve quality, and affirmed a long-standing commitment to quality improvement. Alongside these views there were significant concerns, which included lack of resources (staff, training and technology) to capture and process the data, and difficulties in gaining physician buy-in. Several thought that the free-market economic model on which public reporting and pay-for-performance are based is not pertinent to SNHs, which are often effectively the only provider for the poor and underprivileged. Other concerns revolved around hospitals being judged on case-mix models that made no allowance for the unfavourable circumstances that their patients were admitted from or discharged to.
Health provider organizations
Other studies have looked at the effect on health provider organizations. Most of them have found that publicly reported performance measures stimulated activity at the organizational level. Marshall et al. [4] concluded that hospitals respond to the publication of comparative performance data with internal changes, especially in a competitive environment. Fung et al. [14] and Shekelle et al. [15] both summarized work since the Marshall et al. review in 2000, and both found good evidence that public reporting stimulated quality improvement activity. Even more recently, Shekelle [16], commenting on the selection and change pathways (see above) that underpin public reporting, thought that there was little evidence of effects for the former but quite good evidence for the latter. Whether this increased activity translates into improved quality is an open question; Heckman et al. [17] highlighted the distinction between focused activity and productive activity.
Direct health providers
The evidence of effects on direct care providers relates mainly to physicians and surgeons. Marshall et al. [4] summarized evidence to 1999 that physicians were ‘interested in report cards but were sceptical about the validity of current examples and were unwilling to use them in practice, either in terms of sharing the information with patients or using the data to influence their own referral patterns’ (p. 57). Ringel [18], writing on behalf of practising neurologists, raised a number of issues that are equally relevant in mental health. Among his concerns was the common observation that many patients do not do as well as they might on account of poor compliance with treatments known to be effective. He also questioned whether the reliability of the measures used was adequate to assess the often highly idiosyncratic clinical trajectories of patients who may well share a common principal diagnosis.
Casalino et al. [19] conducted a national survey with physicians seeking their views on pay-for-performance and public reporting. Asked ‘If accurate, measures of the quality of individual physicians’ performance should be made public’, 68% disagreed (35% strongly), and to the question ‘If accurate, measures of the quality of individual medical groups’ performance should be made public’, 55% disagreed (29% strongly). Responses to other questions revealed that over 80% felt that present measures of quality did not adequately adjust for medical condition or socio-economic status, and that measuring quality would divert physicians’ attention from important types of care for which quality is not measured, and may lead physicians to avoid high-risk patients. In a survey of surgeons, Neuman et al. [20] found that while most (80–90%) felt that a national, surgeon-developed, risk-adjusted system of outcome assessment would improve quality of care and identify areas for improvement, less than half (45%) thought these data should be available publicly. In the Australian context, Jacobs and McDaid have noted ‘Initially, the majority of clinicians have perceived the Australian government's primary objective for introducing the [clinical outcome] measures to be financial management rather than to ensure the quality of services’ [21, p. 434].
Consumers and the general public
Marshall et al. summarized the evidence to 1999 as showing that ‘the currently available performance data had minimal impact on consumer choice’ [4, p. 64], and more recent work [22,23] suggests that this situation has not changed. Marshall et al. observed that it was unclear whether this lack of effect was on account of access, comprehension or motivational factors. Schneider and Lieberman [24] offered some reasons: despite much experimentation with different formats for public reports, most report cards are not useful documents; they tend to overwhelm consumers with too much information and leave it to the reader to figure out what the data mean. Just as importantly, the data typically presented in such reports are not what consumers often regard as indicators of quality. Another study (described in Canto [25]) showed that the public is more likely to rely on recommendations of friends, family members, and coworkers or from health professionals they know than on standard quality indicators; respondents indicated that they would rather choose a surgeon they had seen before but who was not well rated (50%) than a surgeon whom they had not seen before (38%). Hibbard [26] has suggested three possible reasons for the repeated finding that consumers show little interest in the information on healthcare quality that is made available to them: (i) consumers are largely unaware of the widespread problems in healthcare quality: surveys have shown that the public believes that the technical quality of care is uniformly high, and that hospitals do not differ much in safety or quality; (ii) the public define ‘good quality’ very differently from experts and industry leaders: when asked, consumers mostly mention access, cost, choice of doctor, and doctor qualifications; and (iii) to understand most public reports involves processing a large volume of information, weighing some factors more than others, and bringing all the factors together into a conclusion, all of which is cognitively burdensome.
Unintended consequences
Some of the nervousness about public performance reporting appears to relate to the possibility that, as well as beneficial effects considered earlier, there may be unintended consequences that may actually be harmful to service quality. The classic work here is that of Smith [27]; his list appears in Table 1 (the descriptors are taken from Pidd [6]).
While most of the discourse about unintended consequences relates to negative consequences, it is also recognized that some consequences could be unintended yet positive. That is, the stimulation of quality improvement activities, and generally heightened consciousness of good practice, could lead to improvements in areas not covered by the performance indicators that are being reported. Werner et al. [28] examined the public reporting of nursing home performance on the Centers for Medicare & Medicaid Services Nursing Home Compare website and concluded that it was associated with improved performance on both reported and unreported measures. In a related study, the same research group [29] compared improvement rates on indicators that were and were not publicly reported. After the introduction of public reporting they found improvements in two of three publicly reported indicators, but no improvement, and in some cases deterioration, in another indicator that was not publicly reported. Ganz et al. [30] conducted a more rigorous test of this idea. In a controlled trial, some medical practices received a specific intervention to improve certain problems in elderly patients, while other practices received a control condition. The point of the study, however, was to see whether there was any improvement in indicators that were not targeted. The intervention appeared to have an impact, in that two of the three targeted indicators improved, but there was no change in intervention and control practices in the non-targeted indicators. This result confirms the findings of most others that institutional behaviour often improves in relation to indicators that are targeted for reporting, but further suggests that this improvement may not generalize to areas that are not reported.
Pidd [6] has made the point that Smith's list of unintended consequences is not an argument against performance measurement per se, but rather an indictment of clumsy or thoughtless implementation. He cited ‘Goodhart's Law’, which is that, ‘any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes’ [31], a corollary of which is ‘A risk model breaks down when used for regulatory purposes’ [32].
As well as warnings about comparatively short-term unintended consequences, some have written about possible long-term corrosive effects. Lester and Roland [33] fear that public reporting with an emphasis on control may erode the very medical professionalism that enabled quality improvement initiatives to flourish in the first place, by shifting the balance too far away from trust [34,35].
Measurement issues
Risk adjustment
There is unanimity that data should be presented so that comparisons are fair (comparing ‘like with like’). This is easier said than done. Risk adjustment is quite technical: tricky decisions have to be made about what to adjust for and what not, and about how to make the adjustment. There is also the challenge of presenting adjusted performance results in a way that the average interested person can understand.
Thomas drew attention to the distinction between statistical and attributional validity, the latter being ‘the degree to which changes in outcomes can be attributed to the care being evaluated’, which is the key aspect ‘in the context of using risk-adjusted measures to motivate practice changes or to monitor provider performance’ (p. 182) [36].
Jacobs and McDaid, writing on performance measurement in mental health services, pointed out that controlling for case-mix in mental health is particularly challenging because case-mix classifications are frequently based on diagnosis, which has been shown to be a fairly poor predictor of service use [21]. Also, they note that while there has been ‘a lot of work on the risk adjustment of outcomes for specific interventions in mental health and some on risk adjustment for the development of payment systems…there has been very little work on the risk adjustment of indicators for the purpose of comparing the performance of multiple providers’ [21, p. 452]. One exception is a study by Dow et al. [37] who risk-adjusted two outcome measures (global rating of functioning and a consumer satisfaction measure) using data on 7,000 individuals over a three-year period from 24 state-funded providers in Florida. There was significant variation between providers on the two outcome measures but the risk adjustment had a fairly small impact on their overall rank ordering. Nevertheless, it had a major effect for a few specific providers, particularly those with small caseloads. This sensitivity to method was also found by Hendryx and Teague, who compared different risk adjustment models, also in the field of mental health. They found that clinical data (ratings by clinicians and consumers) had a sizeable additional effect over purely administrative data, and that ‘the particular form of the model will lead to different conclusions about comparative treatment agency performance’ (p. 247) [38].
Li et al. [39] expressed the need for risk adjustment by virtue of the broad array of consumer characteristics that can affect outcomes, which may not be randomly distributed among facilities. They explained how the choice of statistical methodology may affect the quality rankings. In particular, they contended that classical regression may not be appropriate for data in which patients’ characteristics are ‘clustered’ within facilities, because a basic assumption of classical regression is that all observations in a dataset are independent, and this is hardly ever the case, since patients within the same facility will receive similar levels of care. ‘Ignoring the “clustering” of patients due to “quality” or other factors may invalidate the empirical risk-adjustment model and lead to incorrect quality estimates’ (p.84). The point about clustering of observations within agencies has been echoed by Shahian and Edwards [40], who advocated the use of data analytic techniques that explicitly account for such clustering.
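The consequence of ignoring clustering can be illustrated numerically. The sketch below does not reproduce the method of Li et al.; it uses the design effect familiar from cluster sampling, 1 + (m − 1) × ICC, which is brought in here purely as an illustration and which assumes equal numbers of patients per facility. The cluster size, intraclass correlation (ICC) and patient count are hypothetical.

```python
def design_effect(cluster_size, icc):
    """Variance inflation for a mean when observations are clustered.

    Assumes equal cluster sizes; icc is the intraclass correlation,
    i.e. the share of outcome variance lying between facilities.
    """
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n_patients, cluster_size, icc):
    """Number of independent observations the clustered data are worth."""
    return n_patients / design_effect(cluster_size, icc)


# Hypothetical example: 1,000 patients in facilities of 50, ICC of 0.05.
deff = design_effect(cluster_size=50, icc=0.05)
n_eff = effective_sample_size(n_patients=1000, cluster_size=50, icc=0.05)
print(f"design effect: {deff:.2f}, effective sample size: {n_eff:.0f}")
```

Under these assumed values, even a modest within-facility correlation leaves the 1,000 patients carrying roughly the information of 290 independent observations, so an analysis that treats them as independent would overstate its precision.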
Shahian and Normand [41] said that, despite the increasingly widespread use of risk-adjusted outcomes, considerable confusion exists among consumers, the media, payers, and providers as to their correct meaning and interpretation, notably the incorrect belief that risk adjustment ‘levels the playing field’ to permit direct comparison of one provider with another (pp. 1955–1956). They showed that while it is usually valid to compare the performance of one hospital with the average of its peers, it may not be valid to compare a given pair of hospitals, because their profile of risk factors (i.e. case-mix of patients) may be quite different. That is, the case-mix-adjusted performance of a hospital is a valid indication of its performance with the kinds of patients that it treats, but between-hospital comparisons on the same outcomes are only valid for the kinds of patients that are treated in both hospitals.
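A small worked example may make this point concrete. The sketch below assumes indirect standardization (an observed-to-expected ratio against reference rates), which is one common form of risk adjustment and not necessarily the one used in any study cited here. The hospitals, strata and counts are all hypothetical.

```python
# Hypothetical reference (all-hospital) adverse event rates per risk stratum.
REFERENCE_RATES = {"low_risk": 0.02, "high_risk": 0.20}

# Hypothetical hospitals: stratum -> (patients treated, adverse events).
# Both hospitals have identical stratum-specific rates:
# 1% for low-risk patients and 20% for high-risk patients.
HOSPITAL_A = {"low_risk": (900, 9), "high_risk": (100, 20)}
HOSPITAL_B = {"low_risk": (100, 1), "high_risk": (900, 180)}


def observed_to_expected(hospital, reference_rates):
    """Observed events divided by events expected at the reference rates."""
    observed = sum(events for _, events in hospital.values())
    expected = sum(
        patients * reference_rates[stratum]
        for stratum, (patients, _) in hospital.items()
    )
    return observed / expected


ratio_a = observed_to_expected(HOSPITAL_A, REFERENCE_RATES)
ratio_b = observed_to_expected(HOSPITAL_B, REFERENCE_RATES)
print(f"Hospital A O/E: {ratio_a:.2f}")
print(f"Hospital B O/E: {ratio_b:.2f}")
```

With these invented counts, Hospital A's ratio is about 0.76 and Hospital B's about 0.99, although the two perform identically within each stratum. Each ratio is a fair comparison of that hospital with the reference for its own case-mix, but the difference between the two ratios reflects who they treat, not how well they treat them.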
Shahian et al. [42] drew attention to what is reasonable and what is not reasonable to risk-adjust for. They question whether one should adjust for such fixed organizational characteristics as size, census division, and teaching status when comparing hospitals, since that has ‘the potential to adjust away important performance differences’. Romano [43] delved deeper into this issue. Citing the findings of Austin et al. [44], who found that fewer hospitals were outliers when judged against similar (e.g. non-teaching) hospitals than when judged against all hospitals in their locality, he pointed out that ‘This finding is not surprising; it is intuitively obvious that comparing hospitals with peer institutions will lead to a smaller group of outliers, on average, than comparing the same hospitals to all institutions in a state or province’ (p. 921). On this basis, he went on to recommend that true differences in performance between (e.g. teaching and non-teaching) hospitals should not be obscured by artificially comparing only ‘like with like’. Indeed, it is practically inevitable that there will be fewer outliers when comparing ‘like with like’ because they are necessarily more homogeneous. According to Romano, ‘the only defensible peer group for a public reporting program is geographically defined to include hospitals that compete (or potentially compete) in the same market’ (p. 922). Romano further pointed to the difficulty of deciding what constitutes a relevant peer group of provider organizations – who makes the decision, and on what basis? There is a ‘slippery slope’ involving an ever-narrowing definition of what ‘similar’ means, until ultimately each provider organization can claim, with some justification, to be unique.
Data quality
Most of the work from surgery and education has benefited from quite high quality data. For example, there is little risk of mortality being misreported, and school exam marks are also quite reliable. In addition, both death and school test results have close to 100% reporting. It is not at all clear that change scores on mental health outcome measures, as opposed to process measures (like length of stay, or compliance with standards) are anything like as robust. One study reviewing data quality in the UK, USA and Australia [45] referred to the challenges in developing meaningful sets of national indicators of health system performance, and the general paucity of accurate and accessible clinical data. This is echoed by opinion from Queensland that administrative hospital data are often inaccurate and incomplete [46].
In what they describe as the largest multi-facility study undertaken to date, Mor et al. [47] studied the reliability of nurses’ ratings of nursing home residents’ health problems. While average reliability levels were good, substantial inter-facility variations were found, disagreements between raters were non-random, and subsequent analyses [48] revealed that directional bias in the data could have resulted in significant differences in the relative rankings of the facilities. Sangl et al. [49] obtained similar results. Goldstein and Spiegelhalter [50] have noted that ‘No amount of fancy statistical footwork will overcome basic inadequacies in either the appropriateness or the integrity of the data collected’, a point echoed by Shahian et al. [51].
Should indicators be relative or absolute?
Mor [52] observed that ‘establishing minimums as measured by particular quality measures may not be appropriate in all cases, since many areas of performance have no evidence-based standards that could determine a minimum [and] Conversely, relying on only empirically based benchmarks (e.g. below the median) may institutionalize the poor performance of providers operating at the median’ (p. 341). The point here is that if performance benchmarks are continually adjusted to represent the current situation, it could appear that no overall improvement is occurring. For example, if there is a consistent systemwide 5% annual improvement, a service that is median will remain median year on year, and a misleading and possibly demoralizing impression is conveyed that it has not changed. This would be an argument for using absolute anchors, whereby an increasing proportion of services will meet the absolute standard over time, and systemwide improvement can be demonstrated.
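The arithmetic behind this argument can be sketched as follows. The five baseline scores, the absolute standard of 75 and the uniform 5% annual gain are hypothetical values chosen only to illustrate the contrast between a relative (median) benchmark and an absolute one.

```python
from statistics import median


def project(scores, annual_gain, years):
    """Scores after `years` of a uniform proportional improvement."""
    factor = (1 + annual_gain) ** years
    return [score * factor for score in scores]


def share_at_or_above(scores, threshold):
    """Proportion of services scoring at or above the threshold."""
    return sum(score >= threshold for score in scores) / len(scores)


BASELINE = [50, 60, 70, 80, 90]  # hypothetical service scores
ABSOLUTE_STANDARD = 75           # hypothetical fixed standard

for year in range(6):
    current = project(BASELINE, annual_gain=0.05, years=year)
    relative = share_at_or_above(current, median(current))
    absolute = share_at_or_above(current, ABSOLUTE_STANDARD)
    print(f"year {year}: at/above median {relative:.0%}, "
          f"meeting absolute standard {absolute:.0%}")
```

Against the moving median, the proportion of services at or above the benchmark never changes, whereas against the fixed standard it rises as every service improves.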
Single or multiple measures
Some attention has been paid to the question of whether a single, composite index of performance should be cited, or whether the various subdomains of performance should be cited separately, with no attempt at aggregation or summarization. Many favour composite indicators, perhaps for their apparent simplicity and presumed robustness [22,53,54]. Others have warned of problematic features of composite measures [55, pp. 264–265]. Bird et al. [3, p. 18] argued that amalgamation into a summary index can seriously distort the performance assessment and pointed out that there are value judgements implicit in weighting the components, which may differ legitimately between stakeholders; as Pidd [6] has asked: who determines these weights and what values should they take?
Fong et al. [56] compared quality indicators from four databases and found little pairwise correlation among them, leading them to recommend assessing performance across multiple measures. O’Brien et al. [57] compared eleven different ways of combining nationally endorsed cardiac surgery process and outcome measures into summary measures of performance. Although the various methods produced results that were highly inter-correlated, sensitivity analyses showed that up to about 10% of organizations displayed dramatic changes in the overall rankings depending on the method used. In a companion piece, the same group [58] noted that the existing method of combining process and outcome measures had not been validated, and that different weightings of process versus outcomes metrics can lead to highly divergent provider rankings.
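The sensitivity of rankings to the weighting scheme is easy to demonstrate. The sketch below uses a simple weighted sum of a process score and an outcome score; this is an illustrative assumption and not one of the methods compared by O’Brien et al. The providers and their scores are hypothetical.

```python
# Hypothetical providers with a process score and an outcome score (0-100).
PROVIDERS = {
    "A": {"process": 90, "outcome": 60},
    "B": {"process": 70, "outcome": 80},
    "C": {"process": 80, "outcome": 70},
}


def rank_by_composite(providers, process_weight):
    """Rank providers (best first) on a weighted sum of the two scores.

    The outcome weight is whatever remains after the process weight.
    """
    outcome_weight = 1 - process_weight
    composite = {
        name: process_weight * s["process"] + outcome_weight * s["outcome"]
        for name, s in providers.items()
    }
    return sorted(composite, key=composite.get, reverse=True)


print("process-heavy weighting:", rank_by_composite(PROVIDERS, 0.8))
print("outcome-heavy weighting:", rank_by_composite(PROVIDERS, 0.2))
```

With these invented scores the ranking reverses completely between the two weightings, which is the value judgement that Bird et al. and Pidd warn about: the data are unchanged, and only the choice of weights differs.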
A simple global rating system may be helpful to free agents in choosing a restaurant, but less useful for assisting organizations to improve the quality of their services. Jacobs et al. [59] found considerable instability in hospitals’ positions in league tables based on their annual star ratings published by the NHS in the UK.
League tables
Perhaps the classic work on measurement and interpretation issues with league tables in relation to institutional performance is that of Goldstein and Spiegelhalter [50], whose paper is followed by discussion of the issues by many leading experts in the fields of statistics and education. One theme is the need to take account of ‘model-based uncertainty’, by which they mean the natural uncertainty around a school's or hospital's mean score. Leckie and Goldstein [60, p. 844] demonstrated this by showing the rankings of 266 schools, after adjustment for several covariates, along with their 95% confidence intervals. The inherently imprecise nature of school effects (due to the small numbers of pupils within school cohorts) is clearly apparent. Only 168 (63%) of schools were significantly different from the overall average, and most schools were not significantly or meaningfully different from many of their neighbours in the rankings. As Goldstein and Spiegelhalter say, ‘with current data, even after adjustment, finely graded comparisons between institutions are impossible’ (p. 397). They also noted (p. 405) that some outputs are influenced by factors that are extrinsic to the organization, and as such not ones for which it might properly be held accountable.
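The effect of this uncertainty on a league table can be sketched with a few invented institutions. The example below is a simplification: it assumes a normal approximation for the 95% interval around each mean and treats the overall average as a fixed, known value, whereas the studies cited here used covariate-adjusted models. All names and figures are hypothetical.

```python
from math import sqrt

Z_95 = 1.96  # normal critical value for a two-sided 95% interval


def confidence_interval(mean, sd, n, z=Z_95):
    """Approximate 95% confidence interval for an institution's mean."""
    half_width = z * sd / sqrt(n)
    return mean - half_width, mean + half_width


def compare_with_average(mean, sd, n, overall_average):
    """Classify an institution by whether its interval excludes the average."""
    low, high = confidence_interval(mean, sd, n)
    if low > overall_average:
        return "above average"
    if high < overall_average:
        return "below average"
    return "not distinguishable from average"


# Hypothetical schools: (mean score, standard deviation, number of pupils).
SCHOOLS = {"A": (52, 10, 25), "B": (55, 10, 25), "C": (45, 10, 100)}
OVERALL_AVERAGE = 50  # hypothetical, treated as fixed

for name, (mean, sd, n) in SCHOOLS.items():
    low, high = confidence_interval(mean, sd, n)
    verdict = compare_with_average(mean, sd, n, OVERALL_AVERAGE)
    print(f"School {name}: {mean} ({low:.1f} to {high:.1f}) {verdict}")
```

In this illustration School A sits above the average in the ranking but cannot be distinguished from it, and its interval overlaps that of School B, so the ordering of the two carries little information. Overlap of intervals is only a rough guide and not a formal test of the difference between two institutions.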
In relation to school league tables, Bird et al. [3] noted that they have been abolished in certain jurisdictions, and ‘It appears that dissatisfaction with the lack of contextualisation and the negative “side effects” have been important factors in influencing public and political opinions’ (p. 6). Rankings embody considerable uncertainty, and they may change between reporting periods. Further, rankings from different rating agencies are often highly divergent [61].
Also, ranking by itself conveys nothing about the adequacy of performance in absolute terms. Bird et al. argued that the point that ‘being ranked lowest on this occasion does not immediately equate with genuinely inferior performance should be widely recognized; and reflected in the method of presentation’ [3, pp. 20–21]. This relates to the earlier discussion of relative versus absolute indicators. League tables and other presentations that essentially rank organizations according to their scores on an indicator may be acceptable for judging relative standing, but they can underplay an equally (or more) important question of whether any or all of the organizations meet some standard, which might be either a minimum standard or a standard of excellence. The highest ranked organizations might still deliver poor service, and the lowest ranked might still be adequate, or even good. As Neil et al. put it: ‘Half of the surgeons on any league table will be, by definition, worse than average. For public reporting what matters most is not a ranking, but rather that surgeons are shown to meet acceptable performance standards’ [7]. Their point is equally applicable to mental health services. The recently introduced performance reporting system for schools in Australia (MySchool) has explicitly stated that it will not publish league tables on its Internet site [62].
The form and content of public reports
What to report
Edgeman-Levitan and Cleary concluded from their study that: ‘One format will not suit all: for example some patients wanted a summary whereas others wanted detailed information’ [63]. Similarly, Leatherman and McCarthy [64] point out that ‘Gearing a public report exclusively toward the needs of consumers may maximize its use for empowerment but limit its usefulness in giving cues to providers for self-improvement. On the other hand, the information useful to providers may be confusing to consumers’ (p. 95), and they cite instances of different versions of a report for the public and for providers.
Hibbard and Jewett [65] thought that consumers often do not understand indicators because ‘they have no understanding of the healthcare context within which the indicator operates’, which led them to conclude that ‘importance alone is not a sufficient reason to include indicators in report cards that are designed for consumer use’, while Marshall et al. summarized the evidence in their review as indicating that ‘consumers expressed a desire for a wide range of information on quality but did not necessarily know how to use it and wanted intermediaries to make sense of it on their behalf’ [4, p. 62]. Elsewhere they commented on the requisite quality of the data to be reported thus:
At one extreme are the purists who say that only high quality indicators of outcome should be used, because only these will be credible, and therefore acceptable to clinicians and useful to purchasers. The other extreme is that the quality of the data is not as important as the principle of openness, and that process measures are more useful than outcomes. They felt that the middle ground is that public indicators should be as good as possible but do not have to be perfect, and waiting for the best will retard progress and delay the potential benefits. [4, p. 78].
The case for the more purist position has been articulated by Rothberg et al. who felt that until good data are available, ‘it may be preferable to report nothing at all, rather than report data that are misleading. In the rush to make hospitals accountable, enthusiasm has often outstripped science, and several measures have had to be revised for unintended consequences’ [22].
In terms of mental health more specifically, Oldham et al. [66] said that the first National Healthcare Quality Reports published by the Department of Health and Human Services in the USA in 2003 stated that ‘mental illness is a clinical area without “broadly accepted” and “widely used” measures of quality’ (p.16), and by 2005 there had been no substantial progress in this regard.
How to report
There is some work on the dos and don'ts of how to present performance information. Based on perceptual and cognitive psychology, there are good and bad practices when presenting information in text, numbers and graphs.
Marshall et al. [4, p. 85] thought that readability and brevity were relevant factors. On presentation of data, Bird et al. [3] advised that:
‘There must be a strong focus on the real objectives and the simplest mode of presentation that avoids being misleading.’ and that ‘It is virtually always necessary that some direct or indirect indication of variability is given. Simplicity does not mean discarding measures of uncertainty either in tables or figures. Insistence on single numbers as answers to complex questions is to be resisted.’ (pp. 19–20)
As Hibbard and Peters [67] have observed: ‘The assumption that the provision of relevant information is sufficient to increase informed decision-making is too simplistic’ and it may even be an impediment (p. 414). The cognitive task is formidable, since it includes technical terms and complex ideas, compares multiple options on several variables, and requires the decision-maker to weight differentially the various factors according to individual values, preferences, and needs. As they (maybe rhetorically) ask: ‘How can we inform without overwhelming and bewildering consumers?’ (p. 415). They highlight the need to make information not only understandable, but also personally relevant. In general, vivid presentations engender more engagement with the material, as does ‘tailoring’, which means reducing the information to what most matches the consumer in terms of, for example, age, gender, and ethnic group. In another paper [68] they have suggested that ‘Less is more’ when it comes to designing reports for consumers.
Hibbard et al. [69] started from the idea that consumers would value report information more highly if they understood it better, but that their understanding was limited by unfamiliarity with some of the concepts embodied in the reports. They set out to test whether consumers who received quality information within a framework and translated into plain language would understand the information better, and value it more highly, than consumers who received the same information without the framework or without the translation. They found that plain language with the framework was the best comprehended, and technical language without the framework the least.
Mattke et al. [70] reviewed 18 Internet-based nursing home reporting systems and found limitations and deficiencies with all of them. They proceeded to develop a new and better one, explicitly taking into account the needs and preferences of the target audience. They reported that all stakeholders regarded the final design as an acceptable compromise, and they attributed the success of the venture to (i) an honest and collaborative decision-making and implementation process, and (ii) the tailoring of the content and presentation to the primary target audience.
One of the most thorough treatments of how to present reports for consumers is by Vaiana and McGlynn [71]. Taking as their starting point the frequent finding that consumers do not find performance reports useful, they adopt a cognitive science perspective. Most reports are designed by content experts using their own frame of reference, rather than that of the prospective user of the report, who approaches the exercise from a ‘knowledge-construction’ rather than an ‘information-telling’ standpoint. Most readers have limited attention spans and do not read documents in their entirety, and if they do not find what they are looking for quickly, they tend to give up. Cognitive science provides report designers with much good advice about how to make reports, among other things, more readable and understandable. They advise that (i) instead of having builders of Internet sites present the information they want in the format they choose (the information-telling perspective), users should be able to select the information they want, when they want it, in the format with which they are most comfortable (the knowledge-construction perspective); (ii) formatting and organization should be compatible with what is known about how humans perceive, process, and understand information; and (iii) the capabilities of the Internet should be specifically exploited to present information in usable and flexible ways.
Mental health
Compared to areas such as cardiac surgery, education, and certain chronic physical conditions, there has been little work in the mental health area.
Donnelley [72] reported on the Mental Health Benchmarking Project conducted in Scotland. The project's aims were to compare aspects of performance in the areas of cost, quality, efficiency, and sustainability. On the vexed questions of recording and reporting, they say:
From the work we have done, we conclude our challenge is to develop ‘good enough’ recording and reporting systems in the first instance that may only partially meet the needs of all the stakeholders (Government, Health Boards, staff, service users, general public), whilst developing a clear vision of the final shape of what is needed to support benchmarking and continuous improvement. (p. 5)
Bremer et al. [73] conducted semi-structured interviews with 28 individuals associated with 24 pay-for-performance behavioural health programmes in the USA. They observed that ‘Many programs struggled to obtain accurate and valid data on quality and outcomes of care, and the public reporting of results was not widespread’ (abstract p. 1419).
Jacobs and McDaid [21] have provided an overview of performance measurement in mental health. They describe a lack of consensus on which aspects of performance should be used, but note that certain user-focused domains, such as responsiveness of service delivery and cultural appropriateness, are becoming more prominent.
Oldham et al. [66] reviewed some of the evidence of shortcomings in the (American) mental health area. They refer to Bauer's [74] review that found adherence to guidelines at a level of only 27%, and to McGlynn et al. [75] who found, for example, that only 10.5% of patients with alcohol dependence received recommended care involving five indicators, while 57.7% of patients with depression received recommended treatment involving 14 indicators. Also, the Institute of Medicine [76] found that only five of 21 studies had documented adequate adherence to specific recommendations in clinical practice guidelines for the treatment of various mental and substance use disorders. Oldham et al. go on to say that the feedback of objective measurement has a role to play in reducing undesirable clinical variation. They note that there is increasing pressure for external performance measurement, although ‘the infrastructure needed to measure, analyze, and publicly report data on mental health and substance abuse care remains less well developed than that for general health care’ (p. 12).
Stein et al. [77] conducted focus groups with 41 Medicaid-enrolled mental healthcare consumers and their family members, seeking their views on current uses of provider performance information, examples of which were made available to them, and then asking what they would like to have. The themes that arose were that they wanted to (i) have publicly reported provider information that was easily accessible and updated frequently; (ii) know more about provider services, allowing for more informed choices about care with that provider; (iii) know whether they would be able to use this information in making choices at provider organizations (e.g. could they choose their clinicians and the types of services they might want?); (iv) know whether consumers would receive care in a timely manner; (v) know about provider flexibility and responsiveness in scheduling appointments (i.e. could they make and change appointments at times to suit them?); and (vi) be able to converse directly with a psychiatrist, rather than an intermediary such as a receptionist or a nurse.
Summary
Our review of the literature identified three main reasons for public reporting of service performance data: providing a basis for patients and purchasers to select among competing services, providing motivation for quality improvement, and affording a means of external leverage. It is unclear whether public reporting confers any advantage over confidential reporting. Responses to the prospect or actuality of public reporting vary according to stakeholder group, with managers tending to ambivalence and clinicians to opposition. Patients generally like the idea of performance reporting, but are unaware of such reports, and tend not to use them. Organizations respond to performance reports with increased activity in the areas reported, but there is little evidence of spread to non-reported functions, and it is unclear whether the increased activity converts to improved quality of services delivered. There are numerous technical issues, such as risk adjustment, data quality, and whether performance measures should be relative or absolute, and aggregated or separate. Most expert opinion is against ranking organizations into ‘league tables’, which have proved to be highly contentious, and serve as a focus for opposition to public reporting. There is general consensus that reports should be Internet-based and customizable by the user.
Conclusions and Implications
It is clear from this rather selective overview that the initiative to publicly report the performance of public mental health services is highly complex, not least because of the differing stances and understandings of the main stakeholder groups. Even in specialisms where process and outcome indicators appear to be more straightforward than in mental health, implementers struggle with conceptual and methodological issues; for example, in the area of surgical site infection, it appears that an indicator suitable for public reporting is still being debated [78].
As was pointed out at the beginning of this editorial, the current National Mental Health Plan [1] recognizes that more work needs to be done to develop this indicator; it has been observed elsewhere that public reporting is easy to draft, but challenging to implement [22]. There is consistent advice [12,79] that there should be a high level of active participation of key users, noting that ‘Mere consultation is not sufficient’ [80]. An oft-made criticism of public reporting is the lack of confidence in the data that are being reported and in the risk adjustment applied [81]. Since much of the basic data will be initially collected by clinicians, it is vital that any resultant system is acceptable to and accepted by them, since a lack of credibility is likely to lead to poor quality and quantity of data, and even data distortions. Indeed, the ambivalence or opposition of many in the field to the idea of public reporting could be a serious impediment. While some of the concerns may be exaggerated and ill-informed, concepts such as league tables and ‘naming and shaming’ [5,82] have proved to be highly emotive and aversive. It will be important for the field to understand that public reporting and payment for performance typically go hand in hand, with the former being the natural precursor of the latter [83].
As to what any ultimate reports should actually look like, there is considerable practical advice from cognitive scientists on lessening the mental load of readers of reports. One of their most helpful points is that the task is not how to present the information, but rather how the reader can be assisted to make a decision. This then poses a question more fundamental than content and formatting, namely, as has recently been asked [79]: ‘What do we want to achieve by publishing patient care performance information?’ (p. 398). Responding by saying that such reporting is mandatory, or by reiterating the theoretical rationale, does not clarify the actual utility of the report to the end-user.
As we have seen, such reports tend to be under-utilized by the general public, but responded to, albeit in a highly targeted fashion, by services themselves. Whether the reports are actually doing their job in terms of the initial objectives (informing choice, promoting quality, making services accountable) tends to be a neglected question. It has been noted that the growth in public reporting has not been matched by a growth of research on impact [79]; for example, a literature search for evidence of public reporting of surgeons’ performances influencing patients’ choices failed to find a single article [84]. At the outset we noted that the desired outcome in the National Mental Health Plan is for the public to be able to make informed judgements; what is left unstated is what actions should flow from those judgements. A clear implication is that the implementation of public reporting should be accompanied by a ‘strong’ evaluation (i.e. against explicit criteria) of its direct and indirect effects, and of whether the outlay constitutes value for money. A model for the evaluation of Australian mental health policy initiatives is available [85].
How public reporting of mental health service organizational performance should be implemented in Australia is difficult to discern, although a start, acknowledging the complexities, has been made [86], suggesting a gradual, staged approach. Such an approach is reasonable, but there may be a temptation to do the easier aspects early and leave the more difficult for later. The initiative has the potential to significantly alter the organizational landscape.
Acknowledgements
The author is particularly grateful for the input of Tim Coombs.
