The Public Reporting of Organizational Performance in Mental Health: Coming Soon to a Mental Health Service Near You

Abstract

The Fourth National Mental Health Plan [1] identified five priority areas for outcome and action, one of which is ‘Accountability – measuring and reporting progress’. The desired outcome is that: ‘The public is able to make informed judgements about the extent of mental health reform in Australia, including the progress of the Fourth Plan, and has confidence in the information available to make these judgements. Consumers and carers have access to information about the performance of services responsible for their care across the range of health quality domains and are able to compare these to national benchmarks.’ and the corresponding indicator for this outcome is ‘The proportion of services publicly reporting performance data’ (p. 81).

On the same page it is noted that such reporting is not yet the norm, and there are no relevant data sets available at present. It is recognized that ‘Consideration will need to be given to systematic means of monitoring progress against this indicator.’ Given the relative underdevelopment of this area, and the fact that all mental health jurisdictions are now committed to the implementation of the Plan, it is timely to consider some of the issues concerning the public reporting of health performance data, especially as it relates to mental health.

Our coverage begins with the rationale for public reporting, followed by its effects on different groups of stakeholders. We then examine some of the measurement issues involved, and outline what is known about what to report and how to report it. The currently small literature on public reporting in mental health is reviewed. We conclude with a broad summary of what the literature appears to be telling us, and some of its implications for how work on this priority area should proceed. Most of the relevant literature is from overseas, hence the strongly international flavour.

Rationale

Most rationales for public performance reporting reduce to an instance or variant of the two proposed by Berwick et al. [2]: selection, whereby a ‘user’ of healthcare (e.g. patient, purchaser) can compare what is available and make a value-based choice, and change, whereby providers (e.g. clinicians, local managers) use comparative institutional data to improve the quality of their services. Another potential use of public performance reporting is for regulators and politicians to demonstrate public accountability [3,4], and this is consistent with the wording in the Plan. In relation to selection of services by the general public, it has been suggested that public reporting makes more sense in a competitive, market environment than in a public system where consumers have limited or no capacity to ‘shop around’ [5].

In addition to these reasons, Pidd [6] has put forward the idea of public reporting in public services as a tool of external control. That is, the reported measures and indicators can be used as a basis for a system of incentives and disincentives, thus giving external interests a degree of control over services, such as health and education, that have historically been delivered at the discretion of professional providers. Propper and Wilson also note ‘a general shift in the use of information on performance away from primarily being used for internal management control purposes towards use of these data for external accountability and control’ [5, pp. 254–255].

Most commentators agree that performance data should be collected and used; ‘The question…is not whether these data should be collected, but whether they should be made public’ [7, p. 267]. There is no doubt that non-public feedback to organizations can have a marked beneficial effect. Merle et al. [8] investigated the quality of care in three hospitals, using indicators selected by professionals. After the hospitals shared their information with one another, there were significant improvements in most of the areas reported. Williams et al. [9] tracked the performance of over 3,000 accredited hospitals on 18 standard indicators of quality over a two-year period. All participating hospitals received quarterly feedback in the form of comparative reports throughout the study. The authors found significant improvements on 15 of the 18 measures, and no measure showed a significant deterioration. A few studies have examined whether it makes a difference if comparative performance results are provided confidentially or publicly. Guru et al. [10] compared cardiac surgery mortality rates over three periods: no reporting, confidential reporting, and public reporting. Rates improved significantly between the first two periods but not between the second and the third. Such findings are compatible with the suggestion of Bird et al. that ‘“Naming” is not a pre-requisite for public accountability and may have dis-benefits besides its apparent attractiveness in promoting public choice’ [3, p. 23].

Effects of public performance reporting

It is convenient to consider the effects of public performance reporting through the potential impacts on the different parties that stand to gain (or maybe lose) from it.

Health managers and executives

Some studies have examined the perceptions and reactions of health managers and executives. Davies [11] conducted interviews with senior individuals from six hospitals in the USA. His interviewees were generally antipathetic towards publicly released comparative data, with concerns revolving around the validity of the data and distortions of clinical priorities. By contrast, the executives interviewed by Barr et al. [12] were generally favourable, seeing the practice as supporting quality improvement initiatives. Goldman et al. [13] interviewed executives of Californian safety-net hospitals (SNHs, which serve Medicaid, uninsured or underinsured patients, and underserved areas) about their concerns regarding public reporting and pay-for-performance. Most said they used data gathered in performance reports to improve quality, and affirmed a long-standing commitment to quality improvement. Alongside these views there were significant concerns, which included a lack of resources (staff, training and technology) to capture and process the data, and difficulties in gaining physician buy-in. Several thought that the free-market economic model on which public reporting and pay-for-performance are based is not pertinent to SNHs, which are often effectively the only provider for the poor and underprivileged. Other concerns revolved around hospitals being judged on case-mix models that made no allowance for the unfavourable circumstances that their patients were admitted from or discharged to.

Health provider organizations

Other studies have looked at the effect on health provider organizations. Most of them have found that publicly reported performance measures stimulated activity at the organizational level. Marshall et al. [4] concluded that hospitals respond to the publication of comparative performance data with internal changes, especially in a competitive environment. Fung et al. [14] and Shekelle et al. [15] both summarized work since the Marshall et al. review in 2000, and both found good evidence that public reporting stimulated quality improvement activity. Even more recently, Shekelle [16], commenting on the selection and change pathways (see above) that underpin public reporting, thought that there was little evidence of effects for the former but quite good evidence for the latter. Whether this increased activity translates into improved quality is an open question; Heckman et al. [17] highlighted the distinction between focused activity and productive activity.

Direct health providers

The evidence of effects on direct care providers relates mainly to physicians and surgeons. Marshall et al. [4] summarized evidence to 1999 that physicians were ‘interested in report cards but were sceptical about the validity of current examples and were unwilling to use them in practice, either in terms of sharing the information with patients or using the data to influence their own referral patterns’ (p. 57). Ringel [18], writing on behalf of practising neurologists, raised a number of issues that are equally relevant in mental health. Among his concerns was the common observation that many patients do not do as well as they might on account of poor compliance with treatments known to be effective. He also questioned whether the reliability of the measures used was adequate to assess the often highly idiosyncratic clinical trajectories of patients who may well share a common principal diagnosis.

Casalino et al. [19] conducted a national survey of physicians seeking their views on pay-for-performance and public reporting. Asked whether ‘If accurate, measures of the quality of individual physicians’ performance should be made public’, 68% disagreed (35% strongly), and to the statement ‘If accurate, measures of the quality of individual medical groups’ performance should be made public’, 55% disagreed (29% strongly). Responses to other questions revealed that over 80% felt that present measures of quality did not adequately adjust for medical condition or socio-economic status, that measuring quality would divert physicians’ attention from important types of care for which quality is not measured, and that it might lead physicians to avoid high-risk patients. In a survey of surgeons, Neuman et al. [20] found that while most (80–90%) felt that a national, surgeon-developed, risk-adjusted system of outcome assessment would improve quality of care and identify areas for improvement, fewer than half (45%) thought these data should be available publicly. In the Australian context, Jacobs and McDaid have noted ‘Initially, the majority of clinicians have perceived the Australian government's primary objective for introducing the [clinical outcome] measures to be financial management rather than to ensure the quality of services’ [21, p. 434].

Consumers and the general public

Marshall et al. summarized the evidence to 1999 as showing that ‘the currently available performance data had minimal impact on consumer choice’ [4, p. 64], and more recent work [22,23] suggests that this situation has not changed. Marshall et al. observed that it was unclear whether this lack of effect was due to access, comprehension or motivational factors. Schneider and Lieberman [24] offered some reasons: despite much experimentation with different formats for public reports, most report cards are not useful documents; they tend to overwhelm consumers with too much information and leave it to the reader to figure out what the data mean. Just as importantly, the data typically presented in such reports are often not what consumers regard as indicators of quality. Another study (described in Canto [25]) showed that the public is more likely to rely on recommendations from friends, family members and coworkers, or from health professionals they know, than on standard quality indicators: asked to choose between a surgeon they had seen before but who was not well rated and a surgeon whom they had not seen before, 50% of respondents preferred the former and 38% the latter.

Hibbard [26] has suggested three possible reasons for the repeated finding that consumers show little interest in the information on healthcare quality that is made available to them: (i) consumers are largely unaware of the widespread problems in healthcare quality; surveys have shown that the public believes that the technical quality of care is uniformly high, and that hospitals do not differ much in safety or quality; (ii) the public defines ‘good quality’ very differently from experts and industry leaders; when asked, consumers mostly mention access, cost, choice of doctor, and doctor qualifications; and (iii) understanding most public reports involves processing a large volume of information, weighing some factors more than others, and bringing all the factors together into a conclusion, all of which is cognitively burdensome.

Unintended consequences

Some of the nervousness about public performance reporting appears to relate to the possibility that, as well as the beneficial effects considered earlier, there may be unintended consequences that are actually harmful to service quality. The classic work here is that of Smith [27]; his list appears in Table 1 (the descriptors are taken from Pidd [6]).

Table 1.

Unintended consequences of public reporting of health performance information (based on Smith [27] and Pidd [6])

Tunnel vision	Managers, faced with many different targets, choose the ones that are easiest to measure and ignore the rest
Sub-optimization	Managers choose to operate in ways that serve their own operation well but damage the performance of the overall system
Myopia	Managers focus their efforts on short-term targets at the expense of longer-term objectives
Measure fixation	Focusing on the indicator itself rather than the desired outcome; a natural tendency when outcomes are difficult to measure and indicators are therefore based on measurable outputs
Misrepresentation	A form of fraud, occurring when performance data are either misreported or distorted to create a good impression
Misinterpretation	Interpreting trivial differences as real. Statistical measures being imprecise, there may be no real difference between many of the units sequenced in a (league) table
Gaming	When a canny manager deliberately underachieves in order to secure a lower target in the next round of activity
Ossification	When an indicator is past its sell-by date and has lost its purpose, but no one can be bothered to revise or remove it

While most of the discourse about unintended consequences relates to negative consequences, it is also recognized that some consequences could be unintended yet positive. That is, the stimulation of quality improvement activities, and a generally heightened consciousness of good practice, could lead to improvements in areas not covered by the performance indicators that are being reported. Werner et al. [28] examined the public reporting of nursing home performance on the Centers for Medicare & Medicaid Services Nursing Home Compare website and concluded that it was associated with improved performance on both reported and unreported measures. In a related study, the same research group [29] compared improvement rates on indicators that were and were not publicly reported. After the introduction of public reporting they found improvements in two of three publicly reported indicators, but no improvement, and in some cases deterioration, in another indicator that was not publicly reported. Ganz et al. [30] conducted a more rigorous test of this idea. In a controlled trial, some medical practices received an intervention targeting certain problems in elderly patients, while other practices received a control condition. The point of the study, however, was to see whether there was any improvement in indicators that were not targeted. The intervention appeared to have an impact, in that two of the three targeted indicators improved, but neither intervention nor control practices showed any change in the non-targeted indicators. This result is consistent with most other findings that institutional behaviour often improves on indicators that are targeted for reporting, but further suggests that this improvement may not generalize to areas that are not reported.

Pidd [6] has made the point that Smith's list of unintended consequences is not an argument against performance measurement per se, but rather an indictment of clumsy or thoughtless implementation. He cited ‘Goodhart's Law’, which states that ‘any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes’ [31], a corollary of which is that ‘A risk model breaks down when used for regulatory purposes’ [32].

As well as warnings about comparatively short-term unintended consequences, some have written about possible long-term corrosive effects. Lester and Roland [33] fear that public reporting with an emphasis on control may erode the very medical professionalism that enabled quality improvement initiatives to flourish in the first place, by shifting the balance too far away from trust [34,35].

Measurement issues

Risk adjustment

There is unanimity that data should be presented so that comparisons are fair (comparing ‘like with like’). But this is easier said than done. Risk adjustment is quite technical: tricky decisions must be made about what to adjust for and what not, and how to make the adjustment, and then there is the challenge of presenting adjusted performance results in a way that the average interested person can understand.

Thomas drew attention to the distinction between statistical and attributional validity, the latter being ‘the degree to which changes in outcomes can be attributed to the care being evaluated’, which is the key aspect ‘in the context of using risk-adjusted measures to motivate practice changes or to monitor provider performance’ (p. 182) [36].

Jacobs and McDaid, writing on performance measurement in mental health services, pointed out that controlling for case-mix in mental health is particularly challenging because case-mix classifications are frequently based on diagnosis, which has been shown to be a fairly poor predictor of service use [21]. They also note that while there has been ‘a lot of work on the risk adjustment of outcomes for specific interventions in mental health and some on risk adjustment for the development of payment systems…there has been very little work on the risk adjustment of indicators for the purpose of comparing the performance of multiple providers’ [21, p. 452]. One exception is a study by Dow et al. [37], who risk-adjusted two outcome measures (a global rating of functioning and a consumer satisfaction measure) using data on 7,000 individuals over a three-year period from 24 state-funded providers in Florida. There was significant variation between providers on the two outcome measures, but the risk adjustment had a fairly small impact on their overall rank ordering. Nevertheless, it had a major effect for a few specific providers, particularly those with small caseloads. Sensitivity of results to the adjustment method was also found by Hendryx and Teague [38], who compared different risk adjustment models, also in the field of mental health. They found that clinical data (ratings by clinicians and consumers) had a sizeable additional effect over purely administrative data, and that ‘the particular form of the model will lead to different conclusions about comparative treatment agency performance’ (p. 247).

Li et al. [39] argued that risk adjustment is needed because of the broad array of consumer characteristics that can affect outcomes and that may not be randomly distributed among facilities. They explained how the choice of statistical methodology may affect the quality rankings. In particular, they contended that classical regression may not be appropriate for data in which patients’ characteristics are ‘clustered’ within facilities: a basic assumption of classical regression is that all observations in a dataset are independent, and this is hardly ever the case, since patients within the same facility tend to receive similar levels of care. ‘Ignoring the “clustering” of patients due to “quality” or other factors may invalidate the empirical risk-adjustment model and lead to incorrect quality estimates’ (p. 84). The point about clustering of observations within agencies has been echoed by Shahian and Edwards [40], who advocated data analytic techniques that explicitly account for such clustering.
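One practical consequence of modelling clustering can be illustrated with a simple empirical-Bayes-style shrinkage estimator, which random-effects (multilevel) models implement in a more principled form. The sketch below is not the method of Li et al. or of Shahian and Edwards. It assumes a fixed, hypothetical prior-strength parameter `k` (measured in patients) in place of the between-facility variance a full model would estimate, and the facility names and counts are invented. Each facility's rate is pulled toward the overall rate, and facilities with small caseloads are pulled furthest, the same pattern of small providers being most affected that Dow et al. reported for risk adjustment.

```python
from dataclasses import dataclass


@dataclass
class Facility:
    name: str
    events: int    # patients with the outcome of interest
    patients: int  # facility caseload


def pooled_rate(facilities):
    """Overall event rate across all facilities combined."""
    return sum(f.events for f in facilities) / sum(f.patients for f in facilities)


def shrunken_rate(facility, overall, k):
    """Pull a facility's rate toward the overall rate.

    k is a hypothetical prior strength measured in patients: a facility with
    n patients puts weight n / (n + k) on its own data and k / (n + k) on the
    overall rate, so small facilities are pulled furthest.
    """
    return (facility.events + k * overall) / (facility.patients + k)


# Invented data; facility A has a very small caseload.
facilities = [
    Facility("A", events=10, patients=20),
    Facility("B", events=45, patients=280),
    Facility("C", events=185, patients=500),
]
overall = pooled_rate(facilities)
for f in facilities:
    raw = f.events / f.patients
    shrunk = shrunken_rate(f, overall, k=50)
    print(f"{f.name}: raw {raw:.3f}, shrunken {shrunk:.3f}")
```

With these invented numbers, A has the highest raw rate (0.50) but falls below C once shrinkage is applied, because A's estimate rests on only 20 patients. The choice of `k` drives how large such reversals are, which is one reason the form of the model matters.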

Shahian and Normand [41] noted that, despite the increasingly widespread use of risk-adjusted outcomes, considerable confusion exists among consumers, the media, payers, and providers as to their correct meaning and interpretation; many incorrectly believe that risk adjustment ‘levels the playing field’ to permit direct comparison of one provider with another (pp. 1955–1956). They showed that while it is usually valid to compare the performance of one hospital with the average of its peers, it may not be valid to compare a given pair of hospitals, because their profiles of risk factors (i.e. case-mix of patients) may be quite different. That is, the case-mix-adjusted performance of a hospital is a valid indication of its performance with the kinds of patients that it treats, but between-hospital comparisons on the same outcomes are only valid for the kinds of patients that are treated in both hospitals.
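Comparing a hospital with the average of its peers is commonly done by indirect standardization. Peer-average rates for each risk stratum are applied to the hospital's own case-mix to give an expected number of events, and observed events are divided by expected (the O/E ratio). The sketch below uses two risk strata with invented reference rates and counts, and is not a reproduction of Shahian and Normand's analysis. Both hypothetical hospitals score exactly 1.0, yet X does worse than Y with high-risk patients and better with low-risk ones, so equal O/E ratios say nothing about how the two hospitals would compare on the same patients.

```python
# Hypothetical reference (peer-average) outcome rates by risk stratum.
REFERENCE_RATES = {"low": 0.10, "high": 0.40}


def expected_events(caseload):
    """Events expected if the hospital performed at the reference rate in every stratum."""
    return sum(n * REFERENCE_RATES[stratum] for stratum, n in caseload.items())


def oe_ratio(caseload, observed):
    """Indirectly standardized ratio: total observed events / total expected events."""
    return sum(observed.values()) / expected_events(caseload)


# Invented data: X treats mostly low-risk patients, Y mostly high-risk patients.
hospital_x = {"caseload": {"low": 80, "high": 20}, "observed": {"low": 6, "high": 10}}
hospital_y = {"caseload": {"low": 20, "high": 80}, "observed": {"low": 4, "high": 30}}

for name, h in [("X", hospital_x), ("Y", hospital_y)]:
    stratum_rates = {s: h["observed"][s] / h["caseload"][s] for s in h["caseload"]}
    ratio = oe_ratio(h["caseload"], h["observed"])
    print(f"{name}: O/E = {ratio:.2f}, stratum-specific rates = {stratum_rates}")
```

Each ratio is a fair comparison of that hospital with the reference population for its own mix of patients. Ranking X against Y on the ratio alone would hide the opposite stratum-level differences.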

Shahian et al. [42] drew attention to what it is and is not reasonable to risk-adjust for. They questioned whether one should adjust for such fixed organizational characteristics as size, census division, and teaching status when comparing hospitals, since doing so has ‘the potential to adjust away important performance differences’. Romano [43] delved deeper into this issue. Citing the findings of Austin et al. [44], who found that fewer hospitals were outliers when judged against similar (e.g. non-teaching) hospitals than when judged against all hospitals in their locality, he pointed out that ‘This finding is not surprising; it is intuitively obvious that comparing hospitals with peer institutions will lead to a smaller group of outliers, on average, than comparing the same hospitals to all institutions in a state or province’ (p. 921). On this basis, he recommended that true differences in performance between (e.g. teaching and non-teaching) hospitals should not be obscured by artificially comparing only ‘like with like’. Indeed, it is practically inevitable that there will be fewer outliers when comparing ‘like with like’, because peer groups are necessarily more homogeneous. According to Romano, ‘the only defensible peer group for a public reporting program is geographically defined to include hospitals that compete (or potentially compete) in the same market’ (p. 922). Romano further pointed to the difficulty of deciding what constitutes a relevant peer group of provider organizations: who makes the decision, and on what basis? There is a ‘slippery slope’ involving an ever-narrowing definition of what ‘similar’ means, until ultimately each provider organization can claim, with some justification, to be unique.

Data quality

Most of the work from surgery and education has benefited from quite high-quality data. For example, there is little risk of mortality being misreported, and school exam marks are also quite reliable. In addition, both deaths and school test results have close to 100% reporting. It is not at all clear that change scores on mental health outcome measures, as opposed to process measures (like length of stay, or compliance with standards), are anything like as robust. One study reviewing data quality in the UK, USA and Australia [45] referred to the challenges in developing meaningful sets of national indicators of health system performance, and the general paucity of accurate and accessible clinical data. This is echoed by the view, expressed in Queensland, that administrative hospital data are often inaccurate and incomplete [46].

In what they describe as the largest multi-facility study undertaken to date, Mor et al. [47] studied the reliability of nurses’ ratings of nursing home residents’ health problems. While average reliability levels were good, substantial inter-facility variations were found, disagreements between raters were non-random, and subsequent analyses [48] revealed that directional bias in the data could have resulted in significant differences in the relative rankings of the facilities. Sangl et al. [49] obtained similar results. Goldstein and Spiegelhalter [50] have noted that ‘No amount of fancy statistical footwork will overcome basic inadequacies in either the appropriateness or the integrity of the data collected’, a point echoed by Shahian et al. [51].
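Reliability studies of this kind typically summarize agreement between raters with chance-corrected statistics. As an illustration only, and without implying which statistic the cited studies used, the sketch below computes Cohen's kappa for two hypothetical raters of the same ten residents, then counts the direction of their disagreements. The ratings and rater labels are invented. In this example every disagreement runs the same way: the facility's own staff record a problem that the external assessor does not. This is the kind of directional bias that, aggregated across a facility, could shift its ranking even when overall agreement looks moderate.

```python
from collections import Counter


def cohens_kappa(rater1, rater2):
    """Cohen's kappa: agreement between two raters, corrected for chance agreement."""
    if not rater1 or len(rater1) != len(rater2):
        raise ValueError("ratings must be non-empty and paired case by case")
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    counts1, counts2 = Counter(rater1), Counter(rater2)
    # Chance agreement: probability that both raters independently pick the same category.
    categories = counts1.keys() | counts2.keys()
    expected = sum(counts1[c] * counts2[c] for c in categories) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1 - expected)


def disagreement_direction(rater1, rater2):
    """Count disagreements in each direction for binary ratings (1 = problem present)."""
    only_rater1 = sum(a == 1 and b == 0 for a, b in zip(rater1, rater2))
    only_rater2 = sum(a == 0 and b == 1 for a, b in zip(rater1, rater2))
    return only_rater1, only_rater2


# Invented ratings of ten residents (1 = health problem recorded).
facility_staff = [1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
external_assessor = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
print(round(cohens_kappa(facility_staff, external_assessor), 2))
print(disagreement_direction(facility_staff, external_assessor))
```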

Should indicators be relative or absolute?

Mor [52] observed that ‘establishing minimums as measured by particular quality measures may not be appropriate in all cases, since many areas of performance have no evidence-based standards that could determine a minimum [and] conversely, relying on only empirically based benchmarks (e.g. below the median) may “institutionalize the poor performance of providers operating at the median”’ (p. 341). The point here is that if performance benchmarks are continually adjusted to represent the current situation, it could appear that no overall improvement is occurring. For example, if there is a consistent systemwide 5% annual improvement, a service at the median will remain at the median year on year, conveying the misleading and possibly demoralizing impression that it has not changed. This would be an argument for using absolute anchors, whereby an increasing proportion of services will meet the absolute standard over time, and systemwide improvement can be demonstrated.
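The arithmetic behind this argument can be made concrete with a small simulation. The sketch below assumes five hypothetical services with invented baseline scores, the same 5% annual improvement for every service, and an arbitrary absolute standard of 65. None of these values come from Mor [52]. The median service never moves relative to the median, while the proportion meeting the fixed standard rises steadily.

```python
import statistics


def yearly_snapshots(baseline, years, annual_gain, standard, tracked):
    """Apply the same proportional gain to every service each year.

    Returns one (year, tracked_minus_median, proportion_meeting_standard)
    tuple per year, starting with the baseline year 0.
    """
    scores = dict(baseline)
    snapshots = []
    for year in range(years + 1):
        median = statistics.median(scores.values())
        meeting = sum(score >= standard for score in scores.values()) / len(scores)
        snapshots.append((year, scores[tracked] - median, meeting))
        # Systemwide improvement: every service improves by the same proportion.
        scores = {name: score * (1 + annual_gain) for name, score in scores.items()}
    return snapshots


# Invented baseline scores for five hypothetical services.
baseline = {"A": 50, "B": 55, "C": 60, "D": 65, "E": 70}
for year, gap, meeting in yearly_snapshots(baseline, 6, 0.05, 65, "C"):
    print(f"year {year}: C minus median = {gap:.1f}, meeting standard = {meeting:.0%}")
```

Against a relative (median) benchmark, service C shows a gap of zero every year. Against the absolute anchor, the share of services meeting the standard climbs from 40% to 100% over six years.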

Single or multiple measures

Some attention has been paid to the question of whether a single, composite index of performance should be cited, or whether the various subdomains of performance should be cited separately, with no attempt at aggregation or summarization. Many favour composite indicators, perhaps for their apparent simplicity and presumed robustness [22,53,54]. Others have warned of problematic features of composite measures [55, pp. 264–265]. Bird et al. [3, p. 18] argued that amalgamation into a summary index can seriously distort the performance assessment and pointed out that there are value judgements implicit in weighting the components, which may differ legitimately between stakeholders; as Pidd [6] has asked: who determines these weights and what values should they take?

Fong et al. [56] compared quality indicators from four databases and found little pairwise correlation among them, leading them to recommend assessing performance across multiple measures. O’Brien et al. [57] compared eleven different ways of combining nationally endorsed cardiac surgery process and outcome measures into summary measures of performance. Although the various methods produced results that were highly inter-correlated, sensitivity analyses showed that up to about 10% of organizations displayed dramatic changes in the overall rankings depending on the method used. In a companion piece, the same group [58] noted that the existing method of combining process and outcome measures had not been validated, and that different weightings of process versus outcomes metrics can lead to highly divergent provider rankings.
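The sensitivity of a composite to its weights, raised by Bird et al. and Pidd and observed empirically by O’Brien et al., can be shown in miniature. The sketch below assumes a simple weighted sum of two components, with invented 0–100 scores for three hypothetical providers. It is not one of the eleven methods O’Brien et al. compared. Shifting weight from process to outcome reverses the order of the first and last providers.

```python
def composite(scores, weights):
    """Weighted sum of component scores (weights assumed to sum to 1)."""
    return sum(weight * scores[component] for component, weight in weights.items())


def ranking(providers, weights):
    """Provider names ordered from highest to lowest composite score."""
    return sorted(providers, key=lambda name: composite(providers[name], weights), reverse=True)


# Invented component scores on a 0-100 scale for three hypothetical providers.
providers = {
    "P1": {"process": 90, "outcome": 60},
    "P2": {"process": 70, "outcome": 80},
    "P3": {"process": 80, "outcome": 70},
}
process_heavy = {"process": 0.7, "outcome": 0.3}
outcome_heavy = {"process": 0.3, "outcome": 0.7}

print("process-weighted:", ranking(providers, process_heavy))  # ['P1', 'P3', 'P2']
print("outcome-weighted:", ranking(providers, outcome_heavy))  # ['P2', 'P3', 'P1']
```

Neither weighting is wrong. Each encodes a value judgement about the relative importance of process and outcome, and that judgement may differ legitimately between stakeholders.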

A simple global rating system may be helpful to free agents in choosing a restaurant, but less useful for assisting organizations to improve the quality of their services. Jacobs et al. [59] found considerable instability in hospitals’ positions in league tables based on their annual star ratings published by the NHS in the UK.

League tables

Perhaps the classic work on measurement and interpretation issues with league tables in relation to institutional performance is that of Goldstein and Spiegelhalter [50], whose paper is followed by discussion of the issues by many leading experts in the fields of statistics and education. One theme is the need to take account of ‘model-based uncertainty’, by which they mean the natural uncertainty around a school's or hospital's mean score. Leckie and Goldstein [60, p. 844] demonstrated this by showing the rankings of 266 schools, after adjustment for several covariates, along with their 95% confidence intervals. The inherently imprecise nature of school effects (due to the small numbers of pupils within school cohorts) is clearly apparent. Only 168 (63%) of the schools were significantly different from the overall average, and most schools were not significantly or meaningfully different from many of their neighbours in the rankings. As Goldstein and Spiegelhalter say, ‘with current data, even after adjustment, finely graded comparisons between institutions are impossible’ (p. 397). They also noted (p. 405) that some outputs are influenced by factors that are extrinsic to the organization, and so are not ones for which it might properly be held accountable.
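The effect of showing uncertainty alongside a ranking can be sketched for a binary indicator. Leckie and Goldstein's intervals came from a covariate-adjusted multilevel model. The sketch below substitutes a much simpler device, a Wilson score interval for each unit's unadjusted proportion, and uses invented counts for five hypothetical units. Each unit is flagged only if its interval excludes the pooled rate.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def wilson_interval(successes, n, z=Z95):
    """Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


def league_table(units):
    """Rank units by raw rate; flag those whose interval excludes the pooled rate."""
    pooled = sum(s for s, _ in units.values()) / sum(n for _, n in units.values())
    rows = []
    for name, (s, n) in units.items():
        low, high = wilson_interval(s, n)
        rows.append((name, s / n, low, high, not (low <= pooled <= high)))
    rows.sort(key=lambda row: row[1], reverse=True)  # stable: ties keep input order
    return pooled, rows


# Invented counts: (number achieving the outcome, caseload) for each hypothetical unit.
units = {"U1": (45, 60), "U2": (30, 50), "U3": (120, 200), "U4": (8, 20), "U5": (14, 40)}
pooled, rows = league_table(units)
print(f"pooled rate {pooled:.3f}")
for rank, (name, rate, low, high, differs) in enumerate(rows, start=1):
    flag = "differs from average" if differs else "not distinguishable from average"
    print(f"{rank}. {name}: {rate:.2f} (95% CI {low:.2f}-{high:.2f}) {flag}")
```

In this invented table, only the first and last units differ detectably from the pooled rate. The unit ranked fourth (U4, with only 20 cases) has an interval wide enough to include the average, even though its raw rate is well below it.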

In relation to school league tables, Bird et al. [3] noted that they have been abolished in certain jurisdictions, and ‘It appears that dissatisfaction with the lack of contextualisation and the negative “side effects” have been important factors in influencing public and political opinions’ (p. 6). Rankings embody considerable uncertainty, and they may change between reporting periods. Further, rankings from different rating agencies are often highly divergent [61].

Also, ranking by itself conveys nothing about the adequacy of performance in absolute terms. Bird et al. argued that the fact that ‘being ranked lowest on this occasion does not immediately equate with genuinely inferior performance should be widely recognized; and reflected in the method of presentation’ [3, pp. 20–21]. This relates to the earlier discussion of relative versus absolute indicators. League tables and other presentations that essentially rank organizations according to their scores on an indicator may be acceptable for judging relative standing, but they can underplay an equally (or more) important question: whether any or all of the organizations meet some standard, which might be either a minimum standard or a standard of excellence. The highest-ranked organizations might still deliver poor service, and the lowest-ranked might still be adequate, or even good. As Neil et al. put it: ‘Half of the surgeons on any league table will be, by definition, worse than average. For public reporting what matters most is not a ranking, but rather that surgeons are shown to meet acceptable performance standards’ [7]. Their point is equally applicable to mental health services. The recently introduced performance reporting system for schools in Australia (MySchool) has explicitly stated that it will not publish league tables on its Internet site [62].

The form and content of public reports

What to report

Edgeman-Levitan and Cleary concluded from their study that: ‘One format will not suit all: for example some patients wanted a summary whereas others wanted detailed information’ [63]. Similarly, Leatherman and McCarthy [64] point out that ‘Gearing a public report exclusively toward the needs of consumers may maximize its use for empowerment but limit its usefulness in giving cues to providers for self-improvement. On the other hand, the information useful to providers may be confusing to consumers’ (p. 95), and they cite instances of different versions of a report for the public and for providers.

Hibbard and Jewett [65] thought that consumers often do not understand indicators because ‘they have no understanding of the healthcare context within which the indicator operates’, which led them to conclude that ‘importance alone is not a sufficient reason to include indicators in report cards that are designed for consumer use’. Similarly, Marshall et al. summarized the evidence in their review as indicating that ‘consumers expressed a desire for a wide range of information on quality but did not necessarily know how to use it and wanted intermediaries to make sense of it on their behalf’ [4, p. 62]. Elsewhere they commented on the requisite quality of the data to be reported thus:

At one extreme are the purists who say that only high quality indicators of outcome should be used, because only these will be credible, and therefore acceptable to clinicians and useful to purchasers. The other extreme is that the quality of the data is not as important as the principle of openness, and that process measures are more useful than outcomes. They felt that the middle ground is that public indicators should be as good as possible but do not have to be perfect, and waiting for the best will retard progress and delay the potential benefits. [4, p. 78].

The case for the more purist position has been articulated by Rothberg et al., who argued that until good data are available, ‘it may be preferable to report nothing at all, rather than report data that are misleading. In the rush to make hospitals accountable, enthusiasm has often outstripped science, and several measures have had to be revised for unintended consequences’ [22].

In terms of mental health more specifically, Oldham et al. [66] noted that the first National Healthcare Quality Report, published by the US Department of Health and Human Services in 2003, stated that ‘mental illness is a clinical area without “broadly accepted” and “widely used” measures of quality’ (p. 16), and that by 2005 there had been no substantial progress in this regard.

How to report

There is some work on the dos and don'ts of presenting performance information. Drawing on perceptual and cognitive psychology, it distinguishes good from bad practices in presenting information as text, numbers and graphs.

Marshall et al. [4, p. 85] thought that readability and brevity were relevant factors. On presentation of data, Bird et al. [3] advised that:

‘There must be a strong focus on the real objectives and the simplest mode of presentation that avoids being misleading’, and that ‘It is virtually always necessary that some direct or indirect indication of variability is given. Simplicity does not mean discarding measures of uncertainty either in tables or figures. Insistence on single numbers as answers to complex questions is to be resisted’ (pp. 19–20).
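Bird et al. do not prescribe a particular way of indicating variability. One common choice for a proportion-type indicator is a confidence interval, and the minimal Python sketch below uses the Wilson score interval, a technique chosen here for illustration rather than taken from the cited sources. The two services and their counts are hypothetical. A small service with 18 of 20 episodes meeting a target has a higher point estimate than a larger service with 170 of 200, but its interval is much wider and entirely contains the larger service's interval, so a single number would overstate the difference between them.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion; z = 1.96 gives roughly 95% coverage."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

# Hypothetical indicator: episodes meeting some follow-up target (counts are invented).
services = {"Service X": (18, 20), "Service Y": (170, 200)}

for name, (k, n) in services.items():
    lo, hi = wilson_interval(k, n)
    # Report the point estimate together with its uncertainty, not the single number alone.
    print(f"{name}: {k}/{n} = {k / n:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Other ways of conveying uncertainty (funnel plots, shrinkage estimates) exist; the interval is used here only because it is short to compute and easy to read alongside a point estimate.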

As Hibbard and Peters [67] have observed, ‘The assumption that the provision of relevant information is sufficient to increase informed decision-making is too simplistic’; indeed, providing information may even be an impediment (p. 414). The cognitive task is formidable: it involves technical terms and complex ideas, requires comparing multiple options on several variables, and requires the decision-maker to weight the various factors differentially according to individual values, preferences, and needs. As they ask, perhaps rhetorically: ‘How can we inform without overwhelming and bewildering consumers?’ (p. 415). They highlight the need to make information not only understandable but also personally relevant. In general, vivid presentations engender more engagement with the material, as does ‘tailoring’, that is, reducing the information to what best matches the consumer in terms of, for example, age, gender, and ethnic group. In another paper [68], Peters, Hibbard and colleagues suggested that ‘Less is more’ when it comes to designing reports for consumers.

Reasoning that consumers would value report information more highly if they understood it better, and that their understanding was restricted by unfamiliarity with some of the concepts embodied in the reports, Hibbard et al. [69] tested whether consumers who received quality information within a framework and translated into plain language would understand and value it more than consumers who received the same information without the framework or without the translation. They found that plain language combined with the framework was the best comprehended, and technical language without the framework the least.

Mattke et al. [70] reviewed 18 Internet-based nursing home reporting systems and found limitations and deficiencies in all of them. They then developed a new system, explicitly taking into account the needs and preferences of the target audience. They reported that all stakeholders regarded the final design as an acceptable compromise, and they attributed the success of the venture to (i) an honest and collaborative decision-making and implementation process, and (ii) the tailoring of content and presentation to the primary target audience.

One of the most thorough treatments of how to present reports for consumers is by Vaiana and McGlynn [71]. Taking as their starting point the frequent finding that consumers do not find performance reports useful, they adopt a cognitive science perspective. Most reports are designed by content experts using their own frame of reference rather than that of the prospective user, who approaches the report from a ‘knowledge-construction’ rather than an ‘information-telling’ standpoint. Most readers have limited attention spans and do not read documents in their entirety; if they do not quickly find what they are looking for, they tend to give up. Cognitive science offers report designers much useful advice on, among other things, making reports more readable and understandable. Vaiana and McGlynn advise that (i) instead of having builders of Internet sites present the information they want in the format they choose (the information-telling perspective), users should be able to select the information they want, when they want it, in the format with which they are most comfortable (the knowledge-construction perspective); (ii) formatting and organization should be compatible with what is known about how humans perceive, process, and understand information; and (iii) the capabilities of the Internet should be specifically exploited to present information in usable and flexible ways.

Mental health

Compared to areas such as cardiac surgery, education, and certain chronic physical conditions, there has been little work in the mental health area.

Donnelley [72] reported on the Mental Health Benchmarking Project conducted in Scotland, which aimed to compare aspects of performance in the areas of cost, quality, efficiency, and sustainability. On the vexed questions of recording and reporting, the report states:

From the work we have done, we conclude our challenge is to develop ‘good enough’ recording and reporting systems in the first instance that may only partially meet the needs of all the stakeholders (Government, Health Boards, staff, service users, general public), whilst developing a clear vision of the final shape of what is needed to support benchmarking and continuous improvement. (p.5)

Bremer et al. [73] conducted semi-structured interviews with 28 individuals associated with 24 pay-for-performance behavioural health programmes in the USA. They observed that ‘Many programs struggled to obtain accurate and valid data on quality and outcomes of care, and the public reporting of results was not widespread’ (abstract p. 1419).

Jacobs and McDaid [21] have provided an overview of performance measurement in mental health. They describe a lack of consensus on which aspects of performance should be measured, but note that certain user-focused domains, such as responsiveness of service delivery and cultural appropriateness, are becoming more prominent.

Oldham et al. [66] reviewed some of the evidence of shortcomings in the (American) mental health area. They refer to Bauer's [74] review, which found adherence to guidelines at a level of only 27%, and to McGlynn et al. [75], who found, for example, that only 10.5% of patients with alcohol dependence received recommended care involving five indicators, while 57.7% of patients with depression received recommended treatment involving 14 indicators. Also, the Institute of Medicine [76] found that only five of 21 studies had documented adequate adherence to specific recommendations in clinical practice guidelines for the treatment of various mental and substance use disorders. Oldham et al. go on to argue that feedback of objective measurements has a role to play in reducing undesirable clinical variation. They note that there is increasing pressure for external performance measurement, although ‘the infrastructure needed to measure, analyze, and publicly report data on mental health and substance abuse care remains less well developed than that for general health care’ (p. 12).

Stein et al. [77] conducted focus groups with 41 Medicaid-enrolled mental healthcare consumers and their family members, seeking their views on current uses of provider performance information (examples of which were made available to them) and then asking what information they would like to have. The themes that arose were that participants wanted to (i) have publicly reported provider information that was easily accessible and updated frequently; (ii) know more about provider services, allowing more informed choices about care with that provider; (iii) know whether they would be able to use this information in making choices at provider organizations, for example in choosing their clinicians and the types of services they might want; (iv) know whether consumers would receive care in a timely manner; (v) know about provider flexibility and responsiveness in scheduling appointments, such as whether they could make and change appointments at times to suit them; and (vi) be able to converse directly with a psychiatrist, rather than an intermediary such as a receptionist or a nurse.

Summary

Our review of the literature identified three main reasons for public reporting of service performance data: providing a basis for patients and purchasers to select among competing services, providing motivation for quality improvement, and affording a means of external leverage. It is unclear whether public reporting confers any advantage over confidential reporting. Responses to the prospect or actuality of public reporting vary according to stakeholder group, with managers tending to ambivalence and clinicians to opposition. Patients generally like the idea of performance reporting, but are unaware of such reports, and tend not to use them. Organizations respond to performance reports with increased activity in the areas reported, but there is little evidence of spread to non-reported functions, and it is unclear whether the increased activity converts to improved quality of services delivered. There are numerous technical issues, such as risk adjustment, data quality, and whether performance measures should be relative or absolute, and aggregated or reported separately. Most expert opinion is against ranking organizations into ‘league tables’, which have proved to be highly contentious and serve as a focus for opposition to public reporting. There is general consensus that reports should be Internet-based and customizable by the user.

Conclusions and Implications

It is clear from this rather selective overview that the initiative to publicly report the performance of public mental health services is highly complex, not least because of the differing stances and understandings of the main stakeholder groups. Even in specialisms where process and outcome indicators appear to be more straightforward than in mental health, implementers struggle with conceptual and methodological issues; for example, in the area of surgical site infection, it appears that an indicator suitable for public reporting is still being debated [78].

As was pointed out at the beginning of this editorial, the current National Mental Health Plan [1] recognizes that more work needs to be done to develop this indicator; it has been observed elsewhere that public reporting is easy to draft but challenging to implement [22]. There is consistent advice [12,79] that key users should participate actively and extensively, and it has been noted that ‘Mere consultation is not sufficient’ [80]. An oft-made criticism of public reporting is the lack of confidence in the data being reported and in the risk adjustment applied [81]. Because much of the basic data will initially be collected by clinicians, it is vital that any resulting system be acceptable to, and accepted by, them: a lack of credibility is likely to lead to poor quality and quantity of data, and even to data distortion. Indeed, the ambivalence or opposition of many in the field to the idea of public reporting could be a serious impediment. While some of the concerns may be exaggerated and ill-informed, concepts such as league tables and ‘naming and shaming’ [5,82] have proved to be highly emotive and aversive. It will be important for the field to understand that public reporting and payment for performance typically go hand in hand, with the former being the natural precursor of the latter [83].

As to what any eventual reports should actually look like, cognitive scientists offer considerable practical advice on lessening the mental load on readers. One of their most helpful points is that the task is not how to present the information, but how to assist the reader to make a decision. This poses a question more fundamental than content and formatting, namely, as has recently been asked [79]: ‘What do we want to achieve by publishing patient care performance information?’ (p. 398). Responding that such reporting is mandatory, or reiterating the theoretical rationale, does not clarify the actual utility of the report to the end-user.

As we have seen, such reports tend to be under-utilized by the general public, but are responded to, albeit in a highly targeted fashion, by services themselves. Whether the reports actually achieve their initial objectives (informing choice, promoting quality, making services accountable) tends to be a neglected question. It has been noted that the growth in public reporting has not been matched by growth in research on its impact [79]; for example, a literature search for evidence that public reporting of surgeons’ performance influences patients’ choices failed to find a single article [84]. At the outset we noted that the desired outcome in the National Mental Health Plan is for the public to be able to make informed judgements; what is left unstated is what actions should flow from those judgements. A clear implication is that the implementation of public reporting should be accompanied by a ‘strong’ evaluation (i.e. against explicit criteria) of its direct and indirect effects, and of whether the outlay constitutes value for money. A model for the evaluation of Australian mental health policy initiatives is provided in [85].

How public reporting of mental health service organizational performance should be implemented in Australia is difficult to discern, although a start has been made [86] that acknowledges the complexities and suggests a gradual, staged approach. This seems sensible, but there may be a temptation to do the easier aspects early and leave the more difficult ones for later. The initiative has the potential to alter the organizational landscape significantly.

Acknowledgements

The author is particularly grateful for the input of Tim Coombs.

Declaration of interest: This work is based in large part on a consultancy with the Australian Mental Health Outcomes and Classification Network. The author alone is responsible for the content and writing of the paper.

References

1. Commonwealth of Australia. Fourth National Mental Health Plan: an agenda for collaborative government action in mental health 2009–2014. Canberra: Commonwealth of Australia, 2009.

2. Berwick, James, Coye. Connections between quality measurement and improvement. Med Care 2003; 41:130–138.

3. Bird, Cox, Farewell, Goldstein, Holt, Smith. Royal Statistical Society Working Party on Performance Monitoring in the Public Services. Performance indicators: good, bad, and ugly. London: Royal Statistical Society, 2003.

4. Marshall, Shekelle, Brook, Leatherman. Dying to know: public release of information about quality of health care. London: Nuffield Trust, 2000.

5. Propper, Wilson. The use and usefulness of performance measures in the public sector. Oxf Rev Econ Policy 2003; 19:250–267.

6. Pidd. Perversity in public service performance measurement. Int J Product Perf Manag 2005; 54:482–493.

7. Neil, Clarke, Oakley. Public reporting of individual surgeon performance information: United Kingdom developments and Australian issues. Med J Aust 2004; 181:266–268.

8. Merle, Moret, Pidhorz. Does comparison of performance lead to better care? A pilot observational study in patients admitted for hip fracture in three French public hospitals. Int J Qual Health Care 2009; 21:321–329.

9. Williams, Schmaltz, Morton, Koss, Loeb. Quality of care in US hospitals as reflected by standardized measures, 2002–2004. N Engl J Med 2005; 353:255–264.

10. Guru, Fremes, Naylor. Public versus private institutional performance reporting: what is mandatory for quality improvement? Am Heart J 2006; 152:573–578.

11. Davies HTO. Public release of performance data and quality improvement: internal responses to external data by US health care providers. Qual Health Care 2001; 10:104–110.

12. Barr, Giannotti, Sofaer, Duquette, Waters, Petrillo. Using public reports of patient satisfaction for hospital quality improvement. Health Serv Res 2006; 41:663–682.

13. Goldman, Henderson, Dohan, Talavera, Dudley. Public reporting and pay-for-performance: safety-net hospital executives’ concerns and policy suggestions. Inquiry: Excellus Health Plan 2007; 44:137–145.

14. Fung, Lim Y-W, Mattke, Damberg, Shekelle. Systematic review: the evidence that publishing patient care performance data improves quality of care. Ann Intern Med 2008; 148:111–123.

15. Shekelle, Lim Y-W, Mattke, Damberg. Does public release of performance results improve quality of care? A systematic review. London: The Health Foundation, 2008.

16. Shekelle. Public performance reporting on quality information. In: Smith, Mossialos, Papanicolas, editors. Performance measurement for health system improvement: experiences, challenges and prospects. Cambridge: Cambridge University Press, 2010.

17. Heckman, Heinrich, Smith. The performance of performance standards. J Hum Resour 2002; 37:778–811.

18. Ringel. Grading neurologists: is it wise? Neurology Today 2005; 5:7–8.

19. Casalino, Alexander, Jin, Konetzka. General internists’ views on pay-for-performance and public reporting of quality scores: a national survey. Health Aff (Millwood) 2007; 26:492–499.

20. Neuman, Michelassi, Turner, Bass. Surrounded by quality metrics: what do surgeons think of ACS-NSQIP? Surgery 2009; 145:27–33.

21. Jacobs, McDaid. Performance measurement in mental health services. In: Smith, Mossialos, Papanicolas, editors. Performance measurement for health system improvement: experiences, challenges and prospects. Cambridge: Cambridge University Press, 2010.

22. Rothberg, Benjamin, Lindenauer. Public reporting of hospital quality: recommendations to benefit patients and hospitals. J Hosp Med 2009; 4:541–545.

23. Utzon, Kaergaard. Publication of healthcare quality data to citizens: status and perspectives. Ugeskr Laeger 2009; 171:1670–1674.

24. Schneider, Lieberman. Publicly disclosed information about the quality of health care: response of the US public. Qual Health Care 2001; 10:96–103.

25. Canto. Selecting the ideal cardiovascular surgeon: is it possible with public dissemination of clinical outcomes? Editorial. Med Care 2007; 45:585–586.

26. Hibbard. What can we say about the impact of public reporting? Inconsistent execution yields variable results. Ann Intern Med 2008; 148:160–161.

27. Smith. On the unintended consequences of publishing performance data in the public sector. Int J Public Admin 1995; 18:277–310.

28. Werner, Konetzka, Kruse. Impact of public reporting on unreported quality of care. Health Serv Res 2009; 44:379–398.

29. Werner, Konetzka, Stuart, Norton, Polsky, Park. Impact of public reporting on quality of postacute care. Health Serv Res 2009; 44:1169–1187.

30. Ganz, Wenger, Roth. The effect of a quality improvement initiative on the quality of other aspects of health care: the law of unintended consequences? Med Care 2007; 45:8–18.

31. Goodhart CAE. Monetary relationships: a view from Threadneedle Street. Papers in Monetary Economics. Canberra: Reserve Bank of Australia, 1975.

32. Danielsson. The emperor has no clothes: limits to risk modelling. Journal of Banking and Finance 2002; 26:1273–1296.

33. Lester, Roland. Performance measurement in primary care. In: Smith, Mossialos, Papanicolas, Leatherman, editors. Performance measurement for health system improvement. Cambridge: Cambridge University Press, 2010.

34. Checkland, Marshall, Harrison. Re-thinking accountability: trust versus confidence in medical practice. Qual Saf Health Care 2004; 13:130–135.

35. O'Neill. Trust with accountability? J Health Serv Res Policy 2003; 8:3–4.

36. Thomas. Risk adjustment for measuring health care outcomes, 3rd edition. Book review. Int J Qual Health Care 2004; 16:181–182.

37. Dow, Boaz, Thornton. Risk adjustment of Florida mental health outcomes data: concepts, methods, and results. J Behav Health Serv Res 2001; 28:258–272.

38. Hendryx, Teague. Comparing alternative risk-adjustment models. J Behav Health Serv Res 2001; 28:247.

39. Li, Cai, Glance, Spector, Mukamel. National release of the nursing home quality report cards: implications of statistical methodology for risk adjustment. Health Serv Res 2009; 44:79–102.

40. Shahian, Edwards. Statistical risk modeling and outcomes analysis. Ann Thorac Surg 2008; 86:1717–1720.

41. Shahian, Normand S-LT. Comparison of ‘risk-adjusted’ hospital outcomes. Circulation 2008; 117:1955–1963.

42. Shahian, Hutter, Torchiana, Iezzoni. Transparency: a mandatory requirement for risk models. J Am Coll Surg 2008; 206:1240–1242.

43. Romano. Peer group benchmarks are not appropriate for health care quality report cards. Am Heart J 2004; 148:921–923.

44. Austin, Alter, Anderson. The impact of the choice of benchmark on the conclusions of hospital report cards. Am Heart J 2004; 148:1041–1046.

45. McLoughlin, Leatherman, Fletcher, Owen. Improving performance using indicators. Recent experiences in the United States, the United Kingdom, and Australia. Int J Qual Health Care 2001; 13:455–462.

46. Scott, Ward. Public reporting of hospital outcomes based on administrative data: risks and opportunities. Med J Aust 2006; 184:571–575.

47. Mor, Angelelli, Jones, Roy, Moore, Morris. Inter-rater reliability of nursing home quality indicators in the US. BMC Health Serv Res 2003; 3:20.

48. Roy, Mor. The effect of provider-level ascertainment bias on profiling nursing homes. Stat Med 2005; 24:3609–3629.

49. Sangl, Saliba, Gifford, Hittle. Challenges in measuring nursing home and home health quality: lessons from the first national healthcare quality report. Med Care 2005; 43:124–132.

50. Goldstein, Spiegelhalter. League tables and their limitations: statistical issues in comparisons of institutional performance. J R Stat Soc A 1996; 159:385–443.

51. Shahian, Torchiana, Shemin, Rawn, Normand S-LT. Massachusetts cardiac surgery report card: implications of statistical methodology. Ann Thorac Surg 2005; 80:2106–2113.

52. Mor. Improving the quality of long-term care with better information. Milbank Q 2005; 83:333–364.

53. Staiger, Dimick, Baser, Fan, Birkmeyer. Empirically derived composite measures of surgical performance. Med Care 2009; 47:226–233.

54. Willis, Stoelwinder, Lecky. Applying composite performance measures to trauma care. J Trauma Injury Infect Crit Care 2010; 69:256–262.

55. Smith, Mossialos, Papanicolas. Performance measurement for health system improvement: experiences, challenges and prospects. Cambridge: Cambridge University Press, 2010.

56. Fong, Marsh, Stokan, Sang, Vinson, Ruhl. Hospital quality performance report: an application of composite scoring. Am J Med Qual 2008; 23:287–295.

57. O'Brien, Shahian, DeLong. Quality measurement in adult cardiac surgery: part 2. Statistical considerations in composite measure scoring and provider rating. Ann Thorac Surg 2007; 83:S13–26.

58. O'Brien, DeLong, Dokholyan, Edwards, Peterson. Exploring the behavior of hospital composite performance measures: an example from coronary artery bypass surgery. Circulation 2007; 116:2969–2975.

59. Jacobs, Goddard, Smith. Are composite measures a robust reflection of performance in the public sector? Centre for Health Economics Research Paper 16. York: CHE, 2006.

60. Leckie, Goldstein. The limitations of using school league tables to inform school choice. J R Stat Soc A 2009; 172:835–851.

61. Rothberg, Morsi, Benjamin, Pekow, Lindenauer. Choosing the best hospital: the limitations of public quality reporting. Health Aff (Millwood) 2008; 27:1680–1687.

62. Australian Curriculum, Assessment and Reporting Authority (ACARA). MySchool. Frequently Asked Questions about MySchool [accessed 17 March 2011]. Available from URL: http://www.acara.edu.au/verve/_resources/FAQs.pdf.

63. Edgeman-Levitan, Cleary. What information do consumers want and need? Health Aff (Millwood) 1996; 15:42–56.

64. Leatherman, McCarthy. Public disclosure of health care performance reports: experience, evidence and issues for policy. Int J Qual Health Care 1999; 11:93–101.

65. Hibbard, Jewitt. Will quality report cards help consumers? Health Aff (Millwood) 1997; 16:218–228.

66. Oldham, Golden, Rosof. Quality improvement in psychiatry: why measures matter. J Psychiatr Practice 2007; 14:S8–17.

67. Hibbard, Peters. Supporting informed consumer health care decisions: data presentation approaches that facilitate the use of information in choice. Annu Rev Public Health 2003; 24:413–433.

68. Peters, Dieckmann, Dixon, Hibbard, Mertz. Less is more in presenting quality information to consumers. Med Care Res Rev 2007; 64:169–190.

69. Hibbard, Greene, Daniel. What is quality anyway? Performance reports that clearly communicate to consumers the meaning of quality of care. Med Care Res Rev 2010.

70. Mattke, Reilly, Martinez-Vidal, McLean, Gifford. Reporting quality of nursing home care to consumers: the Maryland experience. Int J Qual Health Care 2003; 15:169–177.

71. Vaiana, McGlynn. What cognitive science tells us about the design of reports for consumers. Med Care Res Rev 2002; 59:3–35.

72. Donnelley. Mental Health Project Final Report: National Benchmarking Project Report 2. Edinburgh: The Scottish Government, 2008.

73. Bremer, Scholle, Keyser, Houtsinger, Pincus. Pay for performance in behavioral health. Psychiatr Serv 2008; 59:1419–1429.

74. Bauer. A review of quantitative studies of adherence to mental health clinical practice guidelines. Harv Rev Psychiatry 2002; 10:138–153.

75. McGlynn, Asch, Adams. The quality of health care delivered to adults in the United States. N Engl J Med 2003; 348:2635–2645.

76. Institute of Medicine. Improving the quality of health care for mental and substance-use conditions. Washington, DC, 2005. [cited 17 March 2011]. Available from URL: www.nap.edu/catalog.php?record_id=11470#toc or www.iom.edu/?id=30858.

77. Stein, Kogan, Essock, Fudurich. Views of mental health care consumers on public reporting of information on provider performance. Psychiatr Serv 2009; 60:689–692.

78. Astagneau, L'Heriteau. Surveillance of surgical-site infections: impact on quality of care and reporting dilemmas. Curr Opin Infect Dis 2010; 23:306–310.

79. Fletcher. Hand hygiene and infection in hospitals: what do the public know; what should the public know? J Hosp Infect 2009; 73:397–399.

80. Kravchuk, Schack. Designing effective performance-measurement systems under the Government Performance and Results Act of 1993. Public Adm Rev 1996; 56:348–358.

81. Guru, Naylor, Fremes, Teoh. Publicly reported provider outcomes: the concerns of cardiac surgeons in a single-payer system. Can J Cardiol 2009; 25:33–38.

82. Duckett, Collins, Kamp, Walker. An improvement focus in public reporting: the Queensland approach. Med J Aust 2008; 189:616–617.

83. Ferguson. Reporting for provider performance: should we punish the bad, or try to make them all good? Am Heart J 2006; 152:410–413.

84. Henderson, Henderson. Provision of a surgeon's performance data for people considering elective surgery. Cochrane Database of Systematic Reviews 2010; 11:CD006327.

85. Bassilios, Pirkis, Fletcher, Burgess, Gurrin, King, Kohn, Blashki. The complementarity of two major Australian primary mental health care initiatives. Aust N Z J Psychiatry 2010; 44:997–1004.

86. Australian Mental Health Outcomes and Classification Network. Public reporting of organisational performance: a review of the literature, 2010.