Abstract
Objective
Since the Guangxi government implemented public county hospital reform in 2009, there have been no studies of county hospitals in this underdeveloped area of China. This study aimed to establish an evaluation indicator system for Guangxi county hospitals and to generate recommendations for hospital development and policymaking.
Methods
A performance evaluation indicator system was developed based on balanced scorecard theory. Opinions were elicited from 25 experts from administrative units, universities and hospitals, and the Delphi method was used to modify the performance indicators. The indicator system and the Topsis method were then used to evaluate the performance of five county hospitals randomly selected from the same batch of 2015 Guangxi reform pilots.
Results
There were 4 first-level indicators, 9 second-level indicators and 36 third-level indicators in the final performance evaluation indicator system that showed good consistency, validity and reliability. The performance rank of the hospitals was B > E > A > C > D.
Conclusions
The performance evaluation indicator system established using the balanced scorecard is practical and scientific. Analysis of the results based on this indicator system identified several factors affecting hospital performance, such as resource utilisation efficiency, medical service price, personnel structure and doctor–patient relationships.
Keywords
Introduction
In China, county-level public hospitals are the core providers of medical and health services in each county and form the top level of care in the rural three-tier healthcare network. In addition, these institutions connect the medical and health systems of urban and rural areas. Public county hospitals are used for the treatment of common diseases, rehabilitation from serious diseases and the referral of difficult diseases. Public county hospitals also oversee training and guidance for grassroots medical institutions and the management of natural disasters and public health emergencies. In 2009, the Chinese State Council approved the Opinions of the CPC Central Committee and the State Council on Deepening the Health Care System Reform 1 and the Implementation Plan for the Recent Priorities of the Health Care System Reform (2009–2011). 2 These contained five main tasks, one of which was to promote public hospital reform. Furthermore, county-level public hospital reform is a key component of public hospital reform, as it facilitates access to lower-cost medical services. In 2012, the General Office of the State Council issued the Opinions of Pilot Projects for County-level Public Hospital Reform, 3 which focused on county-level public hospitals and prioritised their development.
Based on central reform guidelines and the local context, Guangxi Province implemented two batches of county-level public hospital reform pilots in 2012 and 2013, which involved 115 hospitals in 40 counties. At the end of 2015, the remaining 103 county-level public hospitals in 36 counties were reformed; thus, the pilots achieved full coverage and substantial advances were made toward the principle of ‘ensure a foundation, strengthen the grassroots, construct the mechanism’. To further improve reform and identify problems affecting this process, we need to evaluate the performance of county-level public hospitals. Since the implementation of county-level public hospital reform in 2012, research in different Chinese provinces has focused on how to establish a set of scientific and effective indicator systems to evaluate county-level public hospital performance. As Guangxi is an underdeveloped region that is home to the Zhuang ethnic minority group, it differs from other provinces in terms of its social customs. Therefore, a matched performance evaluation system for Guangxi county hospitals that closely reflects the social and cultural context is needed.
The balanced scorecard (BSC), introduced by Kaplan and Norton in 1992, is a popular performance management system that categorises organisational goals into four measurable and operable perspectives: Learning and Growth, Financial, Customer and Internal Business Process. 4 The BSC has been successfully used worldwide in many institutions, such as government units, manufacturing companies, service organisations and non-profit companies. 5–8 For example, researchers at Duke Children’s Hospital in the USA worked with managers using the BSC. After 3 years’ implementation of the system, they had turned the hospital’s deficit into a profit, reduced costs and increased patient satisfaction. 9 As early as 1994, representatives of some Alberta and Ontario hospitals, the University of Toronto and government and policy groups explored the application of the BSC in hospital performance measurement in Canada. 10 The BSC system has also been used in Europe. In the UK, the BSC has been successfully used for key government projects; both the Olympic Delivery Authority and the High Speed 2 railway project have used the BSC to summarise their procurement policies. In addition, the UK Department of Health has used the BSC to evaluate the performance of the National Health Service’s information technology strategy. 11 In Switzerland, Bern University Hospital has designed a BSC system for the department of anesthesiology 12 and in 2002, the Netherlands launched a campaign to establish performance evaluation indicators for the national health system. 13 In 2000, the BSC began to be used in healthcare in China and has since generated a wide range of research and applications.
Methods
Data source
To establish a performance evaluation indicator system for county hospitals, we consulted healthcare professionals and reviewed research on hospital performance evaluation in China and other countries. We generated an indicator framework based on the BSC. Then, we used the Delphi method to modify and improve the framework and produce a final indicator system. We used the indicator system in a case study of five county hospitals randomly selected from the third-batch county hospital reform pilots in Guangxi. To evaluate the hospitals, we used data from questionnaires distributed by the Guangxi Zhuang Autonomous Region Health and Family Planning Commission. The questionnaires were completed by medical staff in the relevant departments and collected by each hospital liaison. Trained investigators obtained patient satisfaction data using one-to-one questionnaire interviews at each hospital. The Topsis method was used with the indicator system to evaluate these hospitals’ performance. Microsoft Office Excel 2007 (Microsoft, Redmond, WA, USA) and IBM SPSS Statistics, version 19 (IBM Corp., Armonk, NY, USA) were used for all calculations. The study protocol was approved by the Medical Ethics Committee of Guangxi Medical University. All staff and patients provided verbal informed consent before the study began.
Establishment of the performance evaluation indicator system
The indicator system framework we constructed was based on BSC theory. The framework was generated by consulting healthcare professionals and reviewing research on hospital performance evaluation from China and other countries. Figure 1 shows the performance evaluation indicator framework used in this study. The Delphi method was used to filter the indicators and grade their importance. The relative weight of each indicator was determined using the analytic hierarchy process method. Finally, the reliability and validity of the indicator system were tested.

Figure 1. Evaluation indicator framework based on the balanced scorecard.
The Delphi method
The importance of each index was categorised according to five levels: very important, important, normal, unimportant and very unimportant. We selected 25 experts from administrative units, universities and hospitals, choosing individuals with a good knowledge of county hospital reform. We administered self-designed questionnaires to the experts, who provided suggestions for modifying the indicator framework and graded the importance of the indicators. This feedback was used to revise the indicator system. Table 1 shows basic demographic information about the experts who participated in the Delphi process.
Table 1. Basic information of experts who participated in the Delphi process
Reliability of the expert suggestions
We used Cr to test the reliability of the expert suggestions (Cr = the average of Ck, Ca and Cs; Ck = the knowledge level of experts, Ca = the experts’ judgement basis and Cs = the experts’ familiarity with each indicator). Larger values of Cr indicated greater expert reliability. Values of Cr >0.7 indicated good reliability of expert suggestions. 14 Different criteria were used to assign Ck, Ca and Cs values. Ck values were based on each expert’s professional title: senior titles were scored as 1.0, vice-senior titles as 0.9 and intermediate titles as 0.7. Ca values were based on types of judgement basis: theoretical analysis was scored as 0.8, practical experience as 0.6, knowledge from peers as 0.4 and intuition as 0.2. Cs values were based on expert familiarity with each indicator: very familiar was scored as 1.0, familiar as 0.75, generally familiar as 0.50, unfamiliar as 0.25 and very unfamiliar as 0.00.
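The scoring rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `expert_cr` and `panel_cr` and the three-expert panel are hypothetical and are not taken from the study; only the score tables come from the text.

```python
# Score tables as described in the text above.
CK_BY_TITLE = {"senior": 1.0, "vice-senior": 0.9, "intermediate": 0.7}
CA_BY_BASIS = {
    "theoretical analysis": 0.8,
    "practical experience": 0.6,
    "knowledge from peers": 0.4,
    "intuition": 0.2,
}
CS_BY_FAMILIARITY = {
    "very familiar": 1.0,
    "familiar": 0.75,
    "generally familiar": 0.50,
    "unfamiliar": 0.25,
    "very unfamiliar": 0.00,
}


def expert_cr(title, basis, familiarity):
    """Cr for one expert: the average of Ck, Ca and Cs."""
    ck = CK_BY_TITLE[title]
    ca = CA_BY_BASIS[basis]
    cs = CS_BY_FAMILIARITY[familiarity]
    return (ck + ca + cs) / 3


def panel_cr(experts):
    """Average Cr over a panel of (title, basis, familiarity) tuples."""
    return sum(expert_cr(*e) for e in experts) / len(experts)


# Hypothetical panel, for illustration only (not the study's experts).
experts = [
    ("senior", "theoretical analysis", "very familiar"),
    ("vice-senior", "practical experience", "familiar"),
    ("intermediate", "practical experience", "generally familiar"),
]
print(round(panel_cr(experts), 3))  # compare against the 0.7 threshold
```

With this hypothetical panel, the average Cr is just above the 0.7 threshold, even though the third expert alone would fall below it.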
Concordance of the expert suggestions
Once a consensus of expert opinion is reached, the Delphi process should be concluded. To test the concordance of the expert suggestions, we calculated Kendall’s coefficient of concordance (W) using Equations (1) and (2), where m represents the number of experts, n represents the number of indicators graded by the experts and Ri represents the sum of the ranks assigned to the ith indicator.
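Equations (1) and (2) are not reproduced in this text. As a hedged illustration, the sketch below uses the standard textbook form of Kendall’s W without a correction for tied ranks, W = 12S / (m²(n³ − n)), where S is the sum of squared deviations of the rank sums Ri from their mean, together with the usual large-sample test statistic χ² = m(n − 1)W. Whether the study applied a tie correction is not stated, so this is an assumption; the function name `kendalls_w` and the example ranks are hypothetical.

```python
def kendalls_w(ranks):
    """Kendall's W without tie correction.

    ranks[e][i] is the rank that expert e assigned to indicator i.
    Returns (W, chi-square statistic with n - 1 degrees of freedom).
    """
    m = len(ranks)          # number of experts
    n = len(ranks[0])       # number of indicators
    # R_i: sum of the ranks assigned to indicator i by all experts
    rank_sums = [sum(row[i] for row in ranks) for i in range(n)]
    mean_sum = sum(rank_sums) / n
    # S: sum of squared deviations of the rank sums from their mean
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    w = 12 * s / (m ** 2 * (n ** 3 - n))
    chi2 = m * (n - 1) * w
    return w, chi2


# Hypothetical example: three experts who rank four indicators identically.
w, chi2 = kendalls_w([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])
print(w, chi2)
```

W ranges from 0 (no agreement) to 1 (complete agreement); identical rankings, as in the example, give W = 1.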
The analytic hierarchy process method
We then transformed the importance scores of the indicators into index-weighted scores using the analytic hierarchy process method. This method was proposed by T. L. Saaty in 1970 and is a popular multicriteria decision-making method that combines quantitative and qualitative analysis. 15 It has been widely used to calculate indicator weights in many studies on hospital management, environmental protection and other areas. 16–18 The calculation process is as follows:
Based on Saaty’s scale of pairwise comparisons, we translated the importance to value aij using pairwise comparison between two indicators from the same level. 19 A judgement matrix was then produced: A = {aij}. We first calculated the initial weight coefficient Wi′ using Equation (3). In Equation (3), m represents the number of indicators in the same level and aij represents the scale value obtained by pairwise comparison between two indicators. The weight Wi was calculated using Equation (4):
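Equations (3) and (4) are not reproduced in this text. A common way to obtain an initial weight coefficient followed by a normalised weight is the geometric-mean (root) method: Wi′ is the mth root of the product of row i of A, and Wi is Wi′ divided by the sum of all Wi′. The sketch below assumes that this is the method intended; the function name `ahp_weights` and the 3 × 3 judgement matrix are hypothetical.

```python
import math


def ahp_weights(matrix):
    """Indicator weights from a judgement matrix by the geometric-mean method.

    matrix[i][j] is the pairwise comparison value a_ij (indicator i vs j).
    """
    m = len(matrix)
    # Initial weight coefficient W_i': m-th root of the product of row i
    initial = [math.prod(row) ** (1 / m) for row in matrix]
    total = sum(initial)
    # Normalised weight W_i: the weights sum to 1
    return [w / total for w in initial]


# Hypothetical judgement matrix for three indicators at the same level.
A = [
    [1, 2, 4],
    [1 / 2, 1, 2],
    [1 / 4, 1 / 2, 1],
]
print([round(w, 3) for w in ahp_weights(A)])
```

Because this example matrix is perfectly consistent (every aij equals Wi/Wj), the weights reproduce the 4:2:1 ratio encoded in its first column.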
After obtaining the indicator weights, we needed to determine the degree of consistency to check the logicality of the indicator system. The consistency ratio was calculated (CR, CR = CI/RI). Generally, if CR ≤ 0.1, matrix A is considered acceptable. Otherwise, the matrix needs to be adjusted. 20 In Equation (5), CI = the consistency index calculated using Equations (5) to (7), RI = the random index, with values assigned using Saaty’s scale of pairwise comparisons 21 and λmax represents the largest eigenvalue. A good consistency is generally assumed if m is no larger than 2; if m is larger than 2, the consistency is acceptable only if CR is less than 0.10. 22
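Equations (5) to (7) are not reproduced in this text. The sketch below shows one common form of the consistency check, under stated assumptions: λmax is approximated as the mean of (AW)i / Wi, CI = (λmax − m) / (m − 1), and the RI values are the commonly tabulated Saaty random indices, which may differ slightly from the table used in the study. The function name `consistency_ratio` and the example matrix are hypothetical.

```python
# Commonly tabulated random index values by matrix order (an assumption;
# the study's own RI table is not reproduced in the text).
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def consistency_ratio(matrix, weights):
    """CR = CI / RI for a judgement matrix and its weight vector."""
    m = len(matrix)
    if m <= 2:
        # Matrices of order 1 or 2 are always consistent.
        return 0.0
    # (AW)_i: row i of the matrix multiplied by the weight vector
    aw = [sum(matrix[i][j] * weights[j] for j in range(m)) for i in range(m)]
    # Approximate largest eigenvalue as the mean of (AW)_i / W_i
    lambda_max = sum(aw[i] / weights[i] for i in range(m)) / m
    ci = (lambda_max - m) / (m - 1)
    return ci / RANDOM_INDEX[m]


# Hypothetical, deliberately inconsistent matrix (a circular preference).
B = [
    [1, 2, 1 / 2],
    [1 / 2, 1, 2],
    [2, 1 / 2, 1],
]
cr = consistency_ratio(B, [1 / 3, 1 / 3, 1 / 3])
print(round(cr, 3), "acceptable" if cr <= 0.1 else "needs adjustment")
```

In this example the circular preferences push CR well above 0.1, so the matrix would need to be adjusted before its weights were used.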
Reliability and validity
After establishing the performance evaluation indicator system, we needed to check its reliability and validity. Reliability was measured using Cronbach’s coefficient alpha: an alpha larger than 0.6 indicated that the factors were reliable. 23 We measured both content validity and construct validity. Construct validity was measured using the Kaiser–Meyer–Olkin (KMO) and Bartlett’s tests. Content validity was assessed according to the source of the information used to develop the system.
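As an illustration of the reliability measure, the sketch below computes Cronbach’s alpha from its standard definition, α = k/(k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items. The study used SPSS for this calculation; the function name `cronbach_alpha` and the score matrix are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha; scores[r][j] is respondent r's score on item j."""
    k = len(scores[0])  # number of items
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each respondent's total score
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scores: four respondents, three items.
scores = [
    [4, 5, 4],
    [3, 3, 4],
    [5, 4, 5],
    [2, 3, 2],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 3), "reliable" if alpha > 0.6 else "not reliable")
```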
Performance evaluation using the Topsis method
The Topsis method was used to evaluate the performance based on the established indicator system. Topsis (Technique for Order Preference by Similarity to an Ideal Solution) is an effective multiobjective decision method. Its advantage is that it has no special data requirements and preserves the original data information. 24 In addition, the results can be presented in the form of ranks, which is very intuitive. Its calculation steps are as follows:
Normalise all data to allow comparisons across criteria. For efficiency indicators, such as cure rate, larger values represent a more positive result. For cost indicators, such as outpatient expense, larger values represent a more negative result. 25 Negative indicators must be transformed into positive indicators using the reciprocal method or the difference method. Process the data using the normalisation method shown in Equation (8).
Find the optimal vector Z+ and worst vector Z− and calculate the difference (D+) between Zij and Z+ using Equation (9), and the difference (D−) between Zij and Z− using Equation (10); m represents the number of indicators, n represents the number of hospitals evaluated and aj represents the weight of each indicator.
Calculate the relative similarity (Ci) between Zij and the best solution using Equation (11).
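Equations (8) to (11) are not reproduced in this text. The three steps above can be sketched as follows, under stated assumptions: cost indicators are made positive with the reciprocal method, each indicator column is normalised by its Euclidean norm, the weight aj multiplies each difference inside the distance, and Ci = D− / (D+ + D−). The exact placement of the weights in the study’s equations is an assumption; the function name `topsis` and the data are hypothetical.

```python
import math


def topsis(data, weights, is_cost):
    """Relative similarity C_i of each hospital to the ideal solution.

    data[i][j]  value of indicator j for hospital i (n hospitals, m indicators)
    weights[j]  weight a_j of indicator j
    is_cost[j]  True if larger values of indicator j are worse
    """
    m = len(weights)
    # Step 1a: turn cost indicators into positive ones (reciprocal method)
    pos = [[1 / row[j] if is_cost[j] else row[j] for j in range(m)]
           for row in data]
    # Step 1b: normalise each indicator column by its Euclidean norm
    norms = [math.sqrt(sum(row[j] ** 2 for row in pos)) for j in range(m)]
    z = [[row[j] / norms[j] for j in range(m)] for row in pos]
    # Step 2: optimal and worst vectors, then weighted distances to each
    z_best = [max(row[j] for row in z) for j in range(m)]
    z_worst = [min(row[j] for row in z) for j in range(m)]
    scores = []
    for row in z:
        d_plus = math.sqrt(sum((weights[j] * (row[j] - z_best[j])) ** 2
                               for j in range(m)))
        d_minus = math.sqrt(sum((weights[j] * (row[j] - z_worst[j])) ** 2
                                for j in range(m)))
        # Step 3: relative similarity (undefined if all hospitals are identical)
        scores.append(d_minus / (d_plus + d_minus))
    return scores


# Hypothetical data: cure rate (efficiency) and outpatient expense (cost).
data = [[0.9, 100.0], [0.8, 200.0], [0.7, 300.0]]
scores = topsis(data, weights=[0.6, 0.4], is_cost=[False, True])
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print([round(s, 3) for s in scores], ranking)
```

Hospitals are ranked in descending order of Ci. In this example the first hospital is best on both indicators and therefore coincides with the optimal vector.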
Results
The performance evaluation indicator system and weights
All 25 invited experts responded (response rate: 100%). The Cr values were 0.84, 0.80, 0.83 and 0.84 for the perspectives of Learning and Growth, Financial, Customer and Internal Business Process, respectively. The average Cr was larger than 0.7, which indicated that the expert suggestions had good reliability. Kendall’s coefficient of concordance (W) was 0.277 (χ2 = 235.458, P < 0.001), indicating statistically significant agreement among the expert opinions. Based on the Delphi expert opinions, we repeatedly modified the indicators and eventually developed a performance evaluation system with acceptable consistency (CR < 0.10). The performance evaluation indicator system contained 4 first-level indicators, 9 second-level indicators and 36 third-level indicators. Table 2 shows the performance evaluation indicator system of Guangxi county-level public hospitals and the weights Wi.
Table 2. Performance evaluation indicator system and weights (Wi)
The average Cronbach’s alpha was 0.837, which is larger than 0.6 and so indicates good reliability. The average KMO was 0.704, indicating that the data were suitable for factor analysis. The P value for Bartlett’s test was less than 0.001, indicating that the variables were correlated sufficiently for factor analysis to be performed. The factor analysis showed that the construct validity was acceptable. Furthermore, the development of the indicator system (from the framework construction to the calculation of the weights) had been approved by experts; therefore, the content validity was also appropriate. These tests suggested that our evaluation indicator system could provide reasonable results.
Performance evaluation results calculated using the Topsis method
Tables 3 to 6 show the initial data from the five county hospitals according to the four BSC perspectives. Table 7 shows the Ci and performance ranks of the five county hospitals for each of the four BSC perspectives, together with the overall ranks. Overall, hospital B performed best and hospital D worst; hospital A ranked first on Internal Business Process and on Learning and Growth. We discussed these results with the experts and confirmed their agreement with the interpretation.
Relative similarity (Ci) and ranks for four balanced scorecard perspectives
Financial indicator data for hospitals A–E
Internal business process indicator data for hospitals A–E
Learning and growth indicator data for hospitals A–E
Customer indicator data for hospitals A–E
Discussion
Many methods are currently used to evaluate performance, such as the key performance indicator method, the target management method and the data envelopment analysis method. 26–30 However, many of these methods have shortcomings. For example, some performance evaluation methods focus on economic indicators and ignore the growth and development of medical staff, patient satisfaction and internal processes. Some methods place too much emphasis on objective indicators or, conversely, only use subjective surveys and thus lack an objective perspective. In addition, the theoretical foundation of some evaluation indicator systems is not comprehensive and relies on personal experience or judgement instead of consultation with relevant stakeholders. Although their performance evaluation goal is the same, indicator systems vary across different provinces. In view of the shortcomings of previous methods, this study used the BSC to establish an indicator system framework from four perspectives. The Delphi method was used to modify and expand the framework based on expert opinions. This study is the first to combine the BSC with performance evaluation for Guangxi county hospitals; as such, the results may be useful for Guangxi hospital reform. The results indicated that the level of expert authority was high and the expert opinions tended to be consistent, suggesting that the expert suggestions were reliable. The indicator system was developed based on these expert opinions. As the system showed good reliability and validity, the results of the performance evaluation can reasonably be regarded as accurate.
Analysis of performance evaluation indicator system
The weightings of the first-level indicators showed the following relationship: Financial > Internal Business Process > Customer > Learning and Growth. Each indicator had a different weight at different levels and further analysis of the indicators is discussed below.
Financial perspective
A government policy to cancel drug price increases has meant that all drugs must be sold at their purchase price. Because of this, hospitals have lost some of their income. To balance the income gap, the government has introduced measures such as adjusting the price of medical services, increasing government subsidies, strengthening hospital accounting and saving on running costs. However, these measures have had some negative effects such as inadequate compensation in some areas and inconsistent adjustment of medical service prices, which can make hospitals appear to be operating poorly.31,32 To meet growing medical demands, county hospitals purchase large medical devices, introduce medical expertise and develop advanced medical technology, all of which increases hospital debt. To prevent the reappearance of these problems in the new health care reforms, attention must be paid to good management of funds and efficient medical service price adjustments. Improper use of funds wastes health resources and affects the development of county hospitals. Therefore, public subsidies need to be used properly, medical service prices adjusted on a scientific basis and assets and liabilities controlled properly. The effective management of hospital finances would have a substantial effect on the development of county hospitals.
Internal business process and customer perspective
Finance was identified as the primary problem, but other issues are also important. Both Internal Business Process and Customer indicators are correlated with Finance. As mentioned above, the cancellation of drug price increases has substantially reduced hospital income (Finance). This is likely to reduce the salaries of medical staff and so decrease their enthusiasm for work, which affects work efficiency (Internal Business Process). The Internal Business Process indicator measures work efficiency and work quality status in county hospitals. The Customer indicator measures patient satisfaction with medical services. These two indicators reflect the patient-oriented approach of county hospitals, which are public welfare institutions. Internal Business Process had a greater weighting than Customer because the primary task of county hospitals is to guarantee the quality of medical services and work efficiency. Customer satisfaction is affected by many subjective factors like medical service quality, the service attitude of medical staff and media orientation. Regarding the scientific basis and reliability of performance evaluation, objective indicators have more stability and accuracy than subjective indicators, which may explain why Internal Business Process has a higher weight index than Customer.
Learning and growth perspective
Learning and Growth was ranked last of the four indicators for the following reasons. According to Chinese healthcare system reform policy, the goal of county hospitals is to treat common diseases, transfer patients suffering from difficult and complicated diseases, provide rehabilitation for patients with serious diseases, provide medical guidance and training to personnel in rural areas and oversee public services such as infectious disease control, natural disasters and emergency rescue. The central work of county hospitals focuses on regional medical treatment and public health, which require more practical work than teaching or scientific research. This explains why those indicators have a lower weight. However, county hospitals require a certain number of physicians, nurses and psychiatric beds to ensure medical quality and efficiency, which explains the higher weight for personnel structure. Nevertheless, there is a lack of high-level talent in most county hospitals in China (and little difference among county hospitals on this factor); therefore, there is little value in evaluating this indicator. 33 Furthermore, the flow of talented personnel is affected by regional economy and policy, which county hospitals cannot control. Counties in Guangxi Province are characterised by poor economy, education, living environment and access to cities; therefore, county hospitals will continue to experience problems in attracting talented personnel until the government implements policies to relieve these problems. Therefore, the personnel structure of the hospitals did not reflect a full range of talent and so this indicator was assigned a small weight.
Analysis of performance evaluation results
Hospital B was ranked first on performance. Hospital B scored highest on Finance, indicating that it would be relatively easy for this hospital to improve technology or to employ good staff. Moreover, the ratio of hospital B drug income was the lowest and the examination income ratio was similar to the best, which indicates that hospital B performed well in cancelling drug price increases and adjusting the examination price. Hospital B was ranked second on physician burden of medical treatment per day, which shows a good performance in treating common diseases of local residents. However, hospital B was ranked lowest on patient satisfaction; this result could be attributed to the large burden of medical staff. Excessive workloads can lead to staff being less patient and having a poor attitude to patients.
Hospital D was ranked last on performance. From a Finance perspective, the financial structure of hospital D was unscientific; government grants formed the main part of hospital income and management expenses were the main outgoing. From an Internal Business Process perspective, the physician burden of medical treatment per day was small and the turnover rate of hospital beds was low, which indicated that there were few patients and some beds were superfluous. From the Learning and Growth perspective, hospital D had a high ratio of beds to nurses and the staff structure was problematic: the ratio of health technical staff was low whereas the ratio of executives was high. However, hospital D scored highest on the Customer perspective, because it undertook more social welfare services and public health events than the other four hospitals. Because of its involvement in public services, hospital D received less revenue from medical services, which partly explains its poor medical performance.
Finally, from the Learning and Growth and Internal Business Process perspectives, hospital A performed well on medical quality, with a high utilisation ratio and many patients, which meant that hospital A scored well on treating common diseases of residents in county areas. Hospital B had lower values than hospital A for patient expenses and drug income proportion, which is beneficial for patients. That is to say, hospital B performed better on solving the problem of expensive medical treatment. More importantly, hospital B had a higher score on the Financial perspective; because Finance was assigned the largest weight, the overall performance score of hospital B was higher than that of hospital A.
Suggestions for the development of county hospitals
In terms of basic investment, the government should strictly control hospital construction criteria, bed numbers and the purchase of large equipment. Furthermore, it should forbid construction or the buying of large equipment if a hospital is in debt. County hospitals should adjust the number of beds according to county resident numbers. Once it reaches the standard scale set out in the national plan, a hospital should be barred from further expansion. Hospitals that exceed the standard or begin construction while in debt should be held accountable.
To reduce patient burden, county hospitals should set a reasonable price for medical services. The Guangxi government has implemented a zero margin drug profit policy and has claimed that county hospitals could address the income loss by adjusting medical service prices, saving costs and obtaining more government grants. However, price adjustments must reflect the labour value of medical staff while considering factors such as county economic development, medical insurance payment capacity and the medical cost burden of residents. County hospitals could obtain extra revenue by providing high-quality or distinctive services and reducing the cost of medical consumables and large medical equipment.
Addressing the shortage of qualified professional personnel is the most important issue for county hospital performance. To solve this problem and attract professionals from higher-level hospitals, a mechanism is needed to increase the personnel flow between urban and rural hospitals. County hospitals could introduce high-quality professionals using project employment, task employment or skills cooperation. However, to attract speciality or scarce personnel, or to address urgent staff shortages, hospitals should increase recruitment by reducing some requirements, such as education and age, and simplify the recruitment procedure. Furthermore, county hospitals should provide focal training to medical staff in key business positions and train core doctors while encouraging them to obtain in-service education.
Improving patient satisfaction and creating good relationships between doctors and patients is also beneficial for performance. Further education in the humanities is first needed for medical staff to strengthen their understanding of medical ethics and retain professionalism. Then, the media needs to strengthen publicity and guide public opinion to encourage people to respect and value health workers. County hospitals should perfect their patient complaint mechanisms and ethics committees should be established to investigate complaints about improper medical behaviour and improve communication channels. If necessary, local government should establish a medical dispute resolution body to ensure the appropriate regulation of medical services. To guarantee the lawful rights and interests of doctors and patients, medical violence must be strictly prohibited. Finally, it is necessary to develop medical accident insurance and medical liability insurance, and to establish a mechanism for sharing medical risk between doctors and patients.
Future research prospects
This study has some limitations. Using the BSC, we evaluated the performance of Guangxi county hospitals from an academic perspective and provided some recommendations for hospital reform. The large number of indicators makes this performance evaluation system problematic to implement in terms of cost and efficiency; further refinement of the system is needed before it can be fully implemented. Because of funding and personnel limitations, we only selected five county-level public hospitals for this case study. Therefore, the system needs to be tested further on a larger sample of hospitals. In addition, the applicability of the performance evaluation system for other types of county-level hospitals, such as Chinese medicine hospitals, needs further investigation.
In future research, we plan to apply this performance evaluation system to additional county hospitals. We are also aiming to expand the range of this case study and explore the use of the indicator system in other types of hospitals, such as county-level Chinese medicine hospitals and maternal and child health care hospitals. In addition, to verify the evaluation results, we aim to compare the suitability of different methods to evaluate performance, such as the comprehensive index method and the rank sum ratio method.
We are also planning further studies using this system to evaluate the performance of hospital departments. These performance results will be combined with management data to provide more comprehensive recommendations for hospital development and decision making.
Footnotes
Author contributions
Hongda Gao generated the initial idea for the study, analysed the data and wrote the manuscript. He Chen and Jun Feng revised the manuscript and modified the English language. Qiming Feng, the corresponding author, designed the study project and provided the funding sources. Jinmin Zhao, a co-corresponding author, participated in designing the study and carried out the study project. Xuan Wang, Shenglin Liang and Xianjing Qin participated in data collection and cleaning. All authors read and approved the final manuscript.
Acknowledgments
The authors would like to thank the Guangxi Zhuang Autonomous Region Health and Family Planning Commission for research coordination and data preparation, and thank all participants in this study for their cooperation.
Declaration of conflicting interest
The authors declare that there is no conflict of interest.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
