Abstract
In the implementation of diagnosis-related groups (DRGs), hospitals respond to price changes by incorporating more patients into the more profitable DRGs, thereby providing evidence for upcoding. This study proposes a two-stage DRG grouper (ML-DRG) to alleviate the risk of upcoding. The ML-DRG employs machine learning methods to build a predictive model of patients’ clinical resource consumption and assigns the model output as the resource consumption index, which comprehensively considers various patient characteristics and is difficult to manipulate. We use data from the Chengdu Healthcare Security Administration of China, covering the period from 2011 to 2018, to compare the performance of the proposed method with 3 mainstream approaches. Our findings indicate that the intracranial hemorrhagic disease (BR1) group and respiratory infection/inflammation disease (ES2) group of ADRG were each divided into 4 DRGs, with the coefficient of variation of each group being less than .8. Among the 4 grouping methods, ML-DRG demonstrated the best performance. These findings suggest that the application of ML-DRG may reduce the risk of upcoding by making it harder for hospitals to obtain higher reimbursement rates through the selection of incorrect DRG codes.
● This paper proposes a two-stage DRG grouper to reduce the DRG upcoding risk.
● In the first stage, a machine learning model outputs the patient’s expected cost. In the second stage, a decision tree model groups patients using this output together with characteristics such as medical insurance type.
● Scientific and reasonable DRG grouping based on a data-driven method is of great practical significance for the implementation of the DRG payment method.
● The proposed method can describe the non-linear complex impact of variables such as gender, age, Complication & Comorbidity, hospital grade, and medical insurance type on the level of patient resource consumption.
● The two-stage DRG grouper had the best performance and the lowest upcoding risk among the 4 grouping methods.
Introduction
The diagnosis-related groups (DRGs) are a patient classification scheme that relates the type of patients a hospital treats (ie, its case mix) to the costs incurred by the hospital. The DRGs aggregate patients into clinically similar groups with similar resource intensity, and each group of patients has demographic, diagnostic, and therapeutic attributes in common that determine their consumption of health care resources. Under this reimbursement system, hospitals improve service efficiency and control costs, allowing hospitals to obtain surplus benefits and patients to receive the care they need without unnecessary charges. However, under imposed budget constraints, there is a risk that health care institutions may select incorrect DRG codes to obtain a higher reimbursement rate. This is referred to as DRG upcoding, 1 whose negative effect has been widely observed in health care payment systems in various countries.2 -4 A study found that 10 000 out of 60 000 annual reimbursed claims for present-on-admission infections (18.5%) were upcoded as hospital-acquired infections, costing Medicare $200 million. 5 It is evident that upcoding has caused considerable unreasonable compensation and wasted a huge amount of health care insurance funds. Moreover, hospitals engaged in upcoding to relieve financial stress, without any increase in the intensity or quality of care.6 -9 Therefore, DRG grouping optimization research is needed to limit the risk of upcoding caused by information asymmetry and to facilitate the smooth implementation of DRGs.
In DRG grouping, it is essential to establish decision-making rules based on individual patient similarities, including age, gender, comorbidities, and other information. Healthcare institutions may exploit these rules by adding non-existent Complication & Comorbidity (CC) to patients’ reported medical records so that the case is assigned to a DRG with higher reimbursement.10,11 To mitigate the upcoding risk, the healthcare payment systems in many countries have created DRGs with major diagnoses, the average length of stay, payment weight, and other factors, updating them annually.12 -14 The selection of other grouping features affects the rationality of DRG grouping.15 -17 Studies have shown that factors such as the type of medical insurance and the level of the hospital significantly influence the cost of hospitalization. 18 Therefore, when characterizing the consumption of medical resources by patients, the various types of healthcare insurance and healthcare institutions in a country or region should be considered in the DRG grouping system.
With the increasing application of artificial intelligence in the healthcare industry, the use of machine learning methods to develop the grouping system presents significant opportunities to overcome DRGs’ coding limitations and enhance the quality of care. 19 A method designed and updated by machine learning algorithms inherits the decision-making rules from expert-oriented grouping and improves performance by incorporating continuous updates at minimal cost.20 -22 Machine learning and deep learning have been extensively applied in DRG upcoding detection,23,24 yet their application in the DRG grouping system remains relatively limited.
To address these gaps, this paper uses patient characteristics (such as medical insurance type, age, gender, and CC) to explore a data-based grouping method developed using machine learning. To reduce the DRG upcoding risk induced by the grouping method and better fit the resource consumption level of patients, the patient cost predicted by the method is used as the patient's resource consumption index. This indicator is difficult to alter, thereby limiting the potential for upcoding and enhancing the ability to characterize the consumption of inpatient resources.
Method
Data Source
The data were collected from the Chengdu Healthcare Security Administration of China, covering the medical expenses and detailed diagnoses of patients enrolled in the urban employee basic medical insurance scheme (UEBMI) and the urban resident basic medical insurance scheme (URBMI) from 2011 to 2018. The initial sample comprised 18 699 407 records, of which 8 054 738 were retained after excluding cases with incorrect features (such as year, location, code logic, age, hospital grade, etc.). To ensure the stability of DRG grouping results, this study excluded outlier cases with expenses below the 1st percentile or above the 99th percentile in each ADRG group. After screening, a total of 7 893 466 valid records were obtained. ADRG grouping was performed on these 7 893 466 cases in accordance with the CHS-DRG grouping guidance document. In this study, 2 representative ADRGs were selected for subdivision into DRGs, namely intracranial hemorrhagic disease (BR1) and respiratory infection/inflammation disease (ES2). Figure 1 illustrates the detailed record selection process.

Figure 1. Flowchart for medical insurance data selection.
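The percentile-based outlier screening described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the record fields `adrg` and `expense`, the helper names, and the linear-interpolation percentile definition are all assumptions made for illustration, and the toy data are synthetic.

```python
from collections import defaultdict


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list.

    The interpolation rule is an assumption; the source does not state one.
    """
    if not sorted_values:
        raise ValueError("empty group")
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def trim_outliers_by_adrg(records, low_q=1.0, high_q=99.0):
    """Keep records whose expense lies within the [low_q, high_q] percentiles of their ADRG."""
    by_adrg = defaultdict(list)
    for rec in records:
        by_adrg[rec["adrg"]].append(rec["expense"])
    bounds = {}
    for adrg, expenses in by_adrg.items():
        expenses.sort()
        bounds[adrg] = (percentile(expenses, low_q), percentile(expenses, high_q))
    return [
        rec for rec in records
        if bounds[rec["adrg"]][0] <= rec["expense"] <= bounds[rec["adrg"]][1]
    ]


# Synthetic toy data (hypothetical): 101 BR1 cases and 3 ES2 cases.
records = [{"adrg": "BR1", "expense": 100.0 * i} for i in range(101)]
records += [{"adrg": "ES2", "expense": e} for e in (500.0, 600.0, 700.0)]
kept = trim_outliers_by_adrg(records)
```

Computing the bounds separately for each ADRG, rather than once over the pooled data, keeps a high-cost ADRG from having all of its ordinary cases flagged as outliers relative to cheaper ones.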
The Two-Stage DRG Grouper Developed Using Machine Learning
In the DRG grouping process, particular attention must be given to the subdivision of ADRG to DRG, ensuring that each group reflects a similar level of clinical complexity. Typically, a score representing the patient's clinical complexity is assigned to each patient using one of various methods (eg, CHS-DRG, PCCL, ECC; see Supplemental Material). Under these methods, adding non-existent CC may directly alter the patient’s DRG grouping, resulting in DRG upcoding. Machine learning approaches, particularly ensemble methods such as random forest and XGBoost, establish a mapping between the comprehensive clinical context of a case and its corresponding DRG category by leveraging data-driven, multi-feature correlation modeling. Their multi-model voting mechanism attenuates the influence of individual coding errors, thereby markedly decreasing the dependence of DRG classification on single-dimensional rules and reducing the susceptibility of traditional methods to both intentional and unintentional variations in diagnostic coding. 21
Both the PCCL and ECC models aim to depict the impact of CC on inpatient costs. The PCCL grouping method identifies the relationship between CC and hospitalization expenses using multiple linear regression. 13 The ECC uses a function to model the relationship between CC and hospitalization costs. 14 Both models in effect perform cost prediction, a task at which machine learning methods excel. Furthermore, machine learning can incorporate additional variables. The nonparametric Mann–Whitney U test was employed to examine differences in variables with 2 categories, such as gender, CC, and healthcare insurance type. Differences in continuous variables, including hospital grade and age, were assessed using the Kruskal–Wallis test. The results for the BR1 and ES2 disease groups are shown in Supplemental Material, Tables S1 and S2. Hospital level and healthcare insurance type were associated with statistically significant differences in hospital expenditure, and these characteristics were thus considered relevant for DRG grouping.
Following this analysis, a two-stage grouping method can be employed. In the first stage, machine learning methods (specific details of the model can be found in the Supplemental file) are used to develop a cost prediction model. This step outputs the expected cost of the patient, which also reflects the patient's clinical complexity. In contrast to CHS-DRG, PCCL, and ECC, which reflect medical costs through the straightforward accumulation of comorbidities during grouping, the random forest and XGBoost models employed in this study can automatically learn the nonlinear relationships between multiple patient features and medical resource consumption, thereby mitigating the risk of DRG upcoding associated with the manual addition of comorbidities. In the second stage, a decision tree model is used to group patients based on medical insurance type, hospital level, and the clinical complexity derived from the machine learning model, generating interpretable rules. This grouping method (ML-DRG) treats the output of machine learning as clinical complexity, offering transparency to otherwise opaque results. Additionally, by leveraging the robust fitting capabilities of machine learning, this method better uncovers the complex nonlinear relationships among various patient attributes, more fully capturing patient specificity.
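As a rough illustration of the second stage, the sketch below recursively splits cases on a single complexity score until every group's CV falls below .8. It rests on several assumptions: the stage-one predicted cost is taken as already computed, the score is the only splitting variable, the split criterion is the least within-group sum of squared errors (as in a regression tree), and all function names and toy values are hypothetical. It is not the authors' decision tree model.

```python
from statistics import mean, pstdev


def cv(costs):
    """Coefficient of variation of observed costs (population SD / mean)."""
    return pstdev(costs) / mean(costs)


def sse(costs):
    """Sum of squared deviations of costs from their mean."""
    m = mean(costs)
    return sum((c - m) ** 2 for c in costs)


def best_split(cases, min_size):
    """Index splitting score-sorted cases with the least total within-group SSE."""
    best_idx, best_loss = None, float("inf")
    for i in range(min_size, len(cases) - min_size + 1):
        if cases[i][0] == cases[i - 1][0]:
            continue  # no threshold can separate equal scores
        loss = sse([c for _, c in cases[:i]]) + sse([c for _, c in cases[i:]])
        if loss < best_loss:
            best_idx, best_loss = i, loss
    return best_idx


def split_until_homogeneous(cases, cv_limit=0.8, min_size=2):
    """Recursively split (score, cost) cases until each group's CV is below cv_limit."""
    cases = sorted(cases)
    if cv([c for _, c in cases]) < cv_limit:
        return [cases]
    idx = best_split(cases, min_size)
    if idx is None:  # group too small or scores all equal
        return [cases]
    return (split_until_homogeneous(cases[:idx], cv_limit, min_size)
            + split_until_homogeneous(cases[idx:], cv_limit, min_size))


# Hypothetical (predicted cost from stage one, observed cost) pairs.
cases = [(90.0, 100.0), (100.0, 110.0), (110.0, 90.0),
         (950.0, 1000.0), (1000.0, 1100.0), (1050.0, 900.0)]
groups = split_until_homogeneous(cases)
# Midpoints between adjacent groups give "if score <= t then group k" rules.
thresholds = [(left[-1][0] + right[0][0]) / 2
              for left, right in zip(groups, groups[1:])]
```

Because the thresholds apply to a model output rather than to the presence of an individual diagnosis code, a single added code changes the group only if it moves the predicted cost across a threshold.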
This method pertains to a clinical prediction model study and strictly adheres to the reporting guidelines outlined in the TRIPOD Statement to ensure the transparency and completeness of study design, data processing, and result reporting. 25
Evaluation Index of DRG Grouping Effect
The coefficient of variation (CV) reflects the degree of dispersion among sample medical records within the group. A CV value of less than 1 indicates high consistency of resource consumption within the group, allowing for its division into a DRG. A CV value below .8 signifies a superior grouping effect.
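The CV of the $k$th DRG is calculated as follows:
$$\mathrm{CV}_k = \frac{s_k}{\bar{x}_k}$$
where $s_k$ is the standard deviation of the hospitalization costs of the cases in the $k$th DRG and $\bar{x}_k$ is their mean hospitalization cost.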
The reduction in variance (RIV) assesses the heterogeneity between DRGs by calculating the ratio of variation among groups to total variation after decomposing a dataset into multiple groups. A higher RIV value indicates greater heterogeneity between groups, signifying a superior grouping effect.
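The RIV is calculated as follows:
$$\mathrm{RIV} = \frac{\sum_{i=1}^{N}\left(x_i-\bar{x}\right)^2-\sum_{k=1}^{K}\sum_{i\in G_k}\left(x_i-\bar{x}_k\right)^2}{\sum_{i=1}^{N}\left(x_i-\bar{x}\right)^2}$$
where $x_i$ is the hospitalization cost of case $i$, $\bar{x}$ is the mean cost over all $N$ cases, $G_k$ is the set of cases in the $k$th of $K$ DRGs, and $\bar{x}_k$ is the mean cost within the $k$th DRG. The numerator is the variation among groups (total variation minus within-group variation) and the denominator is the total variation.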
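The two evaluation indices described in this section can be computed as in the following sketch. It assumes the population standard deviation for the CV and the usual sum-of-squares decomposition for the RIV; the source does not specify either detail, and the group names and costs are made up for illustration.

```python
from statistics import mean, pstdev


def coefficient_of_variation(costs):
    """CV of one DRG: standard deviation of costs divided by their mean.

    Population SD is an assumption; a sample SD could be used instead.
    """
    return pstdev(costs) / mean(costs)


def reduction_in_variance(groups):
    """RIV: share of total cost variation explained by the grouping."""
    all_costs = [c for costs in groups.values() for c in costs]
    grand_mean = mean(all_costs)
    total_ss = sum((c - grand_mean) ** 2 for c in all_costs)
    within_ss = 0.0
    for costs in groups.values():
        group_mean = mean(costs)
        within_ss += sum((c - group_mean) ** 2 for c in costs)
    return (total_ss - within_ss) / total_ss


# Hypothetical costs for two DRGs.
groups = {
    "DRG-A": [10.0, 12.0, 14.0],
    "DRG-B": [30.0, 32.0, 34.0],
}
cv_by_group = {name: coefficient_of_variation(c) for name, c in groups.items()}
riv = reduction_in_variance(groups)
```

In this toy case both groups are tight around well-separated means, so each CV is small and the RIV is close to 1, which is the pattern a good grouping should show.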
The Simulation Process of Upcoding Risk
To compare the upcoding risk associated with the 4 grouping methods (CHS-DRG, PCCL, ECC, ML-DRG), their robustness in handling changes in CC was assessed through simulation. In contrast to traditional static data verification, simulation can dynamically model the coding adjustment process, more accurately representing the actual clinical situation where coding varies with medical records, and more effectively reflecting the risk of upcoding.
The simulation process is illustrated in Figure 2. In the simulation, an upcoding probability (P) is first set to represent the proportion of cases that are upcoded. Specifically, a random number is drawn for each case from a uniform distribution between 0 and 1. If the random number exceeds (1 − P), the case is treated as upcoded. If a case record is not upcoded, the original data remain unchanged. When a case record is upcoded, a diagnostic code is randomly selected from the preset coding set and added to the diagnostic code sequence of the current record. The grouping outcomes of the simulation are then compared with the original results, and the total number of cases with changed grouping outcomes is recorded.

Figure 2. Simulation process of upcoding risk in grouping.
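The simulation loop described above can be sketched as follows. The grouper here is a deliberately simple, hypothetical rule that assigns a group from the number of recorded CC codes; it stands in for CHS-DRG, PCCL, ECC, or ML-DRG, none of which is reproduced. The code labels, the seed, and the function names are likewise assumptions.

```python
import random


def simulate_upcoding(cases, grouper, code_pool, p, seed=0):
    """Count cases whose group changes after a random code is appended with probability p."""
    rng = random.Random(seed)
    changed = 0
    for codes in cases:
        original_group = grouper(codes)
        # A uniform draw above (1 - p) marks the case as upcoded.
        if rng.random() > 1 - p:
            upcoded = codes + [rng.choice(code_pool)]  # original record is not mutated
            if grouper(upcoded) != original_group:
                changed += 1
    return changed


def toy_grouper(codes):
    """Hypothetical rule: the group index is the number of CC codes, capped at 3."""
    return min(len(codes), 3)


# Hypothetical data: 1000 cases carrying 0 to 3 placeholder CC codes.
cases = [[], ["CC1"], ["CC1", "CC2"], ["CC1", "CC2", "CC3"]] * 250
code_pool = ["CC4", "CC5", "CC6"]

# Same probability set as in the study: 1%, 5%, 10%, 20%, and 50%.
results = {
    p: simulate_upcoding(cases, toy_grouper, code_pool, p)
    for p in (0.01, 0.05, 0.10, 0.20, 0.50)
}
```

Passing the grouper as a function lets the same loop be rerun for each grouping method, so the counts of changed cases are directly comparable across methods at a given P.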
Results
Subgroup Outcomes—Based on CHS-DRG, PCCL, ECC, and ML-DRG
The medical insurance type and hospital grade were included as grouping variables in the subsequent DRG grouping process. Together with the clinical complexity indicator from each grouping method, a total of 3 variables were combined. Further subdivision ceased when the CV fell below .8 to facilitate the evaluation of different grouping methods. Table 1 presents the grouping results by grouping method.
Table 1. Subgroup Outcome by Different Grouping Method.
The DRGs were further subdivided by medical insurance type and hospital grade. For medical insurance type, a value of 1 denotes medical insurance for urban workers and a value of 2 denotes medical insurance for urban and rural residents. For hospital grade, a value of 1 denotes primary hospitals, a value of 2 denotes secondary hospitals, and a value of 3 denotes tertiary hospitals.
The disease BR1 records were classified into 18 DRGs using the CHS-DRG grouping method, with CV values for 3 DRGs exceeding 1. This indicates that the performance of CHS-DRG requires improvement.
A total of 8 DRGs were obtained after classifying the BR1 disease records using the PCCL grouping method. The CV values of all DRGs were below 1, with 6 DRGs exhibiting CV values less than .8. The ES2 disease records were classified into 4 DRGs, all of which had CV values below .8. This indicates that the performance of the PCCL method is satisfactory.
After the BR1 disease records were subdivided using the ECC method, a total of 10 DRGs were identified, with CV values below .8 observed in 6 of these groups. A total of 4 DRGs were identified for the disease group ES2, with all groups exhibiting CV values below .8. However, the CV values of a few DRGs still exceeded 1, indicating that there is room for optimization in the grouping performance of this method.
A total of 4 DRGs were obtained from the BR1 disease records using the ML-DRG method, with each group having a CV value below .8, relying solely on the division of the ML value. Similarly, 4 DRGs were derived after grouping the ES2 disease patients, with the ML value being the sole variable used for grouping. The CV values of all DRGs were below .8, indicating that the performance of this grouping method is excellent.
Comparison of Effects Between Different Grouping Methods
To provide a comprehensive comparison of the performance of the 4 grouping methods (CHS-DRG, PCCL, ECC, and ML-DRG), 5 indicators were selected for evaluation: weighted CV, RIV, the number of DRGs, the number of DRGs with CV below 1, and the number of DRGs with CV below .8. Comparing the grouping effects for the BR1 and ES2 disease records reveals that the two-stage DRG grouper (ML-DRG) proposed in this study demonstrates superior performance across all indicators, as shown in Table 2. Although the number of DRGs generated by ML-DRG is relatively small, its CV value is lower than those of the other 3 grouping methods. This indicates that the differences among sample cases within the same DRG are smaller, signifying a higher consistency of resource consumption within the group. The RIV value obtained using ML-DRG is higher, indicating greater heterogeneity among different groups and thus reflecting a superior grouping effect.
Table 2. BR1 and ES2 Grouping Outcome Comparison.
To compare DRG grouping performance under varying upcoding probabilities (P), a set of probabilities was established, including 1%, 5%, 10%, 20%, and 50%. The simulation results are presented in Table 3.
Table 3. Upcoding Risk Simulation Outcomes of BR1 and ES2 Disease Group.
In the BR1 disease group, the small patient count and limited number of simulation iterations made the outcomes subject to random variation. Consequently, the 4 grouping methods showed no significant differences in the number of upcoded individuals within this disease group. Only when P was set at 50% did the number of upcoded individuals obtained using ML-DRG fall to less than half that of CHS-DRG and PCCL. However, ECC exhibited a large decrease factor, resulting in an increased probability that any additional diagnostic code would alter patient grouping outcomes. In the ES2 disease records, regardless of the upcoding probability (P), the upcoding risk associated with ML-DRG was significantly lower than that of the other 3 grouping methods. These findings indicate that the ML-DRG proposed in this study demonstrated the best robustness among the grouping methods in response to changes in diagnostic coding, maintaining the lowest upcoding risk. This advantage was particularly evident in disease records with a large patient population.
Discussion
This study developed a data-driven approach to detect and mitigate DRG upcoding, aiming to reduce inappropriate expenditures. Furthermore, it optimized DRG grouping outcomes and expanded the relevant grouping methods. Managerial insights and implications can be derived from our findings.
Complication & Comorbidity (CC) is a critical grouping characteristic in DRG categorization. Upcoding is also related to differences in the methods used to handle CC. For instance, adding non-existent CC may directly alter the patient's DRG grouping results, leading to DRG upcoding. A key issue in DRG upcoding risk management is to reduce the sensitivity of the grouping method to both intentional and unintentional changes in diagnostic coding. Data-driven grouping can limit the effect of artificially adding non-existent comorbidities, thereby fulfilling the requirements of the DRG system. In contrast to existing studies, 20 this article performs data analysis and simulation using real medical data, achieving relatively consistent grouping accuracy. Moreover, it offers the advantages of high transparency and low cost in both the design and update process.
The proposed method effectively characterizes the nonlinear and complex impacts of variables such as gender, age, CC, hospital grade, and medical insurance type on patient resource consumption levels. Existing studies utilizing the CHS-DRG grouping scheme employed the E-CHAID algorithm to categorize costs, calculating a CV of 1.18 for the ADRG group of cerebral ischemic disease. 12 This value significantly exceeds .8, indicating that the grouping effect is inferior to that of the method presented in this article. This improvement illustrates that distinguishing between medical insurance types and hospital grades has led to a significant enhancement in the CV. Different disease groups may influence patient selection strategies in hospitals at varying levels, potentially facilitating the process of graded diagnosis and treatment to some extent.
Historically, most machine learning methods were unsuitable for DRG grouping because they could not generate “if-then” rules. However, this study combines machine learning methods with a decision tree to generate rules that are easily interpretable. This finding aligns with existing research indicating that early and accurate DRG classification using ML methods can enhance the utilization of resources such as operating rooms and beds. 21 This demonstrates that DRG grouping designed with machine learning clarifies the complex nonlinear relationships among various grouping features. Furthermore, this approach provides a more accurate and rational description of patient resource consumption, facilitating the mitigation of upcoding risk caused by information asymmetry.
Conclusions
This study examines the upcoding risk in DRG practice and proposes a two-stage grouping method utilizing machine learning (ML-DRG). The proposed method effectively characterizes the nonlinear and complex impacts of variables such as gender, age, CC, hospital grade, and medical insurance type on patient resource consumption levels. Furthermore, it compares the 3 prevailing grouping methods (CHS-DRG, PCCL, ECC) with the two-stage grouping method proposed in this study. ML-DRG demonstrated the best performance and the lowest upcoding risk among the 4 grouping methods. Scientific and rational DRG grouping based on a data-driven approach holds significant practical importance for the implementation of the DRG payment system while also providing a theoretical foundation and support for other regions during the DRG promotion phase.
Supplemental Material
sj-docx-1-inq-10.1177_00469580251389813 and sj-pdf-2-inq-10.1177_00469580251389813 – Supplemental material for Reducing the Risk of Upcoding in DRG Grouping Through a Two-Stage DRG Grouper Based on Machine Learning by Haitian Wang, Li Luo, Dongyuan Ma, Zhecheng Xie and Yuanchen Fang in INQUIRY: The Journal of Health Care Organization, Provision, and Financing
Footnotes
Acknowledgements
We would like to thank Dr. Jie Xiang and Dr. Xiaofei Liu for their support for this research.
Authors’ Note
Zhecheng Xie is now affiliated to Fu Foundation School of Engineering and Applied Science, Columbia University, New York, NY, USA.
Ethical Considerations
This study primarily involves the development and comparison of a DRG grouping method based on machine learning using existing, aggregated data on patient characteristics (such as age, complications and comorbidities, hospital grade, and medical insurance type), hospital costs, and reimbursement rates. There is no direct interaction with individuals, no collection of new personal data, and no interventions on or involving patients. Due to the absence of direct human subject involvement, the retrospective and non-interventional nature of the research, and the safeguarding of data privacy, this study meets the criteria for exemption from ethical review.
Author Contributions
Haitian Wang: Methodology, formal analysis, visualization, and writing (original draft). Li Luo: Conceptualization, methodology, software, visualization, and writing (original draft). Dongyuan Ma: Software, visualization, and writing (original draft). Zhecheng Xie: Software, formal analysis, and writing (original draft). Yuanchen Fang: Supervision, validation, and writing (review & editing).
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was sponsored by the National Natural Science Foundation of China (grant numbers 72342014, 72371176, and 72102159) and the Key Research Plan of Sichuan Province (grant number 2023YFS0317).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The data that support the findings of this study are available from the Chengdu Healthcare Security Administration of China.
Supplemental Material
Supplemental material for this article is available online.
References
