Sage Journals: Discover world-class research

Abstract

Objective

This study aimed to develop a customized severity adjustment tool for in-hospital mortality among pneumonia patients that reflects the characteristics of patients discharged in Korea, using representative data from the Korea Disease Control and Prevention Agency's Korea National Hospital Discharge In-Depth Injury Survey (KNHDIS).

Methods

We analyzed 46,286 pneumonia hospitalizations in the 2013 to 2022 KNHDIS data and developed mortality models adjusted for comorbidity severity using SAS and Python.

Results

Among three comorbidity adjustment tools, namely the existing Korean-Charlson Comorbidity Index (K-CCI) and the newly developed modified-Korean-Charlson Comorbidity Index (m-K-CCI) and modified-Korean-Clinical Classification Software (m-K-CCS), m-K-CCS performed best. For model development and evaluation, least absolute shrinkage and selection operator (LASSO), logistic regression, classification and regression tree (CART), random forest, gradient-boosted model (GBM), and artificial neural network (ANN) analyses were performed. In the validation dataset, the GBM model using m-K-CCS had the highest area under the curve (AUC), 0.910.

Conclusion

These results suggest that further research is needed on models that adjust for the severity of comorbidities for each diagnosis to more accurately predict health outcomes.

Keywords

Pneumonia, comorbidity, Charlson comorbidity index (CCI), machine learning, big data

Introduction

Approximately 60% of deaths in the United States occur in hospitals. While most deaths are unavoidable, reducing unnecessary deaths is an important goal of health care services.¹ The hospital mortality rate is a representative outcome indicator for evaluating the quality of medical services provided by medical institutions.² To measure and evaluate hospital mortality accurately, it is necessary to adjust for the severity of the various patient risk factors that can affect death.³ Severity adjustment is the process of controlling for factors that affect the outcome of interest, and comorbidities are known to be an important factor.⁴ Although comorbidities are not the primary diagnosis that leads to hospitalization, they are important severity adjustment variables because they can increase complications, death, length of stay, and medical costs.^5,6

A representative tool for adjusting for the severity of comorbidities is the Charlson comorbidity index (CCI). The CCI has been used not only in Korea but also in the US, UK, Canada, and Australia to adjust for the severity of hospitalized patients when developing mortality models and evaluating hospital mortality rates.^2,7–10 The CCI adjusts for the severity of a patient's condition by weighting comorbidities that are strongly associated with death. Charlson et al. selected 19 conditions that were highly predictive of death based on the medical records of 604 patients admitted to a New York hospital over one month in 1984, and assigned each a weight based on its relative risk. The sum of these weights is the CCI.^4,11 Since then, the CCI has been applied to various diseases and surgical procedures, and its validity has been demonstrated. Other studies have examined how to apply the CCI and how to determine its weights.^12–15 Romano et al. recommended using weights re-estimated from the study population rather than the original CCI weights.¹⁶ Quan et al. argued that weights developed 30 years ago needed updating because of improvements in chronic disease management, treatment, and medical technology, and they updated the CCI weights for predicting mortality using discharge data from six countries, including Canada and Australia.¹⁰

Although the CCI is a representative severity adjustment tool in previous studies, it has several limitations. First, although the factors affecting a patient's severity differ by primary diagnosis, the CCI applies the same comorbidity weights regardless of the primary diagnosis.¹⁷ Second, although the CCI developed in 1986 is widely used, it has been criticized for not reflecting the many advances and changes in medical technology since its development.¹⁸ Third, the CCI reflects only 17 comorbidities that can affect mortality, so some comorbidities shown in previous studies to affect health outcomes are excluded.¹⁹ In the United Kingdom, a severity adjustment tool was developed from hospital discharge data to reflect the characteristics of discharged patients and to update, over time, the weighting of comorbidities that could affect mortality.²⁰

In Korea, the pneumonia mortality rate was 44.4 per 100,000 people in 2021, and pneumonia was the third leading cause of death. The number of pneumonia patients and the associated medical expenses are expected to increase gradually as the population ages rapidly.²¹ Pneumonia is a disease for which empirical diagnosis and treatment are important. The Infectious Diseases Society of America (IDSA) has reported that applying management guidelines for community-acquired pneumonia can reduce mortality.²² In addition, studies have shown that adherence to management guidelines reduces mortality and length of hospital stay, so mortality among pneumonia patients must be measured accurately with adjustment for patient severity.²³ Recently, machine learning-based mortality prediction models built on big data have been actively studied.^17,18,24–26 However, research on severity-adjusted mortality models for pneumonia is scarce. Therefore, this study aimed to develop a customized severity adjustment tool for in-hospital mortality among pneumonia patients that reflects the characteristics of patients discharged in Korea, using representative data from the Korea Disease Control and Prevention Agency's Korea National Hospital Discharge In-Depth Injury Survey (KNHDIS).

Method

This was a cross-sectional study that used nationwide data collected from medical institutions to develop a severity adjustment tool for accurately measuring the health outcomes of pneumonia patients.

Study design and data

This study used 1,803,611 cases from the 2013 to 2022 Korea National Hospital Discharge In-Depth Injury Survey data provided by the Korea Disease Control and Prevention Agency (KDCA). The KNHDIS is a national survey conducted annually in Korea since 2005, and its data are nationally approved statistics (Statistics Korea approval No. 117060) on patients discharged from general hospitals with 100 or more beds.²⁷ To extract patients with a primary diagnosis of pneumonia, we used the criteria of the Agency for Healthcare Research and Quality (AHRQ) Clinical Classifications Software (CCS). The CCS classifies diagnoses and procedures into clinically meaningful categories and can be used to integrate various forms of statistical reporting.^28–30 Each CCS category is mapped to a list of codes in international standard terminologies such as ICD-10 (Table 1). A total of 49,387 patients with a primary diagnosis of pneumonia were extracted. After excluding patients whose discharge route was transfer to another hospital, escape, or other, as well as 3111 cases admitted and discharged on the same day, 46,286 cases remained for the final analysis. Following previous studies, we used the 24,008 cases from 2013, 2015, 2017, 2019, and 2021 for training and the 22,278 cases from 2014, 2016, 2018, 2020, and 2022 for validation of the comorbidity severity adjustment model. The validation data were not used for training or model tuning; they served only to test the final models (Figure 1).^17,18
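The cohort selection and alternating-year split described above can be sketched as follows. This is a minimal illustration, not the authors' SAS or Python code: the `Discharge` record, its field names, the discharge-route labels, and the use of a zero-day length of stay to mark same-day admission and discharge are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical record layout; real KNHDIS variable names and codes differ.
@dataclass
class Discharge:
    year: int              # survey year, 2013-2022
    discharge_route: str   # e.g. "home", "death", "transfer", "escape", "other"
    length_of_stay: int    # days; 0 is treated here as same-day admission and discharge

EXCLUDED_ROUTES = {"transfer", "escape", "other"}

def is_eligible(rec: Discharge) -> bool:
    """Drop transfers, escapes, other discharge routes and same-day stays."""
    return rec.discharge_route not in EXCLUDED_ROUTES and rec.length_of_stay > 0

def split_by_year(records):
    """Odd survey years go to the training set, even years to the validation set."""
    cohort = [r for r in records if is_eligible(r)]
    train = [r for r in cohort if r.year % 2 == 1]
    valid = [r for r in cohort if r.year % 2 == 0]
    return train, valid
```

Splitting by survey year rather than at random keeps each year's records entirely in one set, so the validation results reflect performance on years the models never saw.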

Figure 1.

Study design and data.

Table 1.

Definition of pneumonia according to CCS criteria

CCS group	CCS-Diagnosis Description	ICD-10 code
122	Pneumonia (except that caused by tuberculosis or sexually transmitted disease)	A20.2, A21.2, A22.1, A31.0, A42.0, A43.0, A48.1, A78, B01.2+, B05.2+, B25.0+, B58.3+, B59, B67.1, J12.0, J12.1, J12.2, J12.8, J12.9, J13, J14, J15.0, J15.1, J15.2, J15.3, J15.4, J15.5, J15.6, J15.7, J15.8, J15.9, J16.0, J16.8, J17.0, J17.1, J17.2, J17.3, J17.8*, J18.0, J18.1, J18.2, J18.8, J18.9, J85.0, J85.1

CCS, Clinical Classifications Software; ICD, International Classification of Diseases.
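A minimal sketch of how a primary diagnosis could be flagged as CCS group 122 using the codes in Table 1. Stripping the dagger (+) and asterisk (*) markers and treating each listed code as a prefix that also covers its subcodes (so `A78` matches any `A78` code) are assumptions of this sketch, not rules stated by the authors or taken from the AHRQ mapping files.

```python
# ICD-10 codes listed for CCS group 122 in Table 1, with "+" and "*" markers removed.
# Treating each code as a prefix (so "J13" also matches any J13 subcode) is an assumption.
CCS_122_CODES = (
    "A20.2", "A21.2", "A22.1", "A31.0", "A42.0", "A43.0", "A48.1", "A78",
    "B01.2", "B05.2", "B25.0", "B58.3", "B59", "B67.1",
    "J12.0", "J12.1", "J12.2", "J12.8", "J12.9", "J13", "J14",
    "J15.0", "J15.1", "J15.2", "J15.3", "J15.4", "J15.5", "J15.6", "J15.7",
    "J15.8", "J15.9", "J16.0", "J16.8", "J17.0", "J17.1", "J17.2", "J17.3",
    "J17.8", "J18.0", "J18.1", "J18.2", "J18.8", "J18.9", "J85.0", "J85.1",
)

def normalize_icd10(code: str) -> str:
    """Upper-case and keep only letters and digits: 'j18.9' -> 'J189', 'B01.2+' -> 'B012'."""
    return "".join(ch for ch in code.upper() if ch.isalnum())

_PREFIXES = tuple(normalize_icd10(c) for c in CCS_122_CODES)

def is_ccs122_pneumonia(primary_dx: str) -> bool:
    """True if the primary diagnosis code starts with any CCS 122 code in Table 1."""
    return normalize_icd10(primary_dx).startswith(_PREFIXES)
```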

Variables

The dependent variable was in-hospital death from any cause among patients admitted with a primary diagnosis of pneumonia, i.e., overall in-hospital mortality among patients admitted with pneumonia. Based on previous studies, the independent variables were gender, age, insurance type, surgery (yes/no), emergency admission (yes/no), and comorbidity.^13,18,31 Insurance type was categorized as health insurance, medical aid, or other. Three comorbidity indices were calculated with the CCI and CCS tools: the Korean-Charlson Comorbidity Index (K-CCI), which uses the weights of 12 conditions updated in previous studies;^4,10 the modified-Korean-Charlson Comorbidity Index (m-K-CCI), calculated by re-estimating the weights of the 17 original CCI comorbidities; and the modified-Korean-Clinical Classification Software (m-K-CCS), a comorbidity index calculated with the CCS tool.
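Each weighted comorbidity index is a sum of per-condition weights. The sketch below scores a patient with the K-CCI and m-K-CCI weights listed in Table 3. The dictionary keys are hypothetical labels, and the sketch does not apply the hierarchy rules that some CCI implementations use (for example, counting only the more severe of two related liver or diabetes categories).

```python
# Weights from the K-CCI and m-K-CCI columns of Table 3 (keys are illustrative labels).
K_CCI = {
    "congestive_heart_failure": 2, "dementia": 2, "chronic_pulmonary_disease": 1,
    "connective_tissue_disease": 1, "mild_liver_disease": 2,
    "diabetes_with_end_organ_damage": 1, "hemiplegia": 2,
    "moderate_severe_renal_disease": 1, "any_tumor": 2,
    "moderate_severe_liver_disease": 4, "metastatic_solid_tumor": 6, "aids": 4,
}
M_K_CCI = {
    "myocardial_infarction": 3, "chronic_pulmonary_disease": 1, "mild_liver_disease": 2,
    "diabetes_with_end_organ_damage": 1, "diabetes": 1, "hemiplegia": 1,
    "moderate_severe_renal_disease": 2, "any_tumor": 3,
    "moderate_severe_liver_disease": 3, "metastatic_solid_tumor": 3,
}

def comorbidity_index(conditions, weights):
    """Sum the weights of a patient's distinct comorbidities; unweighted ones add 0."""
    return sum(weights.get(c, 0) for c in set(conditions))

patient = ["congestive_heart_failure", "diabetes", "metastatic_solid_tumor"]
print(comorbidity_index(patient, K_CCI), comorbidity_index(patient, M_K_CCI))  # 8 4
```

The same patient scores 8 under K-CCI but 4 under m-K-CCI, which shows how re-estimated weights can change a patient's severity ranking.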

Statistical analysis

Following previous studies, we derived new comorbidity weights using Cox proportional hazards regression, a survival analysis method.^31–33 To calculate the CCI-based weights (m-K-CCI), we estimated hazard ratios (HRs) and 95% confidence intervals (CIs) in a model including sex, age, insurance type, surgery, emergency admission, and the 17 CCI comorbidities. The m-K-CCI weight of each statistically significant comorbidity (p < 0.05) was calculated by dividing its HR by the lowest HR among the significant comorbidities. To calculate the CCS-based weights (m-K-CCS), we estimated HRs and 95% CIs in a model including sex, age, insurance type, surgery, emergency admission, and 260 CCS disease groups. Because this model tested more than 260 predictors, including the 260 CCS groups, we applied the Bonferroni correction suggested in a previous study to adjust the significance level for multiple hypothesis testing.³⁴ The resulting adjusted significance level was 0.0002 (approximately 0.05/260). The m-K-CCS weight of each statistically significant comorbidity (p < 0.0002) was likewise calculated by dividing its HR by the lowest HR among the significant comorbidities.
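The weight derivation can be sketched as below. The Bonferroni threshold is 0.05 divided by the number of tests. The text does not say how the HR ratios were converted to integers; rounding to the nearest integer is an assumption here, although it reproduces every m-K-CCS weight in Table 4 and every m-K-CCI weight in Table 3. The CCS group labels are shortened for readability, and the placeholder p-value of 1e-5 stands in for the reported "<.0001".

```python
ALPHA = 0.05
N_TESTS = 260
BONFERRONI_ALPHA = ALPHA / N_TESTS  # about 0.000192, reported as 0.0002

def derive_weights(hazard_ratios, p_values, alpha):
    """Weight = HR / lowest significant HR, rounded to the nearest integer (assumed)."""
    significant = {k: hr for k, hr in hazard_ratios.items() if p_values[k] < alpha}
    hr_min = min(significant.values())
    return {k: round(hr / hr_min) for k, hr in significant.items()}

# Hazard ratios of the 15 significant CCS groups in Table 4.
TABLE4_HR = {
    "bladder_urethra_other": 0.347, "gastrointestinal_other": 0.496,
    "paralysis": 0.516, "bacterial_infection": 0.592, "asthma": 0.594,
    "copd_bronchiectasis": 0.717, "fluid_electrolyte": 1.320,
    "lung_cancer": 1.406, "renal_failure": 1.449, "respiratory_failure": 1.523,
    "secondary_malignancies": 1.699, "leukemia": 1.947, "septicemia": 2.164,
    "shock": 2.698, "cardiac_arrest_vf": 3.207,
}
TABLE4_P = {k: 1e-5 for k in TABLE4_HR}  # placeholder for "<.0001"

m_k_ccs = derive_weights(TABLE4_HR, TABLE4_P, alpha=0.0002)
print(m_k_ccs["cardiac_arrest_vf"], m_k_ccs["septicemia"])  # 9 6
```

Python's `round` breaks exact ties toward the even integer; none of the tabulated ratios falls on a tie, so this does not affect the weights above.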

To develop mortality prediction models adjusted for comorbidity severity, we used logistic regression, least absolute shrinkage and selection operator (LASSO) regression, classification and regression tree (CART), random forest, gradient-boosted model (GBM), and artificial neural network (ANN) techniques. Logistic regression is a traditional statistical model that serves as a baseline for binary classification and is widely used in clinical research for its interpretability. LASSO regression is a regularized regression method that performs both variable selection and coefficient shrinkage, which is particularly useful for high-dimensional comorbidity data. CART is a decision tree algorithm that is easy to interpret and captures nonlinear relationships and interactions between variables. Random forest is an ensemble method that builds many decision trees and aggregates their predictions, improving accuracy and reducing overfitting. GBM is a boosting algorithm that builds trees sequentially to reduce prediction error and often shows strong predictive performance on medical data. ANN is a neural network model capable of capturing complex nonlinear patterns in data; it was included to compare traditional models with more flexible, data-driven approaches.²⁴ Together, these models provide a comprehensive comparison of linear, regularized, tree-based, and neural network approaches in terms of predictive power for mortality. Model fit was evaluated, and models were selected, using the area under the receiver operating characteristic (ROC) curve (AUC). The AUC typically ranges from 0.5 to 1: a value of 0.5 indicates no predictive power (equivalent to chance), and a value above 0.8 indicates good predictive power.¹⁸ All statistical analyses were performed with SAS 9.4 for Windows and Python 3.10.0 for Windows.
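The AUC equals the probability that a randomly chosen death receives a higher predicted risk than a randomly chosen survivor (the Mann-Whitney formulation). The standard-library sketch below computes it from average ranks so that tied scores count one half. It is an illustration only; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used, and nothing here is taken from the study's code.

```python
def roc_auc(labels, scores):
    """Rank-based (Mann-Whitney) AUC for binary labels (1 = death, 0 = survival)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied scores starting at sorted position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one death and one survivor")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In the example, three of the four death/survivor pairs are ranked correctly, giving 0.75; a model that assigns every patient the same score gets 0.5.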

Results

General characteristics

The study included 46,286 patients: 24,008 in the training data and 22,278 in the validation data. There were slightly more males (56.2%) than females (43.8%). The mean age was 34.3 years. Surgery was performed in 0.9% of cases. More patients were admitted through the emergency room (55.8%) than through ambulatory care (44.2%) (Table 2).

Table 2.

General characteristics of study subjects

Variables	Total (n = 46,286)	Training data (n = 24,008)	Validation data (n = 22,278)
Gender, n (%)
Male	26,011 (56.2)	13,402 (55.8)	12,609 (56.6)
Female	20,275 (43.8)	10,606 (44.2)	9669 (43.4)
Age (years), mean ± SD	34.3 ± 35.0	32.8 ± 34.7	35.9 ± 35.3
Insurance type, n (%)
Health insurance	43,322 (93.6)	22,584 (94.1)	20,738 (93.1)
Medical aid	2681 (5.8)	1290 (5.4)	1391 (6.2)
Others	283 (0.6)	134 (0.6)	149 (0.7)
Operation, n (%)
No	45,870 (99.1)	23,823 (99.2)	22,047 (99.0)
Yes	416 (0.9)	185 (0.8)	231 (1.0)
Admission route, n (%)
Ambulatory	20,453 (44.2)	10,975 (45.7)	9478 (42.5)
Emergency	25,833 (55.8)	13,033 (54.3)	12,800 (57.5)
Time to death (days), mean ± SD	21 ± 49	23 ± 54	20 ± 42

SD, standard deviation.

CCI and CCS modification using survival analysis

Cox regression analysis was used to recalibrate the comorbidity index for predicting mortality in patients with pneumonia. Table 3 shows the hazard ratio and adjusted weight of each comorbidity. Myocardial infarction, chronic pulmonary disease, mild liver disease, moderate or severe liver disease, diabetes with end organ damage, diabetes, hemiplegia, moderate or severe renal disease, any tumor, and metastatic solid tumor were significantly associated with mortality (all p < 0.05). Compared with the K-CCI, the m-K-CCI weights increased for myocardial infarction, diabetes, moderate or severe renal disease, and any tumor, and decreased for congestive heart failure, dementia, connective tissue disease, hemiplegia, moderate or severe liver disease, metastatic solid tumor, and AIDS. The weights for chronic pulmonary disease, mild liver disease, and diabetes with end organ damage were the same in both indices (K-CCI vs. m-K-CCI) (Table 3).

Table 3.

Results of CCI modification using Cox regression analysis (n = 46,286)

Comorbidity	n	Parameter estimate	Standard error	Hazard Ratio	p value	m-K-CCI	K-CCI
Myocardial infarction	217	0.422	0.225	1.526	0.000	3	−
Congestive heart failure	1235	0.032	0.061	1.033	0.599	−	2
Peripheral vascular disease	220	0.198	0.146	1.219	0.176	−	−
Cerebral vascular disease	940	−0.033	0.068	0.968	0.632	−	−
Dementia	919	−0.057	0.068	0.944	0.401	−	2
Chronic pulmonary disease	5687	−0.444	0.052	0.642	<.0001	1	1
Connective tissue disease	289	0.200	0.151	1.222	0.186	−	1
Ulcer disease	194	−0.133	0.148	0.875	0.369	−	−
Mild liver disease	611	0.321	0.102	1.378	0.002	2	2
Diabetes with end organ damage	549	−0.242	0.105	0.785	0.022	1	1
Diabetes	3493	−0.158	0.048	0.854	0.001	1	−
Hemiplegia	176	−0.554	0.166	0.575	0.001	1	2
Moderate or severe renal disease	1670	0.116	0.059	1.123	0.045	2	1
Any tumor, leukemia, lymphoma	2118	0.437	0.059	1.548	<.0001	3	2
Moderate or severe liver disease	63	0.526	0.228	1.693	0.021	3	4
Metastatic solid tumor	649	0.454	0.088	1.575	<.0001	3	6
AIDS	17	0.896	0.579	2.450	0.121	−	4

m-K-CCI, modified-Korean Charlson Comorbidity Index; K-CCI, Korean-Charlson Comorbidity Index; AIDS, Acquired immunodeficiency syndrome. Adjusted for gender, age, insurance type, operation, admission route, and all comorbidities. For CCI comorbidities with p-values less than 0.05, the comorbidity index was readjusted using hazard ratio.
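Each hazard ratio in Tables 3 and 4 is the exponential of its Cox parameter estimate, and a Wald 95% confidence interval follows from the standard error. A minimal sketch, assuming the usual normal approximation with z = 1.96; small differences from the tabulated HRs come from rounding of the published estimates.

```python
import math

def hazard_ratio_ci(beta, se, z=1.96):
    """Return (HR, lower, upper) with HR = exp(beta) and Wald CI exp(beta -/+ z*se)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Chronic pulmonary disease in Table 3: parameter estimate -0.444, standard error 0.052.
hr, lower, upper = hazard_ratio_ci(-0.444, 0.052)
print(f"HR {hr:.3f} (95% CI {lower:.3f}-{upper:.3f})")
```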

In the Cox regression model for severity-adjusted mortality in patients with pneumonia, the hazard ratios of 15 of the 260 CCS comorbidities were statistically significant at p < 0.0002, the significance level adjusted with the Bonferroni correction suggested in a previous study for multiple hypothesis testing.³⁴ Other diseases of bladder and urethra had the lowest HR (0.347), and cardiac arrest and ventricular fibrillation had the highest (3.207), followed by shock (2.698), septicemia (2.164), and leukemia (1.947). The new comorbidity weights (m-K-CCS), calculated by dividing each HR by the lowest HR, ranged from 1 to 9 (Table 4).

Table 4.

Results of calculating the CCS comorbidity index using Cox regression analysis (n = 46,286)

Comorbidity	n	Parameter estimate	Standard error	Hazard Ratio	p value	m-K-CCS
Other diseases of bladder and urethra	173	−1.058	0.255	0.347	<.0001	1
Other gastrointestinal disorders	586	−0.702	0.167	0.496	<.0001	1
Paralysis	241	−0.662	0.158	0.516	<.0001	1
Bacterial infection; unspecified site	1312	−0.524	0.073	0.592	<.0001	2
Asthma	2314	−0.521	0.115	0.594	<.0001	2
Chronic obstructive pulmonary disease and bronchiectasis	3370	−0.333	0.058	0.717	<.0001	2
Fluid and electrolyte disorders	871	0.277	0.069	1.320	<.0001	4
Cancer of bronchus; lung	865	0.341	0.079	1.406	<.0001	4
Acute and unspecified renal failure	1120	0.371	0.052	1.449	<.0001	4
Respiratory failure; insufficiency; arrest	611	0.421	0.063	1.523	<.0001	4
Secondary malignancies	646	0.530	0.086	1.699	<.0001	5
Leukemia	153	0.666	0.151	1.947	<.0001	6
Septicemia	1593	0.772	0.049	2.164	<.0001	6
Shock	45	0.993	0.214	2.698	<.0001	8
Cardiac arrest and ventricular fibrillation	160	1.165	0.092	3.207	<.0001	9

Adjusted for gender, age, insurance type, operation, admission route, and all 260 CCS (Clinical Classifications Software) comorbidities. For the 15 CCS comorbidities with p-values less than 0.0002, the comorbidity weights were recalculated using the hazard ratio.

Validation of a severity-adjusted comorbidity model using survival analysis and machine learning

Cox regression analysis by severity adjustment method showed that all three indices (K-CCI, m-K-CCI, and m-K-CCS) had a statistically significant effect on the risk of death (p < 0.0001). Each 1-point increase in the comorbidity index increased the risk of death by 9.1%, 10.1%, and 13.8%, respectively. m-K-CCS showed the largest per-point increase in risk and thus the strongest severity adjustment effect (Table 5). Following prior studies, machine learning techniques were used to evaluate the comorbidity severity adjustment models.²⁶ For model development and evaluation, LASSO, logistic regression, CART, random forest, GBM, and ANN analyses were performed. The training dataset (n = 24,008) included 1472 deaths, a mortality rate of 6.1%, and the validation dataset (n = 22,278) included 1623 deaths, a mortality rate of 7.3%. Both datasets were therefore class-imbalanced, with far fewer deaths than survivors. In the validation dataset, the GBM model with m-K-CCS had the highest AUC (0.910), followed by logistic regression and ANN. Of the six modeling approaches, m-K-CCS gave the highest AUC in all five except CART (Figure 2). All AUC values exceeded 0.7, supporting the performance and stability of the prediction models. Although random forest with m-K-CCS had a lower AUC (0.857), it achieved higher recall and F1 scores than GBM, logistic regression, and ANN (Table 6).
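Accuracy is hard to interpret under this imbalance. A trivial model that predicted survival for every patient would already reach the accuracy computed below from the death counts reported above, which is close to the accuracies in Table 6; recall and F1 say more about how many deaths a model actually finds. The sketch also recomputes F1 from one precision/recall pair in Table 6 as a consistency check (differences in the third decimal reflect rounding of the published values).

```python
def null_model_accuracy(n_total, n_deaths):
    """Accuracy of a trivial classifier that predicts 'survived' for every patient."""
    return (n_total - n_deaths) / n_total

def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

train_baseline = null_model_accuracy(24_008, 1_472)
valid_baseline = null_model_accuracy(22_278, 1_623)
print(f"{train_baseline:.3f} {valid_baseline:.3f}")  # 0.939 0.927

# GBM with m-K-CCS on the validation data (Table 6): precision 0.598, recall 0.184.
print(f"{f1_score(0.598, 0.184):.3f}")
```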

Figure 2.

Receiver operating characteristic (ROC) curves for various machine learning techniques.

Table 5.

Cox regression analysis by severity adjustment methods (n = 46,286)

Variables	K-CCI		m-K-CCI		m-K-CCS
Variables	Hazard Ratio	p value	Hazard Ratio	p value	Hazard Ratio	p value
Gender (ref: Male)	1		1		1
Female	0.923	0.034	0.943	0.124	0.912	0.015
Age	1.045	<0.0001	1.045	<0.0001	1.043	<0.0001
Insurance type (ref: Health insurance)	1		1		1
Medical aid	1.046	0.425	1.042	0.463	1.067	0.247
Others	0.892	0.510	0.873	0.433	0.819	0.249
Operation (ref: No)	1		1		1
Yes	0.723	0.000	0.712	0.000	0.694	<0.0001
Admission route (ref: Ambulatory)	1		1		1
Emergency	2.282	<0.0001	2.282	<0.0001	2.241	<0.0001
Comorbidity Index	1.091	<0.0001	1.101	<0.0001	1.138	<0.0001
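In Table 5 the comorbidity index enters the Cox model as a single linear term, so a per-point HR converts to a percentage change in hazard as (HR − 1) × 100, and, under that log-linear assumption, a difference of k points multiplies the hazard by HR^k. A short worked sketch using the reported per-point HRs; the 5-point difference is an arbitrary illustration.

```python
PER_POINT_HR = {"K-CCI": 1.091, "m-K-CCI": 1.101, "m-K-CCS": 1.138}  # Table 5

def percent_change(hr):
    """Percentage change in the hazard of death for a one-point increase in the index."""
    return (hr - 1) * 100

def hazard_ratio_for_points(hr_per_point, points):
    """HR for a `points`-point difference, assuming a log-linear effect of the index."""
    return hr_per_point ** points

for name, hr in PER_POINT_HR.items():
    print(f"{name}: +{percent_change(hr):.1f}% per point, "
          f"HR {hazard_ratio_for_points(hr, 5):.2f} for a 5-point difference")
```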

Table 6.

Comparison of models in predicting outcomes of patients with pneumonia using machine learning

Model	Training data (n = 24,008)					Validation data (n = 22,278)
Model	AUC	Accuracy	Precision	Recall	F1 Score	AUC	Accuracy	Precision	Recall	F1 Score
Logistic regression
K-CCI	0.888	0.939	0.495	0.031	0.058	0.877	0.926	0.395	0.028	0.052
m-K-CCI	0.887	0.939	0.474	0.025	0.048	0.877	0.927	0.481	0.023	0.044
m-K-CCS	0.919	0.944	0.630	0.209	0.314	0.906	0.932	0.597	0.199	0.299
LASSO regression
K-CCI	0.874	0.899	0.235	0.287	0.258	0.858	0.881	0.238	0.288	0.261
m-K-CCI	0.871	0.893	0.234	0.329	0.273	0.856	0.874	0.247	0.353	0.290
m-K-CCS	0.906	0.911	0.340	0.475	0.397	0.889	0.894	0.338	0.475	0.395
CART
K-CCI	0.939	0.949	0.917	0.181	0.302	0.761	0.919	0.272	0.064	0.104
m-K-CCI	0.939	0.948	0.916	0.170	0.287	0.763	0.919	0.270	0.068	0.108
m-K-CCS	0.963	0.960	0.907	0.390	0.545	0.760	0.923	0.439	0.199	0.274
Random forest
K-CCI	0.936	0.949	0.838	0.204	0.328	0.823	0.919	0.286	0.073	0.116
m-K-CCI	0.936	0.948	0.823	0.196	0.317	0.826	0.920	0.307	0.080	0.127
m-K-CCS	0.959	0.960	0.849	0.425	0.567	0.857	0.925	0.467	0.235	0.313
GBM
K-CCI	0.897	0.941	0.843	0.040	0.077	0.878	0.926	0.323	0.012	0.024
m-K-CCI	0.894	0.940	0.778	0.038	0.073	0.879	0.927	0.446	0.023	0.043
m-K-CCS	0.928	0.946	0.684	0.217	0.330	0.910	0.932	0.598	0.184	0.282
ANN
K-CCI	0.888	0.939	0.531	0.012	0.023	0.876	0.927	0.575	0.014	0.028
m-K-CCI	0.887	0.939	0.520	0.009	0.017	0.876	0.927	0.545	0.007	0.015
m-K-CCS	0.921	0.944	0.693	0.145	0.240	0.905	0.932	0.666	0.126	0.212

AUC, area under the curve; K-CCI, Korean-Charlson comorbidity index; m-K-CCI, modified-Korean Charlson Comorbidity Index; m-K-CCS, modified-Korean Clinical classification software; LASSO, least absolute shrinkage and selection operator; CART, classification and regression tree; GBM, gradient-boosted model; ANN, artificial neural network.

Discussion

This study aimed to predict and evaluate more accurately the comorbidities affecting mortality by adjusting for disease severity using a nationwide database. Risk-adjusted mortality rates, which measure health outcomes and quality of care, are critical to health policy but difficult to measure accurately.³⁵ A patient's comorbidities are known to be very important in calculating risk-adjusted mortality rates. However, comorbidity-based severity adjustment has not been customized for each disease. Although the CCI has been widely used, applying it uniformly to all diseases has limitations.^4,18,31

Therefore, in this study, we developed and validated comorbidity adjustment models that apply new weights suited to the characteristics of pneumonia patients, based on the existing CCI comorbidity index and the CCS diagnostic groups. Using Cox regression analysis of 46,286 pneumonia patients, we developed the m-K-CCI by applying new weights to 11 diseases in the existing Korean comorbidity index (K-CCI). Of the 260 CCS groups, 15 disease groups were statistically significant (p < 0.0002), and the m-K-CCS model was developed with new weights for these groups. To validate the comorbidity index models, six machine learning techniques were used, based on previous studies:^24,26 logistic regression, least absolute shrinkage and selection operator (LASSO), classification and regression tree (CART), random forest, gradient-boosted model (GBM), and artificial neural network (ANN). The m-K-CCS model had the highest AUC with every technique except CART. This is consistent with a previous study that developed a comorbidity model by applying new weights to specific diseases such as liver disease, chronic pulmonary disease, diabetes, and renal disease, in which the C statistic of the newly weighted model was higher than that of the existing comorbidity model.³¹ The AUC of the m-K-CCS model with GBM was the highest at 0.910, consistent with a previous study that developed and validated a mortality prediction model for heart failure patients, in which GBM had the highest C statistic.²⁴ The AUCs of m-K-CCS with logistic regression and ANN were 0.906 and 0.905, respectively, and that with LASSO was 0.889. GBM is a supervised learning algorithm that fits new models sequentially, each correcting the weaknesses of the previous models by using gradients (or residuals), and then combines them linearly into the final model.²⁴ It is among the better-performing machine learning models, although it has the disadvantage of being slow to train. In this study, the validation results showed that the models achieved high AUC and accuracy, indicating good overall discrimination for mortality. However, recall and F1 scores were relatively low across models, suggesting limitations in identifying actual death cases. This discrepancy is likely due to class imbalance, as death cases were far fewer than non-death cases. Therefore, future studies should consider oversampling or other methods that balance the dataset to improve the detection of deaths in mortality prediction.
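The sequential residual-fitting idea behind GBM can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the study's GBM configuration, which the text does not report: it uses squared-error loss, whose negative gradient is simply the residual, one-split regression stumps on a single numeric feature, and a fixed learning rate that shrinks each stump before the stumps are summed. Classification GBMs usually optimize log-loss instead.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump (one split) on a single numeric feature."""
    pairs = sorted(zip(xs, residuals))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical x values
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # no valid split: predict the mean residual everywhere
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Squared-error gradient boosting: each new stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        # Start from the mean and add the shrunken contribution of every stump so far.
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(n_rounds):
        # For squared-error loss the negative gradient equals the residual y - f(x).
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict
```

Each round only corrects what the previous rounds left unexplained, which is also why boosting is slower than fitting independent trees in parallel as a random forest does.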

These results suggest the need to develop customized severity adjustment tools with better predictive power that reflect the characteristics of each disease. Comorbidities that affect mortality should be clearly presented so that clinicians can use them in diagnosis and patient management. However, existing severity adjustment tools have been applied uniformly to all diseases without considering the characteristics of each disease. In the hospital standardized mortality ratio evaluation conducted to assess the quality of Korean medical institutions, the AUC of the mortality model for pneumonia patients was 0.870, lower than the AUC in this study.³⁶ That model adjusted for severity using the CCI method, which does not reflect the characteristics of Korean pneumonia deaths and applies the same weights for 17 comorbidities to deaths from all diseases. In the era of artificial intelligence, it is necessary to collect the big data needed for problem solving, manage high-quality data suitable for training artificial intelligence models, and use the results for decision-making. Many countries, including Korea, are developing severity adjustment models to calculate severity-adjusted mortality and are making the results public.^37–39 Therefore, calculating accurate severity-adjusted ratios based on a highly predictive comorbidity adjustment model will provide important basic data for national health and medical policies.

This study has several limitations. First, because it used chart-based administrative data of hospitalized patients, test results related to disease severity, including biochemical tests, could not be included as variables. Second, because of the limited number of subjects, more advanced artificial intelligence techniques such as deep learning could not be applied. Third, although new weights for each comorbidity were derived through statistical analysis, clinical expert consensus on these weights was not obtained. Future studies should address these limitations and include more diverse clinical variables that may affect the mortality of pneumonia patients. Despite these limitations, this study is significant in that its results can serve as basic data for health care policies. It also developed a disease-specific comorbidity risk adjustment model by recalibrating existing comorbidity tools with representative big data and validating them with machine learning techniques.

Conclusions

The objective of this study was to develop a comorbidity adjustment tool for estimating the severity-adjusted mortality rate of hospitalized pneumonia patients using representative big data. Among the three comorbidity adjustment tools, the existing K-CCI and the newly developed m-K-CCI and m-K-CCS, m-K-CCS performed best. Model evaluation with logistic regression, LASSO regression, CART, random forest, GBM, and artificial neural network techniques showed that GBM had the best predictive performance. These results suggest the need to develop models that adjust for comorbidity severity for each principal diagnosis and to continue research on models that predict health outcomes more accurately.

Footnotes

ORCID iD

Jihye Lim

Ethical approval

This study was conducted in accordance with the Declaration of Helsinki. Ethical review and approval were waived for this study because it used anonymous public open indicator data, not an individual's personal data.

Author contributions

JL and JP conceptualized and designed the study. JP obtained funding. JP collected and analyzed the data. JL wrote the first draft of the manuscript.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. RS-2023-00212182).

Informed consent

No patient consent was required for this study. All data were obtained from a public open database, and complete anonymity was guaranteed.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability statement

Restrictions apply to the availability of these data. Data were obtained from the Korea Disease Control and Prevention Agency (KDCA) and are available from o.

References

1. Campbell, Jacques, Fotheringham, et al. Developing a summary hospital mortality index: retrospective analysis in English hospitals over five years. Br Med J 2012; 344: 1–11.

2. Canadian Institute for Health Information. HSMR: a new approach for measuring hospital mortality trends in Canada. Canadian Institute for Health Information, 2007.

3. Kim K-H, Ahn L-S. A comparative study on comorbidity measurements with lookback period using health insurance database: focused on patients who underwent percutaneous coronary intervention. J Prev Med Pub Health 2009; 42: 267–273.

4. Kim. Comorbidity adjustment in health insurance claim database. Health Policy and Management 2016; 26: 71–78.

5. Charlson, Carrozzino, Guidi, et al. Charlson comorbidity index: a critical review of clinimetric properties. Psychother Psychosom 2022; 91: 8–35.

6. Feinstein. The pre-therapeutic classification of co-morbidity in chronic disease. J Chronic Dis 1970; 23: 455–468.

7. Alexandrescu, Bottle, Hua Jen, et al. The US hospital standardised mortality ratio: retrospective database study of Massachusetts hospitals. JRSM Open 2015; 6: 2054270414559083.

8. Ben-Tovim, Woodman, Harrison, et al. Measuring and reporting mortality in hospital patients. Canberra: Australian Institute of Health and Welfare, 2009, https://www.researchgate.net/profile/Paul-Hakendorf/publication/228647633_Measuring_and_reporting_mortality_in_hospital_patients/links/09e4150bfbba0e878d000000/Measuring-and-reporting-mortality-in-hospital-patients.pdf (accessed 16 December 2024).

9. Dr Foster Intelligence. Understanding HSMRs: A toolkit on hospital standardised mortality ratios. London: Dr Foster Intelligence, 2011.

10. Quan, Couris, et al. Updating and validating the Charlson comorbidity index and score for risk adjustment in hospital discharge abstracts using data from 6 countries. Am J Epidemiol 2011; 173: 676–682.

11. Charlson, Pompei, Ales, et al. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J Chronic Dis 1987; 40: 373–383.

12. Deyo, Cherkin, Ciol. Adapting a clinical comorbidity index for use with ICD-9-CM administrative databases. J Clin Epidemiol 1992; 45: 613–619.

13. Kim K-H. Comparative study on three algorithms of the ICD-10 Charlson comorbidity index with myocardial infarction patients. J Prev Med Pub Health 2010; 43: 42–49.

14. Quan, Sundararajan, Halfon, et al. Coding algorithms for defining comorbidities in ICD-9-CM and ICD-10 administrative data. Med Care 2005; 43: 1130–1139.

15. Sundararajan, Quan, Halfon, et al. Cross-national comparative performance of three versions of the ICD-10 Charlson index. Med Care 2007; 45: 1210–1215.

16. Romano, Roos, Jollis. Adapting a clinical comorbidity index for use with ICD-9-CM administrative data: differing perspectives. J Clin Epidemiol 1993; 46: 1075–1079.

17. Baek S-K, Park H-J, Kang S-H, et al. Convergence study in development of severity adjustment method for death with acute myocardial infarction patients using machine learning. Journal of Digital Convergence 2019; 17: 217–230.

18. Baek S-K, Park J-H, Kang S-H, et al. A study on the development of severity-adjusted mortality prediction model for discharged patient with acute stroke using machine learning. Journal of the Korea Academia-Industrial Cooperation Society 2018; 19: 126–136.

19. Lim J-H, Nam M-H. Development of mortality model of severity-adjustment method of AMI patients. Journal of the Korea Academia-Industrial Cooperation Society 2012; 13: 2672–2679.

20. Team. Summary Hospital-level Mortality Indicator (SHMI). NHS Digital, 2017.

21. Statistics Korea. Annual report on the causes of death statistics by provinces. Daejeon, Korea: Statistics Korea, 2022.

22. Mandell, Wunderink, Anzueto, et al. Infectious Diseases Society of America/American Thoracic Society consensus guidelines on the management of community-acquired pneumonia in adults. Clin Infect Dis 2007; 44: S27–S72.

23. Costantini, Allara, Patrucco, et al. Adherence to guidelines for hospitalized community-acquired pneumonia over time and its impact on health outcomes and mortality. Intern Emerg Med 2016; 11: 929–940.

24. Desai, Wang, Vaduganathan, et al. Comparison of machine learning methods with traditional models for use of administrative claims with electronic medical records to predict heart failure outcomes. JAMA Netw Open 2020; 3: e1918962.

25. Shahidi, Rennert-May, D’Souza, et al. Machine learning risk estimation and prediction of death in continuing care facilities using administrative data. Sci Rep 2023; 13: 17708.

26. Zhang, Xie, et al. Machine learning-based prediction of mortality in acute myocardial infarction with cardiogenic shock. Front Cardiovasc Med 2024; 11: 1402503.

27. Kim S-S, Kim H-S. The impact of the association between cancer and diabetes mellitus on mortality. J Pers Med 2022; 12: 1099.

28. Horwitz, Partovian, Lin, et al. Hospital-wide (all-condition) 30-day risk-standardized readmission measure. New Haven, CT, https://sites.dartmouth.edu/dac/files/2018/08/mmshospital-wideall-conditionreadmissionrate-z6qxhh.pdf (2011, accessed 10 January 2025).

29. Jain, Khera, Mortensen, et al. Readmissions of adults within three age groups following hospitalization for pneumonia: analysis from the nationwide readmissions database. PLoS One 2018; 13: e0203375.

30. Ryu, Yoo, Kim, et al. Development of prediction models for unplanned hospital readmission within 30 days based on common data model: a feasibility study. Methods Inf Med 2021; 60: e65–e75.

31. Choi, Kim M-H, Kim, et al. Recalibration and validation of the Charlson comorbidity index in an Asian population: the National Health Insurance Service-National Sample Cohort study. Sci Rep 2020; 10: 13715.

32. Lee, Jung, Lee, et al. Recalibration and validation of the Charlson Comorbidity Index in acute kidney injury patients underwent continuous renal replacement therapy. Kidney Res Clin Pract 2022; 41: 332.

33. Fraccaro, Mallen, Urban, et al. Predicting mortality from change-over-time in the Charlson Comorbidity Index. Medicine (Baltimore) 2016; 95: e4973.

34. VanderWeele, Mathur. Some desirable properties of the Bonferroni correction: is the Bonferroni correction really so bad? Am J Epidemiol 2019; 188: 617–618.

35. Choi, Kim S-H, Ock, et al. Evaluation of the validity of risk-adjustment model of acute stroke mortality for comparing hospital performance. Health Policy and Management 2016; 26: 359–372.

36. Health Insurance Review & Assessment Service. The hospital standardized mortality ratio (HSMR) adequacy assessment, https://www.hira.or.kr/cms/open/04/04/12/2023_9.pdf (2023).

37. Lee E-J, Hwang S-H, Lee J-A, et al. Variations in the hospital standardized mortality ratios in Korea. J Prev Med Pub Health 2014; 47: 206.

38. Ngantcha, Le-Pogam M-A, Calmus, et al. Hospital quality measures: are process indicators associated with hospital standardized mortality ratios in French acute care hospitals? BMC Health Serv Res 2017; 17: 578.

39. Shinjo, Fushimi. The degree of severity and trends in hospital standardized mortality ratios in Japan between 2008 and 2012: a retrospective observational study. Int J Qual Health Care 2017; 29: 705–712.

A machine learning model for predicting severity-adjusted in-hospital mortality in pneumonia patients
