Early prediction of 30- and 14-day all-cause unplanned readmissions

Abstract

Background

An unplanned readmission is a dual metric for both the cost and quality of medical care.

Methods

We employed the random forest (RF) method to build a prediction model using a large dataset from patients’ electronic health records (EHRs) from a medical center in Taiwan. The discrimination abilities between the RF and regression-based models were compared using the areas under the ROC curves (AUROC).

Results

Compared with standardized risk prediction tools, the RF constructed using data readily available at admission had a marginally yet significantly better ability to identify high-risk readmissions within 30 and 14 days without compromising sensitivity and specificity. The most important predictor for 30-day readmissions was directly related to factors representing the index hospitalization, whereas for 14-day readmissions the most important predictor was associated with a higher chronic illness burden.

Conclusions

Identifying dominant risk factors based on index admission and different readmission time intervals is crucial for healthcare planning.

Keywords

Healthcare quality, 30- and 14-day readmission, random forest (RF), electronic health records (EHRs)

Introduction

Unplanned readmissions are increasingly being regarded as a dual indicator of medical cost and the quality of care. In line with this, health authorities in several countries have adjusted reimbursements to hospitals in accordance with readmission rates, providing a direct financial incentive to reduce the excessive health care costs caused by readmissions and to enhance the quality of medical care.¹ It is widely recognized that the predictive modeling technique can be used as a decision support tool for hospitals to identify patients with a high risk of readmission and to coordinate targeted interventions for such patients.^2–5 However, most studies dedicated to accurately predicting readmissions are focused on a time frame based on a 30-day interval and are limited by lack of longitudinal data availability following discharge.^3,6

On the one hand, it remains unclear whether hospitals should be held responsible for the quality of healthcare for such a long period after a patient’s discharge. While various time frames have been used to identify readmissions,^7–10 analyses conducted using regression-based models often have poor prediction performance.⁵ On the other hand, to be helpful in clinical operations, a model must identify patients at high risk of readmission at the time of admission rather than after the hospitalization is complete, since early identification positions clinicians to coordinate timely, tailored interventions. Recent studies have used data available upon admission from electronic health records (EHRs), which may make early identification of high-risk patients possible.^11–15 However, these analyses are mostly limited to specific diseases or cohorts and thus have potentially limited generalizability. Although machine learning (ML) algorithms have been used successfully to predict 30-day unplanned readmissions using large datasets at the time of admission,^16–19 only a few studies have applied an ML approach to identify patients at high risk of earlier unplanned readmissions.

The aim of this study is therefore twofold. First, we use an ML algorithm, which takes advantage of big data that are readily accessible at the initial stage of admission from EHRs, to facilitate an early and prospective risk identification of readmission for more coordinated, efficient in-hospital care. Second, we evaluate the likelihood of readmission and the importance of its risk factors which may vary according to the length of the post-discharge time frames.

Methods

Data source, outcome and predictive variables

For this retrospective cohort study, data were obtained from an academic medical center in Taiwan. We collected information on admission, discharge and readmission recorded by the hospital’s computer system for all patients who were discharged alive between 1 January 2018 and 31 December 2019. We examined a total of 99,436 hospitalizations.

The first outcome of interest was all-cause 30-day readmissions defined as readmissions due to the same or related diagnosis within 30 days after index hospitalization. Although a 30-day readmission is widely used both in the literature and as a quality indicator adopted by several countries,¹ we additionally considered a shorter post-discharge time frame to measure readmission risk for two reasons. First, most studies have defined a preventable readmission as a readmission that takes place within the first 15 days.^20,21 Second, Taiwan’s Ministry of Health and Welfare (MHW) has adopted 14-day unplanned readmissions as an indicator of quality of medical care. To determine whether risk factors would differ between 30- and 14-day readmissions, we also employed a 14-day unplanned readmission as a second outcome measure in this study.

Predictors that are readily available within hours of the initial admission were selected for model construction. The ICD-10-CM/PCS (International Statistical Classification of Disease and Related Health Problems, 10th Revision, Clinical Modification/Procedure Coding System) codes in the primary or secondary diagnosis fields on hospital intake forms at index admission were used to identify patients’ comorbidities. These comorbidities are listed in the Charlson comorbidity index (CCI) table. Note that although length of stay (LOS) and some laboratory tests are important factors related to readmissions, these data were not available at the time of index admission. Thus, our final model excluded these two factors to allow a real-time assessment of readmission risk upon admission. In addition, we excluded from the final model predictors related to medical expenses, such as coverage from National Health Insurance (NHI), deductibles, out-of-pocket payments, compensation from additional private health insurance and the patients’ payment method, since these variables are also not readily accessible upon admission. We also excluded observation-unit stays, elective readmissions, transfers to another hospital, and discharges against medical advice or death (Figure 1).

Figure 1.

Sample selection criteria.

The random forest

We employed the Random Forest²² (RF) method to develop the predictive model. Several studies have found that the RF method tends to perform better than other data-driven approaches for solving classification problems.^18-19 It is less computationally expensive than other ML techniques and its built-in variable-importance score can provide some insight into how each variable contributes to the prediction model.

The RF model is comprised of an ensemble of individual decision trees. The ensemble learning algorithm averages predictions over many decision trees to produce more accurate and stable predictions. Each tree is built on a different bootstrap sample rather than on the original sample. At each split of the internal node, only a random subset of predictors was selected rather than the full set of all predictors. We used entropy as the splitting criterion and stopping condition at each split of the classification tree as follows:

$$\text{Entropy} = -\sum_{i=1}^{c} p_{i} \log_{2}(p_{i})$$

where $c$ is the number of patient groups (here two: with and without readmission) and $p_{i}$ is the prior probability of group $i$ (readmitted or non-readmitted). At each partition of the decision tree, the split that minimized the weighted entropy of the resulting child nodes was chosen, which yields the most information gain.²³
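The entropy criterion and the information gain of a candidate split can be illustrated with a minimal Python sketch. This is not the study’s Stata implementation; the function names and the patient counts are hypothetical and chosen only for illustration. The information gain is computed as the parent-node entropy minus the size-weighted entropy of the child nodes.

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a node, given the class counts in that node."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]  # skip empty classes (0 * log 0 = 0)
    return -sum(p * math.log2(p) for p in probs)

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical node: 100 patients, of whom 10 were readmitted.
# Counts are ordered as [non-readmitted, readmitted].
parent = [90, 10]
# Hypothetical candidate split: 40 patients go left (8 readmitted),
# 60 patients go right (2 readmitted).
left, right = [32, 8], [58, 2]
gain = information_gain(parent, [left, right])
print(f"parent entropy = {entropy(parent):.3f} bits, gain = {gain:.3f} bits")
```

A split that separates readmitted from non-readmitted patients more cleanly leaves less entropy in the child nodes and therefore has a larger gain.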

Through the process of bagging and random predictor selection, the RF encourages diversity among the trees and hence reduces overfitting and the variance of the estimator, improving the prediction accuracy of the model. The bootstrap process randomly leaves out about one-third of the observations, which constitute the out-of-bag (OOB) sample.^22,23 This is because the probability of an observation being omitted from a bootstrap sample of size $n$ is ${(1 - \frac{1}{n})}^{n}$, and $\lim_{n \to \infty} {(1 - \frac{1}{n})}^{n} = \exp(-1) \approx 0.3679 \approx \frac{1}{3}$. Although there is no need to cross-validate the RF, since it generates an internal unbiased estimate of the OOB error that tracks model accuracy as the forest is built, we also evaluated the RF prospectively and externally using the 2019 validation dataset to assess the generalizability of the model. Hyper-parameter tuning was conducted using a grid search approach.
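The one-third OOB share can be checked numerically. The following Python sketch is illustrative only: the function names are hypothetical, and the sample size of 25,485 is simply the number of 2018 hospitalizations reported in Table 1. It compares the closed-form omission probability with the share actually left out of one simulated bootstrap sample.

```python
import math
import random

def omission_probability(n):
    """Probability that a given observation is absent from a bootstrap sample of size n."""
    return (1 - 1 / n) ** n

def simulated_oob_fraction(n, seed=0):
    """Draw one bootstrap sample of size n (with replacement) and
    return the share of observations that were never drawn."""
    rng = random.Random(seed)
    drawn = {rng.randrange(n) for _ in range(n)}
    return 1 - len(drawn) / n

n = 25485  # illustrative size: 2018 hospitalizations in Table 1
print(f"(1 - 1/n)^n       = {omission_probability(n):.4f}")
print(f"exp(-1)           = {math.exp(-1):.4f}")
print(f"simulated OOB share = {simulated_oob_fraction(n):.4f}")
```

For samples of this size the closed-form value is already very close to the limit, so treating the OOB sample as roughly one-third of the data is a reasonable approximation.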

The variable importance score was calculated separately for each predictor by summing its contributions to the information gain in the above split criterion over all internal nodes of a tree and across all trees in the forest.²¹ We normalized the variable importance scores to fall within 100% by dividing each score by the maximum score.
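The normalization step can be sketched in a few lines of Python. The predictor names and raw scores below are hypothetical and do not come from the study; only the rule of dividing every score by the maximum is taken from the text.

```python
def normalize_importance(scores):
    """Rescale raw importance scores so that the largest one equals 100%."""
    top = max(scores.values())
    return {name: 100.0 * (value / top) for name, value in scores.items()}

# Hypothetical raw information-gain totals; names and values are illustrative only.
raw = {"admission_type": 0.15, "ed_visits_6m": 0.12, "age": 0.06}
scaled = normalize_importance(raw)
for name, value in sorted(scaled.items(), key=lambda kv: -kv[1]):
    print(f"{name:15s} {value:6.1f}%")
```

After this rescaling the scores express each predictor’s importance relative to the top-ranked predictor; they do not sum to 100%.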

Statistical analysis and model evaluation

We used STATA, version 16.1,²⁴ to perform all statistical analyses and to implement RF modeling. Covariates were compared between the readmitted and non-readmitted groups using the Chi-square test for categorical variables and the t-test for continuous variables. Final RF models were evaluated against the independent hold-out set of 27,719 patients in 2019, which were not used in the development of the RF. We compared the discrimination abilities of the RF and regression-based models using the areas under the ROC curves (AUROC) and DeLong’s test.²⁵ To compare the predictive capabilities of the same RF approach between 30- and 14-day readmissions, we used the bootstrap method to test the difference between the two AUROCs. The optimal cut-point in the ROC analysis was determined by Youden’s index,²⁶ i.e., the point at which the difference between the true positive rate (sensitivity) and the false positive rate (1 − specificity) is maximized.
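To make the AUROC and Youden’s index concrete, the following Python sketch computes both from a small set of predicted risks. It is an illustration under stated assumptions rather than the study’s Stata code: the function names, the eight labels and the risk scores are all hypothetical, and the AUROC is computed through its rank interpretation (the probability that a readmitted case receives a higher score than a non-readmitted one).

```python
def roc_points(labels, scores):
    """Return (threshold, sensitivity, specificity) for each distinct score,
    classifying a case as positive when its score is >= threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        points.append((t, tp / pos, tn / neg))
    return points

def youden_cutpoint(labels, scores):
    """Cut-point maximizing J = sensitivity - (1 - specificity)."""
    return max(roc_points(labels, scores), key=lambda p: p[1] + p[2] - 1)

def auroc(labels, scores):
    """Share of (readmitted, non-readmitted) pairs ranked correctly; ties count one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted risks for eight discharges (1 = readmitted).
labels = [0, 0, 0, 0, 0, 1, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.60, 0.25, 0.70, 0.80]

threshold, sensitivity, specificity = youden_cutpoint(labels, scores)
print(f"AUROC = {auroc(labels, scores):.3f}")
print(f"Youden cut-point = {threshold}, sensitivity = {sensitivity:.2f}, "
      f"specificity = {specificity:.2f}")
```

In this toy example a lower cut-point would catch every readmission but at a much lower specificity; Youden’s index picks the threshold where the sum of the two is largest.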

Results

Sample characteristics

The final data set after exclusion consisted of 53,204 hospitalizations for the 2 years of 2018 and 2019 (Figure 1). The 46,232 excluded hospitalizations comprised 43,107 planned readmissions and 3,125 deaths, transfers and discharges against medical advice. Overall, the 30-day unplanned readmission rate was 4.80% (2,556 out of 53,204). Unplanned readmissions within 14 days and within 15–30 days accounted for 2.93% (1,558 out of 53,204) and 1.88% (998 out of 53,204) of hospitalizations, respectively.

The Chi-squared and t-tests demonstrated significant differences in most prediction factors between those who were readmitted and those who were not, for both 30- and 14-day readmissions, in 2018 and 2019 (Table 1 and Table 2, respectively). Compared with non-readmitted patients, the patients readmitted within 30 or 14 days were mostly male and older, frequently admitted from the ER or oncology department, and more likely to have a major illness identified but less likely to have received surgical procedures. In addition, patients readmitted within 30 or 14 days also had significantly more frequent emergency department (ER) visits in the past 6 months, larger numbers of hospitalizations during the previous 1 year, higher Charlson Comorbidity Index (CCI) scores and longer length of stay (LOS).

Table 1.

Descriptive statistics for the year 2018.

	All (n = 25,485)		30-day readmission					14-day readmission
			y = 0 (24,247)		y = 1 (1,238, 4.86%)		p	y = 0 (24,746)		y = 1 (739, 2.90%)		p
Age							<0.001					<0.001
Mean (SD)	60.24	20.56	60.04	20.46	64.20	22.20		60.15	20.50	63.46	22.25
Min max	0.00	106.50	0.00	106.50	0.00	103.92		0.00	106.50	0.00	103.83
Age²/1000							<0.001					<0.001
Mean (SD)	3.99	2.20	3.96	2.19	4.54	2.41		3.97	2.19	4.45	2.40
Min max	0.00	11.24	0.00	11.24	0.00	10.61		0.00	11.24	0.00	10.61
LOS							<0.001					<0.001
Mean (SD)	9.21	11.55	8.98	11.25	13.80	15.71		9.09	11.44	13.39	14.29
Min max	1.00	228.00	1.00	208.00	1.00	228.00		1.00	228.00	1.00	137.00
ED visits during past 6 months							<0.001					<0.001
Mean (SD)	0.87	1.55	0.82	1.46	1.93	2.52		0.85	1.51	1.82	2.44
Min max	0.00	31.00	0.00	31.00	0.00	18.00		0.00	31.00	0.00	16.00
No. of hospitalizations in past 1 year							<0.001					<0.001
Mean (SD)	0.84	1.74	0.80	1.70	1.58	2.30		0.82	1.71	1.58	2.45
Min max	0.00	39.00	0.00	39.00	0.00	22.00		0.00	39.00	0.00	22.00
CCI							<0.001					<0.001
Mean (SD)	1.76	2.38	1.69	2.32	3.09	3.06		1.71	2.34	3.19	3.12
Min max	0.00	14.00	0.00	14.00	0.00	12.00		0.00	14.00	0.00	11.00
Gender							0.017					0.045
Male	14,291	56.08%	13,556	55.91%	735	59.37%		13,850	55.97%	441	59.68%
Female	11,194	43.92%	10,691	44.09%	503	40.63%		10,896	44.03%	298	40.32%
Type of index admission							<0.001					<0.001
Emergency	9078	35.62%	8328	34.35%	750	60.58%		8,648	34.95%	430	58.19%
Outpatient	16,407	64.38%	15,919	65.65%	488	39.42%		16,098	65.05%	309	41.81%
Surgical procedure							<0.001					<0.001
No	5041	19.78%	4671	19.26%	370	29.89%		4,825	19.50%	216	29.23%
Yes	20,444	80.22%	19,576	80.74%	868	70.11%		19,921	80.50%	523	70.77%
Oncology discharge							<0.001					<0.001
No	25,309	99.31%	24,110	99.43%	1199	96.85%		24,592	99.38%	717	97.02%
Yes	176	0.69%	137	0.57%	39	3.15%		154	0.62%	22	2.98%
Catastrophic illness identity							<0.001					<0.001
No	19,534	76.65%	18,776	77.44%	758	61.23%		19,096	77.17%	438	59.27%
Yes	5951	23.35%	5471	22.56%	480	38.77%		5,650	22.83%	301	40.73%
Ward type							0.263					0.067
Public	13,775	54.05%	13,125	54.13%	650	52.50%		13,400	54.15%	375	50.74%
Private	11,710	45.95%	11,122	45.87%	588	47.50%		11,346	45.85%	364	49.26%
Additional health insurance							0.199					0.653
No	11,140	43.71%	10,577	43.62%	563	45.48%		10,811	43.69%	329	44.52%
Yes	14,345	56.29%	13,670	56.38%	675	54.52%		13,935	56.31%	410	55.48%
CCI group 1							0.837					0.517
No	24,950	97.90%	23,739	97.90%	1211	97.82%		24,229	97.91%	721	97.56%
Yes	535	2.10%	508	2.10%	27	2.18%		517	2.09%	18	2.44%
CCI group 2							<0.001					0.012
No	24,375	95.64%	23,227	95.79%	1148	92.73%		23,682	95.70%	693	93.78%
Yes	1110	4.36%	1020	4.21%	90	7.27%		1064	4.30%	46	6.22%
CCI group 3							0.485					0.933
No	25,131	98.61%	23,913	98.62%	1218	98.38%		24,402	98.61%	729	98.65%
Yes	354	1.39%	334	1.38%	20	1.62%		344	1.39%	10	1.35%
CCI group 4							0.265					0.844
No	23,924	93.87%	22,771	93.91%	1153	93.13%		23,229	93.87%	695	94.05%
Yes	1561	6.13%	1476	6.09%	85	6.87%		1517	6.13%	44	5.95%
CCI group 5							<0.001					0.037
No	24,853	97.52%	23,669	97.62%	1184	95.64%		24,141	97.56%	712	96.35%
Yes	632	2.48%	578	2.38%	54	4.36%		605	2.44%	27	3.65%
CCI group 6							<0.001					0.001
No	24,394	95.72%	23,240	95.85%	1154	93.21%		23,705	95.79%	689	93.23%
Yes	1091	4.28%	1007	4.15%	84	6.79%		1041	4.21%	50	6.77%
CCI group 7							0.166					0.279
No	25,091	98.45%	23,878	98.48%	1213	97.98%		24,367	98.47%	724	97.97%
Yes	394	1.55%	369	1.52%	25	2.02%		379	1.53%	15	2.03%
CCI group 8							0.014					0.006
No	24,466	96.00%	23,294	96.07%	1172	94.67%		23,771	96.06%	695	94.05%
Yes	1019	4.00%	953	3.93%	66	5.33%		975	3.94%	44	5.95%
CCI group 9							<0.001					0.002
No	23,780	93.31%	22,656	93.44%	1124	90.79%		23,111	93.39%	669	90.53%
Yes	1705	6.69%	1591	6.56%	114	9.21%		1635	6.61%	70	9.47%
CCI group 10							0.031					0.018
No	21,146	82.97%	20,091	82.86%	1055	85.22%		20,509	82.88%	637	86.20%
Yes	4339	17.03%	4156	17.14%	183	14.78%		4237	17.12%	102	13.80%
CCI group 11							<0.001					0.037
No	24,333	95.48%	23,178	95.59%	1155	93.30%		23,639	95.53%	694	93.91%
Yes	1152	4.52%	1069	4.41%	83	6.70%		1107	4.47%	45	6.09%
CCI group 12							0.641					0.191
No	25,138	98.64%	23,915	98.63%	1223	98.79%		24,405	98.62%	733	99.19%
Yes	347	1.36%	332	1.37%	15	1.21%		341	1.38%	6	0.81%
CCI group 13							<0.001					0.002
No	23,571	92.49%	22,466	92.65%	1105	89.26%		22,909	92.58%	662	89.58%
Yes	1914	7.51%	1781	7.35%	133	10.74%		1837	7.42%	77	10.42%
CCI group 14							<0.001					<0.001
No	19,416	76.19%	18,678	77.03%	738	59.61%		18,992	76.75%	424	57.37%
Yes	6069	23.81%	5569	22.97%	500	40.39%		5754	23.25%	315	42.63%
CCI group 15							<0.001					<0.001
No	25,210	98.92%	24,014	99.04%	1196	96.61%		24,497	98.99%	713	96.48%
Yes	275	1.08%	233	0.96%	42	3.39%		249	1.01%	26	3.52%
CCI group 16							<0.001					<0.001
No	23,518	92.28%	22,519	92.87%	999	80.69%		22,937	92.69%	581	78.62%
Yes	1967	7.72%	1728	7.13%	239	19.31%		1809	7.31%	158	21.38%
CCI group 17							<0.001					0.425
No	25,419	99.74%	24,190	99.76%	1229	99.27%		24,683	99.75%	736	99.59%
Yes	66	0.26%	57	0.24%	9	0.73%		63	0.25%	3	0.41%

Table 2.

Descriptive statistics for the year 2019.

	All (n = 27,719)		30-day readmission					14-day readmission
			y = 0 (26,401)		y = 1 (1,318, 4.75%)		p	y = 0 (26,900)		y = 1 (819, 2.95%)		p
Age							<0.001					<0.001
Mean (SD)	60.06	20.62	59.79	20.51	65.38	21.96		59.93	20.55	64.31	22.35
Min max	0.00	107.67	0.00	107.67	0.00	102.92		0.00	107.67	0.00	100.83
Age²/1000							<0.001					<0.001
Mean (SD)	3.97	2.19	3.93	2.17	4.69	2.39		3.95	2.18	4.57	2.40
Min max	0.00	11.45	0.00	11.45	0.00	10.40		0.00	11.45	0.00	10.00
LOS							<0.001					<0.001
Mean (SD)	8.73	11.06	8.49	10.70	13.43	15.98		8.59	10.93	13.43	13.76
Min max	1.00	315.00	1.00	173.00	1.00	315.00		1.00	315.00	1.00	122.00
ED visits in past 6 months							<0.001					<0.001
Mean (SD)	0.89	1.71	0.82	1.54	2.29	3.40		0.85	1.61	2.28	3.37
Min max	0.00	53.00	0.00	38.00	0.00	53.00		0.00	53.00	0.00	50.00
No. of hospitalizations in past 1 year							<0.001					<0.001
Mean (SD)	0.91	1.93	0.86	1.88	1.81	2.60		0.88	1.90	1.76	2.70
Min max	0.00	33.00	0.00	33.00	0.00	26.00		0.00	33.00	0.00	26.00
CCI							<0.001					<0.001
Mean (SD)	1.74	2.37	1.67	2.32	3.06	3.02		1.70	2.34	2.99	3.01
Min max	0.00	16.00	0.00	16.00	0.00	13.00		0.00	16.00	0.00	12.00
Gender							0.01					0.026
Male	15,424	55.64%	14,645	55.47%	779	59.10%		14,937	55.53%	487	59.46%
Female	12,295	44.36%	11,756	44.53%	539	40.90%		11,963	44.47%	332	40.54%
Type of index admission							<0.001					<0.001
Emergency	9,678	34.91%	8,841	33.49%	837	63.51%		9,161	34.06%	517	63.13%
Outpatient	18,041	65.09%	17,560	66.51%	481	36.49%		17,739	65.94%	302	36.87%
Surgical procedure							0.002					0.04
No	12,825	46.27%	12,271	46.48%	554	42.03%		12,475	46.38%	350	42.74%
Yes	14,894	53.73%	14,130	53.52%	764	57.97%		14,425	53.62%	469	57.26%
Oncology discharge							<0.001					<0.001
No	27,557	99.42%	26,272	99.51%	1,285	97.50%		26,753	99.45%	804	98.17%
Yes	162	0.58%	129	0.49%	33	2.50%		147	0.55%	15	1.83%
Catastrophic illness identity							<0.001					<0.001
No	21,077	76.04%	20,273	76.79%	804	61.00%		20,580	76.51%	497	60.68%
Yes	6,642	23.96%	6,128	23.21%	514	39.00%		6,320	23.49%	322	39.32%
Ward type							0.249					0.784
Public	14,819	53.46%	14,094	53.38%	725	55.01%		14,385	53.48%	434	52.99%
Private	12,900	46.54%	12,307	46.62%	593	44.99%		12,515	46.52%	385	47.01%
Additional health insurance							0.491					0.231
No	11,734	42.33%	11,164	42.29%	570	43.25%		11,404	42.39%	330	40.29%
Yes	15,985	57.67%	15,237	57.71%	748	56.75%		15,496	57.61%	489	59.71%
CCI group 1							0.024					0.113
No	27,170	98.02%	25,867	97.98%	1303	98.86%		26,361	98.00%	809	98.78%
Yes	549	1.98%	534	2.02%	15	1.14%		539	2.00%	10	1.22%
CCI group 2							<0.001					0.086
No	26,408	95.27%	25,190	95.41%	1218	92.41%		25,638	95.31%	770	94.02%
Yes	1311	4.73%	1211	4.59%	100	7.59%		1262	4.69%	49	5.98%
CCI group 3							0.198
No	27,164	98.00%	25,866	97.97%	1298	98.48%		26,356	97.98%	808	98.66%
Yes	555	2.00%	535	2.03%	20	1.52%		544	2.02%	11	1.34%
CCI group 4							0.053					0.109
No	25,958	93.65%	24,707	93.58%	1251	94.92%		25,180	93.61%	778	94.99%
Yes	1761	6.35%	1694	6.42%	67	5.08%		1720	6.39%	41	5.01%
CCI group 5							0.235					0.581
No	27,055	97.60%	25,775	97.63%	1280	97.12%		26,258	97.61%	797	97.31%
Yes	664	2.40%	626	2.37%	38	2.88%		642	2.39%	22	2.69%
CCI group 6							<0.001					<0.001
No	26,567	95.84%	25,355	96.04%	1212	91.96%		25,817	95.97%	750	91.58%
Yes	1152	4.16%	1046	3.96%	106	8.04%		1083	4.03%	69	8.42%
CCI group 7							0.058					0.019
No	27,337	98.62%	26,045	98.65%	1292	98.03%		26,537	98.65%	800	97.68%
Yes	382	1.38%	356	1.35%	26	1.97%		363	1.35%	19	2.32%
CCI group 8							<0.001					0.028
No	26,750	96.50%	25,503	96.60%	1247	94.61%		25,971	96.55%	779	95.12%
Yes	969	3.50%	898	3.40%	71	5.39%		929	3.45%	40	4.88%
CCI group 9							<0.001					<0.001
No	25,903	93.45%	24,749	93.74%	1154	87.56%		25,181	93.61%	722	88.16%
Yes	1816	6.55%	1652	6.26%	164	12.44%		1719	6.39%	97	11.84%
CCI group 10							0.035					0.222
No	23,228	83.80%	22,096	83.69%	1132	85.89%		22,529	83.75%	699	85.35%
Yes	4491	16.20%	4305	16.31%	186	14.11%		4371	16.25%	120	14.65%
CCI group 11							<0.001					0.001
No	26,313	94.93%	25,098	95.06%	1215	92.19%		25,556	95.00%	757	92.43%
Yes	1406	5.07%	1303	4.94%	103	7.81%		1344	5.00%	62	7.57%
CCI group 12							0.005					0.013
No	27,290	98.41%	25,980	98.41%	1310	99.39%		26,475	98.42%	815	99.51%
Yes	429	1.59%	421	1.59%	8	0.61%		425	1.58%	4	0.49%
CCI group 13							<0.001					<0.001
No	25,491	91.96%	24,336	92.18%	1155	87.63%		24,772	92.09%	719	87.79%
Yes	2228	8.04%	2065	7.82%	163	12.37%		2128	7.91%	100	12.21%
CCI group 14							<0.001					<0.001
No	21,474	77.47%	20,642	78.19%	832	63.13%		20,952	77.89%	522	63.74%
Yes	6245	22.53%	5759	21.81%	486	36.87%		5948	22.11%	297	36.26%
CCI group 15							<0.001					<0.001
No	27,418	98.91%	26,169	99.12%	1249	94.76%		26,642	99.04%	776	94.75%
Yes	301	1.09%	232	0.88%	69	5.24%		258	0.96%	43	5.25%
CCI group 16							<0.001					<0.001
No	25,616	92.41%	24,545	92.97%	1071	81.26%		24,949	92.75%	667	81.44%
Yes	2103	7.59%	1856	7.03%	247	18.74%		1951	7.25%	152	18.56%
CCI group 17							0.313					0.469
No	27,651	99.75%	26,338	99.76%	1313	99.62%		26,833	99.75%	818	99.88%
Yes	68	0.25%	63	0.24%	5	0.38%		67	0.25%	1	0.12%

RF training and variable importance

Figure 2 shows that both the OOB and the validation errors stabilized below 2% as the number of subtrees (i.e., iterations) reached 500. We therefore set the number of iterations at 500, since we found that more than 500 iterations could result in overfitting. The number of trees, the tree depth, and the size of the predictor subset for the best-performing RF model were 500, 5 and 6, respectively.

Figure 2.

Out-of-Bag and validation errors versus iterations (the number of subtrees). Note: The number of subtrees starts at 5 and is incremented by 5 every time until it reaches 500. Each run produces a pair of OOB and validation errors in the plot.

The variable-importance scores for the predictors are presented in Figure 3 and Figure 4 for 30- and 14-day readmission predictions. The three most important predictors for 30-day readmissions were type of index admission, ER visits in the past 6 months, and comorbidities of metastatic solid tumors, while those for 14-day readmissions were comorbidities of any malignancy and metastatic solid tumors and ER visits in the past 6 months. Final predictors included in the RF model were those that had high variable importance scores and were available at the initial stage of admission, as shown in Figures 4(a),(b).

Figure 3.

(a). Variable importance scores for 30-day readmissions using all predictors. (b) Variable importance scores for 14-day readmissions using all predictors. Note: ED, emergency department; LOS, length of stay; CCI group 1–17, 17 groups by Charlson Comorbidities; HGB, hemoglobin; cc, credit card.

Figure 4.

(a). Variable importance scores for 30-day readmissions using predictors available at admission. (b). Variable importance scores for 14-day readmissions using predictors available at admission. Note: ED, emergency department; LOS, length of stay; CCI group 1–17, 17 groups by Charlson Comorbidities.

Performance of RF for predictions of 30- and 14-day readmissions

We found the RF algorithm to be capable of capturing nonlinearities and higher-order interactions among variables and of handling multicollinearity, thus making better predictions for readmission when applied to the large, high-dimensional EHR datasets in this study. Figures 5(a)–(d) and Figures 6(a)–(d) show the ROC curves of the RF and logistic regression (LG) in the eight models for the 30- and 14-day readmission predictions, respectively. Model 1 employed the four variables of the LACE index, i.e., LOS, type of index admission, the Charlson comorbidity index and ER visits in the past 6 months. Model 2 added gender and age to Model 1. The full model consisted of all variables retrieved from patients’ EHRs, while the final model included only predictors that were readily available at the initial stage of admission.

Figure 5.

Model comparisons for 30-day readmissions. (a). Model 1 for 30-day readmission. (b). Model 2 for 30-day readmission. (c). Full model for 30-day readmission. (d). Final model for 30-day readmission.

Figure 6.

Model comparisons for 14-day readmissions. (a). Model 1 for 14-day readmission. (b). Model 2 for 14-day readmission. (c). Full model for 14-day readmission. (d). Final model for 14-day readmission.

In the final models, the RF had higher AUCs (0.77 and 0.75 for 30- and 14-day readmissions, respectively) than logistic regression (0.75 and 0.74), and the RF also had the higher AUC in every other model (Table 3 and Table 4). It can be seen from Figures 5(a)–(d) and Figures 6(a)–(d) that the ROC curve of the RF lay entirely above that of the regression-based models, suggesting that the RF had better prediction abilities. DeLong’s test²⁵ also showed the differences in the C-statistics between the RF and the logistic regression to be significant (p < 0.001). Note that although LOS had the highest variable importance score, models without this variable performed similarly. At its optimal threshold as determined by Youden’s index,²⁶ the proposed RF model reached higher sensitivity than logistic regression (66.62% vs. 68.38%) without compromising specificity (73.37% vs. 69.70%) or accuracy (73.05% vs. 69.66%) for 30- vs. 14-day readmissions, respectively. In all models, the RF proved more informative, as evidenced by its higher positive likelihood ratio (LR+) and comparable or lower negative likelihood ratio (LR−) (see Table 3 and Table 4). The bootstrap results showed that the differences in the C-statistics between 30- and 14-day readmissions for the same RF approach were not significant in any of the four models.
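The likelihood ratios follow directly from sensitivity and specificity: LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. The short Python sketch below applies these standard definitions to the final-model RF values quoted above; the function name is hypothetical, and because the inputs are already rounded the results should agree with the tabulated likelihood ratios only up to rounding.

```python
def likelihood_ratios(sensitivity_pct, specificity_pct):
    """Return (LR+, LR-) from sensitivity and specificity given in percent."""
    sens = sensitivity_pct / 100.0
    spec = specificity_pct / 100.0
    lr_pos = sens / (1.0 - spec)    # how much a positive prediction raises the odds
    lr_neg = (1.0 - sens) / spec    # how much a negative prediction lowers the odds
    return lr_pos, lr_neg

# Final-model RF sensitivity and specificity for 30- and 14-day readmissions.
lr_30 = likelihood_ratios(66.62, 73.37)
lr_14 = likelihood_ratios(68.38, 69.70)
print(f"30-day: LR+ = {lr_30[0]:.2f}, LR- = {lr_30[1]:.2f}")
print(f"14-day: LR+ = {lr_14[0]:.2f}, LR- = {lr_14[1]:.2f}")
```

A more informative classifier has a larger LR+ and a smaller LR−, which is why the two ratios are read in opposite directions when comparing models.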

Table 3.

Comparisons of AUROC and relevant metrics at optimal cutoff point for 30-day readmission prediction.

		AUROC	Std. Err.	95% Conf. Interval	Sensitivity, %	Specificity, %	Accuracy, %	LR+	LR−
Model 1	LG	0.7525	0.0067	0.7394–0.7657	68.21	70.26	70.16	2.29	0.45
Model 1	RF	0.7686	0.0068	0.7553–0.7820	65.25	75.37	74.89	2.65	0.46
Model 2	LG	0.7517	0.0069	0.7381–0.7653	64.42	73.32	72.89	2.27	0.46
Model 2	RF	0.7727	0.0067	0.7597–0.7858	66.54	74.53	74.15	2.61	0.45
Full model	LG	0.7665	0.0068	0.7532–0.7798	69.50	70.30	70.27	2.34	0.43
Full model	RF	0.7786	0.0067	0.7656–0.7917	69.80	72.14	72.03	2.51	0.42
Final model	LG	0.7545	0.007	0.7408–0.7681	66.08	73.21	72.87	2.47	0.46
Final model	RF	0.7653	0.007	0.7516–0.7790	66.62	73.37	73.05	2.50	0.46

Note: LG, logistic regression; RF, random forest.

Table 4.

Comparisons of AUROC and relevant metrics at optimal cutoff point for 14-day readmission prediction.

		AUROC	Std. Err.	95% Conf. Interval	p	Sensitivity, %	Specificity, %	Accuracy, %	LR+	LR−
Model 1	LG	0.7449	0.0082	0.7288–0.7610	0.000	67.16	69.52	69.45	2.20	0.47
Model 1	RF	0.7585	0.0085	0.7419–0.7751	0.000	65.20	74.20	73.93	2.47	0.47
Model 2	LG	0.7406	0.0087	0.7236–0.7577	0.000	62.76	72.56	72.27	2.15	0.50
Model 2	RF	0.7638	0.0082	0.7477–0.7799	0.000	66.18	73.28	73.07	2.48	0.46
Full model	LG	0.7527	0.0086	0.7360–0.7695	0.001	72.28	64.24	64.48	2.02	0.43
Full model	RF	0.7661	0.0082	0.7500–0.7822	0.001	71.06	67.57	67.67	2.19	0.43
Final model	LG	0.7420	0.0086	0.7250–0.7589	0.004	67.28	69.25	69.19	2.19	0.47
Final model	RF	0.7522	0.0088	0.7350–0.7694	0.004	68.38	69.70	69.66	2.26	0.45

Note: LG, logistic regression; RF, random forest.

Discussion

The novelty of this study is that it develops and prospectively validates predictive models using the RF for the identification of 30- and 14-day readmissions and the associated dominant risk factors at the time of admission. We focused on a subset of predictors that are readily available at the time of admission to construct models without sacrificing predictive performance. These predictors are easily accessible in most patients’ EHRs, which means that the results of this study may provide valuable insights to other medical institutions for generalization and serve as a decision support tool.

There are several key observations to be made from the results. First, the 30- and 14-day readmission rates in the current study were 4.86% and 2.90%, respectively, in 2018, and 4.75% and 2.95% in 2019. About 61% of the 30-day readmissions (1,558 of 2,556) occurred within 14 days, which is consistent with literature reporting that the likelihood of unplanned readmissions is higher in the early post-discharge period.²⁷ In addition, the prediction performance was better for 30-day readmissions (AUC = 0.77 for RF) than for 14-day readmissions (AUC = 0.75 for RF). These findings are in agreement with previous studies which found that readmissions within a shorter period post-discharge were harder to predict because the class distribution is much more skewed, with an imbalanced number of non-readmissions and unplanned readmissions.⁹

Second, ER visits in the past 6 months was an important predictor in all scenarios, i.e., for predictions using all predictors and for those using only predictors available at admission, for both 30- and 14-day readmissions. However, the importance of other risk factors for readmissions differed between the 30- and 14-day time frames. Recently, several studies have indicated that readmissions within a shorter period of time may be more preventable than those in the later part of the 30-day period.^{10,20,21,28,29} We found the most important predictor for 30-day readmissions to be directly related to the factors representing the index hospitalization, i.e., ER admissions. However, the most important predictor of 14-day readmissions was associated with a higher chronic illness burden of refractory malignancies and lower socioeconomic status such as ward type or gender, variables beyond a hospital’s control. This could be explained by the retrospective nature of the pre-existing data employed in the current study and by the fact that our results are based on data obtained from a medical center in Taiwan, where NHI reimbursement is reduced if patients are readmitted due to the same or related diagnosis within 14 days after discharge. Such financial pressure may create an incentive for hospitals to reduce preventable readmissions within 14 days. Thus, risk factors for 14-day readmission that can be attributed to the quality of a hospital’s care were rare in the current dataset. For example, every patient in Taiwan has a follow-up visit in an ambulatory care department within 2 weeks after discharge. The 14-day readmission indicator adopted by Taiwan’s MHW appears to have an effect in terms of reducing preventable readmissions related to index admissions or acute illness burden. Our findings confirm the arguments from the literature that shorter-term readmissions motivate clinicians, who generally feel more responsible for events that are related to illness acuity.³⁰

Third, in contrast to most risk prediction models derived using data available at discharge,^3,6 we selected variables so that patients at high risk of readmission can be identified earlier during their hospitalization, in particular by using the index diagnosis codes to classify patients’ disease status based on the Charlson comorbidity index. Our RF prediction model successfully identified several useful predictors that were easily accessible at the time of admission using the patients’ EHRs. Using predictors that are readily available at admission allows more time for interventions to be started, thereby helping clinicians provide timely and appropriate treatment for their patients. Such interventions should improve the quality of healthcare within the hospital’s control. It has been suggested that the main factors influencing the variability in 30-day readmission rates are hard for hospitals to change since they are considered to be an intrinsic part of a hospital’s patient population. Access to the healthcare system as well as patients’ socioeconomic determinants of health also play a crucial role.³¹ If we are to hold hospitals responsible for readmissions and yet not penalize those that care for more vulnerable patients, then a shorter time window may be considered a more equitable metric of accountability for hospitals.^30,31

The findings of the current study can potentially contribute to the development of mobile health (mHealth) clinical decision support tools for reducing readmissions. The dominant risk factors identified upon admission in the current study can be directly integrated into the EHRs or made available to frontline clinical staff via an mHealth app to initiate just-in-time pre-discharge interventions, such as patient education and medication reconciliation, to achieve a better quality of care. The penetration rate of Internet and mobile phone use in Taiwan is high, so the use of mHealth apps holds great potential for healthcare providers. These apps can be used to enhance the control of risk factors for targeted patients, to improve patients’ medication adherence, and to encourage better communication between patients and clinical professionals. Recently, throughout the COVID-19 pandemic, people in Taiwan have become much more familiar with mHealth technologies. Further investigation could be conducted to assess the costs and benefits of integrating our findings into an mHealth app to reduce unplanned readmissions.

There are limitations to this study that should be noted. For clinical operational purposes, our focus was on developing and implementing a real-time readmission prediction model. As a result, some popular readmission predictors, such as length of stay and laboratory tests, were not included in the final model, since these variables are not available upon admission. The RF constructed in the current study using data that are readily available at admission has a marginally yet significantly better ability to identify high-risk readmissions within 30 and 14 days without compromising sensitivity and specificity. However, caution should be taken when interpreting the prediction performance of ML techniques, since their benefit may depend on factors such as sample size, data dimension, and the disease investigated.³² It is well recognized that healthcare data are often nonlinear and are difficult to capture using simple algorithms, especially in view of the complexity of readmission patterns.³³ Thus, the improvement in prediction performance of RF over logistic regression in the present study, though modest, could be due to potentially greater nonlinearity and higher-order interactions between the predictors and the readmission outcome, as well as to missing values. However, a few studies have reported that ML algorithms may not outperform traditional regression approaches in low-dimensional settings for outcome prediction, because ML techniques are thought to perform better when a large number of predictors is considered.³² The relatively low-dimensional setting of the current study may therefore limit the advantage of RF over logistic regression. We considered only a relatively small number of predictors available upon admission mainly to obtain a real-time assessment of high-risk readmission for clinical practice. Another reason for not including more predictors was related to validation: clinical variables often have different definitions, notations, or units, which complicates the validation procedure when a large number of predictors is used.³⁴ Thus, we believe that the low-dimensional setting in our study might be more clinically meaningful. Further investigation may focus on thresholds for deciding, through cost–benefit analysis, whether an improvement in the AUC is clinically important when the predictive model is applied.
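One way to frame the cost–benefit question of whether a modest gain in discrimination matters clinically is the net benefit used in decision-curve analysis, which weighs true positives against false positives at a chosen risk threshold. The sketch below is an illustration under stated assumptions only: it was not part of the present analysis, the function names and the toy data are hypothetical, and in practice the threshold would need to reflect the actual costs of the intervention and of a readmission.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit of flagging patients whose predicted risk >= threshold.

    net benefit = TP/N - (FP/N) * threshold / (1 - threshold)
    y_true: 0/1 readmission outcomes; risk: predicted probabilities.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    if len(y_true) != len(risk) or not y_true:
        raise ValueError("y_true and risk must be non-empty and equally long")
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: intervene on every patient regardless of risk."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)


if __name__ == "__main__":
    # Hypothetical toy cohort: 10 patients, 3 readmitted.
    outcomes = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
    model_risk = [0.8, 0.3, 0.6, 0.1, 0.4, 0.2, 0.1, 0.5, 0.2, 0.1]
    pt = 0.25  # assumed risk threshold for starting an intervention
    print("model:", net_benefit(outcomes, model_risk, pt))
    print("treat all:", net_benefit_treat_all(outcomes, pt))
```

Comparing the net benefit of two models at the same threshold, rather than their AUC values alone, gives one possible criterion for judging whether a difference in discrimination is large enough to change practice.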

Moreover, as mentioned earlier, Taiwan’s MHW has adopted the 14-day readmission rate as a quality indicator and has tied it to NHI reimbursements. In response to such financial penalties, hospitals might take active measures to reduce readmissions within 14 days, which may lead to selection bias in identifying the importance of risk factors for readmission. Most retrospective studies share this problem. Although we explicitly validated the RF method using data from a different time period, the generalizability of our findings could be limited. Further investigations are needed to determine whether and how differences in hospital regulations and care practices affect the external generalizability of readmission prediction models. Notwithstanding these limitations, our results suggest that this real-time prediction model holds great potential without sacrificing discriminatory capability, and that it offers some insight into the clinical utility of different readmission time intervals as a decision support tool.

Conclusion

It is crucial for healthcare planners to identify dominant risk factors based on index admission and different readmission time intervals because the causal pathways of earlier and later readmissions may be distinctly different. Earlier and later readmissions may therefore require different evaluation criteria and subsequent interventions.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Ministry of Science and Technology (grant numbers MOST109-2410-H-992-005, MOST 110-2410-H-992-015); and the Kaohsiung Veterans General Hospital (grant number KSVGH110-D07-1).

Ethical approval

This study was approved by the Institutional Review Board of Kaohsiung Veterans General Hospital (IRB No.: KSVGH20-CT4-11; KSVGH22-CT7-06).

ORCID iD

Shuofen Hsu

References

1. Kristensen, Bech, Quentin. A roadmap for comparing readmission policies with application to Denmark, England, Germany and the United States. Health Policy 2015; 119(3): 264–273.

2. Van Walraven, Dhalla, Bell, et al. Derivation and validation of an index to predict early death or unplanned readmission after discharge from hospital to the community. CMAJ 2010; 182(6): 551–557.

3. Kansagara, Englander, Salanitro, et al. Risk prediction models for hospital readmission: A systematic review. JAMA 2011; 306: 1688–1698.

4. Van Walraven, Wong, Forster. LACE+ index: Extension of a validated index to predict early death or urgent readmission after hospital discharge using administrative data. Open Med 2012; 6(3): e80–e90.

5. Donzé, Williams, Robinson, et al. International validity of the HOSPITAL score to predict 30-day potentially avoidable hospital readmissions. JAMA Intern Med 2016; 176(4): 496–502.

6. Zhou, Della, Roberts, et al. Utility of models to predict 28-day or 30-day unplanned hospital readmissions: An updated systematic review. BMJ Open 2016; 6(6): e011060.

7. Chin, Bang, Manickam, et al. Rethinking thirty-day hospital readmissions: Shorter intervals might be better indicators of quality of care. Health Aff 2016; 35(10): 1867–1875.

8. Graham, Dike, Doctoroff, et al. Preventability of early vs. late readmissions in an academic medical center. PLoS One 2017; 12(6): e0178718.

9. Maali, Perez-Concha, Coiera, et al. Predicting 7-day, 30-day and 60-day all-cause unplanned readmission: A case study of a Sydney hospital. BMC Med Inform Decis Mak 2018; 18(1): 1.

10. Chang, Chen, et al. Factors associated with early 14-day unplanned hospital readmission: A matched case–control study. BMC Health Serv Res 2021; 21: 870.

11. Cronin, Greenwald, Crevensten, et al. Development and implementation of a real-time 30-day readmission predictive model. AMIA Annu Symp Proc 2014; 2014: 424–431.

12. Cai, Perez-Concha, Coiera, et al. Real-time prediction of mortality, readmission, and length of stay using electronic health record data. J Am Med Inform Assoc 2016; 23(3): 553–561.

13. Shadmi, Flaks-Manov, Hoshen, et al. Predicting 30-day readmissions with preadmission electronic health record data. Med Care 2015; 53(3): 283–289.

14. Frizzell, Liang, Schulte, et al. Prediction of 30-day all-cause readmissions in patients hospitalized for heart failure: Comparison of machine learning and other statistical approaches. JAMA Cardiol 2017; 2(2): 204–209.

15. Benuzillo, Caine, Evans, et al. Predicting readmission risk shortly after admission for CABG surgery. J Card Surg 2018; 33: 163–170.

16. Lin, Hsu, et al. Comparison of back-propagation neural network, LACE Index and HOSPITAL score in predicting all-cause risk of 30-day readmission. Risk Manag Healthc Policy 2021; 14: 3853–3864.

17. Jamei, Nisnevich, Wetchler, et al. Predicting all-cause risk of 30-day hospital readmission using artificial neural networks. PLoS One 2017; 12(7): e0181173.

18. Bleich, Cole, Kapelner, et al. Using random forests with asymmetric costs to predict hospital readmissions. medRxiv 2021; 2021–03. DOI: 10.1101/2021.03.15.21253416

19. Deschepper, Eeckloo, Vogelaers, et al. A hospital wide predictive model for unplanned readmission using hierarchical ICD data. Comput Methods Programs Biomed 2019; 173: 177–183. DOI: 10.1016/j.cmpb.2019.02.007

20. Goldfield, McCullough, Hughes, et al. Identifying potentially preventable readmissions. Health Care Financ Rev 2008; 30: 75–91.

21. Graham, Wilker, Howell, et al. Differences between early and late readmissions among patients: A cohort study. Ann Intern Med 2015; 162(11): 741–749.

22. Breiman. Random forests. Mach Learn 2001; 45: 5–32.

23. Schonlau, Zou. The random forest algorithm for statistical learning. The Stata Journal 2020; 20(1): 3–29.

24. StataCorp. Stata Statistical Software: Release 16. College Station, TX: StataCorp LLC, 2019.

25. DeLong, Clarke-Pearson. Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach. Biometrics 1988; 44(3): 837–845.

26. Youden. Index for rating diagnostic tests. Cancer 1950; 3(1): 32–35.

27. Jencks, Williams, Coleman. Rehospitalizations among patients in the Medicare fee-for-service program. N Engl J Med 2009; 360(14): 1418–1428.

28. Auerbach, Kripalani, Vasilevskis, et al. Preventability and causes of readmissions in a national cohort of general medicine patients. JAMA Intern Med 2016; 176(4): 484–493.

29. Graham, Auerbach, Schnipper, et al. Preventability of early versus late hospital readmissions in a national cohort of general medicine patients. Ann Intern Med 2018; 168(11): 766–774.

30. Joynt, Jha. Thirty-day readmissions--truth and consequences. N Engl J Med 2012; 366(15): 1366–1369.

31. Joynt, Jha. A path forward on Medicare readmissions. N Engl J Med 2013; 368(13): 1175–1177.

32. Liew BXW, Kovacs, Rügamer, et al. Machine learning versus logistic regression for prognostic modelling in individuals with non-specific neck pain. Eur Spine J 2022; 31(8): 2082–2091.

33. Brunner-La Rocca, Peden, Soong, et al. Reasons for readmission after hospital discharge in patients with chronic diseases-Information from an international dataset. PLoS One 2020; 15(6): e0233457.

34. Gravesteijn, Nieboer, Ercole, CENTER-TBI collaborators, et al. Machine learning algorithms performed no better than regression models for prognostication in traumatic brain injury. J Clin Epidemiol 2020; 122: 95–107.