Abstract
Purpose
This study developed machine learning models to predict in-hospital mortality, initiation of acute renal replacement therapy, and mechanical ventilation in patients with acute heart failure receiving furosemide in intensive care units.
Method
Data comprising static and dynamic features, obtained from an extensive database of a Japanese hospital chain, were used to construct and train the machine learning models.
Results
The proposed machine learning models predicted in-hospital mortality, initiation of acute renal replacement therapy, and initiation of mechanical ventilation with good accuracy. However, the optimal model varied with the outcome predicted. The linear support vector machine classification models showed the highest accuracy for predicting in-hospital mortality and mechanical ventilation, with areas under the receiver operating characteristic curve of 0.73 and 0.73, respectively, whereas the multi-layer neural network showed the highest accuracy for predicting acute renal replacement therapy initiation, with an area under the receiver operating characteristic curve of 0.70.
Conclusions
This study demonstrated that machine learning models could help predict the clinical outcomes of patients with acute heart failure receiving furosemide. However, the optimal model may differ depending on the outcome of interest.
Introduction
Acute heart failure (AHF) remains a leading global health concern, with an estimated prevalence exceeding 37.7 million individuals worldwide.1 AHF is a heterogeneous clinical syndrome with complex and incompletely understood pathophysiology. Recommended treatments for the early management of AHF, including intravenous diuretics, oxygen, and vasodilators, have been associated with better outcomes.2–4
Diuretics are essential for patients with AHF because they provide rapid symptomatic relief and control fluid retention. However, in clinical practice it is difficult to determine whether a given patient with AHF will actually benefit from these drugs; relevant factors include age, ejection fraction, renal function, and disease severity. Accurate prediction of outcomes is therefore a challenging task for physicians. If a patient with AHF does not respond adequately to furosemide therapy, mechanical ventilation and dialysis may become necessary.
Predicting outcomes and estimating the risk of events help identify patients who are most likely to benefit from targeted interventions, thereby preventing future events. Recently, machine learning (ML) approaches have been applied to various areas of medicine to predict the risk of clinical events.5,6 Electronic health records (EHRs) contain important information about individual patients and have thus been widely used to predict patient outcomes; they commonly include diagnoses, drug administration, vital signs, and laboratory test results. By learning from existing data, ML approaches have also been applied to improve the accuracy and efficiency of predicting diagnosis, mortality, and rehospitalization in patients with heart failure (HF).7–9 An accurate EHR-based prediction model for patients with AHF may provide physicians with early alerts and support their judgment; ML techniques may therefore play an essential role in the contemporary management of patients with AHF.10
Several studies have been conducted on this topic based on data from a single center or a few centers. In addition, most existing studies use only time-independent (static) data, such as demographics and background disease, to analyze and predict clinical outcomes. Few studies have predicted cases of invasive treatment, such as acute hemodialysis or mechanical ventilation.
In this context, we used ML methods to develop and validate models for predicting in-hospital mortality and the need for invasive mechanical ventilation and for dialysis due to acute kidney injury among patients with AHF receiving furosemide in intensive care units (ICUs). The study aimed to determine the optimal prediction model for each outcome using static and dynamic features obtained from an extensive database of a Japanese hospital chain.
Method
Data source
We conducted a retrospective cohort study using data obtained from the Tokushukai Medical Database between 2013 and 2019. The Tokushukai Medical Group operates 70 hospitals and 340 facilities throughout Japan and is Japan's most prominent private medical service operator (https://www.tokushukai.or.jp/en/, accessed May 30, 2022). For the current study, we retrospectively used data from 20 hospitals with ICUs (Supplemental Table S1) that participate in the Diagnosis Procedure Combination (DPC) system.11 The database includes the following information: patient age, sex, vital signs, all measured in-patient laboratory values, admission and discharge dates, discharge outcome (alive or dead), primary admission diagnosis, comorbidities recorded by the attending physician at discharge using International Classification of Diseases, 10th Revision (ICD-10) codes, comorbidities recorded by the physician at admission, post-admission illnesses, drugs, and hospital procedures such as acute dialysis and mechanical ventilation.12,13 In most cases, laboratory measurements were performed every 24 h, with additional measurements within a 24-h period if deemed necessary by the clinician.
This study was approved by the Tokushukai Group Joint Ethics Review Committee (approval number: TGE01426-024) and the Terumo Research Ethics Review Committee (processing number: CR20-R006) and was conducted in agreement with the principles of the Declaration of Helsinki. Information on this study, including its objectives, has been made publicly available.
Study participants
The study participants were adult patients (≥ 20 years of age) with AHF who had an emergency or unscheduled admission between 2013 and 2019 to one of the 20 hospitals and who received furosemide (either bolus or continuous) in an ICU. We considered emergency care units (ECUs) as ICUs because, in the current Japanese system, ECUs tend to serve a role similar to that of ICUs, and there is no clear distinction between them. Patients with AHF were identified by DPC codes, with “heart failure” as the leading cause of hospitalization. Patients who were undergoing hemodialysis, had an implanted pacemaker, had undergone surgical or interventional treatment, or had undergone intra-aortic balloon pumping or extracorporeal membrane oxygenation (ECMO) before ICU admission were excluded. Patients who started acute renal replacement therapy (RRT) or died in the ICU on the same day as furosemide administration were also excluded.
In the current study, patients could receive furosemide several times in the ICU. We regarded each furosemide administration as an independent event; therefore, multiple datasets could be generated for each patient (Supplemental Figure S1A). For both bolus and continuous administration, the recorded administration time was regarded as the start of administration.
Outcome variables
We evaluated (i) in-hospital mortality, (ii) initiation of acute RRT, and (iii) initiation of mechanical ventilation within 72 h of receiving furosemide. Initiation of RRT was defined as the initiation of RRT within 72 h of furosemide administration in the ICU. The initiation of mechanical ventilation was defined as the introduction of ventilators within 72 h of furosemide administration in the ICU.
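A minimal sketch of how one labeled sample per furosemide administration could be built under the 72-h outcome window described above. The function names, the patient identifier, and the choice to count an event exactly 72 h after administration as positive (but not one at the same instant) are illustrative assumptions, not the authors' code.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# 72-h outcome window, as stated in the text.
OUTCOME_WINDOW = timedelta(hours=72)


def label_administration(administered_at: datetime,
                         event_at: Optional[datetime],
                         window: timedelta = OUTCOME_WINDOW) -> int:
    """Return 1 if the event (e.g. start of RRT or of ventilation) began after
    this administration and no later than `window` after it (assumed boundaries)."""
    if event_at is None:
        return 0
    return int(administered_at < event_at <= administered_at + window)


def build_samples(patient_id: str,
                  administrations: List[datetime],
                  event_at: Optional[datetime]) -> List[Tuple[str, datetime, int]]:
    """One labeled sample per administration; each sample keeps the (hypothetical) patient ID."""
    return [(patient_id, t, label_administration(t, event_at))
            for t in sorted(administrations)]


# Made-up times: the event starts 108 h after the first dose and 60 h after the second,
# so the first dose is labeled 0 and the second 1.
doses = [datetime(2019, 1, 1, 8), datetime(2018, 12, 30, 8)]
event = datetime(2019, 1, 3, 20)
print(build_samples("patient-A", doses, event))
```

Keeping the patient identifier on every sample makes it possible to control how repeated administrations from the same patient are split between training and test data.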
Predictor variables
Thirty-four variables expected to predict the study outcomes were obtained from the database: age; sex; height; SPO2; systolic, diastolic, and mean blood pressure; respiratory rate; body temperature; pulse rate; platelets; hemoglobin; leukocytes; blood urea nitrogen; potassium; HCO3; PCO2; PO2; lactate; calcium; sodium; chloride; troponin I; high-sensitivity troponin I; brain natriuretic peptide; creatinine; urine output; furosemide dosage; pulmonary sound; and cyanosis. Systolic blood pressure, diastolic blood pressure, and urine output were each represented by two variables because they were recorded in the database under two different names or in different tables. For furosemide dosage, the amount of active ingredient and the volume were used as separate variables. For bolus administration, the dose at the time of administration was used; for continuous administration, the amount given between the recorded start and end times was used. Processed data were used for variables that depend on measurement time (e.g. blood pressure). We used the data obtained before furosemide administration in the ICU.
The variables were handled as three types of data: (i) time-series data, (ii) static data, and (iii) a combination of time-series and static data. When handled as time-series data, the variables were processed by extrapolation comprising inter-fall spline interpolation and the median of five neighboring points. Thus, each variable was represented by 24 hourly data points relative to the time of furosemide administration. Urine volume was converted to an hourly figure. Missing values between measurements were filled using the subsequent measured value. Extrapolation was performed as described previously.
The most recent values measured before furosemide administration were treated as static data. When time-series and static data were combined, variables related to vital signs, balanced outputs, and patient states were treated as time-series data, whereas patient, laboratory, and medication information were treated as static data. After processing, the 34 variables yielded a total of 678 features for the time-series data and 310 features for the combination of time-series and static data.
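The sketch below illustrates the gap filling and smoothing steps on an hourly grid. It is a simplified, hypothetical version rather than the authors' pipeline: the spline step is omitted, gaps are filled with the next measured value as described above, hours after the last measurement reuse the last value (an assumption), and a centered median over five neighboring points is then applied, with the window truncated at the edges.

```python
from statistics import median
from typing import List, Sequence, Tuple


def to_hourly(measurements: Sequence[Tuple[float, float]], n_hours: int = 24) -> List[float]:
    """Place (hour, value) measurements on an hourly grid 0..n_hours-1.

    Each hour takes the next measurement at or after it (next-value fill);
    hours after the last measurement reuse the last value (assumption)."""
    points = sorted(measurements)
    grid = []
    for hour in range(n_hours):
        later = [value for t, value in points if t >= hour]
        grid.append(later[0] if later else points[-1][1])
    return grid


def median5(series: Sequence[float]) -> List[float]:
    """Centered median over five neighboring points, truncated at the edges."""
    n = len(series)
    return [median(series[max(0, i - 2):min(n, i + 3)]) for i in range(n)]


# Made-up readings (hour, value) with a single spike at hour 5; the median filter removes it.
hourly = median5(to_hourly([(0, 10.0), (3, 20.0), (5, 100.0), (8, 20.0)], n_hours=10))
print(hourly)
```

The median filter is robust to isolated outliers, which is presumably why a median rather than a mean of neighboring points is used for smoothing.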
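As a consistency check on these totals, assume (this is not stated in the text) that each variable handled as a time series contributes 24 hourly features and each static variable contributes one. Then 24k + (34 − k) equals the total, where k is the number of time-series variables. The hypothetical helper below solves for k.

```python
N_VARIABLES = 34   # number of predictor variables in the text
HOURS = 24         # hourly points per time-series variable (assumption)


def implied_time_series_count(total_features: int,
                              n_vars: int = N_VARIABLES,
                              hours: int = HOURS) -> int:
    """Solve hours * k + (n_vars - k) == total_features for k."""
    k, remainder = divmod(total_features - n_vars, hours - 1)
    if remainder or not 0 <= k <= n_vars:
        raise ValueError("total is inconsistent with this feature layout")
    return k


print(implied_time_series_count(678))  # 28
print(implied_time_series_count(310))  # 12
```

Under this assumption, the 678 and 310 totals correspond to 28 and 12 time-series variables, respectively, and handling all 34 variables as static data gives 34 features, matching the count used for the static-data models.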
Datasets
As mentioned above, a separate dataset was created for each furosemide administration in the ICU (Supplemental Figure S1A). Datasets for which explanatory variables were not available for the full 24 h, or whose outcome observation period extended beyond ICU discharge, were excluded. The datasets were divided into training datasets, used for model construction, and test datasets, used for accuracy assessment, at a ratio of 6:4 (Supplemental Figure S2B).
To address the class imbalance in the outcome variable, oversampling was performed on the training data. For the prediction of acute RRT initiation, random undersampling was used instead: randomly selected non-induction cases were removed until the ratio of induction to non-induction cases was 1:1.
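Because one patient can contribute several datasets, the way the 6:4 split is drawn matters: if datasets from the same patient fall on both sides, test performance can be optimistic. The text does not say whether the split was made per patient or per dataset; the sketch below shows a patient-level split, assuming a hypothetical patient identifier is stored with each dataset. Running it with different seeds gives repeated splits of the kind used for evaluation.

```python
import random
from typing import List, Sequence, Tuple


def grouped_split(patient_ids: Sequence[str], train_fraction: float = 0.6,
                  seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices) such that all datasets of a patient
    land on the same side. The 6:4 ratio applies to patients, so the ratio of
    datasets is only approximately 6:4."""
    patients = sorted(set(patient_ids))
    random.Random(seed).shuffle(patients)
    n_train = round(train_fraction * len(patients))
    train_patients = set(patients[:n_train])
    train = [i for i, p in enumerate(patient_ids) if p in train_patients]
    test = [i for i, p in enumerate(patient_ids) if p not in train_patients]
    return train, test


# Hypothetical IDs: one entry per dataset (i.e. per furosemide administration).
ids = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5"]
splits = [grouped_split(ids, seed=s) for s in range(5)]
```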
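Both rebalancing strategies described above can be sketched in a few lines of standard-library Python. These helpers are hypothetical, assume binary 0/1 labels, and should be applied to the training data only so that the test data keep their original outcome distribution.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def _class_indices(y: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Indices of the minority and majority classes of binary (0/1) labels."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    return (pos, neg) if len(pos) <= len(neg) else (neg, pos)


def random_oversample(X: Sequence[T], y: Sequence[int],
                      seed: int = 0) -> Tuple[List[T], List[int]]:
    """Duplicate randomly drawn minority samples until both classes are equal in size.
    Assumes the minority class is not empty."""
    rng = random.Random(seed)
    minority, majority = _class_indices(y)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = minority + majority + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def random_undersample(X: Sequence[T], y: Sequence[int],
                       seed: int = 0) -> Tuple[List[T], List[int]]:
    """Keep all minority samples and an equally sized random subset of the majority (1:1)."""
    rng = random.Random(seed)
    minority, majority = _class_indices(y)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]
```

Oversampling keeps every majority case but repeats minority cases, which can encourage overfitting to them; undersampling avoids duplication but discards data, which matters when the minority class is as rare as acute RRT initiation.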
Construction of the prognostic model
Prognostic models were constructed using three algorithms: a multi-layer neural network, linear support vector machine classification (LSVC), and eXtreme gradient boosting (XGB).
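The paper does not state which implementation of LSVC was used. As a dependency-free illustration of what a linear support vector machine classifier optimizes, the sketch below trains one with the Pegasos stochastic sub-gradient method on the L2-regularized hinge loss. It is a swapped-in solver for illustration, not the authors' implementation; the regularization strength, iteration count, and ±1 label convention are assumptions.

```python
import random
from typing import List, Sequence


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, n_iter: int = 5000, seed: int = 0) -> List[float]:
    """Pegasos: stochastic sub-gradient descent on lam/2*||w||^2 + mean hinge loss.

    Labels must be +1/-1. A constant 1 is appended to every sample so the last
    weight acts as a (regularized) bias term."""
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    for t in range(1, n_iter + 1):
        i = rng.randrange(len(data))
        eta = 1.0 / (lam * t)                      # decreasing step size
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
        w = [(1.0 - eta * lam) * wj for wj in w]   # shrink (regularization step)
        if margin < 1.0:                           # hinge loss active: move toward sample
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def decision_function(w: Sequence[float], x: Sequence[float]) -> float:
    """Signed distance-like score; larger means more likely positive."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    return 1 if decision_function(w, x) >= 0 else -1
```

The signed output of `decision_function` can serve as the score for ROC and threshold calculations.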
Data processing and manipulation were performed using Python (version 3.8.10).
Performance evaluation
The models were evaluated using standard performance metrics: F1 score, precision, recall, area under the receiver operating characteristic (ROC) curve (ROC AUC), and area under the precision–recall (PR) curve (PR AUC). Evaluation was repeated on five different training/test splits, and the metrics were averaged over the five splits. The classification threshold of each model was set at the value maximizing the Youden index, and learning curves were used to confirm that the multi-layer neural network was adequately trained.
For the models with the highest ROC AUC, we calculated the contributions of the five most influential predictor variables, using feature importance for the LSVC and XGB models and SHapley Additive exPlanations (SHAP) for the multi-layer neural network models.
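The evaluation steps above can be sketched as follows for one test split. ROC AUC is computed as the probability that a randomly chosen positive case is scored above a randomly chosen negative case (ties count one half), the threshold is chosen to maximize the Youden index J = sensitivity + specificity − 1 over the observed scores, and PR AUC is estimated as average precision. The paper does not state which PR AUC estimator was used, so that choice is an assumption, as are all function names.

```python
from typing import Sequence, Tuple


def roc_auc(y: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion(y: Sequence[int], scores: Sequence[float],
              threshold: float) -> Tuple[int, int, int, int]:
    """Counts (tp, fp, fn, tn) when scores >= threshold are called positive."""
    tp = fp = fn = tn = 0
    for s, t in zip(scores, y):
        predicted_positive = s >= threshold
        if predicted_positive and t == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif t == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def youden_threshold(y: Sequence[int], scores: Sequence[float]) -> float:
    """Observed score that maximizes J = sensitivity + specificity - 1."""
    def youden_j(threshold: float) -> float:
        tp, fp, fn, tn = confusion(y, scores, threshold)
        return tp / (tp + fn) + tn / (tn + fp) - 1.0
    return max(sorted(set(scores)), key=youden_j)


def precision_recall_f1(y, scores, threshold) -> Tuple[float, float, float]:
    tp, fp, fn, _ = confusion(y, scores, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def average_precision(y: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise PR AUC: mean precision at the rank of each positive (ties not merged)."""
    ranked = sorted(zip(scores, y), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for k, (_, t) in enumerate(ranked, start=1):
        if t == 1:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0
```

Averaging each metric over the five splits, for example with `statistics.mean`, gives split-averaged summary values.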
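SHAP builds on Shapley values from cooperative game theory. For a model with only a few inputs they can be computed exactly by enumerating feature coalitions, which makes the definition concrete; SHAP libraries approximate the same quantity for larger models. In the sketch below, features outside a coalition are replaced by a baseline value (for example, a training-set mean); the baseline choice and the helper names are assumptions. For a linear model, each Shapley value reduces to w_i (x_i − baseline_i).

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def exact_shapley(model: Callable[[List[float]], float],
                  x: Sequence[float],
                  baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of model(x) relative to model(baseline).

    Cost grows as 2**n, so this is only practical for a handful of features."""
    n = len(x)

    def value(coalition: set) -> float:
        # Features in the coalition take their observed value, the rest the baseline.
        return model([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def top_k(phi: Sequence[float], names: Sequence[str], k: int = 5) -> List[str]:
    """Names of the k features with the largest absolute contribution."""
    order = sorted(range(len(phi)), key=lambda i: abs(phi[i]), reverse=True)
    return [names[i] for i in order[:k]]
```

Because the cost doubles with each added feature, exact enumeration is not feasible for the 34 to 678 features used here, which is why approximate SHAP estimators are used in practice.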
Results
Study participants
In total, 1416 patients were admitted to the ICU for AHF and treated with furosemide. Of these, 446, 503, and 1056 patients did not satisfy the eligibility criteria for the in-hospital mortality, dialysis initiation, and ventilator initiation prediction tasks, respectively, and were excluded (Figure 1). Consequently, 970, 913, and 360 patients were used to construct and evaluate the models for in-hospital mortality, acute RRT initiation, and ventilator initiation, respectively, contributing 6632, 5787, and 1689 data points, each starting from a furosemide administration (Figure 1). Table 1 lists the background characteristics of the patients in the datasets used to construct and evaluate the prediction models.

Figure 1. Data processing and model construction procedure.
Table 1. Patient characteristics.
SPO2: peripheral oxygen saturation; HCO3: bicarbonate; PO2: partial pressure of oxygen; PCO2: partial pressure of carbon dioxide.
Data are presented as n (%) or mean.
Prediction of in-hospital mortality
The performance evaluation of the in-hospital mortality prediction models is presented in Table 2 and Figure 2.

Figure 2. Final model performance for in-hospital mortality. The best ROC curves of each ML model were obtained by testing various configurations of feature extraction, data balancing, feature selection, and filtering. (A) ROC curves for the in-hospital mortality prediction task. (B) Precision–recall curves for the same task and models as (A).
Table 2. Predictive performance for in-hospital mortality.
LSVC: linear support vector machine classification; XGB: eXtreme gradient boosting; ROC: receiver operating characteristic; AUC: area under the curve; PR: precision–recall.
The LSVC model with data imputed as a combination of time-series and static data exhibited the best ROC AUC (0.74) (Table 2), and the model with data imputed as static data exhibited an almost identical ROC AUC (0.74). The contributions of the variables to the model are listed in Table 5. Of the 310 features obtained by handling the 34 variables as a combination of time-series and static data, lactate, blood urea nitrogen, platelets, creatinine, and age contributed most to the model.
Prediction of acute RRT initiation
The performance of the acute RRT initiation prediction models is presented in Table 3 and Figure 3. The multi-layer neural network model with data imputed as static data exhibited the best performance, with an ROC AUC of 0.70 (Table 3). Compared with the other data imputation methods and models, it exhibited a similar or better F1 score (0.04) and PR AUC (0.05); recall was the only exception. The contributions of the variables to the model are listed in Table 5. Of the 34 variables treated as static data, sex, furosemide dosage, amount of active ingredient, sodium, and creatinine contributed most to model performance.

Figure 3. Final model performance for acute RRT initiation. The best ROC curves for each ML model were obtained by testing various configurations of feature extraction, data balancing, feature selection, and filtering. (A) ROC curves for the acute RRT initiation prediction task. (B) Precision–recall curves for the same task and models as (A).
Table 3. Predictive performance for initiation of acute RRT.
LSVC: linear support vector machine classification; XGB: eXtreme gradient boosting; ROC: receiver operating characteristic; AUC: area under the curve; PR: precision–recall; RRT: renal replacement therapy.
Table 4. Predictive performance for initiation of mechanical ventilation.
LSVC: linear support vector machine classification; XGB: eXtreme gradient boosting; ROC: receiver operating characteristic; AUC: area under the curve; PR: precision–recall.
Table 5. Contributing variables for each predictive model.
Contributions were calculated for the following models: for in-hospital mortality, the LSVC model with data imputed as a combination of time-series and static data; for acute RRT initiation, the multi-layer neural network model with data imputed as static data; and for mechanical ventilation initiation, the LSVC model with data imputed as time-series data.
LSVC: linear support vector machine classification; XGB: eXtreme gradient boosting; RRT: renal replacement therapy; SPO2: peripheral oxygen saturation.
Because the outcome of acute RRT initiation was more imbalanced than those of the other prediction tasks (incidence, 1.1%; Table 1), we also evaluated the models with oversampling for comparison. With oversampling, F1 and precision scores were similar, recall and ROC AUC improved, and PR AUC decreased (Supplemental Table S2).
Prediction of mechanical ventilation initiation
The performance of the mechanical ventilation initiation prediction models is presented in Table 4 and Figure 4. The LSVC model with data imputed as time-series data exhibited the best performance, with an ROC AUC of 0.732 (Table 4). This model also exhibited the highest F1 score (0.15), precision (0.09), and PR AUC (0.07), although not the highest recall (Table 4).

Figure 4. Final model performance for mechanical ventilation initiation. The best ROC curves for each ML model were obtained by testing various configurations of feature extraction, data balancing, feature selection, and filtering. (A) ROC curves for the mechanical ventilation initiation prediction task. (B) Precision–recall curves for the same task and models as (A).
The contributions of these variables to the model are presented in Table 5. Of the 678 variables treated as time-series data, age, urine output, diastolic blood pressure, SPO2, furosemide amount of the active ingredient, and SPO2 contributed the most to the model.
Discussion
In this study, ML models were developed to predict in-hospital mortality, initiation of acute RRT, and mechanical ventilation using data obtained from patients with AHF undergoing furosemide therapy after ICU admission. We demonstrated that machine learning-based models could predict the outcomes accurately. However, optimal models vary depending on the outcome to be predicted. The LSVC model exhibited the highest accuracy (ROC AUC: 0.73) for in-hospital mortality prediction, the multi-layer neural network exhibited the highest accuracy (ROC AUC: 0.70) for acute RRT initiation prediction, and the LSVC model exhibited the highest accuracy (ROC AUC: 0.73) for mechanical ventilation initiation prediction.
Several studies have reported the use of ML, including deep neural networks, to predict outcomes such as mortality, cardiogenic shock, and hospitalization in patients with AHF.9,14,15 However, to the best of our knowledge, this is the first study to use ML models to predict acute RRT initiation and mechanical ventilation initiation among patients with AHF receiving furosemide therapy. Many approaches exist for improving the accuracy of ML models, but even highly accurate predictive models may not lead to better clinical care. Conversely, even when the desired level of accuracy is not achieved, ML models that predict clinical outcomes may assist in the risk stratification of patients with AHF and support clinical decision-making regarding furosemide therapy. We also identified the factors contributing to each predictive model to verify their consistency with clinical practice. Predictive algorithms based on ML may also help stratify patient populations by risk to inform discrete clinical decisions.16
Thirty-four variables were considered in this study to predict the three outcomes. These variables were treated as (i) time-series data, (ii) static data, and (iii) a combination of time-series and static data. Using the Gini index, we identified systolic blood pressure, sex, and age as the most significant predictors of in-hospital mortality, dialysis, and ventilation, respectively. These findings are consistent with recent evidence that elderly patients have worse outcomes after AKI and that male sex is associated with lower survival rates.17 The association between systolic blood pressure and in-hospital mortality also agrees with the results of previous studies.18
The ROC AUCs varied slightly across the three prediction tasks. This could be attributed to differences in dataset size and to the imbalanced distribution of outcomes within the datasets. Specifically, the mechanical ventilation prediction task had the fewest data points, just under 2000 instances. Previous studies have indicated that ML prediction models for mortality and complications in ICU patients achieve better accuracy when trained on larger datasets;19 hence, the disparity in dataset sizes could have influenced the results. In addition, the outcomes for acute RRT and mechanical ventilation were imbalanced, occurring in only 1.1% and 2.8% of the data, respectively. Imbalanced data are known to make accurate classification difficult, particularly when the number of instances is small,20 and this may have affected the overall prediction accuracy.
One study applied deep learning to structured and unstructured patient data obtained from EHRs to develop a model predicting 30-day readmission in patients with HF.21 The model performed moderately well compared with previously published models, with an AUC of 0.705. Because only structured data were used in our study, the accuracy of our models might be improved by including both structured and unstructured data as features. Further, although other studies have predicted RRT, most were conducted in patients with chronic kidney disease.22 In particular, Zhang et al.23 analyzed 2935 patients in the MIMIC-III database and 499 patients in a local database to develop and validate an AKI risk prediction model. The authors used five methods to develop risk prediction models for AKI: XGBoost, adaptive boosting, random forest, logistic regression, and a multi-layer perceptron. XGBoost exhibited the highest AUROC (0.88); however, only clinical factors from the first 24 h after admission were used. AKI and AHF are dynamic diseases that require predictive models fed by dynamic parameters of the clinical course.
This study highlights the value of ML models for prognostic prediction, and it has the following strengths and limitations. Its strengths include (i) the use of data from an extensive database of 20 hospitals belonging to a major Japanese hospital chain, (ii) the use of both static and dynamic features, and (iii) the exploration of the factors contributing to the models to reveal important predictors of several outcomes over different periods. However, several limitations should be noted. First, the sample size was relatively small from the perspective of ML analysis. ML models generally benefit from larger training sets, so increasing the sample size could substantially improve predictive accuracy. Second, multiple datasets were generated per patient, one for each furosemide administration. Because of the limited data available in the medical field, such data replication, in which clinical examinations are treated as independent events, is sometimes performed.24 However, it introduces a risk of bias because data from the same patient may be replicated. Finally, the models may have been biased by unclear reporting, the assumptions made to handle missing values, and model overfitting.
Conclusions
We developed ML models to predict in-hospital mortality, initiation of acute RRT, and mechanical ventilation using data obtained from patients with AHF starting furosemide therapy after ICU admission. Our results suggested that ML models can help predict the clinical outcomes of patients with AHF. However, the optimal models may differ depending on the outcome of interest. Further research is required to determine the generalizability of these conclusions to other populations and settings.
Supplemental Material
Supplemental material, sj-docx-1-dhj-10.1177_20552076231194933 for Machine learning-based prognostic modeling of patients with acute heart failure receiving furosemide in intensive care units by Tadashi Kamio, Masaru Ikegami, Yoshihito Machida, Tomoko Uemura, Naotaka Chino and Masao Iwagami in DIGITAL HEALTH
Footnotes
Acknowledgements
Contributorship
TK was responsible for the manuscript content, including figures and tables. MI, YM, TU, and NC performed data pre-processing, feature engineering, model construction, hyperparameter tuning, and evaluation metric calculations. MI contributed to the preparation and revision of the manuscript. All the authors have read and approved the final version of the manuscript.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Code availability
Not applicable.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethics approval
This study was approved by the Institutional Review Board of the Shonan Kamakura General Hospital. The hospital review committee waived the need for a review because of the anonymity of the data collected from the database and the lack of a link to individual patient information.
Funding
The author(s) received no financial support for the research, authorship, or publication of this article.
Guarantor statement
Dr Tadashi Kamio was responsible for the content of this manuscript, including the data, analysis, and draft and final versions.
References
