Abstract
Objective
Estimating the diverse symptoms of patients with advanced cancer is helpful for young physicians and medical teams in planning appropriate palliative care. We evaluated the use of medication, comorbidities, laboratory test results, and vital signs in hospitalized patients to predict death within 14 days.
Methods
We retrospectively selected hospitalized patients with advanced cancer who were admitted to the hospice ward. We used extreme gradient boosting (XGBoost) and a combination of random forest (RF) and XGBoost (RF-XGBoost) models to analyze sixteen comorbidities, eighteen types of medications, twenty-six laboratory tests, and six vital signs. Finally, SHapley Additive exPlanations (SHAP) analysis was employed to interpret the contribution of each feature to survival prediction.
Results
Among the 3113 patients, 2276 (73%) survived less than 14 days. The area under the curve (AUC) of the XGBoost and RF-XGBoost models was 0.82 and 0.81 (
Conclusions
Our results suggest that the types of medications used by patients, especially stool softeners, antiemetics, and sedatives, are valuable in predicting survival beyond 14 days for hospitalized patients with advanced cancer. This result may assist young physicians and medical teams in developing appropriate palliative care plans for patients and their families.
Introduction
Advanced cancer patients often experience diverse symptoms with different levels of severity. 1 In the assessment of palliative care outcomes, physicians frequently employ medication to manage symptom burden. 2 Appropriate pharmacologic management constitutes a cornerstone of high-quality palliative care, and accurate symptom assessment and control may even contribute to improved patient survival. 3 Nevertheless, the classification of medications that manage different symptoms and their impact on patient survival remains unclear.
In clinical practice, common prognostic tools for evaluating the life expectancy of hospitalized advanced cancer patients, such as the Palliative Prognostic Score (PaP) 4 and the Palliative Prognostic Index (PPI), 5 require clinicians with more experience 6 to increase accuracy. 7
The accuracy of these prognostic tools was 73% in patients who survived more than 14 days, 8 even with clinicians who had extensive experience. Moreover, junior physicians often lack sufficient clinical experience in decision-making for symptom alleviation, 9 which increases the uncertainty in life expectancy evaluation. 10 However, multidisciplinary teams outperform individual clinicians in predicting survival. 11 The aim of this study was to use machine learning to assess the important factors affecting the survival of palliative care patients from medical records, including comorbidities, medication, biochemistry, and vital signs.
Machine learning (ML) models have recently become powerful tools for assisting clinicians in identifying high-risk patients, such as those with acute kidney injury with sepsis, 12 predicting the etiologic agent of chronic kidney disease, 13 diabetes, 14 and delirium in palliative care patients, 15 and implementing early interventions to reduce mortality. Furthermore, ML models have been used to predict the survival of patients with advanced cancers, 16,17 such as pancreatic, 18 lung, 19 and hepatocellular cancers. 20 Compared with logistic regression analysis, Extreme Gradient Boosting (XGBoost) 12 and Random Forest (RF) 21 are common machine learning models that have better accuracy metrics. Moreover, the combination of the RF and XGBoost (RF-XGBoost) models may achieve better performance in predicting delirium in palliative care than other models. 15 However, only a limited number of studies have used machine learning to predict poor disease prognosis on the basis of prescribed medication records in hospitalized patients. 21 Therefore, our objective was to evaluate whether symptom-management medications can predict 14-day survival in cancer patients in hospice wards, and whether predictive models could assist physicians in clinical settings.
Methods
Data source and study population
This study is a retrospective analysis of cancer patients who were admitted to the hospice ward for continued treatment at China Medical University Hospital between January 2003 and December 2020. The inclusion and exclusion criteria were as follows:
Inclusion criteria
Disease Type: Cancer patients
Setting: Patients admitted to hospice wards
Documentation: Patients with clear death records.
Exclusion criteria
Disease Type: Noncancer patients
Symptom management in our study was based on the National Comprehensive Cancer Network (NCCN) guidelines, which underwent minimal changes between 2003 and 2020. In this study, each medication was recorded as used if administered at least once during hospitalization, irrespective of dosing frequency or survival duration. Moreover, the survival classification in this study was derived from the criteria established by Clinician Predictions of Survival (CPS) within a threshold range. 22
We collected clinical features, including sex, age, 16 comorbidities, 18 types of medications used during this hospitalization, 26 blood laboratory results, and 6 vital signs. Our original dataset contained a total of 68 fields, 32 of which (the blood laboratory results and vital signs) had missing values. Records with missing values were nevertheless retained for training and testing.
Software
These experiments were performed utilizing algorithms from Python's XGBoost 2.0.2 library. This study employed two distinct machine learning models within the XGBoost framework: the XGBClassifier and the XGBRFClassifier.
Data balance
To address the class imbalance in the dataset, we combined the Synthetic Minority Over-sampling Technique (SMOTE) with XGBoost. 23 The core idea of SMOTE is to generate synthetic samples by interpolating between existing minority class samples and their nearest neighbors, thereby achieving a more balanced class distribution. 24
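The interpolation idea behind SMOTE can be illustrated with a minimal pure-Python sketch. It is not the implementation used in this study; the function name `smote_sample`, the parameters `k`, `n_new`, and `seed`, and the toy data are all hypothetical.

```python
import random

def smote_sample(minority, k=3, n_new=4, seed=0):
    """Create n_new synthetic points, each on the line segment between a
    minority sample and one of its k nearest minority-class neighbors."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point (excluding itself)
        neighbors = sorted((p for p in minority if p is not base),
                           key=lambda p: sq_dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic

# Hypothetical minority-class samples with two features each.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_sample(minority)
```

Because each synthetic point lies between two real minority samples, it stays inside the region already occupied by the minority class.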
We used cross-validated grid search (GridSearchCV) to systematically explore and identify the optimal hyperparameter combinations for improving model performance. SMOTE and Min-Max normalization were applied only to the training data within each fold of the 5-fold cross-validation, 25 not to the entire dataset before splitting. Final performance metrics represent the average across all 5 folds. The actual implementation followed this pipeline:
The complete dataset was divided into five folds. For each iteration, one fold was designated as the test set (which remained untouched, ∼623 patients), and the remaining four folds formed the training set (∼2490 patients). SMOTE was applied exclusively to the training set to balance classes. Min-Max normalization parameters (Supplementary Figure S1) were calculated from the training set only and then applied to both the training and test sets. Finally, model training and evaluation were performed.
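The leakage-free ordering of these steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' code: the helper names `k_fold_indices`, `fit_min_max`, and `apply_min_max` and the toy data are hypothetical, and oversampling and model fitting are only indicated by a comment.

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_min_max(rows):
    """Learn per-column (min, max) from the training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]

def apply_min_max(rows, params):
    """Scale rows with previously fitted parameters; constant columns map to 0."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, (lo, hi) in zip(row, params)] for row in rows]

# Toy data: 10 rows, 2 features (hypothetical values).
X = [[float(i), float(i * i)] for i in range(10)]
folds = k_fold_indices(len(X), k=5)

for test_idx in folds:
    held_out = set(test_idx)
    train = [X[i] for i in range(len(X)) if i not in held_out]
    test = [X[i] for i in test_idx]
    params = fit_min_max(train)                  # fitted on training rows only
    train_scaled = apply_min_max(train, params)
    test_scaled = apply_min_max(test, params)    # may fall outside [0, 1]
    # Oversampling and model fitting would use train_scaled only;
    # test_scaled is reserved for evaluation.
```

Scaled test values can fall slightly outside the 0 to 1 range, which is expected when the parameters come from the training rows alone.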
Missing data handling
We did not exclude any variables for excessive missingness (>50% missing values), as all 68 variables had missing rates below 50%. For the 32 variables with missing values (of 68 total; all among the laboratory tests and vital signs):
We utilized XGBoost's built-in capability to handle missing values natively. Rather than imputing a value, XGBoost learns during tree construction the optimal default direction (left or right branch) at each split for observations whose value is missing.
This approach is more appropriate than simple imputation methods because: (a) it preserves the information that values are missing, which may itself be clinically meaningful; (b) it allows the model to learn patterns associated with missingness; and (c) it avoids introducing artificial values that could bias the model.
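A simplified sketch of learning a default direction for missing values at a single split is given below. As a stated simplification, it scores candidate directions by weighted Gini impurity, whereas XGBoost itself uses a gradient-based gain; the function names, the threshold, and the toy values are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 for an empty list)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def default_direction(values, labels, threshold):
    """Pick the branch for missing values (None) that gives the lower
    weighted impurity at the fixed split 'value < threshold'."""
    best = None
    n = len(labels)
    for direction in ("left", "right"):
        left, right = [], []
        for v, y in zip(values, labels):
            if v is None:
                goes_left = direction == "left"
            else:
                goes_left = v < threshold
            (left if goes_left else right).append(y)
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if best is None or score < best[1]:
            best = (direction, score)
    return best[0]

# Hypothetical laboratory values; None marks a test that was not performed.
values = [0.8, 1.0, None, 2.5, 3.0, None]
labels = [0, 0, 1, 1, 1, 1]
direction = default_direction(values, labels, threshold=2.0)
```

In this toy example the rows with missing values share the outcome of the high-value rows, so sending them to the right branch yields the purer split.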
Evaluation
All performance metrics reported were calculated exclusively on the test sets (validation folds) during the 5-fold cross-validation process, not on the training data. In each of the 5 cross-validation iterations, the training set (80% of data, ∼2490 patients) was used only for model training and hyperparameter tuning, and the test set (20% of data, ∼623 patients) was used only for performance evaluation. Performance metrics (receiver operating characteristic curve, sensitivity, specificity, accuracy) were calculated on the test set only.
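For reference, the metrics named above can be computed from test-set labels and predicted probabilities as in the following minimal sketch. The 0.5 threshold, the choice of positive class, and the toy scores are assumptions made for illustration and are not taken from the study.

```python
def confusion_metrics(y_true, y_score, threshold=0.5):
    """Sensitivity, specificity and accuracy at a fixed probability threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
    }

def auc(y_true, y_score):
    """AUC as the probability that a random positive case is scored above
    a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical held-out labels and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.55]
metrics = confusion_metrics(y_true, y_score)
auc_value = auc(y_true, y_score)
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds.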
Model explanation
To explore the impact of each feature on the machine learning model, SHapley Additive exPlanations (SHAP) analysis was used to enhance model interpretability by quantifying the contribution of each feature to survival prediction.26,27
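The quantity that SHAP approximates can be illustrated by computing exact Shapley values for a small toy model. This sketch replaces absent features with a single baseline value, which is a simplification of the tree-based SHAP estimator typically used with XGBoost; the model, its coefficients, and the feature names are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_model(z):
    """Hypothetical score over three binary medication flags."""
    stool_softener, antiemetic, sedative = z
    return (0.2 + 0.3 * stool_softener + 0.2 * antiemetic
            + 0.1 * stool_softener * sedative)

x = [1, 1, 1]          # patient who received all three medication types
baseline = [0, 0, 0]   # reference patient who received none
phi = shapley_values(toy_model, x, baseline)
```

The contributions sum to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term is shared equally between the two features involved.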
Statistical analysis
A general linear model was used to compare group differences. Min–max normalization was applied to continuous variables. Student's
Results
In total, 3113 patients (57% men) met the inclusion criteria for cancer diagnosis during hospitalization. SMOTE was then applied to address class imbalance and GridSearchCV to tune hyperparameters, and the SHAP-XGBoost model was used to analyze the clinical characteristics, including comorbidities, medications, biochemistry, and vital signs (Figure 1). Table 1 lists the clinical characteristics of patients stratified on the basis of their 14-day survival. In total, 2276 patients (73.11%) were categorized into the less than 14-day survival group, whereas 837 patients (26.89%) were classified into the more than 14-day survival group. Compared with the latter group, the group of patients who survived for less than 14 days had significantly greater comorbidity, including cancers, dementia, moderate to severe renal disease, mild liver disease, and liver disease, as well as a higher frequency of medication except for antispasmodic drugs (2.9% versus 8.2%).

Flow chart of study subjects.
Frequency of physical comorbidities and medications in patients.
(% of all patients within the group).
*Student's
Moderate or strong opioid analgesics: fentanyl (patch), meperidine, morphine, and nalbuphine; Weak opioid analgesics: codeine, tramadol, and tramadol/acetaminophen; Non-opioid analgesics: acetaminophen, diclofenac, and ketorolac; Stool softeners: bisacodyl, lactulose, glycerin oil, and senna glycosides; Sedatives: alprazolam, estazolam, haloperidol, lorazepam, midazolam, prochlorperazine, and oxazepam; Diuretics: furosemide and spironolactone; Antibiotics: ampicillin/clavulanic acid, cefazolin, cefepime, flomoxef, gentamicin, metronidazole, and piperacillin/tazobactam; Acid suppressants: esomeprazole, omeprazole, pantoprazole, and ranitidine; Antiemetics: metoclopramide; Steroids: dexamethasone and hydrocortisone; Antiflatulents: simethicone; Antispasmodics: butylscopolamine; Antihistamines: diphenhydramine; Bronchodilators: salbutamol.
The laboratory test results and vital signs of the patients at admission are shown in Table 2. The ≤14-day survival group had higher values of AST (136.1 IU/L), ALT (67.1 IU/L), total bilirubin (4.7 mg/dL), C-reactive protein (12.2 mg/dL), blood urea nitrogen (42.7 mg/dL) and serum creatinine (1.8 mg/dL) than the >14-day survival group (63.4 IU/L, 38.9 IU/L, 2.0 mg/dL, 11.0 mg/dL, 24.5 mg/dL and 1.1 mg/dL, respectively). No significant differences were observed in the albumin or estimated GFR between the two groups. Moreover, the pulse rate (103.6 beats/min) and respiratory rate (20.6 breaths/min) in the ≤14-day survival group were higher than those in the other group (99.6 beats/min and 19.5 breaths/min, respectively).
Laboratory results and vital signs of patients.
*Student's
ALT, alanine aminotransferase; AST, aspartate aminotransferase; DBP, diastolic blood pressure; GFR, glomerular filtration rate; MCH, mean corpuscular hemoglobin; MCHC, mean corpuscular hemoglobin concentration; MCV, mean corpuscular volume; RDW, red cell distribution width; SBP, systolic blood pressure; SpO2, peripheral oxygen saturation.
Figure 2 shows survival prediction curves of the XGBoost and RF-XGBoost models. The areas under the curve (AUCs) of the XGBoost and RF-XGBoost models were 0.82 (95% CI: 0.78–0.84) and 0.81 (95% CI: 0.77–0.84) (

Feature importance for palliative care survival based on SHAP values. The mean absolute SHAP values show the top 10 feature importance for (A) XGBoost and (B) RF-XGBoost models. The summary plot shows the relationship between a characteristic and survival outcome for (C) XGBoost and (D) RF-XGBoost models. Positive SHAP values are indicative of positive correlation with survival (red color), while negative SHAP values are indicative of negative correlation (purple color).

Calibration curve.

Decision curves.
Comparison of the accuracy, sensitivity and specificity of the XGBoost and RF-XGBoost models (K-fold cross validation) for the 14-day survival of palliative care patients.
AUC, area under the curve; 95% CI, 95% confidence interval.
Mean values represent the average of the 5 test-set performances, and 95% CIs reflect the variability across the 5 test folds.
Among the top 10 most important features identified by both machine learning models in the SHAP analysis, 7 were related to medication use and 3 to laboratory test results. In the SHAP-XGBoost model analysis, the top 3 features, ranked by their mean SHAP values, were stool softener medication (0.32), antiemetic medication (0.25) and sedative medication (0.13) (Figure 2A). Similarly, the corresponding values in the SHAP analysis of the RF-XGBoost model were 0.0065, 0.0054 and 0.0032, respectively (Figure 2B). Receiving medication generally showed a strong positive correlation with survival beyond 14 days, although some outliers showed a negative trend (Figure 2C and D).
Discussion
In this study, we sought to compare the relationships between survival and four categories of medical records (namely comorbidity, medication, biochemistry and vital signs) using machine learning applied to palliative care cancer patients. Of the two machine learning models, XGBoost provided the better performance. Evaluation of the calibration and decision curves revealed that the XGBoost model exhibited a better fit and greater net benefit than the RF-XGBoost model. However, these findings differed from some previous reports. The discrepancy may be attributed to the fact that prior studies considered no more than three categories, thus limiting comparability.15,27 Nevertheless, SHAP analysis of feature importance showed that the two models agreed closely on the top ten most important features, which comprised seven features related to medication and three related to biochemistry. Moreover, the same three features (use of stool softeners, antiemetics, and sedatives) ranked highest in both models. Survival beyond 14 days was positively associated with seven of the top ten features with the highest impact. This study revealed that the three key medications used for symptom management during hospitalization were linked to improved survival outcomes. Patients treated with stool softeners, antiemetics, and sedatives tended to survive longer than 14 days.
In this study, 87–93% of patients had taken moderate or strong opioid medication for pain relief, which is consistent with the literature. 28 However, the evidence surrounding the use of opioids for pain control is disappointingly limited. 29 In our study, 36–61% of patients took non-opioid analgesics as adjuvant analgesics in addition to opioids. Among patients who survived more than 14 days, 61% received adjuvant analgesics, whereas only 31% of those who survived less than 14 days did. Moreover, non-opioid analgesics were among the top seven predictive factors associated with patient survival (ranking 7th in XGBoost and 5th in RF-XGBoost). In other clinical experiences, over 80% of patients who survived more than 90 days were treated for cancer pain with adjuvant agents. 30 Therefore, future survival evaluations in patients with pain should consider the potential impact of adjuvant analgesics.
There was a positive correlation between the use of stool softeners and antiemetic medications and survival longer than 14 days. In our hospice ward, depending on the patient's condition, different formulations of laxatives, such as oral medications or suppositories, are administered. The recommended management of constipation in end-of-life care involves both prophylactic and therapeutic use of stimulant laxatives, which are typically initiated concurrently with opioid therapy, given that constipation is a predictable adverse effect in this population. These findings are consistent with previous studies, such as those by Ostan et al.,31,32 which showed that stool softeners and antiemetic treatments contribute to both improved quality of life and survival benefits during opioid therapy and chemotherapy. However, when patients have less than two weeks to live, a significant decrease in symptoms such as constipation, nausea, and vomiting is observed, 32 and the use of related medications declines accordingly. This likely reflects decreased energy, alertness, and communication ability among terminally ill patients, making symptom expression and response less frequent.33,34 Given the limited direct evidence, further research based on the findings of our study is warranted, with the aim of improving quality of life.
Sedatives are commonly prescribed to manage symptoms such as anxiety, 35 restlessness, and delirium in palliative care patients. 36 There is evidence that the survival times of sedated and non-sedated patients do not differ significantly.36,37 Sedative use ranked as the third strongest positive correlate of survival longer than 14 days in our study. This may be related to the variability in the state of consciousness among patients undergoing sedation. 38 The differences in palliative sedation may be attributed to patients receiving intermittent or mild sedation, 39 which can relieve symptoms without impairing the patient's ability to interact or express their subjective assessment 40 while remaining conscious. 41 In contrast, patients incapable of responding had a median survival time of less than 7 days.34,42 The relationship among sedation dose, patient consciousness, and survival duration may warrant further investigation.
This study also has some limitations. First, data on medication dosage and frequency were inadequate in this study, so we analyzed whether patients used or did not use each category of medication rather than their clinical symptoms at that time. When medication data encompass the entire hospitalization period, patients with longer lengths of stay inherently have greater exposure opportunities, such that the duration of the data collection period itself influences medication use. Consequently, the importance of medications as predictors of survival may be overestimated. Second, temporal bias is a potential limitation. Even though we followed the NCCN guidelines for symptom management and the primary treatment patterns remained relatively stable at our institution, palliative care practices and medication availability may have changed over the 18-year study period. 43 Third, a prognostic model needs external validation to ensure its clinical utility. 44 External validation, often performed through multi-institutional studies, is crucial to enhance generalizability and reduce the risk of overfitting and systematic bias. 45 However, as this study was limited to single-center data, future research requires multi-center external validation to increase model robustness. Finally, although medication records were included in this study, conventional prognostic tools were not. 10 These conventional indicators include the Palliative Performance Scale (PPS) for assessing functional status; the PPI, based on PPS, oral intake, edema, and delirium; and the Laboratory Prognostic Score, based on biochemistry. 46 Therefore, future studies should include these conventional metrics to allow for a more comparative validation.
Conclusion
Our analysis suggests that incorporating medication use, especially stool softeners, antiemetics, and sedatives, into predictive survival models for advanced cancer patients is valuable for prognostic accuracy. Supported by the net benefit observed in the decision curve analysis, this internally cross-validated model may assist junior physicians and medical teams in developing appropriate palliative care plans for patients and their families.
Supplemental Material
Supplemental material, sj-pptx-1-dhj-10.1177_20552076261419945 for Evaluation of symptom-management medications for predicting short-term survival in advanced cancer patients with machine learning by Hua-Shui Hsu, Chia-Hung Kao, Shih-Sheng Chang, Kuo-Chen Wu, Po-Tsung Huang, Shen-Ju Tsai, Ya-Zhu Tang and Wen-Yuan Lin in DIGITAL HEALTH
Footnotes
Acknowledgements
The authors gratefully acknowledge Yi-Chun Yeh and Tai-Hsien Wu, PhD, for their assistance with data management, programming consultation, and result analysis.
Ethics approval
This study follows the TRIPOD + AI Statement, and was approved by the Research Ethics Committee of China Medical University and Hospital (Protocol No./CMUH REC No.: 1110105/CMUH111-REC3-019).
Author contributorship
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was funded primarily by China Medical University (CMU103-S-10).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
The datasets generated and analyzed during the present study are available from the corresponding author upon reasonable request.
Supplemental material
Supplemental material for this article is available online.
References
